Customizable, Ubiquitous Real Time Communication over the Web (CU-RTC-Web)

UPDATE: See our latest W3C WebRTC Working Group blog post on 01-17-2013 https://aka.ms/WebRTCPrototypeBlog describing our new CU-RTC-Web prototype that you can download on HTML5 Labs.

From:

Matthew Kaufman - Inventor of RTMFP, the most widely used browser-to-browser RTC protocol on the web
Principal Architect, Skype, Microsoft Corp.

Martin Thomson
Senior Architect, Skype, Microsoft Corp.

Jonathan Rosenberg - Inventor of SIP and SDP offer/answer
GM Research Product & Strategy, Skype, Microsoft Corp.

Bernard Aboba
Principal Architect, Lync, Microsoft Corp.

Jean Paoli
President, Microsoft Open Technologies, Inc.

Adalberto Foresti
Senior Program Manager, Microsoft Open Technologies, Inc.

 

 

Today, we are pleased to announce Microsoft’s contribution of the CU-RTC-Web proposal to the W3C WebRTC working group.

Thanks in no small part to the exponential improvements in broadband infrastructure over the last few years, it is now possible to leverage the digital backbone of the Internet to create experiences for which dedicated media and networks were necessary until not too long ago.

Inexpensive, real time video conferencing is one such experience.

The Internet Engineering Task Force and the World Wide Web Consortium created complementary working groups to bring these experiences to the most familiar and widespread application used to access the Internet: the web browser. The goal of this initiative is to add a new level of interactivity for web users with real-time communications (Web RTC) in the browser.

While the overarching goal is simple to describe, there are several critical requirements that a successful, widely adoptable Web RTC browser API will need to meet:

  • Honoring key web tenets – The Web favors stateless interactions which do not saddle either party of a data exchange with the responsibility to remember what the other did or expects. Doing otherwise is a recipe for extreme brittleness in implementations; it also raises considerably the development cost which reduces the reach of the standard itself.
  • Customizable response to changing network quality – Real time media applications have to run on networks with a wide range of capabilities varying in terms of bandwidth, latency, and packet loss.  Likewise these characteristics can change while an application is running. Developers should be able to control how the user experience adapts to fluctuations in communication quality.  For example, when communication quality degrades, the developer may prefer to favor the video channel, favor the audio channel, or suspend the app until acceptable quality is restored.  An effective protocol and API should provide developers with the tools to tailor the application response to the exact needs of the moment.
  • Ubiquitous deployability on existing network infrastructure – Interoperability is critical if WebRTC users are to communicate with the rest of the world with users on different browsers, VoIP phones, and mobile phones, from behind firewalls and across routers and equipment that is unlikely to be upgraded to the current state of the art anytime soon.
  • Flexibility in its support of popular media formats and codecs as well as openness to future innovation – A successful standard cannot be tied to individual codecs, data formats or scenarios. They may soon be supplanted by newer versions that would make such a tightly coupled standard obsolete just as quickly. The right approach is instead to support multiple media formats and to bring the bulk of the logic to the application layer, enabling developers to innovate.

While a useful start at realizing the Web RTC vision, we feel that the existing proposal falls short of meeting these requirements. In particular:

  • No Ubiquitous deployability: it shows no signs of offering real world interoperability with existing VoIP phones, and mobile phones, from behind firewalls and across routers and instead focuses on video communication between web browsers under ideal conditions. It does not allow an application to control how media is transmitted on the network. On the other hand, implementing innovative, real-world applications like security consoles, audio streaming services or baby monitoring through this API would be unwieldy, assuming it could be made to work at all. A Web RTC standard must equip developers with the ability to implement all scenarios, even those we haven’t thought of.
  • No fit with key web tenets: it is inherently not stateless, as it takes a significant dependency on the legacy of SIP technology, which is a suboptimal choice for use in Web APIs. In particular, the negotiation model of the API relies on the SDP offer/answer model, which forces applications to parse and generate SDP in order to effect a change in browser behavior. An application is forced to only perform certain changes when the browser is in specific states, which further constrains options and increases complexity. Furthermore, the set of permitted transformations to SDP are constrained in non-obvious and undiscoverable ways, forcing applications to resort to trial-and-error and/or browser-specific code. All of this added complexity is an unnecessary burden on applications with little or no benefit in return.

 

The Microsoft Proposal for Customizable, Ubiquitous Real Time Communication over the Web

For these reasons, Microsoft has contributed the CU-RTC-Web proposal that we believe does address the four key requirements above.

  • This proposal adds a real-time, peer-to-peer transport layer that empowers web developers by having greater flexibility and transparency, putting developers directly in control over the experience they provide to their users.
  • It dispenses with the constraints imposed by unnecessary state machines and complex SDP and provides simple, transparent objects.
  • It elegantly builds on and integrates with the existing W3C getUserMedia API, making it possible for an application to connect a microphone or a camera in one browser to the speaker or screen of another browser. getUserMedia is an increasingly popular API that Microsoft has been prototyping and that is applicable to a broad set of applications with an HTML5 client, including video authoring and voice commands.

The following diagram shows how our proposal empowers developers to create applications that take advantage of the tremendous benefits offered by real-time media in a clear, straightforward fashion.

image

We are looking forward to continued work in the IETF and the W3C, with an open and fruitful conversation that converges on a standard that is both future-proof and an answer to today’s communication needs on the web.We would love to get community feedback on the details of our CU-RTC-Web proposal document and we invite you to stay tuned for additional content that we will soon publish on http://html5labs.com in support of our proposal.