RTCWEB Working Group                                           G. Garcia
Internet-Draft                                                    TokBox
Intended status: Informational                           August 05, 2013
Expires: February 06, 2014


         Simulcast and layered video coding support in WebRTC
          draft-garcia-simulcast-and-layered-video-webrtc-00

Abstract

   This document describes the use cases and requirements for simulcast
   and layered video coding support in WebRTC.  These techniques
   simplify the implementation of video stream adaptation to different
   participants in centralized conferencing solutions.

   This document also includes a proposal to expose these capabilities
   in the existing PeerConnection API by defining new media constraint
   properties.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 06, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Garcia                  Expires February 06, 2014               [Page 1]

Internet-Draft     Simulcast and layered video coding        August 2013

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Browser support status
     1.2.  Requirements Language
   2.  Use-cases
     2.1.  Adaptation to devices with different capabilities
     2.2.  Adaptation to participants with different network
           conditions
     2.3.  Recording
     2.4.  Increasing video quality for active speaker
   3.  Requirements
   4.  Proposed API
     4.1.  Simulcasted streams
     4.2.  Layered video coding
     4.3.  Example
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   Video conferencing using a central server is one of the typical use
   cases for real-time communication capabilities in browsers
   [I-D.ietf-rtcweb-use-cases-and-requirements].

   Most of today's multiparty videoconference solutions make use of
   centralized servers to reduce the bandwidth and CPU consumption in
   the endpoints.
   Those servers receive streams from each participant and send the
   streams to the rest of the participants, which usually have
   heterogeneous capabilities (screen size, CPU, bandwidth, etc.).  One
   of the biggest issues is how to perform the adaptation to different
   participants' constraints with the minimum possible impact on video
   quality and server performance.

   There are two approaches to adapt the streams to different
   destinations: one is transcoding (sometimes including mixing), and
   the other is switching between multiple streams or sub-streams
   received from the originator.  The first solution is computationally
   expensive and can degrade video quality.  The second solution makes
   suboptimal use of network resources by sending redundant information
   and is, in addition, codec-specific.

   The requirements and proposed API in this document are based on the
   existing version of the JSEP API and on VP8 capabilities.  These are
   the technologies available in existing WebRTC browsers, but this
   proposal could be extended to other codecs or mapped to other APIs.

1.1.  Browser support status

   It is possible to use simulcast with existing WebRTC implementations.
   However, this requires the use of separate PeerConnection objects,
   and all streams will have the same resolution.

   Multi-layer encoding is implemented and working in existing WebRTC
   browsers, and it has been tested in prototypes, but currently there
   is no way for developers to enable it.  VP8 supports temporal
   scalability, while VP9 will include more advanced control and support
   for both temporal and spatial scalability.

1.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Use-cases

   The use cases envisioned for these new WebRTC capabilities are
   focused on centralized conferencing solutions.

2.1.  Adaptation to devices with different capabilities

   Some endpoints connected to a centralized conferencing server have
   small screens and do not need to receive high-resolution video, or
   their CPU power and battery consumption make it impossible to receive
   and decode high-resolution video in real time.  In this situation, it
   is desirable to send lower-resolution video to those endpoints.

2.2.  Adaptation to participants with different network conditions

   Some endpoints connected to a centralized conferencing server do not
   have enough available bandwidth to receive high-quality video, while
   other endpoints do.  In this situation, it is desirable to send
   lower-bitrate video to the constrained endpoints.

2.3.  Recording

   A conferencing server implements recording and wants to record video
   in the highest quality possible, while forwarding it in lower quality
   to endpoints.

2.4.  Increasing video quality for active speaker

   A videoconference application shows the video of the active speaker
   in a larger size than the videos of the other participants.  It is
   desirable to increase the resolution and quality of that highlighted
   video stream, to maintain the perceived video quality.

   One possible implementation to increase the quality is to have a
   paused high-quality stream that resumes when voice activity is
   detected.

3.  Requirements

   This section contains the requirements for the API exposed in the
   browser, derived from the use-cases in Section 2.

   Requirements on how and when to enable scalable video coding:

   o  REQ-1.  It must be possible to enable and configure scalable video
      coding before initiating a peer connection.

   o  REQ-2.  It must be possible to enable and configure scalable video
      coding before answering a peer connection.

   o  REQ-3.
      It must be possible to enable, disable, and reconfigure scalable
      video coding when updating a peer connection.

   Requirements on the parameters that need to be configurable:

   o  REQ-5.  It must be possible to configure the number of simulcasted
      streams.

   o  REQ-6.  It must be possible to configure the minimum and maximum
      bitrate of each simulcasted stream.

   o  REQ-7.  It must be possible to configure the resolution of each
      simulcasted stream.

   o  REQ-8.  It must be possible to configure the number of temporal
      layers (1 to 4).  This should be the only mandatory parameter when
      enabling temporal scalability.

   o  REQ-9.  It must be possible to configure the bitrate, the frame
      rate decimation factor, and the membership of frames to layers for
      each temporal layer of the VP8 stream.

   Requirements regarding RTP usage:

   o  REQ-10.  Congestion control must be supported for all the
      simulcasted streams between the configured boundaries (min/max
      bitrate).

   o  REQ-11.  Transmission of simulcasted streams must be signaled and
      negotiated in the SDP and transmitted in RTP sessions, making use
      of existing standard attributes
      [I-D.westerlund-avtcore-multistream-and-simulcast].

   o  REQ-12.  Any endpoint should be prepared to receive VP8
      multi-layered encoded video without requiring out-of-band
      negotiation in SDP.

   Non-functional requirements:

   o  REQ-13.  The exposed API must be extensible to new codecs or new
      codec parameters.

4.  Proposed API

   The existing solution in the WebRTC API to modify settings of a
   PeerConnection is to use media constraints.  This section defines
   some new media constraints to enable and configure the usage of
   simulcasted and layered video streams.

4.1.  Simulcasted streams

   Simulcast capabilities are codec-agnostic and do not require new
   media constraints.
   Existing media constraints for resolution, frame rate, and bitrate
   can be reused, but the API needs to support receiving a list of them
   instead of just one.

4.2.  Layered video coding

   Multi-layer capabilities are codec-dependent.  For VP8, the following
   configuration parameters are exposed by the codec and need to be
   translated to media constraints (the descriptions are taken from the
   VP8 source code):

   o  tsNumberLayers: This value specifies the number of coding layers
      to be used.

   o  tsTargetBitrate: These values specify the target coding bitrate
      for each coding layer.

   o  tsRateDecimator: These values specify the frame rate decimation
      factors to apply to each layer.

   o  tsPeriodicity: This value specifies the length of the sequence
      that defines the membership of frames to layers.  For example, if
      tsPeriodicity=8, then frames are assigned to coding layers with a
      repeated sequence of length 8.

   o  tsLayerId: This array defines the membership of frames to coding
      layers.  For example, a 2-layer encoding that assigns
      even-numbered frames to one layer (0) and odd-numbered frames to a
      second layer (1) with tsPeriodicity=8 uses
      tsLayerId = (0,1,0,1,0,1,0,1).

4.3.  Example

   Example of media constraints to request two simulcasted streams, the
   first one with four temporal layers and the default bitrate, and the
   second one with a single layer and a fixed bitrate.

   {
     video: [{
       width: 640,
       height: 480,
       codecs: {
         vp8: {
           tsNumberLayers: 4
         }
       }
     }, {
       width: 320,
       height: 240,
       bitrate: {
         min: 100000,
         max: 100000
       }
     }]
   }

5.  Acknowledgements

6.  IANA Considerations

   This memo includes no request to IANA.

7.  Security Considerations

   No security implications are foreseen.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2.  Informative References

   [I-D.ietf-rtcweb-use-cases-and-requirements]
              Holmberg, C., Hakansson, S., and G. Eriksson, "Web
              Real-Time Communication Use-cases and Requirements",
              draft-ietf-rtcweb-use-cases-and-requirements-11 (work in
              progress), June 2013.

   [I-D.narten-iana-considerations-rfc2434bis]
              Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs",
              draft-narten-iana-considerations-rfc2434bis-09 (work in
              progress), March 2008.

   [I-D.westerlund-avtcore-multistream-and-simulcast]
              Westerlund, M. and B. Burman, "RTP Multiple Stream
              Sessions and Simulcast",
              draft-westerlund-avtcore-multistream-and-simulcast-00
              (work in progress), July 2011.

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              June 1999.

   [VP8]      The WebM Project, "VP8 source code", 2013.

Author's Address

   Gustavo Garcia
   TokBox
   115 Stillman Street
   San Francisco, CA
   US

   Email: gustavo@tokbox.com
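
   The following non-normative sketch illustrates how the tsPeriodicity
   and tsLayerId parameters described in Section 4.2 map frames to
   temporal layers, using the 2-layer example given there.  The helper
   names (validateLayerPattern, layerOfFrame, framesUpToLayer) are
   hypothetical and are not part of any existing or proposed API.  The
   sketch also assumes the usual hierarchical arrangement of temporal
   layers, in which a receiver limited to layer k is sent every frame
   whose layer id is at most k.

```javascript
// Hypothetical helpers: only tsNumberLayers, tsPeriodicity and tsLayerId
// come from Section 4.2; every other name is an illustrative assumption.

// Check that a layer pattern is internally consistent.
function validateLayerPattern({ tsNumberLayers, tsPeriodicity, tsLayerId }) {
  if (tsLayerId.length !== tsPeriodicity) {
    throw new RangeError("tsLayerId must contain tsPeriodicity entries");
  }
  const outOfRange = (id) =>
    !Number.isInteger(id) || id < 0 || id >= tsNumberLayers;
  if (tsLayerId.some(outOfRange)) {
    throw new RangeError("layer ids must be in [0, tsNumberLayers)");
  }
}

// Layer of a given frame: the pattern repeats every tsPeriodicity frames.
function layerOfFrame(frameIndex, pattern) {
  return pattern.tsLayerId[frameIndex % pattern.tsPeriodicity];
}

// Frames forwarded to a receiver limited to layers <= maxLayer
// (assumes higher layers depend only on lower ones).
function framesUpToLayer(frameCount, maxLayer, pattern) {
  validateLayerPattern(pattern);
  const frames = [];
  for (let i = 0; i < frameCount; i++) {
    if (layerOfFrame(i, pattern) <= maxLayer) frames.push(i);
  }
  return frames;
}

// The 2-layer example from Section 4.2.
const twoLayerPattern = {
  tsNumberLayers: 2,
  tsPeriodicity: 8,
  tsLayerId: [0, 1, 0, 1, 0, 1, 0, 1],
};

const baseLayerFrames = framesUpToLayer(8, 0, twoLayerPattern);
const allFrames = framesUpToLayer(8, 1, twoLayerPattern);
console.log(baseLayerFrames); // [ 0, 2, 4, 6 ]
console.log(allFrames.length); // 8
```

   With this pattern, a server that forwards only layer 0 delivers every
   other frame, halving the frame rate without transcoding.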