
Internet Engineering Task Force (IETF) C. Perkins
Request for Comments: 8834 University of Glasgow
Category: Standards Track M. Westerlund
ISSN: 2070-1721 Ericsson
J. Ott
Technical University Munich
January 2021

Media Transport and Use of RTP in WebRTC

Abstract

The framework for Web Real-Time Communication (WebRTC) provides
support for direct interactive rich communication using audio, video,
text, collaboration, games, etc. between two peers' web browsers.
This memo describes the media transport aspects of the WebRTC
framework. It specifies how the Real-time Transport Protocol (RTP)
is used in the WebRTC context and gives requirements for which RTP
features, profiles, and extensions need to be supported.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8834.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Rationale
3. Terminology
4. WebRTC Use of RTP: Core Protocols
4.1. RTP and RTCP
4.2. Choice of the RTP Profile
4.3. Choice of RTP Payload Formats
4.4. Use of RTP Sessions
4.5. RTP and RTCP Multiplexing
4.6. Reduced Size RTCP
4.7. Symmetric RTP/RTCP
4.8. Choice of RTP Synchronization Source (SSRC)
4.9. Generation of the RTCP Canonical Name (CNAME)
4.10. Handling of Leap Seconds
5. WebRTC Use of RTP: Extensions
5.1. Conferencing Extensions and Topologies
5.1.1. Full Intra Request (FIR)
5.1.2. Picture Loss Indication (PLI)
5.1.3. Slice Loss Indication (SLI)
5.1.4. Reference Picture Selection Indication (RPSI)
5.1.5. Temporal-Spatial Trade-Off Request (TSTR)
5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR)
5.2. Header Extensions
5.2.1. Rapid Synchronization
5.2.2. Client-to-Mixer Audio Level
5.2.3. Mixer-to-Client Audio Level
5.2.4. Media Stream Identification
5.2.5. Coordination of Video Orientation
6. WebRTC Use of RTP: Improving Transport Robustness
6.1. Negative Acknowledgements and RTP Retransmission
6.2. Forward Error Correction (FEC)
7. WebRTC Use of RTP: Rate Control and Media Adaptation
7.1. Boundary Conditions and Circuit Breakers
7.2. Congestion Control Interoperability and Legacy Systems
8. WebRTC Use of RTP: Performance Monitoring
9. WebRTC Use of RTP: Future Extensions
10. Signaling Considerations
11. WebRTC API Considerations
12. RTP Implementation Considerations
12.1. Configuration and Use of RTP Sessions
12.1.1. Use of Multiple Media Sources within an RTP Session
12.1.2. Use of Multiple RTP Sessions
12.1.3. Differentiated Treatment of RTP Streams
12.2. Media Source, RTP Streams, and Participant Identification
12.2.1. Media Source Identification
12.2.2. SSRC Collision Detection
12.2.3. Media Synchronization Context
13. Security Considerations
14. IANA Considerations
15. References
15.1. Normative References
15.2. Informative References
Acknowledgements
Authors' Addresses

1. Introduction

The Real-time Transport Protocol (RTP) [RFC3550] provides a framework
for delivery of audio and video teleconferencing data and other real-
time media applications. Previous work has defined the RTP protocol,
along with numerous profiles, payload formats, and other extensions.
When combined with appropriate signaling, these form the basis for
many teleconferencing systems.

The Web Real-Time Communication (WebRTC) framework provides the
protocol building blocks to support direct, interactive, real-time
communication using audio, video, collaboration, games, etc. between
two peers' web browsers. This memo describes how the RTP framework
is to be used in the WebRTC context. It proposes a baseline set of
RTP features that are to be implemented by all WebRTC endpoints,
along with suggested extensions for enhanced functionality.

This memo specifies a protocol intended for use within the WebRTC
framework but is not restricted to that context. An overview of the
WebRTC framework is given in [RFC8825].

The structure of this memo is as follows. Section 2 outlines our
rationale for preparing this memo and choosing these RTP features.
Section 3 defines terminology. Requirements for core RTP protocols
are described in Section 4, and suggested RTP extensions are
described in Section 5. Section 6 outlines mechanisms that can
increase robustness to network problems, while Section 7 describes
congestion control and rate adaptation mechanisms. The discussion of
mandated RTP mechanisms concludes in Section 8 with a review of
performance monitoring and network management tools. Section 9 gives
some guidelines for future incorporation of other RTP and RTP Control
Protocol (RTCP) extensions into this framework. Section 10 describes
requirements placed on the signaling channel. Section 11 discusses
the relationship between features of the RTP framework and the WebRTC
application programming interface (API), and Section 12 discusses RTP
implementation considerations. The memo concludes with security
considerations (Section 13) and IANA considerations (Section 14).

2. Rationale

The RTP framework comprises the RTP data transfer protocol, the RTP
control protocol, and numerous RTP payload formats, profiles, and
extensions. This range of add-ons has allowed RTP to meet various
needs that were not envisaged by the original protocol designers and
support many new media encodings, but it raises the question of what
extensions are to be supported by new implementations. The
development of the WebRTC framework provides an opportunity to review
the available RTP features and extensions and define a common
baseline RTP feature set for all WebRTC endpoints. This builds on
the past 20 years of RTP development to mandate the use of extensions
that have shown widespread utility, while still remaining compatible
with the wide installed base of RTP implementations where possible.

RTP and RTCP extensions that are not discussed in this document can
be implemented by WebRTC endpoints if they are beneficial for new use
cases. However, they are not necessary to address the WebRTC use
cases and requirements identified in [RFC7478].

While the baseline set of RTP features and extensions defined in this
memo is targeted at the requirements of the WebRTC framework, it is
expected to be broadly useful for other conferencing-related uses of
RTP. In particular, it is likely that this set of RTP features and
extensions will be appropriate for other desktop or mobile video-
conferencing systems, or for room-based high-quality telepresence
applications.

3. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. Lower- or mixed-case uses of these key
words are not to be interpreted as carrying special significance in
this memo.

We define the following additional terms:

WebRTC MediaStream: The MediaStream concept defined by the W3C in
the WebRTC API [W3C.WD-mediacapture-streams]. A MediaStream
consists of zero or more MediaStreamTracks.

MediaStreamTrack: Part of the MediaStream concept defined by the W3C
in the WebRTC API [W3C.WD-mediacapture-streams]. A
MediaStreamTrack is an individual stream of media from any type of
media source such as a microphone or a camera, but conceptual
sources such as an audio mix or a video composition are also
possible.

Transport-layer flow: A unidirectional flow of transport packets
that are identified by a particular 5-tuple of source IP address,
source port, destination IP address, destination port, and
transport protocol.

Bidirectional transport-layer flow: A bidirectional transport-layer
flow is a transport-layer flow that is symmetric. That is, the
transport-layer flow in the reverse direction has a 5-tuple where
the source and destination address and ports are swapped compared
to the forward path transport-layer flow, and the transport
protocol is the same.
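
As an illustration of the two definitions above, the following is a
minimal sketch of a 5-tuple and of the symmetry test for a
bidirectional transport-layer flow. The names FlowTuple, reverse_of,
and is_bidirectional_pair are hypothetical and are not defined by
this memo or by any referenced specification.

```python
from typing import NamedTuple


class FlowTuple(NamedTuple):
    """5-tuple identifying one unidirectional transport-layer flow."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


def reverse_of(flow: FlowTuple) -> FlowTuple:
    """Swap source and destination; the transport protocol is unchanged."""
    return FlowTuple(flow.dst_addr, flow.dst_port,
                     flow.src_addr, flow.src_port, flow.protocol)


def is_bidirectional_pair(forward: FlowTuple, backward: FlowTuple) -> bool:
    """True if the two flows together form a bidirectional flow."""
    return backward == reverse_of(forward)
```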

This document uses the terminology from [RFC7656] and [RFC8825].
Other terms are used according to their definitions from the RTP
specification [RFC3550]. In particular, note the following
frequently used terms: RTP stream, RTP session, and endpoint.

4. WebRTC Use of RTP: Core Protocols

The following sections describe the core features of RTP and RTCP
that need to be implemented, along with the mandated RTP profiles.
Also described are the core extensions providing essential features
that all WebRTC endpoints need to implement to function effectively
on today's networks.

4.1. RTP and RTCP

The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be
implemented as the media transport protocol for WebRTC. RTP itself
comprises two parts: the RTP data transfer protocol and the RTP
Control Protocol (RTCP). RTCP is a fundamental and integral part of
RTP and MUST be implemented and used in all WebRTC endpoints.

The following RTP and RTCP features are sometimes omitted in limited-
functionality implementations of RTP, but they are REQUIRED in all
WebRTC endpoints:

* Support for use of multiple simultaneous synchronization source
(SSRC) values in a single RTP session, including support for RTP
endpoints that send many SSRC values simultaneously, following
[RFC3550] and [RFC8108]. The RTCP optimizations for multi-SSRC
sessions defined in [RFC8861] MAY be supported; if supported, the
usage MUST be signaled.

* Random choice of SSRC on joining a session; collision detection
and resolution for SSRC values (see also Section 4.8).

* Support for reception of RTP data packets containing contributing
source (CSRC) lists, as generated by RTP mixers, and RTCP packets
relating to CSRCs.

* Sending correct synchronization information in the RTCP Sender
Reports, to allow receivers to implement lip synchronization; see
Section 5.2.1 regarding support for the rapid RTP synchronization
extensions.

* Support for multiple synchronization contexts. Participants that
send multiple simultaneous RTP packet streams SHOULD do so as part
of a single synchronization context, using a single RTCP CNAME for
all streams and allowing receivers to play the streams out in a
synchronized manner. For compatibility with potential future
versions of this specification, or for interoperability with non-
WebRTC devices through a gateway, receivers MUST support multiple
synchronization contexts, indicated by the use of multiple RTCP
CNAMEs in an RTP session. This specification mandates the usage
of a single CNAME when sending RTP streams in some circumstances;
see Section 4.9.

* Support for sending and receiving RTCP Sender Report (SR),
Receiver Report (RR), Source Description (SDES), and BYE packet
types. Note that support for other RTCP packet types is OPTIONAL
unless mandated by other parts of this specification. Note that
additional RTCP packet types are used by the RTP/SAVPF profile
(Section 4.2) and the other RTCP extensions (Section 5). WebRTC
endpoints that implement the Session Description Protocol (SDP)
bundle negotiation extension will use the SDP Grouping Framework
"mid" attribute to identify media streams. Such endpoints MUST
implement the RTCP SDES media identification (MID) item described
in [RFC8843].

* Support for multiple endpoints in a single RTP session, and for
scaling the RTCP transmission interval according to the number of
participants in the session; support for randomized RTCP
transmission intervals to avoid synchronization of RTCP reports;
support for RTCP timer reconsideration (Section 6.3.6 of
[RFC3550]) and reverse reconsideration (Section 6.3.4 of
[RFC3550]).

* Support for configuring the RTCP bandwidth as a fraction of the
media bandwidth, and for configuring the fraction of the RTCP
bandwidth allocated to senders -- e.g., using the SDP "b=" line
[RFC4566] [RFC3556].

* Support for the reduced minimum RTCP reporting interval described
in Section 6.2 of [RFC3550]. When using the reduced minimum RTCP
reporting interval, the fixed (nonreduced) minimum interval MUST
be used when calculating the participant timeout interval (see
Sections 6.2 and 6.3.5 of [RFC3550]). The delay before sending
the initial compound RTCP packet can be set to zero (see
Section 6.2 of [RFC3550] as updated by [RFC8108]).

* Support for discontinuous transmission. RTP allows endpoints to
pause and resume transmission at any time. When resuming, the RTP
sequence number will increase by one, as usual, while the increase
in the RTP timestamp value will depend on the duration of the
pause. Discontinuous transmission is most commonly used with some
audio payload formats, but it is not audio specific and can be
used with any RTP payload format.

* Ignore unknown RTCP packet types and RTP header extensions. This
is to ensure robust handling of future extensions, middlebox
behaviors, etc., that can result in receiving RTP header
extensions or RTCP packet types that were not signaled. If a
compound RTCP packet that contains a mixture of known and unknown
RTCP packet types is received, the known packet types need to be
processed as usual, with only the unknown packet types being
discarded.
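
The last item in the list above can be sketched as follows: a
compound RTCP packet is walked using the length field of each RTCP
common header, known packet types are kept, and unknown ones are
skipped without discarding the rest of the compound packet. This is
an illustrative sketch only. The function name split_compound_rtcp
and the KNOWN_RTCP_TYPES table are hypothetical, the table lists only
the four packet types named above, and a real endpoint would also
handle padding, the feedback types of Section 5, and SRTCP
processing.

```python
import struct

# Packet type values for the four RTCP packet types named in the text.
KNOWN_RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE"}


def split_compound_rtcp(data: bytes):
    """Split a compound RTCP packet into known and skipped parts.

    Returns (known, skipped): known is a list of (name, packet_bytes)
    and skipped is a list of the unknown packet type numbers.
    """
    known, skipped = [], []
    offset = 0
    while offset + 4 <= len(data):
        first, packet_type, length_words = struct.unpack_from(
            "!BBH", data, offset)
        if first >> 6 != 2:
            break  # not version 2: stop parsing this datagram
        # The length field counts 32-bit words minus one.
        size = (length_words + 1) * 4
        if offset + size > len(data):
            break  # truncated packet: stop parsing
        packet = data[offset:offset + size]
        if packet_type in KNOWN_RTCP_TYPES:
            known.append((KNOWN_RTCP_TYPES[packet_type], packet))
        else:
            skipped.append(packet_type)  # ignore, but keep going
        offset += size
    return known, skipped
```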

It is known that a significant number of legacy RTP implementations,
especially those targeted at systems with only Voice over IP (VoIP),
do not support all of the above features and in some cases do not
support RTCP at all. Implementers are advised to consider the
requirements for graceful degradation when interoperating with legacy
implementations.

Other implementation considerations are discussed in Section 12.

4.2. Choice of the RTP Profile

The complete specification of RTP for a particular application domain
requires the choice of an RTP profile. For WebRTC use, the extended
secure RTP profile for RTCP-based feedback (RTP/SAVPF) [RFC5124], as
extended by [RFC7007], MUST be implemented. The RTP/SAVPF profile is
the combination of the basic RTP/AVP profile [RFC3551], the RTP
profile for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure
RTP profile (RTP/SAVP) [RFC3711].

The RTCP-based feedback extensions [RFC4585] are needed for the
improved RTCP timer model. This allows more flexible transmission of
RTCP packets in response to events, rather than strictly according to
bandwidth, and is vital for being able to report congestion signals
as well as media events. These extensions also allow saving RTCP
bandwidth, and an endpoint will commonly only use the full RTCP
bandwidth allocation if there are many events that require feedback.
The timer rules are also needed to make use of the RTP conferencing
extensions discussed in Section 5.1.

| Note: The enhanced RTCP timer model defined in the RTP/AVPF
| profile is backwards compatible with legacy systems that
| implement only the RTP/AVP or RTP/SAVP profile, given some
| constraints on parameter configuration such as the RTCP
| bandwidth value and "trr-int". The most important factor for
| interworking with RTP/(S)AVP endpoints via a gateway is to set
| the "trr-int" parameter to a value representing 4 seconds; see
| Section 7.1.3 of [RFC8108].

The secure RTP (SRTP) profile extensions [RFC3711] are needed to
provide media encryption, integrity protection, replay protection,
and a limited form of source authentication. WebRTC endpoints MUST
NOT send packets using the basic RTP/AVP profile or the RTP/AVPF
profile; they MUST employ the full RTP/SAVPF profile to protect all
RTP and RTCP packets that are generated. In other words,
implementations MUST use SRTP and Secure RTCP (SRTCP). The RTP/SAVPF
profile MUST be configured using the cipher suites, DTLS-SRTP
protection profiles, keying mechanisms, and other parameters
described in [RFC8827].

4.3. Choice of RTP Payload Formats

Mandatory-to-implement audio codecs and RTP payload formats for
WebRTC endpoints are defined in [RFC7874]. Mandatory-to-implement
video codecs and RTP payload formats for WebRTC endpoints are defined
in [RFC7742]. WebRTC endpoints MAY additionally implement any other
codec for which an RTP payload format and associated signaling has
been defined.

WebRTC endpoints cannot assume that the other participants in an RTP
session understand any RTP payload format, no matter how common. The
mapping between RTP payload type numbers and specific configurations
of particular RTP payload formats MUST be agreed before those payload
types/formats can be used. In an SDP context, this can be done using
the "a=rtpmap:" and "a=fmtp:" attributes associated with an "m="
line, along with any other SDP attributes needed to configure the RTP
payload format.

Endpoints can signal support for multiple RTP payload formats or
multiple configurations of a single RTP payload format, as long as
each unique RTP payload format configuration uses a different RTP
payload type number. As outlined in Section 4.8, the RTP payload
type number is sometimes used to associate an RTP packet stream with
a signaling context. This association is possible provided unique
RTP payload type numbers are used in each context. For example, an
RTP packet stream can be associated with an SDP "m=" line by
comparing the RTP payload type numbers used by the RTP packet stream
with payload types signaled in the "a=rtpmap:" lines in the media
sections of the SDP. This leads to the following considerations:

If RTP packet streams are being associated with signaling contexts
based on the RTP payload type, then the assignment of RTP payload
type numbers MUST be unique across signaling contexts.

If the same RTP payload format configuration is used in multiple
contexts, then a different RTP payload type number has to be
assigned in each context to ensure uniqueness.

If the RTP payload type number is not being used to associate RTP
packet streams with a signaling context, then the same RTP payload
type number can be used to indicate the exact same RTP payload
format configuration in multiple contexts.

A single RTP payload type number MUST NOT be assigned to different
RTP payload formats, or different configurations of the same RTP
payload format, within a single RTP session (note that the "m=" lines
in an SDP BUNDLE group [RFC8843] form a single RTP session).
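
The uniqueness rule above can be checked mechanically. The sketch
below scans the "a=rtpmap:" and "a=fmtp:" lines of an SDP description
and reports any payload type number bound to more than one
configuration. It assumes that all "m=" sections in the input belong
to one RTP session (for example, one BUNDLE group), and it ignores
static payload types that have no "a=rtpmap:" line. The function name
payload_type_conflicts is hypothetical.

```python
def payload_type_conflicts(sdp: str) -> set:
    """Return payload type numbers bound to more than one configuration.

    Assumes every "m=" section in `sdp` is part of the same RTP session.
    """
    configs = {}   # payload type -> set of (rtpmap, fmtp) seen so far
    section = {}   # payload type -> [rtpmap, fmtp] in current m= section

    def flush():
        for pt, (rtpmap, fmtp) in section.items():
            configs.setdefault(pt, set()).add((rtpmap, fmtp))
        section.clear()

    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            flush()  # a new media section starts
        elif line.startswith(("a=rtpmap:", "a=fmtp:")):
            attr, _, rest = line[2:].partition(":")
            pt_text, _, value = rest.partition(" ")
            entry = section.setdefault(int(pt_text), [None, None])
            if attr == "rtpmap":
                entry[0] = value.strip().lower()  # names are case-insensitive
            else:
                entry[1] = value.strip()
    flush()
    return {pt for pt, seen in configs.items() if len(seen) > 1}
```

A non-empty result indicates that the description would need to be
renumbered before use in a single RTP session.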

An endpoint that has signaled support for multiple RTP payload
formats MUST be able to accept data in any of those payload formats
at any time, unless it has previously signaled limitations on its
decoding capability. This requirement is constrained if several
types of media (e.g., audio and video) are sent in the same RTP
session. In such a case, a source (SSRC) is restricted to switching
only between the RTP payload formats signaled for the type of media
that is being sent by that source; see Section 4.4. To support rapid
rate adaptation by changing codecs, RTP does not require advance
signaling for changes between RTP payload formats used by a single
SSRC that were signaled during session setup.

If performing changes between two RTP payload types that use
different RTP clock rates, an RTP sender MUST follow the
recommendations in Section 4.1 of [RFC7160]. RTP receivers MUST
follow the recommendations in Section 4.3 of [RFC7160] in order to
support sources that switch between clock rates in an RTP session.
These recommendations for receivers are backwards compatible with the
case where senders use only a single clock rate.

4.4. Use of RTP Sessions

An association amongst a set of endpoints communicating using RTP is
known as an RTP session [RFC3550]. An endpoint can be involved in
several RTP sessions at the same time. In a multimedia session, each
type of media has typically been carried in a separate RTP session
(e.g., using one RTP session for the audio and a separate RTP session
using a different transport-layer flow for the video). WebRTC
endpoints are REQUIRED to implement support for multimedia sessions
in this way, separating each RTP session using different transport-
layer flows for compatibility with legacy systems (this is sometimes
called session multiplexing).

In modern-day networks, however, with the widespread use of network
address/port translators (NAT/NAPT) and firewalls, it is desirable to
reduce the number of transport-layer flows used by RTP applications.
This can be done by sending all the RTP packet streams in a single
RTP session, which will comprise a single transport-layer flow. This
will prevent the use of some quality-of-service mechanisms, as
discussed in Section 12.1.3. Implementations are therefore also
REQUIRED to support transport of all RTP packet streams, independent
of media type, in a single RTP session using a single transport-layer
flow, according to [RFC8860] (this is sometimes called SSRC
multiplexing). If multiple types of media are to be used in a single
RTP session, all participants in that RTP session MUST agree to this
usage. In an SDP context, the mechanisms described in [RFC8843] can
be used to signal such a bundle of RTP packet streams forming a
single RTP session.

Further discussion about the suitability of different RTP session
structures and multiplexing methods to different scenarios can be
found in [RFC8872].

4.5. RTP and RTCP Multiplexing

Historically, RTP and RTCP have been run on separate transport-layer
flows (e.g., two UDP ports for each RTP session, one for RTP and one
for RTCP). With the increased use of Network Address/Port
Translation (NAT/NAPT), this has become problematic, since
maintaining multiple NAT bindings can be costly. It also complicates
firewall administration, since multiple ports need to be opened to
allow RTP traffic. To reduce these costs and session setup times,
implementations are REQUIRED to support multiplexing RTP data packets
and RTCP control packets on a single transport-layer flow [RFC5761].
Such RTP and RTCP multiplexing MUST be negotiated in the signaling
channel before it is used. If SDP is used for signaling, this
negotiation MUST use the mechanism defined in [RFC5761].
Implementations can also support sending RTP and RTCP on separate
transport-layer flows, but this is OPTIONAL to implement. If an
implementation does not support RTP and RTCP sent on separate
transport-layer flows, it MUST indicate that using the mechanism
defined in [RFC8858].

Note that the use of RTP and RTCP multiplexed onto a single
transport-layer flow ensures that there is occasional traffic sent on
that port, even if there is no active media traffic. This can be
useful to keep NAT bindings alive [RFC6263].
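
When RTP and RTCP share one transport-layer flow, the receiver has to
tell the two apart. A commonly used heuristic, sketched below, relies
on the second octet of the packet: it assumes that RTCP packet types
fall in the range 192-223 and that RTP payload types 64-95 are not in
use, which is our reading of the constraints in [RFC5761]. The
function name classify_rtp_or_rtcp is hypothetical, and the sketch
does not cover other protocols (such as STUN or DTLS) that can share
the same port.

```python
def classify_rtp_or_rtcp(packet: bytes) -> str:
    """Classify a datagram as "rtp", "rtcp", or "unknown".

    Assumes RTCP packet types lie in 192-223 and that RTP payload
    types 64-95 are avoided, so the second octet is unambiguous.
    """
    if len(packet) < 2 or packet[0] >> 6 != 2:
        return "unknown"  # too short, or not RTP/RTCP version 2
    return "rtcp" if 192 <= packet[1] <= 223 else "rtp"
```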

4.6. Reduced Size RTCP

RTCP packets are usually sent as compound RTCP packets, and [RFC3550]
requires that those compound packets start with an SR or RR packet.
When using frequent RTCP feedback messages under the RTP/AVPF profile
[RFC4585], these statistics are not needed in every packet, and they
unnecessarily increase the mean RTCP packet size. This can limit the
frequency at which RTCP packets can be sent within the RTCP bandwidth
share.

To avoid this problem, [RFC5506] specifies how to reduce the mean
RTCP message size and allow for more frequent feedback. Frequent
feedback, in turn, is essential to make real-time applications
quickly aware of changing network conditions and to allow them to
adapt their transmission and encoding behavior. Implementations MUST
support sending and receiving noncompound RTCP feedback packets
[RFC5506]. Use of noncompound RTCP packets MUST be negotiated using
the signaling channel. If SDP is used for signaling, this
negotiation MUST use the attributes defined in [RFC5506]. For
backwards compatibility, implementations are also REQUIRED to support
the use of compound RTCP feedback packets if the remote endpoint does
not agree to the use of noncompound RTCP in the signaling exchange.

4.7. Symmetric RTP/RTCP

To ease traversal of NAT and firewall devices, implementations are
REQUIRED to implement and use symmetric RTP [RFC4961]. The reason
for using symmetric RTP is primarily to avoid issues with NATs and
firewalls by ensuring that the send and receive RTP packet streams,
as well as RTCP, are actually bidirectional transport-layer flows.
This will keep alive the NAT and firewall pinholes and help indicate
consent that the receive direction is a transport-layer flow the
intended recipient actually wants. In addition, it saves resources,
specifically ports at the endpoints, but also in the network, because
the NAT mappings or firewall state is not unnecessarily bloated. The
amount of per-flow QoS state kept in the network is also reduced.

4.8. Choice of RTP Synchronization Source (SSRC)

Implementations are REQUIRED to support signaled RTP synchronization
source (SSRC) identifiers. If SDP is used, this MUST be done using
the "a=ssrc:" SDP attribute defined in Sections 4.1 and 5 of
[RFC5576] and the "previous-ssrc" source attribute defined in
Section 6.2 of [RFC5576]; other per-SSRC attributes defined in
[RFC5576] MAY be supported.

While support for signaled SSRC identifiers is mandated, their use in
an RTP session is OPTIONAL. Implementations MUST be prepared to
accept RTP and RTCP packets using SSRCs that have not been explicitly
signaled ahead of time. Implementations MUST support random SSRC
assignment and MUST support SSRC collision detection and resolution,
according to [RFC3550]. When using signaled SSRC values, collision
detection MUST be performed as described in Section 5 of [RFC5576].
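
Random SSRC assignment and basic collision resolution can be sketched
as follows. This is a simplified illustration, not a complete
implementation of the procedures in [RFC3550] or [RFC5576]: it omits
loop detection and source transport address tracking, and only notes
in a comment where an RTCP BYE for the old SSRC would be sent. The
function names new_ssrc and resolve_collision are hypothetical.

```python
import secrets


def new_ssrc(in_use: set) -> int:
    """Pick a random 32-bit SSRC that is not already known to be in use."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate


def resolve_collision(own_ssrc: int, remote_ssrcs: set) -> int:
    """Return the SSRC to use after comparing with SSRCs seen from others.

    If our SSRC collides with a remote one, a new random SSRC is
    chosen. A real endpoint would first send an RTCP BYE for the old
    SSRC; that step is omitted here.
    """
    if own_ssrc not in remote_ssrcs:
        return own_ssrc
    return new_ssrc(remote_ssrcs | {own_ssrc})
```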

It is often desirable to associate an RTP packet stream with a non-
RTP context. For users of the WebRTC API, a mapping between SSRCs
and MediaStreamTracks is provided per Section 11. For gateways or
other usages, it is possible to associate an RTP packet stream with
an "m=" line in a session description formatted using SDP. If SSRCs
are signaled, this is straightforward (in SDP, the "a=ssrc:" line
will be at the media level, allowing a direct association with an
"m=" line). If SSRCs are not signaled, the RTP payload type numbers
used in an RTP packet stream are often sufficient to associate that
packet stream with a signaling context. For example, if RTP payload
type numbers are assigned as described in Section 4.3 of this memo,
the RTP payload types used by an RTP packet stream can be compared
with values in SDP "a=rtpmap:" lines, which are at the media level in
SDP and so map to an "m=" line.

4.9. Generation of the RTCP Canonical Name (CNAME)

The RTCP Canonical Name (CNAME) provides a persistent transport-level
identifier for an RTP endpoint. While the SSRC identifier for an RTP
endpoint can change if a collision is detected or when the RTP
application is restarted, its RTCP CNAME is meant to stay unchanged
for the duration of an RTCPeerConnection [W3C.WebRTC], so that RTP
endpoints can be uniquely identified and associated with their RTP
packet streams within a set of related RTP sessions.

Each RTP endpoint MUST have at least one RTCP CNAME, and that RTCP
CNAME MUST be unique within the RTCPeerConnection. RTCP CNAMEs
identify a particular synchronization context -- i.e., all SSRCs
associated with a single RTCP CNAME share a common reference clock.
If an endpoint has SSRCs that are associated with several
unsynchronized reference clocks, and hence different synchronization
contexts, it will need to use multiple RTCP CNAMEs, one for each
synchronization context.

Taking the discussion in Section 11 into account, a WebRTC endpoint
MUST NOT use more than one RTCP CNAME in the RTP sessions belonging
to a single RTCPeerConnection (that is, an RTCPeerConnection forms a
synchronization context). RTP middleboxes MAY generate RTP packet
streams associated with more than one RTCP CNAME, to allow them to
avoid having to resynchronize media from multiple different endpoints
that are part of a multiparty RTP session.

The RTP specification [RFC3550] includes guidelines for choosing a
unique RTP CNAME, but these are not sufficient in the presence of NAT
devices. In addition, long-term persistent identifiers can be
problematic from a privacy viewpoint (Section 13). Accordingly, a
WebRTC endpoint MUST generate a new, unique, short-term persistent
RTCP CNAME for each RTCPeerConnection, following [RFC7022], with a
single exception; if explicitly requested at creation, an
RTCPeerConnection MAY use the same CNAME as an existing
RTCPeerConnection within their common same-origin context.
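
A short-term persistent RTCP CNAME can be produced from a
cryptographically random value. The sketch below assumes the
random-value approach that, to our understanding, [RFC7022] describes
(at least 96 random bits, encoded as Base64); consult that document
for the normative procedure. The function name
generate_short_term_cname is hypothetical.

```python
import base64
import secrets


def generate_short_term_cname(num_bytes: int = 12) -> str:
    """Return a fresh random CNAME, to be created once per RTCPeerConnection.

    Assumes at least 96 bits (12 bytes) of randomness, Base64-encoded.
    """
    if num_bytes < 12:
        raise ValueError("at least 96 bits of randomness are assumed")
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")
```

An application would call this once when creating an
RTCPeerConnection and keep the result for the lifetime of that
connection, reusing an existing value only in the explicitly
requested same-origin case described above.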

A WebRTC endpoint MUST support reception of any CNAME that matches
the syntax limitations specified by the RTP specification [RFC3550]
and cannot assume that any CNAME will be chosen according to the form
suggested above.

4.10. Handling of Leap Seconds

The guidelines given in [RFC7164] regarding handling of leap seconds
to limit their impact on RTP media play-out and synchronization
SHOULD be followed.

5. WebRTC Use of RTP: Extensions

There are a number of RTP extensions that are either needed to obtain
full functionality, or extremely useful to improve on the baseline
performance, in the WebRTC context. One set of these extensions is
related to conferencing, while others are more generic in nature.
The following subsections describe the various RTP extensions
mandated or suggested for use within WebRTC.

5.1. Conferencing Extensions and Topologies

RTP is a protocol that inherently supports group communication.
Groups can be implemented by having each endpoint send its RTP packet
streams to an RTP middlebox that redistributes the traffic, by using
a mesh of unicast RTP packet streams between endpoints, or by using
an IP multicast group to distribute the RTP packet streams. These
topologies can be implemented in a number of ways as discussed in
[RFC7667].

While the use of IP multicast groups is popular in IPTV systems, the
topologies based on RTP middleboxes are dominant in interactive
video-conferencing environments. Topologies based on a mesh of
unicast transport-layer flows to create a common RTP session have not
seen widespread deployment to date. Accordingly, WebRTC endpoints
are not expected to support topologies based on IP multicast groups
or mesh-based topologies, such as a point-to-multipoint mesh
configured as a single RTP session ("Topo-Mesh" in the terminology of
[RFC7667]). However, a point-to-multipoint mesh constructed using
several RTP sessions, implemented in WebRTC using independent
RTCPeerConnections [W3C.WebRTC], can be expected to be used in WebRTC
and needs to be supported.

WebRTC endpoints implemented according to this memo are expected to
support all the topologies described in [RFC7667] where the RTP
endpoints send and receive unicast RTP packet streams to and from
some peer device, provided that peer can participate in performing
congestion control on the RTP packet streams. The peer device could
be another RTP endpoint, or it could be an RTP middlebox that
redistributes the RTP packet streams to other RTP endpoints. This
limitation means that some of the RTP middlebox-based topologies are
not suitable for use in WebRTC. Specifically:

* Video-switching Multipoint Control Units (MCUs) (Topo-Video-
switch-MCU) SHOULD NOT be used, since they make the use of RTCP
for congestion control and quality-of-service reports problematic
(see Section 3.8 of [RFC7667]).

* The Relay-Transport Translator (Topo-PtM-Trn-Translator) topology
SHOULD NOT be used, because its safe use requires a congestion
control algorithm or RTP circuit breaker that handles
point-to-multipoint distribution, which has not yet been standardized.

The following topology can be used; however, it has some issues worth
noting:

* Content-modifying MCUs with RTCP termination (Topo-RTCP-
terminating-MCU) MAY be used. Note that in this RTP topology, RTP
loop detection and identification of active senders is the
responsibility of the WebRTC application; since the clients are
isolated from each other at the RTP layer, RTP cannot assist with
these functions (see Section 3.9 of [RFC7667]).

The RTP extensions described in Sections 5.1.1 to 5.1.6 are designed
to be used with centralized conferencing, where an RTP middlebox
(e.g., a conference bridge) receives a participant's RTP packet
streams and distributes them to the other participants. These
extensions are not necessary for interoperability; an RTP endpoint
that does not implement these extensions will work correctly but
might offer poor performance. Support for the listed extensions will
greatly improve the quality of experience; to provide a reasonable
baseline quality, WebRTC endpoints are required to support some of
these extensions.

The RTCP conferencing extensions are defined in "Extended RTP Profile
for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/
AVPF)" [RFC4585] and "Codec Control Messages in the RTP Audio-Visual
Profile with Feedback (AVPF)" [RFC5104]; they are fully usable by the
secure variant of this profile (RTP/SAVPF) [RFC5124].

5.1.1. Full Intra Request (FIR)

The Full Intra Request message is defined in Sections 3.5.1 and 4.3.1
of Codec Control Messages [RFC5104]. A mixer uses it to request a
new Intra picture from a participant in the session. This
is used when switching between sources to ensure that the receivers
can decode the video or other predictive media encoding with long
prediction chains. WebRTC endpoints that are sending media MUST
understand and react to FIR feedback messages they receive, since
this greatly improves the user experience when using centralized
mixer-based conferencing. Support for sending FIR messages is
OPTIONAL.
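
As a non-normative illustration, the following Python sketch builds
and parses an FIR feedback packet using the layout in Section 4.3.1
of [RFC5104]: a payload-specific feedback packet (PT=206) with FMT=4,
the "SSRC of media source" field set to 0, and one 8-octet FCI entry
per target. The function names build_fir and parse_fir are
hypothetical and are not part of any WebRTC API.

```python
import struct

PT_PSFB = 206  # payload-specific feedback packet type (RFC 4585)
FMT_FIR = 4    # Full Intra Request feedback message type (RFC 5104)


def build_fir(sender_ssrc: int, target_ssrc: int, seq_nr: int) -> bytes:
    """Build an FIR packet with a single FCI entry (illustrative only)."""
    first_byte = (2 << 6) | FMT_FIR  # V=2, P=0, FMT=4
    length_words = 4                 # 5 32-bit words in total, minus one
    header = struct.pack("!BBH", first_byte, PT_PSFB, length_words)
    # For FIR, the "SSRC of media source" field is unused and set to 0;
    # the actual target is carried in the FCI entry.
    ssrcs = struct.pack("!II", sender_ssrc, 0)
    # FCI entry: target SSRC (32 bits), Seq nr (8 bits), Reserved (24 bits).
    fci = struct.pack("!IB3x", target_ssrc, seq_nr & 0xFF)
    return header + ssrcs + fci


def parse_fir(packet: bytes) -> list:
    """Return the (target_ssrc, seq_nr) pairs carried in an FIR packet."""
    first_byte, pt, length_words = struct.unpack("!BBH", packet[:4])
    if first_byte >> 6 != 2 or first_byte & 0x1F != FMT_FIR or pt != PT_PSFB:
        raise ValueError("not an FIR feedback packet")
    end = 4 * (length_words + 1)
    entries = []
    for offset in range(12, end, 8):
        ssrc, seq_nr = struct.unpack("!IB", packet[offset:offset + 5])
        entries.append((ssrc, seq_nr))
    return entries
```

A sending endpoint would typically compare each received sequence
number with the last one seen from the same requester, since
[RFC5104] specifies that a repeated request carries an unchanged
sequence number.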

5.1.2. Picture Loss Indication (PLI)

The Picture Loss Indication message is defined in Section 6.3.1 of
the RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the
sending encoder that it lost the decoder context and would like to
have it repaired somehow. This is semantically different from the
Full Intra Request above, as there could be multiple ways to fulfill
the request. WebRTC endpoints that are sending media MUST understand
and react to PLI feedback messages as a loss-tolerance mechanism.
Receivers MAY send PLI messages.
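
The following sketch, offered as a non-normative example, shows how
small a PLI message is: per Section 6.3.1 of [RFC4585], it is a
payload-specific feedback packet (PT=206) with FMT=1 and no FCI, so
the whole packet is 12 octets. The helper names build_pli and
is_pli_for are hypothetical.

```python
import struct

PT_PSFB = 206  # payload-specific feedback packet type (RFC 4585)
FMT_PLI = 1    # Picture Loss Indication


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Build a PLI packet; PLI carries no FCI, so length is 2 words."""
    first_byte = (2 << 6) | FMT_PLI  # V=2, P=0, FMT=1
    return struct.pack("!BBHII", first_byte, PT_PSFB, 2,
                       sender_ssrc, media_ssrc)


def is_pli_for(packet: bytes, my_ssrc: int) -> bool:
    """Return True if packet is a PLI that targets the stream my_ssrc."""
    if len(packet) < 12:
        return False
    first_byte, pt, length_words, _sender, media = struct.unpack(
        "!BBHII", packet[:12])
    return (first_byte >> 6 == 2
            and first_byte & 0x1F == FMT_PLI
            and pt == PT_PSFB
            and length_words == 2
            and media == my_ssrc)
```

How the sending encoder repairs the decoder context after
is_pli_for returns True is left open here, reflecting the point above
that there are multiple ways to fulfill the request.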

5.1.3. Slice Loss Indication (SLI)

The Slice Loss Indication message is defined in Section 6.3.2 of the
RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the
encoder that it has detected the loss or corruption of one or more
consecutive macro blocks and would like to have these repaired
somehow. It is RECOMMENDED that receivers generate SLI feedback
messages if slices are lost when using a codec that supports the
concept of macro blocks. A sender that receives an SLI feedback
message SHOULD attempt to repair the lost slice(s).
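
For illustration only, the FCI entry of an SLI message can be packed
as shown below. The field widths (First: 13 bits, Number: 13 bits,
PictureID: 6 bits) follow Section 6.3.2 of [RFC4585]; the function
names are hypothetical.

```python
import struct


def pack_sli_entry(first: int, number: int, picture_id: int) -> bytes:
    """Pack one SLI FCI entry into a 32-bit word.

    first      -- address of the first lost macroblock (13 bits)
    number     -- number of lost macroblocks (13 bits)
    picture_id -- low 6 bits of the codec-specific picture identifier
    """
    if not 0 <= first < (1 << 13) or not 0 <= number < (1 << 13):
        raise ValueError("First and Number must each fit in 13 bits")
    word = (first << 19) | (number << 6) | (picture_id & 0x3F)
    return struct.pack("!I", word)


def unpack_sli_entry(entry: bytes) -> tuple:
    """Return (first, number, picture_id) from one 4-octet FCI entry."""
    (word,) = struct.unpack("!I", entry)
    return word >> 19, (word >> 6) & 0x1FFF, word & 0x3F
```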

5.1.4. Reference Picture Selection Indication (RPSI)

Reference Picture Selection Indication (RPSI) messages are defined in
Section 6.3.3 of the RTP/AVPF profile [RFC4585]. Some video-encoding
standards allow the use of older reference pictures than the most
recent one for predictive coding. If such a codec is in use, and if
the encoder has learned that encoder-decoder synchronization has been
lost, then a known-as-correct reference picture can be used as a base
for future coding. The RPSI message allows this to be signaled.
Receivers that detect that encoder-decoder synchronization has been
lost SHOULD generate an RPSI feedback message if the codec being used
supports reference-picture selection. An RTP packet-stream sender
that receives such an RPSI message SHOULD act on that message to
change the reference picture, if it is possible to do so within the
available bandwidth constraints and with the codec being used.

5.1.5. Temporal-Spatial Trade-Off Request (TSTR)

The temporal-spatial trade-off request and notification are defined
in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used
to ask the video encoder to change the trade-off it makes between
temporal and spatial resolution -- for example, to prefer high
spatial image quality but low frame rate. Support for TSTR requests
and notifications is OPTIONAL.

5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR)

The Temporary Maximum Media Stream Bit Rate Request (TMMBR) feedback
message is defined in Sections 3.5.4 and 4.2.1 of Codec Control
Messages [RFC5104]. This request and its corresponding Temporary
Maximum Media Stream Bit Rate Notification (TMMBN) message [RFC5104]
are used by a media receiver to inform the sending party that there
is a current limitation on the amount of bandwidth available to this
receiver. There can be various reasons for this: for example, an RTP
mixer can use this message to limit the media rate of the sender
being forwarded by the mixer (without doing media transcoding) to fit
the bottlenecks existing towards the other session participants.
WebRTC endpoints that are sending media are REQUIRED to implement
support for TMMBR messages and MUST follow bandwidth limitations set
by a TMMBR message received for their SSRC. The sending of TMMBR
messages is OPTIONAL.
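
The bit rate in a TMMBR FCI entry is carried as a 6-bit exponent and
a 17-bit mantissa, followed by a 9-bit measured overhead (Section
4.2.1 of [RFC5104]); the value is mantissa * 2^exp bits per second.
The sketch below is a non-normative example of that encoding. The
function names are hypothetical, and rounding down when the value
does not fit in 17 bits is a choice made for this sketch so that the
encoded limit never exceeds the requested one.

```python
import struct


def encode_mxtbr(bitrate_bps: int) -> tuple:
    """Return (exp, mantissa) such that mantissa * 2**exp <= bitrate_bps."""
    if bitrate_bps < 0:
        raise ValueError("bit rate must be non-negative")
    exp, mantissa = 0, bitrate_bps
    while mantissa >= (1 << 17):  # mantissa is limited to 17 bits
        mantissa >>= 1            # rounds down (assumption of this sketch)
        exp += 1
    if exp > 63:                  # exponent is limited to 6 bits
        raise ValueError("bit rate too large to encode")
    return exp, mantissa


def pack_tmmbr_fci(ssrc: int, bitrate_bps: int, overhead: int) -> bytes:
    """Pack one TMMBR FCI entry: SSRC, then exp/mantissa/overhead."""
    exp, mantissa = encode_mxtbr(bitrate_bps)
    word = (exp << 26) | (mantissa << 9) | (overhead & 0x1FF)
    return struct.pack("!II", ssrc, word)


def unpack_tmmbr_fci(fci: bytes) -> tuple:
    """Return (ssrc, bitrate_bps, overhead) from one 8-octet FCI entry."""
    ssrc, word = struct.unpack("!II", fci)
    exp = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    overhead = word & 0x1FF
    return ssrc, mantissa << exp, overhead
```

A sending endpoint that receives such an entry for its own SSRC would
then keep its media rate at or below the decoded value until the
limitation is changed or removed.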

5.2. Header Extensions

The RTP specification [RFC3550] provides the capability to include
RTP header extensions containing in-band data, but the format and
semantics of the extensions are poorly specified. The use of header
extensions is OPTIONAL in WebRTC, but if they are used, they MUST be
formatted and signaled following the general mechanism for RTP header
extensions defined in [RFC8285], since this gives well-defined
semantics to RTP header extensions.

As noted in [RFC8285], the requirement from the RTP specification
that header extensions are "designed so that the header extension may
be ignored" [RFC3550] stands. To be specific, header extensions MUST
only be used for data that can safely be ignored by the recipient
without affecting interoperability and MUST NOT be used when the
presence of the extension has changed the form or nature of the rest
of the packet in a way that is not compatible with the way the stream
is signaled (e.g., as defined by the payload type). Valid examples
of RTP header extensions might include metadata that is additional to
the usual RTP information but that can safely be ignored without
compromising interoperability.
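
As a non-normative illustration of why such extensions can be
ignored safely, the sketch below parses the one-byte header form
defined in [RFC8285] (identified by the value 0xBEDE) and returns the
elements keyed by their local identifier. A receiver can then use
the identifiers it has negotiated and skip the rest. The function
name is hypothetical, and stopping on a malformed element with ID 0
and a non-zero length is a conservative choice made for this sketch.

```python
import struct

ONE_BYTE_PROFILE = 0xBEDE  # marks the one-byte header form (RFC 8285)


def parse_one_byte_extensions(ext_block: bytes) -> dict:
    """Parse an RTP header extension block in one-byte header form.

    ext_block starts at the 16-bit "defined by profile" field and is
    followed by the 16-bit length (in 32-bit words) and the elements.
    """
    profile, length_words = struct.unpack("!HH", ext_block[:4])
    if profile != ONE_BYTE_PROFILE:
        raise ValueError("not a one-byte header extension block")
    data = ext_block[4:4 + 4 * length_words]
    elements = {}
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0:            # padding octet between elements
            i += 1
            continue
        ext_id = byte >> 4
        if ext_id in (0, 15):    # 15 is reserved: stop processing
            break
        size = (byte & 0x0F) + 1  # length field is data length minus one
        elements[ext_id] = data[i + 1:i + 1 + size]
        i += 1 + size
    return elements
```

For example, a receiver that only understands local identifier 1
would read elements.get(1) and leave any other entries unused,
without affecting how the RTP payload is processed.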

5.2.1. Rapid Synchronization

Many RTP sessions require synchronization between audio, video, and
other content. This synchronization is performed by receivers, using
information contained in RTCP SR packets, as described in the RTP
specification [RFC3550]. This basic mechanism can be slow, however,
so it is RECOMMENDED that the rapid RTP synchronization extensions
described in [RFC6051] be implemented in addition to RTCP SR-based
synchronization.
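
The basic RTCP SR-based mechanism can be sketched as follows; this is
a non-normative example. Each SR pairs an NTP-format wall-clock time
with the RTP timestamp of the same instant, so a receiver can map any
RTP timestamp of that stream onto the sender's wall clock and compare
streams. The function name and the representation of the NTP time as
seconds in a float are assumptions of this sketch. Because the
mapping needs an SR from every stream, a receiver cannot align media
until those reports arrive, which is the delay the extensions in
[RFC6051] aim to reduce.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_ntp_seconds: float, clock_rate: int) -> float:
    """Map an RTP timestamp to sender wall-clock time using one SR.

    sr_rtp_ts and sr_ntp_seconds are the RTP timestamp and the NTP
    time (converted to seconds) from the most recent SR of the stream.
    """
    # Signed 32-bit difference, so RTP timestamp wrap-around is handled.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate


def playout_offset(audio_wallclock: float, video_wallclock: float) -> float:
    """Seconds by which the video sample was captured after the audio."""
    return video_wallclock - audio_wallclock
```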