rfc8845v5.txt   rfc8845.txt 
Internet Engineering Task Force (IETF) M. Duckworth, Ed. Internet Engineering Task Force (IETF) M. Duckworth, Ed.
Request for Comments: 8845 Request for Comments: 8845
Category: Standards Track A. Pepperell Category: Standards Track A. Pepperell
ISSN: 2070-1721 Acano ISSN: 2070-1721 Acano
S. Wenger S. Wenger
Tencent Tencent
November 2020 January 2021
Framework for Telepresence Multi-Streams Framework for Telepresence Multi-Streams
Abstract Abstract
This document defines a framework for a protocol to enable devices in This document defines a framework for a protocol to enable devices in
a telepresence conference to interoperate. The protocol enables a telepresence conference to interoperate. The protocol enables
communication of information about multiple media streams so a communication of information about multiple media streams so a
sending system and receiving system can make reasonable decisions sending system and receiving system can make reasonable decisions
about transmitting, selecting, and rendering the media streams. This about transmitting, selecting, and rendering the media streams. This
skipping to change at line 38 skipping to change at line 38
received public review and has been approved for publication by the received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841. Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata, Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8845. https://www.rfc-editor.org/info/rfc8845.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at line 122 skipping to change at line 122
The framework is intended to support the use cases described in "Use The framework is intended to support the use cases described in "Use
Cases for Telepresence Multistreams" [RFC7205] and to meet the Cases for Telepresence Multistreams" [RFC7205] and to meet the
requirements in "Requirements for Telepresence Multistreams" requirements in "Requirements for Telepresence Multistreams"
[RFC7262]. This includes cases using multiple media streams that are [RFC7262]. This includes cases using multiple media streams that are
not necessarily telepresence. not necessarily telepresence.
The basic session setup for the use cases is based on SIP [RFC3261] The basic session setup for the use cases is based on SIP [RFC3261]
and SDP offer/answer [RFC3264]. In addition to basic SIP & SDP and SDP offer/answer [RFC3264]. In addition to basic SIP & SDP
offer/answer, signaling that is ControLling mUltiple streams for offer/answer, signaling that is ControLling mUltiple streams for
tElepresence (CLUE) specific is required to exchange the information tElepresence (CLUE) specific is required to exchange the information
describing the multiple media streams. The motivation for this describing the multiple Media Streams. The motivation for this
framework, an overview of the signaling, and the information required framework, an overview of the signaling, and the information required
to be exchanged are described in subsequent sections of this to be exchanged are described in subsequent sections of this
document. Companion documents describe the signaling details document. Companion documents describe the signaling details
[RFC8848], the data model [RFC8846], and the protocol [RFC8847]. [RFC8848], the data model [RFC8846], and the protocol [RFC8847].
2. Requirements Language 2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
skipping to change at line 153 skipping to change at line 153
Consumer describing specific aspects of the content of the Media Consumer describing specific aspects of the content of the Media
and any restrictions it has in terms of being able to provide and any restrictions it has in terms of being able to provide
certain Streams simultaneously. certain Streams simultaneously.
Audio Capture (AC): Media Capture for audio. Denoted as "ACn" in Audio Capture (AC): Media Capture for audio. Denoted as "ACn" in
the examples in this document. the examples in this document.
Capture: Same as Media Capture. Capture: Same as Media Capture.
Capture Device: A device that converts physical input, such as Capture Device: A device that converts physical input, such as
audio, video or text, into an electrical signal, in most cases to audio, video, or text, into an electrical signal, in most cases to
be fed into a Media encoder. be fed into a Media encoder.
Capture Encoding: A specific Encoding of a Media Capture, to be sent Capture Encoding: A specific Encoding of a Media Capture, to be sent
by a Media Provider to a Media Consumer via RTP. by a Media Provider to a Media Consumer via RTP.
Capture Scene: A structure representing a spatial region captured by Capture Scene: A structure representing a spatial region captured by
one or more Capture Devices, each capturing Media representing a one or more Capture Devices, each capturing Media representing a
portion of the region. The spatial region represented by a portion of the region. The spatial region represented by a
Capture Scene may correspond to a real region in physical space, Capture Scene may correspond to a real region in physical space,
such as a room. A Capture Scene includes attributes and one or such as a room. A Capture Scene includes attributes and one or
skipping to change at line 199 skipping to change at line 199
are not considered CLUE-enabled. are not considered CLUE-enabled.
Conference: Used as defined in "A Framework for Conferencing with Conference: Used as defined in "A Framework for Conferencing with
the Session Initiation Protocol (SIP)" [RFC4353]. the Session Initiation Protocol (SIP)" [RFC4353].
Configure Message: A CLUE message a Media Consumer sends to a Media Configure Message: A CLUE message a Media Consumer sends to a Media
Provider specifying which content and Media Streams it wants to Provider specifying which content and Media Streams it wants to
receive, based on the information in a corresponding Advertisement receive, based on the information in a corresponding Advertisement
message. message.
Consumer: short for Media Consumer. Consumer: Short for Media Consumer.
Encoding: short for Individual Encoding. Encoding: Short for Individual Encoding.
Encoding Group: A set of Encoding parameters representing a total Encoding Group: A set of Encoding parameters representing a total
Media Encoding capability to be subdivided across potentially Media Encoding capability to be subdivided across potentially
multiple Individual Encodings. multiple Individual Encodings.
Endpoint: A CLUE-capable device that is the logical point of final Endpoint: A CLUE-capable device that is the logical point of final
termination through receiving, decoding and Rendering, and/or termination through receiving, decoding and Rendering, and/or
initiation through capturing, encoding, and sending of Media initiation through capturing, encoding, and sending of Media
Streams. An Endpoint consists of one or more physical devices Streams. An Endpoint consists of one or more physical devices
that source and sink Media Streams, and exactly one [RFC4353] that source and sink Media Streams, and exactly one [RFC4353]
skipping to change at line 261 skipping to change at line 261
Denoted as "MCCn" in the example cases in this document. Denoted as "MCCn" in the example cases in this document.
Plane of Interest: The spatial plane within a Scene containing the Plane of Interest: The spatial plane within a Scene containing the
most-relevant subject matter. most-relevant subject matter.
Provider: Same as a Media Provider. Provider: Same as a Media Provider.
Render: The process of generating a representation from Media, such Render: The process of generating a representation from Media, such
as displayed motion video or sound emitted from loudspeakers. as displayed motion video or sound emitted from loudspeakers.
Scene: Same as a Capture Scene Scene: Same as a Capture Scene.
Simultaneous Transmission Set: A set of Media Captures that can be Simultaneous Transmission Set: A set of Media Captures that can be
transmitted simultaneously from a Media Provider. transmitted simultaneously from a Media Provider.
Single Media Capture: A Capture that contains Media from a single Single Media Capture: A Capture that contains Media from a single
source Capture Device, e.g., an Audio Capture from a single source Capture Device, e.g., an Audio Capture from a single
microphone or a Video Capture from a single camera. microphone or a Video Capture from a single camera.
Spatial Relation: The arrangement of two objects in space, in Spatial Relation: The arrangement of two objects in space, in
contrast to relation in time or other relationships. contrast to relation in time or other relationships.
Stream: A Capture Encoding sent from a Media Provider to a Media Stream: A Capture Encoding sent from a Media Provider to a Media
Consumer via RTP [RFC3550]. Consumer via RTP [RFC3550].
Stream Characteristics: The Media stream attributes commonly used in Stream Characteristics: The Media Stream attributes commonly used in
non-CLUE SIP/SDP environments (such as Media codec, bitrate, non-CLUE SIP/SDP environments (such as Media codec, bitrate,
resolution, profile/level, etc.) as well as CLUE-specific resolution, profile/level, etc.) as well as CLUE-specific
attributes, such as the Capture ID or a spatial location. attributes, such as the Capture ID or a spatial location.
Video Capture (VC): Media Capture for video. Denoted as VCn in the Video Capture (VC): Media Capture for video. Denoted as VCn in the
example cases in this document. example cases in this document.
Video Composite: A single image that is formed, normally by an RTP Video Composite: A single image that is formed, normally by an RTP
mixer inside an MCU, by combining visual elements from separate mixer inside an MCU, by combining visual elements from separate
sources. sources.
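As an illustration of the Simultaneous Transmission Set definition above, the following minimal sketch checks whether a group of requested Captures fits inside at least one such set. The function name, the use of plain string identifiers, and the example Capture names are assumptions made for illustration; they are not taken from the CLUE data model or protocol.

```python
def can_send_together(requested, simultaneous_sets):
    """Return True if every requested Capture appears together in at
    least one Simultaneous Transmission Set.

    `requested` is an iterable of Capture identifiers (hypothetical
    strings such as "VC0"); `simultaneous_sets` is an iterable of
    collections of identifiers, one per set offered by the Provider.
    """
    wanted = set(requested)
    # The request is feasible if it is a subset of any single set.
    return any(wanted <= set(candidate) for candidate in simultaneous_sets)


# Hypothetical Provider in which VC1 and VC3 cannot be sent together.
example_sets = [{"VC0", "VC1", "VC2"}, {"VC0", "VC3", "VC2"}]
print(can_send_together({"VC0", "VC2"}, example_sets))
print(can_send_together({"VC1", "VC3"}, example_sets))
```

A Consumer-side check of this kind would run before building a request, so that it never asks for a combination the Provider has said it cannot transmit at the same time.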
skipping to change at line 3118 skipping to change at line 3118
get the information it needs to construct MCC4, it has to send get the information it needs to construct MCC4, it has to send
Configure Messages to Endpoints A and B asking to receive MCC1 from Configure Messages to Endpoints A and B asking to receive MCC1 from
each of them, along with their AC1 audio. Now the MCU can use audio each of them, along with their AC1 audio. Now the MCU can use audio
energy information from the two incoming audio Streams from Endpoints energy information from the two incoming audio Streams from Endpoints
A and B to determine which of those alternatives is the current A and B to determine which of those alternatives is the current
talker. Based on that, the MCU uses either MCC1 from A or MCC1 from talker. Based on that, the MCU uses either MCC1 from A or MCC1 from
B as the source of MCC4 to send to Endpoint C. B as the source of MCC4 to send to Endpoint C.
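The selection step described above can be sketched as follows. This is a minimal illustration, not the MCU behavior specified by the framework: the ContributingEndpoint type, the scalar audio_energy value, and the optional margin used to avoid rapid switching are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class ContributingEndpoint:
    name: str            # e.g. "A" or "B"
    audio_energy: float  # assumed energy measure of the endpoint's AC1 Stream
    mcc1: str            # placeholder standing in for the endpoint's MCC1 Stream


def select_mcc4_source(endpoints, current=None, margin=0.0):
    """Pick the endpoint whose MCC1 should be used as the source of MCC4.

    The endpoint with the highest audio energy is treated as the current
    talker. If `current` names an endpoint that is within `margin` of the
    loudest one, it is kept (the margin is a hypothetical addition).
    """
    if not endpoints:
        raise ValueError("at least one contributing endpoint is required")
    loudest = max(endpoints, key=lambda e: e.audio_energy)
    for endpoint in endpoints:
        if endpoint.name == current:
            if loudest.audio_energy - endpoint.audio_energy <= margin:
                return endpoint
    return loudest


a = ContributingEndpoint("A", audio_energy=0.2, mcc1="MCC1 from A")
b = ContributingEndpoint("B", audio_energy=0.7, mcc1="MCC1 from B")
print(select_mcc4_source([a, b]).mcc1)
```

With the values shown, Endpoint B has the higher audio energy, so its MCC1 would be forwarded to Endpoint C as MCC4.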
13. IANA Considerations 13. IANA Considerations
This document does not require any IANA actions. This document has no IANA actions.
14. Security Considerations 14. Security Considerations
There are several potential attacks related to telepresence, There are several potential attacks related to telepresence,
specifically the protocols used by CLUE. This is the case due to specifically the protocols used by CLUE. This is the case due to
conferencing sessions, the natural involvement of multiple Endpoints, conferencing sessions, the natural involvement of multiple Endpoints,
and the many, often user-invoked, capabilities provided by the and the many, often user-invoked, capabilities provided by the
systems. systems.
An MCU involved in a CLUE session can experience many of the same An MCU involved in a CLUE session can experience many of the same
skipping to change at line 3267 skipping to change at line 3267
[RFC6351] Perreault, S., "xCard: vCard XML Representation", [RFC6351] Perreault, S., "xCard: vCard XML Representation",
RFC 6351, DOI 10.17487/RFC6351, August 2011, RFC 6351, DOI 10.17487/RFC6351, August 2011,
<https://www.rfc-editor.org/info/rfc6351>. <https://www.rfc-editor.org/info/rfc6351>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8846] Presta, R. and S P. Romano, "An XML Schema for the [RFC8846] Presta, R. and S P. Romano, "An XML Schema for the
Controlling Multiple Streams for Telepresence (CLUE) Data Controlling Multiple Streams for Telepresence (CLUE) Data
Model", DOI 10.17487/RFC8846, RFC 8846, November 2020, Model", RFC 8846, DOI 10.17487/RFC8846, January 2021,
<http://www.rfc-editor.org/info/rfc8846>. <http://www.rfc-editor.org/info/rfc8846>.
[RFC8847] Presta, R. and S P. Romano, "Protocol for Controlling [RFC8847] Presta, R. and S P. Romano, "Protocol for Controlling
Multiple Streams for Telepresence (CLUE)", RFC 8847, Multiple Streams for Telepresence (CLUE)", RFC 8847,
DOI 10.17487/RFC8847, November 2020, DOI 10.17487/RFC8847, January 2021,
<https://www.rfc-editor.org/info/rfc8847>. <https://www.rfc-editor.org/info/rfc8847>.
[RFC8848] Hanton, R., Kyzivat, P., Xiao, L., and C. Groves, "Session [RFC8848] Hanton, R., Kyzivat, P., Xiao, L., and C. Groves, "Session
Signaling for Controlling Multiple Streams for Signaling for Controlling Multiple Streams for
Telepresence (CLUE)", RFC 8848, DOI 10.17487/RFC8848, Telepresence (CLUE)", RFC 8848, DOI 10.17487/RFC8848,
November 2020, <https://www.rfc-editor.org/info/rfc8848>. January 2021, <https://www.rfc-editor.org/info/rfc8848>.
[RFC8850] Holmberg, C., "Controlling Multiple Streams for [RFC8850] Holmberg, C., "Controlling Multiple Streams for
Telepresence (CLUE) Protocol Data Channel", RFC 8850, Telepresence (CLUE) Protocol Data Channel", RFC 8850,
DOI 10.17487/RFC8850, November 2020, DOI 10.17487/RFC8850, January 2021,
<https://www.rfc-editor.org/info/rfc8850>. <https://www.rfc-editor.org/info/rfc8850>.
15.2. Informative References 15.2. Informative References
[RFC4353] Rosenberg, J., "A Framework for Conferencing with the [RFC4353] Rosenberg, J., "A Framework for Conferencing with the
Session Initiation Protocol (SIP)", RFC 4353, Session Initiation Protocol (SIP)", RFC 4353,
DOI 10.17487/RFC4353, February 2006, DOI 10.17487/RFC4353, February 2006,
<https://www.rfc-editor.org/info/rfc4353>. <https://www.rfc-editor.org/info/rfc4353>.
[RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
skipping to change at line 3317 skipping to change at line 3317
Telepresence Multistreams", RFC 7262, Telepresence Multistreams", RFC 7262,
DOI 10.17487/RFC7262, June 2014, DOI 10.17487/RFC7262, June 2014,
<https://www.rfc-editor.org/info/rfc7262>. <https://www.rfc-editor.org/info/rfc7262>.
[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
DOI 10.17487/RFC7667, November 2015, DOI 10.17487/RFC7667, November 2015,
<https://www.rfc-editor.org/info/rfc7667>. <https://www.rfc-editor.org/info/rfc7667>.
[RFC8849] Even, R. and J. Lennox, "Mapping RTP Streams to [RFC8849] Even, R. and J. Lennox, "Mapping RTP Streams to
Controlling Multiple Streams for Telepresence (CLUE) Media Controlling Multiple Streams for Telepresence (CLUE) Media
Captures", RFC 8849, DOI 10.17487/RFC8849, November 2020, Captures", RFC 8849, DOI 10.17487/RFC8849, January 2021,
<https://www.rfc-editor.org/info/rfc8849>. <https://www.rfc-editor.org/info/rfc8849>.
Acknowledgements Acknowledgements
Allyn Romanow and Brian Baldino were authors of early draft versions. Allyn Romanow and Brian Baldino were authors of early draft versions.
Mark Gorzynski also contributed much to the initial approach. Many Mark Gorzynski also contributed much to the initial approach. Many
others also contributed, including Christian Groves, Jonathan Lennox, others also contributed, including Christian Groves, Jonathan Lennox,
Paul Kyzivat, Rob Hanton, Roni Even, Christer Holmberg, Stephen Paul Kyzivat, Rob Hanton, Roni Even, Christer Holmberg, Stephen
Botzko, Mary Barnes, John Leslie, and Paul Coverdale. Botzko, Mary Barnes, John Leslie, and Paul Coverdale.
 End of changes. 14 change blocks. 
14 lines changed or deleted 14 lines changed or added

This html diff was produced by rfcdiff 1.48.