Internet Engineering Task Force (IETF)                        A. Romanow
Request for Comments: 7262                                 Cisco Systems
Category: Informational                                        S. Botzko
ISSN: 2070-1721                                                  Polycom
                                                               M. Barnes
                                        MLB@Realtime Communications, LLC
                                                               June 2014

              Requirements for Telepresence Multistreams

Abstract

   This memo discusses the requirements for specifications that enable
   telepresence interoperability by describing behaviors and protocols
   for Controlling Multiple Streams for Telepresence (CLUE).  In
   addition, the problem statement and related definitions are also
   covered herein.

Status of This Memo

   This document is not an Internet Standards Track specification; it
   is published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Not all documents
   approved by the IESG are a candidate for any level of Internet
   Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7262.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Definitions
   4. Problem Statement
   5. Requirements
   6. Acknowledgements
   7. Security Considerations
   8. Informative References
   Authors' Addresses
1.  Introduction

   Telepresence systems greatly improve collaboration.  In a
   telepresence conference (as used herein), the goal is to create an
   environment that gives the users a feeling of (co-located) presence
   -- the feeling that a local user is in the same room with other
   local users and remote parties.  Currently, systems from different
   vendors often do not interoperate because they do the same tasks
   differently, as discussed in the Problem Statement section below
   (see Section 4).

   The approach taken in this memo is to set requirements for a future
   specification(s) that, when fulfilled by an implementation of the
   specification(s), provide for interoperability between IETF
   protocol-based telepresence systems.  It is anticipated that a
   solution for the requirements set out in this memo likely involves
   the exchange of adequate information about participating sites;
   this information is currently not standardized by the IETF.

   The purpose of this document is to describe the requirements for a
   specification that enables interworking between different SIP-based
   [RFC3261] telepresence systems, by exchanging and negotiating
   appropriate information.  In the context of the requirements in
   this document and related solution documents, this includes both
   point-to-point SIP sessions as well as SIP-based conferences as
   described in the SIP conferencing framework [RFC4353] and the
   SIP-based conference control [RFC4579] specifications.  Non-IETF
   protocol-based systems, such as those based on ITU-T Rec. H.323
   [ITU.H323], are out of scope.  These requirements are for the
   specification; they are not requirements on the telepresence
   systems implementing the solution/protocol that will be specified.

   Today, telepresence systems of different vendors can follow
   radically different architectural approaches while offering a
   similar user experience.  CLUE will not dictate telepresence
   architectural and implementation choices; however, it will describe
   a protocol architecture for CLUE and how it relates to other
   protocols.  CLUE enables interoperability between telepresence
   systems by exchanging information about the systems'
   characteristics.  Systems can use this information to control their
   behavior to allow for interoperability between those systems.

   A telepresence session requires at least one sending and one
   receiving endpoint.  Multiparty telepresence sessions include more
   than 2 endpoints and centralized infrastructure such as Multipoint
   Control Units (MCUs) or equivalent.  CLUE specifies the syntax,
   semantics, and control flow of information to enable the best
   possible user experience at those endpoints.

   Sending endpoints, or MCUs, are not mandated to use any of the CLUE
   specifications that describe their capabilities, attributes, or
   behavior.
   Similarly, it is not envisioned that endpoints or MCUs will ever
   have to take information received into account.  However, by making
   available as much information as possible, and by taking into
   account as much information as has been received or exchanged, MCUs
   and endpoints are expected to select operation modes that enable
   the best possible user experience under their constraints.

   The document structure is as follows: definitions are set out,
   followed by a description of the problem of telepresence
   interoperability that led to this work.  Then the requirements for
   a specification addressing the current shortcomings are enumerated
   and discussed.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

3.  Definitions

   The following terms are used throughout this document and serve as
   a reference for other documents.

   Audio Mixing: refers to the accumulation of scaled audio signals to
   produce a single audio stream.  See "RTP Topologies" [RFC5117].

   Conference: used as defined in "A Framework for Conferencing within
   the Session Initiation Protocol (SIP)" [RFC4353].

   Endpoint: The logical point of final termination through receiving,
   decoding and rendering, and/or initiation through capturing,
   encoding, and sending of media streams.  An endpoint consists of
   one or more physical devices that source and sink media streams,
   and exactly one participant [RFC4353] (which, in turn, includes
   exactly one SIP user agent).  In contrast to an endpoint, an MCU
   may also send and receive media streams, but it is not the
   initiator or the final terminator in the sense that media is
   captured or rendered.  Endpoints can be anything from
   multiscreen/multicamera rooms to handheld devices.

   Endpoint Characteristics: include placement of capture and
   rendering devices, capture/render angle, resolution of cameras and
   screens, spatial location, and mixing parameters of microphones.
   Endpoint characteristics are not specific to individual media
   streams sent by the endpoint.

   Layout: How rendered media streams are spatially arranged with
   respect to each other on a telepresence endpoint with a single
   screen and a single loudspeaker, and how rendered media streams are
   arranged with respect to each other on a telepresence endpoint with
   multiple screens or loudspeakers.  Note that audio as well as video
   are encompassed by the term layout -- in other words, included is
   the placement of audio streams on loudspeakers as well as video
   streams on video screens.

   Local: Sender and/or receiver physically co-located ("local") in
   the context of the discussion.

   MCU: Multipoint Control Unit (MCU) - a device that connects two or
   more endpoints together into one single multimedia conference
   [RFC5117].  An MCU may include a mixer [RFC4353].

   Media: Any data that, after suitable encoding, can be conveyed over
   RTP, including audio, video, or timed text.

   Model: a set of assumptions a telepresence system of a given vendor
   adheres to and expects the remote telepresence system(s) to also
   adhere to.

   Remote: Sender and/or receiver on the other side of the
   communication channel (depending on context); i.e., not local.  A
   remote can be an endpoint or an MCU.
   Render: the process of generating a representation from a media,
   such as displayed motion video or sound emitted from loudspeakers.

   Telepresence: an environment that gives non-co-located users or
   user groups a feeling of (co-located) presence -- the feeling that
   a local user is in the same room with other local users and the
   remote parties.  The inclusion of remote parties is achieved
   through multimedia communication including at least audio and video
   signals of high fidelity.

4.  Problem Statement

   In order to create a "being there" experience characteristic of
   telepresence, media inputs need to be transported, received, and
   coordinated between participating systems.  Different telepresence
   systems take diverse approaches in crafting a solution, or they
   implement similar solutions quite differently.  They use disparate
   techniques, and they describe, control, and negotiate media in
   dissimilar fashions.

   Such diversity creates an interoperability problem.  The same
   issues are solved in different ways by different systems, so that
   they are not directly interoperable.  This makes interworking
   difficult at best and sometimes impossible.  Worse, even if those
   extensions are based on common standards such as SIP, many
   telepresence systems use proprietary protocol extensions to solve
   telepresence-related problems.

   Some degree of interworking between systems from different vendors
   is possible through transcoding and translation.  This requires
   additional devices, which are expensive, are often not entirely
   automatic, and sometimes introduce unwelcome side effects, such as
   additional delay or degraded performance.  Specialized knowledge is
   currently required to operate a telepresence conference with
   endpoints from different vendors, for example, to configure
   transcoding and translating devices.  Often, such conferences do
   not start as planned or are interrupted by difficulties that arise.

   The general problem that needs to be solved can be described as
   follows.  Today, each endpoint renders the audio and video captures
   it receives according to an implicitly assumed model that
   stipulates how to produce a realistic depiction of the remote
   location.  If all endpoints are manufactured by the same vendor,
   they all share the same implicit model and render the received
   captures correctly.  However, if the devices are from different
   vendors, the models used for rendering presence can and usually do
   differ.  The result can be that the telepresence systems actually
   connect, but the user experience will suffer, for example, because
   one system assumes that the first video stream is captured from the
   right camera, whereas the other assumes the first video stream is
   captured from the left camera.

   If Alice and Bob are at different sites, Alice needs to tell Bob
   about the camera and sound equipment arrangement at her site so
   that Bob's receiver can create an accurate rendering of her site.
   Alice and Bob need to agree on what the salient characteristics are
   as well as how to represent and communicate them.  Characteristics
   may include number, placement, capture/render angle, resolution of
   cameras and screens, spatial location, and audio mixing parameters
   of microphones.
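   The kind of information that Alice would need to convey to Bob can
   be pictured as a small structured description of her site.  The
   sketch below, in Python, is purely informative: it is not the CLUE
   data model, all class and field names are hypothetical, and it only
   illustrates the categories of characteristics listed above
   (placement, capture angle, resolution, spatial location, and audio
   channel arrangement).

# Informative sketch only; not the CLUE data model.  All names are
# hypothetical and chosen for illustration.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VideoCaptureInfo:
    capture_id: str                               # one camera's video capture
    point_of_capture: Tuple[float, float, float]  # camera position (x, y, z), meters
    capture_angle_deg: float                      # horizontal capture angle
    resolution: Tuple[int, int]                   # encoded picture size, e.g., (1920, 1080)

@dataclass
class AudioCaptureInfo:
    capture_id: str
    point_of_capture: Tuple[float, float, float]  # microphone position (x, y, z), meters
    channels: int                                 # 1 = mono, 2 = stereo, 3 = left/center/right

@dataclass
class SiteDescription:
    video: List[VideoCaptureInfo]                 # listed left to right as captured
    audio: List[AudioCaptureInfo]

# Alice's hypothetical three-camera room, described so that Bob's
# renderer can place the received streams in the captured order.
alice = SiteDescription(
    video=[
        VideoCaptureInfo("VC-left",   (-1.5, 0.0, 2.5), 33.0, (1920, 1080)),
        VideoCaptureInfo("VC-center", ( 0.0, 0.0, 2.5), 33.0, (1920, 1080)),
        VideoCaptureInfo("VC-right",  ( 1.5, 0.0, 2.5), 33.0, (1920, 1080)),
    ],
    audio=[AudioCaptureInfo("AC-room", (0.0, 0.5, 2.5), 3)],
)

   A real solution also has to cover, among other things, the area of
   coverage of each capture, associations between audio and video
   captures, and changes to this information during a conference;
   these aspects are stated as requirements in Section 5.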
   The telepresence multistream work seeks to describe the sender
   situation in a way that allows the receiver to render it
   realistically even though it may have a different rendering model
   than the sender.

5.  Requirements

   Although some aspects of these requirements can be met by existing
   technology, such as the Session Description Protocol (SDP)
   [RFC4566], they are stated here to have a complete record of the
   requirements for CLUE.  Determining whether a requirement needs new
   work or not will be part of the solution development and is not
   discussed in this document.

   Note that the term "solution" is used in these requirements to mean
   the protocol specifications, including extensions to existing
   protocols as well as any new protocols, developed to support the
   use cases.  The solution might introduce additional functionality
   that is not mapped directly to these requirements; e.g., the
   detailed information carried in the signaling protocol(s).  In
   cases where the requirements are directly relevant to specific use
   cases as described in [RFC7205], a reference to the use case is
   provided.

   REQ-1:  The solution MUST support a description of the spatial
      arrangement of source video images sent in video streams that
      enables a satisfactory reproduction at the receiver of the
      original scene.  This applies to each site in a point-to-point
      or a multipoint meeting and refers to the spatial ordering
      within a site, not to the ordering of images between sites.

      This requirement relates to all the use cases described in
      [RFC7205].

      REQ-1a:  The solution MUST support a means of allowing the
         preservation of the order of images in the captured scene.
         For example, if John is to Susan's right in the image
         capture, John is also to Susan's right in the rendered image.

      REQ-1b:  The solution MUST support a means of allowing the
         preservation of order of images in the scene in two
         dimensions - horizontal and vertical.  (An informative sketch
         of how a receiver might preserve this ordering appears at the
         end of this section.)

      REQ-1c:  The solution MUST support a means to identify the
         relative location, within a scene, of the point of capture of
         individual video captures in three dimensions.

      REQ-1d:  The solution MUST support a means to identify the area
         of coverage, within a scene, of individual video captures in
         three dimensions.

   REQ-2:  The solution MUST support a description of the spatial
      arrangement of captured source audio sent in audio streams that
      enables a satisfactory reproduction at the receiver in a
      spatially correct manner.  This applies to each site in a
      point-to-point or a multipoint meeting and refers to the spatial
      ordering within a site, not the ordering of channels between
      sites.

      This requirement relates to all the use cases described in
      [RFC7205], but is particularly important in the Heterogeneous
      Systems use case.

      REQ-2a:  The solution MUST support a means of preserving the
         spatial order of audio in the captured scene.
         For example, if John sounds as if he is on Susan's right in
         the captured audio, John's voice is also placed on Susan's
         right in the rendered image.

      REQ-2b:  The solution MUST support a means to identify the
         number and spatial arrangement of audio channels including
         monaural, stereophonic (2.0), and 3.0 (left, center, right)
         audio channels.

      REQ-2c:  The solution MUST support a means to identify the point
         of capture of individual audio captures in three dimensions.

      REQ-2d:  The solution MUST support a means to identify the area
         of coverage of individual audio captures in three dimensions.

   REQ-3:  The solution MUST enable individual audio streams to be
      associated with one or more video image captures, and individual
      video image captures to be associated with one or more audio
      captures, for the purpose of rendering proper position.

      This requirement relates to all the use cases described in
      [RFC7205].

   REQ-4:  The solution MUST enable interoperability between endpoints
      that have a different number of similar devices.  For example,
      an endpoint may have 1 screen, 1 loudspeaker, 1 camera, and
      1 mic, while another endpoint may have 3 screens,
      2 loudspeakers, 3 cameras, and 2 microphones.  Or, in a
      multipoint conference, an endpoint may have 1 screen, another
      may have 2 screens, and a third may have 3 screens.  This
      includes endpoints where the number of devices of a given type
      is zero.

      This requirement relates to the Point-to-Point Meeting:
      Symmetric and Multipoint Meeting use cases described in
      [RFC7205].

   REQ-5:  The solution MUST support means of enabling
      interoperability between telepresence endpoints where cameras
      are of different picture aspect ratios.

   REQ-6:  The solution MUST provide scaling information that enables
      rendering of a video image at the actual size of the captured
      scene.

   REQ-7:  The solution MUST support means of enabling
      interoperability between telepresence endpoints where displays
      are of different resolutions.

   REQ-8:  The solution MUST support methods for handling different
      bit rates in the same conference.

   REQ-9:  The solution MUST support means of enabling
      interoperability between endpoints that send and receive
      different numbers of media streams.

      This requirement relates to the Heterogeneous Systems and
      Multipoint Meeting use cases.

   REQ-10:  The solution MUST ensure that endpoints that support
      telepresence extensions can establish a session with a SIP
      endpoint that does not support the telepresence extensions.  For
      example, in the case of a SIP endpoint that supports a single
      audio and a single video stream, an endpoint that supports the
      telepresence extensions would set up a session with a single
      audio and single video stream using existing SIP and SDP
      mechanisms.

   REQ-11:  The solution MUST support a mechanism for determining
      whether or not an endpoint or MCU is capable of telepresence
      extensions.

   REQ-12:  The solution MUST support a means to enable more than two
      endpoints to participate in a teleconference.

      This requirement relates to the Multipoint Meeting use case.
   REQ-13:  The solution MUST support both transcoding and switching
      approaches for providing multipoint conferences.

   REQ-14:  The solution MUST support mechanisms to allow media from
      one source endpoint and/or multiple source endpoints to be sent
      to a remote endpoint at a particular point in time.  Which media
      is sent at a point in time may be based on local policy.

   REQ-15:  The solution MUST provide mechanisms to support the
      following:

      *  Presentations with different media sources

      *  Presentations for which the media streams are visible to all
         endpoints

      *  Multiple, simultaneous presentation media streams, including
         presentation media streams that are spatially related to each
         other.

      This requirement relates to the Presentation use case.

   REQ-16:  The specification of any new protocols for the solution
      MUST provide extensibility mechanisms.

   REQ-17:  The solution MUST support a mechanism for allowing
      information about media captures to change during a conference.

   REQ-18:  The solution MUST provide a mechanism for the secure
      exchange of information about the media captures.
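   As an informative illustration of REQ-1a, REQ-1b, and REQ-1c, the
   sketch below shows one way a receiver could use advertised points
   of capture to preserve the captured ordering when assigning
   incoming video streams to its own screens.  It is not a normative
   CLUE procedure; the function name, the coordinate convention, and
   the assumption of one stream per screen are illustrative
   assumptions only.

# Informative sketch only; not a normative CLUE procedure.
from typing import Dict, List, Tuple

def assign_screens(points_of_capture: Dict[str, Tuple[float, float, float]],
                   screens_left_to_right: List[str]) -> Dict[str, str]:
    """Map remote capture ids to local screen ids while preserving the
    horizontal and vertical order of the captured scene (REQ-1a, REQ-1b).

    Assumes each capture advertises its point of capture as (x, y, z),
    with x increasing to the right and y increasing upward (REQ-1c),
    and that there are at least as many screens as captures; a real
    system would also need the composition logic implied by REQ-4."""
    ordered = sorted(points_of_capture,
                     key=lambda cid: (-points_of_capture[cid][1],  # top row first
                                      points_of_capture[cid][0]))  # then left to right
    return dict(zip(ordered, screens_left_to_right))

# Example: three captures from a remote room mapped onto three local screens.
mapping = assign_screens(
    {"VC-left": (-1.5, 0.0, 2.5), "VC-center": (0.0, 0.0, 2.5),
     "VC-right": (1.5, 0.0, 2.5)},
    ["screen-1", "screen-2", "screen-3"],
)
# mapping == {"VC-left": "screen-1", "VC-center": "screen-2",
#             "VC-right": "screen-3"}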
6.  Acknowledgements

   This document has benefited from all the comments on the CLUE
   mailing list and a number of discussions.  So many people
   contributed that it is not possible to list them all.  However, the
   comments provided by Roberta Presta, Christian Groves, and Paul
   Coverdale during WGLC were particularly helpful in completing the
   WG document.

7.  Security Considerations

   Requirement REQ-18 identifies the need to securely transport the
   information about media captures.  It is important to note that
   session setup for a telepresence session will use SIP for basic
   session setup and either SIP or the Centralized Conferencing
   Manipulation Protocol (CCMP) [RFC6503] for a multiparty
   telepresence session.  Information carried in the SIP signaling can
   be secured by the SIP security mechanisms as defined in [RFC3261].
   In the case of conference control using CCMP, the security model
   and mechanisms as defined in the Centralized Conferencing (XCON)
   Framework [RFC5239] and CCMP [RFC6503] documents would meet the
   requirement.  Any additional signaling mechanism used to transport
   the information about media captures needs to define the mechanisms
   by which the information is secured.  The details of these
   mechanisms need to be defined and described in the CLUE framework
   document and related solution document(s).

8.  Informative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353, February
              2006.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
              (SIP) Call Control - Conferencing for User Agents",
              BCP 119, RFC 4579, August 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC
              5117, January 2008.

   [RFC5239]  Barnes, M., Boulton, C., and O. Levin, "A Framework for
              Centralized Conferencing", RFC 5239, June 2008.

   [RFC6503]  Barnes, M., Boulton, C., Romano, S., and H. Schulzrinne,
              "Centralized Conferencing Manipulation Protocol", RFC
              6503, March 2012.

   [RFC7205]  Romanow, A., Botzko, S., Duckworth, M., and R. Even,
              "Use Cases for Telepresence Multistreams", RFC 7205,
              April 2014.

   [ITU.H323] ITU-T, "Packet-based Multimedia Communications Systems",
              ITU-T Recommendation H.323, December 2009.
Authors' Addresses

   Allyn Romanow
   Cisco Systems
   San Jose, CA 95134
   USA

   EMail: allyn@cisco.com


   Stephen Botzko
   Polycom
   Andover, MA 01810
   USA

   EMail: stephen.botzko@polycom.com


   Mary Barnes
   MLB@Realtime Communications, LLC

   EMail: mary.ietf.barnes@gmail.com