Internet Engineering Task Force (IETF)                       J. Skoglund
Request for Comments: 8486                                    Google LLC
Updates: 7845                                                 M. Graczyk
Category: Standards Track                                   October 2018
ISSN: 2070-1721


                  Ambisonics in an Ogg Opus Container

Abstract

   This document defines an extension to the Opus audio codec to
   encapsulate coded Ambisonics using the Ogg format. It also contains
   updates to RFC 7845 to reflect necessary changes in the description
   of channel mapping families.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8486.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Ambisonics with Ogg Opus
      3.1. Channel Mapping Family 2
      3.2. Channel Mapping Family 3
      3.3. Allowed Numbers of Channels
   4. Downmixing
   5. Updates to RFC 7845
      5.1. Format of the Channel Mapping Table
      5.2. Unknown Mapping Families
   6. Experimental Mapping Families
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Acknowledgments
   Authors' Addresses

1. Introduction

   Ambisonics is a representation format for three-dimensional sound
   fields that can be used for surround sound and immersive
   virtual-reality playback. See [fellgett75] and [daniel04] for
   technical details on the Ambisonics format.
   For the purposes of this document, Ambisonics can be considered a
   multichannel audio stream. A separate stereo stream can be used
   alongside the Ambisonics in a head-tracked virtual reality experience
   to provide so-called non-diegetic audio, that is, audio that should
   remain unchanged by rotation of the listener's head, such as
   narration or stereo music. Ogg is a general-purpose container,
   supporting audio, video, and other media. It can be used to
   encapsulate audio streams coded using the Opus codec. See [RFC6716]
   and [RFC7845] for technical details on the Opus codec and its
   encapsulation in the Ogg container, respectively.

   This document extends the Ogg Opus format by defining two new channel
   mapping families for encoding Ambisonics. The Ogg Opus format is
   extended indirectly by adding items with values 2 and 3 to the "Opus
   Channel Mapping Families" IANA registry. When 2 or 3 is used as the
   Channel Mapping Family Number in an Ogg stream, the semantic meaning
   of the channels in the multichannel Opus stream is one of the
   Ambisonics layouts defined in this document. This mapping can also
   be used in other contexts that make use of the channel mappings
   defined by the "Opus Channel Mapping Families" registry.

   Furthermore, mapping families 240 through 254 (inclusive) are
   reserved for experimental use.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Ambisonics with Ogg Opus

   Ambisonics can be encapsulated in the Ogg format by encoding with the
   Opus codec and setting the channel mapping family value to 2 or 3 in
   the Ogg identification (ID) header.
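
   As an illustration only, the following Python sketch reads the
   channel count and channel mapping family from an ID header packet.
   It assumes the fixed 19-octet ID header layout of RFC 7845 (magic
   signature, version, channel count, pre-skip, input sample rate,
   output gain, channel mapping family). The function and constant
   names are hypothetical and are not defined by this document.

```python
import struct
from typing import Tuple

# Families defined by this document for Ambisonics (Sections 3.1 and 3.2).
AMBISONIC_FAMILIES = (2, 3)


def read_channel_mapping_family(id_header: bytes) -> Tuple[int, int]:
    """Return (channel_count, mapping_family) from an Ogg Opus ID header."""
    # Fixed part of the ID header, 19 octets in total:
    # "OpusHead" (8), version (1), channel count (1), pre-skip (2),
    # input sample rate (4), output gain (2), channel mapping family (1).
    if len(id_header) < 19 or id_header[:8] != b"OpusHead":
        raise ValueError("not an Ogg Opus ID header")
    # All multi-octet fields are little endian; "<" also disables padding.
    _version, channels, _pre_skip, _rate, _gain, family = struct.unpack_from(
        "<BBHIhB", id_header, 8
    )
    return channels, family


def is_ambisonic(id_header: bytes) -> bool:
    """True if the header signals channel mapping family 2 or 3."""
    _channels, family = read_channel_mapping_family(id_header)
    return family in AMBISONIC_FAMILIES
```

   A demuxer could call is_ambisonic() on the first packet of the
   logical stream before deciding how to interpret the channel mapping
   table that follows the fixed part of the header.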
   A demuxer implementation encountering channel mapping family 2 or 3
   MUST interpret the Opus stream as containing Ambisonics with the
   format described in Section 3.1 or 3.2, respectively.

3.1. Channel Mapping Family 2

   This channel mapping uses the same channel mapping table format used
   by channel mapping family 1. The output channels are Ambisonic
   components ordered in Ambisonic Channel Number (ACN) order (which is
   defined in Figure 1) followed by two optional channels of
   non-diegetic stereo indexed (left, right). The terms "order" and
   "degree" are defined according to [ambix].

      ACN = n * (n + 1) + m,
      for order n and degree m.

               Figure 1: Ambisonic Channel Number (ACN)

   For the Ambisonic channels, the ACN component corresponds to channel
   index as k = ACN. The reverse correspondence can also be computed
   for an Ambisonic channel with index k.

      order   n = floor(sqrt(k)),
      degree  m = k - n * (n + 1).

            Figure 2: Ambisonic Degree and Order from ACN

   Note that channel mapping family 2 allows for so-called mixed-order
   Ambisonic representation, in which only a subset of the full
   Ambisonic order number of channels is encoded. By specifying the
   full number in the channel count field, the inactive ACNs can then be
   indicated in the channel mapping field using the index 255.
   Ambisonic channels are normalized with Schmidt Semi-Normalization
   (SN3D). The interpretation of the Ambisonics signal as well as
   detailed definitions of ACN channel ordering and SN3D normalization
   are described in [ambix], Section 2.1.

3.2. Channel Mapping Family 3

   In this mapping, C output channels (the channel count) are generated
   at the decoder by multiplying K = N + M decoded channels with a
   designated demixing matrix, D, having C rows and K columns (C and K
   do not have to be equal).
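
   The multiplication can be sketched in Python as follows. This is an
   illustration under stated assumptions, not a definitive
   implementation: the matrix D is assumed to be already available as
   floating-point values, the K decoded channels are assumed to be
   equal-length sample sequences, and the name demix() is hypothetical.

```python
from typing import List, Sequence


def demix(
    d: Sequence[Sequence[float]], x: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Multiply a C-by-K demixing matrix with K decoded channels.

    d: demixing matrix D as C rows, each with K coefficients.
    x: K decoded channels, each a sequence of samples of equal length.
    Returns C output channels, each with the same number of samples.
    """
    k = len(x)
    if any(len(row) != k for row in d):
        raise ValueError("every row of D must have K coefficients")
    frames = len(x[0]) if k else 0
    if any(len(channel) != frames for channel in x):
        raise ValueError("all decoded channels must have the same length")
    # Output channel c at sample t is the dot product of row c of D with
    # the K decoded samples at time t.
    return [
        [sum(row[j] * x[j][t] for j in range(k)) for t in range(frames)]
        for row in d
    ]
```

   Because C and K are independent, the same routine covers matrices
   that produce more, fewer, or the same number of output channels as
   there are decoded channels.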
   Here, N denotes the number of streams encoded, and M is the number of
   these encoded streams that are coupled to produce two channels. As
   for channel mapping family 2, this mapping family also allows for the
   encoding and decoding of full-order Ambisonics and mixed-order
   Ambisonics, as well as non-diegetic stereo channels. Furthermore, it
   has the added flexibility of mixing channels. Let X denote a column
   vector containing K decoded channels X1, X2, ..., XK (from N
   streams), and let S denote a column vector containing C output
   streams S1, S2, ..., SC. Then, S = D X, as shown in Figure 3.

      /     \   /                   \ /     \
      | S1  |   | D11  D12  ... D1K | | X1  |
      | S2  |   | D21  D22  ... D2K | | X2  |
      | ... | = | ...  ...  ... ... | | ... |
      | SC  |   | DC1  DC2  ... DCK | | XK  |
      \     /   \                   / \     /

            Figure 3: Demixing in Channel Mapping Family 3

   The matrix MUST be provided in the channel mapping table part of the
   identification header; see Section 5.1.1 of [RFC7845]. The matrix
   replaces the need for a channel mapping field; for channel mapping
   family 3, the mapping table has the following layout:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                                      +-+-+-+-+-+-+-+-+
                                                      | Stream Count  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Coupled Count | Demixing Matrix                               :
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 4: Channel Mapping Table for Channel Mapping Family 3

   The fields in the channel mapping table have the following meaning:

   1. Stream Count "N" (8 bits, unsigned):

      This is the total number of streams encoded in each Ogg packet.

   2. Coupled Stream Count "M" (8 bits, unsigned):

      This is the number of the N streams whose decoders are to be
      configured to produce two channels (stereo).

   3. Demixing Matrix (16*K*C bits, signed):

      The coefficients of the demixing matrix stored in column-major
      order as 16-bit, signed, two's complement fixed-point values with
      15 fractional bits (Q15), little endian. If needed, the output
      gain field can be used for a normalization scale. For mixed-order
      Ambisonic representations, the silent ACN channels are indicated
      by all zeros in the corresponding rows of the demixing matrix.
      This also allows for mixed order with non-diegetic stereo as the
      number of columns implies the presence of non-diegetic channels.

   Note that [RFC7845] specifies that the identification header cannot
   exceed one "page", which is 65,025 octets. This limits the Ambisonic
   order, which then MUST be lower than 12, if full order is utilized
   and the number of coded streams is the same as the Ambisonic order
   plus the two non-diegetic channels. The total output channel number,
   C, MUST be set in the third field of the identification header.

3.3. Allowed Numbers of Channels

   For both channel mapping families 2 and 3, the allowed numbers of
   channels are (1 + n)^2 + 2j for n = 0, 1, ..., 14 and j = 0 or 1,
   where n denotes the (highest) Ambisonic order and j denotes whether
   or not there is a separate non-diegetic stereo stream. This
   corresponds to periphonic Ambisonics from zeroth to fourteenth order
   plus potentially two channels of non-diegetic stereo. Explicitly,
   the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25,
   27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146,
   169, 171, 196, 198, 225, and 227.

   Note again that if full Ambisonic order is used and the number of
   coded streams is the same as the Ambisonic order plus the two
   non-diegetic channels, the order must then be lower than 12, due to
   the identification header length limit.

4. Downmixing

   The downmixing matrices in this section are only examples known to
   give acceptable results for stereo downmixing from Ambisonics; other
   mixing strategies are allowed, e.g., to emphasize a certain panning.

   An Ogg Opus player MAY use the matrix in Figure 5 to implement
   downmixing from multichannel files using channel mapping families 2
   and 3 when there is no non-diegetic stereo. The first and second
   Ambisonic channels are known as "W" and "Y", respectively. The
   omitted coefficients in the matrix in the figure have the value 0.0.

      /   \   /                  \ /     \
      | L |   | 0.5  0.5 0.0 ... | |  W  |
      | R | = | 0.5 -0.5 0.0 ... | |  Y  |
      \   /   \                  / | ... |
                                   \     /

   Figure 5: Stereo Downmixing Matrix for Channel Mapping Families 2 and
                     3 - Only Ambisonic Channels

   The first Ambisonic channel (W) is a mono audio stream that
   represents the average audio signal over all directions. Since W is
   not directional, Ogg Opus players MAY use W directly for mono
   playback.

   If a non-diegetic stereo track is present, the player MAY use the
   matrix in Figure 6 for downmixing. Ls and Rs denote the two
   non-diegetic stereo channels.

      /   \   /                            \ /     \
      | L |   | 0.25  0.25 0.0 ... 0.5 0.0 | |  W  |
      | R | = | 0.25 -0.25 0.0 ... 0.0 0.5 | |  Y  |
      \   /   \                            / | ... |
                                             |  Ls |
                                             |  Rs |
                                             \     /

   Figure 6: Stereo Downmixing Matrix for Channel Mapping Families 2 and
       3 - Ambisonic Channels Plus a Non-Diegetic Stereo Stream

5. Updates to RFC 7845

5.1. Format of the Channel Mapping Table

   The language in Section 5.1.1 of [RFC7845] (copied below) implies
   that the channel mapping table, when present, has a fixed format for
   all channel mapping families:

      The order and meaning of these channels are defined by a channel
      mapping, which consists of the 'channel mapping family' octet and,
      for channel mapping families other than family 0, a 'channel
      mapping table', as illustrated in Figure 3.
   This document updates [RFC7845] to clarify that the format of the
   channel mapping table may depend on the channel mapping family:

      The order and meaning of these channels are defined by a channel
      mapping, which consists of the 'channel mapping family' octet and
      for channel mapping families other than family 0, a 'channel
      mapping table'.

      The format of the channel mapping table depends on the channel
      mapping family. Unless the channel mapping family requires a
      custom format for its channel mapping table, the RECOMMENDED
      channel mapping table format for new mapping families is
      illustrated in Figure 3.

   The change above is not meant to change how families 1 and 255
   currently work. To ensure that, the first paragraph of
   Section 5.1.1.2 is changed from:

      Allowed numbers of channels: 1...8. Vorbis channel order (see
      below).

   to:

      Allowed numbers of channels: 1...8, with the mapping specified
      according to Figure 3. Vorbis channel order (see below).

   Similarly, the first paragraph of Section 5.1.1.3 is changed from:

      Allowed numbers of channels: 1...255. No defined channel meaning.

   to:

      Allowed numbers of channels: 1...255, with the mapping specified
      according to Figure 3. No defined channel meaning.

5.2. Unknown Mapping Families

   The treatment of unknown mapping families is changed slightly.
   Section 5.1.1.4 of [RFC7845] states:

      The remaining channel mapping families (2...254) are reserved. A
      demuxer implementation encountering a reserved 'channel mapping
      family' value SHOULD act as though the value is 255.

   This is changed to:

      The remaining channel mapping families (2...254) are reserved. A
      demuxer implementation encountering a 'channel mapping family'
      value that it does not recognize SHOULD NOT attempt to decode the
      packets and SHOULD NOT use any information except for the first 19
      octets of the ID header packet (Figure 2) and the comment header
      (Figure 10).

6. Experimental Mapping Families

   To make development of new mapping families easier while reducing the
   risk of creating compatibility issues with non-final versions of
   mapping families, mapping families 240 through 254 (inclusive) are
   now reserved for experiments and implementations of in-development
   families. Note that these mapping-family experiments are not
   restricted to Ambisonics. Implementers SHOULD attempt to use
   experimental family numbers that have not recently been used and
   SHOULD advertise what experimental numbers they use (e.g., for
   Internet-Drafts).

   The Ambisonics mapping experiments that led to this document used
   experimental family 254 for family 2 and experimental family 253 for
   family 3.

7. Security Considerations

   Implementations of the Ogg container need to take appropriate
   security considerations into account, as outlined in Section 8 of
   [RFC7845]. The extension defined in this document requires that
   semantic meaning be assigned to more channels than the existing Ogg
   format requires. Since more allocations will be required to encode
   and decode these semantically meaningful channels, care should be
   taken in any new allocation paths. Implementations MUST NOT overrun
   their allocated memory nor read from uninitialized memory when
   managing the Ambisonic channel mapping.

8. IANA Considerations

   This document updates the IANA registry "Opus Channel Mapping
   Families" to add 17 new assignments.

   +---------+----------------------+----------------------------------+
   | Value   | Description          | Reference                        |
   +---------+----------------------+----------------------------------+
   | 0       | Mono, L/R stereo     | Section 5.1.1.1 of [RFC7845],    |
   |         |                      | Section 5 of this document       |
   |         |                      |                                  |
   | 1       | 1-8 channel surround | Section 5.1.1.2 of [RFC7845],    |
   |         |                      | Section 5 of this document       |
   |         |                      |                                  |
   | 2       | Ambisonics as        | Section 3.1 of this document     |
   |         | individual channels  |                                  |
   |         |                      |                                  |
   | 3       | Ambisonics with      | Section 3.2 of this document     |
   |         | demixing matrix      |                                  |
   |         |                      |                                  |
   | 240-254 | Experimental use     | Section 6 of this document       |
   |         |                      |                                  |
   | 255     | Discrete channels    | Section 5.1.1.3 of [RFC7845],    |
   |         |                      | Section 5 of this document       |
   +---------+----------------------+----------------------------------+

9. References

9.1. Normative References

   [ambix]    Nachbar, C., Zotter, F., Deleflie, E., and A. Sontacchi,
              "AMBIX - A SUGGESTED AMBISONICS FORMAT", Ambisonics
              Symposium, June 2011,
              <http://iem.kug.ac.at/fileadmin/media/iem/projects/2011/ambisonics11_nachbar_zotter_sontacchi_deleflie.pdf>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012,
              <https://www.rfc-editor.org/info/rfc6716>.

   [RFC7845]  Terriberry, T., Lee, R., and R. Giles, "Ogg Encapsulation
              for the Opus Audio Codec", RFC 7845,
              DOI 10.17487/RFC7845, April 2016,
              <https://www.rfc-editor.org/info/rfc7845>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

9.2. Informative References

   [daniel04] Daniel, J. and S. Moreau, "Further Study of Sound Field
              Coding with Higher Order Ambisonics", Audio Engineering
              Society Convention Paper, May 2004,
              <https://www.researchgate.net/publication/277841868_Further_Study_of_Sound_Field_Coding_with_Higher_Order_Ambisonics>.

   [fellgett75]
              Fellgett, P., "Ambisonics. Part one: General system
              description", Studio Sound vol. 17, no. 8, pp. 20-22,
              August 1975,
              <http://www.michaelgerzonphotos.org.uk/articles/Ambisonics%201.pdf>.

Acknowledgments

   Thanks to Timothy Terriberry, Jean-Marc Valin, Mark Harris, Marcin
   Gorzel, and Andrew Allen for their guidance and valuable
   contributions to this document.

Authors' Addresses

   Jan Skoglund
   Google LLC
   345 Spear Street
   San Francisco, CA 94105
   United States of America

   Email: jks@google.com


   Michael Graczyk

   Email: michael@mgraczyk.com