rfc9328.original | rfc9328.txt | |||
---|---|---|---|---|
avtcore S. Zhao | Internet Engineering Task Force (IETF) S. Zhao | |||
Internet-Draft Intel | Request for Comments: 9328 Intel | |||
Intended status: Standards Track S. Wenger | Category: Standards Track S. Wenger | |||
Expires: 2 February 2023 Tencent | ISSN: 2070-1721 Tencent | |||
Y. Sanchez | Y. Sanchez | |||
Fraunhofer HHI | Fraunhofer HHI | |||
Y.-K. Wang | Y.-K. Wang | |||
Bytedance Inc. | Bytedance Inc. | |||
M. M Hannuksela | M. M Hannuksela | |||
Nokia Technologies | Nokia Technologies | |||
1 August 2022 | December 2022 | |||
RTP Payload Format for Versatile Video Coding (VVC) | RTP Payload Format for Versatile Video Coding (VVC) | |||
draft-ietf-avtcore-rtp-vvc-18 | ||||
Abstract | Abstract | |||
This memo describes an RTP payload format for the video coding | This memo describes an RTP payload format for the Versatile Video | |||
standard ITU-T Recommendation H.266 and ISO/IEC International | Coding (VVC) specification, which was published as both ITU-T | |||
Standard 23090-3, both also known as Versatile Video Coding (VVC) and | Recommendation H.266 and ISO/IEC International Standard 23090-3. VVC | |||
developed by the Joint Video Experts Team (JVET). The RTP payload | was developed by the Joint Video Experts Team (JVET). The RTP | |||
format allows for packetization of one or more Network Abstraction | payload format allows for packetization of one or more Network | |||
Layer (NAL) units in each RTP packet payload as well as fragmentation | Abstraction Layer (NAL) units in each RTP packet payload, as well as | |||
of a NAL unit into multiple RTP packets. The payload format has wide | fragmentation of a NAL unit into multiple RTP packets. The payload | |||
applicability in videoconferencing, Internet video streaming, and | format has wide applicability in videoconferencing, Internet video | |||
high-bitrate entertainment-quality video, among other applications. | streaming, and high-bitrate entertainment-quality video, among other | |||
applications. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 2 February 2023. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9328. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
1.1. Overview of the VVC Codec | 1.1. Overview of the VVC Codec | |||
1.1.1. Coding-Tool Features (informative) | 1.1.1. Coding-Tool Features (Informative) | |||
1.1.2. Systems and Transport Interfaces (informative) | 1.1.2. Systems and Transport Interfaces (Informative) | |||
1.1.3. High-Level Picture Partitioning (informative) | 1.1.3. High-Level Picture Partitioning (Informative) | |||
1.1.4. NAL Unit Header | 1.1.4. NAL Unit Header | |||
1.2. Overview of the Payload Format | 1.2. Overview of the Payload Format | |||
2. Conventions | 2. Conventions | |||
3. Definitions and Abbreviations | 3. Definitions and Abbreviations | |||
3.1. Definitions | 3.1. Definitions | |||
3.1.1. Definitions from the VVC Specification | 3.1.1. Definitions from the VVC Specification | |||
3.1.2. Definitions Specific to This Memo | 3.1.2. Definitions Specific to This Memo | |||
3.2. Abbreviations | 3.2. Abbreviations | |||
4. RTP Payload Format | 4. RTP Payload Format | |||
4.1. RTP Header Usage | 4.1. RTP Header Usage | |||
4.2. Payload Header Usage | 4.2. Payload Header Usage | |||
4.3. Payload Structures | 4.3. Payload Structures | |||
4.3.1. Single NAL Unit Packets | 4.3.1. Single NAL Unit Packets | |||
4.3.2. Aggregation Packets (APs) | 4.3.2. Aggregation Packets (APs) | |||
4.3.3. Fragmentation Units | 4.3.3. Fragmentation Units | |||
4.4. Decoding Order Number | 4.4. Decoding Order Number | |||
5. Packetization Rules | 5. Packetization Rules | |||
6. De-packetization Process | 6. De-packetization Process | |||
7. Payload Format Parameters | 7. Payload Format Parameters | |||
7.1. Media Type Registration | 7.1. Media Type Registration | |||
7.2. Optional Parameters Definition | 7.2. Optional Parameters Definition | |||
7.3. SDP Parameters | 7.3. SDP Parameters | |||
7.3.1. Mapping of Payload Type Parameters to SDP | 7.3.1. Mapping of Payload Type Parameters to SDP | |||
7.3.2. Usage with SDP Offer/Answer Model | 7.3.2. Usage with SDP Offer/Answer Model | |||
7.3.3. Multicast | 7.3.3. Multicast | |||
7.3.4. Usage in Declarative Session Descriptions | 7.3.4. Usage in Declarative Session Descriptions | |||
7.3.5. Considerations for Parameter Sets | 7.3.5. Considerations for Parameter Sets | |||
8. Use with Feedback Messages | ||||
8. Use with Feedback Messages | 8.1. Picture Loss Indication (PLI) | |||
8.1. Picture Loss Indication (PLI) | 8.2. Full Intra Request (FIR) | |||
8.2. Full Intra Request (FIR) | 9. Security Considerations | |||
9. Security Considerations | 10. Congestion Control | |||
10. Congestion Control | 11. IANA Considerations | |||
11. IANA Considerations | 12. References | |||
12. Acknowledgements | 12.1. Normative References | |||
13. References | 12.2. Informative References | |||
13.1. Normative References | Acknowledgements | |||
13.2. Informative References | Authors' Addresses | |||
Appendix A. Change History | ||||
Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
The Versatile Video Coding specification was formally published as | The Versatile Video Coding specification was formally published as | |||
both ITU-T Recommendation H.266 [VVC] and ISO/IEC International | both ITU-T Recommendation H.266 [VVC] and ISO/IEC International | |||
Standard 23090-3 [ISO23090-3]. VVC is reported to provide | Standard 23090-3 [ISO23090-3]. VVC is reported to provide | |||
significant coding efficiency gains over High Efficiency Video Coding | significant coding efficiency gains over High Efficiency Video Coding | |||
[HEVC], also known as H.265, and other earlier video codecs. | [HEVC], also known as H.265, and other earlier video codecs. | |||
This memo specifies an RTP payload format for VVC. It shares its | This memo specifies an RTP payload format for VVC. It shares its | |||
basic design with the NAL (Network Abstraction Layer) unit based RTP | basic design with the NAL-unit-based RTP payload formats of Advanced | |||
payload formats of AVC Video Coding [RFC6184], Scalable Video Coding | Video Coding (AVC) [RFC6184], Scalable Video Coding (SVC) [RFC6190], | |||
(SVC) [RFC6190], High Efficiency Video Coding (HEVC) [RFC7798] and | and High Efficiency Video Coding (HEVC) [RFC7798], as well as their | |||
their respective predecessors. With respect to design philosophy, | respective predecessors. With respect to design philosophy, | |||
security, congestion control, and overall implementation complexity, | security, congestion control, and overall implementation complexity, | |||
it has similar properties to those earlier payload format | it has similar properties to those earlier payload format | |||
specifications. This is a conscious choice, as at least RFC 6184 is | specifications. This is a conscious choice, as at least [RFC6184] is | |||
widely deployed and generally known in the relevant implementer | widely deployed and generally known in the relevant implementer | |||
communities. Certain scalability-related mechanisms known from | communities. Certain scalability-related mechanisms known from | |||
[RFC6190] were incorporated into this document, as VVC version 1 | [RFC6190] were incorporated into this document, as VVC version 1 | |||
supports temporal, spatial, and signal-to-noise ratio (SNR) | supports temporal, spatial, and signal-to-noise ratio (SNR) | |||
scalability. | scalability. | |||
1.1. Overview of the VVC Codec | 1.1. Overview of the VVC Codec | |||
VVC and HEVC share a similar hybrid video codec design. In this | VVC and HEVC share a similar hybrid video codec design. In this | |||
memo, we provide a very brief overview of those features of VVC that | memo, we provide a very brief overview of those features of VVC that | |||
are, in some form, addressed by the payload format specified herein. | are, in some form, addressed by the payload format specified herein. | |||
Implementers have to read, understand, and apply the ITU-T/ISO/IEC | Implementers have to read, understand, and apply the ITU-T/ISO/IEC | |||
specifications pertaining to VVC to arrive at interoperable, well- | specifications pertaining to VVC to arrive at interoperable, well- | |||
performing implementations. | performing implementations. | |||
Conceptually, both VVC and HEVC include a Video Coding Layer (VCL), | Conceptually, both VVC and HEVC include a Video Coding Layer (VCL), | |||
which is often used to refer to the coding-tool features, and a NAL, | which is often used to refer to the coding-tool features, and a NAL, | |||
which is often used to refer to the systems and transport interface | which is often used to refer to the systems and transport interface | |||
aspects of the codecs. | aspects of the codecs. | |||
1.1.1. Coding-Tool Features (informative) | 1.1.1. Coding-Tool Features (Informative) | |||
Coding tool features are described below with occasional reference to | Coding-tool features are described below with occasional reference to | |||
the coding tool set of HEVC, which is well known in the community. | the coding-tool set of HEVC, which is well known in the community. | |||
Similar to earlier hybrid-video-coding-based standards, including | Similar to earlier hybrid-video-coding-based standards, including | |||
HEVC, the following basic video coding design is employed by VVC. A | HEVC, the following basic video coding design is employed by VVC. A | |||
prediction signal is first formed by either intra- or motion- | prediction signal is first formed by either intra- or motion- | |||
compensated prediction, and the residual (the difference between the | compensated prediction, and the residual (the difference between the | |||
original and the prediction) is then coded. The gains in coding | original and the prediction) is then coded. The gains in coding | |||
efficiency are achieved by redesigning and improving almost all parts | efficiency are achieved by redesigning and improving almost all parts | |||
of the codec over earlier designs. In addition, VVC includes several | of the codec over earlier designs. In addition, VVC includes several | |||
tools to make the implementation on parallel architectures easier. | tools to make the implementation on parallel architectures easier. | |||
Finally, VVC includes temporal, spatial, and SNR scalability as well | Finally, VVC includes temporal, spatial, and SNR scalability, as well | |||
as multiview coding support. | as multiview coding support. | |||
Coding blocks and transform structure | Coding blocks and transform structure | |||
Among major coding-tool differences between HEVC and VVC, one of | ||||
Among major coding-tool differences between HEVC and VVC, one of the | the important improvements is the more flexible coding tree | |||
important improvements is the more flexible coding tree structure in | structure in VVC, i.e., multi-type tree. In addition to quadtree, | |||
VVC, i.e., multi-type tree. In addition to quadtree, binary and | binary and ternary trees are also supported, which contributes | |||
ternary trees are also supported, which contributes significant | significant improvement in coding efficiency. Moreover, the | |||
improvement in coding efficiency. Moreover, the maximum size of a | maximum size of a coding tree unit (CTU) is increased from 64x64 | |||
coding tree unit (CTU) is increased from 64x64 to 128x128. To | to 128x128. To improve the coding efficiency of chroma signal, | |||
improve the coding efficiency of chroma signal, luma chroma separated | luma-chroma-separated trees at CTU level may be employed for intra | |||
trees at CTU level may be employed for intra-slices. The square | slices. The square transforms in HEVC are extended to non-square | |||
transforms in HEVC are extended to non-square transforms for | transforms for rectangular blocks resulting from binary and | |||
rectangular blocks resulting from binary and ternary tree splits. | ternary tree splits. Besides, VVC supports multiple transform | |||
Besides, VVC supports multiple transform sets (MTS), including DCT-2, | sets (MTSs), including DCT-2, DST-7, and DCT-8, as well as the | |||
DST-7, and DCT-8 as well as the non-separable secondary transform. | non-separable secondary transform. The transforms used in VVC can | |||
The transforms used in VVC can have different sizes with support for | have different sizes with support for larger transform sizes. For | |||
larger transform sizes. For DCT-2, the transform sizes range from | DCT-2, the transform sizes range from 2x2 to 64x64, and for DST-7 | |||
2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from | and DCT-8, the transform sizes range from 4x4 to 32x32. In | |||
4x4 to 32x32. In addition, VVC also supports sub-block transform for | addition, VVC also supports sub-block transform for both intra- and | |||
both intra and inter coded blocks. For intra coded blocks, intra | inter-coded blocks. For intra-coded blocks, intra sub- | |||
sub-partitioning (ISP) may be used to allow sub-block based intra | partitioning (ISP) may be used to allow sub-block-based intra | |||
prediction and transform. For inter blocks, sub-block transform may | prediction and transform. For inter blocks, sub-block transform | |||
be used assuming that only a part of an inter-block has non-zero | may be used assuming that only a part of an inter block has non- | |||
transform coefficients. | zero transform coefficients. | |||
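
The multi-type tree and the larger CTU size described above can be illustrated with a small sketch. It is not taken from the VVC specification: the mode labels ("QT", "BT_VER", and so on) and the function names are hypothetical, and the sketch only computes block geometry, assuming that a ternary split divides a block in a 1:2:1 ratio. It does not model any of the split restrictions that the specification imposes.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (width, height) in luma samples


def split_block(width: int, height: int, mode: str) -> List[Block]:
    """Return the child block sizes produced by one split.

    The mode labels are hypothetical names used only in this sketch:
      "QT"               quadtree split into four quadrants
      "BT_VER"/"BT_HOR"  binary split into two halves
      "TT_VER"/"TT_HOR"  ternary split into 1/4, 1/2, 1/4 parts (assumed)
    """
    if mode == "QT":
        return [(width // 2, height // 2)] * 4
    if mode == "BT_VER":
        return [(width // 2, height)] * 2
    if mode == "BT_HOR":
        return [(width, height // 2)] * 2
    if mode == "TT_VER":
        quarter = width // 4
        return [(quarter, height), (2 * quarter, height), (quarter, height)]
    if mode == "TT_HOR":
        quarter = height // 4
        return [(width, quarter), (width, 2 * quarter), (width, quarter)]
    raise ValueError(f"unknown split mode: {mode}")


def ctu_count(pic_width: int, pic_height: int, ctu_size: int) -> int:
    """Number of CTUs needed to cover a picture (partial CTUs count as one)."""
    cols = -(-pic_width // ctu_size)   # ceiling division
    rows = -(-pic_height // ctu_size)
    return cols * rows
```

For example, split_block(64, 64, "TT_VER") yields the rectangular blocks 16x64, 32x64, and 16x64, which is the kind of non-square block the extended transforms have to handle. ctu_count(1920, 1080, 64) and ctu_count(1920, 1080, 128) show how the larger maximum CTU size reduces the number of CTUs per picture.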
Entropy coding | Entropy coding | |||
Similar to HEVC, VVC uses a single entropy-coding engine, which is | ||||
Similar to HEVC, VVC uses a single entropy-coding engine, which is | based on context adaptive binary arithmetic coding [CABAC] but | |||
based on context adaptive binary arithmetic coding [CABAC], but with | with the support of multi-window sizes. The window sizes can be | |||
the support of multi-window sizes. The window sizes can be | initialized differently for different context models. Due to such | |||
initialized differently for different context models. Due to such a | a design, it has more efficient adaptation speed and better coding | |||
design, it has more efficient adaptation speed and better coding | efficiency. A joint chroma residual coding scheme is applied to | |||
efficiency. A joint chroma residual coding scheme is applied to | further exploit the correlation between the residuals of two color | |||
further exploit the correlation between the residuals of two color | components. In VVC, different residual coding schemes are applied | |||
components. In VVC, different residual coding schemes are applied | for regular transform coefficients and residual samples generated | |||
for regular transform coefficients and residual samples generated | using transform-skip mode. | |||
using transform-skip mode. | ||||
In-loop filtering | In-loop filtering | |||
VVC has more feature support in loop filters than HEVC. The | ||||
VVC has more feature support in loop filters than HEVC. The | deblocking filter in VVC is similar to HEVC but operates at a | |||
deblocking filter in VVC is similar to HEVC but operates at a smaller | smaller grid. After deblocking and sample adaptive offset (SAO), | |||
grid. After deblocking and sample adaptive offset (SAO), an adaptive | an adaptive loop filter (ALF) may be used. As a Wiener filter, | |||
loop filter (ALF) may be used. As a Wiener filter, ALF reduces | ALF reduces distortion of decoded pictures. Besides, VVC | |||
distortion of decoded pictures. Besides, VVC introduces a new module | introduces a new module called luma mapping with chroma scaling to | |||
called luma mapping with chroma scaling to fully utilize the dynamic | fully utilize the dynamic range of signal so that rate-distortion | |||
range of signal so that rate-distortion performance of both Standard | performance of both Standard Dynamic Range (SDR) and High Dynamic | |||
Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved. | Range (HDR) content is improved. | |||
Motion prediction and coding | Motion prediction and coding | |||
Compared to HEVC, VVC introduces several improvements in this | ||||
area. First, there is the adaptive motion vector resolution | ||||
(AMVR), which can save bit cost for motion vectors by adaptively | ||||
signaling motion vector resolution. Then, the affine motion | ||||
compensation is included to capture complicated motion-like | ||||
zooming and rotation. Meanwhile, prediction refinement with the | ||||
optical flow (PROF) with affine mode is further deployed to mimic | ||||
affine motion at the pixel level. Thirdly, the decoder-side | ||||
motion vector refinement (DMVR) is a method to derive the motion | ||||
vector at the decoder side based on block matching so that fewer | ||||
bits may be spent on motion vectors. Bidirectional optical flow | ||||
(BDOF) is a similar method to PROF. BDOF adds a sample-wise | ||||
offset at the 4x4 sub-block level that is derived with equations | ||||
based on gradients of the prediction samples and a motion | ||||
difference relative to coding-unit (CU) motion vectors. | ||||
Furthermore, merge with motion vector difference (MMVD) is a | ||||
special mode that further signals a limited set of motion vector | ||||
differences on top of merge mode. In addition to MMVD, there are | ||||
another three types of special merge modes, i.e., sub-block merge, | ||||
triangle, and combined intra/inter prediction (CIIP). The sub- | ||||
block merge list includes one candidate of sub-block temporal | ||||
motion vector prediction (SbTMVP) and up to four candidates of | ||||
affine motion vectors. Triangle is based on triangular block | ||||
motion compensation. CIIP combines intra and inter predictions | ||||
with weighting. Adaptive weighting may be employed with a block- | ||||
level tool called bi-prediction with CU-based weighting (BCW), | ||||
which provides more flexibility than in HEVC. | ||||
Compared to HEVC, VVC introduces several improvements in this area. | Intra prediction and intra coding | |||
First, there is the adaptive motion vector resolution (AMVR), which | To capture the diversified local image texture directions with | |||
can save bit cost for motion vectors by adaptively signaling motion | finer granularity, VVC supports 65 angular directions instead of | |||
vector resolution. Then the affine motion compensation is included | 33 directions in HEVC. The intra mode coding is based on a 6- | |||
to capture complicated motion like zooming and rotation. Meanwhile, | most-probable-modes scheme, and the 6 most probable modes are | |||
prediction refinement with the optical flow with affine mode (PROF) | derived using the neighboring intra prediction directions. In | |||
is further deployed to mimic affine motion at the pixel level. | addition, to deal with the different distributions of intra | |||
Thirdly the decoder side motion vector refinement (DMVR) is a method | prediction angles for different block aspect ratios, a wide-angle- | |||
to derive MV vector at decoder side based on block matching so that | intra-prediction (WAIP) scheme is applied in VVC by including | |||
fewer bits may be spent on motion vectors. Bi-directional optical | intra prediction angles beyond those present in HEVC. Unlike | |||
flow (BDOF) is a similar method to PROF. BDOF adds a sample wise | HEVC, which only allows using the most adjacent line of reference | |||
offset at 4x4 sub-block level that is derived with equations based on | samples for intra prediction, VVC also allows using two further | |||
gradients of the prediction samples and a motion difference relative | reference lines, known as multi-reference-line (MRL) intra | |||
to CU motion vectors. Furthermore, merge with motion vector | prediction. The additional reference lines can be only used for | |||
difference (MMVD) is a special mode, which further signals a limited | the 6 most probable intra prediction modes. To capture the strong | |||
set of motion vector differences on top of merge mode. In addition | correlation between different color components, in VVC, a cross- | |||
to MMVD, there are another three types of special merge modes, i.e., | component linear mode (CCLM) is utilized, which assumes a linear | |||
sub-block merge, triangle, and combined intra-/inter-prediction | relationship between the luma sample values and their associated | |||
(CIIP). Sub-block merge list includes one candidate of sub-block | chroma samples. For intra prediction, VVC also applies a | |||
temporal motion vector prediction (SbTMVP) and up to four candidates | position-dependent prediction combination (PDPC) for refining the | |||
of affine motion vectors. Triangle is based on triangular block | prediction samples closer to the intra prediction block boundary. | |||
motion compensation. CIIP combines intra- and inter- predictions | Matrix-based intra prediction (MIP) modes are also used in VVC, | |||
with weighting. Adaptive weighting may be employed with a block- | which generates an up to 8x8 intra prediction block using a | |||
level tool called bi-prediction with CU based weighting (BCW) which | weighted sum of downsampled neighboring reference samples, and the | |||
provides more flexibility than in HEVC. | weights are hard-coded constants. | |||
Intra prediction and intra-coding | ||||
To capture the diversified local image texture directions with finer | ||||
granularity, VVC supports 65 angular directions instead of 33 | ||||
directions in HEVC. The intra mode coding is based on a 6-most- | ||||
probable-mode scheme, and the 6 most probable modes are derived using | ||||
the neighboring intra prediction directions. In addition, to deal | ||||
with the different distributions of intra prediction angles for | ||||
different block aspect ratios, a wide-angle intra prediction (WAIP) | ||||
scheme is applied in VVC by including intra prediction angles beyond | ||||
those present in HEVC. Unlike HEVC which only allows using the most | ||||
adjacent line of reference samples for intra prediction, VVC also | ||||
allows using two further reference lines, as known as multi- | ||||
reference-line (MRL) intra prediction. The additional reference | ||||
lines can be only used for the 6 most probable intra prediction | ||||
modes. To capture the strong correlation between different colour | ||||
components, in VVC, a cross-component linear mode (CCLM) is utilized | ||||
which assumes a linear relationship between the luma sample values | ||||
and their associated chroma samples. For intra prediction, VVC also | ||||
applies a position-dependent prediction combination (PDPC) for | ||||
refining the prediction samples closer to the intra prediction block | ||||
boundary. Matrix-based intra prediction (MIP) modes are also used in | ||||
VVC which generates an up to 8x8 intra prediction block using a | ||||
weighted sum of downsampled neighboring reference samples, and the | ||||
weights are hardcoded constants. | ||||
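
The linear relationship assumed by CCLM can be sketched as pred_C = alpha * rec_L + beta, where rec_L is a reconstructed luma sample and pred_C is the predicted chroma sample. The sketch below is a simplification and not the derivation used in the VVC specification: it fits alpha and beta in floating point from the neighboring sample pairs with the smallest and largest luma value, and it assumes that luma has already been resampled to the chroma grid. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear_model(luma_nb: Sequence[int],
                     chroma_nb: Sequence[int]) -> Tuple[float, float]:
    """Fit (alpha, beta) from neighboring luma/chroma sample pairs.

    Simplified two-point fit through the pairs with minimum and maximum
    luma value; this is an assumption of the sketch, not the normative
    derivation.
    """
    if not luma_nb or len(luma_nb) != len(chroma_nb):
        raise ValueError("need equally sized, non-empty neighbor lists")
    i_min = min(range(len(luma_nb)), key=luma_nb.__getitem__)
    i_max = max(range(len(luma_nb)), key=luma_nb.__getitem__)
    luma_diff = luma_nb[i_max] - luma_nb[i_min]
    if luma_diff == 0:
        # Flat luma neighborhood: fall back to a constant chroma prediction.
        return 0.0, float(chroma_nb[i_min])
    alpha = (chroma_nb[i_max] - chroma_nb[i_min]) / luma_diff
    beta = chroma_nb[i_min] - alpha * luma_nb[i_min]
    return alpha, beta


def predict_chroma(luma_block: Sequence[Sequence[int]], alpha: float,
                   beta: float, bit_depth: int = 8) -> List[List[int]]:
    """Apply pred_C = alpha * rec_L + beta and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(int(round(alpha * luma + beta)), 0), max_val) for luma in row]
        for row in luma_block
    ]
```

With neighbors luma_nb = [100, 120, 140, 160] and chroma_nb = [60, 70, 80, 90], the fit gives alpha = 0.5 and beta = 10, so a luma sample of 200 predicts a chroma sample of 110.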
Other coding-tool features | Other coding-tool features | |||
VVC introduces dependent quantization (DQ) to reduce quantization | ||||
error by state-based switching between two quantizers. | ||||
VVC introduces dependent quantization (DQ) to reduce quantization | 1.1.2. Systems and Transport Interfaces (Informative) | |||
error by state-based switching between two quantizers. | ||||
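
The state-based switching between two quantizers in DQ can be sketched on the decoder side as follows. The memo itself gives no details, so everything specific here is an assumption to be checked against [VVC]: a four-state machine driven by the parity of each level, a first quantizer (Q0) whose reconstruction levels are even multiples of the step size, and a second quantizer (Q1) whose levels are odd multiples plus zero. The names STATE_TRANSITION and dequantize are hypothetical, and the floating-point step size stands in for the integer scaling of a real decoder.

```python
from typing import List, Sequence

# Assumed transition table: next_state = STATE_TRANSITION[state][parity],
# where parity is the least significant bit of the current level.
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))


def dequantize(levels: Sequence[int], step: float) -> List[float]:
    """Reconstruct coefficients from levels given in coding order.

    States 0 and 1 select quantizer Q0 (even multiples of step);
    states 2 and 3 select quantizer Q1 (odd multiples of step, and zero).
    """
    state = 0
    out: List[float] = []
    for level in levels:
        sign = (level > 0) - (level < 0)
        if state < 2:
            multiple = 2 * level          # Q0
        else:
            multiple = 2 * level - sign   # Q1
        out.append(multiple * step)
        state = STATE_TRANSITION[state][level & 1]
    return out
```

Because the quantizer used for a coefficient depends on the parities of the preceding levels, the same level value can reconstruct to different amplitudes at different positions. This is why the levels have to be processed strictly in coding order.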
1.1.2. Systems and Transport Interfaces (informative) | ||||
VVC inherits the basic systems and transport interfaces designs from | VVC inherits the basic systems and transport interface designs from | |||
HEVC and AVC. These include the NAL-unit-based syntax structure, the | HEVC and AVC. These include the NAL-unit-based syntax structure, the | |||
hierarchical syntax and data unit structure, the supplemental | hierarchical syntax and data unit structure, the supplemental | |||
enhancement information (SEI) message mechanism, and the video | enhancement information (SEI) message mechanism, and the video | |||
buffering model based on the hypothetical reference decoder (HRD). | buffering model based on the hypothetical reference decoder (HRD). | |||
The scalability features of VVC are conceptually similar to the | The scalability features of VVC are conceptually similar to the | |||
scalable variant of HEVC known as SHVC. The hierarchical syntax and | scalable extension of HEVC, known as SHVC. The hierarchical syntax | |||
data unit structure consists of parameter sets at various levels | and data unit structure consists of parameter sets at various levels | |||
(decoder, sequence (pertaining to all), sequence (pertaining to a | (i.e., decoder, sequence (pertaining to all), sequence (pertaining to | |||
single), picture), picture-level header parameters, slice-level | a single), and picture), picture-level header parameters, slice-level | |||
header parameters, and lower-level parameters. | header parameters, and lower-level parameters. | |||
A number of key components that influenced the network abstraction | A number of key components that influenced the network abstraction | |||
layer design of VVC as well as this memo are described below. | layer design of VVC, as well as this memo, are described below. | |||
Decoding capability information | Decoding capability information | |||
The decoding capability information includes parameters that stay | The decoding capability information (DCI) includes parameters that | |||
constant for the lifetime of a VVC bitstream in the duration of a | stay constant for the lifetime of a VVC bitstream in the duration | |||
video conference, continuous video stream, and similar--any video | of a video conference, continuous video stream, and similar, i.e., | |||
that is processed by a decoder between setup and teardown. For | any video that is processed by a decoder between setup and | |||
streaming, the requirement of constant parameters pertains through | teardown. For streaming, the requirement of constant parameters | |||
splicing. Such information includes profile, level, and sub-profile | pertains through splicing. Such information includes profile, | |||
information to determine a maximum capability interop point that is | level, and sub-profile information to determine a maximum | |||
guaranteed to be never exceeded, even if splicing of video sequences | capability interop point that is guaranteed to never be exceeded, | |||
occurs within a session. It further includes constraint fields (most | even if splicing of video sequences occurs within a session. It | |||
of which are flags), which can optionally be set to indicate that the | further includes constraint fields (most of which are flags), | |||
video bitstream will be constrained in the use of certain features as | which can optionally be set to indicate that the video bitstream | |||
indicated by the values of those fields. With this, a bitstream can | will be constrained in the use of certain features, as indicated | |||
be labeled as not using certain tools, which allows among other | by the values of those fields. With this, a bitstream can be | |||
things for resource allocation in a decoder implementation. | labeled as not using certain tools, which allows, among other | |||
things, for resource allocation in a decoder implementation. | ||||
Video parameter set | Video parameter set | |||
The video parameter set (VPS) pertains to one or more coded video | ||||
The video parameter set (VPS) pertains to one or more coded video | sequences (CVSs) of multiple layers covering the same range of | |||
sequences (CVSs) of multiple layers covering the same range of access | access units and includes, among other information, decoding | |||
units, and includes, among other information, decoding dependency | dependency expressed as information for reference-picture-list | |||
expressed as information for reference picture list construction of | construction of enhancement layers. The VPS provides a "big | |||
enhancement layers. The VPS provides a "big picture" of a scalable | picture" of a scalable sequence, including what types of operation | |||
sequence, including what types of operation points are provided, the | points are provided; the profile, tier, and level of the operation | |||
profile, tier, and level of the operation points, and some other | points; and some other high-level properties of the bitstream that | |||
high-level properties of the bitstream that can be used as the basis | can be used as the basis for session negotiation and content | |||
for session negotiation and content selection, etc. One VPS may be | selection, etc. One VPS may be referenced by one or more sequence | |||
referenced by one or more sequence parameter sets. | parameter sets. | |||
Sequence parameter set | Sequence parameter set | |||
The sequence parameter set (SPS) contains syntax elements | ||||
The sequence parameter set (SPS) contains syntax elements pertaining | pertaining to a coded layer video sequence (CLVS), which is a | |||
to a coded layer video sequence (CLVS), which is a group of pictures | group of pictures belonging to the same layer, starting with a | |||
belonging to the same layer, starting with a random access point, and | random access point, and followed by pictures that may depend on | |||
followed by pictures that may depend on each other, until the next | each other until the next random access point picture. In MPEG-2, | |||
random access point picture. In MPEG-2, the equivalent of a CVS was | the equivalent of a CVS was a group of pictures (GOP), which | |||
a group of pictures (GOP), which normally started with an I frame and | normally started with an I frame and was followed by P and B | |||
was followed by P and B frames. While more complex in its options of | frames. While more complex in its options of random access | |||
random access points, VVC retains this basic concept. One remarkable | points, VVC retains this basic concept. One remarkable difference | |||
difference of VVC is that a CLVS may start with a Gradual Decoding | of VVC is that a CLVS may start with a Gradual Decoding Refresh | |||
Refresh (GDR) picture, without requiring presence of traditional | (GDR) picture without requiring presence of traditional random | |||
random access points in the bitstream, such as instantaneous decoding | access points in the bitstream, such as instantaneous decoding | |||
refresh (IDR) or clean random access (CRA) pictures. In many TV-like | refresh (IDR) or clean random access (CRA) pictures. In many TV- | |||
applications, a CVS contains a few hundred milliseconds to a few | like applications, a CVS contains a few hundred milliseconds to a | |||
seconds of video. In video conferencing (without switching MCUs | few seconds of video. In video conferencing (without switching | |||
involved), a CVS can be as long in duration as the whole session. | Multipoint Control Units (MCUs) involved), a CVS can be as long in | |||
duration as the whole session. | ||||
Picture and adaptation parameter set | Picture and adaptation parameter set | |||
The picture parameter set and the adaptation parameter set (PPS and | The picture parameter set (PPS) and the adaptation parameter set | |||
APS, respectively) carry information pertaining to zero or more | (APS) carry information pertaining to zero or more pictures and | |||
pictures and zero or more slices, respectively. The PPS contains | zero or more slices, respectively. The PPS contains information | |||
information that is likely to stay constant from picture to picture, | that is likely to stay constant from picture to picture, at least | |||
at least for pictures for a certain type-whereas the APS contains | for pictures for a certain type, whereas the APS contains | |||
information, such as adaptive loop filter coefficients, that are | information, such as adaptive loop filter coefficients, that are | |||
likely to change from picture to picture or even within a picture. A | likely to change from picture to picture or even within a picture. | |||
single APS is referenced by all slices of the same picture if that | A single APS is referenced by all slices of the same picture if | |||
APS contains information about luma mapping with chroma scaling | that APS contains information about luma mapping with chroma | |||
(LMCS) or scaling list. Different APSs containing ALF parameters can | scaling (LMCS) or a scaling list. Different APSs containing ALF | |||
be referenced by slices of the same picture. | parameters can be referenced by slices of the same picture. | |||
Picture header | Picture header | |||
A picture header (PH) contains information that is common to all | ||||
A Picture Header contains information that is common to all slices | slices that belong to the same picture. Being able to send that | |||
that belong to the same picture. Being able to send that information | information as a separate NAL unit when pictures are split into | |||
as a separate NAL unit when pictures are split into several slices | several slices allows for saving bitrate, compared to repeating | |||
allows for saving bitrate, compared to repeating the same information | the same information in all slices. However, there might be | |||
in all slices. However, there might be scenarios where low-bitrate | scenarios where low-bitrate video is transmitted using a single | |||
video is transmitted using a single slice per picture. Having a | slice per picture. Having a separate NAL unit to convey that | |||
separate NAL unit to convey that information incurs an overhead | information incurs an overhead for such scenarios. For such | |||
for such scenarios. For such scenarios, the picture header syntax | scenarios, the picture header syntax structure is directly | |||
structure is directly included in the slice header, instead of its | included in the slice header, instead of its own NAL unit. The | |||
own NAL unit. The mode of the picture header syntax structure being | mode of the picture header syntax structure being included in its | |||
included in its own NAL unit or not can only be switched on/off for | own NAL unit or not can only be switched on/off for an entire CLVS | |||
an entire CLVS, and can only be switched off when in the entire CLVS | and can only be switched off when, in the entire CLVS, each | |||
each picture contains only one slice. | picture contains only one slice. | |||
Profile, tier, and level | Profile, tier, and level | |||
The profile, tier, and level syntax structures in DCI, VPS, and | ||||
The profile, tier and level syntax structures in DCI, VPS and SPS | SPS contain profile, tier, and level information for all layers | |||
contain profile, tier, level information for all layers that refer to | that refer to the DCI, for layers associated with one or more | |||
the DCI, for layers associated with one or more output layer sets | output layer sets specified by the VPS, and for any layer that | |||
specified by the VPS, and for any layer that refers to the SPS, | refers to the SPS, respectively. | |||
respectively. | ||||
Sub-profiles | Sub-profiles | |||
Within the VVC specification, a sub-profile is a 32-bit number, | ||||
Within the VVC specification, a sub-profile is a 32-bit number, coded | coded according to ITU-T Recommendation T.35, that does not carry | |||
according to ITU-T Rec. T.35, that does not carry a semantics. It is | semantics. It is carried in the profile_tier_level structure and | |||
carried in the profile_tier_level structure and hence (potentially) | hence is (potentially) present in the DCI, VPS, and SPS. External | |||
present in the DCI, VPS, and SPS. External registration bodies can | registration bodies can register a T.35 codepoint with ITU-T | |||
register a T.35 codepoint with ITU-T registration authorities and | registration authorities and associate with their registration a | |||
associate with their registration a description of bitstream | description of bitstream restrictions beyond the profiles defined | |||
restrictions beyond the profiles defined by ITU-T and ISO/IEC. This | by ITU-T and ISO/IEC. This would allow encoder manufacturers to | |||
would allow encoder manufacturers to label the bitstreams generated | label the bitstreams generated by their encoder as complying with | |||
by their encoder as complying with such a sub-profile. It is expected | such a sub-profile. It is expected that upstream standardization | |||
that upstream standardization organizations (such as: DVB and ATSC), | organizations (such as Digital Video Broadcasting (DVB) and | |||
as well as walled-garden video services will take advantage of this | Advanced Television Systems Committee (ATSC)), as well as walled- | |||
labeled system. In contrast to "normal" profiles, it is expected | garden video services, will take advantage of this labeled system. | |||
that sub-profiles may indicate encoder choices traditionally left | In contrast to "normal" profiles, it is expected that sub-profiles | |||
open in the (decoder-centric) video coding specs, such as GOP | may indicate encoder choices traditionally left open in the | |||
structures, minimum/maximum QP values, and the mandatory use of | (decoder-centric) video coding specifications, such as GOP | |||
certain tools or SEI messages. | structures, minimum/maximum Quantizer Parameter (QP) values, and | |||
the mandatory use of certain tools or SEI messages. | ||||
General constraint fields | General constraint fields | |||
The profile_tier_level structure carries a considerable number of | ||||
The profile_tier_level structure carries a considerable number of | constraint fields (most of which are flags), which an encoder can | |||
constraint fields (most of which are flags), which an encoder can use | use to indicate to a decoder that it will not use a certain tool | |||
to indicate to a decoder that it will not use a certain tool or | or technology. They were included in reaction to a perceived | |||
technology. They were included in reaction to a perceived market | market need to label a bitstream as not exercising a certain tool | |||
need for labeled a bitstream as not exercising a certain tool that | that has become commercially unviable. | |||
has become commercially unviable. | ||||
Temporal scalability support | Temporal scalability support | |||
VVC includes support of temporal scalability, by the inclusion of | ||||
VVC includes support of temporal scalability, by inclusion of the | the signaling of TemporalId in the NAL unit header, the | |||
signaling of TemporalId in the NAL unit header, the restriction that | restriction that pictures of a particular temporal sublayer cannot | |||
pictures of a particular temporal sublayer cannot be used for inter | be used for inter prediction reference by pictures of a lower | |||
prediction reference by pictures of a lower temporal sublayer, the | temporal sublayer, the sub-bitstream extraction process, and the | |||
sub-bitstream extraction process, and the requirement that each sub- | requirement that each sub-bitstream extraction output be a | |||
bitstream extraction output be a conforming bitstream. Media-Aware | conforming bitstream. Media-Aware Network Elements (MANEs) can | |||
Network Elements (MANEs) can utilize the TemporalId in the NAL unit | utilize the TemporalId in the NAL unit header for stream | |||
header for stream adaptation purposes based on temporal scalability. | adaptation purposes based on temporal scalability. | |||
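As an illustration of the MANE behavior described above, the following minimal sketch reads TemporalId from the NAL unit header and drops NAL units of higher temporal sublayers. It assumes the two-byte VVC NAL unit header layout (forbidden_zero_bit, nuh_reserved_zero_bit, 6-bit nuh_layer_id, 5-bit nal_unit_type, 3-bit nuh_temporal_id_plus1). The names NalHeader, parse_nal_header, and drop_higher_sublayers are hypothetical and not part of any specification. It is not a complete sub-bitstream extraction process.

```python
from typing import Iterable, List, NamedTuple


class NalHeader(NamedTuple):
    """Fields of the two-byte VVC NAL unit header (illustrative container)."""
    forbidden_zero_bit: int
    nuh_reserved_zero_bit: int
    nuh_layer_id: int
    nal_unit_type: int
    temporal_id: int  # TemporalId = nuh_temporal_id_plus1 - 1


def parse_nal_header(nal: bytes) -> NalHeader:
    """Parse the first two bytes of a NAL unit (hypothetical helper)."""
    if len(nal) < 2:
        raise ValueError("a VVC NAL unit header is two bytes long")
    b0, b1 = nal[0], nal[1]
    tid_plus1 = b1 & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(
        forbidden_zero_bit=b0 >> 7,
        nuh_reserved_zero_bit=(b0 >> 6) & 0x01,
        nuh_layer_id=b0 & 0x3F,
        nal_unit_type=b1 >> 3,
        temporal_id=tid_plus1 - 1,
    )


def drop_higher_sublayers(nal_units: Iterable[bytes],
                          max_temporal_id: int) -> List[bytes]:
    """Keep only NAL units whose TemporalId does not exceed the target.

    Assumption: simple forwarding filter, as a MANE might apply for
    temporal down-switching.
    """
    return [n for n in nal_units
            if parse_nal_header(n).temporal_id <= max_temporal_id]
```

Because pictures of a lower temporal sublayer never reference pictures of a higher one, discarding all NAL units above a chosen TemporalId leaves the remaining pictures decodable. A real MANE would also have to honor the other rules of the sub-bitstream extraction process.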
Reference picture resampling (RPR) | Reference picture resampling (RPR) | |||
In AVC and HEVC, the spatial resolution of pictures cannot change | ||||
In AVC and HEVC, the spatial resolution of pictures cannot change | unless a new sequence using a new SPS starts, with an intra random | |||
unless a new sequence using a new SPS starts, with an Intra random | access point (IRAP) picture. VVC enables picture resolution | |||
access point (IRAP) picture. VVC enables picture resolution change | change within a sequence at a position without encoding an IRAP | |||
within a sequence at a position without encoding an IRAP picture, | picture, which is always intra coded. This feature is sometimes | |||
which is always intra-coded. This feature is sometimes referred to | referred to as reference picture resampling (RPR), as the feature | |||
as reference picture resampling (RPR), as the feature needs | needs resampling of a reference picture used for inter prediction | |||
resampling of a reference picture used for inter prediction when that | when that reference picture has a different resolution than the | |||
reference picture has a different resolution than the current picture | current picture being decoded. RPR allows resolution change | |||
being decoded. RPR allows resolution change without the need of | without the need of coding an IRAP picture and hence avoids a | |||
coding an IRAP picture and hence avoids a momentary bit rate spike | momentary bit rate spike caused by an IRAP picture in streaming or | |||
caused by an IRAP picture in streaming or video conferencing | video conferencing scenarios, e.g., to cope with network condition | |||
scenarios, e.g., to cope with network condition changes. RPR can | changes. RPR can also be used in application scenarios wherein | |||
also be used in application scenarios wherein zooming of the entire | zooming of the entire video region or some region of interest is | |||
video region or some region of interest is needed. | needed. | |||
Spatial, SNR, and multiview scalability | Spatial, SNR, and multiview scalability | |||
VVC includes support for spatial, SNR, and multiview scalability. | VVC includes support for spatial, SNR, and multiview scalability. | |||
Scalable video coding is widely considered to have technical benefits | Scalable video coding is widely considered to have technical | |||
and enrich services for various video applications. Until recently, | benefits and enrich services for various video applications. | |||
however, the functionality has not been included in the first version | Until recently, however, the functionality has not been included | |||
of specifications of the video codecs. In VVC, however, all those | in the first version of specifications of the video codecs. In | |||
forms of scalability are supported in the first version of VVC | VVC, however, all those forms of scalability are supported in the | |||
natively through the signaling of the nuh_layer_id in the NAL unit | first version of VVC natively through the signaling of the | |||
header, the VPS which associates layers with given nuh_layer_id to | nuh_layer_id in the NAL unit header, the VPS that associates | |||
each other, reference picture selection, reference picture resampling | layers with the given nuh_layer_id to each other, reference | |||
for spatial scalability, and a number of other mechanisms not | picture selection, reference picture resampling for spatial | |||
relevant for this memo. | scalability, and a number of other mechanisms not relevant for | |||
this memo. | ||||
Spatial scalability | Spatial scalability | |||
With the existence of reference picture resampling (RPR), the | ||||
With the existence of Reference Picture Resampling (RPR), the | ||||
additional burden for scalability support is just a | additional burden for scalability support is just a | |||
modification of the high-level syntax (HLS). The inter-layer | modification of the high-level syntax (HLS). The inter-layer | |||
prediction is employed in a scalable system to improve the | prediction is employed in a scalable system to improve the | |||
coding efficiency of the enhancement layers. In addition to | coding efficiency of the enhancement layers. In addition to | |||
the spatial and temporal motion-compensated predictions that | the spatial and temporal motion-compensated predictions that | |||
are available in a single-layer codec, the inter-layer | are available in a single-layer codec, the inter-layer | |||
prediction in VVC uses the possibly resampled video data of the | prediction in VVC uses the possibly resampled video data of the | |||
reconstructed reference picture from a reference layer to | reconstructed reference picture from a reference layer to | |||
predict the current enhancement layer. The resampling process | predict the current enhancement layer. The resampling process | |||
for inter-layer prediction, when used, is performed at the | for inter-layer prediction, when used, is performed at the | |||
block-level, reusing the existing interpolation process for | block level, reusing the existing interpolation process for | |||
motion compensation in single-layer coding. It means that no | motion compensation in single-layer coding. It means that no | |||
additional resampling process is needed to support spatial | additional resampling process is needed to support spatial | |||
scalability. | scalability. | |||
SNR scalability | SNR scalability | |||
SNR scalability is similar to spatial scalability except that | ||||
SNR scalability is similar to spatial scalability except that | ||||
the resampling factors are 1:1. In other words, there is no | the resampling factors are 1:1. In other words, there is no | |||
change in resolution, but there is inter-layer prediction. | change in resolution, but there is inter-layer prediction. | |||
Multiview scalability | Multiview scalability | |||
The first version of VVC also supports multiview scalability, | ||||
The first version of VVC also supports multiview scalability, | ||||
wherein a multi-layer bitstream carries layers representing | wherein a multi-layer bitstream carries layers representing | |||
multiple views, and one or more of the represented views can be | multiple views, and one or more of the represented views can be | |||
output at the same time. | output at the same time. | |||
SEI messages | SEI messages | |||
Supplemental enhancement information (SEI) messages are | ||||
information in the bitstream that do not influence the decoding | ||||
process as specified in the VVC specification but address issues | ||||
of representation/rendering of the decoded bitstream, label the | ||||
bitstream for certain applications, and other, similar tasks. The | ||||
overall concept of SEI messages and many of the messages | ||||
themselves have been inherited from the AVC and HEVC | ||||
specifications. Except for the SEI messages that affect the | ||||
specification of the hypothetical reference decoder (HRD), other | ||||
SEI messages for use in the VVC environment, which are generally | ||||
useful also in other video coding technologies, are not included | ||||
in the main VVC specification but in a companion specification | ||||
[VSEI]. | ||||
Supplemental enhancement information (SEI) messages are information | 1.1.3. High-Level Picture Partitioning (Informative) | |||
in the bitstream that do not influence the decoding process as | ||||
specified in the VVC spec, but address issues of representation/ | ||||
rendering of the decoded bitstream, label the bitstream for certain | ||||
applications, among other, similar tasks. The overall concept of SEI | ||||
messages and many of the messages themselves have been inherited from | ||||
the AVC and HEVC specs. Except for the SEI messages that affect the | ||||
specification of the hypothetical reference decoder (HRD), other SEI | ||||
messages for use in the VVC environment, which are generally useful | ||||
also in other video coding technologies, are not included in the main | ||||
VVC specification but in a companion specification [VSEI]. | ||||
1.1.3. High-Level Picture Partitioning (informative) | ||||
VVC inherited the concept of tiles and wavefront parallel processing | VVC inherited the concept of tiles and wavefront parallel processing | |||
(WPP) from HEVC, with some minor to moderate differences. The basic | (WPP) from HEVC, with some minor to moderate differences. The basic | |||
concept of slices was kept in VVC but designed in an essentially | concept of slices was kept in VVC but designed in an essentially | |||
different form. VVC is the first video coding standard that includes | different form. VVC is the first video coding standard that includes | |||
subpictures as a feature, which provides the same functionality as | subpictures as a feature, which provides the same functionality as | |||
HEVC motion-constrained tile sets (MCTSs) but designed differently to | HEVC motion-constrained tile sets (MCTSs) but designed differently to | |||
have better coding efficiency and to be friendlier for usage in | have better coding efficiency and to be friendlier for usage in | |||
application systems. More details of these differences are described | application systems. More details of these differences are described | |||
below. | below. | |||
Tiles and WPP | Tiles and WPP | |||
Same as in HEVC, a picture can be split into tile rows and tile | ||||
Same as in HEVC, a picture can be split into tile rows and tile | columns in VVC, in-picture prediction across tile boundaries is | |||
columns in VVC, in-picture prediction across tile boundaries is | disallowed, etc. However, the syntax for signaling of tile | |||
disallowed, etc. However, the syntax for signaling of tile | partitioning has been simplified by using a unified syntax design | |||
partitioning has been simplified, by using a unified syntax design | for both the uniform and the non-uniform mode. In addition, | |||
for both the uniform and the non-uniform mode. In addition, | signaling of entry point offsets for tiles in the slice header is | |||
signaling of entry point offsets for tiles in the slice header is | optional in VVC, while it is mandatory in HEVC. The WPP design in | |||
optional in VVC while it is mandatory in HEVC. The WPP design in VVC | VVC has two differences compared to HEVC: i) the CTU row delay is | |||
has two differences compared to HEVC: i) The CTU row delay is reduced | reduced from two CTUs to one CTU, and ii) signaling of entry point | |||
from two CTUs to one CTU; ii) signaling of entry point offsets for | offsets for WPP in the slice header is optional in VVC while it is | |||
WPP in the slice header is optional in VVC while it is mandatory in | mandatory in HEVC. | |||
HEVC. | ||||
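The effect of the reduced CTU row delay can be sketched with a simplified timing model. The model assumes that every CTU takes one time step, that each CTU row is processed by its own thread, and that a row may begin once the row above is ctu_delay CTUs ahead. These assumptions, and the names wpp_start_step and wpp_total_steps, are illustrative only and are not taken from the VVC or HEVC specifications.

```python
def wpp_start_step(row: int, col: int, ctu_delay: int) -> int:
    """Earliest step at which CTU (row, col) can start in the model.

    Each row starts ctu_delay steps after the row above it, then
    advances one CTU per step.
    """
    return row * ctu_delay + col


def wpp_total_steps(rows: int, cols: int, ctu_delay: int) -> int:
    """Steps needed until the last CTU of the picture has finished."""
    if rows < 1 or cols < 1 or ctu_delay < 1:
        raise ValueError("rows, cols, and ctu_delay must be positive")
    return wpp_start_step(rows - 1, cols - 1, ctu_delay) + 1


# Idealized comparison for a picture of 4 CTU rows by 10 CTU columns.
hevc_steps = wpp_total_steps(4, 10, ctu_delay=2)  # two-CTU row delay
vvc_steps = wpp_total_steps(4, 10, ctu_delay=1)   # one-CTU row delay
```

In this model, the saving grows with the number of CTU rows, since each additional row starts one step earlier with a one-CTU delay than with a two-CTU delay.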
Slices | Slices | |||
In VVC, the conventional slices based on CTUs (as in HEVC) or | ||||
macroblocks (as in AVC) have been removed. The main reasoning | ||||
behind this architectural change is as follows. The advances in | ||||
video coding since 2003 (the publication year of AVC v1) have been | ||||
such that slice-based error concealment has become practically | ||||
impossible due to the ever-increasing number and efficiency of in- | ||||
picture and inter-picture prediction mechanisms. An error- | ||||
concealed picture is the decoding result of a transmitted coded | ||||
picture for which there is some data loss (e.g., loss of some | ||||
slices) of the coded picture or a reference picture, as at least | ||||
some part of the coded picture is not error-free (e.g., that | ||||
reference picture was an error-concealed picture). For example, | ||||
when one of the multiple slices of a picture is lost, it may be | ||||
error-concealed using an interpolation of the neighboring slices. | ||||
While advanced video coding prediction mechanisms provide | ||||
significantly higher coding efficiency, they also make it harder | ||||
for machines to estimate the quality of an error-concealed | ||||
picture, which was already a hard problem with the use of simpler | ||||
prediction mechanisms. Advanced in-picture prediction mechanisms | ||||
also cause the coding efficiency loss due to splitting a picture | ||||
into multiple slices to be more significant. Furthermore, network | ||||
conditions have become significantly better while, at the same time, | ||||
techniques for dealing with packet losses have become | ||||
significantly improved. As a result, very few implementations | ||||
have recently used slices for maximum-transmission-unit-size | ||||
matching. Instead, substantially all applications where low-delay | ||||
error resilience is required (e.g., video telephony and video | ||||
conferencing) rely on system/transport-level error resilience | ||||
(e.g., retransmission or forward error correction) and/or picture- | ||||
based error resilience tools (e.g., feedback-based error | ||||
resilience, insertion of IRAPs, scalability with a higher | ||||
protection level of the base layer, and so on). Considering all | ||||
the above, nowadays, it is very rare that a picture that cannot be | ||||
correctly decoded is passed to the decoder, and when such a rare | ||||
case occurs, the system can afford to wait for an error-free | ||||
picture to be decoded and available for display without resulting | ||||
in frequent and long periods of picture freezing seen by end | ||||
users. | ||||
In VVC, the conventional slices based on CTUs (as in HEVC) or | Slices in VVC have two modes: rectangular slices and raster-scan | |||
macroblocks (as in AVC) have been removed. The main reasoning behind | slices. The rectangular slice, as indicated by its name, covers a | |||
this architectural change is as follows. The advances in video | rectangular region of the picture. Typically, a rectangular slice | |||
coding since 2003 (the publication year of AVC v1) have been such | consists of several complete tiles. However, it is also possible | |||
that slice-based error concealment has become practically impossible, | that a rectangular slice is a subset of a tile and consists of one | |||
due to the ever-increasing number and efficiency of in-picture and | or more consecutive, complete CTU rows within a tile. A raster- | |||
inter-picture prediction mechanisms. An error-concealed picture is | scan slice consists of one or more complete tiles in a tile | |||
the decoding result of a transmitted coded picture for which there is | raster-scan order; hence, the region covered by raster-scan slices | |||
some data loss (e.g., loss of some slices) of the coded picture or a | need not but could have a non-rectangular shape, but it may also | |||
reference picture for at least some part of the coded picture is not | happen to have the shape of a rectangle. The concept of slices in | |||
error-free (e.g., that reference picture was an error-concealed | VVC is therefore strongly linked to or based on tiles instead of | |||
picture). For example, when one of the multiple slices of a picture | CTUs (as in HEVC) or macroblocks (as in AVC). | |||
is lost, it may be error-concealed using an interpolation of the | ||||
neighboring slices. While advanced video coding prediction | ||||
mechanisms provide significantly higher coding efficiency, they also | ||||
make it harder for machines to estimate the quality of an error- | ||||
concealed picture, which was already a hard problem with the use of | ||||
simpler prediction mechanisms. Advanced in-picture prediction | ||||
mechanisms also cause the coding efficiency loss due to splitting a | ||||
picture into multiple slices to be more significant. Furthermore, | ||||
network conditions have become significantly better while at the same time | ||||
techniques for dealing with packet losses have become significantly | ||||
improved. As a result, very few implementations have recently used | ||||
slices for maximum transmission unit size matching. Instead, | ||||
substantially all applications where low-delay error resilience is | ||||
required (e.g., video telephony and video conferencing) rely on | ||||
system/transport-level error resilience (e.g., retransmission, | ||||
forward error correction) and/or picture-based error resilience tools | ||||
(feedback-based error resilience, insertion of IRAPs, scalability | ||||
with higher protection level of the base layer, and so on). | ||||
Considering all the above, nowadays it is very rare that a picture | ||||
that cannot be correctly decoded is passed to the decoder, and when | ||||
such a rare case occurs, the system can afford to wait for an error- | ||||
free picture to be decoded and available for display without | ||||
resulting in frequent and long periods of picture freezing seen by | ||||
end users. | ||||
Slices in VVC have two modes: rectangular slices and raster-scan | ||||
slices. The rectangular slice, as indicated by its name, covers a | ||||
rectangular region of the picture. Typically, a rectangular slice | ||||
consists of several complete tiles. However, it is also possible | ||||
that a rectangular slice is a subset of a tile and consists of one or | ||||
more consecutive, complete CTU rows within a tile. A raster-scan | ||||
slice consists of one or more complete tiles in a tile raster scan | ||||
order, hence the region covered by a raster-scan slices need not but | ||||
could have a non-rectangular shape, but it may also happen to have | ||||
the shape of a rectangle. The concept of slices in VVC is therefore | ||||
strongly linked to or based on tiles instead of CTUs (as in HEVC) or | ||||
macroblocks (as in AVC). | ||||
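The observation that a raster-scan slice may or may not cover a rectangular region can be checked with a short sketch. It models a slice as a run of consecutive tiles in tile raster-scan order over a grid with tile_cols tile columns. The function names and the (first_tile, num_tiles) representation are hypothetical and do not correspond to VVC syntax elements.

```python
from typing import List, Tuple


def raster_scan_slice_tiles(first_tile: int, num_tiles: int,
                            tile_cols: int) -> List[Tuple[int, int]]:
    """Return (tile_row, tile_col) for each tile of a raster-scan slice."""
    if num_tiles < 1 or tile_cols < 1 or first_tile < 0:
        raise ValueError("invalid slice or tile grid description")
    return [divmod(i, tile_cols)
            for i in range(first_tile, first_tile + num_tiles)]


def covers_rectangle(first_tile: int, num_tiles: int,
                     tile_cols: int) -> bool:
    """True if the tiles of the slice exactly fill their bounding box.

    Tiles in one tile row share a height and tiles in one tile column
    share a width, so a rectangle in tile units is also a rectangle
    in samples.
    """
    tiles = raster_scan_slice_tiles(first_tile, num_tiles, tile_cols)
    rows = [r for r, _ in tiles]
    cols = [c for _, c in tiles]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return height * width == len(tiles)
```

For a grid with four tile columns, a slice of tiles 0 to 3 (one full tile row) is rectangular, whereas a slice of tiles 2 to 5 wraps around the end of the first tile row and is not.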
Subpictures | Subpictures | |||
VVC is the first video coding standard that includes the support | ||||
of subpictures as a feature. Each subpicture consists of one or | ||||
more complete rectangular slices that collectively cover a | ||||
rectangular region of the picture. A subpicture may be either | ||||
specified to be extractable (i.e., coded independently of other | ||||
subpictures of the same picture and of earlier pictures in | ||||
decoding order) or not extractable. Regardless of whether a | ||||
subpicture is extractable or not, the encoder can control whether | ||||
in-loop filtering (including deblocking, SAO, and ALF) is applied | ||||
across the subpicture boundaries individually for each subpicture. | ||||
VVC is the first video coding standard that includes the support of | Functionally, subpictures are similar to the motion-constrained | |||
subpictures as a feature. Each subpicture consists of one or more | tile sets (MCTSs) in HEVC. They both allow independent coding and | |||
complete rectangular slices that collectively cover a rectangular | extraction of a rectangular subset of a sequence of coded pictures | |||
region of the picture. A subpicture may be either specified to be | for use cases like viewport-dependent 360-degree video streaming | |||
extractable (i.e., coded independently of other subpictures of the | optimization and region of interest (ROI) applications. | |||
same picture and of earlier pictures in decoding order) or not | ||||
extractable. Regardless of whether a subpicture is extractable or | ||||
not, the encoder can control whether in-loop filtering (including | ||||
deblocking, SAO, and ALF) is applied across the subpicture boundaries | ||||
individually for each subpicture. | ||||
Functionally, subpictures are similar to the motion-constrained tile | ||||
sets (MCTSs) in HEVC. They both allow independent coding and | ||||
extraction of a rectangular subset of a sequence of coded pictures, | ||||
for use cases like viewport-dependent 360-degree video streaming | ||||
optimization and region of interest (ROI) applications. | ||||
There are several important design differences between subpictures | There are several important design differences between subpictures | |||
and MCTSs. First, the subpictures feature in VVC allows motion | and MCTSs. First, the subpictures featured in VVC allow motion | |||
vectors of a coding block pointing outside of the subpicture even | vectors of a coding block to point outside of the subpicture, even | |||
when the subpicture is extractable by applying sample padding at | when the subpicture is extractable by applying sample padding at | |||
subpicture boundaries in this case, similarly as at picture | the subpicture boundaries, in this case, similarly as at picture | |||
boundaries. Second, additional changes were introduced for the | boundaries. Second, additional changes were introduced for the | |||
selection and derivation of motion vectors in the merge mode and in | selection and derivation of motion vectors in the merge mode and | |||
the decoder side motion vector refinement process of VVC. This | in the decoder-side motion vector refinement process of VVC. This | |||
allows higher coding efficiency compared to the non-normative motion | allows higher coding efficiency compared to the non-normative | |||
constraints applied at the encoder-side for MCTSs. Third, rewriting | motion constraints applied at the encoder-side for MCTSs. Third, | |||
of SHs (and PH NAL units, when present) is not needed when extracting | rewriting of slice headers (SHs) (and PH NAL units, when present) | |||
one or more extractable subpictures from a sequence of pictures to | is not needed when extracting one or more extractable subpictures | |||
create a sub-bitstream that is a conforming bitstream. In sub- | from a sequence of pictures to create a sub-bitstream that is a | |||
bitstream extractions based on HEVC MCTSs, rewriting of SHs is | conforming bitstream. In sub-bitstream extractions based on HEVC | |||
needed. Note that in both HEVC MCTSs extraction and VVC subpictures | MCTSs, rewriting of SHs is needed. Note that, in both HEVC MCTSs | |||
extraction, rewriting of SPSs and PPSs is needed. However, typically | extraction and VVC subpictures extraction, rewriting of SPSs and | |||
there are only a few parameter sets in a bitstream, while each | PPSs is needed. However, typically, there are only a few | |||
picture has at least one slice, therefore rewriting of SHs can be a | parameter sets in a bitstream, whereas each picture has at least | |||
significant burden for application systems. Fourth, slices of | one slice; therefore, rewriting of SHs can be a significant burden | |||
different subpictures within a picture are allowed to have different | for application systems. Fourth, slices of different subpictures | |||
NAL unit types. Fifth, VVC specifies HRD and level definitions for | within a picture are allowed to have different NAL unit types. | |||
subpicture sequences, thus the conformance of the sub-bitstream of | Fifth, VVC specifies HRD and level definitions for subpicture | |||
each extractable subpicture sequence can be ensured by encoders. | sequences, thus the conformance of the sub-bitstream of each | |||
extractable subpicture sequence can be ensured by encoders. | ||||
1.1.4. NAL Unit Header | 1.1.4. NAL Unit Header | |||
VVC maintains the NAL unit concept of HEVC with modifications. VVC | VVC maintains the NAL unit concept of HEVC with modifications. VVC | |||
uses a two-byte NAL unit header, as shown in Figure 1. The payload | uses a two-byte NAL unit header, as shown in Figure 1. The payload | |||
of a NAL unit refers to the NAL unit excluding the NAL unit header. | of a NAL unit refers to the NAL unit excluding the NAL unit header. | |||
+---------------+---------------+ | +---------------+---------------+ | |||
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| | |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
|F|Z| LayerID | Type | TID | | |F|Z| LayerID | Type | TID | | |||
+---------------+---------------+ | +---------------+---------------+ | |||
The Structure of the VVC NAL Unit Header. | Figure 1: The Structure of the VVC NAL Unit Header | |||
Figure 1 | ||||
The semantics of the fields in the NAL unit header are as specified | The semantics of the fields in the NAL unit header are as specified | |||
in VVC and described briefly below for convenience. In addition to | in VVC and described briefly below for convenience. In addition to | |||
the name and size of each field, the corresponding syntax element | the name and size of each field, the corresponding syntax element | |||
name in VVC is also provided. | name in VVC is also provided. | |||
F: 1 bit | F: 1 bit | |||
forbidden_zero_bit. This field is required to be zero in VVC. | ||||
forbidden_zero_bit. Required to be zero in VVC. Note that the | Note that the inclusion of this bit in the NAL unit header was to | |||
inclusion of this bit in the NAL unit header was to enable | enable transport of VVC video over MPEG-2 transport systems | |||
transport of VVC video over MPEG-2 transport systems (avoidance of | (avoidance of start code emulations) [MPEG2S]. In the context of | |||
start code emulations) [MPEG2S]. In the context of this payload | this payload format, the value 1 may be used to indicate a syntax | |||
format, the value 1 may be used to indicate a syntax violation, | violation, e.g., for a NAL unit resulted from aggregating a number | |||
e.g., for a NAL unit resulted from aggregating a number of | of fragmented units of a NAL unit but missing the last fragment, | |||
fragmented units of a NAL unit but missing the last fragment, as | as described in the last sentence of Section 4.3.3. | |||
described in the last sentence of section 4.3.3. | ||||
Z: 1 bit | Z: 1 bit | |||
nuh_reserved_zero_bit. This field is required to be zero in VVC, | ||||
nuh_reserved_zero_bit. Required to be zero in VVC, and reserved | and reserved for future extensions by ITU-T and ISO/IEC. | |||
for future extensions by ITU-T and ISO/IEC. | This memo does not overload the "Z" bit for local extensions a) | |||
This memo does not overload the "Z" bit for local extensions, as | because overloading the "F" bit is sufficient and b) in order to | |||
a) overloading the "F" bit is sufficient and b) to preserve the | preserve the usefulness of this memo to possible future versions | |||
usefulness of this memo to possible future versions of [VVC]. | of [VVC]. | |||
LayerId: 6 bits | LayerId: 6 bits | |||
nuh_layer_id. This field identifies the layer a NAL unit belongs | ||||
nuh_layer_id. Identifies the layer a NAL unit belongs to, wherein | to, wherein a layer may be, e.g., a spatial scalable layer, a | |||
a layer may be, e.g., a spatial scalable layer, a quality scalable | quality scalable layer, a layer containing a different view, etc. | |||
layer, a layer containing a different view, etc. | ||||
Type: 5 bits | Type: 5 bits | |||
nal_unit_type. This field specifies the NAL unit type, as defined | ||||
nal_unit_type. This field specifies the NAL unit type as defined | ||||
in Table 5 of [VVC]. For a reference of all currently defined NAL | in Table 5 of [VVC]. For a reference of all currently defined NAL | |||
unit types and their semantics, please refer to Section 7.4.2.2 in | unit types and their semantics, please refer to Section 7.4.2.2 in | |||
[VVC]. | [VVC]. | |||
TID: 3 bits | TID: 3 bits | |||
nuh_temporal_id_plus1. This field specifies the temporal | nuh_temporal_id_plus1. This field specifies the temporal | |||
identifier of the NAL unit plus 1. The value of TemporalId is | identifier of the NAL unit plus 1. The value of TemporalId is | |||
equal to TID minus 1. A TID value of 0 is illegal to ensure that | equal to TID minus 1. A TID value of 0 is illegal to ensure that | |||
there is at least one bit in the NAL unit header equal to 1, so to | there is at least one bit in the NAL unit header equal to 1 in | |||
enable the consideration of start code emulations in the NAL unit | order to enable the consideration of start code emulations in the | |||
payload data independent of the NAL unit header. | NAL unit payload data independent of the NAL unit header. | |||
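The field layout in Figure 1 can be illustrated with a short, non-normative sketch. It assumes the two header bytes are available as a Python bytes object. The names VvcNalHeader and parse_nal_header are hypothetical and are not defined by this memo or by [VVC].

```python
from typing import NamedTuple


class VvcNalHeader(NamedTuple):
    """Hypothetical container for the five fields shown in Figure 1."""
    f: int         # forbidden_zero_bit, 1 bit
    z: int         # nuh_reserved_zero_bit, 1 bit
    layer_id: int  # nuh_layer_id, 6 bits
    nal_type: int  # nal_unit_type, 5 bits
    tid: int       # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1


def parse_nal_header(data: bytes) -> VvcNalHeader:
    """Split the first two bytes of a NAL unit into the Figure 1 fields."""
    if len(data) < 2:
        raise ValueError("a VVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    hdr = VvcNalHeader(
        f=(b0 >> 7) & 0x01,
        z=(b0 >> 6) & 0x01,
        layer_id=b0 & 0x3F,
        nal_type=(b1 >> 3) & 0x1F,
        tid=b1 & 0x07,
    )
    if hdr.tid == 0:
        # A TID value of 0 is illegal according to the text above.
        raise ValueError("TID value of 0 is not allowed")
    return hdr
```

In this sketch, a set F bit is reported to the caller rather than rejected, since this payload format may use the value 1 to signal a syntax violation.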
1.2. Overview of the Payload Format | 1.2. Overview of the Payload Format | |||
This payload format defines the following processes required for | This payload format defines the following processes required for | |||
transport of VVC coded data over RTP [RFC3550]: | transport of VVC coded data over RTP [RFC3550]: | |||
* Usage of RTP header with this payload format | * usage of the RTP header with this payload format | |||
* Packetization of VVC coded NAL units into RTP packets using three | * packetization of VVC coded NAL units into RTP packets using three | |||
types of payload structures: a single NAL unit packet, aggregation | types of payload structures: a single NAL unit packet, aggregation | |||
packet, and fragment unit | packet, and fragment unit | |||
* Transmission of VVC NAL units of the same bitstream within a | * transmission of VVC NAL units of the same bitstream within a | |||
single RTP stream | single RTP stream | |||
* Media type parameters to be used with the Session Description | * media type parameters to be used with the Session Description | |||
Protocol (SDP) [RFC8866] | Protocol (SDP) [RFC8866] | |||
* Usage of RTCP feedback messages | * usage of RTCP feedback messages | |||
2. Conventions | 2. Conventions | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
3. Definitions and Abbreviations | 3. Definitions and Abbreviations | |||
3.1. Definitions | 3.1. Definitions | |||
This document uses the terms and definitions of VVC. Section 3.1.1 | This document uses the terms and definitions of VVC. Section 3.1.1 | |||
lists relevant definitions from [VVC] for convenience. Section 3.1.2 | lists relevant definitions from [VVC] for convenience. Section 3.1.2 | |||
provides definitions specific to this memo. All the used terms and | provides definitions specific to this memo. All the used terms and | |||
definitions in this memo are verbatim copies of [VVC] specification. | definitions in this memo are verbatim copies from the [VVC] | |||
specification. | ||||
3.1.1. Definitions from the VVC Specification | 3.1.1. Definitions from the VVC Specification | |||
Access unit (AU): A set of PUs that belong to different layers and | Access unit (AU): | |||
contain coded pictures associated with the same time for output from | A set of PUs that belong to different layers and contain coded | |||
the DPB. | pictures associated with the same time for output from the DPB. | |||
Adaptation parameter set (APS): A syntax structure containing syntax | Adaptation parameter set (APS): | |||
elements that apply to zero or more slices as determined by zero or | A syntax structure containing syntax elements that apply to zero | |||
more syntax elements found in slice headers. | or more slices as determined by zero or more syntax elements found | |||
in slice headers. | ||||
Bitstream: A sequence of bits, in the form of a NAL unit stream or a | Bitstream: | |||
byte stream, that forms the representation of a sequence of AUs | A sequence of bits, in the form of a NAL unit stream or a byte | |||
forming one or more coded video sequences (CVSs). | stream, that forms the representation of a sequence of AUs forming | |||
one or more coded video sequences (CVSs). | ||||
Coded picture: A coded representation of a picture comprising VCL NAL | Coded picture: | |||
units with a particular value of nuh_layer_id within an AU and | A coded representation of a picture comprising VCL NAL units with | |||
containing all CTUs of the picture. | a particular value of nuh_layer_id within an AU and containing all | |||
CTUs of the picture. | ||||
Clean random access (CRA) PU: A PU in which the coded picture is a | Clean random access (CRA) PU: | |||
CRA picture. | A PU in which the coded picture is a CRA picture. | |||
Clean random access (CRA) picture: An IRAP picture for which each VCL | Clean random access (CRA) picture: | |||
NAL unit has nal_unit_type equal to CRA_NUT. | An IRAP picture for which each VCL NAL unit has nal_unit_type | |||
equal to CRA_NUT. | ||||
Coded video sequence (CVS): A sequence of AUs that consists, in | Coded video sequence (CVS): | |||
decoding order, of a CVSS AU, followed by zero or more AUs that are | A sequence of AUs that consists, in decoding order, of a CVSS AU, | |||
not CVSS AUs, including all subsequent AUs up to but not including | followed by zero or more AUs that are not CVSS AUs, including all | |||
any subsequent AU that is a CVSS AU. | subsequent AUs up to but not including any subsequent AU that is a | |||
CVSS AU. | ||||
Coded video sequence start (CVSS) AU: An AU in which there is a PU | Coded video sequence start (CVSS) AU: | |||
for each layer in the CVS and the coded picture in each PU is a CLVSS | An AU in which there is a PU for each layer in the CVS and the | |||
picture. | coded picture in each PU is a CLVSS picture. | |||
Coded layer video sequence (CLVS): A sequence of PUs with the same | Coded layer video sequence (CLVS): | |||
value of nuh_layer_id that consists, in decoding order, of a CLVSS | A sequence of PUs with the same value of nuh_layer_id that | |||
PU, followed by zero or more PUs that are not CLVSS PUs, including | consists, in decoding order, of a CLVSS PU, followed by zero or | |||
all subsequent PUs up to but not including any subsequent PU that is | more PUs that are not CLVSS PUs, including all subsequent PUs up | |||
a CLVSS PU. | to but not including any subsequent PU that is a CLVSS PU. | |||
Coded layer video sequence start (CLVSS) PU: A PU in which the coded | Coded layer video sequence start (CLVSS) PU: | |||
picture is a CLVSS picture. | A PU in which the coded picture is a CLVSS picture. | |||
Coded layer video sequence start (CLVSS) picture: A coded picture | Coded layer video sequence start (CLVSS) picture: | |||
that is an IRAP picture with NoOutputBeforeRecoveryFlag equal to 1 or | A coded picture that is an IRAP picture with | |||
a GDR picture with NoOutputBeforeRecoveryFlag equal to 1. | NoOutputBeforeRecoveryFlag equal to 1 or a GDR picture with | |||
NoOutputBeforeRecoveryFlag equal to 1. | ||||
Coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs | Coding Tree Block (CTB): | |||
of chroma samples of a picture that has three sample arrays, or a CTB | An NxN block of samples for some value of N such that the division | |||
of samples of a monochrome picture or a picture that is coded using | of a component into CTBs is a partitioning. | |||
three separate colour planes and syntax structures used to code the | ||||
samples. | ||||
Decoding Capability Information (DCI): A syntax structure containing | Coding tree unit (CTU): | |||
syntax elements that apply to the entire bitstream. | A CTB of luma samples, two corresponding CTBs of chroma samples of | |||
a picture that has three sample arrays, or a CTB of samples of a | ||||
monochrome picture or a picture that is coded using three separate | ||||
colour planes and syntax structures used to code the samples. | ||||
Decoded picture buffer (DPB): A buffer holding decoded pictures for | Coding Unit (CU): | |||
reference, output reordering, or output delay specified for the | A coding block of luma samples, two corresponding coding blocks of | |||
hypothetical reference decoder. | chroma samples of a picture that has three sample arrays in the | |||
single tree mode, or a coding block of luma samples of a picture | ||||
that has three sample arrays in the dual tree mode, or two coding | ||||
blocks of chroma samples of a picture that has three sample arrays | ||||
in the dual tree mode, or a coding block of samples of a | ||||
monochrome picture, and syntax structures used to code the | ||||
samples. | ||||
Gradual decoding refresh (GDR) picture: A picture for which each VCL | Decoding Capability Information (DCI): | |||
NAL unit has nal_unit_type equal to GDR_NUT. | A syntax structure containing syntax elements that apply to the | |||
entire bitstream. | ||||
Instantaneous decoding refresh (IDR) PU: A PU in which the coded | Decoded picture buffer (DPB): | |||
picture is an IDR picture. | A buffer holding decoded pictures for reference, output | |||
reordering, or output delay specified for the hypothetical | ||||
reference decoder. | ||||
Instantaneous decoding refresh (IDR) picture: An IRAP picture for | Gradual decoding refresh (GDR) picture: | |||
which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or | A picture for which each VCL NAL unit has nal_unit_type equal to | |||
IDR_N_LP. | GDR_NUT. | |||
Intra random access point (IRAP) AU: An AU in which there is a PU for | Instantaneous decoding refresh (IDR) PU: | |||
each layer in the CVS and the coded picture in each PU is an IRAP | A PU in which the coded picture is an IDR picture. | |||
picture. | ||||
Intra random access point (IRAP) PU: A PU in which the coded picture | Instantaneous decoding refresh (IDR) picture: | |||
is an IRAP picture. | An IRAP picture for which each VCL NAL unit has nal_unit_type | |||
equal to IDR_W_RADL or IDR_N_LP. | ||||
Intra random access point (IRAP) picture: A coded picture for which | Intra random access point (IRAP) AU: | |||
all VCL NAL units have the same value of nal_unit_type in the range | An AU in which there is a PU for each layer in the CVS and the | |||
of IDR_W_RADL to CRA_NUT, inclusive. | coded picture in each PU is an IRAP picture. | |||
Layer: A set of VCL NAL units that all have a particular value of | Intra random access point (IRAP) PU: | |||
nuh_layer_id and the associated non-VCL NAL units. | A PU in which the coded picture is an IRAP picture. | |||
Network abstraction layer (NAL) unit: A syntax structure containing | Intra random access point (IRAP) picture: | |||
an indication of the type of data to follow and bytes containing that | A coded picture for which all VCL NAL units have the same value of | |||
data in the form of an RBSP interspersed as necessary with emulation | nal_unit_type in the range of IDR_W_RADL to CRA_NUT, inclusive. | |||
prevention bytes. | ||||
Network abstraction layer (NAL) unit stream: A sequence of NAL units. | Layer: | |||
A set of VCL NAL units that all have a particular value of | ||||
nuh_layer_id and the associated non-VCL NAL units. | ||||
Output Layer Set (OLS): A set of layers for which one or more layers | Network abstraction layer (NAL) unit: | |||
are specified as the output layers. | A syntax structure containing an indication of the type of data to | |||
follow and bytes containing that data in the form of an RBSP | ||||
interspersed as necessary with emulation prevention bytes. | ||||
Operation point (OP): A temporal subset of an OLS, identified by an | Network abstraction layer (NAL) unit stream: | |||
OLS index and a highest value of TemporalId. | A sequence of NAL units. | |||
Picture parameter set (PPS): A syntax structure containing syntax | Output Layer Set (OLS): | |||
elements that apply to zero or more entire coded pictures as | A set of layers for which one or more layers are specified as the | |||
determined by a syntax element found in each slice header. | output layers. | |||
Picture unit (PU): A set of NAL units that are associated with each | Operation point (OP): | |||
other according to a specified classification rule, are consecutive | A temporal subset of an OLS, identified by an OLS index and a | |||
in decoding order, and contain exactly one coded picture. | highest value of TemporalId. | |||
Random access: The act of starting the decoding process for a | Picture Header (PH): | |||
bitstream at a point other than the beginning of the stream. | A syntax structure containing syntax elements that apply to all | |||
slices of a coded picture. | ||||
Sequence parameter set (SPS): A syntax structure containing syntax | Picture parameter set (PPS): | |||
elements that apply to zero or more entire CLVSs as determined by the | A syntax structure containing syntax elements that apply to zero | |||
content of a syntax element found in the PPS referred to by a syntax | or more entire coded pictures as determined by a syntax element | |||
element found in each picture header. | found in each slice header. | |||
Slice: An integer number of complete tiles or an integer number of | Picture unit (PU): | |||
consecutive complete CTU rows within a tile of a picture that are | A set of NAL units that are associated with each other according | |||
exclusively contained in a single NAL unit. | to a specified classification rule, are consecutive in decoding | |||
order, and contain exactly one coded picture. | ||||
Slice header (SH): A part of a coded slice containing the data | Random access: | |||
elements pertaining to all tiles or CTU rows within a tile | The act of starting the decoding process for a bitstream at a | |||
represented in the slice. | point other than the beginning of the bitstream. | |||
Sublayer: A temporal scalable layer of a temporal scalable bitstream | Raw Byte Sequence Payload (RBSP): | |||
consisting of VCL NAL units with a particular value of the TemporalId | A syntax structure containing an integer number of bytes that is | |||
variable, and the associated non-VCL NAL units. | encapsulated in a NAL unit and is either empty or has the form of | |||
a string of data bits containing syntax elements followed by an | ||||
RBSP stop bit and zero or more subsequent bits equal to 0. | ||||
Subpicture: An rectangular region of one or more slices within a | Sequence parameter set (SPS): | |||
picture. | A syntax structure containing syntax elements that apply to zero | |||
or more entire CLVSs as determined by the content of a syntax | ||||
element found in the PPS referred to by a syntax element found in | ||||
each picture header. | ||||
Sublayer representation: A subset of the bitstream consisting of NAL | Slice: | |||
units of a particular sublayer and the lower sublayers. | An integer number of complete tiles or an integer number of | |||
consecutive complete CTU rows within a tile of a picture that are | ||||
exclusively contained in a single NAL unit. | ||||
Tile: A rectangular region of CTUs within a particular tile column | Slice header (SH): | |||
and a particular tile row in a picture. | A part of a coded slice containing the data elements pertaining to | |||
all tiles or CTU rows within a tile represented in the slice. | ||||
Tile column: A rectangular region of CTUs having a height equal to | Sublayer: | |||
the height of the picture and a width specified by syntax elements in | A temporal scalable layer of a temporal scalable bitstream | |||
the picture parameter set. | consisting of VCL NAL units with a particular value of the | |||
TemporalId variable, and the associated non-VCL NAL units. | ||||
Tile row: A rectangular region of CTUs having a height specified by | Subpicture: | |||
syntax elements in the picture parameter set and a width equal to the | A rectangular region of one or more slices within a picture. | |||
width of the picture. | ||||
Video coding layer (VCL) NAL unit: A collective term for coded slice | Sublayer representation: | |||
NAL units and the subset of NAL units that have reserved values of | A subset of the bitstream consisting of NAL units of a particular | |||
nal_unit_type that are classified as VCL NAL units in this | sublayer and the lower sublayers. | |||
Specification. | ||||
Tile: | ||||
A rectangular region of CTUs within a particular tile column and a | ||||
particular tile row in a picture. | ||||
Tile column: | ||||
A rectangular region of CTUs having a height equal to the height | ||||
of the picture and a width specified by syntax elements in the | ||||
picture parameter set. | ||||
Tile row: | ||||
A rectangular region of CTUs having a height specified by syntax | ||||
elements in the picture parameter set and a width equal to the | ||||
width of the picture. | ||||
Video coding layer (VCL) NAL unit: | ||||
A collective term for coded slice NAL units and the subset of NAL | ||||
units that have reserved values of nal_unit_type that are | ||||
classified as VCL NAL units in this Specification. | ||||
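The IRAP, IDR, CRA, and GDR picture definitions above are all expressed in terms of the nal_unit_type values of a picture's VCL NAL units. The following non-normative sketch shows how such a classification might look. The numeric constants are an assumption recalled from Table 5 of [VVC] and are not given in this memo, so they should be checked against that table before use. The function names are hypothetical.

```python
from typing import Iterable

# Assumed nal_unit_type values; verify against Table 5 of [VVC].
IDR_W_RADL = 7
IDR_N_LP = 8
CRA_NUT = 9
GDR_NUT = 10


def is_irap_picture(vcl_types: Iterable[int]) -> bool:
    """All VCL NAL units share one type in IDR_W_RADL..CRA_NUT, inclusive."""
    types = set(vcl_types)
    if len(types) != 1:
        return False
    (t,) = types
    return IDR_W_RADL <= t <= CRA_NUT


def is_idr_picture(vcl_types: Iterable[int]) -> bool:
    """IRAP picture whose VCL NAL units are IDR_W_RADL or IDR_N_LP."""
    types = set(vcl_types)
    return is_irap_picture(types) and types <= {IDR_W_RADL, IDR_N_LP}


def is_cra_picture(vcl_types: Iterable[int]) -> bool:
    """IRAP picture whose VCL NAL units are all CRA_NUT."""
    return set(vcl_types) == {CRA_NUT}


def is_gdr_picture(vcl_types: Iterable[int]) -> bool:
    """Picture whose VCL NAL units are all GDR_NUT."""
    return set(vcl_types) == {GDR_NUT}
```

Each function takes the nal_unit_type values of the VCL NAL units of a single coded picture. An empty input is classified as none of the picture types.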
3.1.2. Definitions Specific to This Memo | 3.1.2. Definitions Specific to This Memo | |||
Media-Aware Network Element (MANE): A network element, such as a | Media-Aware Network Element (MANE): | |||
middlebox, selective forwarding unit, or application-layer gateway | A network element, such as a middlebox, selective forwarding unit, | |||
that is capable of parsing certain aspects of the RTP payload headers | or application-layer gateway that is capable of parsing certain | |||
or the RTP payload and reacting to their contents. | aspects of the RTP payload headers or the RTP payload and reacting | |||
to their contents. | ||||
Informative note: The concept of a MANE goes beyond normal routers | | Informative note: The concept of a MANE goes beyond normal | |||
or gateways in that a MANE has to be aware of the signaling (e.g., | | routers or gateways in that a MANE has to be aware of the | |||
to learn about the payload type mappings of the media streams), | | signaling (e.g., to learn about the payload type mappings of | |||
and in that it has to be trusted when working with Secure RTP | | the media streams), and in that it has to be trusted when | |||
(SRTP). The advantage of using MANEs is that they allow packets | | working with Secure RTP (SRTP). The advantage of using | |||
to be dropped according to the needs of the media coding. For | | MANEs is that they allow packets to be dropped according to | |||
example, if a MANE has to drop packets due to congestion on a | | the needs of the media coding. For example, if a MANE has | |||
certain link, it can identify and remove those packets whose | | to drop packets due to congestion on a certain link, it can | |||
elimination produces the least adverse effect on the user | | identify and remove those packets whose elimination produces | |||
experience. After dropping packets, MANEs must rewrite RTCP | | the least adverse effect on the user experience. After | |||
packets to match the changes to the RTP stream, as specified in | | dropping packets, MANEs must rewrite RTCP packets to match | |||
Section 7 of [RFC3550]. | | the changes to the RTP stream, as specified in Section 7 of | |||
| [RFC3550]. | ||||
NAL unit decoding order: A NAL unit order that conforms to the | NAL unit decoding order: | |||
constraints on NAL unit order given in Section 7.4.2.4 in [VVC], | A NAL unit order that conforms to the constraints on NAL unit | |||
follow the Order of NAL units in the bitstream. | order given in Section 7.4.2.4 in [VVC], follow the order of NAL | |||
units in the bitstream. | ||||
RTP stream (See [RFC7656]): Within the scope of this memo, one RTP | RTP stream (see [RFC7656]): | |||
stream is utilized to transport a VVC bitstream, which may contain | Within the scope of this memo, one RTP stream is utilized to | |||
one or more layers, and each layer may contain one or more temporal | transport a VVC bitstream, which may contain one or more layers, | |||
sublayers. | and each layer may contain one or more temporal sublayers. | |||
Transmission order: The order of packets in ascending RTP sequence | Transmission order: | |||
number order (in modulo arithmetic). Within an aggregation packet, | The order of packets in ascending RTP sequence number order (in | |||
the NAL unit transmission order is the same as the order of | modulo arithmetic). Within an aggregation packet, the NAL unit | |||
appearance of NAL units in the packet. | transmission order is the same as the order of appearance of NAL | |||
units in the packet. | ||||
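The phrase "in modulo arithmetic" in the definition of transmission order can be illustrated with a small, non-normative sketch. It assumes 16-bit RTP sequence numbers and assumes that the packets being compared lie within half of the sequence number space of each other, so that wraparound is unambiguous. The function names are hypothetical.

```python
from functools import cmp_to_key
from typing import Iterable, List


def seq_precedes(a: int, b: int) -> bool:
    """True if 16-bit sequence number a comes before b, modulo 2**16."""
    return a != b and ((b - a) & 0xFFFF) < 0x8000


def _compare(a: int, b: int) -> int:
    if a == b:
        return 0
    return -1 if seq_precedes(a, b) else 1


def in_transmission_order(seqs: Iterable[int]) -> List[int]:
    """Sort sequence numbers into ascending order with wraparound.

    Only meaningful when all inputs span fewer than 2**15 values.
    """
    return sorted(seqs, key=cmp_to_key(_compare))
```

For example, with this comparison, sequence number 65535 precedes sequence number 0, because the counter wraps from 65535 to 0.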
3.2. Abbreviations | 3.2. Abbreviations | |||
AU Access Unit | AU Access Unit | |||
AP Aggregation Packet | AP Aggregation Packet | |||
APS Adaptation Parameter Set | APS Adaptation Parameter Set | |||
CTU Coding Tree Unit | CTU Coding Tree Unit | |||
CVS Coded Video Sequence | ||||
DPB Decoded Picture Buffer | CVS Coded Video Sequence | |||
DCI Decoding Capability Information | DPB Decoded Picture Buffer | |||
DON Decoding Order Number | DCI Decoding Capability Information | |||
FIR Full Intra Request | DON Decoding Order Number | |||
FU Fragmentation Unit | FIR Full Intra Request | |||
GDR Gradual Decoding Refresh | FU Fragmentation Unit | |||
HRD Hypothetical Reference Decoder | GDR Gradual Decoding Refresh | |||
IDR Instantaneous Decoding Refresh | HRD Hypothetical Reference Decoder | |||
IRAP Intra Random Access Point | IDR Instantaneous Decoding Refresh | |||
MANE Media-Aware Network Element | IRAP Intra Random Access Point | |||
MTU Maximum Transfer Unit | MANE Media-Aware Network Element | |||
NAL Network Abstraction Layer | MTU Maximum Transfer Unit | |||
NALU Network Abstraction Layer Unit | NAL Network Abstraction Layer | |||
OLS Output Layer Set | NALU Network Abstraction Layer Unit | |||
PLI Picture Loss Indication | OLS Output Layer Set | |||
PPS Picture Parameter Set | PLI Picture Loss Indication | |||
RPSI Reference Picture Selection Indication | PPS Picture Parameter Set | |||
SEI Supplemental Enhancement Information | RPSI Reference Picture Selection Indication | |||
SLI Slice Loss Indication | SEI Supplemental Enhancement Information | |||
SPS Sequence Parameter Set | SLI Slice Loss Indication | |||
VCL Video Coding Layer | SPS Sequence Parameter Set | |||
VPS Video Parameter Set | VCL Video Coding Layer | |||
VPS Video Parameter Set | ||||
4. RTP Payload Format | 4. RTP Payload Format | |||
4.1. RTP Header Usage | 4.1. RTP Header Usage | |||
The format of the RTP header is specified in [RFC3550] (reprinted as | The format of the RTP header is specified in [RFC3550] (reprinted as | |||
Figure 2 for convenience). This payload format uses the fields of | Figure 2 for convenience). This payload format uses the fields of | |||
the header in a manner consistent with that specification. | the header in a manner consistent with that specification. | |||
The RTP payload (and the settings for some RTP header bits) for | The RTP payload (and the settings for some RTP header bits) for | |||
aggregation packets and fragmentation units are specified in | aggregation packets and fragmentation units are specified in Sections | |||
Section 4.3.2 and Section 4.3.3, respectively. | 4.3.2 and 4.3.3, respectively. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
|V=2|P|X| CC |M| PT | sequence number | | |V=2|P|X| CC |M| PT | sequence number | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| timestamp | | | timestamp | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| synchronization source (SSRC) identifier | | | synchronization source (SSRC) identifier | | |||
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | |||
| contributing source (CSRC) identifiers | | | contributing source (CSRC) identifiers | | |||
| .... | | | .... | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
RTP Header According to [RFC3550] | Figure 2: RTP Header According to RFC 3550 | |||
Figure 2 | ||||
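As a non-normative illustration of the layout in Figure 2, the following sketch unpacks the fixed RTP header and the CSRC list from a received packet. It assumes the packet is available as a Python bytes object and ignores header extensions and padding. The function name and the dictionary keys are hypothetical.

```python
import struct
from typing import Any, Dict


def parse_rtp_header(packet: bytes) -> Dict[str, Any]:
    """Unpack the 12-byte fixed RTP header and any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes long")
    # Network byte order: two bytes, 16-bit SN, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    csrc_end = 12 + 4 * cc
    if len(packet) < csrc_end:
        raise ValueError("packet too short for the signaled CSRC count")
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:csrc_end]))
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 0x01,
        "extension": (b0 >> 4) & 0x01,
        "csrc_count": cc,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }
```

The "marker", "payload_type", "sequence_number", and "timestamp" entries correspond to the M, PT, SN, and Timestamp fields whose usage is specified by this payload format.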
The RTP header information to be set according to this RTP payload | The RTP header information to be set according to this RTP payload | |||
format is set as follows: | format is set as follows: | |||
Marker bit (M): 1 bit | Marker bit (M): 1 bit | |||
Set for the last packet, in transmission order, among each set of | Set for the last packet, in transmission order, among each set of | |||
packets that contain NAL units of one access unit. This is in | packets that contain NAL units of one access unit. This is in | |||
line with the normal use of the M bit in video formats to allow | line with the normal use of the M bit in video formats to allow | |||
efficient playout buffer handling. | efficient playout buffer handling. | |||
Payload Type (PT): 7 bits | Payload Type (PT): 7 bits | |||
The assignment of an RTP payload type for this new packet format | The assignment of an RTP payload type for this new packet format | |||
is outside the scope of this document and will not be specified | is outside the scope of this document and will not be specified | |||
here. The assignment of a payload type has to be performed either | here. The assignment of a payload type has to be performed either | |||
through the profile used or in a dynamic way. | through the profile used or in a dynamic way. | |||
Sequence Number (SN): 16 bits | Sequence Number (SN): 16 bits | |||
Set and used in accordance with [RFC3550]. | Set and used in accordance with [RFC3550]. | |||
Timestamp: 32 bits | Timestamp: 32 bits | |||
The RTP timestamp is set to the sampling timestamp of the content. | The RTP timestamp is set to the sampling timestamp of the content. | |||
A 90 kHz clock rate MUST be used. If the NAL unit has no timing | A 90 kHz clock rate MUST be used. If the NAL unit has no timing | |||
properties of its own (e.g., parameter set and SEI NAL units), the | properties of its own (e.g., parameter set and SEI NAL units), the | |||
RTP timestamp MUST be set to the RTP timestamp of the coded | RTP timestamp MUST be set to the RTP timestamp of the coded | |||
pictures of the access unit in which the NAL unit (according to | pictures of the access unit in which the NAL unit (according to | |||
Section 7.4.2.4 of [VVC]) is included. Receivers MUST use the RTP | Section 7.4.2.4 of [VVC]) is included. Receivers MUST use the RTP | |||
timestamp for the display process, even when the bitstream | timestamp for the display process, even when the bitstream | |||
contains picture timing SEI messages or decoding unit information | contains picture timing SEI messages or decoding unit information | |||
SEI messages as specified in [VVC]. | SEI messages, as specified in [VVC]. | |||
Informative note: When picture timing SEI messages are present, | | Informative note: When picture timing SEI messages are | |||
the RTP sender is responsible for ensuring that the RTP timestamps | | present, the RTP sender is responsible for ensuring that the | |||
are consistent with the timing information carried in the | | RTP timestamps are consistent with the timing information | |||
picture timing SEI messages. | | carried in the picture timing SEI messages. | |||
Synchronization source (SSRC): 32 bits | Synchronization source (SSRC): 32 bits | |||
Used to identify the source of the RTP packets. A single SSRC is | Used to identify the source of the RTP packets. A single SSRC is | |||
used for all parts of a single bitstream. | used for all parts of a single bitstream. | |||
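As an illustration of the header fields listed above, the following is a minimal sketch of decoding the fixed RTP header of Figure 2 and of mapping a sampling time to the 90 kHz RTP timestamp. The function names, the dictionary keys, and the `offset` parameter are assumptions made for this example only; header extensions are not handled.

```python
import struct

VVC_RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def parse_rtp_fixed_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header laid out in Figure 2."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),       # set on the last packet of an access unit
        "payload_type": b1 & 0x7F,       # assigned dynamically or by the profile
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
        # Offset of the RTP payload, skipping the CSRC list only
        # (a header extension, if any, is ignored in this sketch).
        "payload_offset": 12 + 4 * csrc_count,
    }


def sampling_time_to_rtp_timestamp(seconds: float, offset: int = 0) -> int:
    """Map a sampling time in seconds to a 32-bit timestamp at 90 kHz."""
    return (offset + round(seconds * VVC_RTP_CLOCK_HZ)) % (1 << 32)
```

All packets that carry NAL units of the same access unit would reuse the value returned by `sampling_time_to_rtp_timestamp` for that access unit.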
4.2. Payload Header Usage | 4.2. Payload Header Usage | |||
The first two bytes of the payload of an RTP packet are referred to | The first two bytes of the payload of an RTP packet are referred to | |||
as the payload header. The payload header consists of the same | as the payload header. The payload header consists of the same | |||
fields (F, Z, LayerId, Type, and TID) as the NAL unit header as shown | fields (F, Z, LayerId, Type, and TID) as the NAL unit header shown in | |||
in Section 1.1.4, irrespective of the type of the payload structure. | Section 1.1.4, irrespective of the type of the payload structure. | |||
The TID value indicates (among other things) the relative importance | The TID value indicates (among other things) the relative importance | |||
of an RTP packet, for example, because NAL units belonging to higher | of an RTP packet, for example, because NAL units belonging to higher | |||
temporal sublayers are not used for the decoding of lower temporal | temporal sublayers are not used for the decoding of lower temporal | |||
sublayers. A lower value of TID indicates a higher importance. | sublayers. A lower value of TID indicates a higher importance. More | |||
More-important NAL units MAY be better protected against transmission | important NAL units MAY be better protected against transmission | |||
losses than less-important NAL units. | losses than less-important NAL units. | |||
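The two-byte payload header can be unpacked with a few shifts and masks. The sketch below assumes the bit widths of the NAL unit header given in Section 1.1.4 (F: 1 bit, Z: 1 bit, LayerId: 6 bits, Type: 5 bits, TID: 3 bits), which are not repeated in this section; the `PayloadHdr` tuple and the helper names are hypothetical.

```python
from typing import NamedTuple


class PayloadHdr(NamedTuple):
    f: int
    z: int
    layer_id: int
    nal_type: int
    tid: int


def parse_payload_hdr(payload: bytes) -> PayloadHdr:
    """Split the first two payload bytes into F, Z, LayerId, Type, and TID."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    b0, b1 = payload[0], payload[1]
    return PayloadHdr(
        f=b0 >> 7,
        z=(b0 >> 6) & 0x01,
        layer_id=b0 & 0x3F,
        nal_type=b1 >> 3,
        tid=b1 & 0x07,
    )


def pack_payload_hdr(h: PayloadHdr) -> bytes:
    """Inverse of parse_payload_hdr."""
    return bytes([(h.f << 7) | (h.z << 6) | h.layer_id, (h.nal_type << 3) | h.tid])


def more_important(a: PayloadHdr, b: PayloadHdr) -> bool:
    """A lower TID value indicates a higher importance."""
    return a.tid < b.tid
```

A sender that applies unequal protection could, for example, sort pending packets with `more_important` before deciding which ones receive extra redundancy.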
4.3. Payload Structures | 4.3. Payload Structures | |||
Three different types of RTP packet payload structures are specified. | Three different types of RTP packet payload structures are specified. | |||
A receiver can identify the type of an RTP packet payload through the | A receiver can identify the type of an RTP packet payload through the | |||
Type field in the payload header. | Type field in the payload header. | |||
The three different payload structures are as follows: | The three different payload structures are as follows: | |||
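* Single NAL Unit Packet: Contains exactly one NAL unit. This | * Single NAL Unit Packet: Contains exactly one NAL unit. This | |||
payload structure is specified in Section 4.3.1. | payload structure is specified in Section 4.3.1. | |||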
* Aggregation Packet (AP): Contains more than one NAL unit within | * Aggregation Packet (AP): Contains more than one NAL unit within | |||
one access unit. This payload structure is specified in | one access unit. This payload structure is specified in | |||
Section 4.3.2. | Section 4.3.2. | |||
* Fragmentation Unit (FU): Contains a subset of a single NAL unit. | * Fragmentation Unit (FU): Contains a subset of a single NAL unit. | |||
This payload structure is specified in Section 4.3.3. | This payload structure is specified in Section 4.3.3. | |||
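A receiver-side dispatch on the Type field might look like the following sketch. The Type values 28 (AP) and 29 (FU) are those given in Sections 4.3.2 and 4.3.3; the function name and the returned labels are assumptions, and treating every other Type value as a single NAL unit packet is a simplification made for this example.

```python
AP_TYPE = 28  # aggregation packet
FU_TYPE = 29  # fragmentation unit


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    nal_type = payload[1] >> 3  # Type occupies the upper 5 bits of byte 1
    if nal_type == AP_TYPE:
        return "AP"
    if nal_type == FU_TYPE:
        return "FU"
    # Simplification: any other value is handled as a single NAL unit packet.
    return "single"
```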
4.3.1. Single NAL Unit Packets | 4.3.1. Single NAL Unit Packets | |||
A single NAL unit packet contains exactly one NAL unit, and consists | A single NAL unit packet contains exactly one NAL unit and consists | |||
of a payload header as defined in Table 5 of [VVC] (denoted here as | of a payload header, as defined in Table 5 of [VVC] (denoted here as | |||
PayloadHdr), followed by a conditional 16-bit DONL field (in | PayloadHdr), followed by a conditional 16-bit DONL field (in | |||
network byte order), and the NAL unit payload data (the NAL unit | network byte order), and the NAL unit payload data (the NAL unit | |||
excluding its NAL unit header) of the contained NAL unit, as shown in | excluding its NAL unit header) of the contained NAL unit, as shown in | |||
Figure 3. | Figure 3. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| PayloadHdr | DONL (conditional) | | | PayloadHdr | DONL (conditional) | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
| NAL unit payload data | | | NAL unit payload data | | |||
| | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The Structure of a Single NAL Unit Packet | Figure 3: The Structure of a Single NAL Unit Packet | |||
Figure 3 | ||||
The DONL field, when present, specifies the value of the 16 least | The DONL field, when present, specifies the value of the 16 least | |||
significant bits of the decoding order number of the contained NAL | significant bits of the decoding order number of the contained NAL | |||
unit. If sprop-max-don-diff (see definition in Section 7.2) is | unit. If sprop-max-don-diff (defined in Section 7.2) is greater than | |||
greater than 0, the DONL field MUST be present, and the variable DON | 0, the DONL field MUST be present, and the variable DON for the | |||
for the contained NAL unit is derived as equal to the value of the | contained NAL unit is derived as equal to the value of the DONL | |||
DONL field. Otherwise (sprop-max-don-diff is equal to 0), the DONL | field. Otherwise (sprop-max-don-diff is equal to 0), the DONL field | |||
field MUST NOT be present. | MUST NOT be present. | |||
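The layout of Figure 3 can be read back as sketched below. This example assumes that RTP padding has already been removed and that the two PayloadHdr bytes can be reused verbatim as the NAL unit header when the NAL unit is rebuilt; the function name and return shape are hypothetical.

```python
from typing import Optional, Tuple


def parse_single_nal_unit_packet(
    payload: bytes, sprop_max_don_diff: int
) -> Tuple[Optional[int], bytes]:
    """Return (DON, NAL unit) for a single NAL unit packet.

    DON is None when sprop-max-don-diff is 0, because the DONL field
    is then absent.
    """
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    payload_hdr = payload[:2]
    offset = 2
    don = None
    if sprop_max_don_diff > 0:
        if len(payload) < 4:
            raise ValueError("DONL field missing")
        don = int.from_bytes(payload[2:4], "big")  # network byte order
        offset = 4
    # Assumption: the payload header doubles as the NAL unit header.
    nal_unit = payload_hdr + payload[offset:]
    return don, nal_unit
```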
4.3.2. Aggregation Packets (APs) | 4.3.2. Aggregation Packets (APs) | |||
Aggregation Packets (APs) can reduce packetization overhead for small | Aggregation packets (APs) can reduce packetization overhead for small | |||
NAL units, such as most of the non-VCL NAL units, which are often | NAL units, such as most of the non-VCL NAL units, which are often | |||
only a few octets in size. | only a few octets in size. | |||
An AP aggregates NAL units of one access unit and it MUST NOT contain | An AP aggregates NAL units of one access unit, and it MUST NOT | |||
NAL units from more than one AU. Each NAL unit to be carried in an | contain NAL units from more than one AU. Each NAL unit to be carried | |||
AP is encapsulated in an aggregation unit. NAL units aggregated in | in an AP is encapsulated in an aggregation unit. NAL units | |||
one AP are included in NAL unit decoding order. | aggregated in one AP are included in NAL-unit-decoding order. | |||
An AP consists of a payload header as defined in Table 5 of [VVC] | An AP consists of a payload header, as defined in Table 5 of [VVC] | |||
(denoted here as PayloadHdr with Type=28) followed by two or more | (denoted here as PayloadHdr with Type=28), followed by two or more | |||
aggregation units, as shown in Figure 4. | aggregation units, as shown in Figure 4. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| PayloadHdr (Type=28) | | | | PayloadHdr (Type=28) | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | |||
| | | | | | |||
| two or more aggregation units | | | two or more aggregation units | | |||
| | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The Structure of an Aggregation Packet | Figure 4: The Structure of an Aggregation Packet | |||
Figure 4 | ||||
The fields in the payload header of an AP are set as follows. The F | The fields in the payload header of an AP are set as follows. The F | |||
bit MUST be equal to 0 if the F bit of each aggregated NAL unit is | bit MUST be equal to 0 if the F bit of each aggregated NAL unit is | |||
equal to 0; otherwise, it MUST be equal to 1. The Type field MUST | equal to 0; otherwise, it MUST be equal to 1. The Type field MUST | |||
be equal to 28. | be equal to 28. | |||
The value of LayerId MUST be equal to the lowest value of LayerId of | The value of LayerId MUST be equal to the lowest value of LayerId of | |||
all the aggregated NAL units. The value of TID MUST be the lowest | all the aggregated NAL units. The value of TID MUST be the lowest | |||
value of TID of all the aggregated NAL units. | value of TID of all the aggregated NAL units. | |||
Informative note: All VCL NAL units in an AP have the same TID | | Informative note: All VCL NAL units in an AP have the same TID | |||
value since they belong to the same access unit. However, an AP | | value since they belong to the same access unit. However, an | |||
may contain non-VCL NAL units for which the TID value in the NAL | | AP may contain non-VCL NAL units for which the TID value in the | |||
unit header may be different than the TID value of the VCL NAL | | NAL unit header may be different than the TID value of the VCL | |||
units in the same AP. | | NAL units in the same AP. | |||
Informative Note: If a system envisions sub-picture level or | | Informative note: If a system envisions subpicture-level or | |||
picture level modifications, for example by removing sub-pictures | | picture-level modifications, for example, by removing | |||
or pictures of a particular layer, a good design choice on the | | subpictures or pictures of a particular layer, a good design | |||
sender's side would be to aggregate NAL units belonging to only | | choice on the sender's side would be to aggregate NAL units | |||
the same sub-picture or picture of a particular layer. | | belonging to only the same subpicture or picture of a | |||
| particular layer. | ||||
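The rules for the AP payload header (F, LayerId, TID, and Type) can be sketched as follows. The example assumes the NAL unit header bit positions of Section 1.1.4 and sets the Z bit to 0; the function name is hypothetical.

```python
from typing import List

AP_TYPE = 28


def ap_payload_hdr(nal_units: List[bytes]) -> bytes:
    """Derive the two-byte AP payload header from the aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f = 0
    layer_ids = []
    tids = []
    for nal in nal_units:
        if len(nal) < 2:
            raise ValueError("NAL unit shorter than its header")
        f |= nal[0] >> 7                 # F is 1 if any aggregated F bit is 1
        layer_ids.append(nal[0] & 0x3F)
        tids.append(nal[1] & 0x07)
    # LayerId and TID take the lowest value found among the NAL units.
    return bytes([(f << 7) | min(layer_ids), (AP_TYPE << 3) | min(tids)])
```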
An AP MUST carry at least two aggregation units and can carry as many | An AP MUST carry at least two aggregation units and can carry as many | |||
aggregation units as necessary; however, the total amount of data in | aggregation units as necessary; however, the total amount of data in | |||
an AP obviously MUST fit into an IP packet, and the size SHOULD be | an AP obviously MUST fit into an IP packet, and the size SHOULD be | |||
chosen so that the resulting IP packet is smaller than the MTU size | chosen so that the resulting IP packet is smaller than the MTU size | |||
so to avoid IP layer fragmentation. An AP MUST NOT contain FUs | in order to avoid IP layer fragmentation. An AP MUST NOT contain the | |||
specified in Section 4.3.3. APs MUST NOT be nested; i.e., an AP can | FUs specified in Section 4.3.3. APs MUST NOT be nested, i.e., an AP | |||
not contain another AP. | cannot contain another AP. | |||
The first aggregation unit in an AP consists of a conditional 16-bit | The first aggregation unit in an AP consists of a conditional 16-bit | |||
DONL field (in network byte order) followed by a 16-bit unsigned size | DONL field (in network byte order), followed by 16 bits of unsigned | |||
information (in network byte order) that indicates the size of the | size information (in network byte order) that indicate the size of | |||
NAL unit in bytes (excluding these two octets, but including the NAL | the NAL unit in bytes (excluding these two octets but including the | |||
unit header), followed by the NAL unit itself, including its NAL unit | NAL unit header), followed by the NAL unit itself, including its NAL | |||
header, as shown in Figure 5. | unit header, as shown in Figure 5. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| : DONL (conditional) | NALU size | | | : DONL (conditional) | NALU size | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| NALU size | | | | NALU size | | | |||
+-+-+-+-+-+-+-+-+ NAL unit | | +-+-+-+-+-+-+-+-+ NAL unit | | |||
| | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| : | | : | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The Structure of the First Aggregation Unit in an AP | Figure 5: The Structure of the First Aggregation Unit in an AP | |||
Figure 5 | ||||
Informative Note: The first octet of Figure 5 (indicated by the | | Informative note: The first octet of Figure 5 (indicated by the | |||
first colon) belongs to a previous aggregation unit. It is | | first colon) belongs to a previous aggregation unit. It is | |||
depicted to emphasize that aggregation units are octet-aligned | | depicted to emphasize that aggregation units are octet aligned | |||
only. Similarly, the NAL unit carried in the aggregation unit can | | only. Similarly, the NAL unit carried in the aggregation unit | |||
terminate at the octet boundary. | | can terminate at the octet boundary. | |||
The DONL field, when present, specifies the value of the 16 least | The DONL field, when present, specifies the value of the 16 least | |||
significant bits of the decoding order number of the aggregated NAL | significant bits of the decoding order number of the aggregated NAL | |||
unit. | unit. | |||
If sprop-max-don-diff is greater than 0, the DONL field MUST be | If sprop-max-don-diff is greater than 0, the DONL field MUST be | |||
present in an aggregation unit that is the first aggregation unit in | present in an aggregation unit that is the first aggregation unit in | |||
an AP, and the variable DON for the aggregated NAL unit is derived as | an AP, and the variable DON for the aggregated NAL unit is derived as | |||
equal to the value of the DONL field. The variable DON for a NAL | equal to the value of the DONL field. The variable DON for a NAL | |||
unit carried in an aggregation unit that is not the first aggregation | unit carried in an aggregation unit that is not the first aggregation | |||
unit in the AP is derived as equal to the DON of the preceding | unit in the AP is derived as equal to the DON of the preceding | |||
aggregated NAL unit in the same AP plus 1 modulo 65536. Otherwise | aggregated NAL unit in the same AP plus 1 modulo 65536. Otherwise | |||
(sprop-max-don-diff is equal to 0), the DONL field MUST NOT be | (sprop-max-don-diff is equal to 0), the DONL field MUST NOT be | |||
present in an aggregation unit that is the first aggregation unit in | present in an aggregation unit that is the first aggregation unit in | |||
an AP. | an AP. | |||
An aggregation unit that is not the first aggregation unit in an AP | An aggregation unit that is not the first aggregation unit in an AP | |||
consists of a 16-bit unsigned size information | consists of 16 bits of unsigned size information | |||
(in network byte order) that indicates the size of the NAL unit in | (in network byte order) that indicate the size of the NAL unit in | |||
bytes (excluding these two octets, but including the NAL unit | bytes (excluding these two octets but including the NAL unit header), | |||
header), followed by the NAL unit itself, including its NAL unit | followed by the NAL unit itself, including its NAL unit header, as | |||
header, as shown in Figure 6. | shown in Figure 6. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| : NALU size | NAL unit | | | : NALU size | NAL unit | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | |||
| | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| : | | : | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The Structure of an Aggregation Unit That Is Not the First | Figure 6: The Structure of an Aggregation Unit That Is Not the First | |||
Aggregation Unit in an AP | Aggregation Unit in an AP | |||
Figure 6 | ||||
Informative Note: The first octet of Figure 6 (indicated by the | | Informative note: The first octet of Figure 6 (indicated by the | |||
first colon) belongs to a previous aggregation unit. It is | | first colon) belongs to a previous aggregation unit. It is | |||
depicted to emphasize that aggregation units are octet-aligned | | depicted to emphasize that aggregation units are octet aligned | |||
only. Similarly, the NAL unit carried in the aggregation unit can | | only. Similarly, the NAL unit carried in the aggregation unit | |||
terminate at the octet boundary. | | can terminate at the octet boundary. | |||
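Taken together, Figures 5 and 6 and the DON derivation rule suggest a depacketizer along these lines. This is a sketch under the assumption that RTP padding has already been stripped from the payload; the function name and the returned list of (DON, NAL unit) pairs are hypothetical.

```python
from typing import List, Optional, Tuple

AP_TYPE = 28


def parse_aggregation_packet(
    payload: bytes, sprop_max_don_diff: int
) -> List[Tuple[Optional[int], bytes]]:
    """Split an AP payload into (DON, NAL unit) pairs in decoding order."""
    if len(payload) < 2 or (payload[1] >> 3) != AP_TYPE:
        raise ValueError("not an aggregation packet")
    offset = 2
    units: List[Tuple[Optional[int], bytes]] = []
    don: Optional[int] = None
    while offset < len(payload):
        if sprop_max_don_diff > 0:
            if not units:
                # Only the first aggregation unit carries a DONL field.
                if offset + 2 > len(payload):
                    raise ValueError("truncated DONL field")
                don = int.from_bytes(payload[offset:offset + 2], "big")
                offset += 2
            else:
                # Later units: DON of the preceding unit plus 1, modulo 65536.
                don = (don + 1) % 65536
        if offset + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        size = int.from_bytes(payload[offset:offset + 2], "big")
        offset += 2
        # The size counts the NAL unit header but not the size field itself.
        if size < 2 or offset + size > len(payload):
            raise ValueError("invalid NALU size")
        units.append((don, payload[offset:offset + size]))
        offset += size
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```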
Figure 7 presents an example of an AP that contains two aggregation | Figure 7 presents an example of an AP that contains two aggregation | |||
units, labeled as 1 and 2 in the figure, without the DONL field being | units, labeled as 1 and 2 in the figure, without the DONL field being | |||
present. | present. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| RTP Header | | | RTP Header | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
[... rows of Figure 7 omitted in this excerpt ...] | [... rows of Figure 7 omitted in this excerpt ...] | |||
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| . . . | NALU 2 Size | NALU 2 HDR | | | . . . | NALU 2 Size | NALU 2 HDR | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| NALU 2 HDR | | | | NALU 2 HDR | | | |||
+-+-+-+-+-+-+-+-+ NALU 2 Data | | +-+-+-+-+-+-+-+-+ NALU 2 Data | | |||
| . . . | | | . . . | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
An Example of an AP Packet Containing | Figure 7: An Example of an AP Packet Containing Two Aggregation | |||
Two Aggregation Units without the DONL Field | Units without the DONL Field | |||
Figure 7 | ||||
Figure 8 presents an example of an AP that contains two aggregation | Figure 8 presents an example of an AP that contains two aggregation | |||
units, labeled as 1 and 2 in the figure, with the DONL field being | units, labeled as 1 and 2 in the figure, with the DONL field being | |||
present. | present. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| RTP Header | | | RTP Header | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
[... rows of Figure 8 omitted in this excerpt ...] | [... rows of Figure 8 omitted in this excerpt ...] | |||
+ . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| : NALU 2 Size | | | : NALU 2 Size | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| NALU 2 HDR | | | | NALU 2 HDR | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | | |||
| | | | | | |||
| . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
An Example of an AP Containing | Figure 8: An Example of an AP Containing Two Aggregation Units | |||
Two Aggregation Units with the DONL Field | with the DONL Field | |||
Figure 8 | ||||
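The byte layouts of Figures 7 and 8 could be produced by a sender-side sketch such as the one below. It takes an already computed AP payload header, rejects NAL units of Type 28 or 29 (no nested APs, no FUs inside an AP), and enforces a size budget. The function name and the `max_payload_size` default of 1200 bytes are assumptions for this example; a real sender would derive the budget from the path MTU minus the IP, UDP, and RTP header sizes.

```python
from typing import List, Optional


def build_ap_payload(
    payload_hdr: bytes,
    nal_units: List[bytes],
    first_don: Optional[int] = None,
    max_payload_size: int = 1200,  # hypothetical budget below the MTU
) -> bytes:
    """Serialize an AP: PayloadHdr, optional DONL, then size-prefixed NAL units."""
    if len(payload_hdr) != 2:
        raise ValueError("payload header is two bytes")
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    out = bytearray(payload_hdr)
    if first_don is not None:
        # DONL appears only in the first aggregation unit (Figure 8).
        out += (first_don % 65536).to_bytes(2, "big")
    for nal in nal_units:
        if not 2 <= len(nal) <= 0xFFFF:
            raise ValueError("NAL unit size does not fit the 16-bit size field")
        if (nal[1] >> 3) in (28, 29):
            raise ValueError("APs and FUs cannot be carried inside an AP")
        out += len(nal).to_bytes(2, "big") + nal
    if len(out) > max_payload_size:
        raise ValueError("AP exceeds the payload size budget")
    return bytes(out)
```

Passing `first_don=None` yields the layout without the DONL field (Figure 7); passing an integer yields the layout with it (Figure 8).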
4.3.3. Fragmentation Units | 4.3.3. Fragmentation Units | |||
Fragmentation Units (FUs) are introduced to enable fragmenting a | Fragmentation Units (FUs) are introduced to enable fragmenting a | |||
single NAL unit into multiple RTP packets, possibly without | single NAL unit into multiple RTP packets, possibly without | |||
cooperation or knowledge of the [VVC] encoder. A fragment of a NAL | cooperation or knowledge of the [VVC] encoder. A fragment of a NAL | |||
unit consists of an integer number of consecutive octets of that NAL | unit consists of an integer number of consecutive octets of that NAL | |||
unit. Fragments of the same NAL unit MUST be sent in consecutive | unit. Fragments of the same NAL unit MUST be sent in consecutive | |||
order with ascending RTP sequence numbers (with no other RTP packets | order with ascending RTP sequence numbers (with no other RTP packets | |||
within the same RTP stream being sent between the first and last | within the same RTP stream being sent between the first and last | |||
fragment). | fragment). | |||
When a NAL unit is fragmented and conveyed within FUs, it is referred | When a NAL unit is fragmented and conveyed within FUs, it is referred | |||
to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST | to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST | |||
NOT be nested; i.e., an FU can not contain a subset of another FU. | NOT be nested, i.e., an FU cannot contain a subset of another FU. | |||
The RTP timestamp of an RTP packet carrying an FU is set to the NALU- | The RTP timestamp of an RTP packet carrying an FU is set to the NALU- | |||
time of the fragmented NAL unit. | time of the fragmented NAL unit. | |||
An FU consists of a payload header as defined in Table 5 of [VVC] | An FU consists of a payload header as defined in Table 5 of [VVC] | |||
(denoted here as PayloadHdr with Type=29), an FU header of one octet, | (denoted here as PayloadHdr with Type=29), an FU header of one octet, | |||
a conditional 16-bit DONL field (in network byte order), and an FU | a conditional 16-bit DONL field (in network byte order), and an FU | |||
payload, as shown in Figure 9. | payload (as shown in Figure 9). | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| PayloadHdr (Type=29) | FU header | DONL (cond) | | | PayloadHdr (Type=29) | FU header | DONL (cond) | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| | |||
| DONL (cond) | | | | DONL (cond) | | | |||
|-+-+-+-+-+-+-+-+ | | |-+-+-+-+-+-+-+-+ | | |||
| FU payload | | | FU payload | | |||
| | | | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
The Structure of an FU | Figure 9: The Structure of an FU | |||
Figure 9 | ||||
The fields in the payload header are set as follows. The Type field | The fields in the payload header are set as follows. The Type field | |||
MUST be equal to 29. The fields F, LayerId, and TID MUST be equal to | MUST be equal to 29. The fields F, LayerId, and TID MUST be equal to | |||
the fields F, LayerId, and TID, respectively, of the fragmented NAL | the fields F, LayerId, and TID, respectively, of the fragmented NAL | |||
unit. | unit. | |||
The FU header consists of an S bit, an E bit, a P bit and a 5-bit | The FU header consists of an S bit, an E bit, a P bit, and a 5-bit | |||
FuType field, as shown in Figure 10. | FuType field, as shown in Figure 10. | |||
+---------------+ | +---------------+ | |||
|0|1|2|3|4|5|6|7| | |0|1|2|3|4|5|6|7| | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
|S|E|P| FuType | | |S|E|P| FuType | | |||
+---------------+ | +---------------+ | |||
The Structure of FU Header | Figure 10: The Structure of the FU Header | |||
Figure 10 | ||||
The semantics of the FU header fields are as follows: | The semantics of the FU header fields are as follows: | |||
S: 1 bit | S: 1 bit | |||
When set to 1, the S bit indicates the start of a fragmented NAL | When set to 1, the S bit indicates the start of a fragmented NAL | |||
unit, i.e., the first byte of the FU payload is also the first | unit, i.e., the first byte of the FU payload is also the first | |||
byte of the payload of the fragmented NAL unit. When the FU | byte of the payload of the fragmented NAL unit. When the FU | |||
payload is not the start of the fragmented NAL unit payload, the S | payload is not the start of the fragmented NAL unit payload, the S | |||
bit MUST be set to 0. | bit MUST be set to 0. | |||
E: 1 bit | E: 1 bit | |||
When set to 1, the E bit indicates the end of a fragmented NAL | When set to 1, the E bit indicates the end of a fragmented NAL | |||
unit, i.e., the last byte of the payload is also the last byte of | unit, i.e., the last byte of the payload is also the last byte of | |||
the fragmented NAL unit. When the FU payload is not the last | the fragmented NAL unit. When the FU payload is not the last | |||
fragment of a fragmented NAL unit, the E bit MUST be set to 0. | fragment of a fragmented NAL unit, the E bit MUST be set to 0. | |||
P: 1 bit | P: 1 bit | |||
When set to 1, the P bit indicates the last FU of the last VCL NAL | When set to 1, the P bit indicates the last FU of the last VCL NAL | |||
unit of a coded picture, i.e., the last byte of the FU payload is | unit of a coded picture, i.e., the last byte of the FU payload is | |||
also the last byte of the last VCL NAL unit of the coded picture. | also the last byte of the last VCL NAL unit of the coded picture. | |||
When the FU payload is not the last fragment of the last VCL NAL | When the FU payload is not the last fragment of the last VCL NAL | |||
unit of a coded picture, the P bit MUST be set to 0. | unit of a coded picture, the P bit MUST be set to 0. | |||
FuType: 5 bits | FuType: 5 bits | |||
The field FuType MUST be equal to the field Type of the fragmented | The field FuType MUST be equal to the field Type of the fragmented | |||
NAL unit. | NAL unit. | |||
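The one-octet FU header of Figure 10 can be packed and unpacked as sketched below, with bit 0 of the figure taken as the most significant bit. The function names are hypothetical, and the check that S and E are not both set reflects the rule that a non-fragmented NAL unit is not sent in a single FU.

```python
def pack_fu_header(s: bool, e: bool, p: bool, fu_type: int) -> int:
    """Build the FU header octet: S, E, P, then the 5-bit FuType."""
    if s and e:
        raise ValueError("S and E must not both be set in the same FU header")
    if not 0 <= fu_type < 32:
        raise ValueError("FuType is a 5-bit field")
    return (int(s) << 7) | (int(e) << 6) | (int(p) << 5) | fu_type


def parse_fu_header(octet: int) -> dict:
    """Split an FU header octet into its four fields."""
    return {
        "s": bool(octet & 0x80),   # first fragment of the NAL unit
        "e": bool(octet & 0x40),   # last fragment of the NAL unit
        "p": bool(octet & 0x20),   # last FU of the last VCL NAL unit of a picture
        "fu_type": octet & 0x1F,   # Type of the fragmented NAL unit
    }
```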
The DONL field, when present, specifies the value of the 16 least | The DONL field, when present, specifies the value of the 16 least | |||
significant bits of the decoding order number of the fragmented NAL | significant bits of the decoding order number of the fragmented NAL | |||
unit. | unit. | |||
If sprop-max-don-diff is greater than 0, and the S bit is equal to 1, | If sprop-max-don-diff is greater than 0, and the S bit is equal to 1, | |||
the DONL field MUST be present in the FU, and the variable DON for | the DONL field MUST be present in the FU, and the variable DON for | |||
the fragmented NAL unit is derived as equal to the value of the DONL | the fragmented NAL unit is derived as equal to the value of the DONL | |||
field. Otherwise (sprop-max-don-diff is equal to 0, or the S bit is | field. Otherwise (sprop-max-don-diff is equal to 0, or the S bit is | |||
equal to 0), the DONL field MUST NOT be present in the FU. | equal to 0), the DONL field MUST NOT be present in the FU. | |||
A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e., | A non-fragmented NAL unit MUST NOT be transmitted in one FU, i.e., | |||
the Start bit and End bit must not both be set to 1 in the same FU | the Start bit and End bit must not both be set to 1 in the same FU | |||
header. | header. | |||
The FU payload consists of fragments of the payload of the fragmented | The FU payload consists of fragments of the payload of the fragmented | |||
NAL unit so that if the FU payloads of consecutive FUs, starting with | NAL unit so that, if the FU payloads of consecutive FUs, starting | |||
an FU with the S bit equal to 1 and ending with an FU with the E bit | with an FU with the S bit equal to 1 and ending with an FU with the E | |||
equal to 1, are sequentially concatenated, the payload of the | bit equal to 1, are sequentially concatenated, the payload of the | |||
fragmented NAL unit can be reconstructed. The NAL unit header of the | fragmented NAL unit can be reconstructed. The NAL unit header of the | |||
fragmented NAL unit is not included as such in the FU payload, but | fragmented NAL unit is not included as such in the FU payload, but | |||
rather the information of the NAL unit header of the fragmented NAL | rather the information of the NAL unit header of the fragmented NAL | |||
unit is conveyed in F, LayerId, and TID fields of the FU payload | unit is conveyed in the F, LayerId, and TID fields of the FU payload | |||
headers of the FUs and the FuType field of the FU header of the FUs. | headers of the FUs and the FuType field of the FU header of the FUs. | |||
An FU payload MUST NOT be empty. | An FU payload MUST NOT be empty. | |||
If an FU is lost, the receiver SHOULD discard all following | If an FU is lost, the receiver SHOULD discard all following | |||
fragmentation units in transmission order corresponding to the same | fragmentation units in transmission order, corresponding to the same | |||
fragmented NAL unit, unless the decoder in the receiver is known to | fragmented NAL unit, unless the decoder in the receiver is known to | |||
be prepared to gracefully handle incomplete NAL units. | be prepared to gracefully handle incomplete NAL units. | |||
A receiver in an endpoint or in a MANE MAY aggregate the first n-1 | A receiver in an endpoint or in a MANE MAY aggregate the first n-1 | |||
fragments of a NAL unit to an (incomplete) NAL unit, even if fragment | fragments of a NAL unit to an (incomplete) NAL unit, even if fragment | |||
n of that NAL unit is not received. In this case, the | n of that NAL unit is not received. In this case, the | |||
forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a | forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a | |||
syntax violation. | syntax violation. | |||
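The FU header fields described above can be illustrated with a minimal sketch. It assumes that S, E, and P occupy the three most significant bits of the one-byte FU header, in that order, followed by the 5-bit FuType, as in the FU header figure of Section 4.3.3 (not reproduced in this excerpt). The names FuHeader, parse_fu_header, and donl_present are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass
class FuHeader:
    start: bool            # S bit: first fragment of the NAL unit
    end: bool              # E bit: last fragment of the NAL unit
    last_of_picture: bool  # P bit: last FU of the last VCL NAL unit of a picture
    fu_type: int           # FuType: Type of the fragmented NAL unit (5 bits)


def parse_fu_header(octet: int) -> FuHeader:
    """Split the one-byte FU header into S, E, P, and FuType."""
    hdr = FuHeader(
        start=bool(octet & 0x80),
        end=bool(octet & 0x40),
        last_of_picture=bool(octet & 0x20),
        fu_type=octet & 0x1F,
    )
    # A non-fragmented NAL unit must not be sent in one FU,
    # so S and E may not both be 1 in the same FU header.
    if hdr.start and hdr.end:
        raise ValueError("S and E must not both be 1 in the same FU header")
    return hdr


def donl_present(hdr: FuHeader, sprop_max_don_diff: int) -> bool:
    """DONL is present only when sprop-max-don-diff > 0 and S is 1."""
    return sprop_max_don_diff > 0 and hdr.start
```

A receiver could call parse_fu_header on the third byte of an FU (after the two-byte payload header) and use donl_present to decide whether the next two bytes carry DONL.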
4.4. Decoding Order Number | 4.4. Decoding Order Number | |||
skipping to change at page 32, line 20 ¶ | skipping to change at line 1445 ¶ | |||
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), | If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), | |||
AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] | AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] | |||
If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), | If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), | |||
AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n]) | AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n]) | |||
If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), | If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), | |||
AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) | AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) | |||
For any two NAL units m and n, the following applies: | For any two NAL units (m and n), the following applies: | |||
* AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows | * When AbsDon[n] is greater than AbsDon[m], this indicates that NAL | |||
NAL unit m in NAL unit decoding order. | unit n follows NAL unit m in NAL unit decoding order. | |||
* When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order | * When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order | |||
of the two NAL units can be in either order. | of the two NAL units can be in either order. | |||
* AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes | * When AbsDon[n] is less than AbsDon[m], this indicates that NAL | |||
NAL unit m in decoding order. | unit n precedes NAL unit m in decoding order. | |||
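The AbsDon derivation above can be sketched as a small function. The three wrap-around and backward cases are taken from the text shown here; the two remaining cases (equal DON values, and a forward step smaller than 32768) fall in the part of Section 4.4 that this comparison skips, so they are filled in as assumptions and marked as such in the comments. The function name next_abs_don is hypothetical.

```python
def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1], and DON[n] (16-bit DONs)."""
    if don == prev_don:
        # Assumption: not shown in this excerpt.
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        # Assumption: small forward step, not shown in this excerpt.
        return prev_abs_don + don - prev_don
    if don < prev_don and prev_don - don >= 32768:
        # DON wrapped around 65535 -> 0 while moving forward.
        return prev_abs_don + 65536 - prev_don + don
    if don > prev_don and don - prev_don >= 32768:
        # DON wrapped around 0 -> 65535 while moving backward.
        return prev_abs_don - (prev_don + 65536 - don)
    # Remaining case: don < prev_don and prev_don - don < 32768 (small backward step).
    return prev_abs_don - (prev_don - don)
```

Comparing the resulting AbsDon values then gives the decoding order directly, without any further handling of 16-bit wrap-around.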
Informative note: When two consecutive NAL units in the NAL | | Informative note: When two consecutive NAL units in the NAL | |||
unit decoding order have different values of AbsDon, the | | unit decoding order have different values of AbsDon, the | |||
absolute difference between the two AbsDon values may be | | absolute difference between the two AbsDon values may be | |||
greater than or equal to 1. | | greater than or equal to 1. | |||
Informative note: There are multiple reasons to allow for the | | Informative note: There are multiple reasons to allow for the | |||
absolute difference of the values of AbsDon for two consecutive | | absolute difference of the values of AbsDon for two consecutive | |||
NAL units in the NAL unit decoding order to be greater than | | NAL units in the NAL unit decoding order to be greater than | |||
one. An increment by one is not required, as at the time of | | one. An increment by one is not required, as at the time of | |||
associating values of AbsDon to NAL units, it may not be known | | associating values of AbsDon to NAL units, it may not be known | |||
whether all NAL units are to be delivered to the receiver. For | | whether all NAL units are to be delivered to the receiver. For | |||
example, a gateway might not forward VCL NAL units of higher | | example, a gateway might not forward VCL NAL units of higher | |||
sublayers or some SEI NAL units when there is congestion in the | | sublayers or some SEI NAL units when there is congestion in the | |||
network. In another example, the first intra-coded picture of | | network. In another example, the first intra-coded picture of | |||
a pre-encoded clip is transmitted in advance to ensure that it | | a pre-encoded clip is transmitted in advance to ensure that it | |||
is readily available in the receiver, and when transmitting the | | is readily available in the receiver, and when transmitting the | |||
first intra-coded picture, the originator does not exactly know | | first intra-coded picture, the originator does not exactly know | |||
how many NAL units will be encoded before the first intra-coded | | how many NAL units will be encoded before the first intra-coded | |||
picture of the pre-encoded clip follows in decoding order. | | picture of the pre-encoded clip follows in decoding order. | |||
Thus, the values of AbsDon for the NAL units of the first | | Thus, the values of AbsDon for the NAL units of the first | |||
intra-coded picture of the pre-encoded clip have to be | | intra-coded picture of the pre-encoded clip have to be | |||
estimated when they are transmitted, and gaps in values of | | estimated when they are transmitted, and gaps in values of | |||
AbsDon may occur. | | AbsDon may occur. | |||
5. Packetization Rules | 5. Packetization Rules | |||
The following packetization rules apply: | The following packetization rules apply: | |||
* If sprop-max-don-diff is greater than 0, the transmission order of | * If sprop-max-don-diff is greater than 0, the transmission order of | |||
NAL units carried in the RTP stream MAY be different than the NAL | NAL units carried in the RTP stream MAY be different than the NAL | |||
unit decoding order. Otherwise (sprop-max-don-diff is equal to | unit decoding order. Otherwise (sprop-max-don-diff is equal to | |||
0), the transmission order of NAL units carried in the RTP stream | 0), the transmission order of NAL units carried in the RTP stream | |||
MUST be the same as the NAL unit decoding order. | MUST be the same as the NAL unit decoding order. | |||
* A NAL unit of a small size SHOULD be encapsulated in an | * A NAL unit of a small size SHOULD be encapsulated in an | |||
aggregation packet together with one or more other NAL units in | aggregation packet together with one or more other NAL units in | |||
order to avoid the unnecessary packetization overhead for small | order to avoid the unnecessary packetization overhead for small | |||
NAL units. For example, non-VCL NAL units such as access unit | NAL units. For example, non-VCL NAL units, such as access unit | |||
delimiters, parameter sets, or SEI NAL units are typically small | delimiters, parameter sets, or SEI NAL units, are typically small | |||
and can often be aggregated with VCL NAL units without violating | and can often be aggregated with VCL NAL units without violating | |||
MTU size constraints. | MTU size constraints. | |||
* Each non-VCL NAL unit SHOULD, when possible from an MTU size match | * Each non-VCL NAL unit SHOULD, when possible from an MTU size match | |||
viewpoint, be encapsulated in an aggregation packet together with | viewpoint, be encapsulated in an aggregation packet together with | |||
its associated VCL NAL unit, as typically a non-VCL NAL unit would | its associated VCL NAL unit, as typically a non-VCL NAL unit would | |||
be meaningless without the associated VCL NAL unit being | be meaningless without the associated VCL NAL unit being | |||
available. | available. | |||
* For carrying exactly one NAL unit in an RTP packet, a single NAL | * For carrying exactly one NAL unit in an RTP packet, a single NAL | |||
skipping to change at page 34, line 22 ¶ | skipping to change at line 1524 ¶ | |||
the following description should be seen as an example of a suitable | the following description should be seen as an example of a suitable | |||
implementation. Other schemes may be used as well, as long as the | implementation. Other schemes may be used as well, as long as the | |||
output for the same input is the same as the process described below. | output for the same input is the same as the process described below. | |||
The output is the same when the set of output NAL units and their | The output is the same when the set of output NAL units and their | |||
order are both identical. Optimizations relative to the described | order are both identical. Optimizations relative to the described | |||
algorithms are possible. | algorithms are possible. | |||
All normal RTP mechanisms related to buffer management apply. In | All normal RTP mechanisms related to buffer management apply. In | |||
particular, duplicated or outdated RTP packets (as indicated by the | particular, duplicated or outdated RTP packets (as indicated by the | |||
RTP sequence number and the RTP timestamp) are removed. To determine | RTP sequence number and the RTP timestamp) are removed. To determine | |||
the exact time for decoding, factors such as a possible intentional | the exact time for decoding, factors, such as a possible intentional | |||
delay to allow for proper inter-stream synchronization MUST be | delay to allow for proper inter-stream synchronization, MUST be | |||
factored in. | factored in. | |||
NAL units with NAL unit type values in the range of 0 to 27, | NAL units with NAL unit type values in the range of 0 to 27, | |||
inclusive, may be passed to the decoder. NAL-unit-like structures | inclusive, may be passed to the decoder. NAL-unit-like structures | |||
with NAL unit type values in the range of 28 to 31, inclusive, MUST | with NAL unit type values in the range of 28 to 31, inclusive, MUST | |||
NOT be passed to the decoder. | NOT be passed to the decoder. | |||
The receiver includes a receiver buffer, which is used to compensate | The receiver includes a receiver buffer, which is used to compensate | |||
for transmission delay jitter within individual RTP stream, and to | for transmission delay jitter within individual RTP streams and to | |||
reorder NAL units from transmission order to the NAL unit decoding | reorder NAL units from transmission order to the NAL unit decoding | |||
order. In this section, the receiver operation is described under | order. In this section, the receiver operation is described under | |||
the assumption that there is no transmission delay jitter within an | the assumption that there is no transmission delay jitter within an | |||
RTP stream. To make a difference from a practical receiver buffer | RTP stream. To make a difference from a practical receiver buffer | |||
that is also used for compensation of transmission delay jitter, the | that is also used for compensation of transmission delay jitter, the | |||
receiver buffer is hereafter called the de-packetization buffer in | receiver buffer is hereafter called the de-packetization buffer in | |||
this section. Receivers should also prepare for transmission delay | this section. Receivers should also prepare for transmission delay | |||
jitter; that is, either reserve separate buffers for transmission | jitter, that is, either reserve separate buffers for transmission | |||
delay jitter buffering and de-packetization buffering or use a | delay jitter buffering and de-packetization buffering or use a | |||
receiver buffer for both transmission delay jitter and de- | receiver buffer for both transmission delay jitter and de- | |||
packetization. Moreover, receivers should take transmission delay | packetization. Moreover, receivers should take transmission delay | |||
jitter into account in the buffering operation, e.g., by additional | jitter into account in the buffering operation, e.g., by additional | |||
initial buffering before starting of decoding and playback. | initial buffering before starting of decoding and playback. | |||
The de-packetization process extracts the NAL units from the RTP | The de-packetization process extracts the NAL units from the RTP | |||
packets in an RTP stream as follows. When an RTP packet carries a | packets in an RTP stream as follows. When an RTP packet carries a | |||
single NAL unit packet, the payload of the RTP packet is extracted as | single NAL unit packet, the payload of the RTP packet is extracted as | |||
a single NAL unit, excluding the DONL field, i.e., third and fourth | a single NAL unit, excluding the DONL field, i.e., third and fourth | |||
bytes, when sprop-max-don-diff is greater than 0. When an RTP packet | bytes, when sprop-max-don-diff is greater than 0. When an RTP packet | |||
carries an Aggregation Packet, several NAL units are extracted from | carries an aggregation packet, several NAL units are extracted from | |||
the payload of the RTP packet. In this case, each NAL unit | the payload of the RTP packet. In this case, each NAL unit | |||
corresponds to the part of the payload of each aggregation unit that | corresponds to the part of the payload of each aggregation unit that | |||
follows the NALU size field as described in Section 4.3.2. When an | follows the NALU size field, as described in Section 4.3.2. When an | |||
RTP packet carries a Fragmentation Unit (FU), all RTP packets from | RTP packet carries a Fragmentation Unit (FU), all RTP packets from | |||
the first FU (with the S field equal to 1) of the fragmented NAL unit | the first FU (with the S field equal to 1) of the fragmented NAL unit | |||
up to the last FU (with the E field equal to 1) of the fragmented NAL | up to the last FU (with the E field equal to 1) of the fragmented NAL | |||
unit are collected. The NAL unit is extracted from these RTP packets | unit are collected. The NAL unit is extracted from these RTP packets | |||
by concatenating all FU payloads in the same order as the | by concatenating all FU payloads in the same order as the | |||
corresponding RTP packets and appending the NAL unit header with the | corresponding RTP packets and appending the NAL unit header with the | |||
fields F, LayerId, and TID, set to equal to the values of the fields | fields F, LayerId, and TID set to equal the values of the fields F, | |||
F, LayerId, and TID in the payload header of the FUs respectively, | LayerId, and TID in the payload header of the FUs, respectively, and | |||
and with the NAL unit type set equal to the value of the field FuType | with the NAL unit type set equal to the value of the field FuType in | |||
in the FU header of the FUs, as described in Section 4.3.3. | the FU header of the FUs, as described in Section 4.3.3. | |||
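The FU case of the de-packetization process can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: it assumes the two-byte payload header layout F (1 bit), Z (1 bit), LayerId (6 bits), Type (5 bits), TID (3 bits), a one-byte FU header with FuType in the low 5 bits, and a two-byte DONL in the first fragment only when sprop-max-don-diff is greater than 0. It also assumes the caller has already collected all fragments of one NAL unit in RTP order. The function name reassemble_fus is hypothetical. The reconstructed NAL unit header is placed in front of the concatenated FU payloads.

```python
def reassemble_fus(fu_packets: list[bytes], sprop_max_don_diff: int = 0) -> bytes:
    """Rebuild one NAL unit from the RTP payloads of its FUs, given in RTP order."""
    if len(fu_packets) < 2:
        raise ValueError("a fragmented NAL unit needs at least two FUs")
    if not fu_packets[0][2] & 0x80:
        raise ValueError("first FU must have the S bit set")
    if not fu_packets[-1][2] & 0x40:
        raise ValueError("last FU must have the E bit set")

    payload_hdr = fu_packets[0][:2]
    fu_type = fu_packets[0][2] & 0x1F

    # NAL unit header: F, Z, and LayerId are copied from the payload header;
    # the Type field is replaced by FuType; TID is kept.
    nal_header = bytes([payload_hdr[0], (fu_type << 3) | (payload_hdr[1] & 0x07)])

    body = bytearray()
    for index, packet in enumerate(fu_packets):
        offset = 3  # two-byte payload header plus one-byte FU header
        if index == 0 and sprop_max_don_diff > 0:
            offset += 2  # skip DONL, present only in the first fragment
        if len(packet) <= offset:
            raise ValueError("an FU payload must not be empty")
        body += packet[offset:]
    return nal_header + bytes(body)
```

Loss handling (discarding later fragments, or emitting an incomplete NAL unit with forbidden_zero_bit set to 1) is deliberately left out of this sketch.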
When sprop-max-don-diff is equal to 0, the de-packetization buffer | When sprop-max-don-diff is equal to 0, the de-packetization buffer | |||
size is zero bytes, and the NAL units carried in the single RTP | size is zero bytes, and the NAL units carried in the single RTP | |||
stream are directly passed to the decoder in their transmission | stream are directly passed to the decoder in their transmission | |||
order, which is identical to their decoding order. | order, which is identical to their decoding order. | |||
When sprop-max-don-diff is greater than 0, the process described in | When sprop-max-don-diff is greater than 0, the process described in | |||
the remainder of this section applies. | the remainder of this section applies. | |||
There are two buffering states in the receiver: initial buffering and | There are two buffering states in the receiver: initial buffering and | |||
skipping to change at page 35, line 47 ¶ | skipping to change at line 1598 ¶ | |||
Initial buffering lasts until the difference between the greatest and | Initial buffering lasts until the difference between the greatest and | |||
smallest AbsDon values of the NAL units in the de-packetization | smallest AbsDon values of the NAL units in the de-packetization | |||
buffer is greater than or equal to the value of sprop-max-don-diff. | buffer is greater than or equal to the value of sprop-max-don-diff. | |||
After initial buffering, whenever the difference between the greatest | After initial buffering, whenever the difference between the greatest | |||
and smallest AbsDon values of the NAL units in the de-packetization | and smallest AbsDon values of the NAL units in the de-packetization | |||
buffer is greater than or equal to the value of sprop-max-don-diff, | buffer is greater than or equal to the value of sprop-max-don-diff, | |||
the following operation is repeatedly applied until this difference | the following operation is repeatedly applied until this difference | |||
is smaller than sprop-max-don-diff: | is smaller than sprop-max-don-diff: | |||
* The NAL unit in the de-packetization buffer with the smallest | The NAL unit in the de-packetization buffer with the smallest | |||
value of AbsDon is removed from the de-packetization buffer and | value of AbsDon is removed from the de-packetization buffer and | |||
passed to the decoder. | passed to the decoder. | |||
When no more NAL units are flowing into the de-packetization buffer, | When no more NAL units are flowing into the de-packetization buffer, | |||
all NAL units remaining in the de-packetization buffer are removed | all NAL units remaining in the de-packetization buffer are removed | |||
from the buffer and passed to the decoder in the order of increasing | from the buffer and passed to the decoder in the order of increasing | |||
AbsDon values. | AbsDon values. | |||
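The buffering rule above can be sketched with a min-heap keyed on AbsDon. The sketch covers only the AbsDon-spread condition visible in this excerpt; any further conditions in the skipped part of the section (for example, limits based on buffer size) are not modeled. The class name DepacketizationBuffer and its methods are hypothetical.

```python
import heapq


class DepacketizationBuffer:
    """Reorders NAL units by AbsDon; intended for sprop-max-don-diff > 0 only."""

    def __init__(self, sprop_max_don_diff: int):
        if sprop_max_don_diff <= 0:
            raise ValueError("with sprop-max-don-diff equal to 0, no buffer is used")
        self.max_don_diff = sprop_max_don_diff
        self._heap = []       # entries are (abs_don, arrival_index, nal_unit)
        self._arrival = 0     # tie-breaker so equal AbsDon values keep arrival order
        self._greatest = None  # greatest AbsDon currently in the buffer

    def push(self, abs_don: int, nal_unit: bytes) -> list[bytes]:
        """Insert one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrival, nal_unit))
        self._arrival += 1
        if self._greatest is None or abs_don > self._greatest:
            self._greatest = abs_don
        released = []
        # Release the smallest AbsDon while (greatest - smallest) >= sprop-max-don-diff.
        # The entry holding the greatest AbsDon is never popped here, because the
        # spread drops to 0 (< sprop-max-don-diff) before it would be reached.
        while self._greatest - self._heap[0][0] >= self.max_don_diff:
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self) -> list[bytes]:
        """No more input: release everything in increasing AbsDon order."""
        released = [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]
        self._greatest = None
        return released
```

During initial buffering, push simply returns an empty list until the spread first reaches sprop-max-don-diff, so the two buffering states do not need separate code paths in this sketch.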
7. Payload Format Parameters | 7. Payload Format Parameters | |||
This section specifies the optional parameters. A mapping of the | This section specifies the optional parameters. A mapping of the | |||
parameters with Session Description Protocol (SDP) [RFC4556] is also | parameters with Session Description Protocol (SDP) [RFC8866] is also | |||
provided for applications that use SDP. | provided for applications that use SDP. | |||
Parameters starting with the string "sprop" for stream properties can | Parameters starting with the string "sprop" for stream properties can | |||
be used by a sender to provide a receiver with the properties of the | be used by a sender to provide a receiver with the properties of the | |||
stream that is or will be sent. The media sender (and not the | stream that is or will be sent. The media sender (and not the | |||
receiver) selects whether, and with what values, "sprop" parameters | receiver) selects whether, and with what values, "sprop" parameters | |||
are being sent. This uncommon characteristic of the "sprop" | are being sent. This uncommon characteristic of the "sprop" | |||
parameters may not be intuitive in the context of some signaling | parameters may not be intuitive in the context of some signaling | |||
protocol concepts, especially with offer/answer. Please see | protocol concepts, especially with offer/answer. Please see | |||
Section 7.3.2 for guidance specific to the use of sprop parameters in | Section 7.3.2 for guidance specific to the use of sprop parameters in | |||
the Offer/Answer case. | the offer/answer case. | |||
7.1. Media Type Registration | 7.1. Media Type Registration | |||
The receiver MUST ignore any parameter unspecified in this memo. | The receiver MUST ignore any parameter unspecified in this memo. | |||
Type name: video | Type name: video | |||
Subtype name: H266 | Subtype name: H266 | |||
Required parameters: N/A | Required parameters: N/A | |||
Optional parameters: | Optional parameters: profile-id, tier-flag, sub-profile-id, interop- | |||
profile-id, tier-flag, sub-profile-id, interop-constraints, level- | constraints, level-id, sprop-sublayer-id, sprop-ols-id, recv- | |||
id, sprop-sublayer-id, sprop-ols-id, recv-sublayer-id, recv-ols- | sublayer-id, recv-ols-id, max-recv-level-id, sprop-dci, sprop-vps, | |||
id, max-recv-level-id, sprop-dci, sprop-vps, sprop-sps, sprop-pps, | sprop-sps, sprop-pps, sprop-sei, max-lsr, max-fps, sprop-max-don- | |||
sprop-sei, max-lsr, max-fps, sprop-max-don-diff, sprop-depack-buf- | diff, sprop-depack-buf-bytes, depack-buf-cap (refer to Section 7.2 | |||
bytes, depack-buf-cap (Refer to Section 7.2 for definitions). | for definitions). | |||
Encoding considerations: | Encoding considerations: This type is only defined for transfer via | |||
This type is only defined for transfer via RTP (RFC 3550). | RTP [RFC3550]. | |||
Security considerations: | Security considerations: See Section 9 of RFC 9328. | |||
See Section 9 of RFC XXXX. | ||||
Interoperability considerations: N/A | Interoperability considerations: N/A | |||
Published specification: | Published specification: Please refer to RFC 9328 and VVC coding | |||
Please refer to RFC XXXX and VVC coding specification [VVC]. | specification [VVC]. | |||
Applications that use this media type: | Applications that use this media type: Any application that relies | |||
Any application that relies on VVC-based video services over RTP | on VVC-based video services over RTP | |||
Fragment identifier considerations: N/A | Fragment identifier considerations: N/A | |||
Additional information: N/A | Additional information: N/A | |||
Person & email address to contact for further information: | Person & email address to contact for further information: | |||
Stephan Wenger (stewe@stewe.org) | Stephan Wenger (stewe@stewe.org) | |||
Intended usage: COMMON | Intended usage: COMMON | |||
Restrictions on usage: N/A | Restrictions on usage: N/A | |||
Author: See Authors' Addresses section of RFC XXXX. | Author: See Authors' Addresses section of RFC 9328. | |||
Change controller: | Change controller: IETF <avtcore@ietf.org> | |||
IETF <avtcore@ietf.org> | ||||
7.2. Optional Parameters Definition | 7.2. Optional Parameters Definition | |||
profile-id, tier-flag, sub-profile-id, interop-constraints, and | profile-id, tier-flag, sub-profile-id, interop-constraints, and | |||
level-id: | level-id: | |||
These parameters indicate the profile, the tier, the default | ||||
These parameters indicate the profile, tier, default level, sub- | level, the sub-profile, and some constraints of the bitstream | |||
profile, and some constraints of the bitstream carried by the RTP | carried by the RTP stream, or a specific set of the profile, the | |||
stream, or a specific set of the profile, tier, default level, | tier, the default level, the sub-profile, and some constraints the | |||
sub-profile and some constraints the receiver supports. | receiver supports. | |||
The subset of coding tools that may have been used to generate the | The subset of coding tools that may have been used to generate the | |||
bitstream or that the receiver supports, as well as some | bitstream or that the receiver supports, as well as some | |||
additional constraints are indicated collectively by profile-id, | additional constraints, are indicated collectively by profile-id, | |||
sub-profile-id, and interop-constraints. | sub-profile-id, and interop-constraints. | |||
Informative note: There are 128 values of profile-id. The | | Informative note: There are 128 values of profile-id. The | |||
subset of coding tools identified by the profile-id can be | | subset of coding tools identified by profile-id can be | |||
further constrained with up to 255 instances of sub-profile-id. | | further constrained with up to 255 instances of sub-profile- | |||
In addition, 68 bits included in interop-constraints, which can | | id. In addition, 68 bits included in interop-constraints, | |||
be extended up to 324 bits provide means to further restrict | | which can be extended up to 324 bits, provide means to | |||
tools from existing profiles. To be able to support this fine- | | further restrict tools from existing profiles. To be able | |||
granular signaling of coding tool subsets with profile-id, sub- | | to support this fine-granular signaling of coding-tool | |||
profile-id and interop-constraints, it would be safe to require | | subsets with profile-id, sub-profile-id, and interop- | |||
symmetric use of these parameters in SDP offer/answer unless | | constraints, it would be safe to require symmetric use of | |||
recv-ols-id is included in the SDP answer for choosing one of | | these parameters in SDP offer/answer unless recv-ols-id is | |||
the layers offered. | | included in the SDP answer for choosing one of the layers | |||
| offered. | ||||
The tier is indicated by tier-flag. The default level is | The tier is indicated by tier-flag. The default level is | |||
indicated by level-id. The tier and the default level specify the | indicated by level-id. The tier and the default level specify the | |||
limits on values of syntax elements or arithmetic combinations of | limits on values of syntax elements or arithmetic combinations of | |||
values of syntax elements that are followed when generating the | values of syntax elements that are followed when generating the | |||
bitstream or that the receiver supports. | bitstream or that the receiver supports. | |||
In SDP offer/answer, when the SDP answer does not include the | In SDP offer/answer, when the SDP answer does not include the | |||
recv-ols-id parameter that is less than the sprop-ols-id parameter | recv-ols-id parameter that is less than the sprop-ols-id parameter | |||
in the SDP offer, the following applies: | in the SDP offer, the following applies: | |||
- The tier-flag, profile-id, sub-profile-id, and interop- | * The tier-flag, profile-id, sub-profile-id, and interop- | |||
constraints parameters MUST be used symmetrically, i.e., the | constraints parameters MUST be used symmetrically, i.e., the | |||
value of each of these parameters in the offer MUST be the same | value of each of these parameters in the offer MUST be the same | |||
as that in the answer, either explicitly signaled or implicitly | as that in the answer, either explicitly signaled or implicitly | |||
inferred. | inferred. | |||
- The level-id parameter is changeable as long as the highest | * The level-id parameter is changeable as long as the highest | |||
level indicated by the answer is either equal to or lower than | level indicated by the answer is either equal to or lower than | |||
that in the offer. Note that a highest level higher than | that in the offer. Note that the highest level higher than | |||
level-id in the offer for receiving can be included as max- | level-id in the offer for receiving can be included as max- | |||
recv-level-id. | recv-level-id. | |||
In SDP offer/answer, when the SDP answer does include the recv- | In SDP offer/answer, when the SDP answer does include the recv- | |||
ols-id parameter that is less than the sprop-ols-id parameter | ols-id parameter that is less than the sprop-ols-id parameter in | |||
in the SDP offer, the set of tier-flag, profile-id, sub- | the SDP offer, the set of tier-flag, profile-id, sub-profile-id, | |||
profile-id, interop-constraints, and level-id parameters | interop-constraints, and level-id parameters included in the | |||
included in the answer MUST be consistent with that for the | answer MUST be consistent with that for the chosen output layer | |||
chosen output layer set as indicated in the SDP offer, with the | set as indicated in the SDP offer, with the exception that the | |||
exception that the level-id parameter in the SDP answer is | level-id parameter in the SDP answer is changeable as long as the | |||
changeable as long as the highest level indicated by the answer | highest level indicated by the answer is either lower than or | |||
is either lower than or equal to that in the offer. | equal to that in the offer. | |||
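The first offer/answer case above (the answer does not carry a recv-ols-id lower than the offered sprop-ols-id) can be sketched as a simple check. The sketch makes several assumptions: both parameter sets are passed as dictionaries in which inferred defaults have already been filled in, all values are integers or comparable strings, and level-id values are ordered numerically so that a larger value means a higher level. The names SYMMETRIC_PARAMS and answer_is_consistent are hypothetical. The second case, where a lower recv-ols-id selects an output layer set, needs per-OLS information and is not modeled.

```python
SYMMETRIC_PARAMS = ("tier-flag", "profile-id", "sub-profile-id", "interop-constraints")


def answer_is_consistent(offer: dict, answer: dict) -> bool:
    """Check the symmetric-parameter and level-id rules for an SDP answer."""
    recv_ols = answer.get("recv-ols-id")
    sprop_ols = offer.get("sprop-ols-id")
    if recv_ols is not None and sprop_ols is not None and int(recv_ols) < int(sprop_ols):
        # An output layer set was chosen; that case needs per-OLS data.
        raise NotImplementedError("chosen-OLS case is outside this sketch")

    # tier-flag, profile-id, sub-profile-id, and interop-constraints
    # must have the same value in the offer and in the answer.
    for name in SYMMETRIC_PARAMS:
        if offer.get(name) != answer.get(name):
            return False

    # level-id may change, but the answer must not exceed the offer.
    # Assumption: numeric order of level-id matches level order.
    return int(answer["level-id"]) <= int(offer["level-id"])
```

A caller would resolve defaults first (for example, profile-id inferred as 1 when absent) and then pass the resolved dictionaries to this check.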
More specifications of these parameters, including how they relate | More specifications of these parameters, including how they relate | |||
to syntax elements specified in [VVC] are provided below. | to syntax elements specified in [VVC], are provided below. | |||
profile-id: | profile-id: | |||
When profile-id is not present, a value of 1 (i.e., the Main 10 | When profile-id is not present, a value of 1 (i.e., the Main 10 | |||
profile) MUST be inferred. | profile) MUST be inferred. | |||
When used to indicate properties of a bitstream, profile-id is | When used to indicate properties of a bitstream, profile-id is | |||
derived from the general_profile_idc syntax element that applies | derived from the general_profile_idc syntax element that applies | |||
to the bitstream in an instance of the profile_tier_level( ) | to the bitstream in an instance of the profile_tier_level( ) | |||
syntax structure. | syntax structure. | |||
VVC bitstreams transported over RTP using the technologies of this | VVC bitstreams transported over RTP using the technologies of this | |||
memo SHOULD contain only a single profile_tier_level( ) structure | memo SHOULD contain only a single profile_tier_level( ) structure | |||
in the DCI, unless the sender can assure that a receiver can | in the DCI, unless the sender can assure that a receiver can | |||
correctly decode the VVC bitstream regardless of which | correctly decode the VVC bitstream, regardless of which | |||
profile_tier_level( ) structure contained in the DCI was used for | profile_tier_level( ) structure contained in the DCI was used for | |||
deriving profile-id and other parameters for the SDP O/A exchange. | deriving profile-id and other parameters for the SDP offer/answer | |||
exchange. | ||||
As specified in [VVC], a profile_tier_level( ) syntax structure | As specified in [VVC], a profile_tier_level( ) syntax structure | |||
may be contained in an SPS NAL unit, and one or more | may be contained in an SPS NAL unit, and one or more | |||
profile_tier_level( ) syntax structures may be contained in a VPS | profile_tier_level( ) syntax structures may be contained in a VPS | |||
NAL unit and in a DCI NAL unit. One of the following three cases | NAL unit and in a DCI NAL unit. One of the following three cases | |||
applies to the container NAL unit of the profile_tier_level( ) | applies to the container NAL unit of the profile_tier_level( ) | |||
syntax structure containing syntax elements used to derive the | syntax structure containing syntax elements used to derive the | |||
values of profile-id, tier-flag, level-id, sub-profile-id, or | values of profile-id, tier-flag, level-id, sub-profile-id, or | |||
interop-constraints: 1) The container NAL unit is an SPS, the | interop-constraints: | |||
bitstream is a single-layer bitstream, and the profile_tier_level( | ||||
) syntax structures in all SPSs referenced by the CVSs in the | 1. The container NAL unit is an SPS, the bitstream is a single- | |||
bitstream has the same values respectively for those | layer bitstream, and the profile_tier_level( ) syntax | |||
profile_tier_level( ) syntax elements; 2) The container NAL unit | structures in all SPSs referenced by the CVSs in the bitstream | |||
is a VPS, the profile_tier_level( ) syntax structure is the one in | have the same values respectively for those | |||
the VPS that applies to the OLS corresponding to the bitstream, | profile_tier_level( ) syntax elements. | |||
and the profile_tier_level( ) syntax structures applicable to the | ||||
OLS corresponding to the bitstream in all VPSs referenced by the | 2. The container NAL unit is a VPS, the profile_tier_level( ) | |||
CVSs in the bitstream have the same values respectively for those | syntax structure is the one in the VPS that applies to the OLS | |||
profile_tier_level( ) syntax elements; 3) The container NAL unit | corresponding to the bitstream, and the profile_tier_level( ) | |||
is a DCI NAL unit and the profile_tier_level( ) syntax structures | syntax structures applicable to the OLS corresponding to the | |||
in all DCI NAL units in the bitstream has the same values | bitstream in all VPSs referenced by the CVSs in the bitstream | |||
respectively for those profile_tier_level( ) syntax elements. | have the same values respectively for those | |||
profile_tier_level( ) syntax elements. | ||||
3. The container NAL unit is a DCI NAL unit, and the | ||||
profile_tier_level( ) syntax structures in all DCI NAL units | ||||
in the bitstream have the same values respectively for those | ||||
profile_tier_level( ) syntax elements. | ||||
[VVC] allows for multiple profile_tier_level( ) structures in a | [VVC] allows for multiple profile_tier_level( ) structures in a | |||
DCI NAL unit, which may contain different values for the syntax | DCI NAL unit, which may contain different values for the syntax | |||
elements used to derive the values of profile-id, tier-flag, | elements used to derive the values of profile-id, tier-flag, | |||
level-id, sub-profile-id, or interop-constraints in the different | level-id, sub-profile-id, or interop-constraints in the different | |||
entries. However, herein defined is only a single profile-id, | entries. However, herein defined is only a single profile-id, | |||
tier-flag, level-id, sub-profile-id, or interop-constraints. When | tier-flag, level-id, sub-profile-id, or interop-constraints. When | |||
signaling these parameters and a DCI NAL unit is present with | signaling these parameters and a DCI NAL unit is present with | |||
multiple profile_tier_level( ) structures, these values SHOULD be | multiple profile_tier_level( ) structures, these values SHOULD be | |||
the same as the first profile_tier_level structure in the DCI, | the same as the first profile_tier_level structure in the DCI, | |||
unless the sender has ensured that the receiver can decode the | unless the sender has ensured that the receiver can decode the | |||
bitstream when a different value is chosen. | bitstream when a different value is chosen. | |||
tier-flag, level-id: | tier-flag, level-id: | |||
The value of tier-flag MUST be in the range of 0 to 1, inclusive. | The value of tier-flag MUST be in the range of 0 to 1, inclusive. | |||
The value of level-id MUST be in the range of 0 to 255, inclusive. | The value of level-id MUST be in the range of 0 to 255, inclusive. | |||
If the tier-flag and level-id parameters are used to indicate | If the tier-flag and level-id parameters are used to indicate | |||
properties of a bitstream, they indicate the tier and the highest | properties of a bitstream, they indicate the tier and the highest | |||
level the bitstream complies with. | level the bitstream complies with. | |||
If the tier-flag and level-id parameters are used for capability | If the tier-flag and level-id parameters are used for capability | |||
exchange, the following applies. If max-recv-level-id is not | exchange, the following applies. If max-recv-level-id is not | |||
present, the default level defined by level-id indicates the | present, the default level defined by level-id indicates the | |||
highest level the codec wishes to support. Otherwise, max-recv- | highest level the codec wishes to support. Otherwise, max-recv- | |||
level-id indicates the highest level the codec supports for | level-id indicates the highest level the codec supports for | |||
receiving. For either receiving or sending, all levels that are | receiving. For either receiving or sending, all levels that are | |||
lower than the highest level supported MUST also be supported. | lower than the highest level supported MUST also be supported. | |||
If no tier-flag is present, a value of 0 MUST be inferred; if no | If no tier-flag is present, a value of 0 MUST be inferred; if no | |||
level-id is present, a value of 51 (i.e., level 3.1) MUST be | level-id is present, a value of 51 (i.e., level 3.1) MUST be | |||
inferred. | inferred. | |||
Informative note: The level values currently defined in the VVC | | Informative note: The level values currently defined in the | |||
specification are in the form of "majorNum.minorNum", and the | | VVC specification are in the form of "majorNum.minorNum", | |||
value of the level-id for each of the levels is equal to | | and the value of the level-id for each of the levels is | |||
majorNum * 16 + minorNum * 3. It is expected that if any | | equal to majorNum * 16 + minorNum * 3. It is expected that, | |||
levels are defined in the future, the same convention will be | | if any levels are defined in the future, the same convention | |||
used, but this cannot be guaranteed. | | will be used, but this cannot be guaranteed. | |||
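
Informative sketch: the level-id convention in the note above can be written out as a pair of helper functions. The function names are illustrative and are not defined by this memo or by [VVC]; the inverse mapping assumes the "majorNum * 16 + minorNum * 3" convention holds, which the note says cannot be guaranteed for future levels.

```python
def level_id_from_level(major: int, minor: int) -> int:
    """Map a level written as "major.minor" to its level-id value."""
    level_id = major * 16 + minor * 3
    if not 0 <= level_id <= 255:
        raise ValueError("level-id must be in the range 0 to 255")
    return level_id


def level_from_level_id(level_id: int) -> tuple[int, int]:
    """Recover (major, minor), assuming the current convention applies."""
    if not 0 <= level_id <= 255:
        raise ValueError("level-id must be in the range 0 to 255")
    major, rest = divmod(level_id, 16)
    if rest % 3 != 0:
        raise ValueError("level-id does not follow the current convention")
    return major, rest // 3
```

For example, level 3.1 maps to 3 * 16 + 1 * 3 = 51, which is the value inferred when level-id is absent.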
When used to indicate properties of a bitstream, the tier-flag and | When used to indicate properties of a bitstream, the tier-flag and | |||
level-id parameters are derived respectively from the syntax | level-id parameters are derived respectively from the syntax | |||
element general_tier_flag, and the syntax element | element general_tier_flag, and the syntax element | |||
general_level_idc or sub_layer_level_idc[j], that apply to the | general_level_idc or sub_layer_level_idc[j], that apply to the | |||
bitstream, in an instance of the profile_tier_level( ) syntax | bitstream in an instance of the profile_tier_level( ) syntax | |||
structure. | structure. | |||
If the tier-flag and level-id are derived from the | If the tier-flag and level-id are derived from the | |||
profile_tier_level( ) syntax structure in a DCI NAL unit, the | profile_tier_level( ) syntax structure in a DCI NAL unit, the | |||
following applies: | following applies: | |||
- tier-flag = general_tier_flag | * tier-flag = general_tier_flag | |||
- level-id = general_level_idc | * level-id = general_level_idc | |||
Otherwise, if the tier-flag and level-id are derived from the | Otherwise, if the tier-flag and level-id are derived from the | |||
profile_tier_level( ) syntax structure in an SPS or VPS NAL unit, | profile_tier_level( ) syntax structure in an SPS or VPS NAL unit, | |||
and the bitstream contains the highest sublayer representation in | and the bitstream contains the highest sublayer representation in | |||
the OLS corresponding to the bitstream, the following applies: | the OLS corresponding to the bitstream, the following applies: | |||
- tier-flag = general_tier_flag | * tier-flag = general_tier_flag | |||
- level-id = general_level_idc | ||||
* level-id = general_level_idc | ||||
Otherwise, if the tier-flag and level-id are derived from the | Otherwise, if the tier-flag and level-id are derived from the | |||
profile_tier_level( ) syntax structure in an SPS or VPS NAL | profile_tier_level( ) syntax structure in an SPS or VPS NAL unit, | |||
unit, and the bitstream does not contain the highest sublayer | and the bitstream does not contain the highest sublayer | |||
representation in the OLS corresponding to the bitstream, the | representation in the OLS corresponding to the bitstream, the | |||
following applies, with j being the value of the sprop- | following applies, with j being the value of the sprop-sublayer-id | |||
sublayer-id parameter: | parameter: | |||
- tier-flag = general_tier_flag | * tier-flag = general_tier_flag | |||
- level-id = sub_layer_level_idc[j] | * level-id = sub_layer_level_idc[j] | |||
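
Informative sketch: the three derivation cases above can be expressed as a single function. The function name, the string labels for the container NAL unit type, and the argument layout are assumptions made for illustration; the syntax element values are taken as already parsed from the profile_tier_level( ) syntax structure.

```python
from typing import Sequence, Tuple


def derive_tier_and_level(
    container: str,
    general_tier_flag: int,
    general_level_idc: int,
    sub_layer_level_idc: Sequence[int] = (),
    has_highest_sublayer: bool = True,
    sprop_sublayer_id: int = 6,
) -> Tuple[int, int]:
    """Return (tier-flag, level-id) for a bitstream.

    container is "DCI", "SPS", or "VPS" (hypothetical labels).
    has_highest_sublayer states whether the bitstream contains the
    highest sublayer representation in the corresponding OLS.
    """
    if container not in ("DCI", "SPS", "VPS"):
        raise ValueError("unknown container NAL unit type")
    # Case 1 (DCI) and case 2 (SPS/VPS with the highest sublayer present)
    # both use general_level_idc.
    if container == "DCI" or has_highest_sublayer:
        return general_tier_flag, general_level_idc
    # Case 3: SPS/VPS without the highest sublayer representation;
    # j is the value of the sprop-sublayer-id parameter.
    return general_tier_flag, sub_layer_level_idc[sprop_sublayer_id]
```

In all three cases, tier-flag is general_tier_flag; only the source of level-id differs.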
sub-profile-id: | sub-profile-id: | |||
The value of the parameter is a comma-separated (',') list of data | The value of the parameter is a comma-separated (',') list of data | |||
using base64 encoding (Section 4 of [RFC4648]) representation | using base64 encoding (Section 4 of [RFC4648]) representation | |||
without "==" padding. | without "==" padding. | |||
When used to indicate properties of a bitstream, sub-profile-id is | When used to indicate properties of a bitstream, sub-profile-id is | |||
derived from each of the ptl_num_sub_profiles | derived from each of the ptl_num_sub_profiles | |||
general_sub_profile_idc[i] syntax elements that apply to the | general_sub_profile_idc[i] syntax elements that apply to the | |||
bitstream in a profile_tier_level( ) syntax structure. | bitstream in a profile_tier_level( ) syntax structure. | |||
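
Informative sketch: one way to build and parse a sub-profile-id value. This sketch assumes that each general_sub_profile_idc[i] value is serialized as 4 bytes in network byte order before base64 encoding, which is why unpadded output would otherwise end in "=="; that serialization and the function names are assumptions for illustration.

```python
import base64
import struct
from typing import List, Sequence


def encode_sub_profile_id(values: Sequence[int]) -> str:
    """Encode sub-profile values as a comma-separated base64 list."""
    parts = []
    for value in values:
        raw = struct.pack(">I", value)  # assumed: 32 bits, big-endian
        # Drop the trailing "==" padding, as the parameter definition asks.
        parts.append(base64.b64encode(raw).decode("ascii").rstrip("="))
    return ",".join(parts)


def decode_sub_profile_id(text: str) -> List[int]:
    """Decode a comma-separated, unpadded base64 list back to integers."""
    values = []
    for item in text.split(","):
        padded = item + "=" * (-len(item) % 4)  # restore padding
        values.append(struct.unpack(">I", base64.b64decode(padded))[0])
    return values
```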
interop-constraints: | interop-constraints: | |||
A base64 encoding (Section 4 of [RFC4648]) representation of the | A base64 encoding (Section 4 of [RFC4648]) representation of the | |||
data that includes the syntax elements | data that includes the ptl_frame_only_constraint_flag syntax | |||
ptl_frame_only_constraint_flag and ptl_multilayer_enabled_flag and | element, the ptl_multilayer_enabled_flag syntax element, and the | |||
the general_constraints_info( ) syntax structure that apply to the | general_constraints_info( ) syntax structure that apply to the | |||
bitstream in an instance of the profile_tier_level( ) syntax | bitstream in an instance of the profile_tier_level( ) syntax | |||
structure. | structure. | |||
If the interop-constraints parameter is not present, the following | If the interop-constraints parameter is not present, the following | |||
MUST be inferred: | MUST be inferred: | |||
- ptl_frame_only_constraint_flag = 1 | * ptl_frame_only_constraint_flag = 1 | |||
- ptl_multilayer_enabled_flag = 0 | * ptl_multilayer_enabled_flag = 0 | |||
- gci_present_flag in the general_constraints_info( ) syntax | * gci_present_flag in the general_constraints_info( ) syntax | |||
structure = 0 | structure = 0 | |||
Using interop-constraints for capability exchange results in a | Using interop-constraints for capability exchange results in a | |||
requirement on any bitstream to be compliant with the interop- | requirement on any bitstream to be compliant with the interop- | |||
constraints. | constraints. | |||
sprop-sublayer-id: | sprop-sublayer-id: | |||
This parameter MAY be used to indicate the highest allowed value | This parameter MAY be used to indicate the highest allowed value | |||
of TID in the bitstream. When not present, the value of sprop- | of TID in the bitstream. When not present, the value of sprop- | |||
sublayer-id is inferred to be equal to 6. | sublayer-id is inferred to be equal to 6. | |||
The value of sprop-sublayer-id MUST be in the range of 0 to 6, | The value of sprop-sublayer-id MUST be in the range of 0 to 6, | |||
inclusive. | inclusive. | |||
sprop-ols-id: | sprop-ols-id: | |||
This parameter MAY be used to indicate the OLS that the bitstream | This parameter MAY be used to indicate the OLS that the bitstream | |||
applies to. When not present, the value of sprop-ols-id is | applies to. When not present, the value of sprop-ols-id is | |||
inferred to be equal to TargetOlsIdx as specified in 8.1.1 in | inferred to be equal to TargetOlsIdx, as specified in | |||
[VVC]. If this optional parameter is present, sprop-vps MUST also | Section 8.1.1 of [VVC]. If this optional parameter is present, | |||
be present or its content MUST be known a priori at the receiver. | sprop-vps MUST also be present or its content MUST be known a | |||
priori at the receiver. | ||||
The value of sprop-ols-id MUST be in the range of 0 to 256, | The value of sprop-ols-id MUST be in the range of 0 to 256, | |||
inclusive. | inclusive. | |||
Informative note: VVC allows having up to 257 output layer sets | | Informative note: VVC allows having up to 257 output layer | |||
indicated in the VPS as the number of output layer sets minus 2 | | sets indicated in the VPS, as the number of output layer | |||
is indicated with a field of 8 bits. | | sets minus 2 is indicated with a field of 8 bits. | |||
recv-sublayer-id: | recv-sublayer-id: | |||
This parameter MAY be used to signal a receiver's choice of the | This parameter MAY be used to signal a receiver's choice of the | |||
offered or declared sublayer representations in the sprop-vps and | offered or declared sublayer representations in sprop-vps and | |||
sprop-sps. The value of recv-sublayer-id indicates the TID of the | sprop-sps. The value of recv-sublayer-id indicates the TID of the | |||
highest sublayer that a receiver supports. When not present, the | highest sublayer that a receiver supports. When not present, the | |||
value of recv-sublayer-id is inferred to be equal to the value of | value of recv-sublayer-id is inferred to be equal to the value of | |||
the sprop-sublayer-id parameter in the SDP offer. | the sprop-sublayer-id parameter in the SDP offer. | |||
The value of recv-sublayer-id MUST be in the range of 0 to 6, | The value of recv-sublayer-id MUST be in the range of 0 to 6, | |||
inclusive. | inclusive. | |||
recv-ols-id: | recv-ols-id: | |||
This parameter MAY be used to signal a receiver's choice of the | This parameter MAY be used to signal a receiver's choice of the | |||
offered or declared output layer sets in the sprop-vps. The value | offered or declared output layer sets in sprop-vps. The value of | |||
of recv-ols-id indicates the OLS index of the bitstream that a | recv-ols-id indicates the OLS index of the bitstream that a | |||
receiver supports. When not present, the value of recv-ols-id is | receiver supports. When not present, the value of recv-ols-id is | |||
inferred to be equal to value of the sprop-ols-id parameter | inferred to be equal to the value of the sprop-ols-id parameter | |||
inferred from or indicated in the SDP offer. When present, the | inferred from or indicated in the SDP offer. When present, the | |||
value of recv-ols-id must be included only when sprop-ols-id was | value of recv-ols-id must be included only when sprop-ols-id was | |||
received and must refer to an output layer set in the VPS that | received and must refer to an output layer set in the VPS that | |||
includes no layers other than all or a subset of the layers of the | includes no layers other than all or a subset of the layers of the | |||
OLS referred to by sprop-ols-id. If this optional parameter is | OLS referred to by sprop-ols-id. If this optional parameter is | |||
present, sprop-vps must have been received or its content must be | present, sprop-vps must have been received or its content must be | |||
known a priori at the receiver. | known a priori at the receiver. | |||
The value of recv-ols-id MUST be in the range of 0 to 256, | The value of recv-ols-id MUST be in the range of 0 to 256, | |||
inclusive. | inclusive. | |||
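
Informative sketch: the value ranges and inferred defaults stated so far can be collected into a small validation helper. The dictionary layout and function name are illustrative. Defaults that depend on other inputs (sprop-ols-id from TargetOlsIdx, and recv-sublayer-id and recv-ols-id from the SDP offer) are deliberately not modeled here.

```python
from typing import Dict

# Inclusive ranges taken from the parameter definitions above.
RANGES = {
    "tier-flag": (0, 1),
    "level-id": (0, 255),
    "sprop-sublayer-id": (0, 6),
    "sprop-ols-id": (0, 256),
    "recv-sublayer-id": (0, 6),
    "recv-ols-id": (0, 256),
}

# Values inferred when the parameter is not present.
DEFAULTS = {
    "profile-id": 1,         # Main 10 profile
    "tier-flag": 0,
    "level-id": 51,          # level 3.1
    "sprop-sublayer-id": 6,
}


def check_and_fill(params: Dict[str, int]) -> Dict[str, int]:
    """Range-check the given parameters and add the fixed defaults."""
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
    return {**DEFAULTS, **params}
```

The upper bound of 256 for the OLS parameters follows from the informative note above: an 8-bit field carrying the number of output layer sets minus 2 allows up to 255 + 2 = 257 sets, indexed 0 to 256.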
skipping to change at page 43, line 23 ¶ | skipping to change at line 1947 ¶ | |||
The value of max-recv-level-id MUST be in the range of 0 to 255, | The value of max-recv-level-id MUST be in the range of 0 to 255, | |||
inclusive. | inclusive. | |||
When max-recv-level-id is not present, the value is inferred to be | When max-recv-level-id is not present, the value is inferred to be | |||
equal to level-id. | equal to level-id. | |||
max-recv-level-id MUST NOT be present when the highest level the | max-recv-level-id MUST NOT be present when the highest level the | |||
receiver supports is not higher than the default level. | receiver supports is not higher than the default level. | |||
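
Informative sketch: the rules for max-recv-level-id given above (range, inference, and the restriction on its presence) can be checked as follows. The function name is illustrative.

```python
from typing import Optional


def highest_receive_level(level_id: int,
                          max_recv_level_id: Optional[int] = None) -> int:
    """Return the highest level the receiver supports for receiving."""
    if max_recv_level_id is None:
        # Not present: inferred to be equal to level-id.
        return level_id
    if not 0 <= max_recv_level_id <= 255:
        raise ValueError("max-recv-level-id must be in the range 0 to 255")
    if max_recv_level_id <= level_id:
        # MUST NOT be present unless it exceeds the default level.
        raise ValueError("max-recv-level-id is not higher than level-id")
    return max_recv_level_id
```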
sprop-dci: | sprop-dci: | |||
This parameter MAY be used to convey a decoding capability | This parameter MAY be used to convey a decoding capability | |||
information NAL unit of the bitstream for out-of-band | information NAL unit of the bitstream for out-of-band | |||
transmission. The parameter MAY also be used for capability | transmission. The parameter MAY also be used for capability | |||
exchange. The value of the parameter a base64 encoding (Section 4 | exchange. The value of the parameter is a base64 encoding | |||
of [RFC4648]) representations of the decoding capability | (Section 4 of [RFC4648]) representation of the decoding capability | |||
information NAL unit as specified in Section 7.3.2.1 of [VVC]. | information NAL unit, as specified in Section 7.3.2.1 of [VVC]. | |||
sprop-vps: | sprop-vps: | |||
This parameter MAY be used to convey any video parameter set to | ||||
This parameter MAY be used to convey any video parameter set NAL | the NAL unit of the bitstream for out-of-band transmission of | |||
unit of the bitstream for out-of-band transmission of video | video parameter sets. The parameter MAY also be used for | |||
parameter sets. The parameter MAY also be used for capability | capability exchange and to indicate substream characteristics | |||
exchange and to indicate sub-stream characteristics (i.e., | (i.e., properties of output layer sets and sublayer | |||
properties of output layer sets and sublayer representations as | representations, as defined in [VVC]). The value of the parameter | |||
defined in [VVC]). The value of the parameter is a comma- | is a comma-separated (',') list of base64 encoding (Section 4 of | |||
separated (',') list of base64 encoding (Section 4 of [RFC4648]) | [RFC4648]) representations of the video parameter set NAL units, | |||
representations of the video parameter set NAL units as specified | as specified in Section 7.3.2.3 of [VVC]. | |||
in Section 7.3.2.3 of [VVC]. | ||||
The sprop-vps parameter MAY contain one or more than one video | The sprop-vps parameter MAY contain one or more than one video | |||
parameter set NAL units. However, all other video parameter sets | parameter set NAL units. However, all other video parameter sets | |||
contained in the sprop-vps parameter MUST be consistent with the | contained in the sprop-vps parameter MUST be consistent with the | |||
first video parameter set in the sprop-vps parameter. A video | first video parameter set in the sprop-vps parameter. A video | |||
parameter set vpsB is said to be consistent with another video | parameter set vpsB is said to be consistent with another video | |||
parameter set vpsA if the number of OLSs in vpsA and vpsB is the | parameter set vpsA if the number of OLSs in vpsA and vpsB are the | |||
same and any decoder that conforms to the profile, tier, level, | same and any decoder that conforms to the profile, tier, level, | |||
and constraints indicated by the data starting from the syntax | and constraints indicated by the data starting from the syntax | |||
element general_profile_idc to the syntax structure | element general_profile_idc to the syntax structure | |||
general_constraints_info(), inclusive, in the profile_tier_level( | general_constraints_info(), inclusive, in the profile_tier_level( | |||
) syntax structure corresponding to any OLS with index olsIdx in | ) syntax structure corresponding to any OLS with index olsIdx in | |||
vpsA can decode any CVS(s) referencing vpsB when TargetOlsIdx is | vpsA can decode any CVS(s) referencing vpsB when TargetOlsIdx is | |||
equal to olsIdx that conforms to the profile, tier, level, and | equal to olsIdx that conforms to the profile, tier, level, and | |||
constraints indicated by the data starting from the syntax element | constraints indicated by the data starting from the syntax element | |||
general_profile_idc to the syntax structure | general_profile_idc to the syntax structure | |||
general_constraints_info(), inclusive, in the profile_tier_level( | general_constraints_info(), inclusive, in the profile_tier_level( | |||
) syntax structure corresponding to the OLS with index | ) syntax structure corresponding to the OLS with index | |||
TargetOlsIdx in vpsB. | TargetOlsIdx in vpsB. | |||
sprop-sps: | sprop-sps: | |||
This parameter MAY be used to convey sequence parameter set NAL | This parameter MAY be used to convey sequence parameter set NAL | |||
units of the bitstream for out-of-band transmission of sequence | units of the bitstream for out-of-band transmission of sequence | |||
parameter sets. The value of the parameter is a comma-separated | parameter sets. The value of the parameter is a comma-separated | |||
(',') list of base64 encoding (Section 4 of [RFC4648]) | (',') list of base64 encoding (Section 4 of [RFC4648]) | |||
representations of the sequence parameter set NAL units as | representations of the sequence parameter set NAL units, as | |||
specified in Section 7.3.2.4 of [VVC]. | specified in Section 7.3.2.4 of [VVC]. | |||
A sequence parameter set spsB is said to be consistent with | A sequence parameter set spsB is said to be consistent with | |||
another sequence parameter set spsA if any decoder that conforms | another sequence parameter set spsA if any decoder that conforms | |||
to the profile, tier, level, and constraints indicated by the data | to the profile, tier, level, and constraints indicated by the data | |||
starting from the syntax element general_profile_idc to the syntax | starting from the syntax element general_profile_idc to the syntax | |||
structure general_constraints_info(), inclusive, in the | structure general_constraints_info(), inclusive, in the | |||
profile_tier_level( ) syntax structure in spsA can decode any | profile_tier_level( ) syntax structure in spsA can decode any | |||
CLVS(s) referencing spsB that conforms to the profile, tier, | CLVS(s) referencing spsB that conforms to the profile, tier, | |||
level, and constraints indicated by the data starting from the | level, and constraints indicated by the data starting from the | |||
syntax element general_profile_idc to the syntax structure | syntax element general_profile_idc to the syntax structure | |||
general_constraints_info(), inclusive, in the profile_tier_level( | general_constraints_info(), inclusive, in the profile_tier_level( | |||
) syntax structure in spsB. | ) syntax structure in spsB. | |||
sprop-pps: | sprop-pps: | |||
This parameter MAY be used to convey picture parameter set NAL | This parameter MAY be used to convey picture parameter set NAL | |||
units of the bitstream for out-of-band transmission of picture | units of the bitstream for out-of-band transmission of picture | |||
parameter sets. The value of the parameter is a comma-separated | parameter sets. The value of the parameter is a comma-separated | |||
(',') list of base64 encoding (Section 4 of [RFC4648]) | (',') list of base64 encoding (Section 4 of [RFC4648]) | |||
representations of the picture parameter set NAL units as | representations of the picture parameter set NAL units, as | |||
specified in Section 7.3.2.5 of [VVC]. | specified in Section 7.3.2.5 of [VVC]. | |||
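
Informative sketch: sprop-vps, sprop-sps, and sprop-pps share the same value syntax, a comma-separated list of base64-encoded NAL units, so a receiver can decode all three with one helper. The function name is illustrative, and tolerating missing "=" padding is an assumption rather than a requirement of this memo.

```python
import base64
from typing import List


def decode_sprop_list(value: str) -> List[bytes]:
    """Split a sprop-* parameter value into raw NAL unit byte strings."""
    nal_units = []
    for item in value.split(","):
        item = item.strip()
        padded = item + "=" * (-len(item) % 4)  # tolerate missing padding
        nal_units.append(base64.b64decode(padded, validate=True))
    return nal_units
```

The returned byte strings would then be passed to the decoder as if the parameter sets had been received in-band; parsing the NAL unit contents is outside the scope of this sketch.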
sprop-sei: | sprop-sei: | |||
This parameter MAY be used to convey one or more SEI messages that | This parameter MAY be used to convey one or more SEI messages that | |||
describe bitstream characteristics. When present, a decoder can | describe bitstream characteristics. When present, a decoder can | |||
rely on the bitstream characteristics that are described in the | rely on the bitstream characteristics that are described in the | |||
SEI messages for the entire duration of the session, independently | SEI messages for the entire duration of the session, independently | |||
from the persistence scopes of the SEI messages as specified in | from the persistence scopes of the SEI messages, as specified in | |||
[VSEI]. | [VSEI]. | |||
The value of the parameter is a comma-separated (',') list of | The value of the parameter is a comma-separated (',') list of | |||
base64 encoding (Section 4 of [RFC4648]) representations of SEI | base64 encoding (Section 4 of [RFC4648]) representations of SEI | |||
NAL units as specified in [VSEI]. | NAL units, as specified in [VSEI]. | |||
Informative note: Intentionally, no list of applicable or | ||||
inapplicable SEI messages is specified here. Conveying certain | ||||
SEI messages in sprop-sei may be sensible in some application | ||||
scenarios and meaningless in others. However, a few examples | ||||
are described below: | ||||
1) In an environment where the bitstream was created from film- | ||||
based source material, and no splicing is going to occur during | ||||
the lifetime of the session, the film grain characteristics SEI | ||||
message is likely meaningful, and sending it in sprop-sei | ||||
rather than in the bitstream at each entry point may help with | ||||
saving bits and allows one to configure the renderer only once, | ||||
avoiding unwanted artifacts. | ||||
2) Examples for SEI messages that would be meaningless to be | | Informative note: Intentionally, no list of applicable or | |||
conveyed in sprop-sei include the decoded picture hash SEI | | inapplicable SEI messages is specified here. Conveying | |||
message (it is close to impossible that all decoded pictures | | certain SEI messages in sprop-sei may be sensible in some | |||
have the same hashtag) or the filler payload SEI message (as | | application scenarios and meaningless in others. However, a | |||
there is no point in just having more bits in SDP). | | few examples are described below: | |||
| | ||||
| In an environment where the bitstream was created from film- | ||||
| based source material, and no splicing is going to occur | ||||
| during the lifetime of the session, the film grain | ||||
| characteristics SEI message is likely meaningful, and | ||||
| sending it in sprop-sei, rather than in the bitstream at | ||||
| each entry point, may help with saving bits and allows one | ||||
| to configure the renderer only once, avoiding unwanted | ||||
| artifacts. | ||||
| | ||||
| Examples for SEI messages that would be meaningless to be | ||||
| conveyed in sprop-sei include the decoded picture hash SEI | ||||
| message (it is close to impossible that all decoded pictures | ||||
| have the same hashtag) or the filler payload SEI message (as | ||||
| there is no point in just having more bits in SDP). | ||||
max-lsr: | max-lsr: | |||
The max-lsr MAY be used to signal the capabilities of a receiver | The max-lsr MAY be used to signal the capabilities of a receiver | |||
implementation and MUST NOT be used for any other purpose. The | implementation and MUST NOT be used for any other purpose. The | |||
value of max-lsr is an integer indicating the maximum processing | value of max-lsr is an integer indicating the maximum processing | |||
rate in units of luma samples per second. The max-lsr parameter | rate in units of luma samples per second. The max-lsr parameter | |||
signals that the receiver is capable of decoding video at a higher | signals that the receiver is capable of decoding video at a higher | |||
rate than is required by the highest level. | rate than is required by the highest level. | |||
Informative note: When the OPTIONAL media type parameters are | | Informative note: When the OPTIONAL media type parameters | |||
used to signal the properties of a bitstream, and max-lsr is | | are used to signal the properties of a bitstream, and max- | |||
not present, the values of tier-flag, profile-id, sub-profile- | | lsr is not present, the values of tier-flag, profile-id, | |||
id interop-constraints, and level-id must always be such that | | sub-profile-id, interop-constraints, and level-id must | |||
the bitstream complies fully with the specified profile, tier, | | always be such that the bitstream complies fully with the | |||
and level. | | specified profile, sub-profile, tier, level, and interop- | |||
| constraints. | ||||
When max-lsr is signaled, the receiver MUST be able to decode | When max-lsr is signaled, the receiver MUST be able to decode | |||
bitstreams that conform to the highest level, with the exception | bitstreams that conform to the highest level, with the exception | |||
that the MaxLumaSr value in Table 136 of [VVC] for the highest | that the MaxLumaSr value in Table A.3 of [VVC] for the highest | |||
level is replaced with the value of max-lsr. Senders MAY use this | level is replaced with the value of max-lsr. Senders MAY use this | |||
knowledge to send pictures of a given size at a higher picture | knowledge to send pictures of a given size at a higher picture | |||
rate than is indicated in the highest level. | rate than is indicated in the highest level. | |||
When not present, the value of max-lsr is inferred to be equal to | When not present, the value of max-lsr is inferred to be equal to | |||
the value of MaxLumaSr given in Table 136 of [VVC] for the highest | the value of MaxLumaSr given in Table A.3 of [VVC] for the highest | |||
level. | level. | |||
The value of max-lsr MUST be in the range of MaxLumaSr to 16 * | The value of max-lsr MUST be in the range of MaxLumaSr to 16 * | |||
MaxLumaSr, inclusive, where MaxLumaSr is given in Table 136 of | MaxLumaSr, inclusive, where MaxLumaSr is given in Table A.3 of | |||
[VVC] for the highest level. | [VVC] for the highest level. | |||
max-fps: | max-fps: | |||
The value of max-fps is an integer indicating the maximum picture | The value of max-fps is an integer indicating the maximum picture | |||
rate in units of pictures per 100 seconds that can be effectively | rate in units of pictures per 100 seconds that can be effectively | |||
processed by the receiver. The max-fps parameter MAY be used to | processed by the receiver. The max-fps parameter MAY be used to | |||
signal that the receiver has a constraint in that it is not | signal that the receiver has a constraint in that it is not | |||
capable of processing video effectively at the full picture rate | capable of processing video effectively at the full picture rate | |||
that is implied by the highest level and, when present, max-lsr. | that is implied by the highest level and, when present, max-lsr. | |||
The value of max-fps is not necessarily the picture rate at which | The value of max-fps is not necessarily the picture rate at which | |||
the maximum picture size can be sent, it constitutes a constraint | the maximum picture size can be sent; it constitutes a constraint | |||
on maximum picture rate for all resolutions. | on maximum picture rate for all resolutions. | |||
Informative note: The max-fps parameter is semantically | | Informative note: The max-fps parameter is semantically | |||
different from max-lsr in that max-fps is used to signal a | | different from max-lsr in that max-fps is used to signal a | |||
constraint, lowering the maximum picture rate from what is | | constraint, lowering the maximum picture rate from what is | |||
implied by other parameters. | | implied by other parameters. | |||
The encoder MUST use a picture rate equal to or less than this | The encoder MUST use a picture rate equal to or less than this | |||
value. In cases where the max-fps parameter is absent, the | value. In cases where the max-fps parameter is absent, the | |||
encoder is free to choose any picture rate according to the | encoder is free to choose any picture rate according to the | |||
highest level and any signaled optional parameters. | highest level and any signaled optional parameters. | |||
The value of max-fps MUST be smaller than or equal to the full | The value of max-fps MUST be smaller than or equal to the full | |||
picture rate that is implied by the highest level and, when | picture rate that is implied by the highest level and, when | |||
present, max-lsr. | present, max-lsr. | |||
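The interaction of max-lsr and max-fps described above can be sketched as follows. This is a minimal illustration, not part of the payload format: the function names are hypothetical, MaxLumaSr is supplied by the caller rather than copied from the VVC level table, and the picture rate implied by the level is approximated as the luma sample rate divided by the picture size in luma samples (other level limits are ignored).

```python
from fractions import Fraction


def effective_max_lsr(level_max_luma_sr, max_lsr=None):
    """Return the luma sample rate (samples/second) the receiver can process.

    level_max_luma_sr: MaxLumaSr of the highest level (caller-supplied).
    max_lsr: value of the max-lsr parameter, or None when it is absent.
    """
    if max_lsr is None:
        # Absent max-lsr is inferred to equal MaxLumaSr of the highest level.
        return level_max_luma_sr
    if not level_max_luma_sr <= max_lsr <= 16 * level_max_luma_sr:
        raise ValueError("max-lsr outside [MaxLumaSr, 16 * MaxLumaSr]")
    return max_lsr


def picture_rate_limit(pic_luma_samples, level_max_luma_sr,
                       max_lsr=None, max_fps=None):
    """Return an upper bound on the picture rate, in pictures per second."""
    rate = Fraction(effective_max_lsr(level_max_luma_sr, max_lsr),
                    pic_luma_samples)
    if max_fps is not None:
        # max-fps is expressed in pictures per 100 seconds.
        rate = min(rate, Fraction(max_fps, 100))
    return rate
```

With this sketch, a max-fps value of 2997 corresponds to a cap of 29.97 pictures per second, regardless of the picture size.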
skipping to change at page 47, line 12 ¶ | skipping to change at line 2120 ¶ | |||
of any two NAL units naluA and naluB, where naluA follows naluB in | of any two NAL units naluA and naluB, where naluA follows naluB in | |||
decoding order and precedes naluB in transmission order. | decoding order and precedes naluB in transmission order. | |||
The value of sprop-max-don-diff MUST be an integer in the range of | The value of sprop-max-don-diff MUST be an integer in the range of | |||
0 to 32767, inclusive. | 0 to 32767, inclusive. | |||
When not present, the value of sprop-max-don-diff is inferred to | When not present, the value of sprop-max-don-diff is inferred to | |||
be equal to 0. | be equal to 0. | |||
sprop-depack-buf-bytes: | sprop-depack-buf-bytes: | |||
This parameter signals the required size of the de-packetization | This parameter signals the required size of the de-packetization | |||
buffer in units of bytes. The value of the parameter MUST be | buffer in units of bytes. The value of the parameter MUST be | |||
greater than or equal to the maximum buffer occupancy (in units of | greater than or equal to the maximum buffer occupancy (in units of | |||
bytes) of the de-packetization buffer as specified in Section 6. | bytes) of the de-packetization buffer, as specified in Section 6. | |||
The value of sprop-depack-buf-bytes MUST be an integer in the | The value of sprop-depack-buf-bytes MUST be an integer in the | |||
range of 0 to 4294967295, inclusive. | range of 0 to 4294967295, inclusive. | |||
When sprop-max-don-diff is present and greater than 0, this | When sprop-max-don-diff is present and greater than 0, this | |||
parameter MUST be present and the value MUST be greater than 0. | parameter MUST be present and the value MUST be greater than 0. | |||
When not present, the value of sprop-depack-buf-bytes is inferred | When not present, the value of sprop-depack-buf-bytes is inferred | |||
to be equal to 0. | to be equal to 0. | |||
Informative note: The value of sprop-depack-buf-bytes indicates | | Informative note: The value of sprop-depack-buf-bytes | |||
the required size of the de-packetization buffer only. When | | indicates the required size of the de-packetization buffer | |||
network jitter can occur, an appropriately sized jitter buffer | | only. When network jitter can occur, an appropriately sized | |||
has to be available as well. | | jitter buffer has to be available as well. | |||
depack-buf-cap: | depack-buf-cap: | |||
This parameter signals the capabilities of a receiver | This parameter signals the capabilities of a receiver | |||
implementation and indicates the amount of de-packetization buffer | implementation and indicates the amount of de-packetization buffer | |||
space in units of bytes that the receiver has available for | space in units of bytes that the receiver has available for | |||
reconstructing the NAL unit decoding order from NAL units carried | reconstructing the NAL unit decoding order from NAL units carried | |||
in the RTP stream. A receiver is able to handle any RTP stream | in the RTP stream. A receiver is able to handle any RTP stream | |||
for which the value of the sprop-depack-buf-bytes parameter is | for which the value of the sprop-depack-buf-bytes parameter is | |||
smaller than or equal to this parameter. | smaller than or equal to this parameter. | |||
When not present, the value of depack-buf-cap is inferred to be | When not present, the value of depack-buf-cap is inferred to be | |||
equal to 4294967295. The value of depack-buf-cap MUST be an | equal to 4294967295. The value of depack-buf-cap MUST be an | |||
integer in the range of 1 to 4294967295, inclusive. | integer in the range of 1 to 4294967295, inclusive. | |||
Informative note: depack-buf-cap indicates the maximum possible | | Informative note: depack-buf-cap indicates the maximum | |||
size of the de-packetization buffer of the receiver only, | | possible size of the de-packetization buffer of the receiver | |||
without allowing for network jitter. | | only, without allowing for network jitter. | |||
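The range, default, and presence rules for sprop-max-don-diff, sprop-depack-buf-bytes, and depack-buf-cap given above can be sketched as follows. The function names are hypothetical, and None stands for a parameter that is absent from the signaling.

```python
UINT32_MAX = 4294967295


def check_depack_params(sprop_max_don_diff=None, sprop_depack_buf_bytes=None):
    """Validate the sender-side parameters and return their inferred values."""
    # Absent sprop-max-don-diff is inferred to be 0.
    don_diff = 0 if sprop_max_don_diff is None else sprop_max_don_diff
    if not 0 <= don_diff <= 32767:
        raise ValueError("sprop-max-don-diff out of range")
    if sprop_depack_buf_bytes is not None:
        if not 0 <= sprop_depack_buf_bytes <= UINT32_MAX:
            raise ValueError("sprop-depack-buf-bytes out of range")
    # A non-zero sprop-max-don-diff requires a present, non-zero buffer size.
    if don_diff > 0 and not sprop_depack_buf_bytes:
        raise ValueError("sprop-depack-buf-bytes must be present and > 0")
    # Absent sprop-depack-buf-bytes is inferred to be 0.
    buf_bytes = 0 if sprop_depack_buf_bytes is None else sprop_depack_buf_bytes
    return don_diff, buf_bytes


def receiver_can_handle(sprop_depack_buf_bytes=None, depack_buf_cap=None):
    """Return True if the stream's buffer need fits the receiver's capacity."""
    need = 0 if sprop_depack_buf_bytes is None else sprop_depack_buf_bytes
    # Absent depack-buf-cap is inferred to be 4294967295.
    cap = UINT32_MAX if depack_buf_cap is None else depack_buf_cap
    if not 1 <= cap <= UINT32_MAX:
        raise ValueError("depack-buf-cap out of range")
    return need <= cap
```

Neither check accounts for network jitter; as the informative notes state, a jitter buffer has to be sized separately.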
7.3. SDP Parameters | 7.3. SDP Parameters | |||
The receiver MUST ignore any parameter unspecified in this memo. | The receiver MUST ignore any parameter unspecified in this memo. | |||
7.3.1. Mapping of Payload Type Parameters to SDP | 7.3.1. Mapping of Payload Type Parameters to SDP | |||
The media type video/H266 string is mapped to fields in the Session | The media type video/H266 string is mapped to fields in the Session | |||
Description Protocol (SDP) [RFC8866] as follows: | Description Protocol (SDP) [RFC8866] as follows: | |||
* The media name in the "m=" line of SDP MUST be video. | * The media name in the "m=" line of SDP MUST be video. | |||
* The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the | * The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the | |||
media subtype). | media subtype). | |||
* The clock rate in the "a=rtpmap" line MUST be 90000. | * The clock rate in the "a=rtpmap" line MUST be 90000. | |||
* The OPTIONAL parameters profile-id, tier-flag, sub-profile-id, | * The OPTIONAL parameters profile-id, tier-flag, sub-profile-id, | |||
interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id, | interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id, | |||
recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max- | recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max- | |||
fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf- | fps, sprop-max-don-diff, sprop-depack-buf-bytes, and depack-buf- | |||
cap, when present, MUST be included in the "a=fmtp" line of SDP. | cap, when present, MUST be included in the "a=fmtp" line of SDP. | |||
The fmtp line is expressed as a media type string, in the form of | The fmtp line is expressed as a media type string, in the form of | |||
a semicolon-separated list of parameter=value pairs. | a semicolon-separated list of parameter=value pairs. | |||
* The OPTIONAL parameter sprop-vps, sprop-sps, sprop-pps, sprop-sei, | * The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop- | |||
and sprop-dci, when present, MUST be included in the "a=fmtp" line | sei, and sprop-dci, when present, MUST be included in the "a=fmtp" | |||
of SDP or conveyed using the "fmtp" source attribute as specified | line of SDP or conveyed using the "fmtp" source attribute as | |||
in Section 6.3 of [RFC5576]. For a particular media format (i.e., | specified in Section 6.3 of [RFC5576]. For a particular media | |||
RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or | format (i.e., RTP payload type), sprop-vps, sprop-sps, sprop-pps, | |||
sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP | sprop-sei, or sprop-dci MUST NOT be both included in the "a=fmtp" | |||
and conveyed using the "fmtp" source attribute. When included in | line of SDP and conveyed using the "fmtp" source attribute. When | |||
the "a=fmtp" line of SDP, those parameters are expressed as a | included in the "a=fmtp" line of SDP, those parameters are | |||
media type string, in the form of a semicolon-separated list of | expressed as a media type string, in the form of a semicolon- | |||
parameter=value pairs. When conveyed in the "a=fmtp" line of SDP | separated list of parameter=value pairs. When conveyed in the | |||
for a particular payload type, the parameters sprop-vps, sprop- | "a=fmtp" line of SDP for a particular payload type, the parameters | |||
sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each | sprop-vps, sprop-sps, sprop-pps, sprop-sei, and sprop-dci MUST be | |||
SSRC with the payload type. When conveyed using the "fmtp" source | applied to each SSRC with the payload type. When conveyed using | |||
attribute, these parameters are only associated with the given | the "fmtp" source attribute, these parameters are only associated | |||
source and payload type as parts of the "fmtp" source attribute. | with the given source and payload type as parts of the "fmtp" | |||
source attribute. | ||||
Informative note: Conveyance of sprop-vps, sprop-sps, and | | Informative note: Conveyance of sprop-vps, sprop-sps, and | |||
sprop-pps using the "fmtp" source attribute allows for out-of- | | sprop-pps using the "fmtp" source attribute allows for out-of- | |||
band transport of parameter sets in topologies like Topo-Video- | | band transport of parameter sets in topologies like Topo-Video- | |||
switch-MCU as specified in [RFC7667] | | switch-MCU, as specified in [RFC7667]. | |||
An general usage of media representation in SDP is as follows: | A general usage of media representation in SDP is as follows: | |||
m=video 49170 RTP/AVP 98 | m=video 49170 RTP/AVP 98 | |||
a=rtpmap:98 H266/90000 | a=rtpmap:98 H266/90000 | |||
a=fmtp:98 profile-id=1; | a=fmtp:98 profile-id=1; | |||
sprop-vps=<video parameter sets data>; | sprop-vps=<video parameter sets data>; | |||
sprop-sps=<sequence parameter set data>; | sprop-sps=<sequence parameter set data>; | |||
sprop-pps=<picture parameter set data>; | sprop-pps=<picture parameter set data>; | |||
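As an illustration of the semicolon-separated parameter=value form described above, an "a=fmtp" line could be split into its parameters roughly as follows. This is a sketch only: parse_fmtp is a hypothetical helper, it assumes the line has already been unwrapped into a single string, and it performs no validation of the individual parameter values.

```python
def parse_fmtp(line):
    """Split an 'a=fmtp:<pt> name=value; ...' line into (pt, params)."""
    prefix, _, rest = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so any '=' inside a value is kept.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without '=': " + item)
        params[name.strip()] = value.strip()
    return payload_type, params
```

Splitting on the first "=" only is a deliberate choice, since values such as those of the sprop-* parameters may themselves contain "=" characters.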
A SIP Offer/Answer exchange wherein both parties are expected to both | A SIP offer/answer exchange wherein both parties are expected to both | |||
send and receive could look like the following. Only the media | send and receive could look like the following. Only the media | |||
codec-specific parts of the SDP are shown. Some lines are wrapped | codec-specific parts of the SDP are shown. Some lines are wrapped | |||
due to text constraints. | due to text constraints. | |||
Offerer->Answerer: | Offerer->Answerer: | |||
m=video 49170 RTP/AVP 98 | m=video 49170 RTP/AVP 98 | |||
a=rtpmap:98 H266/90000 | a=rtpmap:98 H266/90000 | |||
a=fmtp:98 profile-id=1; level-id=83; | a=fmtp:98 profile-id=1; level-id=83; | |||
The above represents an offer for symmetric video communication using | The above represents an offer for symmetric video communication using | |||
[VVC] and it's payload specification, at the main profile and level | [VVC] and its payload specification at the main profile and level 5.1 | |||
5.1 (and, as the levels are downgradable, all lower levels. | (and as the levels are downgradable, all lower levels). Informally | |||
Informally speaking, this offer tells the receiver of the offer that | speaking, this offer tells the receiver of the offer that the sender | |||
the sender is willing to receive up to 4Kp60 resolution at the | is willing to receive up to 4Kp60 resolution at the maximum bitrates | |||
maximum bitrates specified in [VVC]. At the same time, if this offer | specified in [VVC]. At the same time, if this offer were accepted | |||
were accepted "as is", the offerer can expect that the answerer would | "as is", the offerer can expect that the answerer would be able to | |||
be able to receive and properly decode H.266 media up to and | receive and properly decode H.266 media up to and including level | |||
including level 5.1. | 5.1. | |||
Answerer->Offerer: | Answerer->Offerer: | |||
m=video 49170 RTP/AVP 98 | m=video 49170 RTP/AVP 98 | |||
a=rtpmap:98 H266/90000 | a=rtpmap:98 H266/90000 | |||
a=fmtp:98 profile-id=1; level-id=67 | a=fmtp:98 profile-id=1; level-id=67 | |||
With this answer to the offer above, the system receiving the offer | With this answer to the offer above, the system receiving the offer | |||
advises the offerer that it is incapable of handling H.266 at level | advises the offerer that it is incapable of handling H.266 at level | |||
5.1 but is capable of decoding 1080p60. As H.266 video codecs must | 5.1 but is capable of decoding 1080p60. As H.266 video codecs must | |||
support decoding at all levels below the maximum level they | support decoding at all levels below the maximum level they | |||
implement, the resulting user experience would likely be that both | implement, the resulting user experience would likely be that both | |||
systems send video at 1080p60. However, nothing prevents an encoder | systems send video at 1080p60. However, nothing prevents an encoder | |||
from further downgrading its sending to, for example 720p30 if it | from further downgrading its sending to, for example, 720p30 if it | |||
were short of cycles, bandwidth, or for other reasons. | were short of cycles or bandwidth or for other reasons. | |||
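The level values in the exchange above can be read with a small helper. The mapping used here, level-id = 16 * major + 3 * minor, is an assumption about how VVC numbers its levels; it is consistent with the text above, which pairs the value 83 with level 5.1, but it is not defined in this section. Both function names are hypothetical.

```python
def level_from_id(level_id):
    """Decode a level-id into (major, minor), assuming 16*major + 3*minor."""
    major, rest = divmod(level_id, 16)
    minor, remainder = divmod(rest, 3)
    if remainder:
        raise ValueError("level-id does not fit the assumed mapping")
    return major, minor


def answer_level_ok(offer_level_id, answer_level_id):
    """The level in the answer must not be higher than that in the offer."""
    return answer_level_id <= offer_level_id
```

Under the assumed mapping, the offered value 83 decodes to (5, 1) and the answered value 67 decodes to (4, 1), so the answer is an acceptable downgrade.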
7.3.2. Usage with SDP Offer/Answer Model | 7.3.2. Usage with SDP Offer/Answer Model | |||
This section describes the negotiation of unicast messages using the | This section describes the negotiation of unicast messages using the | |||
offer-answer model as described in [RFC3264] and its updates. The | offer/answer model as described in [RFC3264] and its updates. The | |||
section is split into subsections, covering a) media format | section is split into subsections, covering a) media format | |||
configurations not involving non-temporal scalability; b) scalable | configurations not involving non-temporal scalability; b) scalable | |||
media format configurations; c) the description of the use of those | media format configurations; c) the description of the use of those | |||
parameters not involving the media configuration itself but rather | parameters not involving the media configuration itself but rather | |||
the parameters of the payload format design; and d) multicast. | the parameters of the payload format design; and d) multicast. | |||
7.3.2.1. Non-scalable media format configuration | 7.3.2.1. Non-scalable Media Format Configuration | |||
A non-scalable VVC media configuration is such a configuration where | A non-scalable VVC media configuration is such a configuration where | |||
no non-temporal scalability mechanisms are allowed. In [VVC] version | no non-temporal scalability mechanisms are allowed. In [VVC] version | |||
1, that implies that general_profile_idc indicates one of the | 1, it is implied that general_profile_idc indicates one of the | |||
following profiles: Main10, Main10 Still Picture, Main 10 4:4:4, | following profiles: Main 10, Main 10 Still Picture, Main 10 4:4:4, or | |||
Main10 4:4:4 Still Picture, with general_profile_idc values of 1, 65, | Main 10 4:4:4 Still Picture, with general_profile_idc values of 1, | |||
33, and 97, respectively. Note that non-scalable media | 65, 33, and 97, respectively. Note that non-scalable media | |||
configurations includes temporal scalability, inline with VVC's | configurations include temporal scalability in line with VVC's design | |||
design philosophy and profile structure. | philosophy and profile structure. | |||
The following limitations and rules pertaining to the media | The following limitations and rules pertaining to the media | |||
configuration apply: | configuration apply: | |||
* The parameters identifying a media format configuration for VVC | * The parameters identifying a media format configuration for VVC | |||
are profile-id, tier-flag, sub-profile-id, level-id, and interop- | are profile-id, tier-flag, sub-profile-id, level-id, and interop- | |||
constraints. These media configuration parameters, except level- | constraints. These media configuration parameters, except level- | |||
id, MUST be used symmetrically. | id, MUST be used symmetrically. | |||
The answerer MUST structure its answer in according to one of the | The answerer MUST structure its answer according to one of the | |||
following three options: | following three options: | |||
1) maintain all configuration parameters with the values remaining | 1. maintain all configuration parameters with the values | |||
the same as in the offer for the media format (payload type), with | remaining the same as in the offer for the media format | |||
the exception that the value of level-id is changeable as long as | (payload type), with the exception that the value of level-id | |||
the highest level indicated by the answer is not higher than that | is changeable as long as the highest level indicated by the | |||
indicated by the offer; | answer is not higher than that indicated by the offer; | |||
2) include in the answer the recv-sublayer-id parameter, with a | 2. include in the answer the recv-sublayer-id parameter, with a | |||
value less than the sprop-sublayer-id parameter in the offer, for | value less than the sprop-sublayer-id parameter in the offer, | |||
the media format (payload type), and maintain all configuration | for the media format (payload type), and maintain all | |||
parameters with the values remaining the same as in the offer for | configuration parameters with the values remaining the same as | |||
the media format (payload type), with the exception that the value | in the offer for the media format (payload type), with the | |||
of level-id is changeable as long as the highest level indicated | exception that the value of level-id is changeable as long as | |||
by the answer is not higher than the level indicated by the sprop- | the highest level indicated by the answer is not higher than | |||
sps or sprop-vps in offer for the chosen sublayer representation; | the level indicated by sprop-sps or sprop-vps in offer for the | |||
or | chosen sublayer representation; or | |||
3) remove the media format (payload type) completely (when one or | ||||
more of the parameter values are not supported). | ||||
Informative note: The above requirement for symmetric use | 3. remove the media format (payload type) completely (when one or | |||
does not apply for level-id, and does not apply for the | more of the parameter values are not supported). | |||
other bitstream or RTP stream properties and capability | ||||
parameters as described in Section 7.3.2.3 below. | | Informative note: The above requirement for symmetric use does | |||
| not apply for level-id and does not apply for the other | ||||
| bitstream or RTP stream properties and capability parameters, | ||||
| as described in Section 7.3.2.3 below. | ||||
* To simplify handling and matching of these configurations, the | * To simplify handling and matching of these configurations, the | |||
same RTP payload type number used in the offer SHOULD also be used | same RTP payload type number used in the offer SHOULD also be used | |||
in the answer, as specified in [RFC3264]. | in the answer, as specified in [RFC3264]. | |||
* The same RTP payload type number used in the offer for the media | * The same RTP payload type number used in the offer for the media | |||
subtype H266 MUST be used in the answer when the answer includes | subtype H266 MUST be used in the answer when the answer includes | |||
recv-sublayer-id. When the answer does not include recv-sublayer- | recv-sublayer-id. When the answer does not include recv-sublayer- | |||
id, the answer MUST NOT contain a payload type number used in the | id, the answer MUST NOT contain a payload type number used in the | |||
offer for the media subtype H266 unless the configuration is | offer for the media subtype H266 unless the configuration is | |||
exactly the same as in the offer or the configuration in the | exactly the same as in the offer or the configuration in the | |||
answer only differs from that in the offer with a different value | answer only differs from that in the offer with a different value | |||
of level-id. The answer MAY contain the recv-sublayer-id | of level-id. The answer MAY contain the recv-sublayer-id | |||
parameter if an VVC bitstream contains multiple operation points | parameter if a VVC bitstream contains multiple operation points | |||
(using temporal scalability and sublayers) and sprop-sps or sprop- | (using temporal scalability and sublayers) and sprop-sps or sprop- | |||
vps is included in the offer where information of sublayers is | vps is included in the offer where information of sublayers is | |||
present in the first sequence parameter set or video parameter set | present in the first sequence parameter set or video parameter set | |||
contained in sprop-sps or sprop-vps respectively. If the sprop- | contained in sprop-sps or sprop-vps, respectively. If sprop-sps | |||
sps or sprop-vps is provided in an offer, an answerer MAY select a | or sprop-vps is provided in an offer, an answerer MAY select a | |||
particular operation point indicated in the first sequence | particular operation point indicated in the first sequence | |||
parameter set or video parameter set contained in sprop-sps or | parameter set or video parameter set contained in sprop-sps or | |||
sprop-vps respectively. When the answer includes a recv-sublayer- | sprop-vps, respectively. When the answer includes a recv- | |||
id that is less than a sprop-sublayer-id in the offer, the | sublayer-id that is less than a sprop-sublayer-id in the offer, | |||
following applies: | the following applies: | |||
1) When sprop-sps parameter is present, all sequence parameter | 1. When the sprop-sps parameter is present, all sequence | |||
sets contained in the sprop-sps parameter in the SDP answer and | parameter sets contained in the sprop-sps parameter in the SDP | |||
all sequence parameter sets sent in-band for either the offerer- | answer and all sequence parameter sets sent in-band for either | |||
to-answerer direction or the answerer-to-offerer direction MUST be | the offerer-to-answerer direction or the answerer-to-offerer | |||
consistent with the first sequence parameter set in the sprop-sps | direction MUST be consistent with the first sequence parameter | |||
parameter of the offer (see the semantics of sprop-sps in | set in the sprop-sps parameter of the offer (see the semantics | |||
Section 7.1 of this document on one sequence parameter set being | of sprop-sps in Section 7.1 of this document on one sequence | |||
consistent with another sequence parameter set). | parameter set being consistent with another sequence parameter | |||
set). | ||||
2) When sprop-vps parameter is present, all video parameter sets | 2. When the sprop-vps parameter is present, all video parameter | |||
contained in the sprop-vps parameter in the SDP answer and all | sets contained in the sprop-vps parameter in the SDP answer | |||
video parameter sets sent in-band for either the offerer-to- | and all video parameter sets sent in-band for either the | |||
answerer direction or the answerer-to-offerer direction MUST be | offerer-to-answerer direction or the answerer-to-offerer | |||
consistent with the first video parameter set in the sprop-vps | direction MUST be consistent with the first video parameter | |||
parameter of the offer (see the semantics of sprop-vps in | set in the sprop-vps parameter of the offer (see the semantics | |||
Section 7.1 of this document on one video parameter set being | of sprop-vps in Section 7.1 of this document on one video | |||
consistent with another video parameter set). | parameter set being consistent with another video parameter | |||
set). | ||||
3) The bitstream sent in either direction MUST conform to the | 3. The bitstream sent in either direction MUST conform to the | |||
profile, tier, level, and constraints of the chosen sublayer | profile, tier, level, and constraints of the chosen sublayer | |||
representation as indicated by the profile_tier_level( ) syntax | representation, as indicated by the profile_tier_level( ) | |||
structure in the first sequence parameter set in the sprop-sps | syntax structure in the first sequence parameter set in the | |||
parameter or by the first profile_tier_level( ) syntax structure | sprop-sps parameter or by the first profile_tier_level( ) | |||
in the first video parameter set in the sprop-vps parameter of the | syntax structure in the first video parameter set in the | |||
offer. | sprop-vps parameter of the offer. | |||
Informative note: When an offerer receives an answer that | | Informative note: When an offerer receives an answer that does | |||
does not include recv-sublayer-id, it has to compare payload | | not include recv-sublayer-id, it has to compare payload types | |||
types not declared in the offer based on the media type | | not declared in the offer based on the media type (i.e., video/ | |||
(i.e., video/H266) and the above media configuration | | H266) and the above media configuration parameters with any | |||
parameters with any payload types it has already declared. | | payload types it has already declared. This will enable it to | |||
This will enable it to determine whether the configuration | | determine whether the configuration in question is new or if it | |||
in question is new or if it is equivalent to configuration | | is equivalent to configuration already offered, since a | |||
already offered, since a different payload type number may | | different payload type number may be used in the answer. The | |||
be used in the answer. The ability to perform operation | | ability to perform operation point selection enables a receiver | |||
point selection enables a receiver to utilize the temporal | | to utilize the temporal scalable nature of a VVC bitstream. | |||
scalable nature of an VVC bitstream. | ||||
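The first answer option for non-scalable configurations (keep every configuration parameter as offered, optionally lowering level-id) can be checked along the following lines. This is a sketch under stated assumptions: both arguments are hypothetical dictionaries of already-parsed fmtp parameters in which absent parameters have been replaced by their inferred defaults and level-id is present. The option involving recv-sublayer-id is not covered, because it requires inspecting the parameter sets carried in sprop-sps or sprop-vps.

```python
# Configuration parameters that MUST be used symmetrically (all but level-id).
SYMMETRIC_PARAMS = ("profile-id", "tier-flag", "sub-profile-id",
                    "interop-constraints")


def answer_keeps_configuration(offer, answer):
    """Return True if the answer satisfies option 1 of the rules above."""
    for name in SYMMETRIC_PARAMS:
        # A parameter missing from both sides compares as equal (None == None).
        if offer.get(name) != answer.get(name):
            return False
    # level-id may change, but only downwards.
    return int(answer["level-id"]) <= int(offer["level-id"])
```

An answerer for which this check cannot be made to pass would fall back to the third option and remove the media format (payload type) from the answer.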
7.3.2.2. Scalable media format configuration | 7.3.2.2. Scalable Media Format Configuration | |||
A scalable VVC media configuration is such a configuration where non- | A scalable VVC media configuration is such a configuration where non- | |||
temporal scalability mechanisms are allowed. In [VVC] version 1, | temporal scalability mechanisms are allowed. In [VVC] version 1, it | |||
that implies that general_profile_idc indicates one of the following | is implied that general_profile_idc indicates one of the following | |||
profiles: Multilayer Main 10, and Multilayer Main 10 4:4:4, with | profiles: Multilayer Main 10 and Multilayer Main 10 4:4:4, with | |||
general_profile_idc values of 17 and 49, respectively. | general_profile_idc values of 17 and 49, respectively. | |||
The following limitations and rules pertaining to the media | The following limitations and rules pertaining to the media | |||
configuration apply. They are listed in an order that would be | configuration apply. They are listed in an order that would be | |||
logical for an implementation to follow: | logical for an implementation to follow: | |||
* The parameters identifying a media format configuration for | * The parameters identifying a media format configuration for | |||
scalable VVC are profile-id, tier-flag, sub-profile-id, level-id, | scalable VVC are profile-id, tier-flag, sub-profile-id, level-id, | |||
interop-constraints, and sprop-vps. These media configuration | interop-constraints, and sprop-vps. These media configuration | |||
parameters, except level-id, MUST be used symmetrically, except as | parameters, except level-id, MUST be used symmetrically, except as | |||
noted below. | noted below. | |||
* The answerer MAY include a level-id that MUST be lower than or | * The answerer MAY include a level-id that MUST be lower than or | |||
equal to the level-id indicated in the offer (either expressed by | equal to the level-id indicated in the offer (either expressed by | |||
level-id in the offer, or implied by the default level as specific | level-id in the offer or implied by the default level, as | |||
in Section 7.1). | specified in Section 7.1). | |||
* When sprop-ols-id is present in an offer, sprop-vps MUST also be | * When sprop-ols-id is present in an offer, sprop-vps MUST also be | |||
present in the same offer and including at least one valid VPS, so | present in the same offer and include at least one valid VPS so to | |||
to allow the answerer to meaningfully interpret sprop-ols-id and | allow the answerer to meaningfully interpret sprop-ols-id and | |||
select recv-ols-id (see below). | select recv-ols-id (see below). | |||
* The answerer MUST NOT include recv-ols-id unless the offer | * The answerer MUST NOT include recv-ols-id unless the offer | |||
includes sprop-ols-id. When present, recv-ols-id MUST indicate a | includes sprop-ols-id. When present, recv-ols-id MUST indicate a | |||
supported output layer set in the VPS that includes no layers | supported output layer set in the VPS that includes no layers | |||
other than all or a subset of the layers of the OLS referred to by | other than all or a subset of the layers of the OLS referred to by | |||
sprop-ols-id. If unable, the answerer MUST remove the media | sprop-ols-id. If unable, the answerer MUST remove the media | |||
format. | format. | |||
Informative note: if an offerer wants to offer more than one | | Informative note: If an offerer wants to offer more than one | |||
output layer set, it can do so by offering multiple VVC media | | output layer set, it can do so by offering multiple VVC media | |||
with different payload types. | | with different payload types. | |||
* The offerer MAY include sprop-sublayer-id which indicates the | * The offerer MAY include sprop-sublayer-id, which indicates the | |||
highest allowed value of TID in the bitstream. The answerer MAY | highest allowed value of TID in the bitstream. The answerer MAY | |||
include recv-sublayer-id which can be used to reduce the number of | include recv-sublayer-id, which can be used to reduce the number | |||
sublayers from the value of sprop-sublayer-id. | of sublayers from the value of sprop-sublayer-id. | |||
* When the answerer includes recv-ols-id and configuration | * When the answerer includes recv-ols-id and configuration | |||
parameters profile-id, tier-flag, sub-profile-id, level-id, and | parameters profile-id, tier-flag, sub-profile-id, level-id, and | |||
interop-constraints, it MUST use the configuration parameter | interop-constraints, it MUST use the configuration parameter | |||
values as signaled in the sprop-vps for the operating point with | values as signaled in the sprop-vps for the operating point with | |||
the largest number of sublayers for the chosen output layer set, | the largest number of sublayers for the chosen output layer set, | |||
with the exception that the value of level-id is changeable as | with the exception that the value of level-id is changeable as | |||
long as the highest level indicated by the answer is not higher | long as the highest level indicated by the answer is not higher | |||
than the level indicated by the sprop-vps in offer for the | than the level indicated by sprop-vps in offer for the operating | |||
operating point with the largest number of sublayers for the | point with the largest number of sublayers for the chosen output | |||
chosen output layer set. | layer set. | |||
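
As a rough illustration of three of the rules above, the following sketch checks an answer against an offer. It is not part of the payload format. The function name, the modeling of "a=fmtp" parameters as plain dictionaries, and the default level value passed in are all assumptions made for the example; the default level should be taken from Section 7.1.

```python
# Hypothetical helper: fmtp parameters are modeled as dicts of strings.
# ASSUMED_DEFAULT_LEVEL_ID is an assumption for this sketch; verify the
# default level against Section 7.1 before relying on it.
ASSUMED_DEFAULT_LEVEL_ID = 51


def check_scalable_answer(offer, answer, default_level=ASSUMED_DEFAULT_LEVEL_ID):
    """Return a list of rule violations (empty when none are found)."""
    problems = []

    # The level in the answer must be lower than or equal to the level in
    # the offer, whether the latter is explicit or implied by the default.
    offer_level = int(offer.get("level-id", default_level))
    if "level-id" in answer and int(answer["level-id"]) > offer_level:
        problems.append("level-id in answer exceeds level-id in offer")

    # sprop-ols-id in an offer requires sprop-vps in the same offer.
    if "sprop-ols-id" in offer and "sprop-vps" not in offer:
        problems.append("offer has sprop-ols-id without sprop-vps")

    # recv-ols-id is only allowed when the offer carried sprop-ols-id.
    if "recv-ols-id" in answer and "sprop-ols-id" not in offer:
        problems.append("answer has recv-ols-id but offer lacks sprop-ols-id")

    return problems
```

The sketch does not parse the VPS, so it cannot verify that recv-ols-id names a supported output layer set whose layers are a subset of those of the OLS referred to by sprop-ols-id.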
7.3.2.3. Payload format configuration | 7.3.2.3. Payload Format Configuration | |||
The following limitations and rules pertain to the configuration of | The following limitations and rules pertain to the configuration of | |||
the payload format buffer management mostly and apply to both | the payload format buffer management mostly and apply to both | |||
scalable and non-scalable VVC. | scalable and non-scalable VVC. | |||
* The parameters sprop-max-don-diff, and sprop-depack-buf-bytes | * The parameters sprop-max-don-diff and sprop-depack-buf-bytes | |||
describe the properties of an RTP stream that the offerer or the | describe the properties of an RTP stream that the offerer or the | |||
answerer is sending for the media format configuration. This | answerer is sending for the media format configuration. This | |||
differs from the normal usage of the offer/answer parameters: | differs from the normal usage of the offer/answer parameters; | |||
normally such parameters declare the properties of the bitstream | normally, such parameters declare the properties of the bitstream | |||
or RTP stream that the offerer or the answerer is able to receive. | or RTP stream that the offerer or the answerer is able to receive. | |||
When dealing with VVC, the offerer assumes that the answerer will | When dealing with VVC, the offerer assumes that the answerer will | |||
be able to receive media encoded using the configuration being | be able to receive media encoded using the configuration being | |||
offered. | offered. | |||
Informative note: The above parameters apply for any RTP | | Informative note: The above parameters apply for any RTP | |||
stream, when present, sent by a declaring entity with the same | | stream, when present, sent by a declaring entity with the same | |||
configuration. In other words, the applicability of the above | | configuration. In other words, the applicability of the above | |||
parameters to RTP streams depends on the source endpoint. | | parameters to RTP streams depends on the source endpoint. | |||
Rather than being bound to the payload type, the values may | | Rather than being bound to the payload type, the values may | |||
have to be applied to another payload type when being sent, as | | have to be applied to another payload type when being sent, as | |||
they apply for the configuration. | | they apply for the configuration. | |||
* The capability parameter max-lsr MAY be used to declare further | * The capability parameter max-lsr MAY be used to declare further | |||
capabilities of the offerer or answerer for receiving. It MUST | capabilities of the offerer or answerer for receiving. It MUST | |||
NOT be present when the direction attribute is sendonly. | NOT be present when the direction attribute is sendonly. | |||
* The capability parameter max-fps MAY be used to declare lower | * The capability parameter max-fps MAY be used to declare lower | |||
capabilities of the offerer or answerer for receiving. It MUST | capabilities of the offerer or answerer for receiving. It MUST | |||
NOT be present when the direction attribute is sendonly. | NOT be present when the direction attribute is sendonly. | |||
* When an offerer offers an interleaved stream, indicated by the | * When an offerer offers an interleaved stream, indicated by the | |||
presence of sprop-max-don-diff with a value larger than zero, the | presence of sprop-max-don-diff with a value larger than zero, the | |||
offerer MUST include the size of the de-packetization buffer | offerer MUST include the size of the de-packetization buffer | |||
sprop-depack-buf-bytes. | sprop-depack-buf-bytes. | |||
* To enable the offerer and answerer to inform each other about | * To enable the offerer and answerer to inform each other about | |||
their capabilities for de-packetization buffering in receiving RTP | their capabilities for de-packetization buffering in receiving RTP | |||
streams, both parties are RECOMMENDED to include depack-buf-cap. | streams, both parties are RECOMMENDED to include depack-buf-cap. | |||
* The sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when present | * The parameters sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when | |||
(included in the "a=fmtp" line of SDP or conveyed using the "fmtp" | present (included in the "a=fmtp" line of SDP or conveyed using | |||
source attribute as specified in Section 6.3 of [RFC5576]), are | the "fmtp" source attribute, as specified in Section 6.3 of | |||
used for out-of-band transport of the parameter sets (DCI, VPS, | [RFC5576]), are used for out-of-band transport of the parameter | |||
SPS, or PPS, respectively). | sets (DCI, VPS, SPS, or PPS, respectively). | |||
* The answerer MAY use either out-of-band or in-band transport of | * The answerer MAY use either out-of-band or in-band transport of | |||
parameter sets for the bitstream it is sending, regardless of | parameter sets for the bitstream it is sending, regardless of | |||
whether out-of-band parameter sets transport has been used in the | whether out-of-band parameter sets transport has been used in the | |||
offerer-to-answerer direction. Parameter sets included in an | offerer-to-answerer direction. Parameter sets included in an | |||
answer are independent of those parameter sets included in the | answer are independent of those parameter sets included in the | |||
offer, as they are used for decoding two different bitstreams, one | offer, as they are used for decoding two different bitstreams; one | |||
from the answerer to the offerer and the other in the opposit | from the answerer to the offerer and the other in the opposite | |||
direction. In case some RTP packets are sent before the SDP | direction. In case some RTP packets are sent before the SDP | |||
offer/answer settles down, in-band parameter sets MUST be used for | offer/answer settles down, in-band parameter sets MUST be used for | |||
those RTP stream parts sent before the SDP offer/answer. | those RTP stream parts sent before the SDP offer/answer. | |||
* The following rules apply to transport of parameter set in the | * The following rules apply to transport of parameter sets in the | |||
offerer-to-answerer direction. | offerer-to-answerer direction. | |||
- An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | - An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | |||
sprop-pps. If none of these parameters is present in the | sprop-pps. If none of these parameters are present in the | |||
offer, then only in-band transport of parameter sets is used. | offer, then only in-band transport of parameter sets is used. | |||
- If the level to use in the offerer-to-answerer direction is | - If the level to use in the offerer-to-answerer direction is | |||
equal to the default level in the offer, the answerer MUST be | equal to the default level in the offer, the answerer MUST be | |||
prepared to use the parameter sets included in sprop-vps, | prepared to use the parameter sets included in sprop-vps, | |||
sprop-sps, and sprop-pps (either included in the "a=fmtp" line | sprop-sps, and sprop-pps (either included in the "a=fmtp" line | |||
of SDP or conveyed using the "fmtp" source attribute) for | of SDP or conveyed using the "fmtp" source attribute) for | |||
decoding the incoming bitstream, e.g., by passing these | decoding the incoming bitstream, e.g., by passing these | |||
parameter set NAL units to the video decoder before passing any | parameter set NAL units to the video decoder before passing any | |||
NAL units carried in the RTP streams. Otherwise, the answerer | NAL units carried in the RTP streams. Otherwise, the answerer | |||
MUST ignore sprop-vps, sprop-sps, and sprop-pps (either | MUST ignore sprop-vps, sprop-sps, and sprop-pps (either | |||
included in the "a=fmtp" line of SDP or conveyed using the | included in the "a=fmtp" line of SDP or conveyed using the | |||
"fmtp" source attribute) and the offerer MUST transmit | "fmtp" source attribute) and the offerer MUST transmit | |||
parameter sets in-band. | parameter sets in-band. | |||
* The following rules apply to transport of parameter set in the | * The following rules apply to transport of parameter sets in the | |||
answerer-to-offerer direction. | answerer-to-offerer direction. | |||
- An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | - An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | |||
sprop-pps. If none of these parameters is present in the | sprop-pps. If none of these parameters are present in the | |||
answer, then only in-band transport of parameter sets is used. | answer, then only in-band transport of parameter sets is used. | |||
- The offerer MUST be prepared to use the parameter sets included | - The offerer MUST be prepared to use the parameter sets included | |||
in sprop-vps, sprop-sps, and sprop-pps (either included in the | in sprop-vps, sprop-sps, and sprop-pps (either included in the | |||
"a=fmtp" line of SDP or conveyed using the "fmtp" source | "a=fmtp" line of SDP or conveyed using the "fmtp" source | |||
attribute) for decoding the incoming bitstream, e.g., by | attribute) for decoding the incoming bitstream, e.g., by | |||
passing these parameter set NAL units to the video decoder | passing these parameter set NAL units to the video decoder | |||
before passing any NAL units carried in the RTP streams. | before passing any NAL units carried in the RTP streams. | |||
* When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are | * When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are | |||
conveyed using the "fmtp" source attribute as specified in | conveyed using the "fmtp" source attribute, as specified in | |||
Section 6.3 of [RFC5576], the receiver of the parameters MUST | Section 6.3 of [RFC5576], the receiver of the parameters MUST | |||
store the parameter sets included in sprop-dci, sprop-vps, sprop- | store the parameter sets included in sprop-dci, sprop-vps, sprop- | |||
sps, and/or sprop-pps and associate them with the source given as | sps, and/or sprop-pps and associate them with the source given as | |||
part of the "fmtp" source attribute. Parameter sets associated | part of the "fmtp" source attribute. Parameter sets associated | |||
with one source (given as part of the "fmtp" source attribute) | with one source (given as part of the "fmtp" source attribute) | |||
MUST only be used to decode NAL units conveyed in RTP packets from | MUST only be used to decode NAL units conveyed in RTP packets from | |||
the same source (given as part of the "fmtp" source attribute). | the same source (given as part of the "fmtp" source attribute). | |||
When this mechanism is in use, SSRC collision detection and | When this mechanism is in use, SSRC collision detection and | |||
resolution MUST be performed as specified in [RFC5576]. | resolution MUST be performed as specified in [RFC5576]. | |||
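
Two of the presence rules in the list above can be sketched as a simple check. This is a minimal example under stated assumptions: the function name and the dictionary representation of the parameters are hypothetical, and the interleaving rule is applied here as it is stated for offers.

```python
def check_payload_config(params, direction):
    """Return a list of rule violations for one media format configuration.

    params:    dict of fmtp parameter names to string values (assumed model)
    direction: "sendrecv", "recvonly", or "sendonly"
    """
    problems = []

    # max-lsr and max-fps declare receiving capabilities, so they must not
    # be present when the direction attribute is sendonly.
    if direction == "sendonly":
        for name in ("max-lsr", "max-fps"):
            if name in params:
                problems.append(name + " present with sendonly")

    # An interleaved stream is indicated by sprop-max-don-diff > 0; in that
    # case the de-packetization buffer size must be included as well.
    interleaved = int(params.get("sprop-max-don-diff", "0")) > 0
    if interleaved and "sprop-depack-buf-bytes" not in params:
        problems.append("interleaved stream without sprop-depack-buf-bytes")

    return problems
```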
Table 1 lists the interpretation of all the parameters that MAY be | Figure 11 lists the interpretation of all the parameters that MAY be | |||
used for the various combinations of offer, answer, and direction | used for the various combinations of offer, answer, and direction | |||
attributes. Note that the two columns wherein the recv-ols-id | attributes. | |||
parameter is used only apply to answers, whereas the other columns | ||||
apply to both offers and answers. | ||||
sendonly --+ | sendonly --+ | |||
answer: recvonly, recv-ols-id --+ | | answer: recvonly, recv-ols-id --+ | | |||
recvonly w/o recv-ols-id --+ | | | recvonly w/o recv-ols-id --+ | | | |||
answer: sendrecv, recv-ols-id --+ | | | | answer: sendrecv, recv-ols-id --+ | | | | |||
sendrecv w/o recv-ols-id --+ | | | | | sendrecv w/o recv-ols-id --+ | | | | | |||
| | | | | | | | | | | | |||
profile-id C D C D P | profile-id C D C D P | |||
tier-flag C D C D P | tier-flag C D C D P | |||
level-id D D D D P | level-id D D D D P | |||
skipping to change at page 57, line 32 ¶ | skipping to change at line 2553 ¶ | |||
sprop-dci P P - - P | sprop-dci P P - - P | |||
sprop-sei P P - - P | sprop-sei P P - - P | |||
sprop-vps P P - - P | sprop-vps P P - - P | |||
sprop-sps P P - - P | sprop-sps P P - - P | |||
sprop-pps P P - - P | sprop-pps P P - - P | |||
sprop-sublayer-id P P - - P | sprop-sublayer-id P P - - P | |||
recv-sublayer-id O O O O - | recv-sublayer-id O O O O - | |||
sprop-ols-id P P - - P | sprop-ols-id P P - - P | |||
recv-ols-id X O X O - | recv-ols-id X O X O - | |||
Table 1. Interpretation of parameters for various combinations of | ||||
offers, answers, direction attributes, with and without recv-ols-id. | ||||
Columns that do not indicate offer or answer apply to both. | ||||
Legend: | Legend: | |||
C: configuration for sending and receiving bitstreams | C: configuration for sending and receiving bitstreams | |||
D: changeable configuration, same as C except possible | D: changeable configuration, same as C, except possible | |||
to answer with a different but consistent value (see the | to answer with a different but consistent value (see the | |||
semantics of the six parameters related to profile, tier, | semantics of the six parameters related to profile, tier, | |||
and level on these parameters being consistent) | and level on these parameters being consistent) | |||
P: properties of the bitstream to be sent | P: properties of the bitstream to be sent | |||
R: receiver capabilities | R: receiver capabilities | |||
O: operation point selection | O: operation point selection | |||
X: MUST NOT be present | X: MUST NOT be present | |||
-: not usable, when present MUST be ignored | -: not usable, when present MUST be ignored | |||
Figure 11: Interpretation of Parameters for Various Combinations | ||||
of Offers, Answers, and Direction Attributes, with and without | ||||
recv-ols-id. | ||||
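
The interpretation table can be held as a small lookup structure. The sketch below encodes only the rows visible in this excerpt (rows elided by the diff view are deliberately left out rather than guessed), and the names COLUMNS, ROWS, and interpretation are hypothetical.

```python
# Column order follows the figure, left to right:
# (direction attribute, whether the answer carries recv-ols-id)
COLUMNS = (
    ("sendrecv", False),
    ("sendrecv", True),
    ("recvonly", False),
    ("recvonly", True),
    ("sendonly", False),
)

# One letter per column, using the legend C, D, P, R, O, X, and "-".
# Only rows shown in this excerpt are listed.
ROWS = {
    "profile-id": "CDCDP",
    "tier-flag": "CDCDP",
    "level-id": "DDDDP",
    "sprop-dci": "PP--P",
    "sprop-sei": "PP--P",
    "sprop-vps": "PP--P",
    "sprop-sps": "PP--P",
    "sprop-pps": "PP--P",
    "sprop-sublayer-id": "PP--P",
    "recv-sublayer-id": "OOOO-",
    "sprop-ols-id": "PP--P",
    "recv-ols-id": "XOXO-",
}


def interpretation(param, direction, with_recv_ols_id=False):
    """Return the legend letter for a parameter in the given column."""
    column = COLUMNS.index((direction, with_recv_ols_id))
    return ROWS[param][column]
```

A lookup for a combination that has no column in the figure, such as sendonly together with recv-ols-id, raises ValueError in this sketch.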
Parameters used for declaring receiver capabilities are, in general, | Parameters used for declaring receiver capabilities are, in general, | |||
downgradable; i.e., they express the upper limit for a sender's | downgradable, i.e., they express the upper limit for a sender's | |||
possible behavior. Thus, a sender MAY select to set its encoder | possible behavior. Thus, a sender MAY select to set its encoder | |||
using only lower/lesser or equal values of these parameters. | using only lower/lesser or equal values of these parameters. | |||
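
In code, treating a declared receiver capability as an upper limit amounts to clamping the sender's preferred value. The following is only a sketch; the function name and the use of plain integers for capability values are assumptions.

```python
def select_encoder_value(preferred, receiver_limit=None):
    """Pick the value a sender uses for one downgradable capability.

    The receiver's declared value is an upper limit: the sender may use
    any lower or equal value. When the receiver declared nothing, the
    sender's preferred value is returned unchanged.
    """
    if receiver_limit is None:
        return preferred
    return min(preferred, receiver_limit)
```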
When the answer does not include a recv-ols-id that is less than the | When the answer does not include a recv-ols-id that is less than the | |||
sprop-ols-id in the offer, parameters declaring a configuration point | sprop-ols-id in the offer, parameters declaring a configuration point | |||
are not changeable, with the exception of the level-id parameter for | are not changeable, with the exception of the level-id parameter for | |||
unicast usage, and these parameters express values a receiver expects | unicast usage, and these parameters express values a receiver expects | |||
to be used and MUST be used verbatim in the answer as in the offer. | to be used and MUST be used verbatim in the answer as in the offer. | |||
When a sender's capabilities are declared with the configuration | When a sender's capabilities are declared with the configuration | |||
skipping to change at page 58, line 26 ¶ | skipping to change at line 2596 ¶ | |||
configurations in a single payload type. Thus, when multiple | configurations in a single payload type. Thus, when multiple | |||
configuration offers are made, each offer requires its own RTP | configuration offers are made, each offer requires its own RTP | |||
payload type associated with the offer. However, it is possible to | payload type associated with the offer. However, it is possible to | |||
offer multiple operation points using one configuration in a single | offer multiple operation points using one configuration in a single | |||
payload type by including sprop-vps in the offer and recv-ols-id in | payload type by including sprop-vps in the offer and recv-ols-id in | |||
the answer. | the answer. | |||
An implementation SHOULD be able to understand all media type | An implementation SHOULD be able to understand all media type | |||
parameters (including all optional media type parameters), even if it | parameters (including all optional media type parameters), even if it | |||
doesn't support the functionality related to the parameter. This, in | doesn't support the functionality related to the parameter. This, in | |||
conjunction with proper application logic in the implementation | conjunction with proper application logic in the implementation, | |||
allows the implementation, after having received an offer, to create | allows the implementation, after having received an offer, to create | |||
an answer by potentially downgrading one or more of the optional | an answer by potentially downgrading one or more of the optional | |||
parameters to the point where the implementation can cope, leading to | parameters to the point where the implementation can cope, leading to | |||
higher chances of interoperability beyond the most basic interop | higher chances of interoperability beyond the most basic interop | |||
points (for which, as described above, no optional parameters are | points (for which, as described above, no optional parameters are | |||
necessary). | necessary). | |||
Informative note: in implementations of previous H.26x payload | | Informative note: In implementations of previous H.26x payload | |||
formats it was occasionally observed that implementations were | | formats, it was occasionally observed that implementations were | |||
incapable of parsing most (or all) of the optional parameters. As | | incapable of parsing most (or all) of the optional parameters. | |||
a result, the offer-answer exchange resulted in a baseline | | As a result, the offer/answer exchange resulted in a baseline | |||
performance (using the default values for the optional parameters) | | performance (using the default values for the optional | |||
with the resulting suboptimal user experience. However, there are | | parameters) with the resulting suboptimal user experience. | |||
valid reasons to forego the implementation complexity of | | However, there are valid reasons to forego the implementation | |||
implementing the parsing of some or all of the optional | | complexity of implementing the parsing of some or all of the | |||
parameters, for example, when there is pre-determined knowledge, | | optional parameters, for example, when there is predetermined | |||
not negotiated by an SDP-based offer/answer process, of the | | knowledge, not negotiated by an SDP-based offer/answer process, | |||
capabilities of the involved systems (walled gardens, baseline | | of the capabilities of the involved systems (walled gardens, | |||
requirements defined in application standards higher up in the | | baseline requirements defined in application standards higher | |||
stack, and similar). | | up in the stack, and similar). | |||
An answerer MAY extend the offer with additional media format | An answerer MAY extend the offer with additional media format | |||
configurations. However, to enable their usage, in most cases a | configurations. However, to enable their usage, in most cases, a | |||
second offer is required from the offerer to provide the bitstream | second offer is required from the offerer to provide the bitstream | |||
property parameters that the media sender will use. This also has | property parameters that the media sender will use. This also has | |||
the effect that the offerer has to be able to receive this media | the effect that the offerer has to be able to receive this media | |||
format configuration, not only to send it. | format configuration, not only to send it. | |||
7.3.3. Multicast | 7.3.3. Multicast | |||
For bitstreams being delivered over multicast, the following rules | For bitstreams being delivered over multicast, the following rules | |||
apply: | apply: | |||
skipping to change at page 59, line 46 ¶ | skipping to change at line 2659 ¶ | |||
as long as the three above rules are obeyed. | as long as the three above rules are obeyed. | |||
7.3.4. Usage in Declarative Session Descriptions | 7.3.4. Usage in Declarative Session Descriptions | |||
When VVC over RTP is offered with SDP in a declarative style, as in | When VVC over RTP is offered with SDP in a declarative style, as in | |||
Real Time Streaming Protocol (RTSP) [RFC7826] or Session Announcement | Real Time Streaming Protocol (RTSP) [RFC7826] or Session Announcement | |||
Protocol (SAP) [RFC2974], the following considerations are necessary. | Protocol (SAP) [RFC2974], the following considerations are necessary. | |||
* All parameters capable of indicating both bitstream properties and | * All parameters capable of indicating both bitstream properties and | |||
receiver capabilities are used to indicate only bitstream | receiver capabilities are used to indicate only bitstream | |||
properties. For example, in this case, the parameter profile-id, | properties. For example, in this case, the parameters profile-id, | |||
tier-flag, level-id declares the values used by the bitstream, not | tier-flag, and level-id declare the values used by the bitstream, | |||
the capabilities for receiving bitstreams. As a result, the | not the capabilities for receiving bitstreams. As a result, the | |||
following interpretation of the parameters MUST be used: | following interpretation of the parameters MUST be used: | |||
- Declaring actual configuration or bitstream properties: | - Declaring actual configuration or bitstream properties: | |||
o profile-id | o profile-id | |||
o tier-flag | o tier-flag | |||
o level-id | o level-id | |||
skipping to change at page 61, line 11 ¶ | skipping to change at line 2720 ¶ | |||
reject (RTSP) or not participate in (SAP) the session. It | reject (RTSP) or not participate in (SAP) the session. It | |||
falls on the creator of the session to use values that are | falls on the creator of the session to use values that are | |||
expected to be supported by the receiving application. | expected to be supported by the receiving application. | |||
7.3.5. Considerations for Parameter Sets | 7.3.5. Considerations for Parameter Sets | |||
When out-of-band transport of parameter sets is used, parameter sets | When out-of-band transport of parameter sets is used, parameter sets | |||
MAY still be additionally transported in-band unless explicitly | MAY still be additionally transported in-band unless explicitly | |||
disallowed by an application, and some of these additional parameter | disallowed by an application, and some of these additional parameter | |||
sets may update some of the out-of-band transported parameter sets. | sets may update some of the out-of-band transported parameter sets. | |||
Update of a parameter set refers to the sending of a parameter set of | An update of a parameter set refers to the sending of a parameter set | |||
the same type using the same parameter set ID but with different | of the same type using the same parameter set ID but with different | |||
values for at least one other parameter of the parameter set. | values for at least one other parameter of the parameter set. | |||
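
A receiver that accepts both out-of-band and in-band parameter sets might track updates along the following lines. This is a sketch, not a prescribed mechanism: the class name is hypothetical, the parameter set type and ID are assumed to be already parsed, and a byte-wise comparison of the NAL unit payload stands in for "different values for at least one other parameter".

```python
class ParameterSetStore:
    """Keeps the most recent parameter set per (type, ID) pair."""

    def __init__(self):
        self._sets = {}

    def put(self, ps_type, ps_id, payload):
        """Store a parameter set and report how it relates to the old one.

        Returns "new" when no set with this type and ID was stored,
        "repeat" when the stored set is identical, and "update" when the
        same type and ID arrive with different content.
        """
        key = (ps_type, ps_id)
        data = bytes(payload)
        previous = self._sets.get(key)
        self._sets[key] = data
        if previous is None:
            return "new"
        return "repeat" if previous == data else "update"
```

With such a store, parameter sets received out of band would be inserted first, and any in-band set reported as "update" replaces the out-of-band one for subsequent decoding.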
8. Use with Feedback Messages | 8. Use with Feedback Messages | |||
The following subsections define the use of the Picture Loss | The following subsections define the use of the Picture Loss | |||
Indication (PLI) and Full Intra Request (FIR) feedback messages with | Indication (PLI) and Full Intra Request (FIR) feedback messages with | |||
[VVC]. The PLI is defined in [RFC4585], and the FIR message is | [VVC]. The PLI is defined in [RFC4585], and the FIR message is | |||
defined in [RFC5104]. In accordance with this memo, unlike [HEVC], a | defined in [RFC5104]. In accordance with this memo, unlike [HEVC], a | |||
sender MUST NOT send Slice Loss Indication (SLI) or Reference Picture | sender MUST NOT send Slice Loss Indication (SLI) or Reference Picture | |||
Selection Indication (RPSI), and a receiver SHOULD ignore RPSI and | Selection Indication (RPSI), and a receiver SHOULD ignore RPSI and | |||
treat a received SLI as a PLI. | treat a received SLI as a PLI. | |||
8.1. Picture Loss Indication (PLI) | 8.1. Picture Loss Indication (PLI) | |||
As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a | As specified in Section 6.3.1 of [RFC4585], the reception of a PLI by | |||
media sender indicates "the loss of an undefined amount of coded | a media sender indicates "the loss of an undefined amount of coded | |||
video data belonging to one or more pictures". Without having any | video data belonging to one or more pictures". Without having any | |||
specific knowledge of the setup of the bitstream (such as use and | specific knowledge of the setup of the bitstream (such as use and | |||
location of in-band parameter sets, non-IRAP decoder refresh points, | location of in-band parameter sets, non-IRAP decoder refresh points, | |||
picture structures, and so forth), a reaction to the reception of an | picture structures, and so forth), a reaction to the reception of a | |||
PLI by a VVC sender SHOULD be to send an IRAP picture and relevant | PLI by a VVC sender SHOULD be to send an IRAP picture and relevant | |||
parameter sets; potentially with sufficient redundancy so to ensure | parameter sets, potentially with sufficient redundancy so to ensure | |||
correct reception. However, sometimes information about the | correct reception. However, sometimes information about the | |||
bitstream structure is known. For example, state could have been | bitstream structure is known. For example, such information can be | |||
established outside of the mechanisms defined in this document that | parameter sets that have been conveyed out of band through mechanisms | |||
parameter sets are conveyed out of band only, and stay static for the | not defined in this document and that are known to stay static for | |||
duration of the session. In that case, it is obviously unnecessary | the duration of the session. In that case, it is obviously | |||
to send them in-band as a result of the reception of a PLI. Other | unnecessary to send them in-band as a result of the reception of a | |||
examples could be devised based on a priori knowledge of different | PLI. Other examples could be devised based on a priori knowledge of | |||
aspects of the bitstream structure. In all cases, the timing and | different aspects of the bitstream structure. In all cases, the | |||
congestion control mechanisms of RFC 4585 MUST be observed. | timing and congestion control mechanisms of [RFC4585] MUST be | |||
observed. | ||||
8.2. Full Intra Request (FIR) | 8.2. Full Intra Request (FIR) | |||
The purpose of the FIR message is to force an encoder to send an | The purpose of the FIR message is to force an encoder to send an | |||
independent decoder refresh point as soon as possible, while | independent decoder refresh point as soon as possible while observing | |||
observing applicable congestion-control-related constraints, such as | applicable congestion-control-related constraints, such as those set | |||
those set out in [RFC8082]). | out in [RFC8082]. | |||
Upon reception of a FIR, a sender MUST send an IDR picture. | Upon reception of a FIR, a sender MUST send an IDR picture. | |||
Parameter sets MUST also be sent, except when there is a priori | Parameter sets MUST also be sent, except when there is a priori | |||
knowledge that the parameter sets have been correctly established. A | knowledge that the parameter sets have been correctly established. A | |||
typical example for that is an understanding between sender and | typical example for that is an understanding between the sender and | |||
receiver, established by means outside this document, that parameter | receiver, established by means outside this document, that parameter | |||
sets are exclusively sent out-of-band. | sets are exclusively sent out of band. | |||
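
The sender-side reactions described in this section can be summarized in a small dispatch function. This is a minimal sketch under assumptions: the function name, the string labels for message types, and the single boolean that stands for a priori knowledge about parameter sets are all hypothetical. The timing and congestion control rules of [RFC4585] are not modeled, and the PLI reaction shown is the recommended one, not the only permitted one.

```python
def react_to_feedback(kind, params_known_out_of_band=False):
    """Return the reaction of a VVC media sender to a feedback message.

    kind: "PLI", "FIR", "SLI", or "RPSI"
    params_known_out_of_band: True when there is a priori knowledge that
        parameter sets are established and need not be resent in-band.

    Returns None when the message is ignored, otherwise a dict naming the
    picture type to send and whether to resend parameter sets in-band.
    """
    if kind == "RPSI":
        return None  # RPSI is ignored
    if kind == "SLI":
        kind = "PLI"  # a received SLI is treated as a PLI

    if kind == "PLI":
        picture = "IRAP"
    elif kind == "FIR":
        picture = "IDR"
    else:
        raise ValueError("unknown feedback message: " + kind)

    return {
        "picture": picture,
        "send_parameter_sets": not params_known_out_of_band,
    }
```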
9. Security Considerations | 9. Security Considerations | |||
The scope of this Security Considerations section is limited to the | The scope of this section is limited to the payload format itself and | |||
payload format itself and to one feature of [VVC] that may pose a | to one feature of [VVC] that may pose a particularly serious security | |||
particularly serious security risk if implemented naively. The | risk if implemented naively. The payload format, in isolation, does | |||
payload format, in isolation, does not form a complete system. | not form a complete system. Implementers are advised to read and | |||
Implementers are advised to read and understand relevant security- | understand relevant security-related documents, especially those | |||
related documents, especially those pertaining to RTP (see the | pertaining to RTP (see the Security Considerations section in | |||
Security Considerations section in [RFC3550]), and the security of | [RFC3550]) and the security of the call-control stack chosen (that | |||
the call-control stack chosen (that may make use of the media type | may make use of the media type registration of this memo). | |||
registration of this memo). Implementers should also consider known | Implementers should also consider known security vulnerabilities of | |||
security vulnerabilities of video coding and decoding implementations | video coding and decoding implementations in general and avoid those. | |||
in general and avoid those. | ||||
Within this RTP payload format, and with the exception of the user | Within this RTP payload format, and with the exception of the user | |||
data SEI message as described below, no security threats other than | data SEI message as described below, no security threats other than | |||
those common to RTP payload formats are known. In other words, | those common to RTP payload formats are known. In other words, | |||
neither the various media-plane-based mechanisms, nor the signaling | neither the various media-plane-based mechanisms nor the signaling | |||
part of this memo, seems to pose a security risk beyond those common | part of this memo seem to pose a security risk beyond those common to | |||
to all RTP-based systems. | all RTP-based systems. | |||
RTP packets using the payload format defined in this specification | RTP packets using the payload format defined in this specification | |||
are subject to the security considerations discussed in the RTP | are subject to the security considerations discussed in the RTP | |||
specification [RFC3550], and in any applicable RTP profile such as | specification [RFC3550] and in any applicable RTP profile, such as | |||
RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | |||
SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | |||
Does Not Mandate a Single Media Security Solution" [RFC7202] | Does Not Mandate a Single Media Security Solution" [RFC7202] | |||
discusses, it is not an RTP payload format's responsibility to | discusses, it is not an RTP payload format's responsibility to | |||
discuss or mandate what solutions are used to meet the basic security | discuss or mandate what solutions are used to meet the basic security | |||
goals like confidentiality, integrity and source authenticity for RTP | goals, like confidentiality, integrity, and source authenticity for | |||
in general. This responsibility lays on anyone using RTP in an | RTP in general. This responsibility lays on anyone using RTP in an | |||
application. They can find guidance on available security mechanisms | application. They can find guidance on available security mechanisms | |||
and important considerations in "Options for Securing RTP Sessions" | and important considerations in "Options for Securing RTP Sessions" | |||
[RFC7201]. The rest of this section discusses the security impacting | [RFC7201]. The rest of this section discusses the security impacting | |||
properties of the payload format itself. | properties of the payload format itself. | |||
Because the data compression used with this payload format is applied | Because the data compression used with this payload format is applied | |||
end-to-end, any encryption needs to be performed after compression. | end to end, any encryption needs to be performed after compression. | |||
A potential denial-of-service threat exists for data encodings using | A potential denial-of-service threat exists for data encodings using | |||
compression techniques that have non-uniform receiver-end | compression techniques that have non-uniform receiver-end | |||
computational load. The attacker can inject pathological datagrams | computational load. The attacker can inject pathological datagrams | |||
into the bitstream that are complex to decode and that cause the | into the bitstream that are complex to decode and that cause the | |||
receiver to be overloaded. [VVC] is particularly vulnerable to such | receiver to be overloaded. [VVC] is particularly vulnerable to such | |||
attacks, as it is extremely simple to generate datagrams containing | attacks, as it is extremely simple to generate datagrams containing | |||
NAL units that affect the decoding process of many future NAL units. | NAL units that affect the decoding process of many future NAL units. | |||
Therefore, the usage of data origin authentication and data integrity | Therefore, the usage of data origin authentication and data integrity | |||
protection of at least the RTP packet is RECOMMENDED but NOT | protection of at least the RTP packet is RECOMMENDED but NOT REQUIRED | |||
REQUIRED, based on the thoughts of [RFC7202] | based on the thoughts of [RFC7202]. | |||
Like HEVC [RFC7798], [VVC] includes a user data Supplemental | Like HEVC [RFC7798], [VVC] includes a user data Supplemental | |||
Enhancement Information (SEI) message. This SEI message allows | Enhancement Information (SEI) message. This SEI message allows | |||
inclusion of an arbitrary bitstring into the video bitstream. Such a | inclusion of an arbitrary bitstring into the video bitstream. Such a | |||
bitstring could include JavaScript, machine code, and other active | bitstring could include JavaScript, machine code, and other active | |||
content. [VVC] leaves the handling of this SEI message to the | content. [VVC] leaves the handling of this SEI message to the | |||
receiving system. In order to avoid harmful side effects of the user | receiving system. In order to avoid harmful side effects of the user | |||
data SEI message, decoder implementations cannot naively trust its | data SEI message, decoder implementations cannot naively trust its | |||
content. For example, it would be a bad and insecure implementation | content. For example, it would be a bad and insecure implementation | |||
practice to forward any JavaScript a decoder implementation detects | practice to forward any JavaScript a decoder implementation detects | |||
skipping to change at page 63, line 43 ¶ | skipping to change at line 2848 ¶ | |||
end points. | end points. | |||
10. Congestion Control | 10. Congestion Control | |||
Congestion control for RTP SHALL be used in accordance with RTP | Congestion control for RTP SHALL be used in accordance with RTP | |||
[RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551] or | [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551] or | |||
AVPF [RFC4585]. If best-effort service is being used, an additional | AVPF [RFC4585]. If best-effort service is being used, an additional | |||
requirement is that users of this payload format MUST monitor packet | requirement is that users of this payload format MUST monitor packet | |||
loss to ensure that the packet loss rate is within an acceptable | loss to ensure that the packet loss rate is within an acceptable | |||
range. Packet loss is considered acceptable if a TCP flow across the | range. Packet loss is considered acceptable if a TCP flow across the | |||
same network path, and experiencing the same network conditions, | same network path and experiencing the same network conditions would | |||
would achieve an average throughput, measured on a reasonable | achieve an average throughput, measured on a reasonable timescale, | |||
timescale, that is not less than all RTP streams combined are | that is not less than all RTP streams combined are achieved. This | |||
achieved. This condition can be satisfied by implementing | condition can be satisfied by implementing congestion-control | |||
congestion-control mechanisms to adapt the transmission rate, the | mechanisms to adapt the transmission rate, by implementing the number | |||
number of layers subscribed for a layered multicast session, or by | of layers subscribed for a layered multicast session, or by arranging | |||
arranging for a receiver to leave the session if the loss rate is | for a receiver to leave the session if the loss rate is unacceptably | |||
unacceptably high. | high. | |||
The bitrate adaptation necessary for obeying the congestion control | The bitrate adaptation necessary for obeying the congestion control | |||
principle is easily achievable when real-time encoding is used, for | principle is easily achievable when real-time encoding is used, for | |||
example, by adequately tuning the quantization parameter. However, | example, by adequately tuning the quantization parameter. However, | |||
when pre-encoded content is being transmitted, bandwidth adaptation | when pre-encoded content is being transmitted, bandwidth adaptation | |||
requires the pre-coded bitstream to be tailored for such adaptivity. | requires the pre-coded bitstream to be tailored for such adaptivity. | |||
The key mechanisms available in [VVC] are temporal scalability, and | The key mechanisms available in [VVC] are temporal scalability and | |||
spatial/SNR scalability. A media sender can remove NAL units | spatial/SNR scalability. A media sender can remove NAL units | |||
belonging to higher temporal sublayers (i.e., those NAL units with a | belonging to higher temporal sublayers (i.e., those NAL units with a | |||
high value of TID) or higher spatio-SNR layers until the sending | high value of TID) or higher spatio-SNR layers until the sending | |||
bitrate drops to an acceptable range. | bitrate drops to an acceptable range. | |||
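
As an illustration of the sublayer removal described above, the following is a minimal sketch rather than a normative procedure. It assumes the two-byte NAL unit header layout used by this payload format (F, Z, LayerID, Type, TID), where the low three bits of the second byte carry the temporal ID plus one. The function names `temporal_id` and `thin_to_bitrate`, and the idea of measuring bitrate over a fixed window, are hypothetical choices made for the example.

```python
from typing import List, Sequence


def temporal_id(nal: bytes) -> int:
    """Return TID: the low 3 bits of header byte 1, minus 1 (assumed layout)."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    tid_plus1 = nal[1] & 0x07
    if tid_plus1 == 0:
        raise ValueError("temporal ID plus one must not be 0")
    return tid_plus1 - 1


def thin_to_bitrate(nal_units: Sequence[bytes],
                    duration_s: float,
                    target_bps: float) -> List[bytes]:
    """Drop the highest temporal sublayer repeatedly until the window fits.

    nal_units  -- NAL units sent during one measurement window
    duration_s -- length of that window in seconds (hypothetical parameter)
    target_bps -- bitrate the sender is trying to stay under
    """
    kept = list(nal_units)
    while kept:
        bitrate = 8 * sum(len(n) for n in kept) / duration_s
        max_tid = max(temporal_id(n) for n in kept)
        # Stop once the target is met, or when only TID 0 is left.
        if bitrate <= target_bps or max_tid == 0:
            break
        kept = [n for n in kept if temporal_id(n) < max_tid]
    return kept
```

The sketch only handles temporal scalability. Removing spatial or SNR layers would additionally need the layer identifier and the layer dependency information, which are not modeled here.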
The mechanisms mentioned above generally work within a defined | The mechanisms mentioned above generally work within a defined | |||
profile and level and, therefore, no renegotiation of the channel is | profile and level; therefore no renegotiation of the channel is | |||
required. Only when non-downgradable parameters (such as profile) | required. Only when non-downgradable parameters (such as profile) | |||
are required to be changed does it become necessary to terminate and | are required to be changed does it become necessary to terminate and | |||
restart the RTP stream(s). This may be accomplished by using | restart the RTP stream(s). This may be accomplished by using | |||
different RTP payload types. | different RTP payload types. | |||
MANEs MAY remove certain unusable packets from the RTP stream when | MANEs MAY remove certain unusable packets from the RTP stream when | |||
that RTP stream was damaged due to previous packet losses. This can | that RTP stream was damaged due to previous packet losses. This can | |||
help reduce the network load in certain special cases. For example, | help reduce the network load in certain special cases. For example, | |||
MANEs can remove those FUs where the leading FUs belonging to the | MANEs can remove those FUs where the leading FUs belonging to the | |||
same NAL unit have been lost or those dependent slice segments when | same NAL unit have been lost or those dependent slice segments when | |||
the leading slice segments belonging to the same slice have been | the leading slice segments belonging to the same slice have been | |||
lost, because the trailing FUs or dependent slice segments are | lost, because the trailing FUs or dependent slice segments are | |||
meaningless to most decoders. MANE can also remove higher temporal | meaningless to most decoders. MANE can also remove higher temporal | |||
scalable layers if the outbound transmission (from the MANE's | scalable layers if the outbound transmission (from the MANE's | |||
viewpoint) experiences congestion. | viewpoint) experiences congestion. | |||
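
The FU case above can be sketched as follows. This is a simplified illustration, not a complete MANE: it assumes packets are examined in sequence-number order, that FUs are identified by payload header Type 29, and that the FU header follows the two-byte payload header with the S bit at 0x80 and the E bit at 0x40. The function name `filter_unusable_fus` and the `(seq, payload)` tuple representation are hypothetical.

```python
from typing import Iterable, List, Tuple

FU_TYPE = 29  # payload header Type value assumed for fragmentation units

Packet = Tuple[int, bytes]  # (16-bit RTP sequence number, RTP payload)


def filter_unusable_fus(packets: Iterable[Packet]) -> List[Packet]:
    """Forward packets, dropping FUs whose leading fragments were lost."""
    out: List[Packet] = []
    prev_seq = None
    fu_intact = False  # True while every FU of the current NAL unit was seen
    for seq, payload in packets:
        # A gap means at least one packet before this one was lost.
        gap = prev_seq is not None and seq != (prev_seq + 1) & 0xFFFF
        prev_seq = seq

        is_fu = len(payload) >= 3 and (payload[1] >> 3) == FU_TYPE
        if not is_fu:
            fu_intact = False
            out.append((seq, payload))
            continue

        fu_header = payload[2]
        if fu_header & 0x80:      # S bit: first fragment of a NAL unit
            fu_intact = True
        elif gap:                 # an earlier fragment is missing
            fu_intact = False

        if fu_intact:
            out.append((seq, payload))
        if fu_header & 0x40:      # E bit: this NAL unit is finished
            fu_intact = False
    return out
```

A non-start FU that arrives first, before any start fragment has been seen, is also dropped in this sketch, because the filter cannot tell whether the leading fragments were received. Reordered packets would need to be put back in sequence-number order before this filter is applied.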
11. IANA Considerations | 11. IANA Considerations | |||
A new media type, as specified in Section 7.1 of this memo, has been | A new media type has been registered with IANA; see Section 7.1. | |||
registered with IANA. | ||||
12. Acknowledgements | ||||
Dr. Byeongdoo Choi is thanked for the video codec related technical | ||||
discussion and other aspects in this memo. Xin Zhao and Dr. Xiang Li | ||||
are thanked for their contributions on [VVC] specification | ||||
descriptive content. Spencer Dawkins is thanked for his valuable | ||||
review comments that led to great improvements of this memo. Some | ||||
parts of this specification share text with the RTP payload format | ||||
for HEVC [RFC7798]. We thank the authors of that specification for | ||||
their excellent work. | ||||
13. References | 12. References | |||
13.1. Normative References | 12.1. Normative References | |||
[ISO23090-3] | [ISO23090-3] | |||
ISO/IEC 23090-3, "Information technology - Coded | International Organization for Standardization, | |||
representation of immersive media Part 3 Versatile Video | "Information technology - Coded representation of | |||
Coding", 2021, <https://www.iso.org/standard/73022.html>. | immersive media - Part 3: Versatile video coding", ISO/ | |||
IEC 23090-3:2022, September 2022, | ||||
<https://www.iso.org/standard/73022.html>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model | [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model | |||
with Session Description Protocol (SDP)", RFC 3264, | with Session Description Protocol (SDP)", RFC 3264, | |||
DOI 10.17487/RFC3264, June 2002, | DOI 10.17487/RFC3264, June 2002, | |||
<https://www.rfc-editor.org/info/rfc3264>. | <https://www.rfc-editor.org/info/rfc3264>. | |||
skipping to change at page 65, line 35 ¶ | skipping to change at line 2926 ¶ | |||
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and | [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and | |||
Video Conferences with Minimal Control", STD 65, RFC 3551, | Video Conferences with Minimal Control", STD 65, RFC 3551, | |||
DOI 10.17487/RFC3551, July 2003, | DOI 10.17487/RFC3551, July 2003, | |||
<https://www.rfc-editor.org/info/rfc3551>. | <https://www.rfc-editor.org/info/rfc3551>. | |||
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | |||
Norrman, "The Secure Real-time Transport Protocol (SRTP)", | Norrman, "The Secure Real-time Transport Protocol (SRTP)", | |||
RFC 3711, DOI 10.17487/RFC3711, March 2004, | RFC 3711, DOI 10.17487/RFC3711, March 2004, | |||
<https://www.rfc-editor.org/info/rfc3711>. | <https://www.rfc-editor.org/info/rfc3711>. | |||
[RFC4556] Zhu, L. and B. Tung, "Public Key Cryptography for Initial | ||||
Authentication in Kerberos (PKINIT)", RFC 4556, | ||||
DOI 10.17487/RFC4556, June 2006, | ||||
<https://www.rfc-editor.org/info/rfc4556>. | ||||
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, | [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, | |||
"Extended RTP Profile for Real-time Transport Control | "Extended RTP Profile for Real-time Transport Control | |||
Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, | Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, | |||
DOI 10.17487/RFC4585, July 2006, | DOI 10.17487/RFC4585, July 2006, | |||
<https://www.rfc-editor.org/info/rfc4585>. | <https://www.rfc-editor.org/info/rfc4585>. | |||
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data | [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data | |||
Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, | Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, | |||
<https://www.rfc-editor.org/info/rfc4648>. | <https://www.rfc-editor.org/info/rfc4648>. | |||
skipping to change at page 66, line 35 ¶ | skipping to change at line 2966 ¶ | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | |||
Session Description Protocol", RFC 8866, | Session Description Protocol", RFC 8866, | |||
DOI 10.17487/RFC8866, January 2021, | DOI 10.17487/RFC8866, January 2021, | |||
<https://www.rfc-editor.org/info/rfc8866>. | <https://www.rfc-editor.org/info/rfc8866>. | |||
[VSEI] "Versatile supplemental enhancement information messages | [VSEI] ITU-T, "Versatile supplemental enhancement information | |||
for coded video bitstreams", 2020, | messages for coded video bitstreams", ITU-T | |||
Recommendation H.274, May 2022, | ||||
<https://www.itu.int/rec/T-REC-H.274>. | <https://www.itu.int/rec/T-REC-H.274>. | |||
[VVC] "Versatile Video Coding, ITU-T Recommendation H.266", | [VVC] ITU-T, "Versatile Video Coding", ITU-T | |||
2020, <http://www.itu.int/rec/T-REC-H.266>. | Recommendation H.266, April 2022, | |||
<http://www.itu.int/rec/T-REC-H.266>. | ||||
13.2. Informative References | 12.2. Informative References | |||
[CABAC] and et al, "Transform coefficient coding in HEVC, IEEE | [CABAC] Sole, J., et al., "Transform coefficient coding in HEVC", | |||
Transactions on Circuits and Systems for Video | IEEE Transactions on Circuits and Systems for Video | |||
Technology", DOI 10.1109/TCSVT.2012.2223055, December | Technology, DOI 10.1109/TCSVT.2012.2223055, December 2012, | |||
2012, <https://doi.org/10.1109/TCSVT.2012.2223055>. | <https://doi.org/10.1109/TCSVT.2012.2223055>. | |||
[HEVC] "High efficiency video coding, ITU-T Recommendation | [HEVC] ITU-T, "High efficiency video coding", ITU-T | |||
H.265", 2019, <https://www.itu.int/rec/T-REC-H.265>. | Recommendation H.265, August 2021, | |||
<https://www.itu.int/rec/T-REC-H.265>. | ||||
[MPEG2S] IS0/IEC, "Information technology - Generic coding of | [MPEG2S] International Organization for Standardization, | |||
moving pictures and associated audio information - Part 1: | "Information technology - Generic coding of moving | |||
Systems, ISO International Standard 13818-1", 2013. | pictures and associated audio information - Part 1: | |||
Systems", ISO/IEC 13818-1:2022, September 2022. | ||||
[RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session | [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session | |||
Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974, | Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974, | |||
October 2000, <https://www.rfc-editor.org/info/rfc2974>. | October 2000, <https://www.rfc-editor.org/info/rfc2974>. | |||
[RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP | [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP | |||
Payload Format for H.264 Video", RFC 6184, | Payload Format for H.264 Video", RFC 6184, | |||
DOI 10.17487/RFC6184, May 2011, | DOI 10.17487/RFC6184, May 2011, | |||
<https://www.rfc-editor.org/info/rfc6184>. | <https://www.rfc-editor.org/info/rfc6184>. | |||
skipping to change at page 68, line 5 ¶ | skipping to change at line 3034 ¶ | |||
[RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | |||
M. Hannuksela, "RTP Payload Format for High Efficiency | M. Hannuksela, "RTP Payload Format for High Efficiency | |||
Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | |||
March 2016, <https://www.rfc-editor.org/info/rfc7798>. | March 2016, <https://www.rfc-editor.org/info/rfc7798>. | |||
[RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., | [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., | |||
and M. Stiemerling, Ed., "Real-Time Streaming Protocol | and M. Stiemerling, Ed., "Real-Time Streaming Protocol | |||
Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December | Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December | |||
2016, <https://www.rfc-editor.org/info/rfc7826>. | 2016, <https://www.rfc-editor.org/info/rfc7826>. | |||
Appendix A. Change History | Acknowledgements | |||
To RFC Editor: PLEASE REMOVE THIS SECTION BEFORE PUBLICATION | ||||
draft-zhao-payload-rtp-vvc-00 ........ initial version | ||||
draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and | ||||
corrections | ||||
draft-ietf-payload-rtp-vvc-00 ........ initial WG draft | ||||
draft-ietf-payload-rtp-vvc-01 ........ VVC specification update | ||||
draft-ietf-payload-rtp-vvc-02 ........ VVC specification update | ||||
draft-ietf-payload-rtp-vvc-03 ........ VVC coding tool introduction | ||||
update | ||||
draft-ietf-payload-rtp-vvc-04 ........ VVC coding tool introduction | ||||
update | ||||
draft-ietf-payload-rtp-vvc-05 ........ reference update and adding | ||||
placement for open issues | ||||
draft-ietf-payload-rtp-vvc-06 ........ address editor's note | ||||
draft-ietf-payload-rtp-vvc-07 ........ address editor's notes | ||||
draft-ietf-payload-rtp-vvc-08 ........ address editor's notes | ||||
draft-ietf-payload-rtp-vvc-09 ........ address editor's notes | ||||
draft-ietf-payload-rtp-vvc-10 ........ address editor's notes | ||||
draft-ietf-payload-rtp-vvc-11 ........ address editor's notes | ||||
draft-ietf-payload-rtp-vvc-12 ........ address editor's notes | ||||
draft-ietf-payload-rtp-vvc-13 ........ address editor's notes | ||||
draft-ietf-payload-rtp-vvc-14 ........ address 2nd WGLC comments | Dr. Byeongdoo Choi is thanked for the video-codec-related technical | |||
discussion and other aspects in this memo. Xin Zhao and Dr. Xiang Li | ||||
are thanked for their contributions on [VVC] specification | ||||
descriptive content. Spencer Dawkins is thanked for his valuable | ||||
review comments that led to great improvements of this memo. Some | ||||
parts of this specification share text with the RTP payload format | ||||
for HEVC [RFC7798]. We thank the authors of that specification for | ||||
their excellent work. | ||||
Authors' Addresses | Authors' Addresses | |||
Shuai Zhao | Shuai Zhao | |||
Intel | Intel | |||
2200 Mission College Blvd | 2200 Mission College Blvd | |||
Santa Clara, 95054 | Santa Clara, 95054 | |||
United States of America | United States of America | |||
Email: shuai.zhao@ieee.org | Email: shuai.zhao@ieee.org | |||
Stephan Wenger | Stephan Wenger | |||
Tencent | Tencent | |||
2747 Park Blvd | 2747 Park Blvd | |||
End of changes. 344 change blocks. | ||||
1262 lines changed or deleted | 1228 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |