<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info" docName="draft-ietf-netvc-requirements-10" ipr="trust200902" submissionType="IETF" xml:lang="en" version="3" number="8761" consensus="true" symRefs="false" sortRefs="true" tocInclude="true">
<?rfc text-list-symbols="o.-*+"?>
<front>
<title abbrev="Video Codec Requirements and Evaluation">Video Codec Requirements and Evaluation Methodology</title>
<seriesInfo name="RFC" value="8761" />
<author fullname="Alexey Filippov" initials="A." surname="Filippov"> <organization>Huawei Technologies</organization> <address> <email>alexey.filippov@huawei.com</email> </address> </author>
<author fullname="Andrey Norkin" initials="A." surname="Norkin"> <organization>Netflix</organization> <address> <email>anorkin@netflix.com</email> </address> </author>
<author fullname="Jose Roberto Alvarez" initials="J.R." surname="Alvarez"> <organization>Huawei Technologies</organization> <address> <email>j.alvarez@ieee.org</email> </address> </author>
<date month="April" year="2020"/>
<keyword>NETVC</keyword> <keyword>evaluation</keyword> <keyword>requirements</keyword> <keyword>compression performance</keyword> <keyword>video coding applications</keyword>
<abstract> <t> This document provides requirements for a video codec designed mainly for use over the Internet. In addition, this document describes an evaluation methodology for measuring the compression efficiency to determine whether or not the stated requirements have been fulfilled.
</t> </abstract> </front>
<middle>
<section title="Introduction">
<t>This document presents the requirements for a video codec designed mainly for use over the Internet. The requirements encompass a wide range of applications that use data transmission over the Internet, including Internet video streaming, IPTV, peer-to-peer video conferencing, video sharing, screencasting, game streaming, and video monitoring and surveillance. For each application, typical resolutions, frame rates, and picture-access modes are presented. Specific requirements related to data transmission over packet-loss networks are considered as well. In this document, when we discuss data-protection techniques, we only refer to methods designed and implemented to protect data inside the video codec since there are many existing techniques that protect generic data transmitted over networks with packet losses. From the theoretical point of view, both packet-loss and bit-error robustness can be beneficial for video codecs. In practice, packet losses are a more significant problem than bit corruption in IP networks. It is worth noting that there is an evident interdependence between the possible amount of delay and the necessity of error-robust video streams: </t>
<ul spacing="normal">
<li>If the amount of delay is not crucial for an application, then reliable transport protocols such as TCP that retransmit undelivered packets can be used to guarantee correct decoding of transmitted data.</li>
<li>If the amount of delay must be kept low, then either data transmission should be error free (e.g., by using managed networks) or the compressed video stream should be error resilient.</li>
</ul>
<t>Thus, error resilience can be useful for delay-critical applications to provide low delay in a packet-loss environment.
</t> </section>
<section anchor="defs" title="Terminology Used in This Document">
<section anchor="def1" title="Definitions">
<dl newline="true">
<dt>High dynamic range imaging</dt>
<dd>A set of techniques that allows a greater dynamic range of exposures or values (i.e., a wider range of values between light and dark areas) than normal digital imaging techniques. The intention is to accurately represent the wide range of intensity levels found in examples such as exterior scenes that include light-colored items struck by direct sunlight and areas of deep shadow <xref target="HDR"/>.</dd>
<dt>Random access period</dt>
<dd>The period of time between the two closest independently decodable frames (pictures).</dd>
<dt>RD-point</dt>
<dd>A point in a two-dimensional rate-distortion space where the values of bitrate and quality metric are used as x- and y-coordinates, respectively.</dd>
<dt>Visually lossless compression</dt>
<dd>A form or manner of lossy compression where the data that are lost after the file is compressed and decompressed are not detectable to the eye; the compressed data appear identical to the uncompressed data
<xref target="COMPRESSION"/>.</dd>
<dt>Wide color gamut</dt>
<dd>A certain complete color subset (e.g., considered in ITU-R BT.2020 <xref target="BT2020-2" />) that supports a wider range of colors (i.e., an extended range of colors that can be generated by a specific input or output device such as a video camera, monitor, or printer and can be interpreted by a color model) than conventional color gamuts (e.g., considered in ITU-R BT.601 <xref target="BT601"/> or BT.709 <xref target="BT709"/>).</dd>
</dl>
</section>
<section anchor="abbr" title="Abbreviations">
<dl newline="false" indent="12" spacing="normal">
<dt>AI</dt> <dd>All-Intra (each picture is intra-coded)</dd>
<dt>BD-Rate</dt> <dd>Bjontegaard Delta Rate</dd>
<dt>FIZD</dt> <dd>just the First picture is Intra-coded, Zero structural Delay</dd>
<dt>FPS</dt> <dd>Frames per Second</dd>
<dt>GOP</dt> <dd>Group of Pictures</dd>
<dt>GPU</dt> <dd>Graphics Processing Unit</dd>
<dt>HBR</dt> <dd>High Bitrate Range</dd>
<dt>HDR</dt> <dd>High Dynamic Range</dd>
<dt>HRD</dt> <dd>Hypothetical Reference Decoder</dd>
<dt>HEVC</dt> <dd>High Efficiency Video Coding</dd>
<dt>IPTV</dt> <dd>Internet Protocol Television</dd>
<dt>LBR</dt> <dd>Low Bitrate Range</dd>
<dt>MBR</dt> <dd>Medium Bitrate Range</dd>
<dt>MOS</dt> <dd>Mean Opinion Score</dd>
<dt>MS-SSIM</dt> <dd>Multi-Scale Structural Similarity quality index</dd>
<dt>PAM</dt> <dd>Picture Access Mode</dd>
<dt>PSNR</dt> <dd>Peak Signal-to-Noise Ratio</dd>
<dt>QoS</dt> <dd>Quality of Service</dd>
<dt>QP</dt> <dd>Quantization Parameter</dd>
<dt>RA</dt> <dd>Random Access</dd>
<dt>RAP</dt> <dd>Random Access Period</dd>
<dt>RD</dt> <dd>Rate-Distortion</dd>
<dt>SEI</dt> <dd>Supplemental Enhancement Information</dd>
<dt>SIMD</dt> <dd>Single Instruction, Multiple Data</dd>
<dt>SNR</dt> <dd>Signal-to-Noise Ratio</dd>
<dt>UGC</dt> <dd>User-Generated Content</dd>
<dt>VDI</dt> <dd>Virtual Desktop Infrastructure</dd>
<dt>VUI</dt> <dd>Video Usability Information</dd>
<dt>WCG</dt> <dd>Wide Color Gamut</dd>
</dl>
</section>
</section>
<section anchor="apps" title="Applications">
<t>In this section, an overview of video codec applications that are currently available on the Internet market is presented. It is worth noting that there are different use cases for each application that define a target platform; hence, there are different types of communication channels involved (e.g., wired or wireless channels) that are characterized by different QoS as well as bandwidth; for instance, wired channels are considerably more free from error than wireless channels and therefore require different QoS approaches.
The target platform, the channel bandwidth, and the channel quality determine resolutions, frame rates, and either quality or bitrates for video streams to be encoded or decoded. By default, color format YCbCr 4:2:0 is assumed for the application scenarios listed below. </t>
<section title="Internet Video Streaming">
<t>Typical content for this application is movies, TV series and shows, and animation. Internet video streaming uses a variety of client devices and has to operate under changing network conditions. For this reason, an adaptive streaming model has been widely adopted. Video material is encoded at different quality levels and different resolutions, which are then chosen by a client depending on its capabilities and current network bandwidth. An example combination of resolutions and bitrates is shown in <xref target="vid-stream" />. </t>
<t>A video encoding pipeline in on-demand Internet video streaming typically operates as follows: </t>
<ul>
<li>Video is encoded in the cloud by software encoders. </li>
<li>Source video is split into chunks, each of which is encoded separately, in parallel. </li>
<li>Closed-GOP encoding with intrapicture intervals of 2-5 seconds (or longer) is used. </li>
<li>Encoding is perceptually optimized. Perceptual quality is important and should be considered during the codec development.
</li> </ul>
<table anchor="vid-stream">
<name> Internet Video Streaming: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead> <tr> <th>Resolution *</th> <th>PAM</th> <th align="center">Frame Rate, FPS **</th> </tr> </thead>
<tbody>
<tr> <td>4K, 3840x2160</td> <td>RA</td> <td align="center" rowspan="10"><br/><br/><br/>24/1.001, 24, 25, <br/>30/1.001, 30, 50, <br/>60/1.001, 60, 100, <br/>120/1.001, 120</td> </tr>
<tr> <td>2K (1080p), 1920x1080</td> <td>RA</td> </tr>
<tr> <td>1080i, 1920x1080*</td> <td>RA</td> </tr>
<tr> <td>720p, 1280x720</td> <td>RA</td> </tr>
<tr> <td>576p (EDTV), 720x576</td> <td>RA</td> </tr>
<tr> <td>576i (SDTV), 720x576*</td> <td>RA</td> </tr>
<tr> <td>480p (EDTV), 720x480</td> <td>RA</td> </tr>
<tr> <td>480i (SDTV), 720x480*</td> <td>RA</td> </tr>
<tr> <td>512x384</td> <td>RA</td> </tr>
<tr> <td>QVGA, 320x240</td> <td>RA</td> </tr>
</tbody>
</table>
<t>*Note: Interlaced content can be handled at the higher system level and not necessarily by using specialized video coding tools. It is included in this table only for the sake of completeness, as most video content today is in the progressive format. </t>
<t> **Note: The set of frame rates presented in this table is taken from Table 2 in <xref target="BT2020-2"/>. </t>
<t>The characteristics and requirements of this application scenario are as follows: </t>
<ul>
<li>High encoder complexity (up to 10x and more) can be tolerated since encoding happens once and in parallel for different segments. </li>
<li>Decoding complexity should be kept at reasonable levels to enable efficient decoder implementation. </li>
<li><t>Support and efficient encoding of a wide range of content types and formats is required:</t>
<ul>
<li>High Dynamic Range (HDR), Wide Color Gamut (WCG), high-resolution (currently, up to 4K), and high-frame-rate content are important use cases; the codec should be able to encode such content efficiently. </li>
<li>Improvement of coding efficiency at both lower and higher resolutions is important since low resolutions are used when streaming in low-bandwidth conditions. </li>
<li>Improvement on both "easy" and "difficult" content in terms of compression efficiency at the same quality level contributes to the overall bitrate/storage savings. </li>
<li>Film grain (and sometimes other types of noise) is often present in movies and similar content; this is usually part of the creative intent.
</li> </ul></li>
<li>Significant improvements in compression efficiency between generations of video standards are desirable since this scenario typically assumes long-term support of legacy video codecs. </li>
<li>Random access points are inserted frequently (one per 2-5 seconds) to enable switching between resolutions and fast-forward playback. </li>
<li>The elementary stream should have a model that allows easy parsing and identification of the sample components. </li>
<li>Middle QP values are normally used in streaming; this is also the range where compression efficiency is important for this scenario. </li>
<li>Scalability or other forms of supporting multiple quality representations are beneficial if they do not incur significant bitrate overhead and if mandated in the first version. </li>
</ul>
</section>
<section title="Internet Protocol Television (IPTV)">
<t>This is a service for delivering television content over IP-based networks. IPTV may be classified into two main groups based on the type of delivery, as follows: </t>
<ul>
<li>unicast (e.g., for video on demand), where delay is not crucial; and </li>
<li>multicast/broadcast (e.g., for transmitting news), where zapping (i.e., stream changing) delay is important. </li>
</ul>
<t>In the IPTV scenario, traffic is transmitted over managed (QoS-based) networks. Typical content used in this application is news, movies, cartoons, series, TV shows, etc. One important requirement for both groups is that the random access period (RAP) should be kept small enough (approximately 1-5 seconds). Optional requirements are as follows: </t>
<ul>
<li>Temporal (frame-rate) scalability; and </li>
<li>Resolution and quality (SNR) scalability.
</li> </ul>
<t> For this application, typical values of resolutions, frame rates, and PAMs are presented in <xref target="IPTV" />. </t>
<table anchor="IPTV">
<name> IPTV: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead> <tr> <th>Resolution *</th> <th>PAM</th> <th align="center">Frame Rate, FPS **</th> </tr> </thead>
<tbody>
<tr> <td align="center">2160p (4K), 3840x2160</td> <td>RA</td> <td align="center" rowspan="8"><br/><br/><br/>24/1.001, 24, 25, <br/>30/1.001, 30, 50, <br/>60/1.001, 60, 100, <br/>120/1.001, 120</td> </tr>
<tr> <td>1080p, 1920x1080</td> <td>RA</td> </tr>
<tr> <td>1080i, 1920x1080*</td> <td>RA</td> </tr>
<tr> <td>720p, 1280x720</td> <td>RA</td> </tr>
<tr> <td>576p (EDTV), 720x576</td> <td>RA</td> </tr>
<tr> <td>576i (SDTV), 720x576*</td> <td>RA</td> </tr>
<tr> <td>480p (EDTV), 720x480</td> <td>RA</td> </tr>
<tr> <td>480i (SDTV), 720x480*</td> <td>RA</td> </tr>
</tbody>
</table>
<t>*Note: Interlaced content can be handled at the higher system level and not necessarily by using specialized video coding tools. It is included in this table only for the sake of completeness, as most video content today is in a progressive format. </t>
<t> **Note: The set of frame rates presented in this table is taken from Table 2 in <xref target="BT2020-2" />. </t>
</section>
<section title="Video Conferencing">
<t>This is a form of video connection over the Internet. This form allows users to establish connections to two or more people by two-way video and audio transmission for communication in real time. For this application, both stationary and mobile devices can be used. The main requirements are as follows: </t>
<ul>
<li>Delay should be kept as low as possible (the preferable and maximum end-to-end delay values should be less than 100 ms <xref target="SG-16"/> and 320 ms <xref target="G1091"/>, respectively); </li>
<li>Temporal (frame-rate) scalability; and </li>
<li>Error robustness. </li>
</ul>
<t> Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and PAMs are presented in <xref target="vid-conf"/>.
</t>
<table anchor="vid-conf">
<name> Video Conferencing: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead> <tr> <th>Resolution</th> <th>Frame Rate, FPS</th> <th>PAM</th> </tr> </thead>
<tbody>
<tr> <td>1080p, 1920x1080</td> <td>15, 30</td> <td>FIZD</td> </tr>
<tr> <td>720p, 1280x720</td> <td>30, 60</td> <td>FIZD</td> </tr>
<tr> <td>4CIF, 704x576</td> <td>30, 60</td> <td>FIZD</td> </tr>
<tr> <td>4SIF, 704x480</td> <td>30, 60</td> <td>FIZD</td> </tr>
<tr> <td>VGA, 640x480</td> <td>30, 60</td> <td>FIZD</td> </tr>
<tr> <td>360p, 640x360</td> <td>30, 60</td> <td>FIZD</td> </tr>
</tbody>
</table>
</section>
<section title="Video Sharing">
<t>This is a service that allows people to upload and share video data (using live streaming or not) and watch those videos. It is also known as video hosting.
A typical User-Generated Content (UGC) scenario for this application is to capture video using mobile cameras such as GoPros or cameras integrated into smartphones (amateur video). The main requirements are as follows: </t>
<ul>
<li>Random access to pictures for downloaded video data; </li>
<li>Temporal (frame-rate) scalability; and </li>
<li>Error robustness. </li>
</ul>
<t> Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and PAMs are presented in <xref target="vid-share" />. </t>
<t> Typical values of resolutions and frame rates in <xref target="vid-share" /> are taken from <xref target="YOUTUBE" />. </t>
<table anchor="vid-share">
<name> Video Sharing: Typical Values of Resolutions, Frame Rates, and PAMs </name>
<thead> <tr> <th>Resolution</th> <th>Frame Rate, FPS</th> <th>PAM</th> </tr> </thead>
<tbody>
<tr> <td>2160p (4K), 3840x2160</td> <td>24, 25, 30, 48, 50, 60</td> <td>RA</td> </tr>
<tr> <td>1440p (2K), 2560x1440</td> <td>24, 25, 30, 48, 50, 60</td> <td>RA</td> </tr>
<tr> <td>1080p, 1920x1080</td> <td>24, 25, 30, 48, 50, 60</td> <td>RA</td> </tr>
<tr> <td>720p, 1280x720</td> <td>24, 25, 30, 48, 50, 60</td> <td>RA</td> </tr>
<tr> <td>480p, 854x480</td> <td>24, 25, 30, 48, 50, 60</td> <td>RA</td> </tr>
<tr> <td>360p, 640x360</td> <td>24, 25, 30, 48, 50, 60</td> <td>RA</td> </tr>
</tbody>
</table>
</section>
<section title="Screencasting">
<t>This is a service that allows users to record and distribute video data from a computer screen. This service requires efficient compression of computer-generated content with high visual quality up to visually and mathematically (numerically) lossless <xref target="HEVC-EXT" />. Currently, this application includes business presentations (PowerPoint, Word documents, email messages, etc.), animation (cartoons), gaming content, and data visualization.
This type of content is characterized by fast motion, rotation, smooth shade, 3D effect, highly saturated colors with full resolution, clear textures, and sharp edges with distinct colors <xref target="HEVC-EXT" />. Other use cases include virtual desktop infrastructure (VDI), screen/desktop sharing and collaboration, supervisory control and data acquisition (SCADA) display, automotive/navigation display, cloud gaming, factory automation display, wireless display, display wall, digital operating room (DiOR), etc. For this application, an important requirement is the support of low-delay configurations with zero structural delay for a wide range of video formats (e.g., RGB) in addition to YCbCr 4:2:0 and YCbCr 4:4:4 <xref target="HEVC-EXT" />. For this application, typical values of resolutions, frame rates, and PAMs are presented in <xref target="screencast" />. </t>
<table anchor="screencast">
<name> Screencasting for RGB and YCbCr 4:4:4 Format: Typical Values of Resolutions, Frame Rates, and PAMs </name>
<thead> <tr> <th align="center">Resolution</th> <th align="center">Frame Rate, FPS</th> <th align="center">PAM</th> </tr> </thead>
<tbody>
<tr> <td colspan="3" align="center">Input color format: RGB 4:4:4</td> </tr>
<tr> <td>5k, 5120x2880</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>4k, 3840x2160</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>WQXGA, 2560x1600</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>WUXGA, 1920x1200</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>WSXGA+, 1680x1050</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>WXGA, 1280x800</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>XGA, 1024x768</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>SVGA, 800x600</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>VGA, 640x480</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td colspan="3" align="center">Input color format: YCbCr 4:4:4</td> </tr>
<tr> <td>5k, 5120x2880</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>4k, 3840x2160</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>1440p (2K), 2560x1440</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>1080p, 1920x1080</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
<tr> <td>720p, 1280x720</td> <td>15, 30, 60</td> <td>AI, RA, FIZD</td> </tr>
</tbody>
</table>
</section>
<section title="Game Streaming">
<t>This is a service that provides game content over the Internet to different local devices such as notebooks and gaming tablets. In this category of applications, 3D games are rendered on a cloud server and streamed to any device with a wired or wireless broadband connection <xref target="GAME" />. There are low-latency requirements for transmitting user interactions and receiving game data with a turnaround delay of less than 100 ms.
This allows anyone to play (or resume) full-featured games from anywhere on the Internet <xref target="GAME" />. An example of this application is Nvidia Grid <xref target="GAME" />. Another application scenario of this category is broadcast of video games played by people over the Internet in real time or for later viewing <xref target="GAME" />. There are many companies, such as Twitch and, in China, YY, that enable game broadcasting <xref target="GAME" />. Games typically contain a lot of sharp edges and large motion <xref target="GAME" />. The main requirements are as follows: </t>
<ul>
<li>Random access to pictures for game broadcasting; </li>
<li>Temporal (frame-rate) scalability; and </li>
<li>Error robustness. </li>
</ul>
<t> Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and PAMs are similar to ones presented in <xref target="vid-conf"/>. </t>
</section>
<section title="Video Monitoring and Surveillance">
<t>This is a type of live broadcasting over IP-based networks. Video streams are sent to many receivers at the same time. A new receiver may connect to the stream at an arbitrary moment, so the random access period should be kept small enough (approximately 1-5 seconds). Data are transmitted publicly in the case of video monitoring and privately in the case of video surveillance. For IP cameras that have to capture, process, and encode video data, complexity (including computational and hardware complexity, as well as memory bandwidth) should be kept low to allow real-time processing.
In addition, support of a high dynamic range and a monochrome mode (e.g., for infrared cameras) as well as resolution and quality (SNR) scalability is an essential requirement for video surveillance. In some use cases, high video signal fidelity is required even after lossy compression. Typical values of resolutions, frame rates, and PAMs for video monitoring and surveillance applications are presented in <xref target="monitoring"/>. </t>
<table anchor="monitoring">
<name> Video Monitoring and Surveillance: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead> <tr> <th>Resolution</th> <th>Frame Rate, FPS</th> <th>PAM</th> </tr> </thead>
<tbody>
<tr> <td>2160p (4K), 3840x2160</td> <td>12, 25, 30</td> <td>RA, FIZD</td> </tr>
<tr> <td>5Mpixels, 2560x1920</td> <td>12, 25, 30</td> <td>RA, FIZD</td> </tr>
<tr> <td>1080p, 1920x1080</td> <td>25, 30</td> <td>RA, FIZD</td> </tr>
<tr> <td>1.23Mpixels, 1280x960</td> <td>25, 30</td> <td>RA, FIZD</td> </tr>
<tr> <td>720p, 1280x720</td> <td>25, 30</td> <td>RA, FIZD</td> </tr>
<tr> <td>SVGA, 800x600</td> <td>25, 30</td> <td>RA, FIZD</td> </tr>
</tbody>
</table>
</section>
</section>
<section title="Requirements">
<t>Taking the requirements discussed above for specific video applications, this section proposes requirements for an Internet video codec. </t>
<section anchor="gen-reqs" title="General Requirements">
<section anchor="efficiency" title="Coding Efficiency">
<t>The most fundamental requirement is coding efficiency, i.e., compression performance on both "easy" and "difficult" content for applications and use cases in <xref target="apps" />. The codec should provide coding efficiency at least 25% higher than that of state-of-the-art video codecs such as HEVC/H.265 and VP9, in accordance with the methodology described in <xref target="eval-method"/> of this document. For higher resolutions, the improvements in coding efficiency are expected to be higher than for lower resolutions. </t>
</section>
<section anchor="profiles" title="Profiles and Levels">
<t>Good-quality specification and well-defined profiles and levels are required to enable device interoperability and facilitate decoder implementations. A profile consists of a subset of the entire set of bitstream syntax elements; consequently, it also defines the necessary tools for decoding a conforming bitstream of that profile. A level imposes a set of numerical limits to the values of some syntax elements. An example of codec levels to be supported is presented in <xref target="codec-levels"/>. An actual level definition should include constraints on features that impact the decoder complexity.
Examples of such features include the maximum bitrate, line buffer size, and memory usage. </t>
<table anchor="codec-levels"> <name>Codec Levels</name>
<thead> <tr> <th>Level</th> <th>Example picture resolution at highest frame rate</th> </tr> </thead>
<tbody>
<tr> <td>1</td> <td>128x96(12,288*)@30.0<br/>176x144(25,344*)@15.0</td> </tr>
<tr> <td>2</td> <td>352x288(101,376*)@30.0</td> </tr>
<tr> <td>3</td> <td>352x288(101,376*)@60.0<br/>640x360(230,400*)@30.0</td> </tr>
<tr> <td>4</td> <td>640x360(230,400*)@60.0<br/>960x540(518,400*)@30.0</td> </tr>
<tr> <td>5</td> <td>720x576(414,720*)@75.0<br/>960x540(518,400*)@60.0<br/>1,280x720(921,600*)@30.0</td> </tr>
<tr> <td>6</td> <td>1,280x720(921,600*)@68.0<br/>2,048x1,080(2,211,840*)@30.0</td> </tr>
<tr> <td>7</td> <td>1,280x720(921,600*)@120.0<br/>2,048x1,080(2,211,840*)@60.0</td> </tr>
<tr> <td>8</td> <td>1,920x1,080(2,073,600*)@120.0<br/>3,840x2,160(8,294,400*)@30.0<br/>4,096x2,160(8,847,360*)@30.0</td> </tr>
<tr> <td>9</td> <td>1,920x1,080(2,073,600*)@250.0<br/>4,096x2,160(8,847,360*)@60.0</td> </tr>
<tr> <td>10</td> <td>1,920x1,080(2,073,600*)@300.0<br/>4,096x2,160(8,847,360*)@120.0</td> </tr>
<tr> <td>11</td> <td>3,840x2,160(8,294,400*)@120.0<br/>8,192x4,320(35,389,440*)@30.0</td> </tr>
<tr> <td>12</td> <td>3,840x2,160(8,294,400*)@250.0<br/>8,192x4,320(35,389,440*)@60.0</td> </tr>
<tr> <td>13</td> <td>3,840x2,160(8,294,400*)@300.0<br/>8,192x4,320(35,389,440*)@120.0</td> </tr>
</tbody> </table>
<t>*Note: The quantities of pixels are presented for applications in which a picture can have an arbitrary size (e.g., screencasting). </t> </section>
<section anchor="syntax" title="Bitstream Syntax"> <t>Bitstream syntax should allow extensibility and backward compatibility. New features can be supported easily by using metadata (such as SEI messages, VUI, and headers) without affecting the bitstream compatibility with legacy decoders.
A newer version of the decoder shall be able to play bitstreams of an older version of the same or lower profile and level. </t> </section>
<section anchor="model" title="Parsing and Identification of Sample Components"> <t>A bitstream should have a model that allows easy parsing and identification of the sample components (such as Annex B of ISO/IEC 14496-10 <xref target="ISO14496-10" /> or ISO/IEC 14496-15 <xref target="ISO14496-15"/>). In particular, information needed for packet handling (e.g., frame type) should not require parsing anything below the header level. </t> </section>
<section anchor="tools" title="Perceptual Quality Tools"> <t>Perceptual quality tools (such as adaptive QP and quantization matrices) should be supported by the codec bitstream. </t> </section>
<section anchor="buffer" title="Buffer Model"> <t>The codec specification shall define a buffer model such as a hypothetical reference decoder (HRD). </t> </section>
<section anchor="integration" title="Integration"> <t>Specifications providing integration with system and delivery layers should be developed. </t> </section> </section>
<section title="Basic Requirements"> <section title="Input Source Formats"> <t> Input pictures coded by a video codec should have one of the following formats: </t>
<ul> <li>Bit depth: 8 and 10 bits (up to 12 bits for a high profile) per color component.
</li> <li><t>Color sampling formats:</t> <ul> <li>YCbCr 4:2:0 </li> <li>YCbCr 4:4:4, YCbCr 4:2:2, and YCbCr 4:0:0 (preferably in different profile(s)) </li> </ul></li> <li>For profiles with bit depth of 10 bits per sample or higher, support of high dynamic range and wide color gamut. </li> <li>Support of arbitrary resolution according to the level constraints for applications in which a picture can have an arbitrary size (e.g., in screencasting). </li> </ul>
<t> Exemplary input source formats for codec profiles are shown in <xref target="exemplary"/>. </t>
<table anchor="exemplary"> <name>Exemplary Input Source Formats for Codec Profiles</name>
<thead> <tr> <th>Profile</th> <th>Bit depths per color component</th> <th>Color sampling formats</th> </tr> </thead>
<tbody>
<tr> <td>1</td> <td>8 and 10</td> <td>4:0:0 and 4:2:0</td> </tr>
<tr> <td>2</td> <td>8 and 10</td> <td>4:0:0, 4:2:0, and 4:4:4</td> </tr>
<tr> <td>3</td> <td>8, 10, and 12</td> <td>4:0:0, 4:2:0, 4:2:2, and 4:4:4</td> </tr>
</tbody> </table> </section>
<section title="Coding Delay"> <t> In order to meet coding delay requirements, a video codec should support all of the following: </t>
<ul> <li><t>Support of configurations with zero structural delay, also referred to as "low-delay" configurations.</t> <ul> <li>Note: End-to-end delay should be no more than 320 ms <xref target="G1091" />, but it is preferable for its value to be less than 100 ms <xref target="SG-16"/>. </li> </ul></li> <li>Support of efficient random access point encoding (such as intracoding and resetting of context variables), as well as efficient switching between multiple quality representations. </li> <li>Support of configurations with nonzero structural delay (such as out-of-order or multipass encoding) for applications without low-delay requirements, if such configurations provide additional compression efficiency improvements. </li> </ul> </section>
<section title="Complexity"> <t> Encoding and decoding complexity considerations are as follows: </t>
<ul> <li>Feasible real-time implementation of both an encoder and a decoder supporting a chosen subset of tools for hardware and software implementation on a wide range of state-of-the-art platforms. The subset of real-time encoder tools should provide meaningful improvement in compression efficiency at reasonable complexity of hardware and software encoder implementations as compared to real-time implementations of state-of-the-art video compression technologies such as HEVC/H.265 and VP9.
</li> <li>High-complexity software encoder implementations used by offline encoding applications can have a 10x or more complexity increase compared to state-of-the-art video compression technologies such as HEVC/H.265 and VP9. </li> </ul> </section>
<section title="Scalability"> <t> The mandatory scalability requirement is as follows: </t>
<ul> <li>Temporal (frame-rate) scalability should be supported. </li> </ul> </section>
<section title="Error Resilience"> <t> In order to meet the error resilience requirement, a video codec should satisfy all of the following conditions: </t>
<ul> <li>Tools that are complementary to the error-protection mechanisms implemented on the transport level should be supported. </li> <li>The codec should support mechanisms that facilitate packetization of a bitstream for common network protocols. </li> <li>Packetization mechanisms should enable frame-level error recovery by means of retransmission or error concealment. </li> <li>The codec should support effective mechanisms for allowing decoding and reconstruction of significant parts of pictures in the event that parts of the picture data are lost in transmission. </li> <li>The bitstream specification shall support independently decodable subframe units similar to slices or independent tiles. It shall be possible for the encoder to restrict the bitstream to allow parsing of the bitstream after a packet loss and to communicate this restriction to the decoder.
</li> </ul> </section> </section>
<section title="Optional Requirements"> <section title="Input Source Formats"> <t> It is a desired but not mandatory requirement for a video codec to support some of the following features: </t>
<ul> <li>Bit depth: up to 16 bits per color component. </li> <li>Color sampling formats: RGB 4:4:4. </li> <li>Auxiliary channel (e.g., alpha channel) support. </li> </ul> </section>
<section title="Scalability"> <t> Desirable scalability requirements are as follows: </t>
<ul> <li>Resolution and quality (SNR) scalability that incurs a low compression efficiency penalty (a BD-rate <xref target="PSNR" /> increase of up to 5% per layer, with a reasonable increase of both computational and hardware complexity) can be supported in the main profile of the codec being developed by the NETVC Working Group. Otherwise, a separate profile is needed to support these types of scalability. </li> <li>Computational complexity scalability (i.e., computational complexity decreases as picture quality degrades) is desirable. </li> </ul> </section>
<section title="Complexity"> <t>Tools that enable parallel processing (e.g., slices, tiles, and wave-front propagation processing) at both encoder and decoder sides are highly desirable for many applications.
</t> <ul> <li>High-level multicore parallelism: encoder and decoder operation, especially entropy encoding and decoding, should allow multiple frames or subframe regions (e.g., 1D slices, 2D tiles, or partitions) to be processed concurrently, either independently or with deterministic dependencies that can be efficiently pipelined. </li> <li>Low-level instruction-set parallelism: favor algorithms that are SIMD/GPU friendly over inherently serial algorithms. </li> </ul> </section>
<section title="Coding Efficiency"> <t>Compression efficiency on noisy content, content with film grain, computer-generated content, and low-resolution materials is desirable. </t> </section> </section> </section>
<section anchor="eval-method" title="Evaluation Methodology"> <t>As shown in <xref target="QP"/>, compression performance testing is performed in three overlapped ranges that encompass ten different bitrate values: </t>
<ul> <li>Low bitrate range (LBR) is the range that contains the four lowest bitrates of the ten specified bitrates (one of the four bitrate values is shared with the neighboring range). </li> <li>Medium bitrate range (MBR) is the range that contains the four medium bitrates of the ten specified bitrates (two of the four bitrate values are shared with the neighboring ranges). </li> <li>High bitrate range (HBR) is the range that contains the four highest bitrates of the ten specified bitrates (one of the four bitrate values is shared with the neighboring range).
</li> </ul> <t>Initially, for the codec selected as the reference (e.g., HEVC or VP9), a set of ten QP (quantization parameter) values should be specified as in <xref target="I-D.ietf-netvc-testing" />, and corresponding quality values should be calculated. In <xref target="QP"/>, QP and quality values are denoted as "QP0"-"QP9" and "Q0"-"Q9", respectively. To guarantee the overlaps of quality levels between the bitrate ranges of the reference and tested codecs, a quality alignment procedure should be performed for each range's outermost (left- and rightmost) quality levels Qk of the reference codec (i.e., for Q0, Q3, Q6, and Q9) and the quality levels Q'k (i.e., Q'0, Q'3, Q'6, and Q'9) of the tested codec. Thus, these quality levels Q'k, and hence the corresponding QP value QP'k (i.e., QP'0, QP'3, QP'6, and QP'9), of the tested codec should be selected using the following formulas: </t>
<artwork name="" type="" align="left" alt=""><![CDATA[
Q'k = min { abs(Q'i - Qk) }, i in R

QP'k = argmin { abs(Q'i(QP'i) - Qk(QPk)) }, i in R
]]></artwork>
<t>where R is the range of the QP indexes of the tested codec, i.e., the candidate Internet video codec. The inner quality levels (i.e., Q'1, Q'2, Q'4, Q'5, Q'7, and Q'8), as well as their corresponding QP values of each range (i.e., QP'1, QP'2, QP'4, QP'5, QP'7, and QP'8), should be as equidistantly spaced as possible between the left- and rightmost quality levels without explicitly mapping their values using the procedure described above.
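As an illustration only, the selection of QP'k for one outermost quality level can be sketched in Python as follows. The function name select_aligned_qp and the sample PSNR values are hypothetical and are not defined by this document or by the testing draft; the sketch assumes that the quality of the tested codec has already been measured for every candidate QP in R.
<sourcecode type="python"><![CDATA[
```python
def select_aligned_qp(ref_quality, tested_quality_by_qp):
    """Return (QP'k, Q'k): the tested-codec QP whose quality is closest to Qk.

    ref_quality          -- Qk, quality of the reference codec at QPk
    tested_quality_by_qp -- mapping {QP'i: Q'i} for every candidate i in R
    """
    # argmin over R of abs(Q'i - Qk)
    qp = min(tested_quality_by_qp,
             key=lambda i: abs(tested_quality_by_qp[i] - ref_quality))
    return qp, tested_quality_by_qp[qp]


# Hypothetical PSNR values (dB) of the tested codec for candidate QPs in R.
tested = {30: 39.1, 32: 38.0, 34: 36.8, 36: 35.7}
qp, quality = select_aligned_qp(37.0, tested)
print(qp, quality)
```
]]></sourcecode>
In this hypothetical data set, QP 34 is selected because its quality (36.8 dB) is the closest to the reference quality of 37.0 dB.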
</t>
<figure anchor="QP"> <name>Quality/QP Alignment for Compression Performance Evaluation</name>
<artwork><![CDATA[
QP'9 QP'8 QP'7 QP'6 QP'5 QP'4 QP'3 QP'2 QP'1 QP'0 <+-----
 ^    ^    ^    ^    ^    ^    ^    ^    ^    ^    | Tested
 |    |    |    |    |    |    |    |    |    |    | codec
Q'0  Q'1  Q'2  Q'3  Q'4  Q'5  Q'6  Q'7  Q'8  Q'9  <+-----
 ^              ^              ^              ^
 |              |              |              |
Q0   Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9   <+-----
 ^    ^    ^    ^    ^    ^    ^    ^    ^    ^    | Reference
 |    |    |    |    |    |    |    |    |    |    | codec
QP9  QP8  QP7  QP6  QP5  QP4  QP3  QP2  QP1  QP0  <+-----
 +--------------+--------------+--------------+--------->
 ^              ^              ^              ^   Bitrate
 |-----LBR------|              |-----HBR------|
                ^              ^
                |-----MBR------|
]]></artwork> </figure>
<t>Since the QP mapping results may vary for different sequences, this quality alignment procedure eventually needs to be performed separately for each quality assessment index and each sequence used for codec performance evaluation to fulfill the requirements described above. </t>
<t>To assess the quality of output (decoded) sequences, two indexes (PSNR <xref target="ISO29170-1"/> and MS-SSIM <xref target="ISO29170-1"/> <xref target="MULTI-SCALE"/>) are separately computed. In the case of the YCbCr color format, PSNR should be calculated for each color plane, whereas MS-SSIM is calculated for the luma channel only. In the case of the RGB color format, both metrics are computed for R, G, and B channels. Thus, for each sequence, 30 RD-points for PSNR (i.e., three RD-curves, one for each channel) and 10 RD-points for MS-SSIM (i.e., one RD-curve, for luma channel only) should be calculated in the case of YCbCr. If content is encoded as RGB, 60 RD-points (30 for PSNR and 30 for MS-SSIM) should be calculated (i.e., three RD-curves, one for each channel, for PSNR and three RD-curves, one for each channel, for MS-SSIM).
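As an illustration only, per-plane PSNR can be computed as in the following Python sketch. It assumes the common definition PSNR = 10*log10(peak^2/MSE) with peak = 2^bit_depth - 1; the function name plane_psnr, the flat-list plane layout, and the sample values are hypothetical, and the normative definition remains the one in the cited ISO/IEC document.
<sourcecode type="python"><![CDATA[
```python
import math


def plane_psnr(ref, dec, bit_depth=8):
    """PSNR in dB between two equally sized planes given as flat sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dec)) / len(ref)
    if mse == 0:
        return math.inf  # identical planes: PSNR is unbounded
    peak = (1 << bit_depth) - 1
    return 10 * math.log10(peak * peak / mse)


# Hypothetical 2x2 planes of one YCbCr picture: (original, decoded).
planes = {
    "Y": ([16, 32, 64, 128], [17, 31, 64, 130]),
    "Cb": ([128, 128, 128, 128], [128, 128, 128, 128]),
    "Cr": ([120, 121, 122, 123], [120, 121, 122, 124]),
}

# One PSNR value per color plane, i.e., one RD-point per plane for this QP.
psnr = {name: plane_psnr(ref, dec) for name, (ref, dec) in planes.items()}
for name, value in psnr.items():
    print(name, value)
```
]]></sourcecode>
Repeating this computation for each of the ten QP values yields the 30 PSNR RD-points per sequence mentioned above for YCbCr content.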
</t> <t>Finally, to obtain an integral estimation, BD-rate savings <xref target="PSNR" /> should be computed for each range and each quality index. In addition, average values over all three ranges should be provided for both PSNR and MS-SSIM. A list of video sequences that should be used for testing, as well as the ten QP values for the reference codec, are defined in <xref target="I-D.ietf-netvc-testing" />. Testing processes should use the information on the codec applications presented in this document. As the reference for evaluation, state-of-the-art video codecs such as HEVC/H.265 <xref target="ISO23008-2"/> <xref target="H265"/> or VP9 must be used. The reference source code of the HEVC/H.265 codec can be found at <xref target="HEVC"/>. The HEVC/H.265 codec must be configured according to <xref target="CONDITIONS"/> and <xref target="intra-period" />. </t>
<table anchor="intra-period"> <name>Intra-periods for Different HEVC/H.265 Encoding Modes According to <xref target="CONDITIONS"/></name>
<thead> <tr> <th>Intra-period, second</th> <th>HEVC/H.265 encoding mode according to <xref target="CONDITIONS"/></th> </tr> </thead>
<tbody>
<tr> <td>AI</td> <td>Intra Main or Intra Main10</td> </tr>
<tr> <td>RA</td> <td>Random access Main or<br/>Random access Main10</td> </tr>
<tr> <td>FIZD</td> <td>Low delay Main or<br/>Low delay Main10</td> </tr>
</tbody> </table>
<t>According to the coding efficiency requirement described in <xref target="efficiency"/>, BD-rate savings calculated for each color plane and averaged for all the video sequences used to test the NETVC codec should be at least: </t>
<ul> <li>25% if calculated over the whole bitrate range; and </li> <li>15% if calculated for each bitrate subrange (LBR, MBR, HBR). </li> </ul>
<t>Since values of the two objective metrics (PSNR and MS-SSIM) are available for some color planes, each value should meet these coding efficiency requirements. That is, the final BD-rate saving denoted as S is calculated for a given color plane as follows: </t>
<artwork name="" type="" align="left" alt=""><![CDATA[
S = min { S_psnr, S_ms-ssim }
]]></artwork>
<t>where S_psnr and S_ms-ssim are BD-rate savings calculated for the given color plane using PSNR and MS-SSIM metrics, respectively. </t>
<t>In addition to the objective quality measures defined above, subjective evaluation must also be performed for the final NETVC codec adoption. For subjective tests, the MOS-based evaluation procedure must be used as described in Section 2.1 of <xref target="ISO29170-1" />. For perception-oriented tools that primarily impact subjective quality, additional tests may also be individually assigned even for intermediate evaluation, subject to a decision of the NETVC WG. </t> </section>
<section title="Security Considerations"> <t>This document itself does not address any security considerations.
However, it is worth noting that a codec implementation (for both an encoder and a decoder) should take into consideration the worst-case computational complexity, memory bandwidth, and physical memory size needed to process the potentially untrusted input (e.g., the decoded pictures used as references). </t> </section>
<section title="IANA Considerations"> <t>This document has no IANA actions. </t> </section>
</middle>
<back>
<references> <name>References</name>
<references> <name>Normative References</name>
<reference anchor="BT2020-2" target="https://www.itu.int/rec/R-REC-BT.2020-2-201510-I/en"> <front> <title>Parameter values for ultra-high definition television systems for production and international programme exchange</title> <author> <organization>ITU-R</organization> </author> <date month="October" year="2015" /> </front> <seriesInfo name="ITU-R Recommendation" value="BT.2020-2" /> </reference>
<reference anchor="G1091" target="https://www.itu.int/rec/T-REC-G.1091/en"> <front> <title>Quality of Experience requirements for telepresence services</title> <author> <organization>ITU-T</organization> </author> <date month="October" year="2014" /> </front> <seriesInfo name="ITU-T Recommendation" value="G.1091" /> </reference>
<reference anchor="ISO29170-1" target="https://www.iso.org/standard/63637.html"> <front> <title>Information technology -- Advanced image coding and evaluation -- Part 1: Guidelines for image coding system evaluation</title> <author> <organization>ISO</organization> </author> <date month="October" year="2017" /> </front> <seriesInfo name="ISO/IEC" value="TR 29170-1:2017" /> </reference>
<reference anchor="ISO23008-2" target="https://www.iso.org/standard/67660.html"> <front> <title>Information technology -- High efficiency coding and media delivery in heterogeneous environments -- Part 2: High efficiency video coding</title> <author> <organization>ISO</organization> </author> <date month="May" year="2018" /> </front> <seriesInfo name="ISO/IEC" value="23008-2:2015" /> </reference>
<reference anchor="H265" target="https://www.itu.int/rec/T-REC-H.265"> <front> <title>High efficiency video coding</title> <author> <organization>ITU-T</organization> </author> <date month="November" year="2019" /> </front> <seriesInfo name="ITU-T Recommendation" value="H.265" /> </reference>
<reference anchor="HEVC" target="https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/"> <front> <title>High Efficiency Video Coding (HEVC) reference software (HEVC Test Model also known as HM)</title> <author> <organization>Fraunhofer Institute for Telecommunications</organization> </author> </front> </reference>
</references>
<references> <name>Informative References</name>
<reference anchor="HDR" target="http://www.digitizationguidelines.gov/term.php?term=highdynamicrangeimaging"> <front> <title>Term: High dynamic range imaging</title> <author> <organization>Federal Agencies Digital Guidelines Initiative</organization> </author> </front> </reference>
<reference anchor="COMPRESSION" target="http://www.digitizationguidelines.gov/term.php?term=compressionvisuallylossless"> <front> <title>Term: Compression, visually lossless</title> <author> <organization>Federal Agencies Digital Guidelines Initiative</organization> </author> </front> </reference>
<reference anchor="SG-16" target="https://www.itu.int/md/T13-SG16-C-0988/en"> <front> <title>The case for scalability support in version 1 of Future Video Coding</title> <author surname="Wenger" initials="S"> <organization>ITU-T</organization> </author> <date month="September" year="2015" /> </front> <seriesInfo name="SG 16 (Study Period 2013)" value="Contribution 988" /> </reference>
<reference anchor="YOUTUBE" target="https://support.google.com/youtube/answer/1722171?hl=en"> <front> <title>Recommended upload encoding settings</title> <author> <organization>YouTube</organization> </author> </front> </reference>
<reference anchor="HEVC-EXT" target="https://mpeg.chiariglione.org/standards/mpeg-h/high-efficiency-video-coding/requirements-extension-hevc-coding-screen-content"> <front> <title>Requirements for an extension of HEVC for coding of screen content</title> <author surname="Yu" initials="H" role="editor"/> <author surname="McCann" initials="K" role="editor"/> <author surname="Cohen" initials="R" role="editor"/> <author surname="Amon" initials="P" role="editor"/> <date month="January" year="2014" /> </front> <seriesInfo name="ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts Group" value="MPEG2013/N14174" /> <seriesInfo name="San Jose," value="USA" /> </reference>
<reference anchor="GAME" target=""> <front> <title>Game streaming requirement for Future Video Coding</title> <author surname="Parhy" initials="M"/> <date month="June" year="2015" /> </front> <seriesInfo name="ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts Group" value="N36771" /> <seriesInfo name="Warsaw," value="Poland" /> </reference>
<reference anchor="PSNR" target="https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/"> <front> <title>Calculation of average PSNR differences between RD-curves</title> <author surname="Bjontegaard" initials="G"> <organization>ITU-T</organization> </author> <date month="April" year="2001" /> </front> <seriesInfo name="SG 16" value="VCEG-M33" /> </reference>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draft-ietf-netvc-testing-09.xml"/>
<reference anchor="MULTI-SCALE" target="https://ieeexplore.ieee.org/document/1292216"> <front> <title>Multiscale structural similarity for image quality assessment</title> <author surname="Wang" initials="Z"/> <author surname="Simoncelli" initials="E.P."/> <author surname="Bovik" initials="A.C."/> <date month="November" year="2003" /> </front> <seriesInfo name="IEEE" value="Thirty-Seventh Asilomar Conference on Signals, Systems and Computers" /> <seriesInfo name="DOI" value="10.1109/ACSSC.2003.1292216" /> </reference>
<reference anchor="CONDITIONS" target="http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7281"> <front> <title>Common HM test conditions and software reference configurations</title> <author surname="Bossen" initials="F"> </author> <date month="April" year="2013" /> </front> <seriesInfo name="Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (ITU-T Q.6/SG 16) and ISO/IEC Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 11)" value="" /> <seriesInfo name="Document" value="JCTVC-L1100" /> </reference>
<reference anchor="BT601" target="https://www.itu.int/rec/R-REC-BT.601/"> <front> <title>Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios</title> <author> <organization>ITU-R</organization> </author> <date month="March" year="2011" /> </front> <seriesInfo name="ITU-R Recommendation" value="BT.601" /> </reference>
<reference anchor="ISO14496-10" target="https://www.iso.org/standard/75400.html"> <front> <title>Information technology -- Coding of audio-visual objects -- Part 10: Advanced video coding</title> <author> <organization>ISO/IEC</organization> </author> </front> <seriesInfo name="ISO/IEC DIS" value="14496-10" /> </reference>
<reference anchor="ISO14496-15" target="https://www.iso.org/standard/74429.html"> <front> <title>Information technology -- Coding of audio-visual objects -- Part 15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format</title> <author> <organization>ISO/IEC</organization> </author> </front> <seriesInfo name="ISO/IEC" value="14496-15" /> </reference>
<reference anchor="BT709" target="https://www.itu.int/rec/R-REC-BT.709"> <front> <title>Parameter values for the HDTV standards for production and international programme exchange</title> <author> <organization>ITU-R</organization> </author> <date month="June" year="2015" /> </front> <seriesInfo name="ITU-R Recommendation" value="BT.709" /> </reference>
</references>
</references>
<section anchor="sect-8" numbered="false" toc="default"> <name>Acknowledgments</name>
<t>The authors would like to thank <contact fullname="Mr. Paul Coverdale"/>, <contact fullname="Mr. Vasily Rufitskiy"/>, and <contact fullname="Dr. Jianle Chen"/> for many useful discussions on this document and their help while preparing it, as well as <contact fullname="Mr. Mo Zanaty"/>, <contact fullname="Dr. Minhua Zhou"/>, <contact fullname="Dr. Ali Begen"/>, <contact fullname="Mr. Thomas Daede"/>, <contact fullname="Mr. Adam Roach"/>, <contact fullname="Dr. Thomas Davies"/>, <contact fullname="Mr. Jonathan Lennox"/>, <contact fullname="Dr. Timothy Terriberry"/>, <contact fullname="Mr. Peter Thatcher"/>, <contact fullname="Dr. Jean-Marc Valin"/>, <contact fullname="Mr. Roman Danyliw"/>, <contact fullname="Mr. Jack Moffitt"/>, <contact fullname="Mr. Greg Coppa"/>, and <contact fullname="Mr. Andrew Krupiczka"/> for their valuable comments on different revisions of this document. </t> </section>
</back> </rfc>