wdiff rfc8761.alt-original rfc8761.txt

Internet Engineering Task Force (IETF)                       A. Filippov
Request for Comments: 8761                           Huawei Technologies
Category: Informational                                        A. Norkin
ISSN: 2070-1721                                                  Netflix
                                                            J.R. Alvarez
                                                     Huawei Technologies
                                                              April 2020

          Video Codec Requirements and Evaluation Methodology

Abstract

   This document provides requirements for a video codec designed mainly
   for use over the Internet.  In addition, this document describes an
   evaluation methodology for measuring the compression efficiency to
   determine whether or not the stated requirements have been fulfilled.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Not all documents
   approved by the IESG are candidates for any level of Internet
   Standard; see Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8761.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology Used in This Document
     2.1.  Definitions
     2.2.  Abbreviations
   3.  Applications
     3.1.  Internet Video Streaming
     3.2.  Internet Protocol Television (IPTV)
     3.3.  Video Conferencing
     3.4.  Video Sharing
     3.5.  Screencasting
     3.6.  Game Streaming
     3.7.  Video Monitoring and Surveillance
   4.  Requirements
     4.1.  General Requirements
       4.1.1.  Coding Efficiency
       4.1.2.  Profiles and Levels
       4.1.3.  Bitstream Syntax
       4.1.4.  Parsing and Identification of Sample Components
       4.1.5.  Perceptual Quality Tools
       4.1.6.  Buffer Model
       4.1.7.  Integration
     4.2.  Basic Requirements
       4.2.1.  Input Source Formats
       4.2.2.  Coding Delay
       4.2.3.  Complexity
       4.2.4.  Scalability
       4.2.5.  Error Resilience
     4.3.  Optional Requirements
       4.3.1.  Input Source Formats
       4.3.2.  Scalability
       4.3.3.  Complexity
       4.3.4.  Coding Efficiency
   5.  Evaluation Methodology
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   This document presents the requirements for a video codec designed
   mainly for use over the Internet.  The requirements encompass a wide
   range of applications that use data transmission over the Internet,
   including Internet video streaming, IPTV, peer-to-peer video
   conferencing, video sharing, screencasting, game streaming, and video
   monitoring and surveillance.  For each application, typical
   resolutions, frame rates, and picture-access modes are presented.
   Specific requirements related to data transmission over packet-loss
   networks are considered as well.  In this document, when we discuss
   data-protection techniques, we only refer to methods designed and
   implemented to protect data inside the video codec since there are
   many existing techniques that protect generic data transmitted over
   networks with packet losses.  From the theoretical point of view,
   both packet-loss and bit-error robustness can be beneficial for video
   codecs.  In practice, packet losses are a more significant problem
   than bit corruption in IP networks.  It is worth noting that there is
   an evident interdependence between the possible amount of delay and
   the necessity of error-robust video streams:

   *  If the amount of delay is not crucial for an application, then
      reliable transport protocols such as TCP that retransmit
      undelivered packets can be used to guarantee correct decoding of
      transmitted data.

   *  If the amount of delay must be kept low, then either data
      transmission should be error free (e.g., by using managed
      networks) or the compressed video stream should be error
      resilient.

   Thus, error resilience can be useful for delay-critical applications
   to provide low delay in a packet-loss environment.
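
   The interdependence between delay and error robustness described
   above can be read as a small decision rule.  The following Python
   sketch is illustrative only: the names Strategy and choose_strategy
   and the two boolean inputs are assumptions introduced here, not
   terms defined by this document.

```python
from enum import Enum


class Strategy(Enum):
    """The three options named in the text (labels are illustrative)."""
    RELIABLE_TRANSPORT = "reliable transport with retransmission (e.g., TCP)"
    ERROR_FREE_CHANNEL = "error-free transmission (e.g., managed network)"
    ERROR_RESILIENT_STREAM = "error-resilient compressed video stream"


def choose_strategy(delay_critical: bool, managed_network: bool) -> Strategy:
    """Map the two cases from the text onto a strategy.

    Both parameters are hypothetical inputs introduced for this sketch.
    """
    if not delay_critical:
        # Delay is not crucial: retransmission can guarantee correct decoding.
        return Strategy.RELIABLE_TRANSPORT
    if managed_network:
        # Delay must stay low and the channel itself is error free.
        return Strategy.ERROR_FREE_CHANNEL
    # Delay must stay low over a lossy channel: the stream must be resilient.
    return Strategy.ERROR_RESILIENT_STREAM
```

   For example, choose_strategy(True, False) selects the error-
   resilient stream, which is the delay-critical, packet-loss case that
   the paragraph above singles out.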

2.  Terminology Used in This Document

2.1.  Definitions

   High dynamic range imaging
      A set of techniques that allows a greater dynamic range of
      exposures or values (i.e., a wider range of values between light
      and dark areas) than normal digital imaging techniques.  The
      intention is to accurately represent the wide range of intensity
      levels found in examples such as exterior scenes that include
      light-colored items struck by direct sunlight and areas of deep
      shadow [7].

   Random access period
      The period of time between the two closest independently decodable
      frames (pictures).

   RD-point
      A point in a two-dimensional rate-distortion space where the
      values of bitrate and quality metric are used as x- and
      y-coordinates, respectively.

   Visually lossless compression
      A form or manner of lossy compression where the data that are lost
      after the file is compressed and decompressed are not detectable
      to the eye; the compressed data appear identical to the
      uncompressed data [8].

   Wide color gamut
      A certain complete color subset (e.g., considered in ITU-R BT.2020
      [1]) that supports a wider range of colors (i.e., an extended
      range of colors that can be generated by a specific input or
      output device such as a video camera, monitor, or printer and can
      be interpreted by a color model) than conventional color gamuts
      (e.g., considered in ITU-R BT.601 [17] or BT.709 [20]).

2.2.  Abbreviations

   AI          All-Intra (each picture is intra-coded)

   BD-Rate     Bjontegaard Delta Rate

   FIZD        just the First picture is Intra-coded, Zero structural
               Delay

   FPS         Frames per Second

   GOP         Group of Pictures

   GPU         Graphics Processing Unit

   HBR         High Bitrate Range

   HDR         High Dynamic Range

   HRD         Hypothetical Reference Decoder

   HEVC        High Efficiency Video Coding

   IPTV        Internet Protocol Television

   LBR         Low Bitrate Range

   MBR         Medium Bitrate Range

   MOS         Mean Opinion Score

   MS-SSIM     Multi-Scale Structural Similarity quality index

   PAM         Picture Access Mode

   PSNR        Peak Signal-to-Noise Ratio

   QoS         Quality of Service

   QP          Quantization Parameter

   RA          Random Access

   RAP         Random Access Period

   RD          Rate-Distortion

   SEI         Supplemental Enhancement Information

   SIMD        Single Instruction, Multiple Data

   SNR         Signal-to-Noise Ratio

   UGC         User-Generated Content

   VDI         Virtual Desktop Infrastructure

   VUI         Video Usability Information

   WCG         Wide Color Gamut

3.  Applications

   In this section, an overview of video codec applications that are
   currently available on the Internet market is presented.  It is worth
   noting that there are different use cases for each application that
   define a target platform; hence, there are different types of
   communication channels involved (e.g., wired or wireless channels)
   that are characterized by different QoS as well as bandwidth; for
   instance, wired channels are considerably more free from error than
   wireless channels and therefore require different QoS approaches.
   The target platform, the channel bandwidth, and the channel quality
   determine resolutions, frame rates, and either quality or bitrates
   for video streams to be encoded or decoded.  By default, color format
   YCbCr 4:2:0 is assumed for the application scenarios listed below.

3.1.  Internet Video Streaming

   Typical content for this application is movies, TV series and shows,
   and animation.  Internet video streaming uses a variety of client
   devices and has to operate under changing network conditions.  For
   this reason, an adaptive streaming model has been widely adopted.
   Video material is encoded at different quality levels and different
   resolutions, which are then chosen by a client depending on its
   capabilities and current network bandwidth.  An example combination
   of resolutions and bitrates is shown in Table 1.
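
   The client-side choice in the adaptive streaming model can be
   sketched as follows.  This is a minimal illustration, not a
   description of any particular player: the Rendition type, the
   pick_rendition function, and every bitrate value in LADDER are
   hypothetical and are not taken from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    width: int
    height: int
    bitrate_kbps: int  # hypothetical value, not from this document


# Hypothetical ladder; only the resolutions also appear in Table 1.
LADDER = [
    Rendition(320, 240, 300),
    Rendition(1280, 720, 3000),
    Rendition(1920, 1080, 6000),
    Rendition(3840, 2160, 16000),
]


def pick_rendition(ladder, bandwidth_kbps, max_height):
    """Pick the highest-bitrate rendition the client can play and download.

    max_height models the client's capabilities and bandwidth_kbps the
    current network bandwidth; both are assumptions of this sketch.
    """
    candidates = [
        r for r in ladder
        if r.height <= max_height and r.bitrate_kbps <= bandwidth_kbps
    ]
    if candidates:
        return max(candidates, key=lambda r: r.bitrate_kbps)
    # Nothing fits: fall back to the cheapest rendition rather than stall.
    return min(ladder, key=lambda r: r.bitrate_kbps)
```

   A client limited to 1080p with roughly 5000 kbit/s available would
   receive the 720p entry of this hypothetical ladder; if the bandwidth
   rises, the same call returns a higher-quality rendition.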

   A video encoding pipeline in on-demand Internet video streaming
   typically operates as follows:

   *  Video is encoded in the cloud by software encoders.

   *  Source video is split into chunks, each of which is encoded
      separately, in parallel.

   *  Closed-GOP encoding with intrapicture intervals of 2-5 seconds (or
      longer) is used.

   *  Encoding is perceptually optimized.  Perceptual quality is
      important and should be considered during the codec development.
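
   The chunked, parallel part of this pipeline can be sketched as
   follows.  The sketch assumes that every chunk spans exactly one
   intrapicture interval and starts with an intra picture, which is one
   way to make chunks independently decodable; encode_chunk is a
   hypothetical placeholder, not a real encoder interface.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total_frames, frame_rate, intra_interval_s):
    """Return (first_frame, end_frame_exclusive) ranges, one per chunk."""
    # One chunk per intrapicture interval, e.g. 2-5 seconds of video.
    frames_per_chunk = max(1, round(frame_rate * intra_interval_s))
    return [
        (start, min(start + frames_per_chunk, total_frames))
        for start in range(0, total_frames, frames_per_chunk)
    ]


def encode_chunk(frame_range):
    """Placeholder for a real encoder call (hypothetical)."""
    first, end = frame_range
    # A closed GOP starts with an intra picture and does not reference
    # pictures outside the chunk, so chunks can be encoded independently.
    return {"first_frame": first, "frames": end - first}


def encode_in_parallel(total_frames, frame_rate, intra_interval_s=2):
    """Encode all chunks concurrently and return results in source order."""
    chunks = split_into_chunks(total_frames, frame_rate, intra_interval_s)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(encode_chunk, chunks))
```

   With 250 frames at 25 FPS and a 4-second interval, the source is
   split into three chunks of 100, 100, and 50 frames.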

       +---------------+-----+-------------------------------------+
       | Resolution *  | PAM |          Frame Rate, FPS **         |
       +===============+=====+=====================================+
       | 4K, 3840x2160 | RA  | 24/1.001, 24, 25, 30/1.001, 30, 50, |
       +---------------+-----+  60/1.001, 60, 100, 120/1.001, 120  |
       | 2K (1080p),   | RA  |                                     |
       | 1920x1080     |     |                                     |
       +---------------+-----+                                     |
       | 1080i,        | RA  |                                     |
       | 1920x1080*    |     |                                     |
       +---------------+-----+                                     |
       | 720p,         | RA  |                                     |
       | 1280x720      |     |                                     |
       +---------------+-----+                                     |
       | 576p (EDTV),  | RA  |                                     |
       | 720x576       |     |                                     |
       +---------------+-----+                                     |
       | 576i (SDTV),  | RA  |                                     |
       | 720x576*      |     |                                     |
       +---------------+-----+                                     |
       | 480p (EDTV),  | RA  |                                     |
       | 720x480       |     |                                     |
       +---------------+-----+                                     |
       | 480i (SDTV),  | RA  |                                     |
       | 720x480*      |     |                                     |
       +---------------+-----+                                     |
       | 512x384       | RA  |                                     |
       +---------------+-----+                                     |
       | QVGA, 320x240 | RA  |                                     |
       +---------------+-----+-------------------------------------+

            Table 1: Internet Video Streaming: Typical Values of
                     Resolutions, Frame Rates, and PAMs

   *Note: Interlaced content can be handled at the higher system level
   and not necessarily by using specialized video coding tools.  It is
   included in this table only for the sake of completeness, as most
   video content today is in the progressive format.

   **Note: The set of frame rates presented in this table is taken from
   Table 2 in [1].
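
   Entries such as 24/1.001 in the frame-rate column are exact ratios
   rather than rounded decimals.  The following sketch, in which the
   helper name parse_frame_rate is an assumption made for illustration,
   converts the table notation into exact fractions so that frame
   durations can be computed without rounding error.

```python
from fractions import Fraction


def parse_frame_rate(text: str) -> Fraction:
    """Convert table notation such as '24/1.001' or '50' to a fraction."""
    numerator, _, denominator = text.partition("/")
    # A missing denominator means an integer rate such as '50'.
    return Fraction(numerator) / Fraction(denominator or "1")


# 24/1.001 is exactly 24000/1001, about 23.976 frames per second.
rate = parse_frame_rate("24/1.001")

# The duration of one frame is the reciprocal: 1001/24000 of a second.
frame_duration_s = 1 / rate
```

   Keeping the rate as a fraction avoids the drift that accumulates
   when a rounded value such as 23.976 is multiplied over long
   sequences.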

   The characteristics and requirements of this application scenario are
   as follows:

   *  High encoder complexity (up to 10x and more) can be tolerated
      since encoding happens once and in parallel for different
      segments.

   *  Decoding complexity should be kept at reasonable levels to enable
      efficient decoder implementation.

   *  Support and efficient encoding of a wide range of content types
      and formats is required:

      -  High Dynamic Range (HDR), Wide Color Gamut (WCG),
         high-resolution (currently, up to 4K), and high-frame-rate
         content are important use cases; the codec should be able to
         encode such content efficiently.

      -  Improvement of coding efficiency at both lower and higher
         resolutions is important since low resolutions are used when
         streaming in low-bandwidth conditions.

      -  Improvement on both "easy" and "difficult" content in terms of
         compression efficiency at the same quality level contributes to
         the overall bitrate/storage savings.

      -  Film grain (and sometimes other types of noise) is often
         present in movies and similar content; this is usually a part
         of the creative intent.

   *  Significant improvements in compression efficiency between
      generations of video standards are desirable since this scenario
      typically assumes long-term support of legacy video codecs.

   *  Random access points are inserted frequently (one per 2-5 seconds)
      to enable switching between resolutions and fast-forward playback.

   *  The elementary stream should have a model that allows easy parsing
      and identification of the sample components.

   *  Middle QP values are normally used in streaming; this is also the
      range where compression efficiency is important for this scenario.

   *  Scalability or other forms of supporting multiple quality
      representations are beneficial if they do not incur significant
      bitrate overhead and if mandated in the first version.

3.2.  Internet Protocol Television (IPTV)

   This is a service for delivering television content over IP-based
   networks.  IPTV may be classified into two main groups based on the
   type of delivery, as follows:

   *  unicast (e.g., for video on demand), where delay is not crucial;
      and

   *  multicast/broadcast (e.g., for transmitting news), where zapping
      (i.e., stream changing) delay is important.

   In the IPTV scenario, traffic is transmitted over managed (QoS-based)
   networks.  Typical content used in this application is news, movies,
   cartoons, series, TV shows, etc.  One important requirement for both
   groups is that random access to pictures (i.e., the random access
   period (RAP)) should be kept small enough (approximately 1-5
   seconds).  Optional requirements are as follows:

   *  Temporal (frame-rate) scalability; and

   *  Resolution and quality (SNR) scalability.

   For this application, typical values of resolutions, frame rates, and
   PAMs are presented in Table 2.

   +-----------------------+-----+-------------------------------------+
   | Resolution *          | PAM |          Frame Rate, FPS **         |
   +=======================+=====+=====================================+
   | 2160p (4K), 3840x2160 | RA  | 24/1.001, 24, 25, 30/1.001, 30, 50, |
   +-----------------------+-----+  60/1.001, 60, 100, 120/1.001, 120  |
   | 1080p, 1920x1080      | RA  |                                     |
   +-----------------------+-----+                                     |
   | 1080i, 1920x1080*     | RA  |                                     |
   +-----------------------+-----+                                     |
   | 720p, 1280x720        | RA  |                                     |
   +-----------------------+-----+                                     |
   | 576p (EDTV), 720x576  | RA  |                                     |
   +-----------------------+-----+                                     |
   | 576i (SDTV), 720x576* | RA  |                                     |
   +-----------------------+-----+                                     |
   | 480p (EDTV), 720x480  | RA  |                                     |
   +-----------------------+-----+                                     |
   | 480i (SDTV), 720x480* | RA  |                                     |
   +-----------------------+-----+-------------------------------------+

    Table 2: IPTV: Typical Values of Resolutions, Frame Rates, and PAMs

   *Note: Interlaced content can be handled at the higher system level
   and not necessarily by using specialized video coding tools.  It is
   included in this table only for the sake of completeness, as most
   video content today is in a progressive format.

   **Note: The set of frame rates presented in this table is taken from
   Table 2 in [1].

3.3.  Video Conferencing

   This is a form of video connection over the Internet.  This form
   allows users to establish connections to two or more people by
   two-way video and audio transmission for communication in real time.
   For this application, both stationary and mobile devices can be used.
   The main requirements are as follows:

   *  Delay should be kept as low as possible (the preferable and
      maximum end-to-end delay values should be less than 100 ms [9] and
      320 ms [2], respectively);

   *  Temporal (frame-rate) scalability; and

   *  Error robustness.
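
   The two end-to-end delay bounds in the first requirement above can
   be expressed as a simple check.  In this sketch, the function name
   classify_delay and the three labels it returns are illustrative
   assumptions; only the 100 ms and 320 ms values come from the text.

```python
PREFERABLE_DELAY_MS = 100  # preferable end-to-end delay bound [9]
MAXIMUM_DELAY_MS = 320     # maximum end-to-end delay bound [2]


def classify_delay(end_to_end_ms: float) -> str:
    """Classify a measured end-to-end delay against the two bounds.

    The labels are hypothetical; the text states only that delay should
    be less than the preferable and maximum values.
    """
    if end_to_end_ms < PREFERABLE_DELAY_MS:
        return "preferable"
    if end_to_end_ms < MAXIMUM_DELAY_MS:
        return "acceptable"
    return "too high"
```

   Both bounds are strict, so a delay of exactly 100 ms falls in the
   "acceptable" class and a delay of exactly 320 ms is "too high".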

   Support of resolution and quality (SNR) scalability is highly
   desirable.  For this application, typical values of resolutions,
   frame rates, and PAMs are presented in Table 3.

               +------------------+-----------------+------+
               | Resolution       | Frame Rate, FPS | PAM  |
               +==================+=================+======+
               | 1080p, 1920x1080 | 15, 30          | FIZD |
               +------------------+-----------------+------+
               | 720p, 1280x720   | 30, 60          | FIZD |
               +------------------+-----------------+------+
               | 4CIF, 704x576    | 30, 60          | FIZD |
               +------------------+-----------------+------+
               | 4SIF, 704x480    | 30, 60          | FIZD |
               +------------------+-----------------+------+
               | VGA, 640x480     | 30, 60          | FIZD |
               +------------------+-----------------+------+
               | 360p, 640x360    | 30, 60          | FIZD |
               +------------------+-----------------+------+

               Table 3: Video Conferencing: Typical Values of
                     Resolutions, Frame Rates, and PAMs

3.4.  Video Sharing

   This is a service that allows people to upload and share video data
   (using live streaming or not) and to watch those videos.  It is also
   known as video hosting.  A typical User-Generated Content (UGC)
   scenario for this application is to capture video using mobile
   cameras such as GoPros or cameras integrated into smartphones
   (amateur video).  The main requirements are as follows:

   *  Random access to pictures for downloaded video data;

   *  Temporal (frame-rate) scalability; and

   *  Error robustness.

   Support of resolution and quality (SNR) scalability is highly
   desirable.  For this application, typical values of resolutions,
   frame rates, and PAMs are presented in Table 4.

   Typical values of resolutions and frame rates in Table 4 are taken
   from [10].

         +-----------------------+------------------------+-----+
         | Resolution            | Frame Rate, FPS        | PAM |
         +=======================+========================+=====+
         | 2160p (4K), 3840x2160 | 24, 25, 30, 48, 50, 60 | RA  |
         +-----------------------+------------------------+-----+
         | 1440p (2K), 2560x1440 | 24, 25, 30, 48, 50, 60 | RA  |
         +-----------------------+------------------------+-----+
         | 1080p, 1920x1080      | 24, 25, 30, 48, 50, 60 | RA  |
         +-----------------------+------------------------+-----+
         | 720p, 1280x720        | 24, 25, 30, 48, 50, 60 | RA  |
         +-----------------------+------------------------+-----+
         | 480p, 854x480         | 24, 25, 30, 48, 50, 60 | RA  |
         +-----------------------+------------------------+-----+
         | 360p, 640x360         | 24, 25, 30, 48, 50, 60 | RA  |
         +-----------------------+------------------------+-----+

                Table 4: Video Sharing: Typical Values of
                    Resolutions, Frame Rates, and PAMs

3.5.  Screencasting

   This is a service that allows users to record and distribute video
   data from a computer screen.  This service requires efficient
   compression of computer-generated content with high visual quality up
   to visually and mathematically (numerically) lossless [11].
   Currently, this application includes business presentations
   (PowerPoint, Word documents, email messages, etc.), animation
   (cartoons), gaming content, and data visualization.  This type of
   content is characterized by fast motion, rotation, smooth shade, 3D
   effect, highly saturated colors with full resolution, clear textures,
   and sharp edges with distinct colors [11].  The application also
   covers virtual desktop infrastructure (VDI), screen/desktop sharing
   and collaboration, supervisory control and data acquisition (SCADA)
   display, automotive/navigation display, cloud gaming, factory
   automation display, wireless display, display wall, digital operating
   room (DiOR), etc.  For this application, an important requirement is
   the support of low-delay configurations with zero structural delay
   for a wide range of video formats (e.g., RGB) in addition to YCbCr
   4:2:0 and YCbCr 4:4:4 [11].  For this application, typical values of
   resolutions, frame rates, and PAMs are presented in Table 5.

        +-----------------------+-----------------+--------------+
        |       Resolution      | Frame Rate, FPS |     PAM      |
        +=======================+=================+==============+
        |             Input color format: RGB 4:4:4              |
        +-----------------------+-----------------+--------------+
        | 5k, 5120x2880         | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | 4k, 3840x2160         | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | WQXGA, 2560x1600      | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | WUXGA, 1920x1200      | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | WSXGA+, 1680x1050     | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | WXGA, 1280x800        | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | XGA, 1024x768         | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | SVGA, 800x600         | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | VGA, 640x480          | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        |            Input color format: YCbCr 4:4:4             |
        +-----------------------+-----------------+--------------+
        | 5k, 5120x2880         | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | 4k, 3840x2160         | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | 1440p (2K), 2560x1440 | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | 1080p, 1920x1080      | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+
        | 720p, 1280x720        | 15, 30, 60      | AI, RA, FIZD |
        +-----------------------+-----------------+--------------+

          Table 5: Screencasting for RGB and YCbCr 4:4:4 Format:
           Typical Values of Resolutions, Frame Rates, and PAMs

3.6.  Game Streaming

   This is a service that provides game content over the Internet to
   different local devices such as notebooks and gaming tablets.  In
   this category of applications, the server renders 3D games in a cloud
   server and streams the game to any device with a wired or wireless
   broadband connection [12].  There are low-latency requirements for
   transmitting user interactions and receiving game data with a
   turnaround delay of less than 100 ms.  This allows anyone to play (or
   resume) full-featured games from anywhere on the Internet [12].  An
   example of this application is Nvidia Grid [12].  Another application
   scenario of this category is broadcast of video games played by
   people over the Internet in real time or for later viewing [12].
   There are many companies, such as Twitch and YY in China, that enable
   game broadcasting [12].  Games typically contain a lot of sharp edges
   and large motion [12].  The main requirements are as follows:

   *  Random access to pictures for game broadcasting;

   *  Temporal (frame-rate) scalability; and

   *  Error robustness.

   Support of resolution and quality (SNR) scalability is highly
   desirable.  For this application, typical values of resolutions,
   frame rates, and PAMs are similar to those presented in Table 3.

3.7.  Video Monitoring and Surveillance

   This is a type of live broadcasting over IP-based networks.  Video
   streams are sent to many receivers at the same time.  A new receiver
   may connect to the stream at an arbitrary moment, so the random
   access period should be kept small enough (approximately 1-5
   seconds).  Data are transmitted publicly in the case of video
   monitoring and privately in the case of video surveillance.  For IP
   cameras that have to capture, process, and encode video data,
   complexity -- including computational and hardware complexity, as
   well as memory bandwidth -- should be kept low to allow real-time
   processing.  In addition, support of a high dynamic range and a
   monochrome mode (e.g., for infrared cameras) as well as resolution
   and quality (SNR) scalability is an essential requirement for video
   surveillance.  In some use cases, high video signal fidelity is
   required even after lossy compression.  Typical values of
   resolutions, frame rates, and PAMs for video monitoring and
   surveillance applications are presented in Table 6.

          +-----------------------+-----------------+----------+
          | Resolution            | Frame Rate, FPS | PAM      |
          +=======================+=================+==========+
          | 2160p (4K), 3840x2160 | 12, 25, 30      | RA, FIZD |
          +-----------------------+-----------------+----------+
          | 5Mpixels, 2560x1920   | 12, 25, 30      | RA, FIZD |
          +-----------------------+-----------------+----------+
          | 1080p, 1920x1080      | 25, 30          | RA, FIZD |
          +-----------------------+-----------------+----------+
          | 1.23Mpixels, 1280x960 | 25, 30          | RA, FIZD |
          +-----------------------+-----------------+----------+
          | 720p, 1280x720        | 25, 30          | RA, FIZD |
          +-----------------------+-----------------+----------+
          | SVGA, 800x600         | 25, 30          | RA, FIZD |
          +-----------------------+-----------------+----------+

               Table 6: Video Monitoring and Surveillance:
           Typical Values of Resolutions, Frame Rates, and PAMs
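
   As an illustration of the random access period of approximately 1-5
   seconds mentioned for video monitoring and surveillance, the sketch
   below converts that period into a spacing between random access
   points expressed in frames, for the frame rates listed in Table 6.
   The function name and the choice to round down are assumptions made
   for this example; the document itself only states the period in
   seconds.

```python
import math


def max_intra_period_frames(frame_rate_fps: float, period_seconds: float) -> int:
    """Largest whole number of frames that fits in the given period."""
    return math.floor(frame_rate_fps * period_seconds)


# Frame rates taken from Table 6; the 1 s and 5 s bounds come from the text.
for fps in (12, 25, 30):
    shortest = max_intra_period_frames(fps, 1.0)
    longest = max_intra_period_frames(fps, 5.0)
    print(f"{fps} FPS: random access point every {shortest} to {longest} frames")
```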

4.  Requirements

   Taking the requirements discussed above for specific video
   applications, this section proposes requirements for an Internet
   video codec.

4.1.  General Requirements

4.1.1.  Coding Efficiency

   The most fundamental requirement is coding efficiency, i.e.,
   compression performance on both "easy" and "difficult" content for
   applications and use cases in Section 3.  The codec should provide
   higher coding efficiency over state-of-the-art video codecs such as
   HEVC/H.265 and VP9, at least by 25%, in accordance with the
   methodology described in Section 5 of this document.  For higher
   resolutions, the improvements in coding efficiency are expected to be
   higher than for lower resolutions.

4.1.2.  Profiles and Levels

   Good-quality specification and well-defined profiles and levels are
   required to enable device interoperability and facilitate decoder
   implementations.  A profile consists of a subset of entire bitstream
   syntax elements; consequently, it also defines the necessary tools
   for decoding a conforming bitstream of that profile.  A level imposes
   a set of numerical limits to the values of some syntax elements.  An
   example of codec levels to be supported is presented in Table 7.  An
   actual level definition should include constraints on features that
   impact the decoder complexity.  For example, these features might be
   as follows: maximum bitrate, line buffer size, memory usage, etc.

   +-------+-----------------------------------------------------------+
   | Level | Example picture resolution at highest frame rate          |
   +=======+===========================================================+
   | 1     | 128x96(12,288*)@30.0                                      |
   |       | 176x144(25,344*)@15.0                                     |
   +-------+-----------------------------------------------------------+
   | 2     | 352x288(101,376*)@30.0                                    |
   +-------+-----------------------------------------------------------+
   | 3     | 352x288(101,376*)@60.0                                    |
   |       | 640x360(230,400*)@30.0                                    |
   +-------+-----------------------------------------------------------+
   | 4     | 640x360(230,400*)@60.0                                    |
   |       | 960x540(518,400*)@30.0                                    |
   +-------+-----------------------------------------------------------+
   | 5     | 720x576(414,720*)@75.0                                    |
   |       | 960x540(518,400*)@60.0                                    |
   |       | 1,280x720(921,600*)@30.0                                  |
   +-------+-----------------------------------------------------------+
   | 6     | 1,280x720(921,600*)@68.0                                  |
   |       | 2,048x1,080(2,211,840*)@30.0                              |
   +-------+-----------------------------------------------------------+
   | 7     | 1,280x720(921,600*)@120.0                                 |
   |       | 2,048x1,080(2,211,840*)@60.0                              |
   +-------+-----------------------------------------------------------+
   | 8     | 1,920x1,080(2,073,600*)@120.0                             |
   |       | 3,840x2,160(8,294,400*)@30.0                              |
   |       | 4,096x2,160(8,847,360*)@30.0                              |
   +-------+-----------------------------------------------------------+
   | 9     | 1,920x1,080(2,073,600*)@250.0                             |
   |       | 4,096x2,160(8,847,360*)@60.0                              |
   +-------+-----------------------------------------------------------+
   | 10    | 1,920x1,080(2,073,600*)@300.0                             |
   |       | 4,096x2,160(8,847,360*)@120.0                             |
   +-------+-----------------------------------------------------------+
   | 11    | 3,840x2,160(8,294,400*)@120.0                             |
   |       | 8,192x4,320(35,389,440*)@30.0                             |
   +-------+-----------------------------------------------------------+
   | 12    | 3,840x2,160(8,294,400*)@250.0                             |
   |       | 8,192x4,320(35,389,440*)@60.0                             |
   +-------+-----------------------------------------------------------+
   | 13    | 3,840x2,160(8,294,400*)@300.0                             |
   |       | 8,192x4,320(35,389,440*)@120.0                            |
   +-------+-----------------------------------------------------------+

                          Table 7: Codec Levels

   *Note: The quantities of pixels are presented for applications in
   which a picture can have an arbitrary size (e.g., screencasting).
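
   The starred quantities in Table 7 are the picture width multiplied by
   the picture height.  The following minimal sketch recomputes them for
   a few rows and also derives a sample rate (pixels per second).  The
   sample rate is an assumption of this example, offered only as one
   plausible way to compare entries within a level; the document does
   not define levels in terms of it.

```python
def pixels(width: int, height: int) -> int:
    """Number of pixels (luma samples) in one picture."""
    return width * height


def sample_rate(width: int, height: int, fps: float) -> float:
    """Pixels per second for a resolution and frame rate (illustrative only)."""
    return pixels(width, height) * fps


# A few (width, height, frame rate) examples copied from Table 7.
LEVEL_EXAMPLES = {
    1: [(128, 96, 30.0), (176, 144, 15.0)],
    6: [(1280, 720, 68.0), (2048, 1080, 30.0)],
    8: [(1920, 1080, 120.0), (3840, 2160, 30.0), (4096, 2160, 30.0)],
}

for level, examples in LEVEL_EXAMPLES.items():
    for w, h, fps in examples:
        print(f"level {level}: {w}x{h} = {pixels(w, h):,} pixels, "
              f"{sample_rate(w, h, fps):,.0f} pixels/s")
```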

4.1.3.  Bitstream Syntax

   Bitstream syntax should allow extensibility and backward
   compatibility.  New features can be supported easily by using
   metadata (such as SEI messages, VUI, and headers) without affecting
   the bitstream compatibility with legacy decoders.  A newer version of
   the decoder shall be able to play bitstreams of an older version of
   the same or lower profile and level.

4.1.4.  Parsing and Identification of Sample Components

   A bitstream should have a model that allows easy parsing and
   identification of the sample components (such as Annex B of ISO/IEC
   14496-10 [18] or ISO/IEC 14496-15 [19]).  In particular, information
   needed for packet handling (e.g., frame type) should not require
   parsing anything below the header level.

4.1.5.  Perceptual Quality Tools

   Perceptual quality tools (such as adaptive QP and quantization
   matrices) should be supported by the codec bitstream.

4.1.6.  Buffer Model

   The codec specification shall define a buffer model such as a
   hypothetical reference decoder (HRD).

4.1.7.  Integration

   Specifications providing integration with system and delivery layers
   should be developed.

4.2.  Basic Requirements

4.2.1.  Input Source Formats

   Input pictures coded by a video codec should have one of the
   following formats:

   *  Bit depth: 8 and 10 bits (up to 12 bits for a high profile) per
      color component.

   *  Color sampling formats:

      -  YCbCr 4:2:0

      -  YCbCr 4:4:4, YCbCr 4:2:2, and YCbCr 4:0:0 (preferably in
         different profile(s))

   *  For profiles with bit depth of 10 bits per sample or higher,
      support of high dynamic range and wide color gamut.

   *  Support of arbitrary resolution according to the level constraints
      for applications in which a picture can have an arbitrary size
      (e.g., in screencasting).

   Exemplary input source formats for codec profiles are shown in
   Table 8.

   +---------+--------------------------------+------------------------+
   | Profile | Bit depths per color component | Color sampling formats |
   +=========+================================+========================+
   | 1       | 8 and 10                       | 4:0:0 and 4:2:0        |
   +---------+--------------------------------+------------------------+
   | 2       | 8 and 10                       | 4:0:0, 4:2:0, and      |
   |         |                                | 4:4:4                  |
   +---------+--------------------------------+------------------------+
   | 3       | 8, 10, and 12                  | 4:0:0, 4:2:0, 4:2:2,   |
   |         |                                | and 4:4:4              |
   +---------+--------------------------------+------------------------+

         Table 8: Exemplary Input Source Formats for Codec Profiles

4.2.2.  Coding Delay

   In order to meet coding delay requirements, a video codec should
   support all of the following:

   *  Support of configurations with zero structural delay, also
      referred to as "low-delay" configurations.

      -  Note: End-to-end delay should be no more than 320 ms [2], but
         it is preferable for its value to be less than 100 ms [9].

   *  Support of efficient random access point encoding (such as
      intracoding and resetting of context variables), as well as
      efficient switching between multiple quality representations.

   *  Support of configurations with nonzero structural delay (such as
      out-of-order or multipass encoding) for applications without
      low-delay requirements, if such configurations provide additional
      compression efficiency improvements.

4.2.3.  Complexity

   Encoding and decoding complexity considerations are as follows:

   *  Feasible real-time implementation of both an encoder and a decoder
      supporting a chosen subset of tools for hardware and software
      implementation on a wide range of state-of-the-art platforms.  The
      subset of real-time encoder tools should provide meaningful
      improvement in compression efficiency at reasonable complexity of
      hardware and software encoder implementations as compared to
      real-time implementations of state-of-the-art video compression
      technologies such as HEVC/H.265 and VP9.

   *  High-complexity software encoder implementations used by offline
      encoding applications can have a 10x or more complexity increase
      compared to state-of-the-art video compression technologies such
      as HEVC/H.265 and VP9.

4.2.4.  Scalability

   The mandatory scalability requirement is as follows:

   *  Temporal (frame-rate) scalability should be supported.

4.2.5.  Error Resilience

   In order to meet the error resilience requirement, a video codec
   should satisfy all of the following conditions:

   *  Tools that are complementary to the error-protection mechanisms
      implemented on the transport level should be supported.

   *  The codec should support mechanisms that facilitate packetization
      of a bitstream for common network protocols.

   *  Packetization mechanisms should enable frame-level error recovery
      by means of retransmission or error concealment.

   *  The codec should support effective mechanisms for allowing
      decoding and reconstruction of significant parts of pictures in
      the event that parts of the picture data are lost in transmission.

   *  The bitstream specification shall support independently decodable
      subframe units similar to slices or independent tiles.  It shall
      be possible for the encoder to restrict the bitstream to allow
      parsing of the bitstream after a packet loss and to communicate it
      to the decoder.

4.3.  Optional Requirements

4.3.1.  Input Source Formats

   It is a desired but not mandatory requirement for a video codec to
   support some of the following features:

   *  Bit depth: up to 16 bits per color component.

   *  Color sampling formats: RGB 4:4:4.

   *  Auxiliary channel (e.g., alpha channel) support.

4.3.2.  Scalability

   Desirable scalability requirements are as follows:

   *  Resolution and quality (SNR) scalability that provides a low
      compression efficiency penalty (increase of up to 5% of BD-rate
      [13] per layer with reasonable increase of both computational and
      hardware complexity) can be supported in the main profile of the
      codec being developed by the NETVC Working Group.  Otherwise, a
      separate profile is needed to support these types of scalability.

   *  Computational complexity scalability (i.e., computational
      complexity is decreasing along with degrading picture quality) is
      desirable.

4.3.3.  Complexity

   Tools that enable parallel processing (e.g., slices, tiles, and
   wavefront propagation processing) at both encoder and decoder sides
   are highly desirable for many applications.

   *  High-level multicore parallelism: encoder and decoder operation,
      especially entropy encoding and decoding, should allow multiple
      frames or subframe regions (e.g., 1D slices, 2D tiles, or
      partitions) to be processed concurrently, either independently or
      with deterministic dependencies that can be efficiently pipelined.

   *  Low-level instruction-set parallelism: favor algorithms that are
      SIMD/GPU friendly over inherently serial algorithms.

4.3.4.  Coding Efficiency

   Compression efficiency on noisy content, content with film grain,
   computer-generated content, and low-resolution materials is
   desirable.

5.  Evaluation Methodology

   As shown in Figure 1, compression performance testing is performed in
   three overlapped ranges that encompass ten different bitrate values:

   *  Low bitrate range (LBR) is the range that contains the four lowest
      bitrates of the ten specified bitrates (one of the four bitrate
      values is shared with the neighboring range).

   *  Medium bitrate range (MBR) is the range that contains the four
      medium bitrates of the ten specified bitrates (two of the four
      bitrate values are shared with the neighboring ranges).

   *  High bitrate range (HBR) is the range that contains the four
      highest bitrates of the ten specified bitrates (one of the four
      bitrate values is shared with the neighboring range).
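
   The three overlapped ranges can be sketched as slices over the ten
   bitrate values sorted in ascending order.  The function name and the
   sample bitrates below are hypothetical and serve only to show how
   each range holds four values and shares its boundary values with its
   neighbors.

```python
def bitrate_ranges(bitrates):
    """Split ten bitrate values into LBR, MBR, and HBR (four values each)."""
    if len(bitrates) != 10:
        raise ValueError("expected exactly ten bitrate values")
    ordered = sorted(bitrates)
    return {
        "LBR": ordered[0:4],   # four lowest; ordered[3] is shared with MBR
        "MBR": ordered[3:7],   # shares ordered[3] with LBR and ordered[6] with HBR
        "HBR": ordered[6:10],  # four highest; ordered[6] is shared with MBR
    }


# Hypothetical bitrates in kbit/s, chosen only for illustration.
ranges = bitrate_ranges([100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200])
for name, values in ranges.items():
    print(name, values)
```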

   Initially, for the codec selected as a reference one (e.g., HEVC or
   VP9), a set of ten QP (quantization parameter) values should be
   specified as in [14], and corresponding quality values should be
   calculated.  In Figure 1, QP and quality values are denoted as
   "QP0"-"QP9" and "Q0"-"Q9", respectively.  To guarantee the overlaps
   of quality levels between the bitrate ranges of the reference and
   tested codecs, a quality alignment procedure should be performed for
   each range's outermost (left- and rightmost) quality levels Qk of the
   reference codec (i.e., for Q0, Q3, Q6, and Q9) and the quality levels
   Q'k (i.e., Q'0, Q'3, Q'6, and Q'9) of the tested codec.  Thus, these
   quality levels Q'k, and hence the corresponding QP value QP'k (i.e.,
   QP'0, QP'3, QP'6, and QP'9), of the tested codec should be selected
   using the following formulas:

   Q'k =   min { abs(Q'i - Qk) },
         i in R

   QP'k = argmin { abs(Q'i(QP'i) - Qk(QPk)) },
          i in R

   where R is the range of the QP indexes of the tested codec, i.e., the
   candidate Internet video codec.  The inner quality levels (i.e., Q'1,
   Q'2, Q'4, Q'5, Q'7, and Q'8), as well as their corresponding QP
   values of each range (i.e., QP'1, QP'2, QP'4, QP'5, QP'7, and QP'8),
   should be as equidistantly spaced as possible between the left- and
   rightmost quality levels without explicitly mapping their values
   using the procedure described above.
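
   The alignment of the outermost quality levels can be sketched as a
   nearest-value search over the tested codec's candidate QPs.  This is
   a minimal illustration, assuming that the quality of the tested codec
   has already been measured for every QP in R; the function names and
   the PSNR values are hypothetical and are not taken from [14].

```python
def align_qp(ref_quality, tested_quality_by_qp):
    """Return the tested-codec QP whose quality is closest to ref_quality.

    tested_quality_by_qp maps each candidate QP (the range R) to the
    quality value measured for the tested codec at that QP.
    """
    return min(tested_quality_by_qp,
               key=lambda qp: abs(tested_quality_by_qp[qp] - ref_quality))


def align_outer_levels(ref_quality, tested_quality_by_qp, anchors=(0, 3, 6, 9)):
    """Map each outermost index k to the pair (QP'k, Q'k)."""
    aligned = {}
    for k in anchors:
        qp = align_qp(ref_quality[k], tested_quality_by_qp)
        aligned[k] = (qp, tested_quality_by_qp[qp])
    return aligned


# Hypothetical PSNR values in dB, for illustration only.
ref_psnr = [30.0, 31.5, 33.0, 34.5, 36.0, 37.5, 39.0, 40.5, 42.0, 43.5]
tested_psnr_by_qp = {qp: 50.0 - 0.5 * qp for qp in range(10, 45)}
aligned = align_outer_levels(ref_psnr, tested_psnr_by_qp)
print(aligned)
```

   The inner levels are not mapped this way; per the text above, they
   are spread as evenly as possible between the aligned endpoints.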

   QP'9 QP'8  QP'7 QP'6 QP'5 QP'4 QP'3 QP'2 QP'1 QP'0 <+-----
    ^     ^    ^    ^    ^    ^    ^    ^    ^    ^    | Tested
    |     |    |    |    |    |    |    |    |    |    | codec
   Q'0   Q'1  Q'2  Q'3  Q'4  Q'5  Q'6  Q'7  Q'8  Q'9  <+-----
    ^               ^              ^              ^
    |               |              |              |
   Q0    Q1    Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9  <+-----
    ^    ^     ^    ^    ^    ^    ^    ^    ^    ^    | Reference
    |    |     |    |    |    |    |    |    |    |    | codec
   QP9  QP8   QP7  QP6  QP5  QP4  QP3  QP2  QP1  QP0  <+-----
   +----------------+--------------+--------------+--------->
   ^                ^              ^              ^     Bitrate
   |-------LBR------|              |-----HBR------|
                    ^              ^
                    |------MBR-----|

   Figure 1: Quality/QP Alignment for Compression Performance Evaluation

   Since the QP mapping results may vary for different sequences, this
   quality alignment procedure eventually needs to be performed
   separately for each quality assessment index and each sequence used
   for codec performance evaluation to fulfill the requirements
   described above.

   To assess the quality of output (decoded) sequences, two indexes
   (PSNR [3] and MS-SSIM [3] [15]) are separately computed.  In the case
   of the YCbCr color format, PSNR should be calculated for each color
   plane, whereas MS-SSIM is calculated for the luma channel only.  In
   the case of the RGB color format, both metrics are computed for R, G,
   and B channels.  Thus, for each sequence, 30 RD-points for PSNR
   (i.e., three RD-curves, one for each channel) and 10 RD-points for
   MS-SSIM (i.e., one RD-curve, for luma channel only) should be
   calculated in the case of YCbCr.  If content is encoded as RGB, 60
   RD-points (30 for PSNR and 30 for MS-SSIM) should be calculated,
   i.e., three RD-curves (one for each channel) are computed for PSNR
   as well as three RD-curves (one for each channel) for MS-SSIM.
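
   The RD-point counts above follow from ten rate points per RD-curve
   multiplied by the number of channels evaluated for each metric.  A
   small sketch of that arithmetic is given below; the function and
   constant names are hypothetical.

```python
NUM_RATE_POINTS = 10  # ten QP values per RD-curve, as described above


def rd_point_counts(color_format):
    """Return the number of RD-points per metric for one sequence."""
    if color_format == "YCbCr":
        channels = {"PSNR": 3, "MS-SSIM": 1}  # MS-SSIM on the luma channel only
    elif color_format == "RGB":
        channels = {"PSNR": 3, "MS-SSIM": 3}  # both metrics on R, G, and B
    else:
        raise ValueError(f"unsupported color format: {color_format}")
    return {metric: n * NUM_RATE_POINTS for metric, n in channels.items()}


print(rd_point_counts("YCbCr"))
print(rd_point_counts("RGB"))
```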

   Finally, to obtain an integral estimation, BD-rate savings [13]
   should be computed for each range and each quality index.  In
   addition, average values over all three ranges should be provided for
   both PSNR and MS-SSIM.  A list of video sequences that should be used
   for testing, as well as the ten QP values for the reference codec,
   are defined in [14].  Testing processes should use the information on
   the codec applications presented in this document.  As the reference
   for evaluation, state-of-the-art video codecs such as HEVC/H.265
   [4][5] or VP9 must be used.  The reference source code of the
   HEVC/H.265 codec can be found at [6].  The HEVC/H.265 codec must be
   configured according to [16] and Table 9.

   +----------------------+--------------------------------------------+
   | Intra-period, second | HEVC/H.265 encoding mode according to [16] |
   +======================+============================================+
   | AI                   | Intra Main or Intra Main10                 |
   +----------------------+--------------------------------------------+
   | RA                   | Random access Main or Random access Main10 |
   +----------------------+--------------------------------------------+
   | FIZD                 | Low delay Main or Low delay Main10         |
   +----------------------+--------------------------------------------+

       Table 9: Intraperiods for Different HEVC/H.265 Encoding Modes
                             According to [16]

   According to the coding efficiency requirement described in
   Section 4.1.1, BD-rate savings calculated for each color plane and
   averaged for all the video sequences used to test the NETVC codec
   should be, at least,

   *  25% if calculated over the whole bitrate range; and

   *  15% if calculated for each bitrate subrange (LBR, MBR, HBR).

   Since values of the two objective metrics (PSNR and MS-SSIM) are
   available for some color planes, each value should meet these coding
   efficiency requirements.  That is, the final BD-rate saving denoted
   as S is calculated for a given color plane as follows:

   S = min { S_psnr, S_ms-ssim }

   where S_psnr and S_ms-ssim are BD-rate savings calculated for the
   given color plane using PSNR and MS-SSIM metrics, respectively.
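
   The rule above can be illustrated with a minimal sketch.  It is not
   part of the methodology itself: the function names, the dictionary
   layout, and the convention that savings are expressed as positive
   percentages are all assumptions made for this example.  Only the
   min rule and the 25% and 15% thresholds come from the text.

```python
from typing import Dict

# Thresholds from the text: at least 25% over the whole bitrate range
# and at least 15% for each bitrate subrange (LBR, MBR, HBR).
WHOLE_RANGE_MIN = 25.0
SUBRANGE_MIN = 15.0


def final_saving(s_psnr: float, s_ms_ssim: float) -> float:
    """Return S = min { S_psnr, S_ms-ssim } for one color plane.

    Assumption: both inputs are BD-rate savings in percent, with
    larger positive values meaning a larger saving.
    """
    return min(s_psnr, s_ms_ssim)


def meets_requirement(savings: Dict[str, Dict[str, float]]) -> bool:
    """Check one color plane against the coding efficiency thresholds.

    `savings` (hypothetical layout) maps a range name, "whole", "LBR",
    "MBR", or "HBR", to {"psnr": ..., "ms_ssim": ...}.
    """
    for name, s in savings.items():
        threshold = WHOLE_RANGE_MIN if name == "whole" else SUBRANGE_MIN
        if final_saving(s["psnr"], s["ms_ssim"]) < threshold:
            return False
    return True
```

   Because S takes the smaller of the two savings, a codec passes a
   threshold only if both the PSNR-based and the MS-SSIM-based savings
   reach it.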

   In addition to the objective quality measures defined above,
   subjective evaluation must also be performed for the final NETVC
   codec adoption.  For subjective tests, the MOS-based evaluation
   procedure must be used as described in Section 2.1 of [3].  For
   perception-oriented tools that primarily impact subjective quality,
   additional tests may also be individually assigned even for
   intermediate evaluation, subject to a decision of the NETVC WG.

6.  Security Considerations

   This document itself does not address any security considerations.
   However, it is worth noting that a codec implementation (for both an
   encoder and a decoder) should take into consideration the worst-case
   computational complexity, memory bandwidth, and physical memory size
   needed to process the potentially untrusted input (e.g., the decoded
   pictures used as references).

7.  IANA Considerations

   This document has no IANA actions.

8.  References

8.1.  Normative References

   [1]        ITU-R, "Parameter values for ultra-high definition
              television systems for production and international
              programme exchange", ITU-R Recommendation BT.2020-2,
              October 2015,
              <https://www.itu.int/rec/R-REC-BT.2020-2-201510-I/en>.

   [2]        ITU-T, "Quality of Experience requirements for
              telepresence services", ITU-T Recommendation G.1091,
              October 2014, <https://www.itu.int/rec/T-REC-G.1091/en>.

   [3]        ISO, "Information technology -- Advanced image coding and
              evaluation -- Part 1: Guidelines for image coding system
              evaluation", ISO/IEC TR 29170-1:2017, October 2017,
              <https://www.iso.org/standard/63637.html>.

   [4]        ISO, "Information technology -- High efficiency coding and
              media delivery in heterogeneous environments -- Part 2:
              High efficiency video coding", ISO/IEC 23008-2:2015, May
              2018, <https://www.iso.org/standard/67660.html>.

   [5]        ITU-T, "High efficiency video coding", ITU-T
              Recommendation H.265, November 2019,
              <https://www.itu.int/rec/T-REC-H.265>.

   [6]        Fraunhofer Institute for Telecommunications, "High
              Efficiency Video Coding (HEVC) reference software (HEVC
              Test Model also known as HM)",
              <https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/>.

8.2.  Informative References

   [7]        Federal Agencies Digital Guidelines Initiative, "Term:
              High dynamic range imaging",
              <http://www.digitizationguidelines.gov/
              term.php?term=highdynamicrangeimaging>.

   [8]        Federal Agencies Digital Guidelines Initiative, "Term:
              Compression, visually lossless",
              <http://www.digitizationguidelines.gov/
              term.php?term=compressionvisuallylossless>.

   [9]        Wenger, S., "The case for scalability support in version 1
              of Future Video Coding", SG 16 (Study Period
              2013) Contribution 988, September 2015,
              <https://www.itu.int/md/T13-SG16-C-0988/en>.

   [10]       YouTube, "Recommended upload encoding settings",
              <https://support.google.com/youtube/answer/1722171?hl=en>.

   [11]       Yu, H., Ed., McCann, K., Ed., Cohen, R., Ed., and P. Amon,
              Ed., "Requirements for an extension of HEVC for coding of
              screen content", ISO/IEC JTC 1/SC 29/WG 11 Moving Picture
              Experts Group MPEG2013/N14174, San Jose, USA, January
              2014, <https://mpeg.chiariglione.org/standards/mpeg-h/
              high-efficiency-video-coding/requirements-extension-hevc-
              coding-screen-content>.

   [12]       Parhy, M., "Game streaming requirement for Future Video
              Coding", ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts
              Group N36771, Warsaw, Poland, June 2015.

   [13]       Bjontegaard, G., "Calculation of average PSNR differences
              between RD-curves", SG 16 VCEG-M33, April 2001,
              <https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/>.

   [14]       Daede, T., Norkin, A., and I. Brailovskiy, "Video Codec
              Testing and Quality Measurement", Work in Progress,
              Internet-Draft, draft-ietf-netvc-testing-09, 31 January
              2020,
              <https://tools.ietf.org/html/draft-ietf-netvc-testing-09>.

   [15]       Wang, Z., Simoncelli, E.P., and A.C. Bovik, "Multiscale
              structural similarity for image quality assessment", IEEE
              Thirty-Seventh Asilomar Conference on Signals, Systems and
              Computers, DOI 10.1109/ACSSC.2003.1292216, November 2003,
              <https://ieeexplore.ieee.org/document/1292216>.

   [16]       Bossen, F., "Common HM test conditions and software
              reference configurations", Joint Collaborative Team on
              Video Coding (JCT-VC) of the ITU-T Video Coding Experts
              Group (ITU-T Q.6/SG 16) and ISO/IEC Moving Picture Experts
              Group (ISO/IEC JTC 1/SC 29/WG 11), Document JCTVC-L1100,
              April 2013,
              <http://phenix.it-sudparis.eu/jct/doc_end_user/
              current_document.php?id=7281>.

   [17]       ITU-R, "Studio encoding parameters of digital television
              for standard 4:3 and wide screen 16:9 aspect ratios",
              ITU-R Recommendation BT.601, March 2011,
              <https://www.itu.int/rec/R-REC-BT.601/>.

   [18]       ISO/IEC, "Information technology -- Coding of audio-visual
              objects -- Part 10: Advanced video coding", ISO/IEC
              DIS 14496-10, <https://www.iso.org/standard/75400.html>.

   [19]       ISO/IEC, "Information technology -- Coding of audio-visual
              objects -- Part 15: Carriage of network abstraction layer
              (NAL) unit structured video in the ISO base media file
              format", ISO/IEC 14496-15,
              <https://www.iso.org/standard/74429.html>.

   [20]       ITU-R, "Parameter values for the HDTV standards for
              production and international programme exchange", ITU-R
              Recommendation BT.709, June 2015,
              <https://www.itu.int/rec/R-REC-BT.709>.

Acknowledgments

   The authors would like to thank Mr. Paul Coverdale, Mr. Vasily
   Rufitskiy, and Dr. Jianle Chen for many useful discussions on this
   document and their help while preparing it, as well as Mr. Mo Zanaty,
   Dr. Minhua Zhou, Dr. Ali Begen, Mr. Thomas Daede, Mr. Adam Roach,
   Dr. Thomas Davies, Mr. Jonathan Lennox, Dr. Timothy Terriberry,
   Mr. Peter Thatcher, Dr. Jean-Marc Valin, Mr. Roman Danyliw, Mr. Jack
   Moffitt, Mr. Greg Coppa, and Mr. Andrew Krupiczka for their valuable
   comments on different revisions of this document.

Authors' Addresses

   Alexey Filippov
   Huawei Technologies

   Email: alexey.filippov@huawei.com

   Andrey Norkin
   Netflix

   Email: anorkin@netflix.com

   Jose Roberto Alvarez
   Huawei Technologies

   Email: j.alvarez@ieee.org