Network Working Group                                          A. Grange
Internet-Draft                                             H. Alvestrand
Intended status: Informational                                    Google
Expires: August 22, 2013                               February 18, 2013


                        A VP9 Bitstream Overview
                      draft-grange-vp9-bitstream-00

Abstract

This document describes VP9, a video codec being developed specifically to meet the demand for the consumption of video over the Internet, including professionally and amateur-produced video-on-demand and conversational video content.

VP9 is an evolution of the VP8 video codec that is described in [bankoski-rfc6386] and includes a number of enhancements and new coding tools that have been added to improve coding efficiency. The new tools that have been added so far include: larger prediction block sizes up to 64x64, various forms of compound INTER prediction, more modes for INTRA prediction, 1/8-pel motion vectors, 8-tap switchable sub-pixel interpolation filters, improved motion reference generation, improved motion vector coding, improved entropy coding including frame-level entropy adaptation for various symbols, improved loop filtering, the incorporation of the Asymmetric Discrete Sine Transform (ADST), larger 16x16 and 32x32 DCTs, and improved frame-level segmentation.

VP9 is under active development and this document provides only a snapshot of the current state of the coding tools as they exist today. The finalized version of the VP9 bitstream may differ considerably from the description contained herein and may encompass the exclusion or modification of existing coding tools or the addition of new coding tools.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 22, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Outline of the Codec
     2.1.  Prediction Block Size
     2.2.  Prediction Modes
       2.2.1.  INTRA Modes
       2.2.2.  INTER Modes
       2.2.3.  Compound INTER-INTRA Mode
     2.3.  Sub-Pixel Interpolation
     2.4.  Transforms
     2.5.  Motion Vector Reference Selection and Coding
     2.6.  Entropy Coding and Adaptation
     2.7.  Loop Filter
     2.8.  Segmentation
   3.  Bitstream Features
     3.1.  Error-Resilience
     3.2.  Parallel Decodability
       3.2.1.  Frame-Level Parallelism
       3.2.2.  Tiling
     3.3.  Scalability
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  Informative References
   Authors' Addresses

1. Introduction

Video data accounts for a significant proportion of all internet traffic, and the trend is toward higher quality, larger format and often professionally produced video, encoded at higher data rates and supported by the improved provisioning of high-bandwidth internet connections.

VP9 is being developed as an open source solution tailored to the specific characteristics of the internet, under the auspices of the WebM project [Google-webm], with the aim of providing the highest quality user experience and the ability to support the widest range of use-cases on a diverse set of target devices.

This document provides a high-level technical overview of the coding tools that will likely be included in the final VP9 bitstream.

2. Outline of the Codec

Much of the advance that VP9 makes over VP8 is a straightforward generational progression, driven by the need for the greater coding efficiency required by a new coding "sweet-spot" that has evolved in support of larger frame size, higher quality video formats.

2.1. Prediction Block Size

A large part of the coding efficiency improvement achieved by VP9 can be attributed to the introduction of larger prediction block sizes. Specifically, VP9 introduces the notion of super-blocks of size up to 64x64 and their quad-tree-like decomposition all the way down to a block size of 4x4, with some quirks as described below. In particular, a superblock of size 64x64 (SB64) can be split into 4 superblocks of size 32x32 (SB32), each of which can be further split into 16x16 macroblocks (MB). Each SB64, SB32 or MB can be predicted as a whole using a conveyed INTRA prediction mode, or an INTER prediction mode with up to two motion vectors and corresponding reference frames, as described in Section 2.2.2.
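The quad-tree decomposition described above can be viewed as a simple recursive parse. The following minimal C sketch is illustrative only: the helper names (read_split_flag, decode_block) and the exact partition signaling are assumptions, not the normative VP9 syntax.

   /* Sketch of the SB64 -> SB32 -> MB quad-tree decomposition.
    * All names are hypothetical; the real VP9 partition syntax is
    * more involved. */

   #include <stdbool.h>

   /* Assumed: reads a one-bit "split" flag from the bitstream. */
   extern bool read_split_flag(void);
   /* Assumed: decodes one whole block (INTRA or INTER) of the
    * given size at the given position. */
   extern void decode_block(int row, int col, int size);

   static void decode_partition(int row, int col, int size)
   {
       /* A 16x16 macroblock may still be sub-divided (B_PRED,
        * I8X8_PRED, SPLITMV), but that is signaled by the mode,
        * not by the quad-tree. */
       if (size == 16 || !read_split_flag()) {
           decode_block(row, col, size);
           return;
       }
       /* Split into four quadrants of half the size: 64 -> 32 -> 16. */
       int half = size / 2;
       decode_partition(row,        col,        half);
       decode_partition(row,        col + half, half);
       decode_partition(row + half, col,        half);
       decode_partition(row + half, col + half, half);
   }

Under these assumptions, an SB64 at position (row, col) would be parsed with decode_partition(row, col, 64).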
A macroblock can be further split using one of three mode families:

o  B_PRED, where each 4x4 sub-block within the MB can be coded using a signaled 4x4 INTRA prediction mode;

o  I8X8_PRED, where each 8x8 block within the MB can be coded using a signaled 8x8 INTRA prediction mode; and

o  SPLITMV, where each 4x4 sub-block within the MB is coded in INTER mode with a corresponding motion vector, but with the option of grouping common motion vectors over 16x8, 8x16, or 8x8 partitions within the MB.

Note that the B_PRED and SPLITMV modes in VP9 work in the same way as they do in VP8.

2.2. Prediction Modes

VP9 supports the following prediction modes for various block sizes:

2.2.1. INTRA Modes

At block size 4x4, VP9 supports ten INTRA prediction modes: DC, Vertical, Horizontal, TM (True Motion), Horizontal Up, Left Diagonal, Vertical Right, Vertical Left, Right Diagonal, and Horizontal Down (the same set defined by VP8). For blocks from 8x8 to 64x64 there is also support for ten INTRA modes: DC, Vertical, Horizontal, TM (True Motion), and six angular predictors corresponding, approximately, to angles of 27, 45, 63, 117, 135, and 153 degrees. There is additionally the option, signaled in the bitstream, of applying a low-pass filter to the prediction.

2.2.2. INTER Modes

VP9 currently supports INTER prediction from up to three reference frame buffers (named LAST_FRAME, GOLDEN_FRAME and ALTREF_FRAME, as in VP8), but for any particular frame the three available references are dynamically selectable from a pool of eight stored reference frames. A syntax element in the frame header indicates which sub-set of three reference buffers is available when encoding the frame. A further syntax element indicates which of the frame buffers, if any, are to be updated at the end of encoding a frame. Some coded frames may be designated as invisible in the sense that they are only used as a reference and never actually displayed, akin to the ALTREF frame in VP8. It is also likely that the number of available working reference buffers will be increased from three to four in the final VP9 bitstream.

Each INTER coded block within a frame may be coded using up to two motion vectors with two different reference buffers out of the three working reference buffers selected for the frame. When a single motion vector is used, sub-pixel interpolation from the indicated reference frame buffer is used to obtain the predictor. When two motion vectors, mv1 and mv2, are conveyed for any given block, the corresponding reference frame buffers ref1 and ref2 must be different from each other, and the final predictor is then obtained by averaging the individual predictors from each of the motion vectors, i.e.,

   P[i,j] = floor((Pmv1,ref1[i,j] + Pmv2,ref2[i,j] + 1) / 2)

where P[i,j] is the predictor value at pixel location [i,j], and Pmv1,ref1 and Pmv2,ref2 are the INTER predictors corresponding to the two motion vectors and reference buffers conveyed.
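Since the pixel values are non-negative, floor((a + b + 1) / 2) is simply (a + b + 1) >> 1 in integer arithmetic. A minimal C sketch of the compound averaging follows; the function and parameter names are illustrative only.

   /* Sketch of the compound INTER predictor average given above.
    * p1 and p2 are the sub-pixel interpolated predictors obtained
    * from (mv1, ref1) and (mv2, ref2); layout is illustrative. */
   static void compound_average(const unsigned char *p1,
                                const unsigned char *p2,
                                unsigned char *pred,
                                int w, int h, int stride)
   {
       for (int i = 0; i < h; i++)
           for (int j = 0; j < w; j++)
               pred[i * stride + j] =
                   (unsigned char)((p1[i * stride + j] +
                                    p2[i * stride + j] + 1) >> 1);
   }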
2.2.3. Compound INTER-INTRA Mode

A further prediction mode under consideration is a combined INTER/INTRA mode. In this mode, an INTER predictor and an INTRA predictor are combined in a manner whereby pixels closer to the INTRA prediction edge (top or left) are weighted more heavily towards the INTRA predictor, whilst pixels further away from the edges are weighted more heavily towards the INTER predictor. The exact weights used for each pixel thus depend on the particular INTRA prediction direction in use.

Conceptually, each INTRA prediction mode at a given block size is associated with a constant weighting block of the same size that provides the weight for the INTRA predictor relative to the INTER predictor. For instance, if the weighting matrix for a given INTRA mode m and block size n is given by an nxn matrix Wm, with values between 0 and 1, then the predictor of pixel [i,j], denoted P[i,j], is obtained by:

   P[i,j] = Wm[i,j] * Pm[i,j] + (1 - Wm[i,j]) * Pmv,ref[i,j]

where Pm is the INTRA predictor for the given INTRA mode, and Pmv,ref is the INTER predictor obtained using motion vector mv and reference frame index ref. This mode is restricted to one motion vector per block, and only to blocks of size 16x16 and above, i.e. MB/SB32/SB64. The weighting matrix may be obtained from a 1-D exponential decay function of the form A + B exp(-Kx), where x represents the distance along the prediction direction to the nearest left/top edge.

2.3. Sub-Pixel Interpolation

The filters used for sub-pixel interpolation of fractional motion are critical to the performance of a video codec. The maximum motion vector precision supported is 1/8-pixel, with the option of switching between 1/4-pixel and 1/8-pixel precision using a frame-level flag. If 1/8-pixel precision is used in the frame, however, it is only used for small motion, depending on the magnitude of the reference motion vector. For larger motion, indicated by a larger reference motion vector, there is almost always motion blur, which obviates the need for higher precision interpolation.

VP9 defines a family of three 8-tap filters, selectable at either the frame or macroblock level in the bitstream:

o  8-tap Regular: An 8-tap Lagrangian interpolation filter designed using the intfilt function in MATLAB,

o  8-tap Sharp: A DCT-based interpolation filter with a sharper response, used mostly around sharper edges,

o  8-tap Smooth (non-interpolating): A smoothing filter designed using the windowed Fourier series approach with a Hamming window. Note that unlike the other two filters, this filter is non-interpolating in the sense that the prediction at integer pixel-aligned locations is a smoothed version of the reference frame pixels.

2.4. Transforms

VP9 supports Discrete Cosine Transforms (DCTs) at sizes 4x4, 8x8, 16x16 and 32x32 and removes the second-order transform that was employed in VP8. Only transform sizes equal to, or smaller than, the prediction block size may be specified. Modes B_PRED and 4x4 SPLITMV are thus restricted to using only the 4x4 transform; modes I8X8_PRED and non-4x4 SPLITMV can use either the 4x4 or 8x8 transform; full-size (16x16) macroblock predictors can be coupled with either the 4x4, 8x8 or 16x16 transforms; and superblocks can use any transform size up to 32x32. Further restrictions on the available sub-set of transforms can be signaled at the frame level, by specifying a maximum allowable transform size, or at the macroblock level by explicitly signaling which of the available transform sizes is used.
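As an illustration of the transform-size restrictions described above, the following C sketch computes the largest transform permitted for a block. The enum and the frame-level cap are assumptions for illustration, not the normative selection logic.

   /* Sketch of the transform-size ceiling implied by the text
    * above: the transform may not exceed the prediction block
    * size, further capped by a frame-level maximum. */
   enum tx_size { TX_4X4, TX_8X8, TX_16X16, TX_32X32 };

   static enum tx_size max_tx_size(int pred_size, /* 4..64 */
                                   enum tx_size frame_max)
   {
       enum tx_size cap;
       if (pred_size >= 32)      cap = TX_32X32;  /* SB32/SB64 */
       else if (pred_size >= 16) cap = TX_16X16;  /* whole MB */
       else if (pred_size >= 8)  cap = TX_8X8;    /* I8X8, 8x8 SPLITMV */
       else                      cap = TX_4X4;    /* B_PRED, 4x4 SPLITMV */
       return cap < frame_max ? cap : frame_max;
   }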
In addition, VP9 introduces support for a new transform type, the Asymmetric Discrete Sine Transform (ADST), which can be used in combination with specific INTRA prediction modes. It has been shown in [Han-Icassp] and [Han-Itip] that when a one-sided boundary is available, as in most INTRA prediction modes, the ADST rather than the DCT is the optimal transform for the residual signal. INTRA prediction modes that predict from a left edge can use the 1-D ADST in the horizontal direction, combined with a 1-D DCT in the vertical direction. Similarly, the residual signal resulting from INTRA prediction modes that predict from the top edge can employ a vertical 1-D ADST combined with a horizontal 1-D DCT. INTRA prediction modes that predict from both edges, such as the True Motion (TM_PRED) mode and some diagonal INTRA prediction modes, use the 1-D ADST in both the horizontal and vertical directions.

2.5. Motion Vector Reference Selection and Coding

One of the most critical factors in the efficiency of motion vector encoding is the generation of a suitable reference motion vector to be used as a predictor. VP9 creates a sorted list of candidate reference motion vectors that encompasses the three vectors best, nearest and near as defined by VP8. In addition to the candidates produced by the VP8 algorithm, VP9 additionally evaluates the motion vector of the co-located block in the reference frame and those of nearby blocks. VP9 introduces a new scoring mechanism to rank these reference vectors whereby each candidate is evaluated to determine how well it would have predicted the reconstructed pixels in close proximity to the current block (more specifically, a small number of rows immediately above the current block, and possibly a small number of columns to the left of the current block). A predictor is created using each candidate vector in turn to displace the pixels in the reference frame, and the variance of the resulting error signal, with respect to the set of pixels in the current frame, is used to rank the reference vectors.

With the three best candidate reference vectors best, nearest and near identified, the encoder can either signal the use of the vector identified as the nearest (NEAREST_MV mode) or near (NEAR_MV mode) or, if neither of them is deemed appropriate, signal the use of a completely new motion vector (NEW_MV mode) that is then specified as a delta from the best reference candidate. One further mode, ZERO_MV, signals the use of the (0, 0) motion vector. In addition, a more efficient motion vector offset encoding mechanism has been introduced.

2.6. Entropy Coding and Adaptation

The VP9 bitstream employs the VP8 BoolCoder as the underlying arithmetic encoder. Generally speaking, given a symbol from any n-ary alphabet, a static binary tree is constructed with n-1 internal nodes, and a binary arithmetic encoder is run at each such node as the tree is traversed to encode a particular symbol. The probabilities at each node use 8-bit precision. The set of n-1 probabilities for coding the symbol is referred to as the entropy coding context of the symbol. Almost all of the coding elements conveyed in a bitstream - including modes, motion vectors, reference frames, and prediction residuals for each transform type and size - use this strategy.
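Since VP9 reuses the VP8 BoolCoder, the per-symbol tree traversal can be sketched along the lines of the VP8 tree reader described in [bankoski-rfc6386]: internal nodes are stored in pairs, leaves are stored as negated symbol values, and each traversal step consumes one binary decision. In the C sketch below, read_bool is assumed to be the underlying boolean arithmetic decoder.

   /* Sketch of decoding an n-ary symbol with a static binary tree.
    * tree[] holds, for each internal node, two entries that are
    * either a negated leaf value or the index of the next node
    * pair; probs[] holds the n-1 node probabilities (8-bit). */
   typedef signed char tree_index;

   extern int read_bool(unsigned char prob_of_zero);  /* assumed */

   static int read_tree_symbol(const tree_index *tree,
                               const unsigned char *probs)
   {
       tree_index i = 0;
       /* Walk from the root; each boolean decision selects one of
        * the two entries of the current node pair. */
       while ((i = tree[i + read_bool(probs[i >> 1])]) > 0)
           ;
       return -i;  /* leaves are stored negated */
   }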
Video content is inherently highly non-stationary in nature, and a critical component of any codec is the mechanism used to track the statistics of the various encoded symbols and update the parameters of the entropy coding contexts to match. VP9 makes use of forward context updates through the use of flags in the frame header that signal modifications of the coding contexts at the start of each frame. The syntax for forward updates is designed to allow an arbitrary sub-set of the node probabilities to be updated whilst leaving the others unchanged. The advantage of using forward adaptation is that decoding performance can be substantially improved, because no intermediate computations based on encountered token counts are necessary. Updates are encoded differentially, to allow a more efficient specification of updated coding contexts, which is essential given the expanded set of tokens available in VP9.

In addition, there is also a limited option for signaling backward adaptation, which in VP9 is only possible at the end of encoding each frame so that the impact on decoding speed is minimal. Specifically, for every frame encoded, a forward update first modifies the entropy coding contexts for various symbols, starting from the initial state at the beginning of the frame. Thereafter, all symbols encoded in the frame are coded using this modified coding state. At the end of the frame, both the encoder and decoder are expected to have accumulated counts for the various symbols actually encoded or decoded over the frame. Using these actual distributions, a backward update step is applied to adapt the entropy coding context for use as the baseline for the next frame.

2.7. Loop Filter

VP9 introduces a variety of new prediction block and transform sizes that require additional loop filtering options to handle a larger number of combinations of boundary types. VP9 also incorporates a flatness detector in the loop filter that detects flat regions and varies the filter strength and size accordingly.

2.8. Segmentation

VP9 introduces more advanced segmentation features that make the mechanism much more efficient and powerful, allowing each superblock or macroblock to specify a segment-ID to which it belongs. Then, for each segment, the frame header can convey common features that will be applied to all MBs/SB32s/SB64s belonging to the same segment-ID. Further, the segmentation map is coded differentially across frames in order to minimize the size of the signaling overhead. Examples of information that can be conveyed for a segment include: restrictions on the reference frames that can be used for each segment, coefficient skips, quantizer and loop filter strength, and transform size options.

Generally speaking, the segmentation mechanism provides a flexible set of tools that can be used, in an application-specific way, to target improvements in perceptual quality for a given compression ratio.
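As an illustration of how the per-segment feature data described above might be organized, consider the following C sketch. The field names and the eight-segment limit are assumptions for illustration, not the normative syntax.

   /* Sketch of per-segment feature data: a block's segment-ID
    * selects a set of frame-level overrides. */
   #define MAX_SEGMENTS 8   /* assumption: 3-bit segment-ID */

   struct segment_features {
       int ref_frame_mask;     /* restriction on usable references */
       int skip_coeffs;        /* force coefficient skip */
       int quantizer;          /* segment quantizer */
       int loop_filter_level;  /* segment loop filter strength */
       int max_tx_size;        /* transform size option */
   };

   static const struct segment_features *
   features_for_block(const struct segment_features tbl[MAX_SEGMENTS],
                      const unsigned char *seg_map, int mb_index)
   {
       /* seg_map carries the (differentially coded) segment-ID of
        * each MB/SB; the features are conveyed once per frame. */
       return &tbl[seg_map[mb_index]];
   }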
In the reference implementation, segmentation is currently used to identify background and foreground areas in encoded video content. The (static) background is then coded at a higher quality compared to the rest of the frame in certain reference frames (such as the alt-ref frame) that provide prediction that persists over a number of frames. In contrast, for the frames between these persistent reference frames, the background is given fewer bits by, for example, restricting the set of available reference buffers, using only the ZERO_MV coding mode, or skipping the residual coefficient block. The result is that more bits are available to code the foreground portion of the scene, while still preserving very good perceptual quality on the static background. Other use cases involving spatial and temporal masking for perceptual quality improvement are conceivable.

3. Bitstream Features

In addition to providing high compression efficiency with reasonable complexity, the VP9 bitstream includes features designed to support a variety of specific use-cases that are important to internet video content delivery and consumption. This section provides an overview of these features.

3.1. Error-Resilience

For communication of conversational video with low latency over an unreliable network, it is imperative to support a coding mode where decoding can continue without errors even when arbitrary frames are lost. Specifically, the arithmetic decoder should still be able to decode symbols correctly in frames subsequent to lost frames, even though frame buffers have been corrupted, leading to encoder-decoder mismatch. The hope is that the drift between the encoder and decoder will still be manageable until such time as a key frame is sent or other corrective action (such as reference picture selection) can be taken. VP9 supports a frame-level error_resilient_mode flag which, when turned on, allows only coding modes for which this is achievable. In particular, the following restrictions are imposed in error resilient mode:

1. The entropy coding context probabilities are reset to defaults at the beginning of each frame. (This effectively prevents propagation of forward updates as well as backward updates.)

2. For MV reference selection, the co-located MV from a previously encoded reference frame can no longer be included in the reference candidate list.

3. For MV reference selection, sorting of the initial list of motion vector reference candidates based on search in the reference frame buffer is disabled.

These restrictions produce a modest performance drop.
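A decoder-side view of these restrictions might look like the following C sketch; the flag and field names are illustrative, not the actual VP9 syntax elements.

   /* Sketch of applying the error-resilient-mode restrictions
    * listed above at the start of each frame. */
   struct frame_ctx {
       int error_resilient_mode;    /* frame-level flag */
       int allow_colocated_mv_ref;  /* restriction 2 */
       int sort_mv_refs_by_search;  /* restriction 3 */
   };

   extern void reset_entropy_contexts_to_defaults(void); /* assumed */

   static void begin_frame(struct frame_ctx *ctx)
   {
       if (ctx->error_resilient_mode) {
           /* 1. No forward/backward update survives a lost frame. */
           reset_entropy_contexts_to_defaults();
           /* 2. The co-located MV may come from a corrupted buffer. */
           ctx->allow_colocated_mv_ref = 0;
           /* 3. Ranking against reconstructed reference pixels is
            *    unsafe when those pixels may be corrupted. */
           ctx->sort_mv_refs_by_search = 0;
       }
   }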
3.2. Parallel Decodability

Smooth encoding and playback of high-definition video on resource-constrained personal devices (smartphones, tablets, netbooks, etc.) in software necessitates exploiting some form of parallelism, so that multi-threaded applications can be built around the codec to exploit the inherent multi-processing capabilities of modern processors. This may include the ability to encode/decode parts of a frame in parallel, the ability to decode successive frames in parallel, or a combination of both. VP9 supports both forms of parallelism, as described below.

3.2.1. Frame-Level Parallelism

A frame-level flag, frame_parallel_mode, when turned on, enables an encoding mode where the entropy decoding for successive frames can be conducted in a quasi-parallel manner just by parsing the frame headers, before these frames actually need to be reconstructed. In this mode, only the frame headers need to be decoded sequentially. Beyond that, the entropy decoding for each frame can be conducted in a lagged parallel mode as long as the co-located motion vector information from a previous reference frame has been decoded prior to the current frame. The reconstruction of the frames can then be conducted sequentially in coding order as they are required to be displayed. This mode will enable multi-threaded decoder implementations that result in smoother playback performance. Specifically, this mode imposes the following restrictions on the bitstream, which are a subset of the restrictions for the error-resilient mode:

1. Backward entropy coding context updates are disabled, but forward updates are allowed to propagate.

2. For MV reference selection, sorting of the initial list of motion vector reference candidates based on a search in the reference frame buffer is disabled. However, the co-located MV from a previously encoded reference frame can be included in the initial candidate list.

3.2.2. Tiling

In addition to making provisions for decoding multiple frames in parallel, VP9 also has support for decoding a single frame using multiple threads. For this, VP9 introduces tiles, which are independently coded and decodable sub-units of the video frame. When enabled, a frame can be split into, for example, 2 or 4 column-based tiles. Each tile shares the same frame entropy model, but all contexts and pixel values (for intra prediction) that cross tile boundaries take the same value as those at the left, top or right edge of the frame. Each tile can thus be decoded and encoded completely independently, which is expected to enable significant speedups in multi-threaded encoders/decoders, without introducing any additional latency. Note that loop filtering across tile edges can still be applied, assuming a decoder implementation model where the loop filtering operation lags the decoder's reconstruction of the individual tiles within the frame so as not to use any pixel that is not already reconstructed. Further, backward entropy adaptation - a light-weight operation - can still be conducted for the whole frame after entropy decoding for all tiles has finished.

3.3. Scalability

The VP9 bitstream will provide a number of flexible features that can be combined in specific ways to efficiently provide various forms of scalability. VP9 increases the number of available reference frame buffers to eight, from which three may be selected for each frame. In addition, each coded frame may be resampled and coded at a resolution different from the reference buffers, allowing internal spatial resolution changes on-the-fly without having to resort to using keyframes. When such a resolution change is signaled in the bitstream, the reference buffers as well as the corresponding MV information are suitably transformed to the new resolution before applying the standard coding tools. Furthermore, VP9 defines the maintenance of four different entropy coding contexts that can be selected and optionally updated on every frame, thereby making it possible for the encoder to use a different entropy coding context for each scalable layer, if required. Together, these flexible features enable an encoder/decoder to implement various forms of coarse-grained scalability, including temporal, spatial, or combined spatio-temporal scalability, without explicitly creating spatially scalable encoding modes.
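As an illustration, transforming the motion vector information to a new resolution could be sketched as a simple linear rescale, as below. This is an assumption made for illustration purposes; the normative VP9 computation may differ.

   /* Sketch of scaling a motion vector when a frame is coded at a
    * resolution different from its reference, per the text above. */
   struct mv { int row, col; };  /* assumed 1/8-pel units */

   static struct mv scale_mv(struct mv m,
                             int cur_w, int cur_h, /* current frame */
                             int ref_w, int ref_h) /* reference */
   {
       struct mv out;
       out.row = m.row * cur_h / ref_h;
       out.col = m.col * cur_w / ref_w;
       return out;
   }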
4. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an RFC.

5. Security Considerations

The VP9 bitstream offers no security functions. Integrity and confidentiality must be ensured by functions outside the bitstream.

The VP9 bitstream does not offer functions for embedding other types of objects, either active or passive, so this class of attack cannot be mounted using VP9.

Implementations of codecs are often written with a strong focus on speed. The reference software has been carefully vetted for security issues, but no guarantees can be given. People who use other people's decoder software will need to take appropriate care when executing the software in a security-sensitive context.

6. Acknowledgements

This document is heavily based on the paper "Towards a Next Generation Open-source Video Codec" [vp9-paper] by Bankoski, J., Bultje, R.S., Grange, A., Gu, Q., Han, J., Koleszar, J., Mukherjee, D., Wilkins, P., and Xu, Y.

7. Informative References

[bankoski-rfc6386]
   Bankoski, J., Koleszar, J., Quillio, L., Salonen, J., Wilkins, P., and Y. Xu, "VP8 Data Format and Decoding Guide", RFC 6386, November 2011.

[Google-webm]
   "The WebM Project website", http://www.webmproject.org/.

[Han-Icassp]
   Han, J., "Towards jointly optimal spatial prediction and adaptive transform in video/image coding", IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), pp. 726-729, March 2010.

[Han-Itip]
   Han, J., "Jointly optimized spatial prediction and block transform for video and image coding", IEEE Transactions on Image Processing, vol. 21, pp. 1874-1884, April 2012.

[vp9-paper]
   Bankoski, J., Bultje, R., Grange, A., Gu, Q., Han, J., Koleszar, J., Mukherjee, D., Wilkins, P., and Y. Xu, "Towards a Next Generation Open-source Video Codec", IS&T / SPIE EI Conference on Visual Information Processing and Communication IV, February 2013.

Authors' Addresses

Adrian Grange
Google

Email: agrange@google.com


Harald Alvestrand
Google

Email: hta@google.com