rfc8878xml2.original.xml | rfc8878.xml | |||
---|---|---|---|---|
<?xml version="1.0" encoding="US-ASCII"?> | <?xml version="1.0" encoding="utf-8"?> | |||
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [ | ||||
<!ENTITY RFC1952 PUBLIC "" "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/re | ||||
ference.RFC.1952.xml"> | ||||
]> | ||||
<rfc ipr="trust200902" category="info" obsoletes="8478" | ||||
docName="draft-kucherawy-rfc8478bis-05"> | ||||
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> | <!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent"> | |||
<?rfc toc="yes" ?> | <rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" | |||
<?rfc tocdepth="4" ?> | obsoletes="8478" docName="draft-kucherawy-rfc8478bis-05" number="8878" | |||
<?rfc symrefs="yes" ?> | updates="" submissionType="IETF" category="info" consensus="true" | |||
<?rfc sortrefs="yes"?> | xml:lang="en" tocInclude="true" tocDepth="4" symRefs="true" | |||
<?rfc compact="yes" ?> | sortRefs="true" version="3"> | |||
<?rfc subcompact="no"?> | ||||
<front> | <front> | |||
<title abbrev="application/zstd"> | ||||
Zstandard Compression and the application/zstd Media Type | ||||
</title> | ||||
<author initials="Y." surname="Collet" | ||||
fullname="Yann Collet"> | ||||
<organization> | ||||
</organization> | ||||
<address> | ||||
<postal> | ||||
<street>1 Hacker Way</street> | ||||
<city>Menlo Park</city> | ||||
<region>CA</region> | ||||
<code>94025</code> | ||||
<country>United States of America</country> | ||||
</postal> | ||||
<email>cyan@fb.com</email> | ||||
</address> | ||||
</author> | ||||
<author initials="M. S." surname="Kucherawy" | ||||
fullname="Murray S. Kucherawy" | ||||
role="editor"> | ||||
<organization> | ||||
</organization> | ||||
<address> | ||||
<postal> | ||||
<street>1 Hacker Way</street> | ||||
<city>Menlo Park</city> | ||||
<region>CA</region> | ||||
<code>94025</code> | ||||
<country>United States of America</country> | ||||
</postal> | ||||
<email>msk@fb.com</email> | ||||
</address> | ||||
</author> | ||||
<date year="2020"/> | ||||
<area>General</area> | ||||
<keyword>compression</keyword> | <title abbrev="application/zstd"> | |||
Zstandard Compression and the 'application/zstd' Media Type | ||||
</title> | ||||
<seriesInfo name="RFC" value="8878"/> | ||||
<author initials="Y." surname="Collet" fullname="Yann Collet"> | ||||
<organization>Facebook</organization> | ||||
<address> | ||||
<postal> | ||||
<street>1 Hacker Way</street> | ||||
<city>Menlo Park</city> | ||||
<region>CA</region> | ||||
<code>94025</code> | ||||
<country>United States of America</country> | ||||
</postal> | ||||
<email>cyan@fb.com</email> | ||||
</address> | ||||
</author> | ||||
<author initials="M." surname="Kucherawy" fullname="Murray S. Kucherawy" rol | ||||
e="editor"> | ||||
<organization>Facebook</organization> | ||||
<address> | ||||
<postal> | ||||
<street>1 Hacker Way</street> | ||||
<city>Menlo Park</city> | ||||
<region>CA</region> | ||||
<code>94025</code> | ||||
<country>United States of America</country> | ||||
</postal> | ||||
<email>msk@fb.com</email> | ||||
</address> | ||||
</author> | ||||
<date year="2021" month="February" /> | ||||
<area>General</area> | ||||
<keyword>compression</keyword> | ||||
<abstract> | ||||
<abstract> | <t>Zstandard, or "zstd" (pronounced "zee standard"), is a lossless data | |||
<t> Zstandard, or "zstd" (pronounced "zee standard"), is a | compression mechanism. This document describes the mechanism and | |||
data compression mechanism. This document describes the | registers a media type, content encoding, and a structured syntax | |||
mechanism and registers a media type and content | suffix to be used when transporting zstd-compressed content via MIME.</t> | |||
encoding to be used when transporting zstd-compressed | ||||
content via Multipurpose Internet Mail Extensions | ||||
(MIME). It also registers a corresponding media | ||||
type, content encoding, and structured syntax suffix. </t> | ||||
<t> Despite use of the word "standard" as part of its name, | <t> Despite use of the word "standard" as part of Zstandard, | |||
readers are advised that this document is not an | readers are advised that this document is not an | |||
Internet Standards Track specification; it is being | Internet Standards Track specification; it is being | |||
published for informational purposes only. </t> | published for informational purposes only. </t> | |||
<t> This document replaces and obsoletes RFC 8478. </t> | ||||
<t> This document replaces and obsoletes RFC 8478. </t> | </abstract> | |||
</abstract> | </front> | |||
<middle> | ||||
</front> | <section anchor="intro" numbered="true" toc="default"> | |||
<name>Introduction</name> | ||||
<middle> | <t> Zstandard, or "zstd" (pronounced "zee standard"), is a | |||
<section anchor="intro" title="Introduction"> | ||||
<t> Zstandard, or "zstd" (pronounced "zee standard"), is a | ||||
data compression mechanism, akin to gzip | data compression mechanism, akin to gzip | |||
<xref target="RFC1952"/>. </t> | <xref target="RFC1952" format="default"/>. </t> | |||
<t> Despite use of the word "standard" as part of its name, | ||||
<t> Despite use of the word "standard" as part of its name, | ||||
readers are advised that this document is not an | readers are advised that this document is not an | |||
Internet Standards Track specification; it is being | Internet Standards Track specification; it is being | |||
published for informational purposes only. </t> | published for informational purposes only. </t> | |||
<t>This document describes the Zstandard format. Also, to enable the | ||||
<t> This document describes the Zstandard format. Also, | transport of a data object compressed with Zstandard, this document | |||
to enable the transport of a data object compressed with | registers a media type, content encoding, and structured syntax suffix | |||
Zstandard, this document registers a media type that can be | that can be used to identify such content when it is used in a | |||
used to identify such content when it is used in a payload | payload.</t> | |||
encoded using Multipurpose Internet Mail Extensions | </section> | |||
(MIME). </t> | <section anchor="definitions" numbered="true" toc="default"> | |||
</section> | <name>Definitions</name> | |||
<t>Some terms used elsewhere in this document are defined | ||||
<section anchor="definitions" title="Definitions"> | here for clarity. </t> | |||
<t> Some terms used elsewhere in this document are defined | <dl newline="false" spacing="normal"> | |||
here for clarity. </t> | <dt>uncompressed:</dt> | |||
<dd>Describes an arbitrary | ||||
<t> <list style="hanging"> | ||||
<t hangText="uncompressed:"> Describes an arbitrary | ||||
set of bytes in their original form, prior to being | set of bytes in their original form, prior to being | |||
subjected to compression. </t> | subjected to compression. </dd> | |||
<t hangText="compress, compression:"> The act of processing | ||||
a set of bytes via the compression mechanism | ||||
described here. </t> | ||||
<t hangText="compressed:"> Describes the result of passing | <dt>compressed:</dt> | |||
<dd>Describes the result of passing | ||||
a set of bytes through this mechanism. The original | a set of bytes through this mechanism. The original | |||
input has thus been compressed. </t> | input has thus been compressed. </dd> | |||
<dt>decompressed:</dt> | ||||
<t hangText="decompress, decompression:"> The act of | <dd>Describes the result of passing | |||
processing a set of bytes through the inverse | ||||
of the compression mechanism described here, in an | ||||
attempt to recover the original set of bytes prior | ||||
to compression. </t> | ||||
<t hangText="decompressed:"> Describes the result of passing | ||||
a set of bytes through the reverse of this mechanism. | a set of bytes through the reverse of this mechanism. | |||
When this is successful, the decompressed payload and | When this is successful, the decompressed payload and | |||
the uncompressed payload are indistinguishable. </t> | the uncompressed payload are indistinguishable. </dd> | |||
<dt>encode:</dt> | ||||
<t hangText="encode:"> The process of translating data | <dd>The process of translating data | |||
from one form to another; this may include compression | from one form to another; this may include compression, | |||
or it may refer to other translations done as part | or it may refer to other translations done as part | |||
of this specification. </t> | of this specification. </dd> | |||
<dt>decode:</dt> | ||||
<t hangText="decode:"> The reverse of "encode"; describes | <dd>The reverse of "encode"; describes | |||
a process of reversing a prior encoding to recover | a process of reversing a prior encoding to recover | |||
the original content. </t> | the original content. </dd> | |||
<dt>frame:</dt> | ||||
<t hangText="frame:"> Content compressed by Zstandard is | <dd>Content compressed by Zstandard is | |||
transformed into a Zstandard frame. Multiple frames | transformed into a Zstandard frame. Multiple frames | |||
can be appended into a single file or stream. A frame | can be appended into a single file or stream. A frame | |||
is completely independent, has a defined beginning | is completely independent, has a defined beginning | |||
and end, and has a set of parameters that tells the | and end, and has a set of parameters that tells the | |||
decoder how to decompress it. </t> | decoder how to decompress it. </dd> | |||
<dt>block:</dt> | ||||
<t hangText="block:"> A frame encapsulates one or multiple | <dd>A frame encapsulates one or multiple | |||
blocks. Each block contains arbitrary content, which | blocks. Each block contains arbitrary content, which | |||
is described by its header, and has a guaranteed | is described by its header, and has a guaranteed | |||
maximum content size that depends upon frame | maximum content size that depends upon frame | |||
parameters. Unlike frames, each block depends | parameters. Unlike frames, each block depends | |||
on previous blocks for proper decoding. However, each | on previous blocks for proper decoding. However, each | |||
block can be decompressed without waiting for its | block can be decompressed without waiting for its | |||
successor, allowing streaming operations. </t> | successor, allowing streaming operations. </dd> | |||
<dt>natural order:</dt> | ||||
<t hangText="natural order:"> A sequence or ordering of | <dd>A sequence or ordering of | |||
objects or values that is typical of that type of | objects or values that is typical of that type of | |||
object or value. A set of unique integers, for | object or value. A set of unique integers, for | |||
example, is in "natural order" if when progressing | example, is in "natural order" if, when progressing | |||
from one element in the set or sequence to the next, | from one element in the set or sequence to the next, | |||
there is never a decrease in value. </t> | there is never a decrease in value. </dd> | |||
</list> </t> | </dl> | |||
<t>The naming convention for identifiers within the | ||||
<t> The naming convention for identifiers within the | ||||
specification is Mixed_Case_With_Underscores. | specification is Mixed_Case_With_Underscores. | |||
Identifiers inside square brackets indicate that the | Identifiers inside square brackets indicate that the | |||
identifier is optional in the presented context. </t> | identifier is optional in the presented context. </t> | |||
</section> | </section> | |||
<section anchor="compression" numbered="true" toc="default"> | ||||
<section anchor="compression" title="Compression Algorithm"> | <name>Compression Algorithm</name> | |||
<t> This section describes the Zstandard algorithm. </t> | <t> This section describes the Zstandard algorithm. </t> | |||
<t> The purpose of this document is to define a lossless | <t> The purpose of this document is to define a lossless | |||
compressed data format that is a) independent of the CPU | compressed data format that is a) independent of the CPU | |||
type, operating system, file system, and character set and | type, operating system, file system, and character set and | |||
b) is suitable for file compression and pipe and streaming | b) suitable for file compression and pipe and streaming | |||
compression, using the Zstandard algorithm. The text of | compression, using the Zstandard algorithm. The text of | |||
the specification assumes a basic background in | the specification assumes a basic background in | |||
programming at the level of bits and other primitive data | programming at the level of bits and other primitive data | |||
representations. </t> | representations. </t> | |||
<t> The data can be produced or consumed, even for an | ||||
<t> The data can be produced or consumed, even for an | ||||
arbitrarily long sequentially presented input data stream, | arbitrarily long sequentially presented input data stream, | |||
using only an a priori bounded amount of intermediate | using only an a priori bounded amount of intermediate | |||
storage, and hence can be used in data communications. | storage; hence, it can be used in data communications. | |||
The format uses the Zstandard compression method, and | The format uses the Zstandard compression method, and | |||
an optional xxHash-64 checksum method | an optional xxHash-64 checksum method | |||
<xref target="XXHASH"/>, for detection of data | <xref target="XXHASH" format="default"/>, for detection of da ta | |||
corruption. </t> | corruption. </t> | |||
<t> The data format defined by this specification does not | ||||
<t> The data format defined by this specification does not | ||||
attempt to allow random access to compressed data. </t> | attempt to allow random access to compressed data. </t> | |||
<t> Unless otherwise indicated below, a compliant compressor | ||||
<t> Unless otherwise indicated below, a compliant compressor | ||||
must produce data sets that conform to the specifications | must produce data sets that conform to the specifications | |||
presented here. However, it does not need to support all | presented here. However, it does not need to support all | |||
options. </t> | options. </t> | |||
<t> A compliant decompressor must be able to decompress at | ||||
<t> A compliant decompressor must be able to decompress at | ||||
least one working set of parameters that conforms to the | least one working set of parameters that conforms to the | |||
specifications presented here. It may also ignore | specifications presented here. It may also ignore | |||
informative fields, such as the checksum. Whenever it does | informative fields, such as the checksum. Whenever it does | |||
not support a parameter defined in the compressed stream, | not support a parameter defined in the compressed stream, | |||
it must produce a non-ambiguous error code and associated | it must produce an unambiguous error code and associated | |||
error message explaining which parameter is | error message explaining which parameter is | |||
unsupported. </t> | unsupported. </t> | |||
<t> This specification is intended for use by implementers | ||||
<t> This specification is intended for use by implementers | ||||
of software to compress data into Zstandard format and/or | of software to compress data into Zstandard format and/or | |||
decompress data from Zstandard format. The Zstandard | decompress data from Zstandard format. The Zstandard | |||
format is supported by an open source reference | format is supported by an open-source reference | |||
implementation, written in portable C, and available at | implementation, written in portable C, and available at | |||
<xref target="ZSTD"/>. </t> | <xref target="ZSTD" format="default"/>. </t> | |||
<section anchor="comp_frames" numbered="true" toc="default"> | ||||
<section anchor="comp_frames" title="Frames"> | <name>Frames</name> | |||
<t> Zstandard compressed data is made up of one | <t> Zstandard compressed data is made up of one | |||
or more frames. Each frame is independent and can | or more frames. Each frame is independent and can | |||
be decompressed independently of other frames. The | be decompressed independently of other frames. The | |||
decompressed content of multiple concatenated | decompressed content of multiple concatenated | |||
frames is the concatenation of each frame's | frames is the concatenation of each frame's | |||
decompressed content. </t> | decompressed content. </t> | |||
<t> There are two frame formats defined for | ||||
<t> There are two frame formats defined for | ||||
Zstandard: Zstandard frames and skippable frames. | Zstandard: Zstandard frames and skippable frames. | |||
Zstandard frames contain compressed data, while | Zstandard frames contain compressed data, while | |||
skippable frames contain custom user metadata. </t> | skippable frames contain custom user metadata. </t> | |||
<section anchor="comp_zstd_frames" numbered="true" toc="default"> | ||||
<section anchor="comp_zstd_frames" | <name>Zstandard Frames</name> | |||
title="Zstandard Frames"> | <t> The structure of a single Zstandard frame | |||
<t> The structure of a single Zstandard frame | ||||
is as follows: | is as follows: | |||
<figure><artwork> | </t> | |||
+--------------------+------------+ | ||||
| Magic_Number | 4 bytes | | ||||
+--------------------+------------+ | ||||
| Frame_Header | 2-14 bytes | | ||||
+--------------------+------------+ | ||||
| Data_Block | n bytes | | ||||
+--------------------+------------+ | ||||
| [More Data_Blocks] | | | ||||
+--------------------+------------+ | ||||
| [Content_Checksum] | 4 bytes | | ||||
+--------------------+------------+ | ||||
</artwork></figure> </t> | ||||
<t> <list style="hanging"> | ||||
<t hangText="Magic_Number:"> | ||||
4 bytes, little-endian | ||||
format. Value: 0xFD2FB528. | ||||
</t> | ||||
<t hangText="Frame_Header:"> | ||||
2 to 14 bytes, detailed in | ||||
<xref target="comp_frame_hdr"/>. | ||||
</t> | ||||
<t hangText="Data_Block:"> | ||||
Detailed in | ||||
<xref target="blocks"/>. | ||||
This is where | ||||
data appears. | ||||
</t> | ||||
<t hangText="Content_Checksum:"> | ||||
An optional 32-bit checksum, | ||||
only present if | ||||
Content_Checksum_Flag is set. | ||||
The content checksum is the | ||||
result of the XXH64() hash | ||||
function | ||||
<xref target="XXHASH"/> | ||||
digesting the original | ||||
(decoded) data as input, and a | ||||
seed of zero. The low 4 | ||||
bytes of the checksum are | ||||
stored in little-endian format. | ||||
</t> | ||||
</list> </t> | ||||
<t> The magic number was selected to be less | ||||
probable to find at the beginning of an | ||||
arbitrary file. It avoids trivial | ||||
patterns (0x00, 0xFF, repeated bytes, | ||||
increasing bytes, etc.), contains byte | ||||
values outside of ASCII range, and doesn't | ||||
map into UTF-8 space, all of which reduce | ||||
the likelihood of its appearance at the | ||||
top of a text file. </t> | ||||
<section anchor="comp_frame_hdr" | ||||
title="Frame Header"> | ||||
<t> The frame header has a variable | ||||
size, with a minimum of 2 bytes | ||||
and up to 14 bytes depending on | ||||
optional parameters. The structure | ||||
of Frame_Header is as follows: | ||||
<figure><artwork> | <table anchor="single-frame"> | |||
+-------------------------+-----------+ | <name>The Structure of a Single Zstandard Frame</name> | |||
| Frame_Header_Descriptor | 1 byte | | <tbody> | |||
+-------------------------+-----------+ | <tr> | |||
| [Window_Descriptor] | 0-1 byte | | <td>Magic_Number</td> | |||
+-------------------------+-----------+ | <td>4 bytes</td> | |||
| [Dictionary_ID] | 0-4 bytes | | </tr> | |||
+-------------------------+-----------+ | <tr> | |||
| [Frame_Content_Size] | 0-8 bytes | | <td>Frame_Header</td> | |||
+-------------------------+-----------+ | <td>2-14 bytes</td> | |||
</artwork></figure> </t> | </tr> | |||
<tr> | ||||
<td>Data_Block</td> | ||||
<td>n bytes</td> | ||||
</tr> | ||||
<tr> | ||||
<td>[More Data_Blocks]</td> | ||||
<td></td> | ||||
</tr> | ||||
<tr> | ||||
<td>[Content_Checksum]</td> | ||||
<td>4 bytes</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<dl newline="false" spacing="normal"> | ||||
<dt>Magic_Number:</dt> | ||||
<dd> | ||||
4 bytes, little-endian format. Value: 0xFD2FB528. | ||||
</dd> | ||||
<dt>Frame_Header:</dt> | ||||
<dd> | ||||
2 to 14 bytes, detailed in | ||||
<xref target="comp_frame_hdr" format="default"/>. | ||||
</dd> | ||||
<dt>Data_Block:</dt> | ||||
<dd> | ||||
Detailed in <xref target="blocks" format="default"/>. | ||||
This is where data appears. | ||||
</dd> | ||||
<dt>Content_Checksum:</dt> | ||||
<dd> | ||||
An optional 32-bit checksum, | ||||
only present if | ||||
Content_Checksum_Flag is set. | ||||
The content checksum is the | ||||
result of the XXH64() hash | ||||
function | ||||
<xref target="XXHASH" format="default"/> | ||||
digesting the original | ||||
(decoded) data as input, and a | ||||
seed of zero. The low 4 | ||||
bytes of the checksum are | ||||
stored in little-endian format. | ||||
</dd> | ||||
</dl> | ||||
<t> The magic number was selected to be less | ||||
probable to find at the beginning of an | ||||
arbitrary file. It avoids trivial | ||||
patterns (0x00, 0xFF, repeated bytes, | ||||
increasing bytes, etc.), contains byte | ||||
values outside of the ASCII range, and doesn't | ||||
map into UTF-8 space, all of which reduce | ||||
the likelihood of its appearance at the | ||||
top of a text file. </t> | ||||
<section anchor="comp_frame_hdr" numbered="true" toc="default"> | ||||
<name>Frame Header</name> | ||||
<t> The frame header has a variable | ||||
size, with a minimum of 2 bytes | ||||
up to a maximum of 14 bytes depending on | ||||
optional parameters. The structure | ||||
of Frame_Header is as follows:</t> | ||||
<section anchor="comp_frame_header_desc" | <table anchor="frame-header"> | |||
title="Frame_Header_Descriptor"> | <name>The Structure of Frame_Header</name> | |||
<t> The first header's byte is | <tbody> | |||
called the | <tr> | |||
Frame_Header_Descriptor. | <td>Frame_Header_Descriptor </td> | |||
It describes | <td>1 byte</td> | |||
which other fields are | </tr> | |||
present. Decoding this | <tr> | |||
byte is enough to tell the | <td>[Window_Descriptor]</td> | |||
size of Frame_Header. | <td>0-1 byte</td> | |||
</tr> | ||||
<tr> | ||||
<td>[Dictionary_ID]</td> | ||||
<td>0-4 bytes</td> | ||||
</tr> | ||||
<tr> | ||||
<td>[Frame_Content_Size]</td> | ||||
<td>0-8 bytes</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<figure><artwork> | <section anchor="comp_frame_header_desc" numbered="true" toc="defaul | |||
+------------+-------------------------+ | t"> | |||
| Bit Number | Field Name | | <name>Frame_Header_Descriptor</name> | |||
+------------+-------------------------+ | <t> The first header's byte is called the | |||
| 7-6 | Frame_Content_Size_Flag | | Frame_Header_Descriptor. It describes | |||
+------------+-------------------------+ | which other fields are present. Decoding this | |||
| 5 | Single_Segment_Flag | | byte is enough to tell the size of Frame_Header. | |||
+------------+-------------------------+ | </t> | |||
| 4 | (unused) | | ||||
+------------+-------------------------+ | ||||
| 3 | (reserved) | | ||||
+------------+-------------------------+ | ||||
| 2 | Content_Checksum_Flag | | ||||
+------------+-------------------------+ | ||||
| 1-0 | Dictionary_ID_Flag | | ||||
+------------+-------------------------+ | ||||
</artwork></figure> </t> | ||||
<t> In this table, bit 7 is | <table anchor="Frame-Header-Descriptor"> | |||
the highest bit, while bit | <name>The Frame_Header_Descriptor</name> | |||
0 is the lowest one. </t> | <thead> | |||
<tr> | ||||
<th>Bit Number</th> | ||||
<th>Field Name</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td>7-6</td> | ||||
<td>Frame_Content_Size_Flag </td> | ||||
</tr> | ||||
<tr> | ||||
<td>5</td> | ||||
<td>Single_Segment_Flag </td> | ||||
</tr> | ||||
<tr> | ||||
<td>4</td> | ||||
<td>(unused)</td> | ||||
</tr> | ||||
<tr> | ||||
<td>3</td> | ||||
<td>(reserved)</td> | ||||
</tr> | ||||
<tr> | ||||
<td>2</td> | ||||
<td>Content_Checksum_Flag</td> | ||||
</tr> | ||||
<tr> | ||||
<td>1-0</td> | ||||
<td>Dictionary_ID_Flag</td> | ||||
</tr> | ||||
<section title="Frame_Content_Siz | </tbody> | |||
e_Flag"> | </table> | |||
<t> | <t> In <xref target="Frame-Header-Descriptor" />, bit 7 is | |||
This is a 2-bit flag | the highest bit, while bit | |||
(equivalent to | 0 is the lowest one. </t> | |||
Frame_Header_Descriptor | <section numbered="true" toc="default"> | |||
right-shifted 6 bits) | <name>Frame_Content_Size_Flag</name> | |||
specifying whether | <t> | |||
Frame_Content_Size | This is a 2-bit flag (equivalent to Frame_Header_Descriptor | |||
(the decompressed | right-shifted 6 bits) specifying whether Frame_Content_Size | |||
data size) is provided | (the decompressed data size) is provided within the header. | |||
within the header. | Frame_Content_Size_Flag provides FCS_Field_Size, which is the | |||
Frame_Content_Size_Flag | number of bytes used by Frame_Content_Size according to | |||
provides FCS_Field_Size, | <xref target="frame-content-size-flag"/>: | |||
which is the | </t> | |||
number of bytes | ||||
used by | ||||
Frame_Content_Size | ||||
according to | ||||
the following table: | ||||
<figure><artwork> | ||||
+-------------------------+--------+---+---+---+ | ||||
| Frame_Content_Size_Flag | 0 | 1 | 2 | 3 | | ||||
+-------------------------+--------+---+---+---+ | ||||
| FCS_Field_Size | 0 or 1 | 2 | 4 | 8 | | ||||
+-------------------------+--------+---+---+---+ | ||||
</artwork></figu | ||||
re></t> | ||||
<t> | <table anchor="frame-content-size-flag"> | |||
When | <name>Frame_Content_Size_Flag Provides FCS_Field_Size</name> | |||
Frame_Content_Size_Flag | <tbody> | |||
is 0, FCS_Field_Size | <tr> | |||
depends on | <td>Frame_Content_Size_Flag</td> | |||
Single_Segment_Flag: | <td align="center">0</td> | |||
If Single_Segment_Flag | <td>1</td> | |||
is set, FCS_Field_Size | <td>2</td> | |||
is 1. Otherwise, | <td>3</td> | |||
FCS_Field_Size is 0; | </tr> | |||
Frame_Content_Size | <tr> | |||
is not provided. | <td>FCS_Field_Size</td> | |||
</t> | <td align="center">0 or 1</td> | |||
</section> | <td>2</td> | |||
<td>4</td> | ||||
<td>8</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<section title="Single_Segment_Fl | <t> | |||
ag"> | When Frame_Content_Size_Flag | |||
<t> | is 0, FCS_Field_Size | |||
depends on | ||||
Single_Segment_Flag: | ||||
if Single_Segment_Flag | ||||
is set, FCS_Field_Size | ||||
is 1. Otherwise, | ||||
FCS_Field_Size is 0; | ||||
Frame_Content_Size | ||||
is not provided. | ||||
</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Single_Segment_Flag</name> | ||||
<t> | ||||
If this flag is | If this flag is | |||
set, data must | set, data must | |||
be regenerated | be regenerated | |||
within a single | within a single | |||
continuous | continuous | |||
memory segment. | memory segment. | |||
</t> | </t> | |||
<t> | ||||
<t> | ||||
In this case, | In this case, | |||
Window_Descriptor | Window_Descriptor | |||
byte is skipped, but | byte is skipped, but | |||
Frame_Content_Size | Frame_Content_Size | |||
is necessarily | is necessarily | |||
present. As a | present. As a | |||
consequence, | consequence, | |||
the decoder | the decoder | |||
must allocate a | must allocate a | |||
memory segment | memory segment | |||
of size equal | of a size equal to | |||
or larger than | or larger than | |||
Frame_Content_Size. | Frame_Content_Size. | |||
</t> | </t> | |||
<t> | ||||
<t> | ||||
In order to protect the | In order to protect the | |||
decoder from | decoder from | |||
unreasonable memory | unreasonable memory | |||
requirements, | requirements, | |||
a decoder is | a decoder is | |||
allowed to reject a | allowed to reject a | |||
compressed frame that | compressed frame that | |||
requests a memory size | requests a memory size | |||
beyond the decoder's | beyond the decoder's | |||
authorized range. | authorized range. | |||
</t> | </t> | |||
<t> | ||||
<t> | ||||
For broader | For broader | |||
compatibility, | compatibility, | |||
decoders are | decoders are | |||
recommended to | recommended to | |||
support memory | support memory | |||
sizes of at least 8 MB. | sizes of at least 8 MB. | |||
This is only a | This is only a | |||
recommendation; | recommendation; | |||
each decoder is | each decoder is | |||
free to support | free to support | |||
higher or lower | higher or lower | |||
limits, depending on | limits, depending on | |||
local limitations. | local limitations. | |||
</t> | </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="Unused Bit"> | <name>Unused Bit</name> | |||
<t> | <t> | |||
A decoder compliant | A decoder compliant | |||
with this specification | with this specification | |||
version shall not | version shall not | |||
interpret this bit. | interpret this bit. | |||
It might | It might | |||
be used in a future | be used in a future | |||
version, to signal a | version to signal a | |||
property that is not | property that is not | |||
mandatory to properly | mandatory to properly | |||
decode the frame. | decode the frame. | |||
An encoder compliant | An encoder compliant | |||
with this specification | with this specification | |||
must set this bit to | must set this bit to | |||
zero. | zero. | |||
</t> | </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="Reserved Bit"> | <name>Reserved Bit</name> | |||
<t> | <t> | |||
This bit is reserved | This bit is reserved | |||
for some future | for some future | |||
feature. Its value must | feature. Its value must | |||
be zero. A decoder | be zero. A decoder | |||
compliant with this | compliant with this | |||
specification version | specification version | |||
must ensure it is not | must ensure it is not | |||
set. This bit may be | set. This bit may be | |||
used in a future | used in a future | |||
revision, to signal a | revision to signal a | |||
feature that must be | feature that must be | |||
interpreted to decode | interpreted to decode | |||
the frame correctly. | the frame correctly. | |||
</t> | </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="Content_Checksum_ | <name>Content_Checksum_Flag</name> | |||
Flag"> | <t> | |||
<t> | ||||
If this flag is set, a | If this flag is set, a | |||
32-bit | 32-bit | |||
Content_Checksum will | Content_Checksum will | |||
be present at the | be present at the | |||
frame's end. | frame's end. | |||
See the description of | See the description of | |||
Content_Checksum above. | Content_Checksum above. | |||
</t> | </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="Dictionary_ID_Fla | <name>Dictionary_ID_Flag</name> | |||
g"> | <t> | |||
<t> | This is a 2-bit flag | |||
This is a 2-bit flag | (= Frame_Header_Descriptor & 0x3) indicating | |||
(= Frame_Header_Descripto | whether a dictionary ID | |||
r & 0x3) indicating | is provided within the | |||
whether a dictionary ID | header. It also | |||
is provided within the | specifies the size of | |||
header. It also | this field as | |||
specifies the size of | DID_Field_Size: | |||
this field as | </t> | |||
DID_Field_Size: | ||||
<figure><artwork> | ||||
+--------------------+---+---+---+---+ | ||||
| Dictionary_ID_Flag | 0 | 1 | 2 | 3 | | ||||
+--------------------+---+---+---+---+ | ||||
| DID_Field_Size | 0 | 1 | 2 | 4 | | ||||
+--------------------+---+---+---+---+ | ||||
</artwork></figu | ||||
re></t> | ||||
</section> | ||||
</section> | ||||
<section anchor="comp_window_descr" | ||||
title="Window Descriptor"> | ||||
<t> | ||||
This provides guarantees about | ||||
the minimum | ||||
memory buffer required to | ||||
decompress a frame. This | ||||
information is important for | ||||
decoders to allocate enough | ||||
memory. | ||||
</t> | ||||
<t> | ||||
The Window_Descriptor byte is | ||||
optional. When | ||||
Single_Segment_Flag is set, | ||||
Window_Descriptor is not | ||||
present. In this case, | ||||
Window_Size is | ||||
Frame_Content_Size, which can | ||||
be any value from 0 to | ||||
2^64-1 bytes (16 ExaBytes). | ||||
<figure><artwork> | <table anchor="Dictionary-ID-Flag"> | |||
+------------+----------+----------+ | <name>Dictionary_ID_Flag</name> | |||
| Bit Number | 7-3 | 2-0 | | <tbody> | |||
+------------+----------+----------+ | <tr> | |||
| Field Name | Exponent | Mantissa | | <td>Dictionary_ID_Flag</td> | |||
+------------+----------+----------+ | <td>0</td> | |||
</artwork></figure></t> | <td>1</td> | |||
<td>2</td> | ||||
<td>3</td> | ||||
</tr> | ||||
<tr> | ||||
<td>DID_Field_Size</td> | ||||
<td>0</td> | ||||
<td>1</td> | ||||
<td>2</td> | ||||
<td>4</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
</section> | ||||
</section> | ||||
<section anchor="comp_window_descr" numbered="true" toc="default"> | ||||
<name>Window Descriptor</name> | ||||
<t> | ||||
This provides guarantees about | ||||
the minimum | ||||
memory buffer required to | ||||
decompress a frame. This | ||||
information is important for | ||||
decoders to allocate enough | ||||
memory. | ||||
</t> | ||||
<t> | ||||
The Window_Descriptor byte is | ||||
optional. When | ||||
Single_Segment_Flag is set, | ||||
Window_Descriptor is not | ||||
present. In this case, | ||||
Window_Size is | ||||
Frame_Content_Size, which can | ||||
be any value from 0 to | ||||
2<sup>64</sup> - 1 bytes (16 ExaBytes). | ||||
</t> | ||||
<t> | <table anchor="window-descriptor"> | |||
The minimum memory buffer size | <name>Window_Descriptor</name> | |||
is called Window_Size. It is | <tbody> | |||
described by the following | <tr> | |||
formulae: | <td>Bit Number</td> | |||
<td align="center">7-3</td> | ||||
<td align="center">2-0</td> | ||||
</tr> | ||||
<tr> | ||||
<td>Field Name</td> | ||||
<td>Exponent</td> | ||||
<td>Mantissa</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<figure><artwork> | <t> | |||
The minimum memory buffer size | ||||
is called Window_Size. It is | ||||
described by the following | ||||
formulas: | ||||
</t> | ||||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
windowLog = 10 + Exponent; | windowLog = 10 + Exponent; | |||
windowBase = 1 << windowLog; | windowBase = 1 << windowLog; | |||
windowAdd = (windowBase / 8) * Mantissa; | windowAdd = (windowBase / 8) * Mantissa; | |||
Window_Size = windowBase + windowAdd; | Window_Size = windowBase + windowAdd; | |||
</artwork></figure></t> | ]]></artwork> | |||
<t> | ||||
<t> | ||||
The minimum Window_Size is | The minimum Window_Size is | |||
1 KB. The maximum Window_Size | 1 KB. The maximum Window_Size | |||
is (1<<41) + | is (1<<41) + | |||
7*(1<<38) bytes, | 7*(1<<38) bytes, | |||
which is 3.75 TB. | which is 3.75 TB. | |||
</t> | </t> | |||
<t> In general, larger | ||||
<t> In general, larger | ||||
Window_Size values tend to | Window_Size values tend to | |||
improve the compression ratio, bu t | improve the compression ratio, bu t | |||
at the cost of increased memory | at the cost of increased memory | |||
usage. </t> | usage. </t> | |||
<t> | ||||
<t> | ||||
To properly decode compressed | To properly decode compressed | |||
data, a decoder will need to | data, a decoder will need to | |||
allocate a buffer of at least | allocate a buffer of at least | |||
Window_Size bytes. | Window_Size bytes. | |||
</t> | </t> | |||
<t> | ||||
<t> | ||||
In order to protect | In order to protect | |||
decoders from unreasonable | decoders from unreasonable | |||
memory requirements, a decoder | memory requirements, a decoder | |||
is allowed to reject a | is allowed to reject a | |||
compressed frame that | compressed frame that | |||
requests a memory size beyond | requests a memory size beyond the | |||
decoder's authorized range. | decoder's authorized range. | |||
</t> | </t> | |||
<t> | ||||
<t> | ||||
For improved interoperability, | For improved interoperability, | |||
it's recommended for decoders | it's recommended for decoders | |||
to support values of | to support values of | |||
Window_Size up to 8 MB and | Window_Size up to 8 MB and | |||
for encoders not to generate | for encoders not to generate | |||
frames requiring a Window_Size | frames requiring a Window_Size | |||
larger than 8 MB. | larger than 8 MB. | |||
It's merely a recommendation | It's merely a recommendation | |||
though, and decoders are free | though, and decoders are free | |||
to support larger or lower | to support higher or lower | |||
limits, depending on local | limits, depending on local | |||
limitations. | limitations. | |||
</t> | </t> | |||
</section> | </section> | |||
<section anchor="comp_dictionary_id" numbered="true" toc="default"> | ||||
<name>Dictionary_ID</name> | ||||
<section anchor="comp_dictionary_id" | <t> | |||
title="Dictionary_ID"> | This is a field of variable size, | |||
<t> | ||||
This is a variable size field, | ||||
which contains the ID of the | which contains the ID of the | |||
dictionary required to properly | dictionary required to properly | |||
decode the frame. This field | decode the frame. This field | |||
is optional. When it's not | is optional. When it's not | |||
present, it's up to the decoder | present, it's up to the decoder | |||
to know which dictionary | to know which dictionary | |||
to use. </t> | to use. </t> | |||
<t> | ||||
<t> | ||||
Dictionary_ID field size is | Dictionary_ID field size is | |||
provided by DID_Field_Size. | provided by DID_Field_Size. | |||
DID_Field_Size is directly | DID_Field_Size is directly | |||
derived from the value | derived from the value | |||
of Dictionary_ID_Flag. One byte | of Dictionary_ID_Flag. One byte | |||
can represent an ID 0-255; 2 | can represent an ID 0-255; 2 | |||
bytes can represent an ID | bytes can represent an ID | |||
0-65535; 4 bytes can | 0-65535; 4 bytes can | |||
represent an ID 0-4294967295. | represent an ID 0-4294967295. | |||
Format is little-endian. | Format is little-endian. | |||
</t> | </t> | |||
<t> | ||||
<t> | ||||
It is permitted to represent a | It is permitted to represent a | |||
small ID (for example, 13) with | small ID (for example, 13) with | |||
a large 4-byte dictionary | a large 4-byte dictionary | |||
ID, even if it is less | ID, even if it is less | |||
efficient. | efficient. | |||
</t> | </t> | |||
<t> | ||||
<t> | Within private environments, | |||
Within private environments, | any dictionary ID | |||
any dictionary ID | can be used. However, for | |||
can be used. However, for | frames and dictionaries | |||
frames and dictionaries | distributed in public space, | |||
distributed in public space, | Dictionary_ID must be | |||
Dictionary_ID must be | attributed carefully. | |||
attributed carefully. | The following | |||
The following | ranges are reserved for use | |||
ranges are reserved for use | only with dictionaries that | |||
only with dictionaries that | have been registered with | |||
have been registered with | IANA (see <xref target="iana_dict" format="default"/>): | |||
IANA (see <xref target="iana_dict | </t> | |||
"/>): | <dl newline="false" spacing="normal"> | |||
<dt>low range:</dt> | ||||
<list style="hanging"> | <dd> | |||
<?rfc subcompact="yes"?> | ||||
<t hangText="low range:"> | ||||
<= 32767 | <= 32767 | |||
</t> | </dd> | |||
<t hangText="high range:" | <dt>high range:</dt> | |||
> | <dd> | |||
>= (1 << 31 | >= (1 << | |||
) | 31) | |||
</t> | </dd> | |||
<?rfc subcompact="no"?> | </dl> | |||
</list> </t> | <t> Any other value for | |||
<t> Any other value for | ||||
Dictionary_ID can be used | Dictionary_ID can be used | |||
by private arrangement | by private arrangement | |||
between participants. </t> | between participants. </t> | |||
<t> Any payload presented for | ||||
<t> Any payload presented for | ||||
decompression that | decompression that | |||
references an unregistered | references an unregistered | |||
reserved dictionary ID | reserved dictionary ID | |||
results in an error. </t> | results in an error. </t> | |||
</section> | </section> | |||
<section anchor="comp_frame_content_size" numbered="true" toc="defau | ||||
<section anchor="comp_frame_content_size" | lt"> | |||
title="Frame Content Size"> | ||||
<t> | ||||
This is the original | ||||
(uncompressed) size. This | ||||
information is optional. | ||||
Frame_Content_Size uses a | ||||
variable number of bytes, | ||||
provided by FCS_Field_Size. | ||||
FCS_Field_Size is provided by | ||||
the value of | ||||
Frame_Content_Size_Flag. | ||||
FCS_Field_Size can be equal to | ||||
0 (not present), 1, 2, 4, or | ||||
8 bytes. | ||||
<figure><artwork> | ||||
+----------------+--------------+ | ||||
| FCS Field Size | Range | | ||||
+----------------+--------------+ | ||||
| 0 | unknown | | ||||
+----------------+--------------+ | ||||
| 1 | 0 - 255 | | ||||
+----------------+--------------+ | ||||
| 2 | 256 - 65791 | | ||||
+----------------+--------------+ | ||||
| 4 | 0 - 2^32 - 1 | | ||||
+----------------+--------------+ | ||||
| 8 | 0 - 2^64 - 1 | | ||||
+----------------+--------------+ | ||||
</artwork></figure></t> | ||||
<t> | ||||
Frame_Content_Size format is | ||||
little-endian. When | ||||
FCS_Field_Size is 1, 4, or 8 | ||||
bytes, the value is read | ||||
directly. When FCS_Field_Size | ||||
is 2, the offset of 256 is | ||||
added. It's allowed to | ||||
represent a small size (for | ||||
example 18) using any | ||||
compatible variant. | ||||
</t> | ||||
</section> | ||||
</section> | ||||
<section anchor="blocks" title="Blocks"> | ||||
<t> After Magic_Number and | ||||
Frame_Header, there are some number of | ||||
blocks. Each frame must have at least | ||||
1 block, but there is no upper | ||||
limit on the number of blocks per | ||||
frame. </t> | ||||
<t> The structure of a block is as | ||||
follows: | ||||
<figure><artwork> | ||||
+--------------+---------------+ | ||||
| Block_Header | Block_Content | | ||||
+--------------+---------------+ | ||||
| 3 bytes | n bytes | | ||||
+--------------+---------------+ | ||||
</artwork></figure></t> | ||||
<t> Block_Header uses 3 bytes, | ||||
written using little-endian | ||||
convention. It contains three fields: | ||||
<figure><artwork> | ||||
+------------+------------+------------+ | ||||
| Last_Block | Block_Type | Block_Size | | ||||
+------------+------------+------------+ | ||||
| bit 0 | bits 1-2 | bits 3-23 | | ||||
+------------+------------+------------+ | ||||
</artwork></figure></t> | ||||
<section title="Last_Block"> | ||||
<t> The lowest bit (Last_Block) | ||||
signals whether | ||||
this block is the last one. | ||||
The frame will end after this | ||||
last block. It may be followed | ||||
by an optional Content_Checksum | ||||
(see | ||||
<xref target="comp_zstd_frames"/> | ||||
). | ||||
</t> | ||||
</section> | ||||
<section title="Block_Type"> | ||||
<t> The next 2 bits represent | ||||
the Block_Type. There are four | ||||
block types: | ||||
<figure><artwork> | ||||
+-----------+------------------+ | ||||
| Value | Block_Type | | ||||
+-----------+------------------+ | ||||
| 0 | Raw_Block | | ||||
+-----------+------------------+ | ||||
| 1 | RLE_Block | | ||||
+-----------+------------------+ | ||||
| 2 | Compressed_Block | | ||||
+-----------+------------------+ | ||||
| 3 | Reserved | | ||||
+-----------+------------------+ | ||||
</artwork></figure></t> | ||||
<t> <list style="hanging"> | ||||
<t hangText="Raw_Block:"> | ||||
This is an | ||||
uncompressed block. | ||||
Block_Content contains | ||||
Block_Size bytes. </t> | ||||
<t hangText="RLE_Block:"> | <name>Frame_Content_Size</name> | |||
This is a single byte, | <t> | |||
repeated Block_Size | This is the original | |||
times. Block_Content | (uncompressed) size. This | |||
consists of a single | information is optional. | |||
byte. On the | Frame_Content_Size uses a | |||
decompression side, | variable number of bytes, | |||
this byte must be | provided by FCS_Field_Size. | |||
repeated Block_Size | FCS_Field_Size is provided by | |||
times. </t> | the value of | |||
Frame_Content_Size_Flag. | ||||
FCS_Field_Size can be equal to | ||||
0 (not present), 1, 2, 4, or | ||||
8 bytes. | ||||
</t> | ||||
<t hangText="Compressed_B | <table anchor="Frame-Content-Size"> | |||
lock:"> | <name>Frame_Content_Size</name> | |||
This is a compressed | <thead> | |||
block as described in | <tr> | |||
<xref target="comp_blocks | <th align="center">FCS Field Size</th> | |||
"/>. | <th>Range</th> | |||
Block_Size is the | </tr> | |||
length of | </thead> | |||
Block_Content, namely | <tbody> | |||
the compressed data. | <tr> | |||
The decompressed size | <td align="center">0</td> | |||
is not known, but its | <td>unknown</td> | |||
maximum possible value | </tr> | |||
is guaranteed (see | <tr> | |||
below). </t> | <td align="center">1</td> | |||
<td>0 - 255</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td>256 - 65791</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td>0 - 2<sup>32</sup> - 1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">8</td> | ||||
<td>0 - 2<sup>64</sup> - 1</td> | ||||
</tr> | ||||
<t hangText="Reserved:"> | </tbody> | |||
This is not a block. | </table> | |||
This value cannot be | ||||
used with the current | ||||
specification. If such | ||||
a value is present, | ||||
it is considered to be | ||||
corrupt data, and a | ||||
compliant decoder must | ||||
reject it. </t> | ||||
</list> </t> | ||||
</section> | ||||
<section title="Block_Size"> | <t> | |||
<t> The upper 21 bits of | Frame_Content_Size format is | |||
Block_Header represent the | little-endian. When | |||
Block_Size. </t> | FCS_Field_Size is 1, 4, or 8 | |||
bytes, the value is read | ||||
directly. When FCS_Field_Size | ||||
is 2, the offset of 256 is | ||||
added. It's allowed to | ||||
represent a small size (for | ||||
example, 18) using any | ||||
compatible variant. | ||||
</t> | ||||
</section> | ||||
</section> | ||||
<section anchor="blocks" numbered="true" toc="default"> | ||||
<name>Blocks</name> | ||||
<t> After Magic_Number and | ||||
Frame_Header, there are some number of | ||||
blocks. Each frame must have at least | ||||
1 block, but there is no upper | ||||
limit on the number of blocks per | ||||
frame. </t> | ||||
<t> The structure of a block is as | ||||
follows: | ||||
</t> | ||||
<t> When Block_Type is | <table anchor="block"> | |||
Compressed_Block or Raw_Block, | <name>The Structure of a Block</name> | |||
Block_Size is the size of | <thead> | |||
Block_Content (hence excluding | <tr> | |||
Block_Header). </t> | <th align="center">Block_Header</th> | |||
<th align="center">Block_Content</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">3 bytes</td> | ||||
<td align="center">n bytes</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Block_Header uses 3 bytes, | ||||
written using little-endian | ||||
convention. It contains three fields: | ||||
</t> | ||||
<t> When Block_Type is | <table anchor="block-header"> | |||
RLE_Block, since Block_Content's | <name>Block_Header</name> | |||
size is always 1, Block_Size | <thead> | |||
represents the number of times | <tr> | |||
this byte must be repeated. </t> | <th align="center">Last_Block</th> | |||
<th align="center">Block_Type</th> | ||||
<th align="center">Block_Size</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">bit 0</td> | ||||
<td align="center">bits 1-2</td> | ||||
<td align="center">bits 3-23</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Block_Size is limited by | <section numbered="true" toc="default"> | |||
Block_Maximum_Size (see below). | <name>Last_Block</name> | |||
</t> | <t> The lowest bit (Last_Block) | |||
</section> | signals whether | |||
this block is the last one. | ||||
The frame will end after this | ||||
last block. It may be followed | ||||
by an optional Content_Checksum | ||||
(see | ||||
<xref target="comp_zstd_frames" format="default"/>). | ||||
</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Block_Type</name> | ||||
<t> The next 2 bits represent | ||||
the Block_Type. There are four | ||||
block types: | ||||
</t> | ||||
<section | <table anchor="block-types"> | |||
title="Block_Content and Block_Ma | <name>The Four Block Types</name> | |||
ximum_Size"> | <thead> | |||
<t> The size of Block_Content | <tr> | |||
is limited by | <th align="center">Value</th> | |||
Block_Maximum_Size, which is | <th>Block_Type</th> | |||
the smallest of: | </tr> | |||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">Raw_Block</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">RLE_Block</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">Compressed_Block</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">Reserved</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<list style="symbols"> | <dl newline="false" spacing="normal"> | |||
<t> Window_Size </t> | <dt>Raw_Block:</dt> | |||
<t> 128 KB </t> | <dd> | |||
</list> </t> | This is an | |||
uncompressed block. | ||||
Block_Content contains | ||||
Block_Size bytes. </dd> | ||||
<dt>RLE_Block:</dt> | ||||
<dd> | ||||
This is a single byte, | ||||
repeated Block_Size | ||||
times. Block_Content | ||||
consists of a single | ||||
byte. On the | ||||
decompression side, | ||||
this byte must be | ||||
repeated Block_Size | ||||
times. </dd> | ||||
<dt>Compressed_Block:</dt> | ||||
<dd> | ||||
This is a compressed | ||||
block as described in | ||||
<xref target="comp_blocks" format="default"/>. | ||||
Block_Size is the | ||||
length of | ||||
Block_Content, namely | ||||
the compressed data. | ||||
The decompressed size | ||||
is not known, but its | ||||
maximum possible value | ||||
is guaranteed (see | ||||
below). </dd> | ||||
<dt>Reserved:</dt> | ||||
<dd> | ||||
This is not a block. | ||||
This value cannot be | ||||
used with the current | ||||
specification. If such | ||||
a value is present, | ||||
it is considered to be | ||||
corrupt data, and a | ||||
compliant decoder must | ||||
reject it. </dd> | ||||
</dl> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<t> Block_Maximum_Size is | <name>Block_Size</name> | |||
<t> The upper 21 bits of | ||||
Block_Header represent the | ||||
Block_Size. </t> | ||||
<t> When Block_Type is | ||||
Compressed_Block or Raw_Block, | ||||
Block_Size is the size of | ||||
Block_Content (hence excluding | ||||
Block_Header). </t> | ||||
<t> When Block_Type is | ||||
RLE_Block, since Block_Content's | ||||
size is always 1, Block_Size | ||||
represents the number of times | ||||
this byte must be repeated. </t> | ||||
<t> Block_Size is limited by | ||||
Block_Maximum_Size (see below). | ||||
</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Block_Content and Block_Maximum_Size</name> | ||||
<t> The size of Block_Content | ||||
is limited by | ||||
Block_Maximum_Size, which is | ||||
the smallest of: | ||||
</t> | ||||
<ul spacing="normal"> | ||||
<li> Window_Size </li> | ||||
<li> 128 KB </li> | ||||
</ul> | ||||
<t> Block_Maximum_Size is | ||||
constant for a given frame. | constant for a given frame. | |||
This maximum is applicable to | This maximum is applicable to | |||
both the decompressed size | both the decompressed size | |||
and the compressed size of any | and the compressed size of any | |||
block in the frame. </t> | block in the frame. </t> | |||
<t> The reasoning for this | ||||
<t> The reasoning for this | ||||
limit is that a decoder can | limit is that a decoder can | |||
read this information at the | read this information at the | |||
beginning of a frame and use | beginning of a frame and use | |||
it to allocate buffers. | it to allocate buffers. | |||
The guarantees on the size of | The guarantees on the size of | |||
blocks ensure that the buffers | blocks ensure that the buffers | |||
will be large enough for any | will be large enough for any | |||
following block of the valid | following block of the valid | |||
frame. </t> | frame. </t> | |||
<t> If the compressed block | <t> If the compressed block | |||
is larger than the uncompressed | is larger than the uncompressed o | |||
ne, | ||||
sending the uncompressed | sending the uncompressed | |||
block (i.e., a Raw_Block) is | block (i.e., a Raw_Block) is | |||
recommended instead. </t> | recommended instead. </t> | |||
</section> | ||||
</section> | </section> | |||
<section anchor="comp_blocks" | </section> | |||
title="Compressed Blocks"> | <section anchor="comp_blocks" numbered="true" toc="default"> | |||
<t> To decompress a compressed block, | <name>Compressed Blocks</name> | |||
<t> To decompress a compressed block, | ||||
the compressed size must be provided | the compressed size must be provided | |||
from the Block_Size field within | from the Block_Size field within | |||
Block_Header. </t> | Block_Header. </t> | |||
<t> A compressed block consists of two | <t> A compressed block consists of two | |||
sections: a Literals Section | sections: a Literals_Section | |||
(<xref target="comp_literals"/>) and | (<xref target="comp_literals" format="def | |||
ault"/>) and | ||||
a Sequences_Section | a Sequences_Section | |||
(<xref target="comp_sequences"/>). | (<xref target="comp_sequences" format="de fault"/>). | |||
The results of the two sections are | The results of the two sections are | |||
then combined to produce the | then combined to produce the | |||
decompressed data in Sequence | decompressed data in Sequence | |||
Execution | Execution | |||
(<xref target="comp_sequence_exec"/>). | (<xref target="comp_sequence_exec" format | |||
</t> | ="default"/>). | |||
</t> | ||||
<t> To decode a compressed block, the | <t> To decode a compressed block, the | |||
following elements are necessary: | following elements are necessary: | |||
<list style="symbols"> | </t> | |||
<t> Previous decoded data, up | <ul spacing="normal"> | |||
<li> Previous decoded data, up | ||||
to a distance of Window_Size, | to a distance of Window_Size, | |||
or the beginning of the Frame, | or the beginning of the Frame, | |||
whichever is smaller. | whichever is smaller. | |||
Single_Segment_Flag | Single_Segment_Flag | |||
will be set in the latter | will be set in the latter | |||
case. </t> | case. </li> | |||
<li> List of "recent offsets" | ||||
<t> List of "recent offsets" | ||||
from the previous | from the previous | |||
Compressed_Block. | Compressed_Block. | |||
</t> | </li> | |||
<li> The previous Huffman tree, | ||||
<t> The previous Huffman tree, | ||||
required by | required by | |||
Treeless_Literals_Block type. | Treeless_Literals_Block type. | |||
</t> | </li> | |||
<t> Previous Finite State Entropy (FSE) decoding | <li> Previous Finite State Entropy (FSE) decoding | |||
tables, required by | tables, required by | |||
Repeat_Mode, for each symbol | Repeat_Mode, for each symbol | |||
type (literals lengths, | type (literals length codes, | |||
match lengths, offsets). </t> | match length codes, offset codes) | |||
</list> </t> | . </li> | |||
</ul> | ||||
<t> Note that decoding tables are not | <t> Note that decoding tables are not | |||
always from the previous | always from the previous | |||
Compressed_Block: | Compressed_Block: | |||
<list style="symbols"> | </t> | |||
<t> Every decoding table can | <ul spacing="normal"> | |||
come from a dictionary. </t> | <li> Every decoding table can | |||
come from a dictionary. </li> | ||||
<t> The Huffman tree comes from | <li> The Huffman tree comes from | |||
the previous | the previous | |||
Compressed_Literals_Block. </ | Compressed_Literals_Block. </ | |||
t> | li> | |||
</list></t> | </ul> | |||
<section anchor="comp_literals" numbered="true" toc="default"> | ||||
<section anchor="comp_literals" | <name>Literals_Section_Header</name> | |||
title="Literals_Section_Header"> | <t> All literals are regrouped | |||
<t> All literals are regrouped | ||||
in the first part of the | in the first part of the | |||
block. They can be decoded | block. They can be decoded | |||
first and then copied during | first and then copied during | |||
Sequence Execution (see | Sequence Execution (see | |||
<xref target="comp_sequence_exec" />), | <xref target="comp_sequence_exec" format="default"/>), | |||
or they can be decoded on the | or they can be decoded on the | |||
flow during Sequence | flow during Sequence | |||
Execution. </t> | Execution. </t> | |||
<t> Literals can be stored | ||||
<t> Literals can be stored | ||||
uncompressed or compressed | uncompressed or compressed | |||
using Huffman prefix codes. | using Huffman prefix codes. | |||
When compressed, an optional | When compressed, an optional | |||
tree description can be | tree description can be | |||
present, followed by 1 or | present, followed by 1 or | |||
4 streams. | 4 streams. | |||
<figure><artwork> | </t> | |||
+----------------------------+ | <table anchor="compressed-literals"> | |||
| Literals_Section_Header | | <name>Compressed Literals</name> | |||
+----------------------------+ | <tbody> | |||
| [Huffman_Tree_Description] | | <tr> | |||
+----------------------------+ | <td align="center">Literals_Section_Header</td> | |||
| [Jump_Table] | | </tr> | |||
+----------------------------+ | <tr> | |||
| Stream_1 | | <td align="center">[Huffman_Tree_Description]</td> | |||
+----------------------------+ | </tr> | |||
| [Stream_2] | | <tr> | |||
+----------------------------+ | <td align="center">[Jump_Table]</td> | |||
| [Stream_3] | | </tr> | |||
+----------------------------+ | <tr> | |||
| [Stream_4] | | <td align="center">Stream_1</td> | |||
+----------------------------+ | </tr> | |||
</artwork></figure></t> | <tr> | |||
<td align="center">[Stream_2]</td> | ||||
<section title="Literals_Section_ | </tr> | |||
Header"> | <tr> | |||
<t> | <td align="center">[Stream_3]</td> | |||
This field describes | </tr> | |||
how literals are | <tr> | |||
packed. It's a | <td align="center">[Stream_4]</td> | |||
byte-aligned | </tr> | |||
variable-size bit field, | </tbody> | |||
ranging from 1 to | </table> | |||
5 bytes, using | <section numbered="true" toc="default"> | |||
little-endian | <name>Literals_Section_Header</name> | |||
convention. | <t> | |||
This field describes | ||||
<figure><artwork> | how literals are | |||
+---------------------+-----------+ | packed. It's a | |||
| Literals_Block_Type | 2 bits | | byte-aligned | |||
+---------------------+-----------+ | variable-size bit field, | |||
| Size_Format | 1-2 bits | | ranging from 1 to | |||
+---------------------+-----------+ | 5 bytes, using | |||
| Regenerated_Size | 5-20 bits | | little-endian | |||
+---------------------+-----------+ | convention. | |||
| [Compressed_Size] | 0-18 bits | | </t> | |||
+---------------------+-----------+ | ||||
</artwork></figure>< | ||||
/t> | ||||
<t> In this | ||||
representation, bits at | ||||
the top are the lowest | ||||
bits. </t> | ||||
<t> The | ||||
Literals_Block_Type | ||||
field uses the two | ||||
lowest bits of the | ||||
first byte, describing | ||||
four different block | ||||
types: | ||||
<figure><artwork> | ||||
+---------------------------+-------+ | ||||
| Literals_Block_Type | Value | | ||||
+---------------------------+-------+ | ||||
| Raw_Literals_Block | 0 | | ||||
+---------------------------+-------+ | ||||
| RLE_Literals_Block | 1 | | ||||
+---------------------------+-------+ | ||||
| Compressed_Literals_Block | 2 | | ||||
+---------------------------+-------+ | ||||
| Treeless_Literals_Block | 3 | | ||||
+---------------------------+-------+ | ||||
</artwork></figure>< | ||||
/t> | ||||
<t> <list style="hanging" | ||||
> | ||||
<t hangText="Raw_Lite | ||||
rals_Block:"> | ||||
Literals are | ||||
stored | ||||
uncompressed. | ||||
Literals_Section_ | ||||
Content | ||||
is | ||||
Regenerated_Size. | ||||
</t> | ||||
<t hangText="RLE_Lite | ||||
rals_Block:"> | ||||
Literals | ||||
consist of a | ||||
single-byte | ||||
value repeated | ||||
Regenerated_Size | ||||
times. | ||||
Literals_Section_ | ||||
Content | ||||
is 1. </t> | ||||
<t hangText="Compress | ||||
ed_Literals_Block:"> | ||||
This is a | ||||
standard | ||||
Huffman-compresse | ||||
d | ||||
block, starting | ||||
with a Huffman | ||||
tree | ||||
description. | ||||
See details | ||||
below. Literals_ | ||||
Section_Content | ||||
is | ||||
Compressed_Size. | ||||
</t> | ||||
<t hangText="Treeless | ||||
_Literals_Block:"> | ||||
This is a | ||||
Huffman-compresse | ||||
d | ||||
block, using the | ||||
Huffman tree | ||||
from the previous | ||||
Compressed_Litera | ||||
ls_Block, | ||||
or a dictionary | ||||
if there is no pr | ||||
evious | ||||
Huffman-compresse | ||||
d | ||||
literals block. | ||||
Huffman_Tree_Desc | ||||
ription | ||||
will be | ||||
skipped. Note | ||||
that if this | ||||
mode is | ||||
triggered | ||||
without any | ||||
previous | ||||
Huffman-table | ||||
in the frame | ||||
(or | ||||
dictionary, per | ||||
<xref target="com | ||||
p_dict"/>), | ||||
it should be | ||||
treated as data | ||||
corruption. | ||||
Literals_Section_ | ||||
Content | ||||
is Compressed_Siz | ||||
e. | ||||
</t> | ||||
</list> </t> | ||||
<t> The Size_Format is | ||||
divided into two | ||||
families: | ||||
<list style="symbols" | ||||
> | ||||
<t> | ||||
For | ||||
Raw_Literals_Bloc | ||||
k | ||||
and | ||||
RLE_Literals_Bloc | ||||
k, | ||||
it's only | ||||
necessary to | ||||
decode | ||||
Regenerated_Size. | ||||
There is no | ||||
Compressed_Size | ||||
field. </t> | ||||
<t> | ||||
For | ||||
Compressed_Block | ||||
and | ||||
Treeless_Literals | ||||
_Block, | ||||
it's required | ||||
to decode both | ||||
Compressed_Size | ||||
and | ||||
Regenerated_Size | ||||
(the | ||||
decompressed | ||||
size). It's | ||||
also necessary | ||||
to decode the | ||||
number of | ||||
streams | ||||
(1 or 4). </t> | ||||
</list> </t> | ||||
<t> For values | ||||
spanning several bytes, | ||||
the convention is | ||||
little endian. </t> | ||||
<t> Size_Format for | ||||
Raw_Literals_Block | ||||
and RLE_Literals_Block | ||||
uses 1 or 2 bits. Its | ||||
value is (Literals_Sectio | ||||
n_Header[0]>>2) & 0x3. | ||||
<list style="hanging" | ||||
> | ||||
<t hangText="Size | ||||
_Format == 00 or 10:"> | ||||
Size_Format | ||||
uses 1 bit. | ||||
Regenerated_Size | ||||
uses 5 bits | ||||
(value 0-31). | ||||
Literals_Section_ | ||||
Header | ||||
uses 1 byte. | ||||
Regenerated_Size | ||||
= Literal_Section | ||||
_Header[0]>>3. | ||||
</t> | ||||
<t hangText="Size | ||||
_Format == 01:"> | ||||
Size_Format | ||||
uses 2 bits. | ||||
Regenerated_Size | ||||
uses 12 bits | ||||
(values 0-4095). | ||||
Literals_Section_ | ||||
Header | ||||
uses 2 bytes. | ||||
Regenerated_Size | ||||
= | ||||
(Literals_Section | ||||
_Header[0]>>4) | ||||
+ | ||||
(Literals_Section | ||||
_Header[1]<<4). | ||||
</t> | ||||
<t hangText="Size | ||||
_Format == 11:"> | ||||
Size_Format | ||||
uses 2 bits. | ||||
Regenerated_Size | ||||
uses 20 bits | ||||
(values | ||||
0-1048575). | ||||
Literals_Section_ | ||||
Header | ||||
uses 3 | ||||
bytes. | ||||
Regenerated_Size | ||||
= | ||||
(Literals_Section | ||||
_Header[0]>>4) | ||||
+ | ||||
(Literals_Section | ||||
_Header[1]<<4) | ||||
+ | ||||
(Literals_Section | ||||
_Header[2]<<12) | ||||
</t> | ||||
</list> </t> | <table anchor="Literals_Section_Header"> | |||
<name>Literals_Section_Header</name> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">Literals_Block_Type</td> | ||||
<td align="center">2 bits</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">Size_Format</td> | ||||
<td align="center">1-2 bits</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">Regenerated_Size</td> | ||||
<td align="center">5-20 bits</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">[Compressed_Size] </td> | ||||
<td align="center">0-18 bits</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Only Stream_1 is | <t> In this | |||
present for these | representation, bits at | |||
cases. Note that it is | the top are the lowest | |||
permitted to represent | bits. </t> | |||
a short value (for | <t> The | |||
example, 13) using a | Literals_Block_Type | |||
long format, even if | field uses the two | |||
it's less efficient. | lowest bits of the | |||
</t> | first byte, describing | |||
four different block | ||||
types: | ||||
</t> | ||||
<t> Size_Format for | <table anchor="Literals_Block_Type"> | |||
Compressed_Literals_Block | <name>Literals_Block_Type</name> | |||
and | <thead> | |||
Treeless_Literals_Block | <tr> | |||
always uses 2 bits. | <th align="center">Literals_Block_Type</th> | |||
<list style="hanging" | <th align="center">Value</th> | |||
> | </tr> | |||
<t hangText="Size | </thead> | |||
_Format == 00:"> | <tbody> | |||
<tr> | ||||
<td align="center">Raw_Literals_Block</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">RLE_Literals_Block</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">Compressed_Literals_Block</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">Treeless_Literals_Block</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<dl newline="false" spacing="normal"> | ||||
<dt>Raw_Literals_Block:</dt> | ||||
<dd> | ||||
Literals are | ||||
stored | ||||
uncompressed. | ||||
Literals_Section_Content | ||||
is | ||||
Regenerated_Size. | ||||
</dd> | ||||
<dt>RLE_Literals_Block:</dt> | ||||
<dd> | ||||
Literals consist of a single-byte value repeated | ||||
Regenerated_Size times. Literals_Section_Content is | ||||
1. </dd> | ||||
<dt>Compressed_Literals_Block:</dt> | ||||
<dd> | ||||
This is a standard Huffman-compressed | ||||
block, starting with a Huffman tree description. | ||||
See details below. Literals_Section_Content | ||||
is Compressed_Size. | ||||
</dd> | ||||
<dt>Treeless_Literals_Block:</dt> | ||||
<dd> | ||||
This is a | ||||
Huffman-compressed | ||||
block, using the | ||||
Huffman tree | ||||
from the previous | ||||
Compressed_Literals_Block | ||||
or a dictionary | ||||
if there is no previous | ||||
Huffman-compressed | ||||
literals block. | ||||
Huffman_Tree_Description | ||||
will be | ||||
skipped. Note that if this mode is triggered without any previous Huffman table | ||||
in the frame (or dictionary, per <xref target="comp_dict" format="default"/>), | ||||
it should be treated as data corruption. Literals_Section_Content is | ||||
Compressed_Size. | ||||
</dd> | ||||
</dl> | ||||
<t> The Size_Format is divided into two families: | ||||
</t> | ||||
<ul spacing="normal"> | ||||
<li> | ||||
For Raw_Literals_Block and RLE_Literals_Block, | ||||
it's only necessary to decode Regenerated_Size. | ||||
There is no Compressed_Size field.</li> | ||||
<li> | ||||
For Compressed_Block and Treeless_Literals_Block, | ||||
it's required to decode both Compressed_Size | ||||
and Regenerated_Size (the decompressed | ||||
size). It's also necessary to decode the | ||||
number of streams (1 or 4). </li> | ||||
</ul> | ||||
<t> For values | ||||
spanning several bytes, | ||||
the convention is | ||||
little endian. </t> | ||||
<t> Size_Format for | ||||
Raw_Literals_Block | ||||
and RLE_Literals_Block | ||||
uses 1 or 2 bits. Its | ||||
value is (Literals_Section_Header[0]>>2) & 0x3. | ||||
</t> | ||||
<dl newline="false" spacing="normal"> | ||||
<dt>Size_Format == 00 or 10:</dt> | ||||
<dd> | ||||
Size_Format | ||||
uses 1 bit. | ||||
Regenerated_Size | ||||
uses 5 bits | ||||
(value 0-31). | ||||
Literals_Section_Header | ||||
uses 1 byte. | ||||
Regenerated_Size | ||||
= Literal_Section_Header[0]>>3. | ||||
</dd> | ||||
<dt>Size_Format == 01:</dt> | ||||
<dd> | ||||
Size_Format | ||||
uses 2 bits. | ||||
Regenerated_Size | ||||
uses 12 bits | ||||
(values 0-4095). | ||||
Literals_Section_Header | ||||
uses 2 bytes. | ||||
Regenerated_Size | ||||
= | ||||
(Literals_Section_Header[0]>>4) | ||||
+ | ||||
(Literals_Section_Header[1]<<4). | ||||
</dd> | ||||
<dt>Size_Format == 11:</dt> | ||||
<dd> | ||||
Size_Format | ||||
uses 2 bits. | ||||
Regenerated_Size | ||||
uses 20 bits | ||||
(values | ||||
0-1048575). | ||||
Literals_Section_Header | ||||
uses 3 | ||||
bytes. | ||||
Regenerated_Size | ||||
= | ||||
(Literals_Section_Header[0]>>4) | ||||
+ | ||||
(Literals_Section_Header[1]<<4) | ||||
+ | ||||
(Literals_Section_Header[2]<<12). | ||||
</dd> | ||||
</dl> | ||||
<t> Only Stream_1 is | ||||
present for these | ||||
cases. Note that it is | ||||
permitted to represent | ||||
a short value (for | ||||
example, 13) using a | ||||
long format, even if | ||||
it's less efficient. | ||||
</t> | ||||
<t> Size_Format for | ||||
Compressed_Literals_Block | ||||
and | ||||
Treeless_Literals_Block | ||||
always uses 2 bits. | ||||
</t> | ||||
<dl newline="false" spacing="normal"> | ||||
<dt>Size_Format == 00:</dt> | ||||
<dd> | ||||
A single | A single | |||
stream. Both | stream. Both | |||
Regenerated_Size | Regenerated_Size | |||
and Compressed_Si ze | and Compressed_Si ze | |||
use 10 bits | use 10 bits | |||
(values | (values | |||
0-1023). | 0-1023). | |||
Literals_Section_ Header | Literals_Section_ Header | |||
uses 3 | uses 3 | |||
bytes. | bytes. | |||
</t> | </dd> | |||
<dt>Size_Format == 01:</dt> | ||||
<t hangText="Size | <dd> | |||
_Format == 01:"> | ||||
4 streams. | 4 streams. | |||
Both | Both | |||
Regenerated_Size | Regenerated_Size | |||
and | and | |||
Compressed_Size | Compressed_Size | |||
use 10 bits | use 10 bits | |||
(values | (values | |||
0-1023). | 0-1023). | |||
Literals_Section_ Header | Literals_Section_ Header | |||
uses 3 | uses 3 | |||
bytes. | bytes. | |||
</t> | </dd> | |||
<dt>Size_Format == 10:</dt> | ||||
<t hangText="Size | <dd> | |||
_Format == 10:"> | ||||
4 streams. | 4 streams. | |||
Both | Both | |||
Regenerated_Size | Regenerated_Size | |||
and | and | |||
Compressed_Size | Compressed_Size | |||
use 14 bits | use 14 bits | |||
(values | (values | |||
0-16383). | 0-16383). | |||
Literals_Section_ Header | Literals_Section_ Header | |||
uses 4 | uses 4 | |||
bytes. | bytes. | |||
</t> | </dd> | |||
<dt>Size_Format == 11:</dt> | ||||
<t hangText="Size | <dd> | |||
_Format == 11:"> | ||||
4 streams. | 4 streams. | |||
Both | Both | |||
Regenerated_Size | Regenerated_Size | |||
and | and | |||
Compressed_Size | Compressed_Size | |||
use 18 bits | use 18 bits | |||
(values | (values | |||
0-262143). | 0-262143). | |||
Literals_Section_ Header | Literals_Section_ Header | |||
uses 5 | uses 5 | |||
bytes. | bytes. | |||
</t> | </dd> | |||
</list> </t> | </dl> | |||
<t> Both the | ||||
<t> Both the | ||||
Compressed_Size and | Compressed_Size and | |||
Regenerated_Size fields | Regenerated_Size fields | |||
follow little-endian | follow little-endian | |||
convention. Note that | convention. Note that | |||
Compressed_Size | Compressed_Size | |||
includes the size of | includes the size of | |||
the Huffman_Tree_Descript ion | the Huffman_Tree_Descript ion | |||
when it is | when it is | |||
present. </t> | present. </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="Raw_Literals_Bloc | <name>Raw_Literals_Block</name> | |||
k"> | <t> | |||
<t> | ||||
The data in Stream_1 is | The data in Stream_1 is | |||
Regenerated_Size bytes | Regenerated_Size bytes | |||
long. It contains | long. It contains | |||
the raw literals data | the raw literals data | |||
to be used during | to be used during | |||
Sequence Execution | Sequence Execution | |||
(<xref target="comp_seque | (<xref target="comp_seque | |||
nces"/>). | nces" format="default"/>). | |||
</t> | </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="RLE_Literals_Bloc | <name>RLE_Literals_Block</name> | |||
k"> | <t> | |||
<t> | ||||
Stream_1 consists of a | Stream_1 consists of a | |||
single byte that | single byte that | |||
should be repeated | should be repeated | |||
Regenerated_Size times | Regenerated_Size times | |||
to generate the | to generate the | |||
decoded literals. | decoded literals. | |||
</t> | </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<name>Compressed_Literals_Block and Treeless_Literals_Block</nam | ||||
e> | ||||
<t> | ||||
<section title="Compressed_Litera | ||||
ls_Block and Treeless_Literals_Block"> | ||||
<t> | ||||
Both of these modes | Both of these modes | |||
contain Huffman-encoded | contain Huffman-coded | |||
data. | data. | |||
For Treeless_Literals_Blo ck, | For Treeless_Literals_Blo ck, | |||
the Huffman table comes f rom | the Huffman table comes f rom | |||
the previously | the previously | |||
compressed literals | compressed literals | |||
block, or from a | block, or from a | |||
dictionary; | dictionary; | |||
see <xref target="comp_di | see <xref target="comp_di | |||
ct"/>. | ct" format="default"/>. | |||
</t> | </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="Huffman_Tree_Desc | <name>Huffman_Tree_Description</name> | |||
ription"> | <t> | |||
<t> | ||||
This section is | This section is | |||
only present | only present | |||
when the | when the | |||
Literals_Block_Type type | Literals_Block_Type type | |||
is | is | |||
Compressed_Literals_Block | Compressed_Literals_Block | |||
(2). The format | (2). The format | |||
of Huffman_Tree_Descripti on | of Huffman_Tree_Descripti on | |||
can be found in | can be found in | |||
<xref target="huffman_tre e_desc"/>. | <xref target="huffman_tre e_desc" format="default"/>. | |||
The size of | The size of | |||
Huffman_Tree_Description | Huffman_Tree_Description | |||
is determined | is determined | |||
during the | during the | |||
decoding process. It | decoding process. It | |||
must be used | must be used | |||
to determine | to determine | |||
where streams | where streams | |||
begin. | begin. | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
Total_Streams_Size = Compressed_Size | Total_Streams_Size = Compressed_Size | |||
- Huffman_Tree_Description_Size | - Huffman_Tree_Description_Size | |||
</artwork></figu | ]]></artwork> | |||
re></t> | </section> | |||
</section> | <section numbered="true" toc="default"> | |||
<name>Jump_Table</name> | ||||
<section title="Jump_Table"> | <t> The Jump_Table | |||
<t> The Jump_Table | ||||
is only present | is only present | |||
when there are | when there are | |||
4 Huffman-coded | 4 Huffman-coded | |||
streams. </t> | streams. </t> | |||
<t> (Reminder: | ||||
<t> (Reminder: | ||||
Huffman-compresse d | Huffman-compresse d | |||
data | data | |||
consists of | consists of | |||
either 1 or | either 1 or | |||
4 | 4 | |||
Huffman-coded | Huffman-coded | |||
streams.) | streams.) | |||
</t> | </t> | |||
<t> If only 1 | ||||
<t> If only 1 | ||||
stream is | stream is | |||
present, it is | present, it is | |||
a single | a single | |||
bitstream | bitstream | |||
occupying the | occupying the | |||
entire | entire | |||
remaining | remaining | |||
portion of the | portion of the | |||
literals | literals | |||
block, encoded | block, encoded | |||
as described | as described | |||
within | within | |||
<xref target="huf | <xref target="huf | |||
fman_coded_streams"/>. | fman_coded_streams" format="default"/>. | |||
</t> | </t> | |||
<t> If there are | ||||
<t> If there are | ||||
4 streams, | 4 streams, | |||
Literals_Section_Header | Literals_Section_Header | |||
only provides | only provides | |||
enough | enough | |||
information to | information to | |||
know the | know the | |||
decompressed | decompressed | |||
and compressed | and compressed | |||
sizes of all | sizes of all | |||
4 streams | 4 streams | |||
skipping to change at line 1410 ¶ | skipping to change at line 1437 ¶ | |||
last stream, | last stream, | |||
which may be | which may be | |||
up to 3 | up to 3 | |||
bytes smaller, | bytes smaller, | |||
to reach a | to reach a | |||
total | total | |||
decompressed | decompressed | |||
size as | size as | |||
specified in | specified in | |||
Regenerated_Size. </t> | Regenerated_Size. </t> | |||
<t> | ||||
<t> | ||||
The compressed | The compressed | |||
size of each | size of each | |||
stream is | stream is | |||
provided | provided | |||
explicitly in | explicitly in | |||
the Jump_Table. | the Jump_Table. | |||
The Jump_Table | The Jump_Table | |||
is 6 bytes long and | is 6 bytes long and | |||
consists of three | consists of three | |||
2-byte | 2-byte | |||
skipping to change at line 1433 ¶ | skipping to change at line 1459 ¶ | |||
fields, | fields, | |||
describing the | describing the | |||
compressed | compressed | |||
sizes of the | sizes of the | |||
first 3 | first 3 | |||
streams. | streams. | |||
Stream4_Size | Stream4_Size | |||
is computed | is computed | |||
from | from | |||
Total_Streams_Siz e | Total_Streams_Siz e | |||
minus sizes of | minus the sizes o f | |||
other streams. | other streams. | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
Stream4_Size = Total_Streams_Size - 6 | Stream4_Size = Total_Streams_Size - 6 | |||
- Stream1_Size - Stream2_Size | - Stream1_Size - Stream2_Size | |||
- Stream3_Size | - Stream3_Size | |||
</artwork></figu | ]]></artwork> | |||
re></t> | <t> | |||
<t> | ||||
Note that if | Note that if | |||
Stream1_Size + | Stream1_Size + | |||
Stream2_Size + | Stream2_Size + | |||
Stream3_Size | Stream3_Size | |||
exceeds | exceeds | |||
Total_Streams_Siz e, | Total_Streams_Siz e, | |||
the data are | the data are | |||
considered | considered | |||
corrupted. </t> | corrupted. </t> | |||
<t> | <t> | |||
Each of these | Each of these | |||
4 bitstreams | 4 bitstreams | |||
is then decoded | is then decoded | |||
independently | independently | |||
as a | as a | |||
Huffman-Coded | Huffman-coded | |||
stream, as | stream, as | |||
described in | described in | |||
<xref target="huf | <xref target="huf | |||
fman_coded_streams"/>. | fman_coded_streams" format="default"/>. | |||
</t> | </t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="comp_sequences" numbered="true" toc="default"> | ||||
<section anchor="comp_sequences" | <name>Sequences_Section</name> | |||
title="Sequences_Section"> | <t> A compressed block is a | |||
<t> A compressed block is a | ||||
succession of sequences. | succession of sequences. | |||
A sequence is a literal | A sequence is a literal | |||
copy command, followed by | copy command, followed by | |||
a match copy command. A | a match copy command. A | |||
literal copy command | literal copy command | |||
specifies a length. It is | specifies a length. It is | |||
the number of bytes to be | the number of bytes to be | |||
copied (or extracted) from | copied (or extracted) from | |||
the Literals Section. | the Literals_Section. | |||
A match copy command | A match copy command | |||
specifies an offset and a | specifies an offset and a | |||
length. </t> | length. </t> | |||
<t> When all sequences are | ||||
<t> When all sequences are | ||||
decoded, if there are | decoded, if there are | |||
literals left in the | literals left in the | |||
literals section, these | Literals_Section, these | |||
bytes are added at the | bytes are added at the | |||
end of the block. </t> | end of the block. </t> | |||
<t> This is described in more | ||||
<t> This is described in more | ||||
detail in | detail in | |||
<xref target="comp_sequence_e | <xref target="comp_sequence_e | |||
xec"/>. </t> | xec" format="default"/>. </t> | |||
<t> The Sequences_Section | ||||
<t> The Sequences_Section | ||||
regroups all symbols | regroups all symbols | |||
required to decode | required to decode | |||
commands. There are three | commands. There are three | |||
symbol types: literals | symbol types: literals | |||
lengths, offsets, and match | length codes, offset codes, a | |||
lengths. They are encoded | nd match | |||
length codes. They are encod | ||||
ed | ||||
together, interleaved, in | together, interleaved, in | |||
a single "bitstream". </t> | a single "bitstream". </t> | |||
<t> The Sequences_Section | ||||
<t> The Sequences_Section | ||||
starts by a header, | starts by a header, | |||
followed by optional | followed by optional | |||
probability tables for | probability tables for | |||
each symbol type, followed | each symbol type, followed | |||
by the bitstream. </t> | by the bitstream. </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
<t> <figure><artwork> | ||||
Sequences_Section_Header | Sequences_Section_Header | |||
[Literals_Length_Table] | [Literals_Length_Table] | |||
[Offset_Table] | [Offset_Table] | |||
[Match_Length_Table] | [Match_Length_Table] | |||
bitStream | bitStream | |||
</artwork></figure> </t> | ]]></artwork> | |||
<t> To decode the | ||||
<t> To decode the | ||||
Sequences_Section, it's | Sequences_Section, it's | |||
necessary to know its | necessary to know its | |||
size. This size is deduced | size. This size is deduced | |||
from the size of the Literals _Section: | from the size of the Literals _Section: | |||
Sequences_Section_Size = Bloc | Sequences_Section_Size = Bloc | |||
k_Size - Literals_Section_Header - Literals_Section_Content </t> | k_Size - Literals_Section_Header - Literals_Section_Content. </t> | |||
<section anchor="seq_sec_hdr" numbered="true" toc="default"> | ||||
<section title="Sequences_Section | <name>Sequences_Section_Header</name> | |||
_Header" anchor="seq_sec_hdr"> | <t> This header | |||
<t> This header | ||||
consists of two | consists of two | |||
items: | items: | |||
<list style="symbols" | </t> | |||
> | <ul spacing="normal"> | |||
<t> Number_of_Seq | <li> Number_of_Sequences </li> | |||
uences </t> | <li> Symbol_Compression_Modes </li> | |||
<t> Symbol_Compre | </ul> | |||
ssion_Modes </t> | <t> Number_of_Sequences | |||
</list> </t> | is a variable size | |||
field using between | ||||
<t> Number_of_Sequences | 1 and 3 | |||
is a variable size | bytes. If the | |||
field using between | first byte is | |||
1 and 3 | "byte0": | |||
bytes. If the | ||||
first byte is | ||||
"byte0": | ||||
<list style="symbols" | </t> | |||
> | <ul spacing="normal"> | |||
<t> if | <li> if | |||
(byte0 == | (byte0 == | |||
0): there | 0): there | |||
are no | are no | |||
sequences. | sequences. | |||
The | The | |||
sequence | sequence | |||
section | section | |||
stops here. | stops here. | |||
Decompressed | Decompressed | |||
content is | content is | |||
defined | defined | |||
entirely as | entirely as | |||
Literals | Literals_Sect | |||
Section | ion | |||
content. | content. | |||
The FSE | The FSE | |||
tables used | tables used | |||
in Repeat_Mod e | in Repeat_Mod e | |||
are not | are not | |||
updated. </t> | updated. </li | |||
> | ||||
<t> if (byte0 | <li> if (byte0 | |||
< 128): | < 128): | |||
Number_of_Seq uences | Number_of_Seq uences | |||
= byte0. | = byte0. | |||
Uses 1 | Uses 1 | |||
byte. </t> | byte. </li> | |||
<li> if (byte0 | ||||
<t> if (byte0 | ||||
< 255): | < 255): | |||
Number_of_Seq uences | Number_of_Seq uences | |||
= | = | |||
((byte0 - 128 ) | ((byte0 - 128 ) | |||
<< 8) + | << 8) + | |||
byte1. Uses | byte1. Uses | |||
2 bytes. </t> | 2 bytes. </li | |||
> | ||||
<t> if (byte0 | <li> if (byte0 | |||
== 255): | == 255): | |||
Number_of_Seq uences | Number_of_Seq uences | |||
= byte1 + | = byte1 + | |||
(byte2 <&l t; 8) | (byte2 <&l t; 8) | |||
+ 0x7F00. | + 0x7F00. | |||
Uses 3 | Uses 3 | |||
bytes. </t> | bytes. </li> | |||
</list> </t> | </ul> | |||
<t> Symbol_Compression_Modes | ||||
<t> Symbol_Compression_Mo | is a single byte, | |||
des | defining the | |||
is a single byte, | compression mode of | |||
defining the | each symbol type. | |||
compression mode of | </t> | |||
each symbol type. | ||||
<figure><artwork> | ||||
+-------------+----------------------+ | ||||
| Bit Number | Field Name | | ||||
+-------------+----------------------+ | ||||
| 7-6 | Literal_Lengths_Mode | | ||||
+-------------+----------------------+ | ||||
| 5-4 | Offsets_Mode | | ||||
+-------------+----------------------+ | ||||
| 3-2 | Match_Lengths_Mode | | ||||
+-------------+----------------------+ | ||||
| 1-0 | Reserved | | ||||
+-------------+----------------------+ | ||||
</artwork></figure> | ||||
</t> | ||||
<t> The last field, | <table anchor="Symbol_Compression_Modes"> | |||
Reserved, must be | <name>Symbol_Compression_Modes</name> | |||
all zeroes. </t> | <thead> | |||
<tr> | ||||
<th align="center">Bit Number</th> | ||||
<th align="center">Field Name</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">7-6</td> | ||||
<td align="center">Literal_Lengths_Mode</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">5-4</td> | ||||
<td align="center">Offsets_Mode</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3-2</td> | ||||
<td align="center">Match_Lengths_Mode</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1-0</td> | ||||
<td align="center">Reserved</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Literals_Lengths_Mode | <t> The last field, | |||
, | Reserved, must be | |||
Offsets_Mode, and | all zeroes. </t> | |||
Match_Lengths_Mode | <t> Literals_Lengths_Mode, | |||
define the | Offsets_Mode, and | |||
Compression_Mode of | Match_Lengths_Mode | |||
literals lengths, | define the | |||
offsets, and match | Compression_Mode of | |||
lengths symbols, | literals length codes, | |||
respectively. They | offset codes, and match | |||
follow the same | length codes, | |||
enumeration: | respectively. They | |||
follow the same | ||||
enumeration: | ||||
</t> | ||||
<figure><artwork> | <table anchor="literals"> | |||
+-------+---------------------+ | <name>Literals_Lengths_Mode, Offsets_Mode, and Match_Lengths_Mode</name> | |||
| Value | Compression_Mode | | <thead> | |||
+-------+---------------------+ | <tr> | |||
| 0 | Predefined_Mode | | <th align="center">Value</th> | |||
+-------+---------------------+ | <th align="center">Compression_Mode</th> | |||
| 1 | RLE_Mode | | </tr> | |||
+-------+---------------------+ | </thead> | |||
| 2 | FSE_Compressed_Mode | | <tbody> | |||
+-------+---------------------+ | <tr> | |||
| 3 | Repeat_Mode | | <td align="center">0</td> | |||
+-------+---------------------+ | <td align="center">Predefined_Mode</td> | |||
</artwork></figure>< | </tr> | |||
/t> | <tr> | |||
<td align="center">1</td> | ||||
<td align="center">RLE_Mode</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">FSE_Compressed_Mode</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">Repeat_Mode</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> <list style="hanging" | <dl newline="false" spacing="normal"> | |||
> | <dt>Predefined_Mode:</dt> | |||
<t hangText="Predefin | <dd> | |||
ed_Mode:"> | ||||
A predefined | A predefined | |||
FSE | FSE | |||
(see <xref target ="comp_fse"/>) | (see <xref target ="comp_fse" format="default"/>) | |||
distribution | distribution | |||
table is used, as | table is used, as | |||
defined in | defined in | |||
<xref target="def ault_dist"/>. | <xref target="def ault_dist" format="default"/>. | |||
No distribution | No distribution | |||
table will be | table will be | |||
present. </t> | present. </dd> | |||
<dt>RLE_Mode:</dt> | ||||
<t hangText="RLE_Mode | <dd> | |||
:"> | ||||
The table | The table | |||
description | description | |||
consists of a | consists of a | |||
single byte, | single byte, | |||
which contains | which contains | |||
the symbol's | the symbol's | |||
value. This | value. This | |||
symbol will | symbol will | |||
be used for | be used for | |||
all sequences. | all sequences. | |||
</t> | </dd> | |||
<dt>FSE_Compressed_Mode:</dt> | ||||
<t hangText="FSE_Comp | <dd> | |||
ressed_Mode:"> | ||||
Standard FSE | Standard FSE | |||
compression. A | compression. A | |||
distribution | distribution | |||
table will be | table will be | |||
present. The | present. The | |||
format of this | format of this | |||
distribution | distribution | |||
table is | table is | |||
described in | described in | |||
<xref target="com p_fse_table"/>. | <xref target="com p_fse_table" format="default"/>. | |||
Note that the | Note that the | |||
maximum allowed | maximum allowed | |||
accuracy log | accuracy log | |||
for literals | for literals | |||
length and | length code and | |||
match length | match length code | |||
tables is 9, | tables is 9, | |||
and the | and the | |||
maximum | maximum | |||
accuracy log | accuracy log | |||
for the | for the | |||
offsets table | offset code table | |||
is 8. | is 8. | |||
This mode must | This mode must | |||
not be used when | not be used when | |||
only one symbol | only one symbol | |||
is present; | is present; | |||
RLE_Mode should | RLE_Mode should | |||
be used instead | be used instead | |||
(although any | (although any | |||
other mode | other mode | |||
will work). </t> | will work). </dd> | |||
<dt>Repeat_Mode:</dt> | ||||
<t hangText="Repeat_M | <dd> | |||
ode:"> | ||||
The table used | The table used | |||
in the previous | in the previous | |||
Compressed_Block | Compressed_Block | |||
with | with | |||
Number_Of_Sequenc es > 0 | Number_Of_Sequenc es > 0 | |||
will be | will be | |||
used again, or | used again, or | |||
if this is the | if this is the | |||
first block, | first block, | |||
the table in | the table in | |||
the dictionary | the dictionary | |||
will be used. | will be used. | |||
Note that this | Note that this | |||
includes | includes | |||
RLE_Mode, | RLE_Mode, | |||
skipping to change at line 1745 ¶ | skipping to change at line 1790 ¶ | |||
No distribution | No distribution | |||
table will be | table will be | |||
present. | present. | |||
If this mode is | If this mode is | |||
used without | used without | |||
any previous | any previous | |||
sequence table | sequence table | |||
in the frame | in the frame | |||
(or dictionary; | (or dictionary; | |||
see | see | |||
<xref target="com p_dict"/>) | <xref target="com p_dict" format="default"/>) | |||
to repeat, this | to repeat, this | |||
should be | should be | |||
treated as | treated as | |||
corruption. </t> | corruption. </dd> | |||
</dl> | ||||
</list></t> | <section anchor="codes_lengths_offsets" numbered="true" toc="def | |||
ault"> | ||||
<section title="Sequence | <name>Sequence Codes for Lengths and Offsets</name> | |||
Codes for Lengths and Offsets" anchor="codes_lengths_offsets"> | <t> Each symbol is a | |||
<t> Each symbol is a | ||||
code in its own | code in its own | |||
context, which | context, which | |||
specifies Baseline | specifies Baseline | |||
and Number_of_Bits | and Number_of_Bits | |||
to add. Codes are | to add. Codes are | |||
FSE compressed | FSE compressed | |||
and interleaved | and interleaved | |||
with raw additional | with raw additional | |||
bits in the same | bits in the same | |||
bitstream. </t> | bitstream. </t> | |||
<t> Literals length | ||||
<t> Literals length | ||||
codes are values | codes are values | |||
ranging from 0 to | ranging from 0 to | |||
35 inclusive. They | 35, inclusive. They | |||
define lengths from | define lengths from | |||
0 to 131071 bytes. | 0 to 131071 bytes. | |||
The literals length | The literals length | |||
is equal to the | is equal to the | |||
decoded Baseline | decoded Baseline | |||
plus the result of | plus the result of | |||
reading | reading | |||
Number_of_Bits bits | Number_of_Bits bits | |||
from the bitstream, | from the bitstream, | |||
as a little-endian | as a little-endian | |||
value. | value. | |||
<figure><artwork> | </t> | |||
+----------------------+----------+----------------+ | ||||
| Literals_Length_Code | Baseline | Number_of_Bits | | ||||
+----------------------+----------+----------------+ | ||||
| 0-15 | length | 0 | | ||||
+----------------------+----------+----------------+ | ||||
| 16 | 16 | 1 | | ||||
+----------------------+----------+----------------+ | ||||
| 17 | 18 | 1 | | ||||
+----------------------+----------+----------------+ | ||||
| 18 | 20 | 1 | | ||||
+----------------------+----------+----------------+ | ||||
| 19 | 22 | 1 | | ||||
+----------------------+----------+----------------+ | ||||
| 20 | 24 | 2 | | ||||
+----------------------+----------+----------------+ | ||||
| 21 | 28 | 2 | | ||||
+----------------------+----------+----------------+ | ||||
| 22 | 32 | 3 | | ||||
+----------------------+----------+----------------+ | ||||
| 23 | 40 | 3 | | ||||
+----------------------+----------+----------------+ | ||||
| 24 | 48 | 4 | | ||||
+----------------------+----------+----------------+ | ||||
| 25 | 64 | 6 | | ||||
+----------------------+----------+----------------+ | ||||
| 26 | 128 | 7 | | ||||
+----------------------+----------+----------------+ | ||||
| 27 | 256 | 8 | | ||||
+----------------------+----------+----------------+ | ||||
| 28 | 512 | 9 | | ||||
+----------------------+----------+----------------+ | ||||
| 29 | 1024 | 10 | | ||||
+----------------------+----------+----------------+ | ||||
| 30 | 2048 | 11 | | ||||
+----------------------+----------+----------------+ | ||||
| 31 | 4096 | 12 | | ||||
+----------------------+----------+----------------+ | ||||
| 32 | 8192 | 13 | | ||||
+----------------------+----------+----------------+ | ||||
| 33 | 16384 | 14 | | ||||
+----------------------+----------+----------------+ | ||||
| 34 | 32768 | 15 | | ||||
+----------------------+----------+----------------+ | ||||
| 35 | 65536 | 16 | | ||||
+----------------------+----------+----------------+ | ||||
</artwork></figure>< | ||||
/t> | ||||
<t> Match length codes | <table anchor="length"> | |||
<name>Literals Length Codes</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Literals_Length_Code</th> | ||||
<th align="center">Baseline</th> | ||||
<th align="center">Number_of_Bits</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0-15</td> | ||||
<td align="center">length</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">16</td> | ||||
<td align="center">16</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">17</td> | ||||
<td align="center">18</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">18</td> | ||||
<td align="center">20</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">19</td> | ||||
<td align="center">22</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">20</td> | ||||
<td align="center">24</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">21</td> | ||||
<td align="center">28</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">22</td> | ||||
<td align="center">32</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">23</td> | ||||
<td align="center">40</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">24</td> | ||||
<td align="center">48</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">25</td> | ||||
<td align="center">64</td> | ||||
<td align="center">6</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">26</td> | ||||
<td align="center">128</td> | ||||
<td align="center">7</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">27</td> | ||||
<td align="center">256</td> | ||||
<td align="center">8</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">28</td> | ||||
<td align="center">512</td> | ||||
<td align="center">9</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">29</td> | ||||
<td align="center">1024</td> | ||||
<td align="center">10</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">30</td> | ||||
<td align="center">2048</td> | ||||
<td align="center">11</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">31</td> | ||||
<td align="center">4096</td> | ||||
<td align="center">12</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">32</td> | ||||
<td align="center">8192</td> | ||||
<td align="center">13</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">33</td> | ||||
<td align="center">16384</td> | ||||
<td align="center">14</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">34</td> | ||||
<td align="center">32768</td> | ||||
<td align="center">15</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">35</td> | ||||
<td align="center">65536</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Match length codes | ||||
are values ranging | are values ranging | |||
from 0 to 52 | from 0 to 52, | |||
inclusive. They | inclusive. They | |||
define lengths from | define lengths from | |||
3 to 131074 bytes. | 3 to 131074 bytes. | |||
The match length is | The match length is | |||
equal to the | equal to the | |||
decoded Baseline | decoded Baseline | |||
plus the result of | plus the result of | |||
reading | reading | |||
Number_of_Bits bits | Number_of_Bits bits | |||
from the bitstream, | from the bitstream, | |||
as a little-endian | as a little-endian | |||
value. | value. | |||
<figure><artwork> | </t> | |||
+-------------------+-----------------------+----------------+ | ||||
| Match_Length_Code | Baseline | Number_of_Bits | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 0-31 | Match_Length_Code + 3 | 0 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 32 | 35 | 1 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 33 | 37 | 1 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 34 | 39 | 1 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 35 | 41 | 1 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 36 | 43 | 2 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 37 | 47 | 2 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 38 | 51 | 3 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 39 | 59 | 3 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 40 | 67 | 4 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 41 | 83 | 4 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 42 | 99 | 5 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 43 | 131 | 7 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 44 | 259 | 8 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 45 | 515 | 9 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 46 | 1027 | 10 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 47 | 2051 | 11 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 48 | 4099 | 12 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 49 | 8195 | 13 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 50 | 16387 | 14 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 51 | 32771 | 15 | | ||||
+-------------------+-----------------------+----------------+ | ||||
| 52 | 65539 | 16 | | ||||
+-------------------+-----------------------+----------------+ | ||||
</artwork></figure>< | ||||
/t> | ||||
<t> Offset codes are | <table anchor="Match_Length_Code"> | |||
<name>Match Length Codes</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Match_Length_Code</th> | ||||
<th align="center">Baseline</th> | ||||
<th align="center">Number_of_Bits</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0-31</td> | ||||
<td align="center">Match_Length_Code + 3</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">32</td> | ||||
<td align="center">35</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">33</td> | ||||
<td align="center">37</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">34</td> | ||||
<td align="center">39</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">35</td> | ||||
<td align="center">41</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">36</td> | ||||
<td align="center">43</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">37</td> | ||||
<td align="center">47</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">38</td> | ||||
<td align="center">51</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">39</td> | ||||
<td align="center">59</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">40</td> | ||||
<td align="center">67</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">41</td> | ||||
<td align="center">83</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">42</td> | ||||
<td align="center">99</td> | ||||
<td align="center">5</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">43</td> | ||||
<td align="center">131</td> | ||||
<td align="center">7</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">44</td> | ||||
<td align="center">259</td> | ||||
<td align="center">8</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">45</td> | ||||
<td align="center">515</td> | ||||
<td align="center">9</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">46</td> | ||||
<td align="center">1027</td> | ||||
<td align="center">10</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">47</td> | ||||
<td align="center">2051</td> | ||||
<td align="center">11</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">48</td> | ||||
<td align="center">4099</td> | ||||
<td align="center">12</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">49</td> | ||||
<td align="center">8195</td> | ||||
<td align="center">13</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">50</td> | ||||
<td align="center">16387</td> | ||||
<td align="center">14</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">51</td> | ||||
<td align="center">32771</td> | ||||
<td align="center">15</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">52</td> | ||||
<td align="center">65539</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Offset codes are | ||||
values ranging from | values ranging from | |||
0 to N. </t> | 0 to N. </t> | |||
<t> A decoder is free | ||||
<t> A decoder is free | ||||
to limit its | to limit its | |||
maximum supported | maximum supported | |||
value for N. | value for N. | |||
Support for values | Support for values | |||
of at least 22 is | of at least 22 is | |||
recommended. | recommended. | |||
At the time of this | At the time of this | |||
writing, the | writing, the | |||
reference decoder | reference decoder | |||
supports a maximum | supports a maximum | |||
N value of 31. </t> | N value of 31. </t> | |||
<t> An offset code is | ||||
<t> An offset code is | ||||
also the number of | also the number of | |||
additional bits to | additional bits to | |||
read in | read in | |||
little-endian | little-endian | |||
fashion and can | fashion and can | |||
be translated into | be translated into | |||
an Offset_Value | an Offset_Value | |||
using the following | using the following | |||
formulas: | formulas: | |||
<figure><artwork> | </t> | |||
Offset_Value = (1 << offsetCode) + readNBits(offsetCode); | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
Offset_Value = (1 << offsetCode) + readNBits(offsetCode); | ||||
if (Offset_Value > 3) Offset = Offset_Value - 3; | if (Offset_Value > 3) Offset = Offset_Value - 3; | |||
</artwork></figure>< | ]]></artwork> | |||
/t> | <t> This means that | |||
maximum | ||||
<t> This means that | Offset_Value is | |||
maximum | (2<sup>N+1</sup>) - 1, | |||
Offset_Value is | supporting | |||
(2^(N+1))-1, | back-reference | |||
supporting | distance up to | |||
back-reference | (2<sup>N+1</sup>) - 4, but it is | |||
distance up to | limited by the | |||
(2^(N+1))-4, but it i | maximum | |||
s | back-reference | |||
limited by the | distance (see | |||
maximum | <xref target="comp_window_descr" format="default"/>). </t> | |||
back-reference | <t> Offset_Value from | |||
distance (see | ||||
<xref target="comp_wi | ||||
ndow_descr"/>). </t> | ||||
<t> Offset_Value from | ||||
1 to 3 are special: | 1 to 3 are special: | |||
they define "repeat | they define "repeat | |||
codes". This is | codes". This is | |||
described in more | described in more | |||
detail in | detail in | |||
<xref target="repeat_ | <xref target="repeat_ | |||
offsets"/>. </t> | offsets" format="default"/>. </t> | |||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<section title="Decoding | <name>Decoding Sequences</name> | |||
Sequences"> | <t> FSE bitstreams are | |||
<t> FSE bitstreams are | ||||
read in reverse of | read in reverse of | |||
the direction they | the direction they | |||
are written. In zstd, | are written. In zstd, | |||
the compressor | the compressor | |||
writes bits forward | writes bits forward | |||
into a block, and | into a block, and | |||
the decompressor | the decompressor | |||
must read the | must read the | |||
bitstream | bitstream | |||
backwards. </t> | backwards. </t> | |||
<t> To find the start | ||||
<t> To find the start | ||||
of the bitstream, it | of the bitstream, it | |||
is therefore | is therefore | |||
necessary to know | necessary to know | |||
the offset of the | the offset of the | |||
last byte of the | last byte of the | |||
block, which can be | block, which can be | |||
found by counting | found by counting | |||
Block_Size bytes | Block_Size bytes | |||
after the block | after the block | |||
header. </t> | header. </t> | |||
<t> After writing the | ||||
<t> After writing the | ||||
last bit containing | last bit containing | |||
information, the | information, the | |||
compressor writes a | compressor writes a | |||
single 1 bit and | single 1 bit and | |||
then fills the rest | then fills the rest | |||
of the byte with | of the byte with | |||
zero bits. The | zero bits. The | |||
last byte of the | last byte of the | |||
compressed | compressed | |||
bitstream cannot be | bitstream cannot be | |||
zero for that | zero for that | |||
reason. </t> | reason. </t> | |||
<t> When decompressing, | ||||
<t> When decompressing, | ||||
the last byte | the last byte | |||
containing the | containing the | |||
padding is the | padding is the | |||
first byte to read. | first byte to read. | |||
The decompressor | The decompressor | |||
needs to skip the | needs to skip the | |||
up to 7 bits of | up to 7 bits of | |||
0-padding as well | 0-padding as well | |||
as the the first 1 | as the first 1 | |||
bit that occurs. | bit that occurs. | |||
Afterwards, the | Afterwards, the | |||
useful part of the | useful part of the | |||
bitstream | bitstream | |||
begins. </t> | begins. </t> | |||
<t> FSE decoding | <t> FSE decoding | |||
requires a 'state' | requires a 'state' | |||
to be carried from | to be carried from | |||
symbol to symbol. | symbol to symbol. | |||
For more | For more | |||
explanation on FSE | explanation on FSE | |||
decoding, see | decoding, see | |||
<xref target="comp_fs | <xref target="comp_fse" format="default"/>. </t> | |||
e"/>. </t> | <t> For sequence | |||
<t> For sequence | ||||
decoding, a | decoding, a | |||
separate state | separate state | |||
keeps track of | keeps track of | |||
each literal | each literals | |||
lengths, offsets, | length, offset, | |||
and match lengths | and match length | |||
symbols. Some FSE | code. Some FSE | |||
primitives are | primitives are | |||
also used. For | also used. For | |||
more details on | more details on | |||
the operation of | the operation of | |||
these primitives, | these primitives, | |||
see | see | |||
<xref target="comp_fs | <xref target="comp_fs | |||
e"/>. </t> | e" format="default"/>. </t> | |||
<t> The bitstream | ||||
<t> The bitstream | ||||
starts with initial | starts with initial | |||
FSE state values, | FSE state values, | |||
each using the | each using the | |||
required number of | required number of | |||
bits in their | bits in their | |||
respective | respective | |||
accuracy, decoded | accuracy, decoded | |||
previously from | previously from | |||
their normalized | their normalized | |||
distribution. It | distribution. It | |||
starts with | starts with | |||
Literals_Length_State , | Literals_Length_State , | |||
followed by | followed by | |||
Offset_State, and | Offset_State, and | |||
finally | finally | |||
Match_Length_State. < /t> | Match_Length_State. < /t> | |||
<t> Note that all | ||||
<t> Note that all | ||||
values are read | values are read | |||
backward, so the | backward, so the | |||
'start' of the | 'start' of the | |||
bitstream is at the | bitstream is at the | |||
highest position in | highest position in | |||
memory, immediately | memory, immediately | |||
before the last | before the last | |||
1 bit for | 1 bit for | |||
padding. </t> | padding. </t> | |||
<t> After decoding the | ||||
<t> After decoding the | ||||
starting states, a | starting states, a | |||
single sequence is | single sequence is | |||
decoded | decoded | |||
Number_Of_Sequences | Number_Of_Sequences | |||
times. These | times. These | |||
sequences are | sequences are | |||
decoded in order | decoded in order | |||
from first to last. | from first to last. | |||
Since the | Since the | |||
compressor writes | compressor writes | |||
the bitstream in | the bitstream in | |||
the forward | the forward | |||
direction, this | direction, this | |||
means the | means the | |||
compressor must | compressor must | |||
encode the | encode the | |||
sequences starting | sequences starting | |||
with the last one | with the last one | |||
and ending with the | and ending with the | |||
first. </t> | first. </t> | |||
<t> For each of the | ||||
<t> For each of the | ||||
symbol types, the | symbol types, the | |||
FSE state can be | FSE state can be | |||
used to determine | used to determine | |||
the appropriate | the appropriate | |||
code. The code then | code. The code then | |||
defines the | defines the | |||
Baseline and Number_o f_Bits | Baseline and Number_o f_Bits | |||
to read for | to read for | |||
each type. The | each type. The | |||
description of the | description of the | |||
codes for how to | codes for how to | |||
determine these | determine these | |||
values can be | values can be | |||
found in | found in | |||
<xref target="seq_sec | <xref target="seq_sec | |||
_hdr"/>. </t> | _hdr" format="default"/>. </t> | |||
<t> Decoding starts by | ||||
<t> Decoding starts by | ||||
reading the | reading the | |||
Number_of_Bits | Number_of_Bits | |||
required to decode | required to decode | |||
offset. It | offset. It | |||
does the same for | does the same for | |||
Match_Length and | Match_Length and | |||
then for | then for | |||
Literals_Length. | Literals_Length. | |||
This sequence is | This sequence is | |||
then used for | then used for | |||
Sequence Execution | Sequence Execution | |||
(see | (see | |||
<xref target="comp_se | <xref target="comp_se | |||
quence_exec"/>). </t> | quence_exec" format="default"/>). </t> | |||
<t> If it is not the | ||||
<t> If it is not the | ||||
last sequence in | last sequence in | |||
the block, the next | the block, the next | |||
operation is to | operation is to | |||
update states. | update states. | |||
Using the rules | Using the rules | |||
pre-calculated in | precalculated in | |||
the decoding | the decoding | |||
tables, | tables, | |||
Literals_Length_State | Literals_Length_State | |||
is updated, | is updated, | |||
followed by | followed by | |||
Match_Length_State, | Match_Length_State, | |||
and then | and then | |||
Offset_State. | Offset_State. | |||
See | See | |||
<xref target="comp_fs e"/> | <xref target="comp_fs e" format="default"/> | |||
for details on how | for details on how | |||
to update states | to update states | |||
from the | from the | |||
bitstream. </t> | bitstream. </t> | |||
<t> This operation will | ||||
<t> This operation will | ||||
be repeated | be repeated | |||
Number_of_Sequences | Number_of_Sequences | |||
times. At the end, | times. At the end, | |||
the bitstream shall | the bitstream shall | |||
be entirely | be entirely | |||
consumed; otherwise, | consumed; otherwise, | |||
the bitstream is | the bitstream is | |||
considered | considered | |||
corrupted. </t> | corrupted. </t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="default_dist" numbered="true" toc="default"> | ||||
<section anchor="default_dist" | <name>Default Distributions</name> | |||
title="Default Distribu | <t> If Predefined_Mode | |||
tions"> | ||||
<t> If Predefined_Mode | ||||
is selected for a | is selected for a | |||
symbol type, its | symbol type, its | |||
FSE decoding table | FSE decoding table | |||
is generated from a | is generated from a | |||
predefined | predefined | |||
distribution table | distribution table | |||
defined here. For | defined here. For | |||
details on how to | details on how to | |||
convert this | convert this | |||
distribution into | distribution into | |||
a decoding table, | a decoding table, | |||
see <xref target="com | see <xref target="com | |||
p_fse"/>. </t> | p_fse" format="default"/>. </t> | |||
<section numbered="true" toc="default"> | ||||
<section title="Literals | <name>Literals Length Codes</name> | |||
Length"> | <t> The decoding | |||
<t> The decoding | ||||
table uses an | table uses an | |||
accuracy log of | accuracy log of | |||
6 bits (64 | 6 bits (64 | |||
states). | states). | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
short literalsLength_defaultDistribution[36] = | short literalsLength_defaultDistribution[36] = | |||
{ 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, | { 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, | |||
2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1, | 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1, | |||
-1,-1,-1,-1 | -1,-1,-1,-1 | |||
}; | }; | |||
</artwork></figu | ]]></artwork> | |||
re></t> | </section> | |||
</section> | <section numbered="true" toc="default"> | |||
<name>Match Length Codes</name> | ||||
<section title="Match Len | <t> The decoding | |||
gth"> | ||||
<t> The decoding | ||||
table uses an | table uses an | |||
accuracy log of | accuracy log of | |||
6 bits (64 | 6 bits (64 | |||
states). | states). | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
short matchLengths_defaultDistribution[53] = | short matchLengths_defaultDistribution[53] = | |||
{ 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | { 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | |||
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, | |||
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1, | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1, | |||
-1,-1,-1,-1,-1 | -1,-1,-1,-1,-1 | |||
}; | }; | |||
</artwork></figu | ]]></artwork> | |||
re></t> | </section> | |||
</section> | <section numbered="true" toc="default"> | |||
<name>Offset Codes</name> | ||||
<section title="Offset Co | <t> The decoding | |||
des"> | ||||
<t> The decoding | ||||
table uses an | table uses an | |||
accuracy log of | accuracy log of | |||
5 bits (32 | 5 bits (32 | |||
states), and | states) and | |||
supports a | supports a | |||
maximum N value | maximum N value | |||
of 28, allowing | of 28, allowing | |||
offset values | offset values | |||
up to | up to | |||
536,870,908. </t> | 536,870,908. </t> | |||
<t> If any sequence | ||||
<t> If any sequence | ||||
in the | in the | |||
compressed | compressed | |||
block requires | block requires | |||
a larger offset | a larger offset | |||
than this, it's | than this, it's | |||
not possible to | not possible to | |||
use the default | use the default | |||
distribution to | distribution to | |||
represent it. | represent it. | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
short offsetCodes_defaultDistribution[29] = | short offsetCodes_defaultDistribution[29] = | |||
{ 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | { 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | |||
1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 | 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 | |||
}; | }; | |||
</artwork></figu | ]]></artwork> | |||
re></t> | </section> | |||
</section> | </section> | |||
</section> | </section> | |||
</section> | </section> | |||
</section> | <section anchor="comp_sequence_exec" numbered="true" toc="default"> | |||
<name>Sequence Execution</name> | ||||
<section anchor="comp_sequence_exec" | <t> Once literals and sequences have been decoded, | |||
title="Sequence Execution"> | ||||
<t> Once literals and sequences have been decoded, | ||||
they are combined to produce the decoded content | they are combined to produce the decoded content | |||
of a block. </t> | of a block. </t> | |||
<t> Each sequence consists of a tuple of | ||||
<t> Each sequence consists of a tuple of | ||||
(literals_length, offset_value, match_length), | (literals_length, offset_value, match_length), | |||
decoded as described in the Sequences_Section | decoded as described in the Sequences_Section | |||
(<xref target="comp_sequences"/>). To execute a | (<xref target="comp_sequences" format="default"/>). T o execute a | |||
sequence, first copy literals_length bytes from | sequence, first copy literals_length bytes from | |||
the decoded literals to the output. </t> | the decoded literals to the output. </t> | |||
<t> Then, match_length bytes are copied from previous | ||||
<t> Then, match_length bytes are copied from previous | ||||
decoded data. The offset to copy from is | decoded data. The offset to copy from is | |||
determined by offset_value: | determined by offset_value: | |||
<list style="symbols"> | </t> | |||
<t> if Offset_Value > 3, then the offset is | <ul spacing="normal"> | |||
Offset_Value - 3; </t> | <li> if Offset_Value > 3, then the offset is | |||
Offset_Value - 3; </li> | ||||
<t> if Offset_Value is from 1-3, the offset is | <li> if Offset_Value is from 1-3, the offset is | |||
a special repeat offset value. See | a special repeat offset value. See | |||
<xref target="repeat_offsets"/> for how | <xref target="repeat_offsets" format="default | |||
the offset is determined in this case. </t> | "/> for how | |||
</list> </t> | the offset is determined in this case. </li> | |||
</ul> | ||||
<t> The offset is defined as from the current | <t> The offset is defined as from the current | |||
position (after copying the literals), so an | position (after copying the literals), so an | |||
offset of 6 and a match length of | offset of 6 and a match length of | |||
3 means that 3 bytes should be copied from 6 bytes | 3 means that 3 bytes should be copied from 6 bytes | |||
back. Note that all offsets leading to previously | back. Note that all offsets leading to previously | |||
decoded data must be smaller than Window_Size | decoded data must be smaller than Window_Size | |||
defined in Frame_Header_Descriptor | defined in Frame_Header_Descriptor | |||
(<xref target="comp_frame_header_desc"/>). </t> | (<xref target="comp_frame_header_desc" format="defaul | |||
</section> | t"/>). </t> | |||
</section> | ||||
<section title="Repeat Offsets" | <section anchor="repeat_offsets" numbered="true" toc="default"> | |||
anchor="repeat_offsets"> | <name>Repeat Offsets</name> | |||
<t> As seen above, the first three values | <t> As seen above, the first three values | |||
define a repeated offset; we will call | define a repeated offset; we will call | |||
them Repeated_Offset1, Repeated_Offset2, | them Repeated_Offset1, Repeated_Offset2, | |||
and Repeated_Offset3. They are sorted in | and Repeated_Offset3. They are sorted in | |||
recency order, with Repeated_Offset1 | recency order, with Repeated_Offset1 | |||
meaning "most recent one". </t> | meaning "most recent one". </t> | |||
<t> If offset_value is 1, then the offset used | ||||
<t> If offset_value is 1, then the offset used | ||||
is Repeated_Offset1, etc. </t> | is Repeated_Offset1, etc. </t> | |||
<t> There is one exception: when the current | ||||
<t> There is one exception: When the current | ||||
sequence's literals_length is 0, repeated | sequence's literals_length is 0, repeated | |||
offsets are shifted by 1, so an | offsets are shifted by 1, so an | |||
offset_value of 1 means Repeated_Offset2, | offset_value of 1 means Repeated_Offset2, | |||
an offset_value of 2 means Repeated_Offset3, | an offset_value of 2 means Repeated_Offset3, | |||
and an offset_value of 3 means | and an offset_value of 3 means | |||
Repeated_Offset1 - 1_byte. </t> | Repeated_Offset1 - 1_byte. </t> | |||
<t> For the first block, the starting offset | ||||
<t> For the first block, the starting offset | ||||
history is populated with the following | history is populated with the following | |||
values: Repeated_Offset1 (1), | values: Repeated_Offset1 (1), | |||
Repeated_Offset2 (4), and | Repeated_Offset2 (4), and | |||
Repeated_Offset3 (8), unless | Repeated_Offset3 (8), unless | |||
a dictionary is used, in which case they | a dictionary is used, in which case they | |||
come from the dictionary. </t> | come from the dictionary. </t> | |||
<t> Then each block gets its starting offset | ||||
<t> Then each block gets its starting offset | ||||
history from the ending values of the most | history from the ending values of the most | |||
recent Compressed_Block. Note that blocks | recent Compressed_Block. Note that blocks | |||
that are not Compressed_Block are skipped; | that are not Compressed_Block are skipped; | |||
they do not contribute to offset | they do not contribute to offset | |||
history. </t> | history. </t> | |||
<t> The newest offset takes the lead in offset | <t> During the execution of the sequences of a | |||
history, shifting others back (up to its | Compressed_Block, the Repeated_Offsets' | |||
previous place if it was already | values are kept up to date, so that they | |||
present). This means that when | always represent the three most recently | |||
Repeated_Offset1 (most recent) is used, | used offsets. In order to achieve that, | |||
history is unmodified. When | they are updated after executing each | |||
Repeated_Offset2 is used, it is swapped | sequence in the following way: </t> | |||
with Repeated_Offset1. If any other offset | ||||
is used, it becomes Repeated_Offset1, and | ||||
the rest are shifted back by 1. </t> | ||||
</section> | ||||
</section> | ||||
<section anchor="comp_skippable" | <t> When the sequence's offset_value does not | |||
title="Skippable Frames"> | refer to one of the Repeated_Offsets -- | |||
<t> <figure><artwork> | when it has value greater than 3, or when | |||
+--------------+------------+-----------+ | it has value 3 and the sequence's | |||
| Magic_Number | Frame_Size | User_Data | | literals_length is zero -- the | |||
+--------------+------------+-----------+ | Repeated_Offsets' values are shifted back | |||
| 4 bytes | 4 bytes | n bytes | | one, and Repeated_Offset1 takes on the | |||
+--------------+------------+-----------+ | value of the offset that was just used. </t> | |||
</artwork></figure> </t> | ||||
<t> Skippable frames allow the insertion of | <t> Otherwise, when the sequence's offset_value | |||
refers to one of the Repeated_Offsets -- | ||||
when it has value 1 or 2, or when it has | ||||
value 3 and the sequence's literals_length | ||||
is non-zero -- the Repeated_Offsets are | ||||
reordered, so that Repeated_Offset1 takes | ||||
on the value of the used Repeated_Offset, | ||||
and the existing values are pushed back | ||||
from the first Repeated_Offset through to | ||||
the Repeated_Offset selected by the | ||||
offset_value. This effectively performs a | ||||
single-stepped wrapping rotation of the | ||||
values of these offsets, so that their | ||||
order again reflects the recency of their | ||||
use.</t> | ||||
<t>The following table shows the values of | ||||
the Repeated_Offsets as a series of | ||||
sequences are applied to them:</t> | ||||
<table anchor="repeated_offsets"> | ||||
<name>Repeated_Offsets</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">offset_&zwsp;value</th> | ||||
<th align="center">literals_&zwsp;length</th> | ||||
<th align="center">Repeated_&zwsp;Offset1</th> | ||||
<th align="center">Repeated_&zwsp;Offset2</th> | ||||
<th align="center">Repeated_&zwsp;Offset3</th> | ||||
<th align="left">Comment</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="right"></td> | ||||
<td align="center"></td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="center">8</td> | ||||
<td align="left">starting values</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">1114</td> | ||||
<td align="center">11</td> | ||||
<td align="center">1111</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="left">non-repeat</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">1</td> | ||||
<td align="center">22</td> | ||||
<td align="center">1111</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="left">repeat 1; no change</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">2225</td> | ||||
<td align="center">22</td> | ||||
<td align="center">2222</td> | ||||
<td align="center">1111</td> | ||||
<td align="center">1</td> | ||||
<td align="left">non-repeat</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">1114</td> | ||||
<td align="center">111</td> | ||||
<td align="center">1111</td> | ||||
<td align="center">2222</td> | ||||
<td align="center">1111</td> | ||||
<td align="left">non-repeat</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">3336</td> | ||||
<td align="center">33</td> | ||||
<td align="center">3333</td> | ||||
<td align="center">1111</td> | ||||
<td align="center">2222</td> | ||||
<td align="left">non-repeat</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">2</td> | ||||
<td align="center">22</td> | ||||
<td align="center">1111</td> | ||||
<td align="center">3333</td> | ||||
<td align="center">2222</td> | ||||
<td align="left">repeat 2; swap 1 & 2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">3</td> | ||||
<td align="center">33</td> | ||||
<td align="center">2222</td> | ||||
<td align="center">1111</td> | ||||
<td align="center">3333</td> | ||||
<td align="left">repeat 3; rotate 3 to 1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">1</td> | ||||
<td align="center">0</td> | ||||
<td align="center">2221</td> | ||||
<td align="center">2222</td> | ||||
<td align="center">1111</td> | ||||
<td align="left">insert resolved offset</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="right">1</td> | ||||
<td align="center">0</td> | ||||
<td align="center">2222</td> | ||||
<td align="center">2221</td> | ||||
<td align="center">3333</td> | ||||
<td align="left">repeat 2</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
</section> | ||||
</section> | ||||
<section anchor="comp_skippable" numbered="true" toc="default"> | ||||
<name>Skippable Frames</name> | ||||
<table anchor="skippable"> | ||||
<name>Skippable Frames</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Magic_Number</th> | ||||
<th align="center">Frame_Size</th> | ||||
<th align="center">User_Data</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">4 bytes</td> | ||||
<td align="center">4 bytes</td> | ||||
<td align="center">n bytes</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Skippable frames allow the insertion of | ||||
user-defined metadata into a flow of concatenated | user-defined metadata into a flow of concatenated | |||
frames. </t> | frames. </t> | |||
<t> Skippable frames defined in this specification are | ||||
<t> Skippable frames defined in this specification are | ||||
compatible with skippable frames in | compatible with skippable frames in | |||
<xref target="LZ4"/>. </t> | <xref target="LZ4" format="default"/>. </t> | |||
<t> From a compliant decoder perspective, skippable | ||||
<t> From a compliant decoder perspective, skippable | ||||
frames simply need to be skipped, and their | frames simply need to be skipped, and their | |||
content ignored, resuming decoding after the | content ignored, resuming decoding after the | |||
skippable frame. </t> | skippable frame. </t> | |||
<t> It should be noted that a skippable frame can be | ||||
<t> It should be noted that a skippable frame can be | ||||
used to watermark a stream of concatenated frames | used to watermark a stream of concatenated frames | |||
embedding any kind of tracking information (even | embedding any kind of tracking information (even | |||
just a Universally Unique Identifier (UUID)). Users w ary of such possibility | just a Universally Unique Identifier (UUID)). Users w ary of such possibility | |||
should scan the stream of concatenated frames in | should scan the stream of concatenated frames in | |||
an attempt to detect such frames for analysis or | an attempt to detect such frames for analysis or | |||
removal. </t> | removal. </t> | |||
<t> The fields are: | ||||
</t> | ||||
<dl newline="false" spacing="normal"> | ||||
<dt>Magic_Number:</dt> | ||||
<t> The fields are: | <dd>4 bytes, little-endian format. Value: | |||
<list style="hanging"> | ||||
<t hangText="Magic_Number:"> | ||||
4 bytes, little-endian format. Value: | ||||
0x184D2A5?, which means any value from | 0x184D2A5?, which means any value from | |||
0x184D2A50 to 0x184D2A5F. All 16 | 0x184D2A50 to 0x184D2A5F. All 16 | |||
values are valid to identify a | values are valid to identify a | |||
skippable frame. This specification | skippable frame. This specification | |||
does not detail any specific tagging | does not detail any specific tagging | |||
methods for skippable frames. | methods for skippable frames. | |||
</t> | </dd> | |||
<dt>Frame_Size:</dt> | ||||
<t hangText="Frame_Size:"> | <dd> | |||
This is the size, in bytes, of the | This is the size, in bytes, of the | |||
following User_Data (without including | following User_Data (without including | |||
the magic number nor the size field | the magic number nor the size field | |||
itself). This field is represented | itself). This field is represented | |||
using 4 bytes, little-endian format, | using 4 bytes, little-endian format, | |||
unsigned 32 bits. This means User_Data | unsigned 32 bits. This means User_Data | |||
can't be bigger than (2^32-1) bytes. | can't be bigger than | |||
</t> | (2<sup>32</sup> -1) bytes. | |||
</dd> | ||||
<t hangText="User_Data:"> | <dt>User_Data:</dt> | |||
<dd> | ||||
This field can be anything. Data will | This field can be anything. Data will | |||
just be skipped by the decoder. | just be skipped by the decoder. | |||
</t> | </dd> | |||
</list> </t> | </dl> | |||
</section> | </section> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="comp_entropy" numbered="true" toc="default"> | ||||
<section anchor="comp_entropy" title="Entropy Encoding"> | <name>Entropy Encoding</name> | |||
<t> Two types of entropy encoding are used by the | <t> Two types of entropy encoding are used by the | |||
Zstandard format: FSE and Huffman coding. | Zstandard format: FSE and Huffman coding. | |||
Huffman is used to compress literals, while FSE | Huffman is used to compress literals, while FSE | |||
is used for all other symbols | is used for all other symbols | |||
(Literals_Length_Code, Match_Length_Code, and offset | (Literals_Length_Code, Match_Length_Code, and offset | |||
codes) and to compress Huffman headers.</t> | codes) and to compress Huffman headers.</t> | |||
<section anchor="comp_fse" numbered="true" toc="default"> | ||||
<section anchor="comp_fse" title="FSE"> | <name>FSE</name> | |||
<t> FSE, short for Finite State Entropy, is | <t> FSE, short for Finite State Entropy, is | |||
an entropy codec based on | an entropy codec based on | |||
<xref target="ANS"/>. | <xref target="ANS" format="default"/>. | |||
FSE encoding/decoding involves | FSE encoding/decoding involves | |||
a state that is carried over between | a state that is carried over between | |||
symbols, so decoding must be done in the | symbols, so decoding must be done in the | |||
opposite direction as encoding. Therefore, | opposite direction as encoding. Therefore, | |||
all FSE bitstreams are read from end to | all FSE bitstreams are read from end to | |||
beginning. Note that the order of the | beginning. Note that the order of the | |||
bits in the stream is not reversed; | bits in the stream is not reversed; | |||
they are simply read in the reverse | they are simply read in the reverse | |||
order from which they were written. </t> | order from which they were written. </t> | |||
<t> For additional details on FSE, see | ||||
"FiniteStateEntropy" <xref target="FSE" format="default"/>. </t> | ||||
<t> For additional details on FSE, see | <t> FSE decoding involves a decoding table | |||
Finite State Entropy | that has a power-of-2 size and contains | |||
<xref target="FSE"/>. </t> | ||||
<t> FSE decoding involves a decoding table | ||||
that has a power of 2 size and contains | ||||
three elements: Symbol, Num_Bits, and | three elements: Symbol, Num_Bits, and | |||
Baseline. The base 2 logarithm | Baseline. The base 2 logarithm | |||
of the table size is its Accuracy_Log. | of the table size is its Accuracy_Log. | |||
An FSE state value represents an index in | An FSE state value represents an index in | |||
this table. </t> | this table. </t> | |||
<t> To obtain the initial state value, | ||||
<t> To obtain the initial state value, | ||||
consume Accuracy_Log bits from the stream | consume Accuracy_Log bits from the stream | |||
as a little-endian value. The next symbol | as a little-endian value. The next symbol | |||
in the stream is the Symbol indicated in | in the stream is the Symbol indicated in | |||
the table for that state. To obtain the | the table for that state. To obtain the | |||
next state value, the decoder should | next state value, the decoder should | |||
consume Num_Bits bits from the stream as a | consume Num_Bits bits from the stream as a | |||
little-endian value and add it to | little-endian value and add it to | |||
Baseline. </t> | Baseline. </t> | |||
<section anchor="comp_fse_table" numbered="true" toc="default"> | ||||
<section anchor="comp_fse_table" | <name>FSE Table Description</name> | |||
title="FSE Table Description"> | <t> To decode FSE streams, it is necessary | |||
<t> To decode FSE streams, it is necessary | ||||
to construct the decoding table. The | to construct the decoding table. The | |||
Zstandard format encodes FSE table | Zstandard format encodes FSE table | |||
descriptions as described here. </t> | descriptions as described here. </t> | |||
<t> An FSE distribution table describes the | ||||
<t> An FSE distribution table describes the | ||||
probabilities of all symbols from 0 to | probabilities of all symbols from 0 to | |||
the last present one (included) on a | the last present one (included) on a | |||
normalized scale of | normalized scale of | |||
(1 << Accuracy_Log). | (1 << Accuracy_Log). | |||
Note that there must be two or | ||||
more symbols with non-zero probability. | ||||
</t> | ||||
<t> A bitstream is read forward, in | Note that there must be two or | |||
more symbols with nonzero probability. | ||||
</t> | ||||
<t> A bitstream is read forward, in | ||||
little-endian fashion. It is not | little-endian fashion. It is not | |||
necessary to know its exact size, | necessary to know its exact size, | |||
since the size will be discovered and | since the size will be discovered and | |||
reported by the decoding process. The | reported by the decoding process. The | |||
bitstream starts by reporting on which | bitstream starts by reporting on which | |||
scale it operates. If low4bits | scale it operates. If low4bits | |||
designates the lowest 4 bits of | designates the lowest 4 bits of | |||
the first byte, then | the first byte, then | |||
Accuracy_Log = low4bits + 5. </t> | Accuracy_Log = low4bits + 5. </t> | |||
<t> This is followed by each symbol value, | ||||
<t> This is followed by each symbol value, | ||||
from 0 to the last present one. The | from 0 to the last present one. The | |||
number of bits used by each field is | number of bits used by each field is | |||
variable and depends on: | variable and depends on: | |||
<list style="hanging"> | </t> | |||
<t hangText="Remaining probabilities | <dl newline="false" spacing="normal"> | |||
+ 1:"> | <dt>Remaining probabilities + 1:</dt> | |||
<dd> | ||||
For example, presuming an | For example, presuming an | |||
Accuracy_Log of 8, and | Accuracy_Log of 8, and | |||
presuming 100 probabilities | presuming 100 probabilities | |||
points have already been | points have already been | |||
distributed, the decoder may | distributed, the decoder may | |||
read any value from 0 to | read any value from 0 to | |||
(256 - 100 + 1) == 157, | (256 - 100 + 1) == 157, | |||
inclusive. Therefore, it must | inclusive. Therefore, it must | |||
read log2sup(157) == 8 | read log<sub>2</sub>sup(157) == 8 | |||
bits. </t> | bits. </dd> | |||
<dt>Value decoded:</dt> | ||||
<t hangText="Value decoded:"> | <dd> | |||
<t> | ||||
Small values use 1 fewer bit. | Small values use 1 fewer bit. | |||
For example, presuming values | For example, presuming values | |||
from 0 to 157 (inclusive) are | from 0 to 157, inclusive, are | |||
possible, 255 - 157 = 98 values | possible, 255 - 157 = 98 values | |||
are remaining in an 8-bit | are remaining in an 8-bit | |||
field. The first 98 values | field. The first 98 values | |||
(hence from 0 to 97) use only | (hence, from 0 to 97) use only | |||
7 bits, and values from 98 to | 7 bits, and values from 98 to | |||
157 use 8 bits. This is | 157 use 8 bits. This is | |||
achieved through this scheme: | achieved through the scheme in | |||
<figure><artwork> | <xref target="value" />: | |||
+------------+---------------+-----------+ | </t> | |||
| Value Read | Value Decoded | Bits Used | | ||||
+------------+---------------+-----------+ | ||||
| 0 - 97 | 0 - 97 | 7 | | ||||
+------------+---------------+-----------+ | ||||
| 98 - 127 | 98 - 127 | 8 | | ||||
+------------+---------------+-----------+ | ||||
| 128 - 225 | 0 - 97 | 7 | | ||||
+------------+---------------+-----------+ | ||||
| 226 - 255 | 128 - 157 | 8 | | ||||
+------------+---------------+-----------+ | ||||
</artwork></figure> </t> | ||||
</list></t> | ||||
<t> Symbol probabilities are read | <table anchor="value"> | |||
<name>Values Decoded</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Value Read</th> | ||||
<th align="center">Value Decoded</th> | ||||
<th align="center">Bits Used</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0 - 97</td> | ||||
<td align="center">0 - 97</td> | ||||
<td align="center">7</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">98 - 127</td> | ||||
<td align="center">98 - 127</td> | ||||
<td align="center">8</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">128 - 225</td> | ||||
<td align="center">0 - 97</td> | ||||
<td align="center">7</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">226 - 255</td> | ||||
<td align="center">128 - 157</td> | ||||
<td align="center">8</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
</dd> | ||||
</dl> | ||||
<t> Symbol probabilities are read | ||||
one by one, in order. The | one by one, in order. The | |||
probability is obtained from | probability is obtained from | |||
Value decoded using the | Value Decoded using the | |||
formula P = Value - 1. This | formula P = Value - 1. This | |||
means the value 0 becomes the | means the value 0 becomes the | |||
negative probability -1. This | negative probability -1. This | |||
is a special probability that | is a special probability that | |||
means "less than 1". Its | means "less than 1". Its | |||
effect on the distribution | effect on the distribution | |||
table is described below. For | table is described below. For | |||
the purpose of calculating | the purpose of calculating | |||
total allocated probability | total allocated probability | |||
points, it counts as 1. </t> | points, it counts as 1. </t> | |||
<t> When a symbol has a | ||||
<t> When a symbol has a | ||||
probability of zero, it is | probability of zero, it is | |||
followed by a 2-bit repeat | followed by a 2-bit repeat | |||
flag. This repeat flag tells | flag. This repeat flag tells | |||
how many probabilities of | how many probabilities of | |||
zeroes follow the current one. | zeroes follow the current one. | |||
It provides a number ranging | It provides a number ranging | |||
from 0 to 3. If it is a 3, | from 0 to 3. If it is a 3, | |||
another 2-bit repeat flag | another 2-bit repeat flag | |||
follows, and so on. </t> | follows, and so on. </t> | |||
<t> When the last symbol reaches | ||||
<t> When the last symbol reaches | ||||
a cumulated total of | a cumulated total of | |||
(1 << Accuracy_Lo g), | (1 << Accuracy_Lo g), | |||
decoding is complete. If the | decoding is complete. If the | |||
last symbol makes the cumulated | last symbol makes the cumulated | |||
total go above | total go above | |||
(1 << Accuracy_Log), | (1 << Accuracy_Log), | |||
distribution is considered | distribution is considered | |||
corrupted. </t> | corrupted. </t> | |||
<t> Finally, the decoder can tell | ||||
<t> Finally, the decoder can tell | ||||
how many bytes were used in | how many bytes were used in | |||
this process and how many | this process and how many | |||
symbols are present. The | symbols are present. The | |||
bitstream consumes a round | bitstream consumes a round | |||
number of bytes. Any remaining | number of bytes. Any remaining | |||
bit within the last byte is | bit within the last byte is | |||
simply unused. </t> | simply unused. </t> | |||
<t> The context in which the table | ||||
<t> The context in which the table | ||||
is to be used specifies an expected | is to be used specifies an expected | |||
number of symbols. That expected | number of symbols. That expected | |||
number of symbols never exceeds 256. | number of symbols never exceeds 256. | |||
If the number of symbols decoded | If the number of symbols decoded | |||
is not equal to the expected, the | is not equal to the expected, the | |||
header should be considered | header should be considered | |||
corrupt. </t> | corrupt. </t> | |||
<t> The distribution of normalized | ||||
<t> The distribution of normalized | ||||
probabilities is enough to | probabilities is enough to | |||
create a unique decoding | create a unique decoding | |||
table. The table has a size | table. The table has a size | |||
of (1 << Accuracy_Log). | of (1 << Accuracy_Log). | |||
Each cell describes the symbol | Each cell describes the symbol | |||
decoded and instructions to | decoded and instructions to | |||
get the next state. </t> | get the next state. </t> | |||
<t> Symbols are scanned in their | ||||
<t> Symbols are scanned in their | ||||
natural order for "less than 1" | natural order for "less than 1" | |||
probabilities as described | probabilities as described | |||
above. Symbols with this | above. Symbols with this | |||
probability are being | probability are being | |||
attributed a single cell, | attributed a single cell, | |||
starting from the end of the | starting from the end of the | |||
table and retreating. These | table and retreating. These | |||
symbols define a | symbols define a | |||
full state reset, reading | full state reset, reading | |||
Accuracy_Log bits. </t> | Accuracy_Log bits. </t> | |||
<t> All remaining symbols are | ||||
<t> All remaining symbols are | ||||
allocated in their natural | allocated in their natural | |||
order. Starting from symbol 0 | order. Starting from symbol 0 | |||
and table position 0, each | and table position 0, each | |||
symbol gets allocated as many | symbol gets allocated as many | |||
cells as its probability. Cell | cells as its probability. Cell | |||
allocation is spread, not | allocation is spread, not | |||
linear; each successor | linear; each successor | |||
position follows this rule: | position follows this rule: | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
position += (tableSize >> 1) + (tableSize >> 3) + 3; | position += (tableSize >> 1) + (tableSize >> 3) + 3; | |||
position &= tableSize - 1; | position &= tableSize - 1; | |||
</artwork></figure></t> | ]]></artwork> | |||
<t> A position is skipped if it is | ||||
<t> A position is skipped if it is | ||||
already occupied by a "less | already occupied by a "less | |||
than 1" probability symbol. | than 1" probability symbol. | |||
Position does not reset between | Position does not reset between | |||
symbols; it simply iterates | symbols; it simply iterates | |||
through each position in the | through each position in the | |||
table, switching to the next | table, switching to the next | |||
symbol when enough states have | symbol when enough states have | |||
been allocated to the current | been allocated to the current | |||
one. </t> | one. </t> | |||
<t> The result is a list of state | ||||
<t> The result is a list of state | ||||
values. Each state will decode | values. Each state will decode | |||
the current symbol. </t> | the current symbol. </t> | |||
<t> To get the Number_of_Bits and | ||||
<t> To get the Number_of_Bits and | ||||
Baseline required for the next | Baseline required for the next | |||
state, it is first necessary | state, it is first necessary | |||
to sort all states in their | to sort all states in their | |||
natural order. The lower | natural order. The lower | |||
states will need 1 more bit | states will need 1 more bit | |||
than higher ones. The process | than higher ones. The process | |||
is repeated for each symbol. | is repeated for each symbol. | |||
</t> | </t> | |||
<t> For example, presuming a symbol | ||||
<t> For example, presuming a symbol | ||||
has a probability of 5, it | has a probability of 5, it | |||
receives five state values. | receives five state values. | |||
States are sorted in natural | States are sorted in natural | |||
order. The next power of | order. The next power of | |||
2 is 8. The space of | 2 is 8. The space of | |||
probabilities is divided into | probabilities is divided into | |||
8 equal parts. Presuming the | 8 equal parts. Presuming the | |||
Accuracy_Log is 7, this | Accuracy_Log is 7, this | |||
defines 128 states, and each | defines 128 states, and each | |||
share (divided by 8) is 16 | share (divided by 8) is 16 | |||
in size. In order to reach | in size. In order to reach | |||
8, 8 - 5 = 3 lowest states will | 8, 8 - 5 = 3 lowest states will | |||
count "double", doubling the | count "double", doubling the | |||
number of shares (32 in width), | number of shares (32 in width), | |||
requiring 1 more bit in the | requiring 1 more bit in the | |||
process. </t> | process. </t> | |||
<t> Baseline is assigned starting | ||||
<t> Baseline is assigned starting | ||||
from the higher states using | from the higher states using | |||
fewer bits, and proceeding | fewer bits, and proceeding | |||
naturally, then resuming at | naturally, then resuming at | |||
the first state, each taking | the first state, each taking | |||
its allocated width from | its allocated width from | |||
Baseline. </t> | Baseline. </t> | |||
<t> <figure><artwork> | <table anchor="state"> | |||
+----------------+-------+-------+--------+------+-------+ | <name>Baseline Assignments</name> | |||
| state order | 0 | 1 | 2 | 3 | 4 | | <tbody> | |||
+----------------+-------+-------+--------+------+-------+ | <tr> | |||
| width | 32 | 32 | 32 | 16 | 16 | | <td align="center">state order</td> | |||
+----------------+-------+-------+--------+------+-------+ | <td align="center">0</td> | |||
| Number_of_Bits | 5 | 5 | 5 | 4 | 4 | | <td align="center">1</td> | |||
+----------------+-------+-------+--------+------+-------+ | <td align="center">2</td> | |||
| range number | 2 | 4 | 6 | 0 | 1 | | <td align="center">3</td> | |||
+----------------+-------+-------+--------+------+-------+ | <td align="center">4</td> | |||
| Baseline | 32 | 64 | 96 | 0 | 16 | | </tr> | |||
+----------------+-------+-------+--------+------+-------+ | <tr> | |||
| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 | | <td align="center">width</td> | |||
+----------------+-------+-------+--------+------+-------+ | <td align="center">32</td> | |||
</artwork></figure> </t> | <td align="center">32</td> | |||
<td align="center">32</td> | ||||
<td align="center">16</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">Number_of_Bits</td> | ||||
<td align="center">5</td> | ||||
<td align="center">5</td> | ||||
<td align="center">5</td> | ||||
<td align="center">4</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">range number</td> | ||||
<td align="center">2</td> | ||||
<td align="center">4</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">Baseline</td> | ||||
<td align="center">32</td> | ||||
<td align="center">64</td> | ||||
<td align="center">96</td> | ||||
<td align="center">0</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">range</td> | ||||
<td align="center">32-63</td> | ||||
<td align="center">64-95</td> | ||||
<td align="center">96-127</td> | ||||
<td align="center">0-15</td> | ||||
<td align="center">16-31</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> The next state is determined | <t> The next state is determined | |||
from the current state by | from the current state by | |||
reading the required | reading the required | |||
Number_of_Bits and adding the | Number_of_Bits and adding the | |||
specified Baseline. </t> | specified Baseline. </t> | |||
<t> See <xref target="app_tables" format="default"/> | ||||
<t> See <xref target="app_tables"/> | ||||
for the results of this | for the results of this | |||
process that are applied to the | process that are applied to the | |||
default distributions. </t> | default distributions. </t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="comp_huffman" numbered="true" toc="default"> | ||||
<section anchor="comp_huffman" title="Huffman Coding"> | <name>Huffman Coding</name> | |||
<t> Zstandard Huffman-coded streams are read | <t> Zstandard Huffman-coded streams are read | |||
backwards, similar to the FSE bitstreams. | backwards, similar to the FSE bitstreams. | |||
Therefore, to find the start of the | Therefore, to find the start of the | |||
bitstream, it is necessary to know the | bitstream, it is necessary to know the | |||
offset of the last byte of the | offset of the last byte of the | |||
Huffman-coded stream. </t> | Huffman-coded stream. </t> | |||
<t> After writing the last bit containing | ||||
<t> After writing the last bit containing | ||||
information, the compressor writes a | information, the compressor writes a | |||
single 1 bit and then fills the rest of | single 1 bit and then fills the rest of | |||
the byte with 0 bits. The last byte of | the byte with 0 bits. The last byte of | |||
the compressed bitstream cannot be 0 for | the compressed bitstream cannot be 0 for | |||
that reason. </t> | that reason. </t> | |||
<t> When decompressing, the last byte | ||||
<t> When decompressing, the last byte | ||||
containing the padding is the first byte | containing the padding is the first byte | |||
to read. The decompressor needs to skip | to read. The decompressor needs to skip | |||
the up to 7 bits of 0-padding as well as | the up to 7 bits of 0-padding as well as | |||
the first 1 bit that occurs. Afterwards, | the first 1 bit that occurs. Afterwards, | |||
the useful part of the bitstream | the useful part of the bitstream | |||
begins. </t> | begins. </t> | |||
<t> The bitstream contains Huffman-coded | ||||
<t> The bitstream contains Huffman-coded | ||||
symbols in little-endian order, with the | symbols in little-endian order, with the | |||
codes defined by the method below. </t> | codes defined by the method below. </t> | |||
<section anchor="huffman_tree_desc" numbered="true" toc="default"> | ||||
<section anchor="huffman_tree_desc" | <name>Huffman Tree Description</name> | |||
title="Huffman Tree Description"> | <t> Prefix coding represents symbols | |||
<t> Prefix coding represents symbols | ||||
from an a priori known alphabet by | from an a priori known alphabet by | |||
bit sequences (codewords), one | bit sequences (codewords), one | |||
codeword for each symbol, in a | codeword for each symbol, in a | |||
manner such that different symbols | manner such that different symbols | |||
may be represented by bit sequences | may be represented by bit sequences | |||
of different lengths, but a parser | of different lengths, but a parser | |||
can always parse an encoded string | can always parse an encoded string | |||
unambiguously | unambiguously, | |||
symbol by symbol. </t> | symbol by symbol. </t> | |||
<t> Given an alphabet with known symbol | ||||
<t> Given an alphabet with known symbol | ||||
frequencies, the Huffman algorithm | frequencies, the Huffman algorithm | |||
allows the construction of an | allows the construction of an | |||
optimal prefix code using the | optimal prefix code using the | |||
fewest bits of any possible prefix | fewest bits of any possible prefix | |||
codes for that alphabet. </t> | codes for that alphabet. </t> | |||
<t> The prefix code must not exceed a | ||||
<t> The prefix code must not exceed a | ||||
maximum code length. More bits | maximum code length. More bits | |||
improve accuracy but yield a larger | improve accuracy but yield a larger | |||
header size and require more | header size and require more | |||
memory or more complex decoding | memory or more complex decoding | |||
operations. This specification | operations. This specification | |||
limits the maximum code length to | limits the maximum code length to | |||
11 bits. </t> | 11 bits. </t> | |||
<t> All literal values from zero | ||||
<t> All literal values from zero | ||||
(included) to the last present one | (included) to the last present one | |||
(excluded) are represented by | (excluded) are represented by | |||
Weight with values from 0 to | Weight with values from 0 to | |||
Max_Number_of_Bits. Transformation | Max_Number_of_Bits. Transformation | |||
from Weight to Number_of_Bits | from Weight to Number_of_Bits | |||
follows this pseudocode: | follows this pseudocode: | |||
<figure><artwork> | </t> | |||
<sourcecode type="pseudocode"> | ||||
if Weight == 0 | if Weight == 0 | |||
Number_of_Bits = 0 | Number_of_Bits = 0 | |||
else | else | |||
Number_of_Bits = Max_Number_of_Bits + 1 - Weight | Number_of_Bits = Max_Number_of_Bits + 1 - Weight | |||
</artwork></figure> </t> | </sourcecode> | |||
<t> The last symbol's Weight is | ||||
<t> The last symbol's Weight is | ||||
deduced from previously decoded | deduced from previously decoded | |||
ones, by completing to the nearest | ones, by completing to the nearest | |||
power of 2. This power of 2 gives | power of 2. This power of 2 gives | |||
Max_Number_of_Bits the depth of | Max_Number_of_Bits the depth of | |||
the current tree. </t> | the current tree. </t> | |||
<t> For example, presume the following | ||||
<t> For example, presume the following | ||||
Huffman tree must be described: | Huffman tree must be described: | |||
<figure><artwork> | </t> | |||
+---------------+----------------+ | ||||
| Literal Value | Number_of_Bits | | ||||
+---------------+----------------+ | ||||
| 0 | 1 | | ||||
+---------------+----------------+ | ||||
| 1 | 2 | | ||||
+---------------+----------------+ | ||||
| 2 | 3 | | ||||
+---------------+----------------+ | ||||
| 3 | 0 | | ||||
+---------------+----------------+ | ||||
| 4 | 4 | | ||||
+---------------+----------------+ | ||||
| 5 | 4 | | ||||
+---------------+----------------+ | ||||
</artwork></figure> </t> | ||||
<t> The tree depth is 4, since its | <table anchor="Huffman"> | |||
<name>Huffman Tree</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Literal Value</th> | ||||
<th align="center">Number_of_Bits</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">5</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> The tree depth is 4, since its | ||||
longest element uses 4 bits. | longest element uses 4 bits. | |||
(The longest elements are those | (The longest elements are those | |||
with the smallest frequencies.) | with the smallest frequencies.) | |||
Value 5 will not be listed as it | Value 5 will not be listed as it | |||
can be determined from the values | can be determined from the values | |||
for 0-4, nor will values above 5 | for 0-4, nor will values above 5 | |||
as they are all 0. Values from 0 | as they are all 0. Values from 0 | |||
to 4 will be listed using Weight | to 4 will be listed using Weight | |||
instead of Number_of_Bits. The | instead of Number_of_Bits. The | |||
pseudocode to determine Weight is: | pseudocode to determine Weight is: | |||
<figure><artwork> | </t> | |||
<sourcecode type="pseudocode"> | ||||
if Number_of_Bits == 0 | if Number_of_Bits == 0 | |||
Weight = 0 | Weight = 0 | |||
else | else | |||
Weight = Max_Number_of_Bits + 1 - Number_of_Bits | Weight = Max_Number_of_Bits + 1 - Number_of_Bits | |||
</artwork></figure> </t> | </sourcecode> | |||
<t> It gives the following series of | ||||
<t> It gives the following series of | ||||
weights: | weights: | |||
<figure><artwork> | </t> | |||
+---------------+--------+ | ||||
| Literal Value | Weight | | ||||
+---------------+--------+ | ||||
| 0 | 4 | | ||||
+---------------+--------+ | ||||
| 1 | 3 | | ||||
+---------------+--------+ | ||||
| 2 | 2 | | ||||
+---------------+--------+ | ||||
| 3 | 0 | | ||||
+---------------+--------+ | ||||
| 4 | 1 | | ||||
+---------------+--------+ | ||||
</artwork></figure> </t> | ||||
<t> The decoder will do the inverse | <table anchor="weights"> | |||
<name>Weights</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Literal Value</th> | ||||
<th align="center">Weight</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> The decoder will do the inverse | ||||
operation: having collected weights | operation: having collected weights | |||
of literals from 0 to 4, it knows | of literals from 0 to 4, it knows | |||
the last literal, 5, is present | the last literal, 5, is present | |||
with a non-zero Weight. The Weight | with a nonzero Weight. The Weight | |||
of 5 can be determined by advancing | of 5 can be determined by advancing | |||
to the next power of 2. The sum of | to the next power of 2. The sum of | |||
2^(Weight-1) (excluding 0's) is 15. | 2<sup>(Weight-1)</sup> (excluding 0's ) is 15. | |||
The nearest power of 2 is 16. | The nearest power of 2 is 16. | |||
Therefore, Max_Number_of_Bits = 4 | Therefore, Max_Number_of_Bits = 4 | |||
and Weight[5] = 16 - 15 = 1. </t> | and Weight[5] = 16 - 15 = 1. </t> | |||
<section anchor="huffman_tree_header" numbered="true" toc="default"> | ||||
<section anchor="huffman_tree_header" | <name>Huffman Tree Header</name> | |||
title="Huffman Tree Header"> | <t> This is a single byte value | |||
<t> This is a single byte value | ||||
(0-255), which describes how | (0-255), which describes how | |||
the series of weights is | the series of weights is | |||
encoded. | encoded. | |||
<list style="hanging"> | </t> | |||
<t hangText="headerByte < | <dl newline="false" spacing="normal"> | |||
128:"> | <dt>headerByte < 128:</dt> | |||
<dd> | ||||
The series of weights | The series of weights | |||
is compressed using | is compressed using | |||
FSE (see below). The | FSE (see below). The | |||
length of the | length of the | |||
FSE-compressed series | FSE-compressed series | |||
is equal to headerByte | is equal to headerByte | |||
(0-127). </t> | (0-127). </dd> | |||
<dt>headerByte >= 128:</dt> | ||||
<t hangText="headerByte >= 12 | <dd> | |||
8:"> | <t> | |||
This is a direct | This is a direct | |||
representation, where | representation, where | |||
each Weight is written | each Weight is written | |||
directly as a 4-bit | directly as a 4-bit | |||
field (0-15). They are | field (0-15). They are | |||
encoded forward, 2 | encoded forward, 2 | |||
weights to a byte with | weights to a byte with | |||
the first weight taking | the first weight taking | |||
the top 4 bits and | the top 4 bits and | |||
the second taking the | the second taking the | |||
bottom 4; for example, th e | bottom 4; for example, th e | |||
following operations | following operations | |||
could be used to read | could be used to read | |||
the weights: | the weights: | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
Weight[0] = (Byte[0] >> 4) | Weight[0] = (Byte[0] >> 4) | |||
Weight[1] = (Byte[0] & 0xf), | Weight[1] = (Byte[0] & 0xf), | |||
etc. | etc. | |||
</artwork></figure> | ]]></artwork> | |||
<t> | ||||
The full representation | The full representation | |||
occupies | occupies | |||
ceiling(Number_of_Symbols /2) | ceiling(Number_of_Symbols /2) | |||
bytes, meaning it uses | bytes, meaning it uses | |||
only full bytes even | only full bytes even | |||
if Number_of_Symbols is | if Number_of_Symbols is | |||
odd. Number_of_Symbols | odd. Number_of_Symbols | |||
= headerByte - 127. | = headerByte - 127. | |||
Note that maximum | Note that maximum | |||
Number_of_Symbols is | Number_of_Symbols is | |||
255 - 127 = 128. If any | 255 - 127 = 128. If any | |||
literal | literal | |||
has a value over 128, | has a value over 128, | |||
raw header mode is not | raw header mode is not | |||
possible, and it is | possible, and it is | |||
necessary to use FSE | necessary to use FSE | |||
compression. </t> | compression. </t> | |||
</list></t> | </dd> | |||
</section> | </dl> | |||
</section> | ||||
<section anchor="huffman_tree_fse" | <section anchor="huffman_tree_fse" numbered="true" toc="default"> | |||
title="FSE Compression of Huffman | <name>FSE Compression of Huffman Weights</name> | |||
Weights"> | <t> In this case, the series of | |||
<t> In this case, the series of | ||||
Huffman weights is compressed | Huffman weights is compressed | |||
using FSE compression. It is a | using FSE compression. It is a | |||
single bitstream with two | single bitstream with two | |||
interleaved states, sharing a | interleaved states, sharing a | |||
single distribution table. </t> | single distribution table. </t> | |||
<t> To decode an FSE bitstream, it | ||||
<t> To decode an FSE bitstream, it | ||||
is necessary to know its | is necessary to know its | |||
compressed size. Compressed | compressed size. Compressed | |||
size is provided by headerByte. | size is provided by headerByte. | |||
It's also necessary to know its | It's also necessary to know its | |||
maximum possible decompressed | maximum possible decompressed | |||
size, which is 255, since | size, which is 255, since | |||
literal values span from 0 to | literal values span from 0 to | |||
255, and the last symbol's | 255, and the last symbol's | |||
Weight is not represented. </t> | Weight is not represented. </t> | |||
<t> An FSE bitstream starts by | ||||
<t> An FSE bitstream starts by | ||||
a header, describing | a header, describing | |||
probabilities distribution. It | probabilities distribution. It | |||
will create a decoding table. | will create a decoding table. | |||
For a list of Huffman weights, | For a list of Huffman weights, | |||
the maximum accuracy log is 6 | the maximum accuracy log is 6 | |||
bits. For more details, see | bits. For more details, see | |||
<xref target="comp_fse_table"/>. | <xref target="comp_fse_table" for | |||
</t> | mat="default"/>. | |||
</t> | ||||
<t> The Huffman header compression | <t> The Huffman header compression | |||
uses two states, which share | uses two states, which share | |||
the same FSE distribution | the same FSE distribution | |||
table. | table. | |||
The first state (State1) | The first state (State1) | |||
encodes the even-numbered index | encodes the even-numbered index | |||
symbols, and the second | symbols, and the second | |||
(State2) encodes the odd-numbered | (State2) encodes the odd-numbered | |||
index symbols. State1 is initiali zed | index symbols. State1 is initiali zed | |||
first, and then State2, and | first, and then State2, and | |||
they take turns decoding a | they take turns decoding a | |||
single symbol and updating | single symbol and updating | |||
their state. | their state. | |||
For more details | For more details | |||
on these FSE operations, see | on these FSE operations, see | |||
<xref target="comp_fse"/>. </t> | <xref target="comp_fse" format="d | |||
efault"/>. </t> | ||||
<t> The number of symbols to be | <t> The number of symbols to be | |||
decoded is determined by | decoded is determined by | |||
tracking the bitStream overflow | tracking the bitStream overflow | |||
condition: If updating state | condition: if updating state | |||
after decoding a symbol would | after decoding a symbol would | |||
require more bits than remain | require more bits than remain | |||
in the stream, it is assumed | in the stream, it is assumed | |||
that extra bits are zero. Then, | that extra bits are zero. Then, | |||
symbols for each of the | symbols for each of the | |||
final states are decoded and | final states are decoded and | |||
the process is complete.</t> | the process is complete.</t> | |||
</section> | </section> | |||
<section anchor="huffman_tree_conv" numbered="true" toc="default"> | ||||
<section anchor="huffman_tree_conv" | <name>Conversion from Weights to Huffman Prefix Codes</name> | |||
title="Conversion from Weights to | <t> All present symbols will now | |||
Huffman Prefix Codes"> | ||||
<t> All present symbols will now | ||||
have a Weight value. It is | have a Weight value. It is | |||
possible to transform weights | possible to transform weights | |||
into Number_of_Bits, using | into Number_of_Bits, using | |||
this formula: | this formula: | |||
<figure><artwork> | </t> | |||
<sourcecode type="pseudocode"> | ||||
if Weight > 0 | if Weight > 0 | |||
Number_of_Bits = Max_Number_of_Bits + 1 - Weight | Number_of_Bits = Max_Number_of_Bits + 1 - Weight | |||
else | else | |||
Number_of_Bits = 0 | Number_of_Bits = 0 | |||
</artwork></figure></t> | </sourcecode> | |||
<t> Symbols are sorted by Weight. | ||||
<t> Symbols are sorted by Weight. | ||||
Within the same Weight, symbols | Within the same Weight, symbols | |||
keep natural sequential | keep natural sequential | |||
order. Symbols | order. Symbols | |||
with a Weight of zero are | with a Weight of zero are | |||
removed. Then, starting from | removed. Then, starting from | |||
the | the | |||
lowest Weight, prefix codes | lowest Weight, prefix codes | |||
are distributed in sequential | are distributed in sequential | |||
order. </t> | order. </t> | |||
<t> For example, assume the | ||||
<t> For example, assume the | ||||
following list of weights | following list of weights | |||
has been decoded: | has been decoded: | |||
<figure><artwork> | </t> | |||
+---------+--------+ | ||||
| Literal | Weight | | ||||
+---------+--------+ | ||||
| 0 | 4 | | ||||
+---------+--------+ | ||||
| 1 | 3 | | ||||
+---------+--------+ | ||||
| 2 | 2 | | ||||
+---------+--------+ | ||||
| 3 | 0 | | ||||
+---------+--------+ | ||||
| 4 | 1 | | ||||
+---------+--------+ | ||||
| 5 | 1 | | ||||
+---------+--------+ | ||||
</artwork></figure> </t> | ||||
<t> Sorting by weight and then | <table anchor="decoded-weights"> | |||
<name>Decoded Weights</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Literal</th> | ||||
<th align="center">Weight</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">4</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">3</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">2</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">5</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> Sorting by weight and then | ||||
the natural sequential order | the natural sequential order | |||
yields the following | yields the following | |||
distribution: | distribution: | |||
<figure><artwork> | </t> | |||
+---------+--------+----------------+--------------+ | ||||
| Literal | Weight | Number_Of_Bits | Prefix Codes | | ||||
+---------+--------+----------------|--------------+ | ||||
| 3 | 0 | 0 | N/A | | ||||
+---------+--------+----------------|--------------+ | ||||
| 4 | 1 | 4 | 0000 | | ||||
+---------+--------+----------------|--------------+ | ||||
| 5 | 1 | 4 | 0001 | | ||||
+---------+--------+----------------|--------------+ | ||||
| 2 | 2 | 3 | 001 | | ||||
+---------+--------+----------------|--------------+ | ||||
| 1 | 3 | 2 | 01 | | ||||
+---------+--------+----------------|--------------+ | ||||
| 0 | 4 | 1 | 1 | | ||||
+---------+--------+----------------|--------------+ | ||||
</artwork></figure> </t> | ||||
</section> | ||||
</section> | ||||
<section anchor="huffman_coded_streams" | <table anchor="sorting-by-weight"> | |||
title="Huffman-Coded Streams"> | <name>Sorting by Weight</name> | |||
<t> Given a Huffman decoding table, it is | <thead> | |||
<tr> | ||||
<th align="center">Literal</th> | ||||
<th align="center">Weight</th> | ||||
<th align="center">Number_Of_Bits</th> | ||||
<th align="center">Prefix Codes</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
<td align="center">N/A</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="center">0000</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">5</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="center">0001</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">2</td> | ||||
<td align="center">3</td> | ||||
<td align="center">001</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">3</td> | ||||
<td align="center">2</td> | ||||
<td align="center">01</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">4</td> | ||||
<td align="center">1</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
</section> | ||||
</section> | ||||
<section anchor="huffman_coded_streams" numbered="true" toc="default"> | ||||
<name>Huffman-Coded Streams</name> | ||||
<t> Given a Huffman decoding table, it is | ||||
possible to decode a Huffman-coded | possible to decode a Huffman-coded | |||
stream. </t> | stream. </t> | |||
<t> Each bitstream must be read backward, starting from the end and go | ||||
<t> Each bitstream must be read backward, | ing up to | |||
which starts from the end and goes up to | ||||
the beginning. Therefore, it is | the beginning. Therefore, it is | |||
necessary to know the size of each | necessary to know the size of each | |||
bitstream. </t> | bitstream. </t> | |||
<t> It is also necessary to know exactly | ||||
<t> It is also necessary to know exactly | ||||
which bit is the last. This is | which bit is the last. This is | |||
detected by a final bit flag: the | detected by a final bit flag: the | |||
highest bit of the last byte is a | highest bit of the last byte is a | |||
final-bit-flag. Consequently, a last | final-bit-flag. Consequently, a last | |||
byte of 0 is not possible. And the | byte of 0 is not possible. And the | |||
final-bit-flag itself is not part of | final-bit-flag itself is not part of | |||
the useful bitstream. Hence, the last | the useful bitstream. Hence, the last | |||
byte contains between 0 and 7 useful | byte contains between 0 and 7 useful | |||
bits. </t> | bits. </t> | |||
<t> Starting from the end, it is possible | ||||
<t> Starting from the end, it is possible | ||||
to read the bitstream in a | to read the bitstream in a | |||
little-endian fashion, keeping track | little-endian fashion, keeping track | |||
of already used bits. Since the | of already used bits. Since the | |||
bitstream is encoded in reverse order, | bitstream is encoded in reverse order, | |||
starting from the end, read symbols in | starting from the end, read symbols in | |||
forward order. </t> | forward order. </t> | |||
<t> For example, if the literal sequence | ||||
<t> For example, if the literal sequence | ||||
"0145" was encoded using the above prefix | "0145" was encoded using the above prefix | |||
code, it would be encoded (in reverse | code, it would be encoded (in reverse | |||
order) as: | order) as: | |||
<figure><artwork> | </t> | |||
+---------+----------+ | ||||
| Symbol | Encoding | | ||||
+---------+----------+ | ||||
| 5 | 0000 | | ||||
+---------+----------+ | ||||
| 4 | 0001 | | ||||
+---------+----------+ | ||||
| 1 | 01 | | ||||
+---------+----------+ | ||||
| 0 | 1 | | ||||
+---------+----------+ | ||||
| Padding | 00001 | | ||||
+---------+----------+ | ||||
</artwork></figure></t> | ||||
<t> This results in the following 2-byte | <table anchor="coded-example"> | |||
<name>Literal Sequence "0145"</name> | ||||
<thead> | ||||
<tr> | ||||
<th align="center">Symbol</th> | ||||
<th align="center">Encoding</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">5</td> | ||||
<td align="center">0000</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td align="center">0001</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">01</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">1</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">Padding</td> | ||||
<td align="center">00001</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t> This results in the following 2-byte | ||||
bitstream: | bitstream: | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
00010000 00001101 | 00010000 00001101 | |||
</artwork></figure></t> | ]]></artwork> | |||
<t> Here is an alternative representation | ||||
<t> Here is an alternative representation | ||||
with the symbol codes separated by | with the symbol codes separated by | |||
underscores: | underscores: | |||
<figure><artwork> | </t> | |||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
0001_0000 00001_1_01 | 0001_0000 00001_1_01 | |||
</artwork></figure></t> | ]]></artwork> | |||
<t> Reading the highest Max_Number_of_Bits | ||||
<t> Reading the highest Max_Number_of_Bits | ||||
bits, it's possible to compare the | bits, it's possible to compare the | |||
extracted value to the decoding table, | extracted value to the decoding table, | |||
determining the symbol to decode and | determining the symbol to decode and | |||
number of bits to discard. </t> | number of bits to discard. </t> | |||
<t> The process continues reading up to | ||||
<t> The process continues reading up to | ||||
the required number of symbols per | the required number of symbols per | |||
stream. If a bitstream is not entirely | stream. If a bitstream is not entirely | |||
and exactly consumed, hence reaching | and exactly consumed, hence reaching | |||
exactly its beginning position with | exactly its beginning position with | |||
all bits consumed, the decoding process | all bits consumed, the decoding process | |||
is considered faulty. </t> | is considered faulty. </t> | |||
</section> | </section> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="comp_dict" numbered="true" toc="default"> | ||||
<section anchor="comp_dict" title="Dictionary Format"> | <name>Dictionary Format</name> | |||
<t> Zstandard is compatible with "raw content" | <t> Zstandard is compatible with "raw content" | |||
dictionaries, free of any format restriction, | dictionaries, free of any format restriction, | |||
except that they must be at least 8 bytes. | except that they must be at least 8 bytes. | |||
These dictionaries function as if they were just | These dictionaries function as if they were just | |||
the content part of a formatted dictionary. </t> | the content part of a formatted dictionary. </t> | |||
<t> However, dictionaries created by "zstd --train" | ||||
<t> However, dictionaries created by "zstd --train" | ||||
in the reference implementation follow a specific | in the reference implementation follow a specific | |||
format, described here. </t> | format, described here. </t> | |||
<t> Dictionaries are not included in the compressed | ||||
<t> Dictionaries are not included in the compressed | ||||
content but rather are provided out of band. | content but rather are provided out of band. | |||
That is, the Dictionary_ID identifies which should | That is, the Dictionary_ID identifies which should | |||
be used, but this specification does not describe | be used, but this specification does not describe | |||
the mechanism by which the dictionary is obtained | the mechanism by which the dictionary is obtained | |||
prior to use during compression or | prior to use during compression or | |||
decompression. </t> | decompression. </t> | |||
<t> A dictionary has a size, defined either by a | ||||
<t> A dictionary has a size, defined either by a | ||||
buffer limit or a file size. The general format | buffer limit or a file size. The general format | |||
is: | is: | |||
<figure><artwork> | </t> | |||
+--------------+---------------+----------------+---------+ | ||||
| Magic_Number | Dictionary_ID | Entropy_Tables | Content | | ||||
+--------------+---------------+----------------+---------+ | ||||
</artwork></figure> </t> | ||||
<t> <list style="hanging"> | <table anchor="dictionary"> | |||
<t hangText="Magic_Number:"> 4 bytes ID, | <name>Dictionary General Format</name> | |||
value 0xEC30A437, little-endian | <thead> | |||
format. </t> | <tr> | |||
<th>Magic_Number</th> | ||||
<th>Dictionary_ID</th> | ||||
<th>Entropy_Tables</th> | ||||
<th>Content</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td></td> | ||||
<td></td> | ||||
<td></td> | ||||
<td></td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t hangText="Dictionary_ID:"> 4 bytes, stored | <dl newline="false" spacing="normal"> | |||
<dt>Magic_Number:</dt> | ||||
<dd> 4 bytes ID, | ||||
value 0xEC30A437, little-endian | ||||
format. </dd> | ||||
<dt>Dictionary_ID:</dt> | ||||
<dd> | ||||
<t> 4 bytes, stored | ||||
in little-endian format. Dictionary_ID | in little-endian format. Dictionary_ID | |||
can be any value, except 0 (which | can be any value, except 0 (which | |||
means no Dictionary_ID). It is used by | means no Dictionary_ID). It is used by | |||
decoders to check if they use the | decoders to check if they use the | |||
correct dictionary. If the frame is | correct dictionary. If the frame is | |||
going to be distributed in a private | going to be distributed in a private | |||
environment, any Dictionary_ID can be | environment, any Dictionary_ID can be | |||
used. However, for public distribution | used. However, for public distribution | |||
of compressed frames, the following | of compressed frames, the following | |||
ranges are reserved and shall not be | ranges are reserved and shall not be | |||
used: | used: | |||
<list style="hanging"> | ||||
<?rfc subcompact="yes"?> | ||||
<t>low range: <= 32767</t> | ||||
<t>high range: >= (2^31)</t> | ||||
<?rfc subcompact="no"?> | ||||
</list> | ||||
</t> | </t> | |||
<dl newline="false" spacing="normal"> | ||||
<t hangText="Entropy_Tables:"> Follow the | <dt/> | |||
<dd>low range: <= 32767</dd> | ||||
<dt/> | ||||
<dd>high range: >= (2<sup>31</sup>)</dd> | ||||
</dl> | ||||
</dd> | ||||
<dt>Entropy_Tables:</dt> | ||||
<dd> Follow the | ||||
same format as the tables in | same format as the tables in | |||
compressed blocks. See the relevant | compressed blocks. See the relevant | |||
FSE and Huffman sections for how to | FSE and Huffman sections for how to | |||
decode these tables. They are stored | decode these tables. They are stored | |||
in the following order: Huffman table for | in the following order: Huffman table for | |||
literals, FSE table for offsets, FSE | literals, FSE table for offsets, FSE | |||
table for match lengths, and FSE table | table for match lengths, and FSE table | |||
for literals lengths. These tables | for literals lengths. These tables | |||
populate the Repeat Stats literals | populate the Repeat Stats literals | |||
mode and Repeat distribution mode for | mode and Repeat distribution mode for | |||
sequence decoding. It is finally | sequence decoding. It is finally | |||
followed by 3 offset values, | followed by 3 offset values, | |||
populating repeat offsets (instead of | populating repeat offsets (instead of | |||
using {1,4,8}), stored in order, | using {1,4,8}), stored in order, | |||
4-bytes little-endian each, for a | 4 bytes little-endian each, for a | |||
total of 12 bytes. Each repeat offset | total of 12 bytes. Each repeat offset | |||
must have a value less than the | must have a value less than the | |||
dictionary size. </t> | dictionary size. </dd> | |||
<dt>Content:</dt> | ||||
<t hangText="Content:"> The rest of the | <dd> The rest of the | |||
dictionary is its content. The content | dictionary is its content. The content | |||
acts as a "past" in front of data to be | acts as a "past" in front of data to be | |||
compressed or decompressed, so it can be | compressed or decompressed, so it can be | |||
referenced in sequence commands. As | referenced in sequence commands. As | |||
long as the amount of data decoded | long as the amount of data decoded | |||
from this frame is less than or equal | from this frame is less than or equal | |||
to Window_Size, sequence commands may | to Window_Size, sequence commands may | |||
specify offsets longer than the total | specify offsets longer than the total | |||
length of decoded output so far to | length of decoded output so far to | |||
reference back to the dictionary, | reference back to the dictionary, | |||
even parts of the dictionary with | even parts of the dictionary with | |||
offsets larger than Window_Size. | offsets larger than Window_Size. | |||
After the total output has surpassed | After the total output has surpassed | |||
Window_Size, however, this is no longer | Window_Size, however, this is no longer | |||
allowed, and the dictionary is no | allowed, and the dictionary is no | |||
longer accessible. </t> | longer accessible. </dd> | |||
</list> </t> | </dl> | |||
</section> | </section> | |||
<section anchor="dict_future" numbered="true" toc="default"> | ||||
<section anchor="dict_future" title="Use of Dictionaries"> | <name>Use of Dictionaries</name> | |||
<t> Provisioning for use of dictionaries with zstd is being | <t> Provisioning for use of dictionaries with zstd is being | |||
explored. See, for example, <xref target="DICT-SEC"/>. | explored. See, for example, <xref target="I-D.handte-httpbis | |||
-dict-sec" format="default"/>. | ||||
The likely outcome will be a registry of well-tested | The likely outcome will be a registry of well-tested | |||
dictionaries optimized for different use cases and | dictionaries optimized for different use cases and | |||
identifiers for each, possibly with a private negotiation | identifiers for each, possibly with a private negotiation | |||
mechanism for use of unregistered dictionaries. </t> | mechanism for use of unregistered dictionaries. </t> | |||
<t> To ensure compatibility with the | ||||
<t> To ensure compatibility with the | ||||
future specification of use of dictionaries with zstd | future specification of use of dictionaries with zstd | |||
payloads, especially with MIME, content encoded with the | payloads, especially with MIME, content encoded with the | |||
media type registered here should not use a dictionary. | media type registered here should not use a dictionary. | |||
The exception to this requirement might be a private | The exception to this requirement might be a private | |||
dictionary negotiaton, suggested above, which is not part | dictionary negotiation, suggested above, which is not part | |||
of this specification. </t> | of this specification. </t> | |||
</section> | </section> | |||
<section anchor="iana" numbered="true" toc="default"> | ||||
<section anchor="iana" title="IANA Considerations"> | <name>IANA Considerations</name> | |||
<t> IANA has updated two previously existing registrations and | <t> IANA has updated two previously existing registrations and | |||
made one new registration as described below. </t> | made one new registration as described below. </t> | |||
<section anchor="iana_media_type" numbered="true" toc="default"> | ||||
<section anchor="iana_media_type" | <name>The 'application/zstd' Media Type</name> | |||
title="The 'application/zstd' Media Type"> | <t> The 'application/zstd' media type identifies a | |||
<t> The 'application/zstd' media type identifies a | ||||
block of data that is compressed using zstd | block of data that is compressed using zstd | |||
compression. The data is a stream of bytes as | compression. The data is a stream of bytes as | |||
described in this document. IANA has | described in this document. IANA has | |||
added the following to the "Media Types" | added the following to the "Media Types" | |||
registry: | registry: | |||
<list style="hanging"> | </t> | |||
<t hangText="Type name:"> application </t> | <dl> | |||
<dt>Type name:</dt> | ||||
<t hangText="Subtype name:"> zstd </t> | <dd>application</dd> | |||
<dt>Subtype name:</dt> | ||||
<t hangText="Required parameters:"> N/A </t> | <dd>zstd</dd> | |||
<dt>Required parameters:</dt> | ||||
<t hangText="Optional parameters:"> N/A </t> | <dd>N/A</dd> | |||
<dt>Optional parameters:</dt> | ||||
<t hangText="Encoding considerations:"> | <dd>N/A</dd> | |||
<dt>Encoding considerations:</dt> | ||||
<dd> | ||||
binary | binary | |||
</t> | </dd> | |||
<dt>Security considerations:</dt> | ||||
<t hangText="Security considerations:"> | <dd> | |||
See <xref target="security"/> of | See <xref target="security" format="default"/> of | |||
[this document] | RFC 8878. | |||
</t> | </dd> | |||
<dt>Interoperability considerations:</dt> | ||||
<t hangText="Interoperability considerations:"> | <dd> | |||
N/A | N/A | |||
</t> | </dd> | |||
<dt>Published specification:</dt> | ||||
<t hangText="Published specification:"> | <dd> | |||
[this document] | RFC 8878 | |||
</t> | </dd> | |||
<dt>Applications which use this media type:</dt> | ||||
<t hangText="Applications that use this media typ | <dd> | |||
e:"> | ||||
anywhere data size is an issue | anywhere data size is an issue | |||
</t> | </dd> | |||
<dt>Fragment identifier considerations:</dt> | ||||
<t hangText="Fragment identifier considerations:" | <dd> | |||
> | ||||
No fragment identifiers are defined | No fragment identifiers are defined | |||
for this type. | for this type. | |||
</t> | </dd> | |||
<dt>Additional information:</dt> | ||||
<dd> | ||||
<t><br/></t> | ||||
<dl spacing="compact"> | ||||
<dt>Deprecated alias names for this type:</dt> | ||||
<dd> | ||||
N/A | ||||
</dd> | ||||
<dt>Magic number(s):</dt> | ||||
<dd> | ||||
4 bytes, little-endian format. Value: 0xFD2FB528 | ||||
</dd> | ||||
<t hangText="Additional information:"> | <dt>File extension(s):</dt> | |||
<list style="hanging"> | <dd> | |||
<t hangText="Magic number(s):"> | zst | |||
4 bytes, little-endian fo | </dd> | |||
rmat. Value: 0xFD2FB528 | <dt>Macintosh file type code(s):</dt> | |||
</t> | <dd> | |||
<!--note: changed 'zstd' to 'zst' per author request--> | N/A | |||
<t hangText="File extension(s):"> | </dd> | |||
zst | </dl> | |||
</t> | </dd> | |||
<t hangText="Macintosh file type | ||||
code(s):"> | ||||
N/A | ||||
</t> | ||||
</list> | ||||
</t> | ||||
<t hangText="For further information:"> | <dt>Person & email address to contact for further | |||
See <xref target="ZSTD"/> | information:</dt><dd>Yann Collet <cyan@fb.com></dd> | |||
</t> | ||||
<t hangText="Intended usage:"> | <dt>Intended usage:</dt> | |||
<dd> | ||||
common | common | |||
</t> | </dd> | |||
<dt>Restrictions on usage:</dt> | ||||
<t hangText="Restrictions on usage:"> | <dd> | |||
N/A | N/A | |||
</t> | </dd> | |||
<dt>Author:</dt> | ||||
<t hangText="Author:"> | <dd> | |||
Murray S. Kucherawy | Murray S. Kucherawy | |||
</t> | </dd> | |||
<dt>Change Controller:</dt> | ||||
<t hangText="Change Controller:"> | <dd> | |||
IETF | IETF | |||
</t> | </dd> | |||
<!--note: changed 'yes' to 'no' per author request--> | ||||
<t hangText="Provisional registration:"> | ||||
no | ||||
</t> | ||||
</list> </t> | ||||
</section> | ||||
<section anchor="iana_content_encoding" | <dt>Provisional registration:</dt> | |||
title="Content Encoding"> | <dd> | |||
<t> IANA has added the following entry | no | |||
</dd> | ||||
<dt>For further information:</dt> | ||||
<dd> | ||||
See <xref target="ZSTD" format="default"/ | ||||
> | ||||
</dd> | ||||
</dl> | ||||
</section> | ||||
<section anchor="iana_content_encoding" numbered="true" toc="default"> | ||||
<name>Content Encoding</name> | ||||
<t> IANA has added the following entry | ||||
to the "HTTP Content Coding Registry" | to the "HTTP Content Coding Registry" | |||
within the "Hypertext Transfer Protocol (HTTP) | within the "Hypertext Transfer Protocol (HTTP) | |||
Parameters" registry: | Parameters" registry: | |||
<list style="hanging"> | </t> | |||
<t hangText="Name:"> zstd </t> | <dl newline="false" spacing="normal"> | |||
<dt>Name:</dt> | ||||
<t hangText="Description:"> A stream of bytes | <dd> zstd </dd> | |||
<dt>Description:</dt> | ||||
<dd> A stream of bytes | ||||
compressed using the Zstandard | compressed using the Zstandard | |||
protocol </t> | protocol </dd> | |||
<dt>Reference:</dt> | ||||
<t hangText="Pointer to specification text:"> | <dd> | |||
[this document]</t> | RFC 8878</dd> | |||
</list> </t> | </dl> | |||
</section> | </section> | |||
<section anchor="iana_suffix" numbered="true" toc="default"> | ||||
<section anchor="iana_suffix" | <name>Structured Syntax Suffix</name> | |||
title="Structured Syntax Suffix"> | <t> IANA has registered the following | |||
<t> IANA is requested to register the following | into the "Structured Syntax Suffix" | |||
into the Structured Syntax Suffix registry: | registry: | |||
<list style="hanging"> | ||||
<t hangText="Name:"> Zstandard </t> | ||||
<t hangText="+suffix:"> +zstd </t> | ||||
<t hangText="Encoding Considerations:"> | ||||
binary </t> | ||||
<t hangText="Interoperability Considerations:"> | ||||
N/A </t> | ||||
<t hangText="Fragment Identifier Considerations:" | </t> | |||
> | <dl newline="false" spacing="normal"> | |||
<dt>Name:</dt> | ||||
<dd> Zstandard </dd> | ||||
<dt>+suffix:</dt> | ||||
<dd> +zstd </dd> | ||||
<dt>Encoding Considerations:</dt> | ||||
<dd> | ||||
binary </dd> | ||||
<dt>Interoperability Considerations:</dt> | ||||
<dd> | ||||
N/A </dd> | ||||
<dt>Fragment Identifier Considerations:</dt> | ||||
<dd> | ||||
The syntax and semantics of fragment | The syntax and semantics of fragment | |||
identifiers specified for +zstd should | identifiers specified for +zstd should | |||
be as specified for "application/zstd". | be as specified for 'application/zstd'. | |||
</t> | </dd> | |||
<dt>Security Considerations:</dt> | ||||
<t hangText="Security Considerations:"> | <dd> | |||
See <xref target="security"/> of | See <xref target="security" format="default"/> of | |||
[this document]. </t> | RFC 8878. </dd> | |||
<dt>Contact:</dt> | ||||
<t hangText="Contact:"> | <dd> | |||
Refer to the author for the | Refer to the author for the | |||
application/zstd media type. </t> | 'application/zstd' media type. </dd> | |||
<dt>Author/Change Controller:</dt> | ||||
<t hangText="Author/Change Controller:"> | <dd> | |||
IETF </t> | IETF </dd> | |||
</list> </t> | </dl> | |||
</section> | </section> | |||
<section anchor="iana_dict" numbered="true" toc="default"> | ||||
<section anchor="iana_dict" title="Dictionaries"> | <name>Dictionaries</name> | |||
<t> Work in progress includes | <t> Work in progress includes | |||
development of dictionaries that will optimize | development of dictionaries that will optimize | |||
compression and decompression of particular | compression and decompression of particular | |||
types of data. Specification of such | types of data. Specification of such | |||
dictionaries for public use will necessitate | dictionaries for public use will necessitate | |||
registration of a code point from the reserved | registration of a code point from the reserved | |||
range described in | range described in | |||
<xref target="comp_dictionary_id"/> and its | <xref target="comp_dictionary_id" format="default"/> and its | |||
association with a specific dictionary. </t> | association with a specific dictionary. </t> | |||
<t> At present, there are no such dictionaries | ||||
<t> However, there are at present no such dictionaries | published for public use, so this document | |||
published for public use, so this document makes | has made | |||
no immediate request of IANA to create such a | no immediate request of IANA to create such a | |||
registry. </t> | registry. </t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section anchor="security" numbered="true" toc="default"> | ||||
<name>Security Considerations</name> | ||||
<section anchor="security" title="Security Considerations"> | <t> Any data-compression method involves the reduction of | |||
<t> Any data compression method involves the reduction of | ||||
redundancy in the data. Zstandard is no exception, | redundancy in the data. Zstandard is no exception, | |||
and the usual precautions apply. </t> | and the usual precautions apply. </t> | |||
<t> One should never compress a message whose | ||||
<t> One should never compress a message whose | ||||
content must remain secret with a message generated by | content must remain secret with a message generated by | |||
a third party. Such a compression can be used to guess the | a third party. Such a compression can be used to guess the | |||
content of the secret message through analysis of | content of the secret message through analysis of | |||
entropy reduction. | entropy reduction. | |||
This was demonstrated in the Compression Ratio | This was demonstrated in the Compression Ratio | |||
Info-leak Made Easy (CRIME) attack <xref target="CRIME"/>, for example. </t> | Info-leak Made Easy (CRIME) attack <xref target="CRIME" format="default"/>, f | |||
or example. </t> | ||||
<t> A decoder has to demonstrate capabilities to detect | <t> A decoder has to demonstrate capabilities to detect | |||
and prevent any kind of data tampering in the compressed | and prevent any kind of data tampering in the compressed | |||
frame from triggering system faults, such as reading or | frame from triggering system faults, such as reading or | |||
writing beyond allowed memory ranges. This can be | writing beyond allowed memory ranges. This can be | |||
guaranteed by either the implementation language | guaranteed by either the implementation language | |||
or careful bound checkings. Of particular note is the | or careful bound checkings. Of particular note is the | |||
encoding of Number_of_Sequences values that cause the | encoding of Number_of_Sequences values that cause the | |||
decoder to read into the block header (and beyond), as | decoder to read into the block header (and beyond), as | |||
well as the indication of a Frame_Content_Size that is | well as the indication of a Frame_Content_Size that is | |||
smaller than the actual decompressed data, in an attempt | smaller than the actual decompressed data, in an attempt | |||
to trigger a buffer overflow. It is highly recommended | to trigger a buffer overflow. It is highly recommended | |||
to fuzz-test (i.e., provide invalid, unexpected, or | to fuzz-test (i.e., provide invalid, unexpected, or | |||
random input and verify safe operation of) decoder | random input and verify safe operation of) decoder | |||
implementations to test and harden their capability to | implementations to test and harden their capability to | |||
detect bad frames and deal with them without any adverse | detect bad frames and deal with them without any adverse | |||
system side effect. </t> | system side effect. </t> | |||
<t> An attacker may provide correctly formed compressed frames | ||||
<t> An attacker may provide correctly formed compressed frames | ||||
with unreasonable memory requirements. A decoder must | with unreasonable memory requirements. A decoder must | |||
always control memory requirements and enforce some | always control memory requirements and enforce some | |||
(system-specific) limits in order to protect memory usage | (system-specific) limits in order to protect memory usage | |||
from such scenarios. </t> | from such scenarios. </t> | |||
<t> Compression can be optimized by training a dictionary | ||||
<t> Compression can be optimized by training a dictionary | ||||
on a variety of related content payloads. This dictionary | on a variety of related content payloads. This dictionary | |||
must then be available at the decoder for decompression | must then be available at the decoder for decompression | |||
of the payload to be possible. While this document does | of the payload to be possible. While this document does | |||
not specify how to acquire a dictionary for a given | not specify how to acquire a dictionary for a given | |||
compressed payload, it is worth noting that third-party | compressed payload, it is worth noting that third-party | |||
dictionaries may interact unexpectedly with a decoder, | dictionaries may interact unexpectedly with a decoder, | |||
leading to possible memory or other resource exhaustion | leading to possible memory or other resource-exhaustion | |||
attacks. We expect such topics to be discussed in further | attacks. We expect such topics to be discussed in further | |||
detail in the Security Considerations section of a | detail in the Security Considerations section of a | |||
forthcoming RFC for dictionary acquisition and | forthcoming RFC for dictionary acquisition and | |||
transmission, but highlight this issue now out of an | transmission, but highlight this issue now out of an | |||
abundance of caution. </t> | abundance of caution. </t> | |||
<t> As discussed in <xref target="comp_skippable" format="default"/>, it i | ||||
<t> As discussed in <xref target="comp_skippable"/>, it is | s | |||
possible to store arbitrary user metadata in skippable | possible to store arbitrary user metadata in skippable | |||
frames. While such frames are ignored during decompression | frames. While such frames are ignored during decompression | |||
of the data, they can be used as a watermark to track | of the data, they can be used as a watermark to track | |||
the path of the compressed payload. </t> | the path of the compressed payload. </t> | |||
</section> | </section> | |||
<section anchor="impl" title="Implementation Status"> | ||||
<t> Source code for a C language implementation of a | ||||
Zstandard-compliant library is available at | ||||
<xref target="ZSTD-GITHUB"/>. This implementation is | ||||
considered to be the reference implementation and is | ||||
production ready; it implements the full range of the | ||||
specification. It is routinely tested against security | ||||
hazards and widely deployed within Facebook | ||||
infrastructure. </t> | ||||
<t> The reference version is optimized for speed and is highly | ||||
portable. It has been proven to run safely on multiple | ||||
architectures (e.g., x86, x64, ARM, MIPS, PowerPC, IA64) | ||||
featuring 32- or 64-bit addressing schemes, a little- or | ||||
big-endian storage scheme, a number of different operating | ||||
systems (e.g., UNIX (including Linux, BSD, OS-X, and | ||||
Solaris) and Windows), and a number of compilers (e.g., | ||||
gcc, clang, visual, and icc). </t> | ||||
<t> A comprehensive and current list of known implementations | ||||
can be found at <xref target="ZSTD"/>. </t> | ||||
</section> | ||||
</middle> | ||||
<back> | ||||
<references title="Normative References"> | ||||
<reference anchor="ZSTD" target="http://www.zstd.net"> | ||||
<front> | ||||
<title> Zstandard | ||||
</title> | ||||
<author fullname="Yann Collet"> | ||||
</author> | ||||
<date year="2017"/> | ||||
</front> | ||||
</reference> | ||||
</references> | </middle> | |||
<back> | ||||
<references title="Informative References"> | <displayreference target="I-D.handte-httpbis-dict-sec" to="DICT-SEC"/> | |||
<reference anchor="ANS" | ||||
target="https://arxiv.org/pdf/1311.2540"> | ||||
<front> | ||||
<title> Asymmetric numeral systems: entropy | ||||
coding combining speed of Huffman | ||||
coding with compression rate of | ||||
arithmetic coding | ||||
</title> | ||||
<author initials='J' surname='Duda' fullname='Ja | ||||
rek Duda'/> | ||||
<date month="January" year="2014"/> | ||||
</front> | ||||
</reference> | ||||
<reference anchor="DICT-SEC"> | <references> | |||
<front> | <name>References</name> | |||
<title> Security Considerations Regarding | <references> | |||
Compression Dictionaries </title> | <name>Normative References</name> | |||
<author initials='W' surname='Handte' fullname=' | ||||
W. Handte'/> | ||||
<date month="October" year="2019"/> | ||||
</front> | ||||
<seriesInfo name="(work in progress)" | ||||
value="draft-handte-httpbis-dict-sec"/> | ||||
</reference> | ||||
<reference anchor="LZ4" | <reference anchor="ZSTD" target="http://www.zstd.net"> | |||
target="https://github.com/lz4/lz4/blob/master/doc/lz4_Fr | <front> | |||
ame_format.md"> | <title> Zstandard | |||
<front> | </title> | |||
<title>LZ4 Frame Format Description</title> | <author /> | |||
<author fullname="Yann Collet"/> | </front> | |||
<date month="January" year="2018"/> | </reference> | |||
</front> | </references> | |||
</reference> | ||||
<reference anchor="FSE" | <references> | |||
target="https://github.com/Cyan4973/FiniteStateEntropy/"> | <name>Informative References</name> | |||
<front> | ||||
<title> FiniteStateEntropy | ||||
</title> | ||||
<author fullname="Yann Collet"/> | ||||
<date month="June" year="2018"/> | ||||
</front> | ||||
</reference> | ||||
<reference anchor="CRIME" | <reference anchor="ANS" target="https://arxiv.org/pdf/1311.2540"> | |||
target="https://en.wikipedia.org/w/index.php?title=CRIME& | <front> | |||
amp;oldid=844538656"> | <title> Asymmetric numeral systems: entropy | |||
<front> | coding combining speed of Huffman | |||
<title>CRIME | coding with compression rate of | |||
</title> | arithmetic coding | |||
<author/> | </title> | |||
<date month="June" year="2018"/> | <author initials="J" surname="Duda" fullname="Jarek Duda"/> | |||
</front> | <date month="January" year="2014"/> | |||
</reference> | </front> | |||
</reference> | ||||
<reference anchor="ZSTD-GITHUB" | <reference anchor='I-D.handte-httpbis-dict-sec'> | |||
target="https://github.com/facebook/zstd"> | <front> | |||
<front> | <title>Security Considerations Regarding Compression Dictionaries</title> | |||
<title>zstd | <author initials='F' surname='Handte' fullname='Felix Handte'> | |||
</title> | <organization /> | |||
</author> | ||||
<date month='October' day='29' year='2019' /> | ||||
</front> | ||||
<seriesInfo name='Internet-Draft' value='draft-handte-httpbis-dict-sec-00' /> | ||||
<format type='TXT' | ||||
target='http://www.ietf.org/internet-drafts/draft-handte-httpbis-dict-se | ||||
c-00.txt' /> | ||||
</reference> | ||||
<author fullname="Yann Collet"/> | <reference anchor="LZ4" target="https://github.com/lz4/lz4/blob/master/d | |||
oc/lz4_Frame_format.md"> | ||||
<front> | ||||
<title>LZ4 Frame Format Description</title> | ||||
<author /> | ||||
<date month="January" year="2019"/> | ||||
</front> | ||||
<refcontent>commit ec735ac</refcontent> | ||||
</reference> | ||||
<date month="August" year="2018"/> | <reference anchor="FSE" target="https://github.com/Cyan4973/FiniteStateE | |||
</front> | ntropy/"> | |||
</reference> | <front> | |||
<title> FiniteStateEntropy | ||||
</title> | ||||
<author /> | ||||
<date month="July" year="2020"/> | ||||
</front> | ||||
<refcontent>commit 12a533a</refcontent> | ||||
</reference> | ||||
&RFC1952; | <reference anchor="CRIME" target="https://en.wikipedia.org/w/index.php?t | |||
itle=CRIME&oldid=844538656"> | ||||
<front> | ||||
<title>CRIME | ||||
</title> | ||||
<author/> | ||||
<date month="June" year="2018"/> | ||||
</front> | ||||
</reference> | ||||
<reference anchor="XXHASH" target="http://www.xxhash.org"> | <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/refer | |||
<front> | ence.RFC.1952.xml"/> | |||
<title> XXHASH Algorithm | ||||
</title> | ||||
<author fullname="(unknown author)"/> | <reference anchor="XXHASH" target="http://www.xxhash.org"> | |||
<front> | ||||
<title>xxHash | ||||
</title> | ||||
<author /> | ||||
</front> | ||||
</reference> | ||||
<date year="2017"/> | <reference anchor="Err5786" target="https://www.rfc-editor.org/errata/eid5786"> | |||
</front> | <front> | |||
</reference> | <title>Erratum ID 5786</title> | |||
<author><organization>RFC Errata</organization></author> | ||||
</front> | ||||
<refcontent>RFC 8478</refcontent> | ||||
</reference> | ||||
</references> | <reference anchor="Err6303" target="https://www.rfc-editor.org/errata/eid6303"> | |||
<front> | ||||
<title>Erratum ID 6303</title> | ||||
<author><organization>RFC Errata</organization></author> | ||||
</front> | ||||
<refcontent>RFC 8478</refcontent> | ||||
</reference> | ||||
<section anchor="app_tables" | </references> | |||
title="Decoding Tables for Predefined Codes"> | </references> | |||
<t> This appendix contains FSE decoding tables for the | <section anchor="app_tables" numbered="true" toc="default"> | |||
predefined literal length, match length, and offset codes. | <name>Decoding Tables for Predefined Codes</name> | |||
<t> This appendix contains FSE decoding tables for the | ||||
predefined literals length, match length, and offset codes. | ||||
The tables have been constructed using the algorithm as | The tables have been constructed using the algorithm as | |||
given above in <xref target="comp_fse_table"/>. The tables he re can be used as examples | given above in <xref target="comp_fse_table" format="default" />. The tables here can be used as examples | |||
to crosscheck that an implementation has built its decoding | to crosscheck that an implementation has built its decoding | |||
tables correctly. </t> | tables correctly. </t> | |||
<section anchor="app_tables_literal" numbered="true" toc="default"> | ||||
<name>Literals Length Code Table</name> | ||||
<section anchor="app_tables_literal" | <table anchor="lit-length-code"> | |||
title="Literal Length Code Table"> | <name>Literals Length Code</name> | |||
<t> <figure><artwork> | <thead> | |||
+-------+--------+----------------+------+ | <tr> | |||
| State | Symbol | Number_Of_Bits | Base | | <th>State</th> | |||
+-------+--------+----------------+------+ | <th>Symbol</th> | |||
| 0 | 0 | 0 | 0 | | <th>Number_Of_Bits</th> | |||
+-------+--------+----------------+------+ | <th>Base</th> | |||
| 0 | 0 | 4 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | </thead> | |||
| 1 | 0 | 4 | 16 | | <tbody> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 2 | 1 | 5 | 32 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 3 | 3 | 5 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 4 | 4 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 5 | 6 | 5 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 6 | 7 | 5 | 0 | | <td align="center">4</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 7 | 9 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 8 | 10 | 5 | 0 | | <td align="center">1</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 9 | 12 | 5 | 0 | | <td align="center">4</td> | |||
+-------+--------+----------------+------+ | <td align="center">16</td> | |||
| 10 | 14 | 6 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 11 | 16 | 5 | 0 | | <td align="center">2</td> | |||
+-------+--------+----------------+------+ | <td align="center">1</td> | |||
| 12 | 18 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 13 | 19 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 14 | 21 | 5 | 0 | | <td align="center">3</td> | |||
+-------+--------+----------------+------+ | <td align="center">3</td> | |||
| 15 | 22 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 16 | 24 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 17 | 25 | 5 | 32 | | <td align="center">4</td> | |||
+-------+--------+----------------+------+ | <td align="center">4</td> | |||
| 18 | 26 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 19 | 27 | 6 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 20 | 29 | 6 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">6</td> | |||
| 21 | 31 | 6 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 22 | 0 | 4 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 23 | 1 | 4 | 0 | | <td align="center">6</td> | |||
+-------+--------+----------------+------+ | <td align="center">7</td> | |||
| 24 | 2 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 25 | 4 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 26 | 5 | 5 | 0 | | <td align="center">7</td> | |||
+-------+--------+----------------+------+ | <td align="center">9</td> | |||
| 27 | 7 | 5 | 32 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 28 | 8 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 29 | 10 | 5 | 32 | | <td align="center">8</td> | |||
+-------+--------+----------------+------+ | <td align="center">10</td> | |||
| 30 | 11 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 31 | 13 | 6 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 32 | 16 | 5 | 32 | | <td align="center">9</td> | |||
+-------+--------+----------------+------+ | <td align="center">12</td> | |||
| 33 | 17 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 34 | 19 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 35 | 20 | 5 | 0 | | <td align="center">10</td> | |||
+-------+--------+----------------+------+ | <td align="center">14</td> | |||
| 36 | 22 | 5 | 32 | | <td align="center">6</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 37 | 23 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 38 | 25 | 4 | 0 | | <td align="center">11</td> | |||
+-------+--------+----------------+------+ | <td align="center">16</td> | |||
| 39 | 25 | 4 | 16 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 40 | 26 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 41 | 28 | 6 | 0 | | <td align="center">12</td> | |||
+-------+--------+----------------+------+ | <td align="center">18</td> | |||
| 42 | 30 | 6 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 43 | 0 | 4 | 48 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 44 | 1 | 4 | 16 | | <td align="center">13</td> | |||
+-------+--------+----------------+------+ | <td align="center">19</td> | |||
| 45 | 2 | 5 | 32 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 46 | 3 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 47 | 5 | 5 | 32 | | <td align="center">14</td> | |||
+-------+--------+----------------+------+ | <td align="center">21</td> | |||
| 48 | 6 | 5 | 32 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 49 | 8 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 50 | 9 | 5 | 32 | | <td align="center">15</td> | |||
+-------+--------+----------------+------+ | <td align="center">22</td> | |||
| 51 | 11 | 5 | 32 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 52 | 12 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 53 | 15 | 6 | 0 | | <td align="center">16</td> | |||
+-------+--------+----------------+------+ | <td align="center">24</td> | |||
| 54 | 17 | 5 | 32 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 55 | 18 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 56 | 20 | 5 | 32 | | <td align="center">17</td> | |||
+-------+--------+----------------+------+ | <td align="center">25</td> | |||
| 57 | 21 | 5 | 32 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 58 | 23 | 5 | 32 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 59 | 24 | 5 | 32 | | <td align="center">18</td> | |||
+-------+--------+----------------+------+ | <td align="center">26</td> | |||
| 60 | 35 | 6 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 61 | 34 | 6 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 62 | 33 | 6 | 0 | | <td align="center">19</td> | |||
+-------+--------+----------------+------+ | <td align="center">27</td> | |||
| 63 | 32 | 6 | 0 | | <td align="center">6</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
</artwork></figure></t> | </tr> | |||
</section> | <tr> | |||
<td align="center">20</td> | ||||
<section anchor="app_tables_match" | <td align="center">29</td> | |||
title="Match Length Code Table"> | <td align="center">6</td> | |||
<t> <figure><artwork> | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| State | Symbol | Number_Of_Bits | Base | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">21</td> | |||
| 0 | 0 | 0 | 0 | | <td align="center">31</td> | |||
+-------+--------+----------------+------+ | <td align="center">6</td> | |||
| 0 | 0 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 1 | 1 | 4 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">22</td> | |||
| 2 | 2 | 5 | 32 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | <td align="center">4</td> | |||
| 3 | 3 | 5 | 0 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 4 | 5 | 5 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">23</td> | |||
| 5 | 6 | 5 | 0 | | <td align="center">1</td> | |||
+-------+--------+----------------+------+ | <td align="center">4</td> | |||
| 6 | 8 | 5 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 7 | 10 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">24</td> | |||
| 8 | 13 | 6 | 0 | | <td align="center">2</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 9 | 16 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 10 | 19 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">25</td> | |||
| 11 | 22 | 6 | 0 | | <td align="center">4</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 12 | 25 | 6 | 0 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 13 | 28 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">26</td> | |||
| 14 | 31 | 6 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 15 | 33 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 16 | 35 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">27</td> | |||
| 17 | 37 | 6 | 0 | | <td align="center">7</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 18 | 39 | 6 | 0 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 19 | 41 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">28</td> | |||
| 20 | 43 | 6 | 0 | | <td align="center">8</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 21 | 45 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 22 | 1 | 4 | 16 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">29</td> | |||
| 23 | 2 | 4 | 0 | | <td align="center">10</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 24 | 3 | 5 | 32 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 25 | 4 | 5 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">30</td> | |||
| 26 | 6 | 5 | 32 | | <td align="center">11</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 27 | 7 | 5 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 28 | 9 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">31</td> | |||
| 29 | 12 | 6 | 0 | | <td align="center">13</td> | |||
+-------+--------+----------------+------+ | <td align="center">6</td> | |||
| 30 | 15 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 31 | 18 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 32 | 21 | 6 | 0 | | <td align="center">16</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 33 | 24 | 6 | 0 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 34 | 27 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">33</td> | |||
| 35 | 30 | 6 | 0 | | <td align="center">17</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 36 | 32 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 37 | 34 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">34</td> | |||
| 38 | 36 | 6 | 0 | | <td align="center">19</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 39 | 38 | 6 | 0 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 40 | 40 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">35</td> | |||
| 41 | 42 | 6 | 0 | | <td align="center">20</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 42 | 44 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 43 | 1 | 4 | 32 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">36</td> | |||
| 44 | 1 | 4 | 48 | | <td align="center">22</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 45 | 2 | 4 | 16 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 46 | 4 | 5 | 32 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">37</td> | |||
| 47 | 5 | 5 | 32 | | <td align="center">23</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 48 | 7 | 5 | 32 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 49 | 8 | 5 | 32 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">38</td> | |||
| 50 | 11 | 6 | 0 | | <td align="center">25</td> | |||
+-------+--------+----------------+------+ | <td align="center">4</td> | |||
| 51 | 14 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 52 | 17 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">39</td> | |||
| 53 | 20 | 6 | 0 | | <td align="center">25</td> | |||
+-------+--------+----------------+------+ | <td align="center">4</td> | |||
| 54 | 23 | 6 | 0 | | <td align="center">16</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 55 | 26 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">40</td> | |||
| 56 | 29 | 6 | 0 | | <td align="center">26</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 57 | 52 | 6 | 0 | | <td align="center">32</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 58 | 51 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">41</td> | |||
| 59 | 50 | 6 | 0 | | <td align="center">28</td> | |||
+-------+--------+----------------+------+ | <td align="center">6</td> | |||
| 60 | 49 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
| 61 | 48 | 6 | 0 | | <tr> | |||
+-------+--------+----------------+------+ | <td align="center">42</td> | |||
| 62 | 47 | 6 | 0 | | <td align="center">30</td> | |||
+-------+--------+----------------+------+ | <td align="center">6</td> | |||
| 63 | 46 | 6 | 0 | | <td align="center">0</td> | |||
+-------+--------+----------------+------+ | </tr> | |||
</artwork></figure></t> | <tr> | |||
</section> | <td align="center">43</td> | |||
<td align="center">0</td> | ||||
<section anchor="app_tables_offset" | <td align="center">4</td> | |||
title="Offset Code Table"> | <td align="center">48</td> | |||
<t> <figure><artwork> | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| State | Symbol | Number_Of_Bits | Base | | <td align="center">44</td> | |||
+-------+--------+----------------+------+ | <td align="center">1</td> | |||
| 0 | 0 | 0 | 0 | | <td align="center">4</td> | |||
+-------+--------+----------------+------+ | <td align="center">16</td> | |||
| 0 | 0 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 1 | 6 | 4 | 0 | | <td align="center">45</td> | |||
+-------+--------+----------------+------+ | <td align="center">2</td> | |||
| 2 | 9 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 3 | 15 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 4 | 21 | 5 | 0 | | <td align="center">46</td> | |||
+-------+--------+----------------+------+ | <td align="center">3</td> | |||
| 5 | 3 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 6 | 7 | 4 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 7 | 12 | 5 | 0 | | <td align="center">47</td> | |||
+-------+--------+----------------+------+ | <td align="center">5</td> | |||
| 8 | 18 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 9 | 23 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 10 | 5 | 5 | 0 | | <td align="center">48</td> | |||
+-------+--------+----------------+------+ | <td align="center">6</td> | |||
| 11 | 8 | 4 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 12 | 14 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 13 | 20 | 5 | 0 | | <td align="center">49</td> | |||
+-------+--------+----------------+------+ | <td align="center">8</td> | |||
| 14 | 2 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 15 | 7 | 4 | 16 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 16 | 11 | 5 | 0 | | <td align="center">50</td> | |||
+-------+--------+----------------+------+ | <td align="center">9</td> | |||
| 17 | 17 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 18 | 22 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 19 | 4 | 5 | 0 | | <td align="center">51</td> | |||
+-------+--------+----------------+------+ | <td align="center">11</td> | |||
| 20 | 8 | 4 | 16 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 21 | 13 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 22 | 19 | 5 | 0 | | <td align="center">52</td> | |||
+-------+--------+----------------+------+ | <td align="center">12</td> | |||
| 23 | 1 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 24 | 6 | 4 | 16 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 25 | 10 | 5 | 0 | | <td align="center">53</td> | |||
+-------+--------+----------------+------+ | <td align="center">15</td> | |||
| 26 | 16 | 5 | 0 | | <td align="center">6</td> | |||
+-------+--------+----------------+------+ | <td align="center">0</td> | |||
| 27 | 28 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 28 | 27 | 5 | 0 | | <td align="center">54</td> | |||
+-------+--------+----------------+------+ | <td align="center">17</td> | |||
| 29 | 26 | 5 | 0 | | <td align="center">5</td> | |||
+-------+--------+----------------+------+ | <td align="center">32</td> | |||
| 30 | 25 | 5 | 0 | | </tr> | |||
+-------+--------+----------------+------+ | <tr> | |||
| 31 | 24 | 5 | 0 | | <td align="center">55</td> | |||
+-------+--------+----------------+------+ | <td align="center">18</td> | |||
</artwork></figure></t> | <td align="center">5</td> | |||
</section> | <td align="center">32</td> | |||
</section> | </tr> | |||
<tr> | ||||
<td align="center">56</td> | ||||
<td align="center">20</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">57</td> | ||||
<td align="center">21</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">58</td> | ||||
<td align="center">23</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">59</td> | ||||
<td align="center">24</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">60</td> | ||||
<td align="center">35</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">61</td> | ||||
<td align="center">34</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">62</td> | ||||
<td align="center">33</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">63</td> | ||||
<td align="center">32</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<section anchor="changes" title="Changes Since RFC8478"> | </tbody> | |||
<t> The following are the changes in this document relative | </table> | |||
to RFC 8478: | ||||
<list style="symbols"> | </section> | |||
<t> Apply erratum #5786. </t> | <section anchor="app_tables_match" numbered="true" toc="default"> | |||
<name>Match Length Code Table</name> | ||||
<t> Clarify forward compatibility regarding | <table anchor="match-length"> | |||
dictionaries. </t> | <name>Match Length Code Table</name> | |||
<thead> | ||||
<tr> | ||||
<th>State</th> | ||||
<th>Symbol</th> | ||||
<th>Number_Of_Bits</th> | ||||
<th>Base</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">2</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">3</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td align="center">5</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">5</td> | ||||
<td align="center">6</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">6</td> | ||||
<td align="center">8</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">7</td> | ||||
<td align="center">10</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">8</td> | ||||
<td align="center">13</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">9</td> | ||||
<td align="center">16</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">10</td> | ||||
<td align="center">19</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">11</td> | ||||
<td align="center">22</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">12</td> | ||||
<td align="center">25</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">13</td> | ||||
<td align="center">28</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">14</td> | ||||
<td align="center">31</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">15</td> | ||||
<td align="center">33</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">16</td> | ||||
<td align="center">35</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">17</td> | ||||
<td align="center">37</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">18</td> | ||||
<td align="center">39</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">19</td> | ||||
<td align="center">41</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">20</td> | ||||
<td align="center">43</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">21</td> | ||||
<td align="center">45</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">22</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">23</td> | ||||
<td align="center">2</td> | ||||
<td align="center">4</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">24</td> | ||||
<td align="center">3</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">25</td> | ||||
<td align="center">4</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">26</td> | ||||
<td align="center">6</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">27</td> | ||||
<td align="center">7</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">28</td> | ||||
<td align="center">9</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">29</td> | ||||
<td align="center">12</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">30</td> | ||||
<td align="center">15</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">31</td> | ||||
<td align="center">18</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">32</td> | ||||
<td align="center">21</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">33</td> | ||||
<td align="center">24</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">34</td> | ||||
<td align="center">27</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">35</td> | ||||
<td align="center">30</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">36</td> | ||||
<td align="center">32</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">37</td> | ||||
<td align="center">34</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">38</td> | ||||
<td align="center">36</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">39</td> | ||||
<td align="center">38</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">40</td> | ||||
<td align="center">40</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">41</td> | ||||
<td align="center">42</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">42</td> | ||||
<td align="center">44</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">43</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">44</td> | ||||
<td align="center">1</td> | ||||
<td align="center">4</td> | ||||
<td align="center">48</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">45</td> | ||||
<td align="center">2</td> | ||||
<td align="center">4</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">46</td> | ||||
<td align="center">4</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">47</td> | ||||
<td align="center">5</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">48</td> | ||||
<td align="center">7</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">49</td> | ||||
<td align="center">8</td> | ||||
<td align="center">5</td> | ||||
<td align="center">32</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">50</td> | ||||
<td align="center">11</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">51</td> | ||||
<td align="center">14</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">52</td> | ||||
<td align="center">17</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">53</td> | ||||
<td align="center">20</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">54</td> | ||||
<td align="center">23</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">55</td> | ||||
<td align="center">26</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">56</td> | ||||
<td align="center">29</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">57</td> | ||||
<td align="center">52</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">58</td> | ||||
<td align="center">51</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">59</td> | ||||
<td align="center">50</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">60</td> | ||||
<td align="center">49</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">61</td> | ||||
<td align="center">48</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">62</td> | ||||
<td align="center">47</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">63</td> | ||||
<td align="center">46</td> | ||||
<td align="center">6</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<t> Clarify application of Block_Maximum_Size. </t> | </tbody> | |||
</table> | ||||
<t> Add structured media type suffix registration. </t> | </section> | |||
<section anchor="app_tables_offset" numbered="true" toc="default"> | ||||
<name>Offset Code Table</name> | ||||
<t> Clarify that the content checksum is always | <table anchor="offset-code"> | |||
4 bytes. </t> | <name>Offset Code</name> | |||
<thead> | ||||
<tr> | ||||
<th>State</th> | ||||
<th>Symbol</th> | ||||
<th>Number_Of_Bits</th> | ||||
<th>Base</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">0</td> | ||||
<td align="center">0</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">1</td> | ||||
<td align="center">6</td> | ||||
<td align="center">4</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">2</td> | ||||
<td align="center">9</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">3</td> | ||||
<td align="center">15</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">4</td> | ||||
<td align="center">21</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">5</td> | ||||
<td align="center">3</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">6</td> | ||||
<td align="center">7</td> | ||||
<td align="center">4</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">7</td> | ||||
<td align="center">12</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">8</td> | ||||
<td align="center">18</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">9</td> | ||||
<td align="center">23</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">10</td> | ||||
<td align="center">5</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">11</td> | ||||
<td align="center">8</td> | ||||
<td align="center">4</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">12</td> | ||||
<td align="center">14</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">13</td> | ||||
<td align="center">20</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">14</td> | ||||
<td align="center">2</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">15</td> | ||||
<td align="center">7</td> | ||||
<td align="center">4</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">16</td> | ||||
<td align="center">11</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">17</td> | ||||
<td align="center">17</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">18</td> | ||||
<td align="center">22</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">19</td> | ||||
<td align="center">4</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">20</td> | ||||
<td align="center">8</td> | ||||
<td align="center">4</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">21</td> | ||||
<td align="center">13</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">22</td> | ||||
<td align="center">19</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">23</td> | ||||
<td align="center">1</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">24</td> | ||||
<td align="center">6</td> | ||||
<td align="center">4</td> | ||||
<td align="center">16</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">25</td> | ||||
<td align="center">10</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">26</td> | ||||
<td align="center">16</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">27</td> | ||||
<td align="center">28</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">28</td> | ||||
<td align="center">27</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">29</td> | ||||
<td align="center">26</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">30</td> | ||||
<td align="center">25</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<tr> | ||||
<td align="center">31</td> | ||||
<td align="center">24</td> | ||||
<td align="center">5</td> | ||||
<td align="center">0</td> | ||||
</tr> | ||||
<t> Clarify handling of reserved and corrupt | </tbody> | |||
inputs. </t> | </table> | |||
<t> Add fragment identifier considerations to the | </section> | |||
media type registration. </t> | </section> | |||
</list> </t> | <section anchor="changes" numbered="true" toc="default"> | |||
</section> | <name>Changes since RFC 8478</name> | |||
<t> The following are the changes in this document relative | ||||
to RFC 8478:</t> | ||||
<section anchor="ack" title="Acknowledgments"> | <ul spacing="normal"> | |||
<t> zstd was developed by Yann Collet. </t> | ||||
<t> Felix Handte and Nick Terrell provided feedback that went | <li> Applied errata <xref target="Err5786" format="default"/> and <xref tar | |||
into this revision and RFC 8478. RFC 8478 also received | get="Err6303" format="default"/>. </li> | |||
contributions from Bobo Bose-Kolanu, Kyle Nekritz, and | ||||
David Schleimer. </t> | ||||
</section> | ||||
</back> | <li> | |||
Clarified forward compatibility regarding dictionaries. </li> | ||||
<li> | ||||
Clarified application of Block_Maximum_Size. </li> | ||||
<li> Added | ||||
structured media type suffix registration. </li> | ||||
<li> Clarified | ||||
that the content checksum is always 4 bytes. </li> | ||||
<li> Clarified | ||||
handling of reserved and corrupt inputs. </li> | ||||
<li> Added fragment | ||||
identifier considerations to the media type registration. </li> | ||||
</ul> | ||||
</section> | ||||
</rfc> | <section anchor="ack" numbered="false" | |||
toc="default"> <name>Acknowledgments</name> | ||||
<t>zstd was | ||||
developed by <contact fullname="Yann Collet"/>. </t> <t><contact fullname= | ||||
"Felix | ||||
Handte"/> and <contact fullname="Nick Terrell"/> provided | ||||
feedback that went into this revision and RFC 8478. RFC 8478 | ||||
also received contributions from <contact fullname="Bobo | ||||
Bose-Kolanu" />, <contact fullname="Kyle Nekritz" />, and | ||||
<contact fullname="David Schleimer"/>. </t> </section> </back> | ||||
</rfc> | ||||
End of changes. 377 change blocks. | ||||
2472 lines changed or deleted | 3514 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |