<?xmlversion="1.0" encoding="US-ASCII"?>version='1.0' encoding='utf-8'?> <!DOCTYPERFC SYSTEM "rfc2629.dtd"rfc [ <!ENTITYRFC4648 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4648.xml">nbsp " "> <!ENTITYRFC8174 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">zwsp "​"> <!ENTITYRFC2119 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">nbhy "‑"> <!ENTITY wj "⁠"> ]><?xml-stylesheet type='text/xsl' href='RFC2629.xslt' ?> <?rfc compact="yes"?> <?rfc toc="yes"?> <?rfc symrefs="yes"?> <?rfc sortrefs="yes"?> <!-- Expand crefs and put them inline --> <?rfc comments='yes' ?> <?rfc inline='yes' ?><rfc number="9285" docName="draft-faltstrom-base45-12" xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" submissionType="IETF"category="info">category="info" obsoletes="" updates="" consensus="true" xml:lang="en" tocInclude="true" symRefs="true" sortRefs="true" version="3"> <!-- xml2rfc v2v3 conversion 3.12.10 --> <front> <title abbrev="Base45"> The Base45 Data Encoding </title> <seriesInfo name="RFC" value="9285" /> <author fullname="PatrikFaltstrom"Fältström" initials="P."surname="Faltstrom"> <organization abbrev="Netnod">Netnod</organization>surname="Fältström"> <organization>Netnod</organization> <address> <email>paf@netnod.se</email> </address> </author> <author fullname="Fredrik Ljunggren" initials="F." surname="Ljunggren"><organization abbrev="Kirei">Kirei</organization><organization>Kirei</organization> <address> <email>fredrik@kirei.se</email> </address> </author> <author fullname="Dirk-Willem van Gulik"initials="D."initials="D.W." surname="van Gulik"><organization abbrev="Webweaving">Webweaving</organization><organization>Webweaving</organization> <address> <email>dirkx@webweaving.org</email> </address> </author> <datemonth="June" year="2022" day="16"/> <area>Operations</area>month="August" year="2022"/> <keyword>BASE45</keyword> <abstract> <t> This document describes the Base45 encodingschemescheme, which is built upon the Base64,Base32Base32, and Base16 encoding schemes. </t> </abstract> </front> <middle> <section anchor="intro"title="Introduction">numbered="true" toc="default"> <name>Introduction</name> <t> AQR-codeQR code is used to encode text as a graphical image. Depending on the characters used in thetexttext, various encoding options for aQR-codeQR code exist,e.g.e.g., Numeric,AlphanumericAlphanumeric, and Byte mode. Even in Bytemodemode, a typicalQR-codeQR code reader tries to interpret a byte sequence asatext encoded in UTF-8 or ISO/IEC8859-1 encoded text.8859-1. Thus,QR-codesQR codes cannot be used to encode arbitrary binary data directly. Such data has to be converted into an appropriate text before that text could be encoded as aQR-code.QR code. Compared to already established Base64,Base32Base32, and Base16 encodingschemes,schemes that are described in <xreftarget="RFC4648">RFC 4648</xref>,target="RFC4648" format="default"></xref>, the Base45 scheme described in this documentofferoffers a more compactQR-codeQR code encoding. </t> <t> One important difference from those others and Base45 is the key table and that the padding with '=' is not required. </t> </section> <sectiontitle="Conventionsnumbered="true" toc="default"> <name>Conventions Used in ThisDocument">Document</name> <t> The key words"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY","<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and"OPTIONAL""<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as described inBCP 14BCP 14 <xreftarget="RFC2119" />target="RFC2119"/> <xreftarget="RFC8174" />target="RFC8174"/> when, and only when, they appear in all capitals, as shown here. </t> </section> <sectiontitle="Interpretationnumbered="true" toc="default"> <name>Interpretation of EncodedData">Data</name> <t> Encoded data is to be interpreted as described in <xreftarget="RFC4648">RFC 4648</xref>target="RFC4648" format="default"></xref> with the exception that a different alphabet is selected. </t> </section> <sectiontitle="Thenumbered="true" toc="default"> <name>The Base45Encoding">Encoding</name> <t> QR codes have a limited ability to store binary data. Inpracticepractice, binary data have to be encoded in characters according to one of the modes already defined in the standard for QR codes. The easiest mode to use in called Alphanumeric mode (seesectionSection 7.3.4 and Table 2 of <xreftarget="ISO18004">ISO/IEC 18004:2015</xref>).target="ISO18004" format="default"></xref>. Unfortunately Alphanumeric mode uses 45 different characters which implies neither Base32 nor Base64 are very effective encodings. </t> <t> A 45-character subset of US-ASCII is used; the 45 characters usable in a QR code in Alphanumeric mode (seesectionSection 7.3.4 and Table 2 of <xreftarget="ISO18004">ISO/IEC 18004:2015</xref>).target="ISO18004" format="default"></xref>). Base45 encodes 2 bytes in 3 characters, compared to Base64, which encodes 3 bytes in 4 characters. </t> <t> For encoding, two bytes [a, b]MUST<bcp14>MUST</bcp14> be interpreted as a number n in base 256, i.e. as an unsigned integer over 16 bits so that the number n =(a*256)(a * 256) + b. </t> <t> This number n is converted to base 45 [c, d, e] so that n = c +(d*45)(d * 45) +(e*45*45).(e * 45 * 45). Note the order of c, d and e which are chosen so that the left-most [c] is the least significant. </t> <t> The values c, d and e are then looked up inTable 1<xref target="table1"/> to produce a three character string. The process is reversed when decoding. </t> <t> For encoding a single byte [a], itMUST<bcp14>MUST</bcp14> be interpreted as a base 256 number, i.e. as an unsigned integer over 8 bits. That integerMUST<bcp14>MUST</bcp14> be converted to base 45 [c d] so that a = c +(45*d).(45 * d). The values c and d are then looked up inTable 1<xref target="table1"/> to produce atwo charactertwo-character string. </t> <t> A byte string [a b c d ... x y z] with arbitrary content and arbitrary lengthMUST<bcp14>MUST</bcp14> be encoded as follows: From left to right pairs of bytesMUST<bcp14>MUST</bcp14> be encoded as described above. If the number of bytes is even, then the encoded form is a string with a lengthwhichthat is evenly divisible by 3. If the number of bytes is odd, then the last (rightmost) byteMUST<bcp14>MUST</bcp14> be encoded on two characters as described above. </t> <t> For decoding a Base45 encoded string the inverse operations are performed. </t> <sectiontitle="When to, and not to, use Base45">numbered="true" toc="default"> <name>When to Use and Not Use Base45</name> <t> If binary data is to be stored in aQR-Code,QR code, the suggested mechanism is to use the Alphanumeric mode that uses 11 bits for 2 characters as defined insectionSection 7.3.4inof <xreftarget="ISO18004">ISO/IEC 18004:2015</xref>.target="ISO18004" format="default"></xref>. TheECIExtended Channel Interpretation (ECI) mode indicator for this encoding is 0010. </t> <t> On the other hand if the data is to be sent via some other transport, a transport encoding suitable for that transport should be used instead of Base45. For example, it is not recommended to first encode data in Base45 and then encode the resulting string in Base64 if the data is to be sent via email. Instead, the Base45 encoding should be removed, and the data itself should be encoded in Base64. </t> </section> <sectiontitle="The alphabet usednumbered="true" toc="default"> <name>The Alphabet Used inBase45">Base45</name> <t> The Alphanumeric mode is defined to use 45 characters as specified in this alphabet. </t><t> <figure><artwork> Table 1: The<table anchor="table1"> <name>The Base45Alphabet Value Encoding Value Encoding Value Encoding Value Encoding 00 0 12 C 24 O 36 Space 01 1 13 D 25 P 37 $ 02 2 14 E 26 Q 38 % 03 3 15 F 27 R 39 * 04 4 16 G 28 S 40 + 05 5 17 H 29 T 41 - 06 6 18 I 30 U 42 . 07 7 19 J 31 V 43 / 08 8 20 K 32 W 44 : 09 9 21 L 33 X 10 A 22 M 34 Y 11 B 23 N 35 Z </artwork></figure> </t>Alphabet</name> <thead> <tr> <th align="right">Value</th><th>Encoding</th> <th align="right">Value</th><th>Encoding</th> <th align="right">Value</th><th>Encoding</th> <th align="right">Value</th><th>Encoding</th> </tr> </thead> <tbody> <tr> <td align="right">00</td><td>0</td> <td align="right">12</td><td>C</td> <td align="right">24</td><td>O</td> <td align="right">36</td><td>Space</td> </tr> <tr> <td align="right">01</td><td>1</td> <td align="right">13</td><td>D</td> <td align="right">25</td><td>P</td> <td align="right">37</td><td>$</td> </tr> <tr> <td align="right">02</td><td>2</td> <td align="right">14</td><td>E</td> <td align="right">26</td><td>Q</td> <td align="right">38</td><td>%</td> </tr> <tr> <td align="right">03</td><td>3</td> <td align="right">15</td><td>F</td> <td align="right">27</td><td>R</td> <td align="right">39</td><td>*</td> </tr> <tr> <td align="right">04</td><td>4</td> <td align="right">16</td><td>G</td> <td align="right">28</td><td>S</td> <td align="right">40</td><td>+</td> </tr> <tr> <td align="right">05</td><td>5</td> <td align="right">17</td><td>H</td> <td align="right">29</td><td>T</td> <td align="right">41</td><td>-</td> </tr> <tr> <td align="right">06</td><td>6</td> <td align="right">18</td><td>I</td> <td align="right">30</td><td>U</td> <td align="right">42</td><td>.</td> </tr> <tr> <td align="right">07</td><td>7</td> <td align="right">19</td><td>J</td> <td align="right">31</td><td>V</td> <td align="right">43</td><td>/</td> </tr> <tr> <td align="right">08</td><td>8</td> <td align="right">20</td><td>K</td> <td align="right">32</td><td>W</td> <td align="right">44</td><td>:</td> </tr> <tr> <td align="right">09</td><td>9</td> <td align="right">21</td><td>L</td> <td align="right">33</td><td>X</td> <td></td><td></td> </tr> <tr> <td align="right">10</td><td>A</td> <td align="right">22</td><td>M</td> <td align="right">34</td><td>Y</td> <td></td><td></td> </tr> <tr> <td align="right">11</td><td>B</td> <td align="right">23</td><td>N</td> <td align="right">35</td><td>Z</td> <td></td><td></td> </tr> </tbody> </table> </section> <sectiontitle="Encoding examples">numbered="true" toc="default"> <name>Encoding Examples</name> <t> It should be noted that although the examples are all text, Base45 is an encoding for binary data where each octet can have any value 0-255. </t><t> Encoding<t>Encoding example1:1:</t> <t indent="3"> The string "AB" is the byte sequence[65 66]. The[[65 66]]. If we look at all 16bit value isbits, we get 65 * 256 + 66 = 16706. 16706 equals 11 +45(11 *1145) +45(8 * 45 *8,45), so the sequence in base 45 is [11 11 8].By looking up these values in the Table 1Referring to <xref target="table1"/>, we get the encoded string "BB8". </t><t> Encoding<table> <name>Example 1 in Detail</name> <tbody> <tr> <td>AB</td> <td>Initial string</td> </tr> <tr> <td>[[65 66]]</td> <td>Decimal value</td> </tr> <tr> <td>[16706]</td> <td>Value in base 16</td> </tr> <tr> <td>[11 11 8]</td> <td>Value in base 45</td> </tr> <tr> <td>BB8</td> <td>Encoded string</td> </tr> </tbody> </table> <t>Encoding example2:2:</t> <t indent="3"> The string "Hello!!" as ASCII is the byte sequence[72 101 108 108 111 33 33].[[72 101] [108 108] [111 33] [33]]. If we look ateachthis 16bit value, it isbits at a time, we get [18533 27756 28449 33]. Note the 33 for the last byte. When looking at the values in base 45, we get [[38 6 9] [36 31 13] [9 2 14] [330]]0]], where the last byte is represented bytwo.two values. The resulting string "%69 VD92EX0" is created by looking up these values inTable 1.<xref target="table1"/>. It should be noted it includes a space. </t><t> Encoding<table> <name>Example 2 in Detail</name> <tbody> <tr> <td>Hello!!</td> <td>Initial string</td> </tr> <tr> <td>[[72 101] [108 108] [111 33] [33]]</td> <td>Decimal value</td> </tr> <tr> <td>[18533 27756 28449 33]</td> <td>Value in base 16</td> </tr> <tr> <td>[[38 6 9] [36 31 13] [9 2 14] [33 0]]</td> <td>Value in base 45</td> </tr> <tr> <td>%69 VD92EX0</td> <td>Encoded string</td> </tr> </tbody> </table> <t>Encoding example3:3:</t> <t indent="3"> The string "base-45" as ASCII is the byte sequence[98 97 115 101 45 52 53].[[98 97] [115 101] [45 52] [53]]. If we look ateach 16 bit value, it isthis two bytes at a time, we get [25185 29541 11572 53]. Note the 53 for the last byte. When looking at the values in base 45, we get [[30 19 12] [21 26 14] [7 32 5] [8 1]] where the last byte is represented bytwo. By looking up these values in the Table 1two values. Referring to <xref target="table1"/>, we get the encoded string "UJCLQE7W581". </t> <table> <name>Example 3 in Detail</name> <tbody> <tr> <td>base-45</td> <td>Initial string</td> </tr> <tr> <td>[[98 97] [115 101] [45 52] [53]]</td> <td>Decimal value</td> </tr> <tr> <td>[25185 29541 11572 53]</td> <td>Value in base 16</td> </tr> <tr> <td>[[30 19 12] [21 26 14] [7 32 5] [8 1]]</td> <td>Value in base 45</td> </tr> <tr> <td>UJCLQE7W581</td> <td>Encoded string</td> </tr> </tbody> </table> </section> <sectiontitle="Decoding examples"> <t> Decodingnumbered="true" toc="default"> <name>Decoding Example</name> <t>Decoding example1:1:</t> <t indent="3"> The string "QED8WEX0" represents, when looked up in Table 1, the values [26 14 13 8 32 14 33 0]. We arrange the numbers in chunks of three, except for the last one which can betwo,two numbers, and get [[26 14 13] [8 32 14] [33 0]]. In base4545, we get [26981 29798 33] where the bytes are [[105 101] [116 102] [33]]. If we look at the ASCIIvaluesvalues, we get the string "ietf!". </t> <table> <name>Example 4 in Detail</name> <tbody> <tr> <td>QED8WEX0</td> <td>Initial string</td> </tr> <tr> <td>[26 14 13 8 32 14 33 0]</td> <td>Looked up values</td> </tr> <tr> <td>[[26 14 13] [8 32 14] [33 0]]</td> <td>Groups of three</td> </tr> <tr> <td>[26981 29798 33]</td> <td>Interpreted as base 45</td> </tr> <tr> <td>[[105 101] [116 102] [33]]</td> <td>Values in base 8</td> </tr> <tr> <td>ietf!</td> <td>Decoded string</td> </tr> </tbody> </table> </section> </section> <sectiontitle="IANA Considerations">numbered="true" toc="default"> <name>IANA Considerations</name> <t>There areThis document has noconsiderations forIANAin this document.actions. </t> </section> <sectiontitle="Security Considerations">numbered="true" toc="default"> <name>Security Considerations</name> <t> When implementing encoding and decoding it is important to be very careful so that buffer overflow or similardoesissues do not occur. This of course includes the calculations in base 45 and lookup in the table of characters(Table 1).(<xref target="table1"/>). A decoder must also be robust regarding input, including proper handling of any octet value 0-255, including the NUL character (ASCII 0). </t> <t> It should be noted that Base64 and some other encodings pad the string so that the encoding starts with an aligned number of characters while Base45 specifically avoids padding. Because of this, special care has to be taken when an odd number of octetsareis to be encoded. Similarly, care must be taken if the number of characters to decode are not evenly divisible by 3. </t> <t> Base encodings use a specific, reduced alphabet to encode binary data. Non-alphabet characters could exist within base-encoded data, caused by data corruption or by design. Non-alphabet characters may be exploited as a "covert channel", where non-protocol data can be sent for nefarious purposes. Non-alphabet characters might also be sent in order to exploit implementation errors leading to,e.g.,for example, buffer overflow attacks. </t> <t> ImplementationsMUST<bcp14>MUST</bcp14> reject any input that is not a valid encoding. For example, itMUST<bcp14>MUST</bcp14> reject the input (encoded data) if it contains characters outside the base alphabet (inTable 1)<xref target="table1"/>) when interpreting base-encoded data. </t> <t> Even though aBase45 encodedBase45-encoded string contains only characters from the alphabet inTable 1,<xref target="table1"/>, cases like the followinghashave to be considered: The string "FGW" represents 65535 (FFFF in base 16), which is a valid encoding of 16 bits. A slightly different encoded string of the same length, "GGW", would represent 65536 (10000 in base 16), which is represented by more than 16 bits. ImplementationsMUST<bcp14>MUST</bcp14> also reject the encoded data if it contains a triplet of characterswhich,that, when decoded, results in an unsigned integerwhichthat is greater than 65535(ffff(FFFF in base 16). </t> <t> It should be noted that the resulting string after encoding to Base45 might include non-URL-safe characters so if the URL including the Base45 encoded data has to beURL safe,URL-safe, one has to use%-encoding. </t> </section> <section title="Acknowledgements"> <t> The authors thank Mark Adler, Anders Ahl, Alan Barrett, Sam Spens Clason, Alfred Fiedler, Tomas Harreveld, Erik Hellman, Joakim Jardenberg, Michael Joost, Erik Kline, Christian Landgren, Anders Lowinger, Mans Nilsson, Jakob Schlyter, Peter Teufl and Gaby Whitehead for the feedback. Also, everyone that have been working with Base64 over a long period of years and have proven the implementations are stable.percent-encoding. </t> </section> </middle> <back><references title='Normative References'> &RFC4648; &RFC2119; &RFC8174;<references> <name>Normative References</name> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4648.xml"/> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/> <referenceanchor="ISO18004">anchor="ISO18004" target="https://www.iso.org/standard/62021.html"> <front> <title>ISO/IEC 18004:2015Information technology - Automatic identification and data capture techniques - QR Code bar code symbology specification </title> <author><organization>ISO/IEC JTC 1/SC 31</organization><organization>ISO/IEC</organization> </author> <date month="February"year="2015" />year="2015"/> </front> <seriesInfoname="ISO/IEC 18004:2015" value="https://www.iso.org/standard/62021.html" />name="ISO/IEC" value="18004:2015"/> </reference> </references> <section numbered="false" toc="default"> <name>Acknowledgements</name> <t> The authors thank <contact fullname="Mark Adler"/>, <contact fullname="Anders Ahl"/>, <contact fullname="Alan Barrett"/>, <contact fullname="Sam Spens Clason"/>, <contact fullname="Alfred Fiedler"/>, <contact fullname="Tomas Harreveld"/>, <contact fullname="Erik Hellman"/>, <contact fullname="Joakim Jardenberg"/>, <contact fullname="Michael Joost"/>, <contact fullname="Erik Kline"/>, <contact fullname="Christian Landgren"/>, <contact fullname="Anders Lowinger"/>, <contact fullname="Mans Nilsson"/>, <contact fullname="Jakob Schlyter"/>, <contact fullname="Peter Teufl"/>, and <contact fullname="Gaby Whitehead"/> for the feedback. Also, everyone who has been working with Base64 over a long period of years and has proven the implementations are stable. </t> </section> </back> </rfc>