Internet Engineering Task Force (IETF)                      P. Fältström
Request for Comments: 9285                                        Netnod
Category: Informational                                     F. Ljunggren
ISSN: 2070-1721                                                    Kirei
                                                          D.W. van Gulik
                                                              Webweaving
                                                             August 2022

                        The Base45 Data Encoding

Abstract

   This document describes the Base45 encoding scheme, which is built
   upon the Base64, Base32, and Base16 encoding schemes.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Not all documents
   approved by the IESG are candidates for any level of Internet
   Standard; see Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc9285.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
   3.  Interpretation of Encoded Data
   4.  The Base45 Encoding
     4.1.  When to Use and Not Use Base45
     4.2.  The Alphabet Used in Base45
     4.3.  Encoding Examples
     4.4.  Decoding Example
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  Normative References
   Authors' Addresses

1.  Introduction

   A QR code is used to encode text as a graphical image.  Depending on
   the characters used in the text, various encoding options for a QR
   code exist, e.g., Numeric, Alphanumeric, and Byte mode.  Even in Byte
   mode, a typical QR code reader tries to interpret a byte sequence as
   text encoded in UTF-8 or ISO/IEC 8859-1.  Thus, QR codes cannot be
   used to encode arbitrary binary data directly.  Such data has to be
   converted into an appropriate text before that text can be encoded
   as a QR code.  Compared to the already established Base64, Base32,
   and Base16 encoding schemes that are described in [RFC4648], the
   Base45 scheme described in this document offers a more compact QR
   code encoding.

   One important difference between Base45 and those other encoding
   schemes is the key table and that the padding with '=' is not
   required.

2.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Interpretation of Encoded Data

   Encoded data is to be interpreted as described in [RFC4648] with the
   exception that a different alphabet is selected.

4.  The Base45 Encoding

   QR codes have a limited ability to store binary data.  In practice,
   binary data has to be encoded in characters according to one of the
   modes already defined in the standard for QR codes.  The easiest mode
   to use is called Alphanumeric mode (see Section 7.3.4 and Table 2 of
   [ISO18004]).  Unfortunately, Alphanumeric mode uses 45 different
   characters, which implies that neither Base32 nor Base64 is a very
   effective encoding.

   A 45-character subset of US-ASCII is used: the 45 characters usable
   in a QR code in Alphanumeric mode (see Section 7.3.4 and Table 2 of
   [ISO18004]).  Base45 encodes 2 bytes in 3 characters, compared to
   Base64, which encodes 3 bytes in 4 characters.

   For encoding, two bytes [a, b] MUST be interpreted as a number n in
   base 256, i.e., as an unsigned integer over 16 bits so that the
   number n = (a * 256) + b.

   This number n is converted to base 45 [c, d, e] so that n = c + (d *
   45) + (e * 45 * 45).  Note the order of c, d, and e, which are chosen
   so that the leftmost [c] is the least significant.

   The values c, d, and e are then looked up in Table 1 to produce a
   three-character string.  The process is reversed when decoding.

   For encoding a single byte [a], it MUST be interpreted as a base 256
   number, i.e., as an unsigned integer over 8 bits.  That integer MUST
   be converted to base 45 [c d] so that a = c + (45 * d).  The values c
   and d are then looked up in Table 1 to produce a two-character
   string.

   A byte string [a b c d ... x y z] with arbitrary content and
   arbitrary length MUST be encoded as follows: From left to right pairs
   of bytes MUST be encoded as described above.  If the number of bytes
   is even, then the encoded form is a string with a length that is
   evenly divisible by 3.  If the number of bytes is odd, then the last
   (rightmost) byte MUST be encoded on two characters as described
   above.
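
   The encoding steps above can be sketched in Python as follows.  This
   is a minimal, non-normative illustration.  The names ALPHABET and
   b45encode are chosen for this sketch and are not defined by this
   document; the alphabet string is transcribed from Table 1.

```python
# The 45 characters of Table 1, ordered by value (index 36 is a space).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode(data: bytes) -> str:
    """Encode a byte string of arbitrary length as Base45 (sketch)."""
    out = []
    # Full pairs of bytes, taken from left to right.
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]      # n = (a * 256) + b
        e, rest = divmod(n, 45 * 45)
        d, c = divmod(rest, 45)              # n = c + (d * 45) + (e * 45 * 45)
        out.append(ALPHABET[c] + ALPHABET[d] + ALPHABET[e])
    # An odd number of bytes leaves one byte, encoded on two characters.
    if len(data) % 2:
        d, c = divmod(data[-1], 45)          # a = c + (45 * d)
        out.append(ALPHABET[c] + ALPHABET[d])
    return "".join(out)


if __name__ == "__main__":
    print(b45encode(b"AB"))
    print(b45encode(b"Hello!!"))
```

   The least significant value c is emitted first, which is why the
   remainder of each division is looked up before the quotient.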

   For decoding a Base45-encoded string, the inverse operations are
   performed.

4.1.  When to Use and Not Use Base45

   If binary data is to be stored in a QR code, the suggested mechanism
   is to use the Alphanumeric mode that uses 11 bits for 2 characters as
   defined in Section 7.3.4 of [ISO18004].  The Extended Channel
   Interpretation (ECI) mode indicator for this encoding is 0010.
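
   As a rough, non-normative illustration of why this is compact, the
   following Python sketch counts data bits only.  It assumes, based on
   the QR code specification rather than on this document, that a
   trailing single character in Alphanumeric mode takes 6 bits and that
   Byte mode takes 8 bits per character.  Mode indicators, character
   count fields, and error correction are ignored, and all function
   names are hypothetical.

```python
def base45_length(num_bytes: int) -> int:
    """Characters produced by Base45: 3 per byte pair, 2 for a last odd byte."""
    pairs, odd = divmod(num_bytes, 2)
    return 3 * pairs + 2 * odd


def base64_length(num_bytes: int) -> int:
    """Characters produced by padded Base64: 4 per group of up to 3 bytes."""
    return 4 * ((num_bytes + 2) // 3)


def alphanumeric_bits(num_chars: int) -> int:
    """Data bits in Alphanumeric mode: 11 per character pair.

    The 6 bits for a trailing single character are an assumption taken
    from the QR code specification, not from this document.
    """
    pairs, odd = divmod(num_chars, 2)
    return 11 * pairs + 6 * odd


def byte_mode_bits(num_chars: int) -> int:
    """Data bits in Byte mode, assuming 8 bits per character."""
    return 8 * num_chars


if __name__ == "__main__":
    payload = 300  # bytes of binary data (arbitrary example size)
    print("raw bits:           ", 8 * payload)
    print("Base45, Alphanumeric:", alphanumeric_bits(base45_length(payload)))
    print("Base64, Byte mode:   ", byte_mode_bits(base64_length(payload)))
```

   Base64 is counted in Byte mode here because its alphabet contains
   lowercase letters, which are not among the 45 characters of Table 1.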

   On the other hand, if the data is to be sent via some other transport,
   a transport encoding suitable for that transport should be used
   instead of Base45.  For example, it is not recommended to first
   encode data in Base45 and then encode the resulting string in Base64
   if the data is to be sent via email.  Instead, the Base45 encoding
   should be removed, and the data itself should be encoded in Base64.

4.2.  The Alphabet Used in Base45

   The Alphanumeric mode is defined to use 45 characters as specified in
   this alphabet.

   +=====+==========+=====+==========+=====+==========+=====+==========+
   |Value| Encoding |Value| Encoding |Value| Encoding |Value| Encoding |
   +=====+==========+=====+==========+=====+==========+=====+==========+
   |   00| 0        |   12| C        |   24| O        |   36| Space    |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   01| 1        |   13| D        |   25| P        |   37| $        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   02| 2        |   14| E        |   26| Q        |   38| %        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   03| 3        |   15| F        |   27| R        |   39| *        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   04| 4        |   16| G        |   28| S        |   40| +        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   05| 5        |   17| H        |   29| T        |   41| -        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   06| 6        |   18| I        |   30| U        |   42| .        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   07| 7        |   19| J        |   31| V        |   43| /        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   08| 8        |   20| K        |   32| W        |   44| :        |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   09| 9        |   21| L        |   33| X        |     |          |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   10| A        |   22| M        |   34| Y        |     |          |
   +-----+----------+-----+----------+-----+----------+-----+----------+
   |   11| B        |   23| N        |   35| Z        |     |          |
   +-----+----------+-----+----------+-----+----------+-----+----------+

                        Table 1: The Base45 Alphabet

4.3.  Encoding Examples

   It should be noted that although the examples are all text, Base45 is
   an encoding for binary data where each octet can have any value
   0-255.

   Encoding example 1:

      The string "AB" is the byte sequence [[65 66]].  If we look at all
      16 bits, we get 65 * 256 + 66 = 16706.  16706 equals 11 + (11 *
      45) + (8 * 45 * 45), so the sequence in base 45 is [11 11 8].
      Referring to Table 1, we get the encoded string "BB8".

                     +-----------+------------------+
                     | AB        | Initial string   |
                     +-----------+------------------+
                     | [[65 66]] | Decimal value    |
                     +-----------+------------------+
                     | [16706]   | 16-bit value     |
                     +-----------+------------------+
                     | [11 11 8] | Value in base 45 |
                     +-----------+------------------+
                     | BB8       | Encoded string   |
                     +-----------+------------------+

                       Table 2: Example 1 in Detail

   Encoding example 2:

      The string "Hello!!" as ASCII is the byte sequence [[72 101] [108
      108] [111 33] [33]].  If we look at this 16 bits at a time, we get
      [18533 27756 28449 33].  Note the 33 for the last byte.  When
      looking at the values in base 45, we get [[38 6 9] [36 31 13] [9 2
      14] [33 0]], where the last byte is represented by two values.
      The resulting string "%69 VD92EX0" is created by looking up these
      values in Table 1.  It should be noted that it includes a space.

       +---------------------------------------+------------------+
       | Hello!!                               | Initial string   |
       +---------------------------------------+------------------+
       | [[72 101] [108 108] [111 33] [33]]    | Decimal value    |
       +---------------------------------------+------------------+
       | [18533 27756 28449 33]                | 16-bit values    |
       +---------------------------------------+------------------+
       | [[38 6 9] [36 31 13] [9 2 14] [33 0]] | Value in base 45 |
       +---------------------------------------+------------------+
       | %69 VD92EX0                           | Encoded string   |
       +---------------------------------------+------------------+

                       Table 3: Example 2 in Detail

   Encoding example 3:

      The string "base-45" as ASCII is the byte sequence [[98 97] [115
      101] [45 52] [53]].  If we look at this two bytes at a time, we
      get [25185 29541 11572 53].  Note the 53 for the last byte.  When
      looking at the values in base 45, we get [[30 19 12] [21 26 14] [7
      32 5] [8 1]], where the last byte is represented by two values.
      Referring to Table 1, we get the encoded string "UJCLQE7W581".

       +----------------------------------------+------------------+
       | base-45                                | Initial string   |
       +----------------------------------------+------------------+
       | [[98 97] [115 101] [45 52] [53]]       | Decimal value    |
       +----------------------------------------+------------------+
       | [25185 29541 11572 53]                 | 16-bit values    |
       +----------------------------------------+------------------+
       | [[30 19 12] [21 26 14] [7 32 5] [8 1]] | Value in base 45 |
       +----------------------------------------+------------------+
       | UJCLQE7W581                            | Encoded string   |
       +----------------------------------------+------------------+

                        Table 4: Example 3 in Detail
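
   The "Value in base 45" rows of the tables above can be reproduced
   with a short Python sketch.  It is illustrative only; the function
   name base45_digits is hypothetical and is not part of this document.

```python
def base45_digits(data: bytes) -> list:
    """Return the base 45 values for each byte pair, least significant first.

    A pair of bytes yields three values; a final single byte yields two.
    """
    groups = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        n = int.from_bytes(chunk, "big")   # 16-bit (or 8-bit) unsigned value
        digits = []
        for _ in range(len(chunk) + 1):
            n, digit = divmod(n, 45)       # peel off the lowest base 45 digit
            digits.append(digit)
        groups.append(digits)
    return groups


if __name__ == "__main__":
    for text in (b"AB", b"Hello!!", b"base-45"):
        print(text, base45_digits(text))
```

   Looking up each value in Table 1 then gives the encoded strings
   shown in the examples.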

4.4.  Decoding Example

   Decoding example 1:

      The string "QED8WEX0" represents, when looked up in Table 1, the
      values [26 14 13 8 32 14 33 0].  We arrange the numbers in chunks
      of three, except for the last one, which can be two numbers, and
      get [[26 14 13] [8 32 14] [33 0]].  In base 45, we get [26981
      29798 33] where the bytes are [[105 101] [116 102] [33]].  If we
      look at the ASCII values, we get the string "ietf!".

        +-------------------------------+------------------------+
        | QED8WEX0                      | Initial string         |
        +-------------------------------+------------------------+
        | [26 14 13 8 32 14 33 0]       | Looked up values       |
        +-------------------------------+------------------------+
        | [[26 14 13] [8 32 14] [33 0]] | Groups of three        |
        +-------------------------------+------------------------+
        | [26981 29798 33]              | Interpreted as base 45 |
        +-------------------------------+------------------------+
        | [[105 101] [116 102] [33]]    | Byte values            |
        +-------------------------------+------------------------+
        | ietf!                         | Decoded string         |
        +-------------------------------+------------------------+

                       Table 5: Example 4 in Detail

5.  IANA Considerations

   This document has no IANA actions.

6.  Security Considerations

   When implementing encoding and decoding, it is important to be very
   careful so that buffer overflow or similar issues do not occur.  This
   of course includes the calculations in base 45 and lookup in the
   table of characters (Table 1).  A decoder must also be robust
   regarding input, including proper handling of any octet value 0-255,
   including the NUL character (ASCII 0).

   It should be noted that Base64 and some other encodings pad the
   string so that the encoding starts with an aligned number of
   characters while Base45 specifically avoids padding.  Because of
   this, special care has to be taken when an odd number of octets is to
   be encoded.  Similarly, care must be taken if the number of
   characters to decode is not evenly divisible by 3.

   Base encodings use a specific, reduced alphabet to encode binary
   data.  Non-alphabet characters could exist within base-encoded data,
   caused by data corruption or by design.  Non-alphabet characters may
   be exploited as a "covert channel", where non-protocol data can be
   sent for nefarious purposes.  Non-alphabet characters might also be
   sent in order to exploit implementation errors leading to, for
   example, buffer overflow attacks.

   Implementations MUST reject any input that is not a valid encoding.
   For example, it MUST reject the input (encoded data) if it contains
   characters outside the base alphabet (in Table 1) when interpreting
   base-encoded data.

   Even though a Base45-encoded string contains only characters from the
   alphabet in Table 1, cases like the following have to be considered:
   The string "FGW" represents 65535 (FFFF in base 16), which is a valid
   encoding of 16 bits.  A slightly different encoded string of the same
   length, "GGW", would represent 65536 (10000 in base 16), which is
   represented by more than 16 bits.  Implementations MUST also reject
   the encoded data if it contains a triplet of characters that, when
   decoded, results in an unsigned integer that is greater than 65535
   (FFFF in base 16).
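
   A decoder that applies these checks might look like the following
   Python sketch.  It is non-normative, and the names ALPHABET, VALUES,
   and b45decode are chosen for this illustration.  Two checks are
   assumptions derived from the general rule that invalid encodings
   MUST be rejected rather than from explicit text in this document:
   rejecting a lone trailing character and rejecting a final two-
   character group whose value exceeds 255.

```python
# The 45 characters of Table 1, ordered by value (index 36 is a space).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
VALUES = {ch: value for value, ch in enumerate(ALPHABET)}


def b45decode(text: str) -> bytes:
    """Decode a Base45 string, raising ValueError on invalid input."""
    # Assumption: one leftover character cannot encode any byte.
    if len(text) % 3 == 1:
        raise ValueError("length leaves a single trailing character")
    out = bytearray()
    for i in range(0, len(text), 3):
        chunk = text[i:i + 3]
        try:
            digits = [VALUES[ch] for ch in chunk]
        except KeyError:
            raise ValueError("character outside the Base45 alphabet") from None
        # The leftmost character is the least significant digit.
        n = sum(d * 45 ** k for k, d in enumerate(digits))
        size = len(chunk) - 1  # 3 characters -> 2 bytes, 2 characters -> 1 byte
        # Rejects e.g. "GGW" (65536); the 1-byte bound is an assumption.
        if n >= 256 ** size:
            raise ValueError("group value does not fit in %d byte(s)" % size)
        out += n.to_bytes(size, "big")
    return bytes(out)


if __name__ == "__main__":
    print(b45decode("QED8WEX0"))
    for bad in ("GGW", "bb8", "A"):
        try:
            b45decode(bad)
        except ValueError as err:
            print(repr(bad), "rejected:", err)
```

   Raising an error instead of truncating the value keeps the decoder
   from silently accepting strings that no encoder could have produced.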

   It should be noted that the resulting string after encoding to Base45
   might include non-URL-safe characters, so if the URL including the
   Base45-encoded data has to be URL-safe, one has to use
   percent-encoding.
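
   As a hedged illustration, the following Python sketch applies
   percent-encoding to a Base45 string with the standard library
   functions urllib.parse.quote and urllib.parse.unquote.  Passing
   safe="" is a choice made for this example so that "/" is escaped as
   well; whether that is needed depends on where in the URL the data is
   placed.

```python
from urllib.parse import quote, unquote

# Base45 output of "Hello!!" from Section 4.3; it contains "%" and a space.
encoded = "%69 VD92EX0"

# safe="" escapes every character that is not an unreserved URL character.
url_safe = quote(encoded, safe="")
print(url_safe)

# The receiver undoes the percent-encoding before Base45 decoding.
restored = unquote(url_safe)
print(restored == encoded)
```

   Of the non-alphanumeric characters in Table 1, only "-" and "." are
   left untouched by this call; the others are escaped.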

7.  Acknowledgements

   The authors thank Mark Adler, Anders Ahl, Alan Barrett, Sam Spens
   Clason, Alfred Fiedler, Tomas Harreveld, Erik Hellman, Joakim
   Jardenberg, Michael Joost, Erik Kline, Christian Landgren, Anders
   Lowinger, Mans Nilsson, Jakob Schlyter, Peter Teufl, and Gaby
   Whitehead for the feedback.  Also, everyone who has been working with
   Base64 over a long period of years and has proven the implementations
   are stable.

8.  Normative References

   [ISO18004]
              ISO/IEC, "Information technology - Automatic
              identification and data capture techniques - QR Code bar
              code symbology specification", ISO/IEC 18004:2015,
              February 2015, <https://www.iso.org/standard/62021.html>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
              <https://www.rfc-editor.org/info/rfc4648>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

   Patrik Fältström
   Netnod
   Email: paf@netnod.se

   Fredrik Ljunggren
   Kirei
   Email: fredrik@kirei.se

   Dirk-Willem van Gulik
   Webweaving
   Email: dirkx@webweaving.org