draft-ietf-sframe-encv1.txt   rfc9605.txt 
sframe E. Omara Internet Engineering Task Force (IETF) E. Omara
Internet-Draft Apple Request for Comments: 9605 Apple
Intended status: Standards Track J. Uberti Category: Standards Track J. Uberti
Expires: 26 December 2024 Google ISSN: 2070-1721 Fixie.ai
S. Murillo S. G. Murillo
CoSMo Software CoSMo Software
R. L. Barnes, Ed. R. Barnes, Ed.
Cisco Cisco
Y. Fablet Y. Fablet
Apple Apple
24 June 2024 July 2024
Secure Frame (SFrame) Secure Frame (SFrame): Lightweight Authenticated Encryption for Real-
draft-ietf-sframe-enc-latest Time Media
Abstract Abstract
This document describes the Secure Frame (SFrame) end-to-end This document describes the Secure Frame (SFrame) end-to-end
encryption and authentication mechanism for media frames in a encryption and authentication mechanism for media frames in a
multiparty conference call, in which central media servers (Selective multiparty conference call, in which central media servers (Selective
Forwarding Units or SFUs) can access the media metadata needed to Forwarding Units or SFUs) can access the media metadata needed to
make forwarding decisions without having access to the actual media. make forwarding decisions without having access to the actual media.
This mechanism differs from the Secure Real-Time Protocol (SRTP) in This mechanism differs from the Secure Real-Time Protocol (SRTP) in
that it is independent of RTP (thus compatible with non-RTP media that it is independent of RTP (thus compatible with non-RTP media
transport) and can be applied to whole media frames in order to be transport) and can be applied to whole media frames in order to be
more bandwidth efficient. more bandwidth efficient.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on 26 December 2024. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9605.
Copyright Notice Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology
3. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Goals
4. SFrame . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 4. SFrame
4.1. Application Context . . . . . . . . . . . . . . . . . . . 5 4.1. Application Context
4.2. SFrame Ciphertext . . . . . . . . . . . . . . . . . . . . 7 4.2. SFrame Ciphertext
4.3. SFrame Header . . . . . . . . . . . . . . . . . . . . . . 7 4.3. SFrame Header
4.4. Encryption Schema . . . . . . . . . . . . . . . . . . . . 9 4.4. Encryption Schema
4.4.1. Key Selection . . . . . . . . . . . . . . . . . . . . 10 4.4.1. Key Selection
4.4.2. Key Derivation . . . . . . . . . . . . . . . . . . . 10 4.4.2. Key Derivation
4.4.3. Encryption . . . . . . . . . . . . . . . . . . . . . 11 4.4.3. Encryption
4.4.4. Decryption . . . . . . . . . . . . . . . . . . . . . 13 4.4.4. Decryption
4.5. Cipher Suites . . . . . . . . . . . . . . . . . . . . . . 15 4.5. Cipher Suites
4.5.1. AES-CTR with SHA2 . . . . . . . . . . . . . . . . . . 16 4.5.1. AES-CTR with SHA2
5. Key Management . . . . . . . . . . . . . . . . . . . . . . . 18 5. Key Management
5.1. Sender Keys . . . . . . . . . . . . . . . . . . . . . . . 19 5.1. Sender Keys
5.2. MLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 5.2. MLS
6. Media Considerations . . . . . . . . . . . . . . . . . . . . 22 6. Media Considerations
6.1. Selective Forwarding Units . . . . . . . . . . . . . . . 22 6.1. Selective Forwarding Units
6.1.1. LastN and RTP Stream Reuse . . . . . . . . . . . . . 23 6.1.1. RTP Stream Reuse
6.1.2. Simulcast . . . . . . . . . . . . . . . . . . . . . . 23 6.1.2. Simulcast
6.1.3. SVC . . . . . . . . . . . . . . . . . . . . . . . . . 23 6.1.3. Scalable Video Coding (SVC)
6.2. Video Key Frames . . . . . . . . . . . . . . . . . . . . 23 6.2. Video Key Frames
6.3. Partial Decoding . . . . . . . . . . . . . . . . . . . . 24 6.3. Partial Decoding
7. Security Considerations . . . . . . . . . . . . . . . . . . . 24 7. Security Considerations
7.1. No Header Confidentiality . . . . . . . . . . . . . . . . 24 7.1. No Header Confidentiality
7.2. No per-Sender Authentication . . . . . . . . . . . . . . 25 7.2. No Per-Sender Authentication
7.3. Key Management . . . . . . . . . . . . . . . . . . . . . 25 7.3. Key Management
7.4. Replay . . . . . . . . . . . . . . . . . . . . . . . . . 25 7.4. Replay
7.5. Risks Due to Short Tags . . . . . . . . . . . . . . . . . 25 7.5. Risks Due to Short Tags
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 8. IANA Considerations
8.1. SFrame Cipher Suites . . . . . . . . . . . . . . . . . . 27 8.1. SFrame Cipher Suites
9. Application Responsibilities . . . . . . . . . . . . . . . . 28 9. Application Responsibilities
9.1. Header Value Uniqueness . . . . . . . . . . . . . . . . . 29 9.1. Header Value Uniqueness
9.2. Key Management Framework . . . . . . . . . . . . . . . . . 29 9.2. Key Management Framework
9.3. Anti-Replay . . . . . . . . . . . . . . . . . . . . . . . 30 9.3. Anti-Replay
9.4. Metadata . . . . . . . . . . . . . . . . . . . . . . . . 30 9.4. Metadata
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 30 10. References
10.1. Normative References . . . . . . . . . . . . . . . . . . 30 10.1. Normative References
10.2. Informative References . . . . . . . . . . . . . . . . . 31 10.2. Informative References
Appendix A. Example API . . . . . . . . . . . . . . . . . . . . 32 Appendix A. Example API
Appendix B. Overhead Analysis . . . . . . . . . . . . . . . . . 34 Appendix B. Overhead Analysis
B.1. Assumptions . . . . . . . . . . . . . . . . . . . . . . . 35 B.1. Assumptions
B.2. Audio . . . . . . . . . . . . . . . . . . . . . . . . . . 35 B.2. Audio
B.3. Video . . . . . . . . . . . . . . . . . . . . . . . . . . 36 B.3. Video
B.4. Conferences . . . . . . . . . . . . . . . . . . . . . . . 38 B.4. Conferences
B.5. SFrame over RTP . . . . . . . . . . . . . . . . . . . . . 38 B.5. SFrame over RTP
Appendix C. Test Vectors . . . . . . . . . . . . . . . . . . . . 40 Appendix C. Test Vectors
C.1. Header Encoding/Decoding . . . . . . . . . . . . . . . . 41 C.1. Header Encoding/Decoding
C.2. AEAD Encryption/Decryption Using AES-CTR and HMAC . . . . 65 C.2. AEAD Encryption/Decryption Using AES-CTR and HMAC
C.3. SFrame Encryption/Decryption . . . . . . . . . . . . . . 67 C.3. SFrame Encryption/Decryption
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 72 Acknowledgements
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Contributors
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 72 Authors' Addresses
1. Introduction 1. Introduction
Modern multiparty video call systems use Selective Forwarding Unit Modern multiparty video call systems use Selective Forwarding Unit
(SFU) servers to efficiently route media streams to call endpoints (SFU) servers to efficiently route media streams to call endpoints
based on factors such as available bandwidth, desired video size, based on factors such as available bandwidth, desired video size,
codec support, and other factors. An SFU typically does not need codec support, and other factors. An SFU typically does not need
access to the media content of the conference, which allows the media access to the media content of the conference, which allows the media
to be encrypted "end to end" so that it cannot be decrypted by the to be encrypted "end to end" so that it cannot be decrypted by the
SFU. In order for the SFU to work properly, though, it usually needs SFU. In order for the SFU to work properly, though, it usually needs
skipping to change at page 4, line 22 skipping to change at line 158
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
MAC: Message Authentication Code MAC: Message Authentication Code
E2EE: End-to-End Encryption E2EE: End-to-End Encryption
HBH: Hop-By-Hop HBH: Hop-by-Hop
We use "Selective Forwarding Unit (SFU)" and "media stream" in a less We use "Selective Forwarding Unit (SFU)" and "media stream" in a less
formal sense than in [RFC7656]. An SFU is a selective switching formal sense than in [RFC7656]. An SFU is a selective switching
function for media payloads, and a media stream a sequence of media function for media payloads, and a media stream is a sequence of
payloads, in both cases regardless of whether those media payloads media payloads, regardless of whether those media payloads are
are transported over RTP or some other protocol. transported over RTP or some other protocol.
3. Goals 3. Goals
SFrame is designed to be a suitable E2EE protection scheme for SFrame is designed to be a suitable E2EE protection scheme for
conference call media in a broad range of scenarios, as outlined by conference call media in a broad range of scenarios, as outlined by
the following goals: the following goals:
1. Provide a secure E2EE mechanism for audio and video in conference 1. Provide a secure E2EE mechanism for audio and video in conference
calls that can be used with arbitrary SFU servers. calls that can be used with arbitrary SFU servers.
2. Decouple media encryption from key management to allow SFrame to 2. Decouple media encryption from key management to allow SFrame to
be used with an arbitrary key management system. be used with an arbitrary key management system.
3. Minimize packet expansion to allow successful conferencing in as 3. Minimize packet expansion to allow successful conferencing in as
many network conditions as possible. many network conditions as possible.
4. Independence from the underlying transport, including use in non- 4. Decouple the media encryption framework from the underlying
RTP transports, e.g., WebTransport [I-D.ietf-webtrans-overview]. transport, allowing use in non-RTP scenarios, e.g., WebTransport
[WEBTRANSPORT].
5. When used with RTP and its associated error-resilience 5. When used with RTP and its associated error-resilience
mechanisms, i.e., RTX and Forward Error Correction (FEC), require mechanisms, i.e., RTX and Forward Error Correction (FEC), require
no special handling for RTX and FEC packets. no special handling for RTX and FEC packets.
6. Minimize the changes needed in SFU servers. 6. Minimize the changes needed in SFU servers.
7. Minimize the changes needed in endpoints. 7. Minimize the changes needed in endpoints.
8. Work with the most popular audio and video codecs used in 8. Work with the most popular audio and video codecs used in
skipping to change at page 5, line 23 skipping to change at line 209
E2EE, is simple to implement, has no dependencies on RTP, and E2EE, is simple to implement, has no dependencies on RTP, and
minimizes encryption bandwidth overhead. This section describes how minimizes encryption bandwidth overhead. This section describes how
the mechanism works and includes details of how applications utilize the mechanism works and includes details of how applications utilize
SFrame for media protection as well as the actual mechanics of E2EE SFrame for media protection as well as the actual mechanics of E2EE
for protecting media. for protecting media.
4.1. Application Context 4.1. Application Context
SFrame is a general encryption framing, intended to be used as an SFrame is a general encryption framing, intended to be used as an
E2EE layer over an underlying HBH-encrypted transport such as SRTP or E2EE layer over an underlying HBH-encrypted transport such as SRTP or
QUIC [RFC3711][I-D.ietf-moq-transport]. QUIC [RFC3711][MOQ-TRANSPORT].
The scale at which SFrame encryption is applied to media determines The scale at which SFrame encryption is applied to media determines
the overall amount of overhead that SFrame adds to the media stream the overall amount of overhead that SFrame adds to the media stream
as well as the engineering complexity involved in integrating SFrame as well as the engineering complexity involved in integrating SFrame
into a particular environment. Two patterns are common: using SFrame into a particular environment. Two patterns are common: using SFrame
to encrypt either whole media frames (per frame) or individual to encrypt either whole media frames (per frame) or individual
transport-level media payloads (per packet). transport-level media payloads (per packet).
For example, Figure 1 shows a typical media sender stack that takes For example, Figure 1 shows a typical media sender stack that takes
media from some source, encodes it into frames, divides those frames media from some source, encodes it into frames, divides those frames
skipping to change at page 7, line 35 skipping to change at line 314
| | | | | | | |
| | | | | | | |
| | | | | | | |
+->+-------------------------------------------------------+<-+ +->+-------------------------------------------------------+<-+
| | Authentication Tag | | | | Authentication Tag | |
| +-------------------------------------------------------+ | | +-------------------------------------------------------+ |
| | | |
| | | |
+--- Encrypted Portion Authenticated Portion ---+ +--- Encrypted Portion Authenticated Portion ---+
Figure 2: Structure of an SFrame Ciphertext
When SFrame is applied per packet, the payload of each packet will be When SFrame is applied per packet, the payload of each packet will be
an SFrame ciphertext. When SFrame is applied per frame, the SFrame an SFrame ciphertext. When SFrame is applied per frame, the SFrame
ciphertext representing an encrypted frame will span several packets, ciphertext representing an encrypted frame will span several packets,
with the header appearing in the first packet and the authentication with the header appearing in the first packet and the authentication
tag in the last packet. It is the responsibility of the application tag in the last packet. It is the responsibility of the application
to reassemble an encrypted frame from individual packets, accounting to reassemble an encrypted frame from individual packets, accounting
for packet loss and reordering as necessary. for packet loss and reordering as necessary.
4.3. SFrame Header 4.3. SFrame Header
The SFrame header specifies two values from which encryption The SFrame header specifies two values from which encryption
parameters are derived: parameters are derived:
* A Key ID (KID) that determines which encryption key should be used * A Key ID (KID) that determines which encryption key should be used
* A counter (CTR) that is used to construct the nonce for the * A Counter (CTR) that is used to construct the nonce for the
encryption encryption
Applications MUST ensure that each (KID, CTR) combination is used for Applications MUST ensure that each (KID, CTR) combination is used for
exactly one SFrame encryption operation. A typical approach to exactly one SFrame encryption operation. A typical approach to
achieve this guarantee is outlined in Section 9.1. achieve this guarantee is outlined in Section 9.1.
Config Byte Config Byte
| |
.-----' '-----. .-----' '-----.
| | | |
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+------------+------------+ +-+-+-+-+-+-+-+-+------------+------------+
|X| K |Y| C | KID... | CTR... | |X| K |Y| C | KID... | CTR... |
+-+-+-+-+-+-+-+-+------------+------------+ +-+-+-+-+-+-+-+-+------------+------------+
Figure 2: SFrame Header Figure 3: SFrame Header
The SFrame header has the overall structure shown in Figure 2. The The SFrame header has the overall structure shown in Figure 3. The
first byte is a "config byte", with the following fields: first byte is a "config byte", with the following fields:
Extended Key ID Flag (X, 1 bit): Indicates if the K field contains Extended KID Flag (X, 1 bit): Indicates if the K field contains the
the Key ID or the Key ID length. KID or the KID length.
Key or Key Length (K, 3 bits): If the X flag is set to 0, this field KID or KID Length (K, 3 bits): If the X flag is set to 0, this field
contains the Key ID. If the X flag is set to 1, then it contains contains the KID. If the X flag is set to 1, then it contains the
the length of the Key ID, minus one. length of the KID, minus one.
Extended Counter Flag (Y, 1 bit): Indicates if the C field contains Extended CTR Flag (Y, 1 bit): Indicates if the C field contains the
the counter or the counter length. CTR or the CTR length.
Counter or Counter Length (C, 3 bits): This field contains the CTR or CTR Length (C, 3 bits): This field contains the CTR if the Y
counter (CTR) if the Y flag is set to 0, or the counter length, flag is set to 0, or the CTR length, minus one, if set to 1.
minus one, if set to 1.
The Key ID and Counter fields are encoded as compact unsigned The KID and CTR fields are encoded as compact unsigned integers in
integers in network (big-endian) byte order. If the value of one of network (big-endian) byte order. If the value of one of these fields
these fields is in the range 0-7, then the value is carried in the is in the range 0-7, then the value is carried in the corresponding
corresponding bits of the config byte (K or C) and the corresponding bits of the config byte (K or C) and the corresponding flag (X or Y)
flag (X or Y) is set to zero. Otherwise, the value MUST be encoded is set to zero. Otherwise, the value MUST be encoded with the
with the minimum number of bytes required and appended after the minimum number of bytes required and appended after the config byte,
config byte, with the Key ID first and Counter second. The header with the KID first and CTR second. The header field (K or C) is set
field (K or C) is set to the number of bytes in the encoded value, to the number of bytes in the encoded value, minus one. The value
minus one. The value 000 represents a length of 1, 001 a length of 000 represents a length of 1, 001 a length of 2, etc. This allows a
2, etc. This allows a 3-bit length field to represent the value 3-bit length field to represent the value lengths 1-8.
lengths 1-8.
The SFrame header can thus take one of the four forms shown in The SFrame header can thus take one of the four forms shown in
Figure 3, depending on which of the X and Y flags are set. Figure 4, depending on which of the X and Y flags are set.
KID < 8, CTR < 8: KID < 8, CTR < 8:
+-+-----+-+-----+ +-+-----+-+-----+
|0| KID |0| CTR | |0| KID |0| CTR |
+-+-----+-+-----+ +-+-----+-+-----+
KID < 8, CTR >= 8: KID < 8, CTR >= 8:
+-+-----+-+-----+------------------------+ +-+-----+-+-----+------------------------+
|0| KID |1|CLEN | CTR... (length=CLEN) | |0| KID |1|CLEN | CTR... (length=CLEN) |
+-+-----+-+-----+------------------------+ +-+-----+-+-----+------------------------+
skipping to change at page 9, line 25 skipping to change at line 399
KID >= 8, CTR < 8: KID >= 8, CTR < 8:
+-+-----+-+-----+------------------------+ +-+-----+-+-----+------------------------+
|1|KLEN |0| CTR | KID... (length=KLEN) | |1|KLEN |0| CTR | KID... (length=KLEN) |
+-+-----+-+-----+------------------------+ +-+-----+-+-----+------------------------+
KID >= 8, CTR >= 8: KID >= 8, CTR >= 8:
+-+-----+-+-----+------------------------+------------------------+ +-+-----+-+-----+------------------------+------------------------+
|1|KLEN |1|CLEN | KID... (length=KLEN) | CTR... (length=CLEN) | |1|KLEN |1|CLEN | KID... (length=KLEN) | CTR... (length=CLEN) |
+-+-----+-+-----+------------------------+------------------------+ +-+-----+-+-----+------------------------+------------------------+
Figure 3: Forms of Encoded SFrame Header Figure 4: Forms of Encoded SFrame Header
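The header forms above can be illustrated with a short, non-normative Python sketch. The function names encode_header and decode_header are hypothetical and do not appear in the specification. The sketch assumes that bit 0 in the header figure is the most significant bit of the config byte, as is conventional in IETF bit diagrams. The decoder shown here does not reject non-minimal encodings, so a production parser would likely need additional checks.

```python
from typing import Tuple


def _min_bytes(value: int) -> bytes:
    """Big-endian encoding of value using the fewest bytes (at least one)."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def encode_header(kid: int, ctr: int) -> bytes:
    """Encode a config byte followed by any extended KID and CTR bytes."""
    if not (0 <= kid < 2**64 and 0 <= ctr < 2**64):
        raise ValueError("KID and CTR must each fit in 8 bytes")
    config = 0
    extra = b""
    # KID: X flag is mask 0x80, K field is mask 0x70 (assumed MSB-first).
    if kid < 8:
        config |= kid << 4
    else:
        kid_bytes = _min_bytes(kid)
        config |= 0x80 | ((len(kid_bytes) - 1) << 4)
        extra += kid_bytes
    # CTR: Y flag is mask 0x08, C field is mask 0x07.
    if ctr < 8:
        config |= ctr
    else:
        ctr_bytes = _min_bytes(ctr)
        config |= 0x08 | (len(ctr_bytes) - 1)
        extra += ctr_bytes
    return bytes([config]) + extra


def decode_header(data: bytes) -> Tuple[int, int, int]:
    """Return (kid, ctr, header_length) parsed from the start of data."""
    if not data:
        raise ValueError("empty input")
    config = data[0]
    offset = 1
    fields = []
    # Process the KID half of the config byte first, then the CTR half.
    for flag, value in ((config & 0x80, (config >> 4) & 0x07),
                        (config & 0x08, config & 0x07)):
        if flag:
            # The 3-bit field holds the length minus one.
            end = offset + value + 1
            if end > len(data):
                raise ValueError("truncated SFrame header")
            value = int.from_bytes(data[offset:end], "big")
            offset = end
        fields.append(value)
    kid, ctr = fields
    return kid, ctr, offset
```

For instance, under these assumptions a KID of 0x123 and a CTR of 0x4567 each need two extended bytes, so both length fields carry the value 1 and the header occupies five bytes in total.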
4.4. Encryption Schema 4.4. Encryption Schema
SFrame encryption uses an AEAD encryption algorithm and hash function SFrame encryption uses an AEAD encryption algorithm and hash function
defined by the cipher suite in use (see Section 4.5). We will refer defined by the cipher suite in use (see Section 4.5). We will refer
to the following aspects of the AEAD and the hash algorithm below: to the following aspects of the AEAD and the hash algorithm below:
* AEAD.Encrypt and AEAD.Decrypt - The encryption and decryption * AEAD.Encrypt and AEAD.Decrypt - The encryption and decryption
functions for the AEAD. We follow the convention of RFC 5116 functions for the AEAD. We follow the convention of RFC 5116
[RFC5116] and consider the authentication tag part of the [RFC5116] and consider the authentication tag part of the
skipping to change at page 9, line 49 skipping to change at line 423
* AEAD.Nk - The size in bytes of a key for the encryption algorithm * AEAD.Nk - The size in bytes of a key for the encryption algorithm
* AEAD.Nn - The size in bytes of a nonce for the encryption * AEAD.Nn - The size in bytes of a nonce for the encryption
algorithm algorithm
* AEAD.Nt - The overhead in bytes of the encryption algorithm * AEAD.Nt - The overhead in bytes of the encryption algorithm
(typically the size of a "tag" that is added to the plaintext) (typically the size of a "tag" that is added to the plaintext)
* AEAD.Nka - For cipher suites using the compound AEAD described in * AEAD.Nka - For cipher suites using the compound AEAD described in
Section 4.5.1, the size in bytes of a key for the underlying Section 4.5.1, the size in bytes of a key for the underlying
Advanced Encryption Standard Counter Mode (AES-CTR) algorithm encryption algorithm
* Hash.Nh - The size in bytes of the output of the hash function * Hash.Nh - The size in bytes of the output of the hash function
4.4.1. Key Selection 4.4.1. Key Selection
Each SFrame encryption or decryption operation is premised on a Each SFrame encryption or decryption operation is premised on a
single secret base_key, which is labeled with an integer KID value single secret base_key, which is labeled with an integer KID value
signaled in the SFrame header. signaled in the SFrame header.
The sender and receivers need to agree on which base_key should be The sender and receivers need to agree on which base_key should be
skipping to change at page 10, line 23 skipping to change at line 445
on whether a base_key will be used for encryption or decryption only. on whether a base_key will be used for encryption or decryption only.
The process for provisioning base_key values and their KID values is The process for provisioning base_key values and their KID values is
beyond the scope of this specification, but its security properties beyond the scope of this specification, but its security properties
will bound the assurances that SFrame provides. For example, if will bound the assurances that SFrame provides. For example, if
SFrame is used to provide E2E security against intermediary media SFrame is used to provide E2E security against intermediary media
nodes, then SFrame keys need to be negotiated in a way that does not nodes, then SFrame keys need to be negotiated in a way that does not
make them accessible to these intermediaries. make them accessible to these intermediaries.
For each known KID value, the client stores the corresponding For each known KID value, the client stores the corresponding
symmetric key base_key. For keys that can be used for encryption, symmetric key base_key. For keys that can be used for encryption,
the client also stores the next counter value CTR to be used when the client also stores the next CTR value to be used when encrypting
encrypting (initially 0). (initially 0).
When encrypting a plaintext, the application specifies which KID is When encrypting a plaintext, the application specifies which KID is
to be used, and the counter is incremented after successful to be used, and the CTR value is incremented after successful
encryption. When decrypting, the base_key for decryption is selected encryption. When decrypting, the base_key for decryption is selected
from the available keys using the KID value in the SFrame header. from the available keys using the KID value in the SFrame header.
A given base_key MUST NOT be used for encryption by multiple senders. A given base_key MUST NOT be used for encryption by multiple senders.
Such reuse would result in multiple encrypted frames being generated Such reuse would result in multiple encrypted frames being generated
with the same (key, nonce) pair, which harms the protections provided with the same (key, nonce) pair, which harms the protections provided
by many AEAD algorithms. Implementations MUST mark each base_key as by many AEAD algorithms. Implementations MUST mark each base_key as
usable for encryption or decryption, never both. usable for encryption or decryption, never both.
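As a non-normative illustration, the per-KID state described above might be kept in a small key store such as the following Python sketch. The class and method names (KeyStore, KeyUsage, reserve_ctr, key_for_decryption) are hypothetical. One deliberate difference from the text: the sketch advances the CTR value when it is handed out rather than after successful encryption, on the assumption that skipping a value is harmless while reusing one is not.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class KeyUsage(Enum):
    ENCRYPT = "encrypt"
    DECRYPT = "decrypt"


@dataclass
class KeyEntry:
    base_key: bytes
    usage: KeyUsage
    next_ctr: int = 0  # only meaningful for encryption keys


class KeyStore:
    """Maps each KID to a base_key that is usable in exactly one direction."""

    def __init__(self) -> None:
        self._entries: Dict[int, KeyEntry] = {}

    def add_key(self, kid: int, base_key: bytes, usage: KeyUsage) -> None:
        if kid in self._entries:
            raise ValueError("KID is already provisioned")
        self._entries[kid] = KeyEntry(base_key, usage)

    def reserve_ctr(self, kid: int) -> Tuple[bytes, int]:
        """Return (base_key, CTR) for one encryption; CTR is never reissued."""
        entry = self._entries[kid]
        if entry.usage is not KeyUsage.ENCRYPT:
            raise PermissionError("key is marked for decryption only")
        ctr = entry.next_ctr
        entry.next_ctr += 1
        return entry.base_key, ctr

    def key_for_decryption(self, kid: int) -> bytes:
        """Return the base_key selected by the KID from a received header."""
        entry = self._entries[kid]
        if entry.usage is not KeyUsage.DECRYPT:
            raise PermissionError("key is marked for encryption only")
        return entry.base_key
```

A lookup for an unknown KID raises KeyError in this sketch, which leaves the caller free to decide whether to drop or hold the ciphertext.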
Note that the set of available keys might change over the lifetime of Note that the set of available keys might change over the lifetime of
skipping to change at page 11, line 9 skipping to change at line 478
SFrame encryption and decryption use a key and salt derived from the SFrame encryption and decryption use a key and salt derived from the
base_key associated with a KID. Given a base_key value, the key and base_key associated with a KID. Given a base_key value, the key and
salt are derived using HMAC-based Key Derivation Function (HKDF) salt are derived using HMAC-based Key Derivation Function (HKDF)
[RFC5869] as follows: [RFC5869] as follows:
def derive_key_salt(KID, base_key): def derive_key_salt(KID, base_key):
sframe_secret = HKDF-Extract("", base_key) sframe_secret = HKDF-Extract("", base_key)
sframe_key_label = "SFrame 1.0 Secret key " + KID + cipher_suite sframe_key_label = "SFrame 1.0 Secret key " + KID + cipher_suite
sframe_key = HKDF-Expand(sframe_secret, sframe_key_label, AEAD.Nk) sframe_key =
HKDF-Expand(sframe_secret, sframe_key_label, AEAD.Nk)
sframe_salt_label = "SFrame 1.0 Secret salt " + KID + cipher_suite sframe_salt_label = "SFrame 1.0 Secret salt " + KID + cipher_suite
sframe_salt = HKDF-Expand(sframe_secret, sframe_salt_label, AEAD.Nn) sframe_salt =
HKDF-Expand(sframe_secret, sframe_salt_label, AEAD.Nn)
return sframe_key, sframe_salt return sframe_key, sframe_salt
In the derivation of sframe_secret: In the derivation of sframe_secret:
* The + operator represents concatenation of byte strings. * The + operator represents concatenation of byte strings.
* The KID value is encoded as an 8-byte big-endian integer, not the * The KID value is encoded as an 8-byte big-endian integer, not the
compressed form used in the SFrame header. compressed form used in the SFrame header.
* The cipher_suite value is a 2-byte big-endian integer representing * The cipher_suite value is a 2-byte big-endian integer representing
the cipher suite in use (see Section 8.1). the cipher suite in use (see Section 8.1).
The hash function used for HKDF is determined by the cipher suite in The hash function used for HKDF is determined by the cipher suite in
use. use.
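The derivation above can be sketched in runnable, non-normative Python using only the standard library. The helpers hkdf_extract and hkdf_expand follow the HKDF construction of [RFC5869]. The parameters cipher_suite, nk, nn, and hash_name are passed explicitly here as an assumption of this sketch, since the pseudocode takes them from the cipher suite in use. The values in the usage note are placeholders, not values taken from Section 8.1.

```python
import hashlib
import hmac
from typing import Tuple


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    """HKDF-Extract; an empty salt is replaced by a string of zero bytes."""
    if not salt:
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int,
                hash_name: str = "sha256") -> bytes:
    """HKDF-Expand; produces length bytes of output keying material."""
    digest_size = hashlib.new(hash_name).digest_size
    if length > 255 * digest_size:
        raise ValueError("requested output is too long for HKDF-Expand")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_salt(kid: int, base_key: bytes, cipher_suite: int,
                    nk: int, nn: int,
                    hash_name: str = "sha256") -> Tuple[bytes, bytes]:
    """Derive (sframe_key, sframe_salt) from a base_key and its KID."""
    sframe_secret = hkdf_extract(b"", base_key, hash_name)
    # KID is an 8-byte and cipher_suite a 2-byte big-endian integer.
    suffix = kid.to_bytes(8, "big") + cipher_suite.to_bytes(2, "big")
    sframe_key = hkdf_expand(
        sframe_secret, b"SFrame 1.0 Secret key " + suffix, nk, hash_name)
    sframe_salt = hkdf_expand(
        sframe_secret, b"SFrame 1.0 Secret salt " + suffix, nn, hash_name)
    return sframe_key, sframe_salt
```

A call such as derive_key_salt(0x123, base_key, cipher_suite=0x0004, nk=16, nn=12) returns a 16-byte key and a 12-byte salt. Because the KID is part of both labels, two KIDs that share a base_key still yield different keys and salts.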
4.4.3. Encryption 4.4.3. Encryption
SFrame encryption uses the AEAD encryption algorithm for the cipher SFrame encryption uses the AEAD encryption algorithm for the cipher
suite in use. The key for the encryption is the sframe_key and the suite in use. The key for the encryption is the sframe_key. The
nonce is formed by XORing the sframe_salt with the current counter, nonce is formed by first XORing the sframe_salt with the current CTR
encoded as a big-endian integer of length AEAD.Nn. value, and then encoding the result as a big-endian integer of length
AEAD.Nn.
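The nonce construction can be sketched as follows (non-normative). The name form_nonce is hypothetical, and the sketch assumes that the sframe_salt is already AEAD.Nn bytes long, so the CTR value is encoded to the same length before the XOR.

```python
def encode_big_endian(value: int, length: int) -> bytes:
    """Encode a non-negative integer as a big-endian string of length bytes."""
    return value.to_bytes(length, "big")


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def form_nonce(sframe_salt: bytes, ctr: int) -> bytes:
    """XOR the salt with the CTR value, encoded to the salt's length."""
    return xor(sframe_salt, encode_big_endian(ctr, len(sframe_salt)))
```

Because the CTR value occupies only the low-order bytes of the encoded integer, small counters leave the leading bytes of the salt unchanged. Distinct CTR values under the same salt always produce distinct nonces.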
The encryptor forms an SFrame header using the CTR and KID values The encryptor forms an SFrame header using the CTR and KID values
provided. The encoded header is provided as AAD to the AEAD provided. The encoded header is provided as AAD to the AEAD
encryption operation, together with application-provided metadata encryption operation, together with application-provided metadata
about the encrypted media (see Section 9.4). about the encrypted media (see Section 9.4).
def encrypt(CTR, KID, metadata, plaintext): def encrypt(CTR, KID, metadata, plaintext):
sframe_key, sframe_salt = key_store[KID] sframe_key, sframe_salt = key_store[KID]
# encode_big_endian(x, n) produces an n-byte string encoding the # encode_big_endian(x, n) produces an n-byte string encoding the
skipping to change at page 13, line 42 skipping to change at line 572
| +---------------+ | | +---------------+ |
+-------------->| SFrame Header | | +-------------->| SFrame Header | |
+---------------+ | +---------------+ |
| | | | | |
| |<----+ | |<----+
| ciphertext | | ciphertext |
| | | |
| | | |
+---------------+ +---------------+
Figure 4: Encrypting an SFrame Ciphertext Figure 5: Encrypting an SFrame Ciphertext
4.4.4. Decryption 4.4.4. Decryption
Before decrypting, a receiver needs to assemble a full SFrame Before decrypting, a receiver needs to assemble a full SFrame
ciphertext. When an SFrame ciphertext is fragmented into multiple ciphertext. When an SFrame ciphertext is fragmented into multiple
parts for transport (e.g., a whole encrypted frame sent in multiple parts for transport (e.g., a whole encrypted frame sent in multiple
SRTP packets), the receiving client collects all the fragments of the SRTP packets), the receiving client collects all the fragments of the
ciphertext, using appropriate sequencing and start/end markers in the ciphertext, using appropriate sequencing and start/end markers in the
transport. Once all of the required fragments are available, the transport. Once all of the required fragments are available, the
client reassembles them into the SFrame ciphertext, then it passes client reassembles them into the SFrame ciphertext and passes the
the ciphertext to SFrame for decryption. ciphertext to SFrame for decryption.
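The reassembly step described above might be sketched as follows. The fragment fields used here (a sequence number plus start and end flags) are hypothetical stand-ins for whatever sequencing and markers the transport actually provides, and sequence-number wraparound is deliberately ignored.

```python
def reassemble(fragments):
    """Join fragments into one SFrame ciphertext, or return None if incomplete.

    Each fragment is a tuple (seq, is_start, is_end, payload). These field
    names are assumptions for illustration, not part of SFrame.
    """
    ordered = sorted(fragments, key=lambda f: f[0])
    if not ordered:
        return None
    # The first fragment must carry the start marker, the last the end marker.
    if not ordered[0][1] or not ordered[-1][2]:
        return None
    # Sequence numbers must be contiguous (wraparound is not handled here).
    seqs = [f[0] for f in ordered]
    if seqs != list(range(seqs[0], seqs[0] + len(seqs))):
        return None
    return b"".join(f[3] for f in ordered)
```

Only a non-None result would be handed to the SFrame decryption operation; a None result means the receiver keeps waiting for the missing fragments.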
The KID field in the SFrame header is used to find the right key and The KID field in the SFrame header is used to find the right key and
salt for the encrypted frame, and the CTR field is used to construct salt for the encrypted frame, and the CTR field is used to construct
the nonce. The SFrame decryption procedure is as follows: the nonce. The SFrame decryption procedure is as follows:
def decrypt(metadata, sframe_ciphertext): def decrypt(metadata, sframe_ciphertext):
KID, CTR, header, ciphertext = parse_ciphertext(sframe_ciphertext) KID, CTR, header, ciphertext = parse_ciphertext(sframe_ciphertext)
sframe_key, sframe_salt = key_store[KID] sframe_key, sframe_salt = key_store[KID]
skipping to change at page 14, line 28 skipping to change at line 607
return AEAD.Decrypt(sframe_key, nonce, aad, ciphertext) return AEAD.Decrypt(sframe_key, nonce, aad, ciphertext)
If a ciphertext fails to decrypt because there is no key available If a ciphertext fails to decrypt because there is no key available
for the KID in the SFrame header, the client MAY buffer the for the KID in the SFrame header, the client MAY buffer the
ciphertext and retry decryption once a key with that KID is received. ciphertext and retry decryption once a key with that KID is received.
If a ciphertext fails to decrypt for any other reason, the client If a ciphertext fails to decrypt for any other reason, the client
MUST discard the ciphertext. Invalid ciphertexts SHOULD be discarded MUST discard the ciphertext. Invalid ciphertexts SHOULD be discarded
in a way that is indistinguishable (to an external observer) from in a way that is indistinguishable (to an external observer) from
having processed a valid ciphertext. In other words, the SFrame having processed a valid ciphertext. In other words, the SFrame
decrypt operation should be constant time, regardless of whether decrypt operation should take the same amount of time regardless of
decryption succeeds or fails. whether decryption succeeds or fails.
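The two failure cases above (unknown KID versus any other failure) could be handled along the following lines. This is a minimal sketch: the Receiver class, the decrypt_fn callback, the AuthenticationError exception, and the max_pending limit are all hypothetical names introduced for illustration, and the sketch makes no attempt to equalize processing time between success and failure.

```python
class AuthenticationError(Exception):
    """Hypothetical exception raised by decrypt_fn when decryption fails."""


class Receiver:
    def __init__(self, decrypt_fn, max_pending=16):
        # decrypt_fn(key, ctr, ciphertext) -> plaintext, or raises
        # AuthenticationError. It stands in for the AEAD decryption step.
        self.decrypt_fn = decrypt_fn
        self.keys = {}        # KID -> key material
        self.pending = {}     # KID -> list of (ctr, ciphertext) awaiting a key
        self.max_pending = max_pending  # assumed cap, not from the document

    def receive(self, kid, ctr, ciphertext):
        key = self.keys.get(kid)
        if key is None:
            # No key for this KID yet: the client MAY buffer and retry later.
            queue = self.pending.setdefault(kid, [])
            if len(queue) < self.max_pending:
                queue.append((ctr, ciphertext))
            return None
        try:
            return self.decrypt_fn(key, ctr, ciphertext)
        except AuthenticationError:
            # Any other failure: the ciphertext MUST be discarded.
            return None

    def add_key(self, kid, key):
        """Install a key and retry anything buffered for that KID."""
        self.keys[kid] = key
        recovered = []
        for ctr, ciphertext in self.pending.pop(kid, []):
            plaintext = self.receive(kid, ctr, ciphertext)
            if plaintext is not None:
                recovered.append(plaintext)
        return recovered
```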
SFrame Ciphertext SFrame Ciphertext
+---------------+ +---------------+
+---------------| SFrame Header | +---------------| SFrame Header |
| +---------------+ | +---------------+
| | | | | |
| | |-----+ | | |-----+
| | ciphertext | | | | ciphertext | |
| | | | | | | |
| | | | | | | |
skipping to change at page 15, line 43 skipping to change at line 648
| |
V V
+---------------+ +---------------+
| | | |
| | | |
| plaintext | | plaintext |
| | | |
| | | |
+---------------+ +---------------+
Figure 5: Decrypting an SFrame Ciphertext Figure 6: Decrypting an SFrame Ciphertext
4.5. Cipher Suites 4.5. Cipher Suites
Each SFrame session uses a single cipher suite that specifies the Each SFrame session uses a single cipher suite that specifies the
following primitives: following primitives:
* A hash function used for key derivation * A hash function used for key derivation
* An AEAD encryption algorithm [RFC5116] used for frame encryption, * An AEAD encryption algorithm [RFC5116] used for frame encryption,
optionally with a truncated authentication tag optionally with a truncated authentication tag
This document defines the following cipher suites, with the constants This document defines the following cipher suites, with the constants
defined in Section 4.4: defined in Section 4.4:
+============================+====+=====+====+====+====+ +============================+====+=====+====+====+====+
| Name | Nh | Nka | Nk | Nn | Nt | | Name | Nh | Nka | Nk | Nn | Nt |
+============================+====+=====+====+====+====+ +============================+====+=====+====+====+====+
| AES_128_CTR_HMAC_SHA256_80 | 32 | 16 | 48 | 12 | 10 | | AES_128_CTR_HMAC_SHA256_80 | 32 | 16 | 48 | 12 | 10 |
skipping to change at page 17, line 9 skipping to change at line 704
In order to allow very short tag sizes, we define a synthetic AEAD In order to allow very short tag sizes, we define a synthetic AEAD
function using the authenticated counter mode of AES together with function using the authenticated counter mode of AES together with
HMAC for authentication. We use an encrypt-then-MAC approach, as in HMAC for authentication. We use an encrypt-then-MAC approach, as in
SRTP [RFC3711]. SRTP [RFC3711].
Before encryption or decryption, encryption and authentication Before encryption or decryption, encryption and authentication
subkeys are derived from the single AEAD key. The overall length of subkeys are derived from the single AEAD key. The overall length of
the AEAD key is Nka + Nh, where Nka represents the key size for the the AEAD key is Nka + Nh, where Nka represents the key size for the
AES block cipher in use and Nh represents the output size of the hash AES block cipher in use and Nh represents the output size of the hash
function (as in Table 1). The encryption subkey comprises the first function (as in Section 4.4). The encryption subkey comprises the
Nka bytes and the authentication subkey comprises the remaining Nh first Nka bytes and the authentication subkey comprises the remaining
bytes. Nh bytes.
def derive_subkeys(sframe_key): def derive_subkeys(sframe_key):
# The encryption key comprises the first Nka bytes # The encryption key comprises the first Nka bytes
enc_key = sframe_key[..Nka] enc_key = sframe_key[..Nka]
# The authentication key comprises Nh remaining bytes # The authentication key comprises Nh remaining bytes
auth_key = sframe_key[Nka..] auth_key = sframe_key[Nka..]
return enc_key, auth_key return enc_key, auth_key
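As a runnable illustration of the split above, the following sketch fixes the constants to those listed for AES_128_CTR_HMAC_SHA256_80 (Nka = 16, Nh = 32, so Nk = 48). The length check is an addition for safety and is not part of the pseudocode.

```python
# Constants for AES_128_CTR_HMAC_SHA256_80 as listed in the cipher suite table.
NKA = 16  # AES-128 key size in bytes
NH = 32   # SHA-256 output size in bytes


def derive_subkeys(sframe_key):
    # Added check (an assumption, not in the pseudocode): the AEAD key
    # must be exactly Nka + Nh bytes long.
    if len(sframe_key) != NKA + NH:
        raise ValueError("sframe_key must be Nka + Nh bytes")
    enc_key = sframe_key[:NKA]    # first Nka bytes
    auth_key = sframe_key[NKA:]   # remaining Nh bytes
    return enc_key, auth_key
```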
skipping to change at page 19, line 18 skipping to change at line 778
they can use it to distribute SFrame keys. Each client participating they can use it to distribute SFrame keys. Each client participating
in a call generates a fresh base_key value that it will use to in a call generates a fresh base_key value that it will use to
encrypt media. The client then uses the E2E-secure channel to send encrypt media. The client then uses the E2E-secure channel to send
their encryption key to the other participants. their encryption key to the other participants.
In this scheme, it is assumed that receivers have a signal outside of In this scheme, it is assumed that receivers have a signal outside of
SFrame for which client has sent a given frame (e.g., an RTP SFrame for which client has sent a given frame (e.g., an RTP
synchronization source (SSRC)). SFrame KID values are then used to synchronization source (SSRC)). SFrame KID values are then used to
distinguish between versions of the sender's base_key. distinguish between versions of the sender's base_key.
Key IDs in this scheme have two parts: a "key generation" and a KID values in this scheme have two parts: a "key generation" and a
"ratchet step". Both are unsigned integers that begin at zero. The "ratchet step". Both are unsigned integers that begin at zero. The
"key generation" increments each time the sender distributes a new key generation increments each time the sender distributes a new key
key to receivers. The "ratchet step" is incremented each time the to receivers. The ratchet step is incremented each time the sender
sender ratchets their key forward for forward secrecy: ratchets their key forward for forward secrecy:
base_key[i+1] = HKDF-Expand( base_key[i+1] = HKDF-Expand(
HKDF-Extract("", base_key[i]), HKDF-Extract("", base_key[i]),
"SFrame 1.0 Ratchet", CipherSuite.Nh) "SFrame 1.0 Ratchet", CipherSuite.Nh)
For compactness, we do not send the whole ratchet step. Instead, we For compactness, we do not send the whole ratchet step. Instead, we
send only its low-order R bits, where R is a value set by the send only its low-order R bits, where R is a value set by the
application. Different senders may use different values of R, but application. Different senders may use different values of R, but
each receiver of a given sender needs to know what value of R is used each receiver of a given sender needs to know what value of R is used
by the sender so that they can recognize when they need to ratchet by the sender so that they can recognize when they need to ratchet
(vs. expecting a new key). R effectively defines a reordering (vs. expecting a new key). R effectively defines a reordering
window, since no more than 2^R ratchet steps can be active at a given window, since no more than 2^R ratchet steps can be active at a given
time. The key generation is sent in the remaining 64 - R bits of the time. The key generation is sent in the remaining 64 - R bits of the
Key ID. KID.
KID = (key_generation << R) + (ratchet_step % (1 << R)) KID = (key_generation << R) + (ratchet_step % (1 << R))
64-R bits R bits 64-R bits R bits
<---------------> <------------> <---------------> <------------>
+-----------------+--------------+ +-----------------+--------------+
| Key Generation | Ratchet Step | | Key Generation | Ratchet Step |
+-----------------+--------------+ +-----------------+--------------+
Figure 6: Structure of a KID in the Sender Keys Scheme Figure 7: Structure of a KID in the Sender Keys Scheme
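The KID layout above can be exercised with a small sketch. The function names and the range checks are assumptions added for illustration; only the packing formula comes from the text.

```python
def encode_kid(key_generation, ratchet_step, r_bits):
    """Pack a key generation and the low-order R bits of the ratchet step."""
    if not 0 <= r_bits <= 64:
        raise ValueError("R must be between 0 and 64")
    if not 0 <= key_generation < (1 << (64 - r_bits)):
        raise ValueError("key generation does not fit in 64 - R bits")
    return (key_generation << r_bits) + (ratchet_step % (1 << r_bits))


def decode_kid(kid, r_bits):
    """Return (key_generation, low-order R bits of the ratchet step)."""
    return kid >> r_bits, kid & ((1 << r_bits) - 1)
```

With R = 4, ratchet steps 5 and 21 encode to the same KID for a given key generation, which is why the receiver has to know R and track the sender's current step.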
The sender signals such a ratchet step update by sending frames with a KID The sender signals such a ratchet step update by sending frames with a KID
value in which the ratchet step has been incremented. A receiver who value in which the ratchet step has been incremented. A receiver who
receives from a sender with a new KID computes the new key as above. receives from a sender with a new KID computes the new key as above.
The old key may be kept for some time to allow for out-of-order The old key may be kept for some time to allow for out-of-order
delivery, but should be deleted promptly. delivery, but should be deleted promptly.
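The ratchet computation referred to above might look like the following in Python, using only the standard library. HKDF is written out per RFC 5869. The choice of SHA-256 (and therefore Nh = 32) is an assumption standing in for whichever hash the negotiated cipher suite specifies.

```python
import hashlib
import hmac

HASH = hashlib.sha256          # assumption: a SHA-256-based cipher suite
NH = HASH().digest_size        # Nh = 32 for SHA-256


def hkdf_extract(salt, ikm):
    # RFC 5869: an absent or empty salt is treated as Nh zero bytes.
    if not salt:
        salt = bytes(NH)
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk, info, length):
    if length > 255 * NH:
        raise ValueError("requested output too long for HKDF-Expand")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def ratchet(base_key):
    """Compute base_key[i+1] from base_key[i]."""
    prk = hkdf_extract(b"", base_key)
    return hkdf_expand(prk, b"SFrame 1.0 Ratchet", NH)
```

Because the step is one-way, a receiver that deletes base_key[i] after ratcheting cannot recover it from base_key[i+1].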
If a new participant joins in the middle of a session, they will need If a new participant joins in the middle of a session, they will need
to receive from each sender (a) the current sender key for that to receive from each sender (a) the current sender key for that
sender and (b) the current KID value for the sender. Evicting a sender and (b) the current KID value for the sender. Evicting a
participant requires each sender to send a fresh sender key to all participant requires each sender to send a fresh sender key to all
receivers. receivers.
It is up to the application to decide when sender keys are updated. It is the application's responsibility to decide when sender keys are
A sender key may be updated by sending a new base_key (updating the updated. A sender key may be updated by sending a new base_key
key generation) or by hashing the current base_key (updating the (updating the key generation) or by hashing the current base_key
ratchet step). Ratcheting the key forward is useful when adding new (updating the ratchet step). Ratcheting the key forward is useful
receivers to an SFrame-based interaction, since it ensures that the when adding new receivers to an SFrame-based interaction, since it
new receivers can't decrypt any media encrypted before they were ensures that the new receivers can't decrypt any media encrypted
added. If a sender wishes to assure the opposite property when before they were added. If a sender wishes to assure the opposite
removing a receiver (i.e., ensuring that the receiver can't decrypt property when removing a receiver (i.e., ensuring that the receiver
media after they are removed), then the sender will need to can't decrypt media after they are removed), then the sender will
distribute a new sender key. need to distribute a new sender key.
5.2. MLS 5.2. MLS
The Messaging Layer Security (MLS) protocol provides group The Messaging Layer Security (MLS) protocol provides group
authenticated key exchange [MLS-ARCH] [MLS-PROTO]. In principle, it authenticated key exchange [MLS-ARCH] [MLS-PROTO]. In principle, it
could be used to instantiate the sender key scheme above, but it can could be used to instantiate the sender key scheme above, but it can
also be used more efficiently directly. also be used more efficiently directly.
MLS creates a linear sequence of keys, each of which is shared among MLS creates a linear sequence of keys, each of which is shared among
the members of a group at a given point in time. When a member joins the members of a group at a given point in time. When a member joins
skipping to change at page 20, line 49 skipping to change at line 858
member has a unique sframe_key and sframe_salt that it uses to member has a unique sframe_key and sframe_salt that it uses to
encrypt with. Senders may choose any KID value within their assigned encrypt with. Senders may choose any KID value within their assigned
set of KID values, e.g., to allow a single sender to send multiple, set of KID values, e.g., to allow a single sender to send multiple,
uncoordinated outbound media streams. uncoordinated outbound media streams.
base_key = MLS-Exporter("SFrame 1.0 Base Key", "", AEAD.Nk) base_key = MLS-Exporter("SFrame 1.0 Base Key", "", AEAD.Nk)
For compactness, we do not send the whole epoch number. Instead, we For compactness, we do not send the whole epoch number. Instead, we
send only its low-order E bits, where E is a value set by the send only its low-order E bits, where E is a value set by the
application. E effectively defines a reordering window, since no application. E effectively defines a reordering window, since no
more than 2^E epochs can be active at a given time. Receivers MUST more than 2^E epochs can be active at a given time. To handle
be prepared for the epoch counter to roll over, removing an old epoch rollover of the epoch counter, receivers MUST remove an old epoch
when a new epoch with the same E lower bits is introduced. when a new epoch with the same low-order E bits is introduced.
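One way a receiver could satisfy the rollover requirement above is to index stored epochs by their low-order E bits, so that installing a new epoch automatically evicts the older one that shares those bits. The EpochKeyStore class below is a hypothetical sketch, not an interface defined by SFrame.

```python
class EpochKeyStore:
    def __init__(self, e_bits):
        self.e_bits = e_bits
        self._slots = {}  # low-order E bits -> (full epoch, base_key)

    def add_epoch(self, epoch, base_key):
        slot = epoch % (1 << self.e_bits)
        # Overwriting the slot removes any old epoch with the same E bits.
        self._slots[slot] = (epoch, base_key)

    def lookup(self, epoch_bits):
        """Return (epoch, base_key) for the epoch bits in a KID, or None."""
        return self._slots.get(epoch_bits)

    def __len__(self):
        return len(self._slots)
```

With E = 4, installing epoch 17 replaces epoch 1, so at most 16 epochs are ever held at once.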
Let S be the number of bits required to encode a member index in the Let S be the number of bits required to encode a member index in the
group, i.e., the smallest value such that group_size <= (1 << S). group, i.e., the smallest value such that group_size <= (1 << S).
The sender index is encoded in the S bits above the epoch. The The sender index is encoded in the S bits above the epoch. The
remaining 64 - S - E bits of the KID value are a context value chosen remaining 64 - S - E bits of the KID value are a context value chosen
by the sender (context value 0 will produce the shortest encoded by the sender (context value 0 will produce the shortest encoded
KID). KID).
KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E)) KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E))
64-S-E bits S bits E bits 64-S-E bits S bits E bits
<-----------> <------> <------> <-----------> <------> <------>
+-------------+--------+-------+ +-------------+--------+-------+
| Context ID | Index | Epoch | | Context ID | Index | Epoch |
+-------------+--------+-------+ +-------------+--------+-------+
Figure 7: Structure of a KID for an MLS Sender Figure 8: Structure of a KID for an MLS Sender
Once an SFrame stack has been provisioned with the Once an SFrame stack has been provisioned with the
sframe_epoch_secret for an epoch, it can compute the required KID sframe_epoch_secret for an epoch, it can compute the required KID
values on demand (as well as the resulting SFrame keys/nonces derived values on demand (as well as the resulting SFrame keys/nonces derived
from the base_key and KID) as it needs to encrypt or decrypt for a from the base_key and KID) as it needs to encrypt or decrypt for a
given member. given member.
... ...
| |
| |
skipping to change at page 22, line 32 skipping to change at line 912
| +--> context = 3 --> KID = 0xc20 | +--> context = 3 --> KID = 0xc20
| |
| |
Epoch 17 +--+-- index=33 --> KID = 0x211 Epoch 17 +--+-- index=33 --> KID = 0x211
| | | |
| +-- index=51 --> KID = 0x331 | +-- index=51 --> KID = 0x331
| |
| |
... ...
Figure 8: An Example Sequence of KIDs for an MLS-based SFrame Figure 9: An Example Sequence of KIDs for an MLS-based SFrame
Session (E=4; S=6, Allowing for 64 Group Members) Session (E=4; S=6, Allowing for 64 Group Members)
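The KID formula for MLS senders can be checked against the values in the figure with a short sketch. The function names and the range checks are assumptions added for illustration.

```python
def member_index_bits(group_size):
    """Smallest S such that group_size <= (1 << S)."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return (group_size - 1).bit_length()


def encode_mls_kid(context, sender_index, epoch, s_bits, e_bits):
    if not 0 <= sender_index < (1 << s_bits):
        raise ValueError("sender index does not fit in S bits")
    if not 0 <= context < (1 << (64 - s_bits - e_bits)):
        raise ValueError("context does not fit in 64 - S - E bits")
    return ((context << (s_bits + e_bits))
            + (sender_index << e_bits)
            + (epoch % (1 << e_bits)))


def decode_mls_kid(kid, s_bits, e_bits):
    """Return (context, sender_index, low-order E bits of the epoch)."""
    epoch_bits = kid & ((1 << e_bits) - 1)
    sender_index = (kid >> e_bits) & ((1 << s_bits) - 1)
    context = kid >> (s_bits + e_bits)
    return context, sender_index, epoch_bits


# Values from the figure (E=4, S=6), epoch 17:
print(hex(encode_mls_kid(0, 33, 17, 6, 4)))  # 0x211
print(hex(encode_mls_kid(0, 51, 17, 6, 4)))  # 0x331
```

Decoding 0xc20 with the same parameters yields context 3, which matches the figure's "context = 3" entry.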
6. Media Considerations 6. Media Considerations
6.1. Selective Forwarding Units 6.1. Selective Forwarding Units
SFUs (e.g., those described in Section 3.7 of [RFC7667]) receive the SFUs (e.g., those described in Section 3.7 of [RFC7667]) receive the
media streams from each participant and select which ones should be media streams from each participant and select which ones should be
forwarded to each of the other participants. There are several forwarded to each of the other participants. There are several
approaches for stream selection, but in general, the SFU needs to approaches for stream selection, but in general, the SFU needs to
access metadata associated with each frame and modify the RTP access metadata associated with each frame and modify the RTP
information of the incoming packets when they are transmitted to the information of the incoming packets when they are transmitted to the
receiving participants. receiving participants.
This section describes how these normal SFU modes of operation This section describes how these normal SFU modes of operation
interact with the E2EE provided by SFrame. interact with the E2EE provided by SFrame.
6.1.1. LastN and RTP Stream Reuse 6.1.1. RTP Stream Reuse
The SFU may choose to send only a certain number of streams based on The SFU may choose to send only a certain number of streams based on
the voice activity of the participants. To avoid the overhead the voice activity of the participants. To avoid the overhead
involved in establishing new transport streams, the SFU may decide to involved in establishing new transport streams, the SFU may decide to
reuse previously existing streams or even pre-allocate a predefined reuse previously existing streams or even pre-allocate a predefined
number of streams and choose in each moment in time which participant number of streams and choose in each moment in time which participant
media will be sent through it. media will be sent through it.
This means that in the same transport-level stream (e.g., an RTP This means that the same transport-level stream (e.g., an RTP stream
stream defined by either SSRC or Media Identification (MID)) may defined by either SSRC or Media Identification (MID)) may carry media
carry media from different streams of different participants. As from different streams of different participants. Because each
different keys are used by each participant for encoding their media, participant uses a different key to encrypt their media, the receiver
the receiver will be able to verify which is the sender of the media will be able to verify the sender of the media within the RTP stream
coming within the RTP stream at any given point in time, preventing at any given point in time. Thus the receiver will correctly
the SFU trying to impersonate any of the participants with another associate the media with the sender indicated by the authenticated
participant's media. SFrame KID value, irrespective of how the SFU transmits the media to
the client.
Note that in order to prevent impersonation by a malicious Note that in order to prevent impersonation by a malicious
participant (not the SFU), a mechanism based on digital signature participant (not the SFU), a mechanism based on digital signature
would be required. SFrame does not protect against such attacks. would be required. SFrame does not protect against such attacks.
6.1.2. Simulcast 6.1.2. Simulcast
When using simulcast, the same input image will produce N different When using simulcast, the same input image will produce N different
encoded frames (one per simulcast layer), which would be processed encoded frames (one per simulcast layer), which would be processed
independently by the frame encryptor and assigned a unique counter independently by the frame encryptor and assigned a unique CTR value
for each. for each.
6.1.3. SVC 6.1.3. Scalable Video Coding (SVC)
In both temporal and spatial scalability, the SFU may choose to drop In both temporal and spatial scalability, the SFU may choose to drop
layers in order to match a certain bitrate or to forward specific layers in order to match a certain bitrate or to forward specific
media sizes or frames per second. In order to support the SFU media sizes or frames per second. In order to support the SFU
selectively removing layers, the sender MUST encapsulate each layer selectively removing layers, the sender MUST encapsulate each layer
in a different SFrame ciphertext. in a different SFrame ciphertext.
6.2. Video Key Frames 6.2. Video Key Frames
Forward security and post-compromise security require that the E2EE Forward security and post-compromise security require that the E2EE
keys (base keys) are updated any time a participant joins or leaves keys (base keys) are updated any time a participant joins or leaves
the call. the call.
The key exchange happens asynchronously and on a different path than The key exchange happens asynchronously and on a different path than
the SFU signaling and media. So it may happen that, when a new the SFU signaling and media. So it may happen that when a new
participant joins the call and the SFU side requests a key frame, the participant joins the call and the SFU side requests a key frame, the
sender generates the E2EE frame with a key that is not known by the sender generates the E2EE frame with a key that is not known by the
receiver, so it will be discarded. When the sender updates its receiver, so it will be discarded. When the sender updates its
sending key with the new key, it will send it in a non-key frame, so sending key with the new key, it will send it in a non-key frame, so
the receiver will be able to decrypt it, but not decode it. the receiver will be able to decrypt it, but not decode it.
The new receiver will then re-request a key frame, but due to sender The new receiver will then re-request a key frame, but due to sender
and SFU policies, that new key frame could take some time to be and SFU policies, that new key frame could take some time to be
generated. generated.
If the sender sends a key frame after the new E2EE key is in use, the If the sender sends a key frame after the new E2EE key is in use, the
time required for the new participant to display the video is time required for the new participant to display the video is
minimized. minimized.
Note that this issue does not arise for media streams that do not Note that this issue does not arise for media streams that do not
have dependencies among frames, e.g., audio streams. In these have dependencies among frames, e.g., audio streams. In these
streams, each frame is independently decodable, so there is never a streams, each frame is independently decodable, so a frame never
need to process together two frames that might be on two sides of a depends on another frame that might be on the other side of a key
key rotation. rotation.
6.3. Partial Decoding 6.3. Partial Decoding
Some codecs support partial decoding, where individual packets can be Some codecs support partial decoding, where individual packets can be
decoded without waiting for the full frame to arrive. When SFrame is decoded without waiting for the full frame to arrive. When SFrame is
applied per frame, partial decoding is not possible because the applied per frame, partial decoding is not possible because the
decoder cannot access data until an entire frame has arrived and has decoder cannot access data until an entire frame has arrived and has
been decrypted. been decrypted.
7. Security Considerations 7. Security Considerations
7.1. No Header Confidentiality 7.1. No Header Confidentiality
SFrame provides integrity protection to the SFrame header (the Key ID SFrame provides integrity protection to the SFrame header (the KID
and counter values), but it does not provide confidentiality and CTR values), but it does not provide confidentiality protection.
protection. Parties that can observe the SFrame header may learn, Parties that can observe the SFrame header may learn, for example,
for example, which parties are sending SFrame payloads (from KID which parties are sending SFrame payloads (from KID values) and at
values) and at what rates (from CTR values). In cases where SFrame what rates (from CTR values). In cases where SFrame is used for end-
is used for end-to-end security on top of hop-by-hop protections to-end security on top of hop-by-hop protections (e.g., running over
(e.g., running over SRTP as described in Appendix B.5), the hop-by- SRTP as described in Appendix B.5), the hop-by-hop security
hop security mechanisms provide confidentiality protection of the mechanisms provide confidentiality protection of the SFrame header
SFrame header between hops. between hops.
7.2. No per-Sender Authentication 7.2. No Per-Sender Authentication
SFrame does not provide per-sender authentication of media data. Any SFrame does not provide per-sender authentication of media data. Any
sender in a session can send media that will be associated with any sender in a session can send media that will be associated with any
other sender. This is because SFrame uses symmetric encryption to other sender. This is because SFrame uses symmetric encryption to
protect media data, so that any receiver also has the keys required protect media data, so that any receiver also has the keys required
to encrypt packets for the sender. to encrypt packets for the sender.
7.3. Key Management 7.3. Key Management
The key exchange mechanism is out of scope of this document; however, The specifics of key management are beyond the scope of this
every client SHOULD change their keys when new clients join or leave document. However, every client SHOULD change their keys when new
the call for forward secrecy and post-compromise security. clients join or leave the call for forward secrecy and post-
compromise security.
7.4. Replay 7.4. Replay
The handling of replay is out of the scope of this document. The handling of replay is out of the scope of this document.
However, senders MUST reject requests to encrypt multiple times with However, senders MUST reject requests to encrypt multiple times with
the same key and nonce since several AEAD algorithms fail badly in the same key and nonce since several AEAD algorithms fail badly in
such cases (see, e.g., Section 5.1.1 of [RFC5116]). such cases (see, e.g., Section 5.1.1 of [RFC5116]).
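A sender-side guard for the requirement above could be as simple as the following sketch. It assumes that each KID maps to a single key and that the nonce is determined by CTR, so a repeated (KID, CTR) pair implies a repeated key and nonce. The CounterGuard class is hypothetical, and requiring strictly increasing CTR values is a stricter policy than the text demands, chosen here because it needs only one integer of state per KID.

```python
class CounterGuard:
    def __init__(self):
        self._next_ctr = {}  # KID -> smallest CTR value still allowed

    def reserve(self, kid, ctr):
        """Approve one encryption with (kid, ctr), or raise ValueError."""
        if ctr < self._next_ctr.get(kid, 0):
            raise ValueError("CTR already used (or skipped) for this KID")
        self._next_ctr[kid] = ctr + 1
```

An encryptor would call reserve(KID, CTR) before invoking the AEAD and refuse to encrypt if it raises.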
7.5. Risks Due to Short Tags 7.5. Risks Due to Short Tags
skipping to change at page 26, line 14 skipping to change at line 1073
* Receivers only accept SFrame ciphertexts over HBH-secure channels * Receivers only accept SFrame ciphertexts over HBH-secure channels
(e.g., SRTP security associations or QUIC connections). If this (e.g., SRTP security associations or QUIC connections). If this
is the case, only an entity that is part of such a channel can is the case, only an entity that is part of such a channel can
mount the above attack. mount the above attack.
* The expected packet rate for a media stream is very predictable * The expected packet rate for a media stream is very predictable
(and typically far lower than the above example). On the one (and typically far lower than the above example). On the one
hand, attacks at this rate will succeed even less often than the hand, attacks at this rate will succeed even less often than the
high-rate attack described above. On the other hand, the high-rate attack described above. On the other hand, the
application may use an elevated packet-arrival rate as a signal of application may use an elevated packet arrival rate as a signal of
a brute-force attack. This latter approach is common in other a brute-force attack. This latter approach is common in other
settings, e.g., mitigating brute-force attacks on passwords. settings, e.g., mitigating brute-force attacks on passwords.
* Media applications typically do not provide feedback to media * Media applications typically do not provide feedback to media
senders as to which media packets failed to decrypt. When media- senders as to which media packets failed to decrypt. When media-
quality feedback mechanisms are used, decryption failures will quality feedback mechanisms are used, decryption failures will
typically appear as packet losses, but only at an aggregate level. typically appear as packet losses, but only at an aggregate level.
* Anti-replay mechanisms (see Section 7.4) prevent the attacker from * Anti-replay mechanisms (see Section 7.4) prevent the attacker from
reusing valid ciphertexts (either observed or guessed by the reusing valid ciphertexts (either observed or guessed by the
skipping to change at page 26, line 39 skipping to change at line 1098
encrypted content is unchanged. In other words, when the above encrypted content is unchanged. In other words, when the above
brute-force attack succeeds, it only allows the attacker to send a brute-force attack succeeds, it only allows the attacker to send a
single SFrame ciphertext; the ciphertext cannot be reused because single SFrame ciphertext; the ciphertext cannot be reused because
either it will have the same CTR value and be discarded as a either it will have the same CTR value and be discarded as a
replay, or else it will have a different CTR value and its tag replay, or else it will have a different CTR value and its tag
will no longer be valid. will no longer be valid.
Nonetheless, without these mitigations, an application that makes use Nonetheless, without these mitigations, an application that makes use
of short tags will be at heightened risk of forgery attacks. In many of short tags will be at heightened risk of forgery attacks. In many
cases, it is simpler to use full-size tags and tolerate slightly cases, it is simpler to use full-size tags and tolerate slightly
higher-bandwidth usage rather than to add the additional defenses higher bandwidth usage rather than to add the additional defenses
necessary to safely use short tags. necessary to safely use short tags.
8. IANA Considerations 8. IANA Considerations
IANA has created a new registry called "SFrame Cipher Suites" IANA has created a new registry called "SFrame Cipher Suites"
(Section 8.1) under the "SFrame" group registry heading. Assignments (Section 8.1) under the "SFrame" group registry heading.
are made via the Specification Required policy [RFC8126].
8.1. SFrame Cipher Suites 8.1. SFrame Cipher Suites
The "SFrame Cipher Suites" registry lists identifiers for SFrame The "SFrame Cipher Suites" registry lists identifiers for SFrame
cipher suites as defined in Section 4.5. The cipher suite field is cipher suites as defined in Section 4.5. The cipher suite field is
two bytes wide, so the valid cipher suites are in the range 0x0000 to two bytes wide, so the valid cipher suites are in the range 0x0000 to
0xFFFF. 0xFFFF. Except as noted below, assignments are made via the
Specification Required policy [RFC8126].
The registration template is as follows: The registration template is as follows:
* Value: The numeric value of the cipher suite * Value: The numeric value of the cipher suite
* Name: The name of the cipher suite * Name: The name of the cipher suite
* Recommended: Whether support for this cipher suite is recommended * Recommended: Whether support for this cipher suite is recommended
by the IETF. Valid values are "Y", "N", and "D" as described in by the IETF. Valid values are "Y", "N", and "D" as described in
Section 17.1 of [MLS-PROTO]. The default value of the Section 17.1 of [MLS-PROTO]. The default value of the
skipping to change at page 28, line 34 skipping to change at line 1163
+--------+----------------------------+---+-----------+------------+ +--------+----------------------------+---+-----------+------------+
Table 2: SFrame Cipher Suites Table 2: SFrame Cipher Suites
9. Application Responsibilities 9. Application Responsibilities
To use SFrame, an application needs to define the inputs to the To use SFrame, an application needs to define the inputs to the
SFrame encryption and decryption operations, and how SFrame SFrame encryption and decryption operations, and how SFrame
ciphertexts are delivered from sender to receiver (including any ciphertexts are delivered from sender to receiver (including any
fragmentation and reassembly). In this section, we lay out fragmentation and reassembly). In this section, we lay out
additional requirements that an implementation must meet in order for additional requirements that an application must meet in order for
SFrame to operate securely. SFrame to operate securely.
In general, an application using SFrame is responsible for In general, an application using SFrame is responsible for
configuring SFrame. The application must first define when SFrame is configuring SFrame. The application must first define when SFrame is
applied at all. When SFrame is applied, the application must define applied at all. When SFrame is applied, the application must define
which cipher suite is to be used. If new versions of SFrame are which cipher suite is to be used. If new versions of SFrame are
defined in the future, it will be up to the application to determine defined in the future, it will be the application's responsibility to
which version should be used. determine which version should be used.
This division of responsibilities is similar to the way other media This division of responsibilities is similar to the way other media
parameters (e.g., codecs) are typically handled in media parameters (e.g., codecs) are typically handled in media
applications, in the sense that they are set up in some signaling applications, in the sense that they are set up in some signaling
protocol and not described in the media. Applications might find it protocol and not described in the media. Applications might find it
useful to extend the protocols used for negotiating other media useful to extend the protocols used for negotiating other media
parameters (e.g., Session Description Protocol (SDP) [RFC8866]) to parameters (e.g., Session Description Protocol (SDP) [RFC8866]) to
also negotiate parameters for SFrame. also negotiate parameters for SFrame.
9.1. Header Value Uniqueness 9.1. Header Value Uniqueness
skipping to change at page 29, line 27 skipping to change at line 1203
persistent storage, this context needs to include the last-used CTR persistent storage, this context needs to include the last-used CTR
value. When the context is used later, the application should use value. When the context is used later, the application should use
the stored CTR value to determine the next CTR value to be used in an the stored CTR value to determine the next CTR value to be used in an
encryption operation, and then write the next CTR value back to encryption operation, and then write the next CTR value back to
storage before using the CTR value for encryption. Storing the CTR storage before using the CTR value for encryption. Storing the CTR
value before usage (vs. after) helps ensure that a storage failure value before usage (vs. after) helps ensure that a storage failure
will not cause reuse of the same (base_key, KID, CTR) combination. will not cause reuse of the same (base_key, KID, CTR) combination.
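The store-before-use ordering described above can be sketched as follows. This is only an illustration under stated assumptions: the CounterStore trait, the MemoryStore type, and the reserve_ctr function are hypothetical names that are not defined by SFrame, and the in-memory map merely stands in for whatever persistent storage an application actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction (not part of SFrame). It records the
/// last CTR value reserved for each KID.
trait CounterStore {
    fn load(&self, kid: u64) -> Option<u64>;
    fn store(&mut self, kid: u64, ctr: u64) -> Result<(), String>;
}

/// In-memory stand-in for persistent storage, for illustration only.
#[derive(Default)]
struct MemoryStore {
    values: HashMap<u64, u64>,
    fail_writes: bool, // lets a caller simulate a storage failure
}

impl CounterStore for MemoryStore {
    fn load(&self, kid: u64) -> Option<u64> {
        self.values.get(&kid).copied()
    }

    fn store(&mut self, kid: u64, ctr: u64) -> Result<(), String> {
        if self.fail_writes {
            return Err("storage write failed".to_string());
        }
        self.values.insert(kid, ctr);
        Ok(())
    }
}

/// Determine the next CTR value from the stored one, write it back, and
/// only then hand it to the caller for use in an encryption operation.
fn reserve_ctr<S: CounterStore>(store: &mut S, kid: u64) -> Result<u64, String> {
    let next = match store.load(kid) {
        None => 0,
        Some(last) => last
            .checked_add(1)
            .ok_or_else(|| "CTR space exhausted".to_string())?,
    };
    // If this write fails, no CTR value is returned, so nothing is
    // encrypted with a value that storage does not know about.
    store.store(kid, next)?;
    Ok(next)
}
```

With this ordering, a failed write surfaces as an error before any encryption takes place, and a crash after a successful write at worst skips a CTR value rather than repeating one.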
9.2. Key Management Framework 9.2. Key Management Framework
It is up to the application to provision SFrame with a mapping of KID The application is responsible for provisioning SFrame with a mapping
values to base_key values and the resulting keys and salts. More of KID values to base_key values and the resulting keys and salts.
importantly, the application specifies which KID values are used for More importantly, the application specifies which KID values are used
which purposes (e.g., by which senders). An application's KID for which purposes (e.g., by which senders). An application's KID
assignment strategy MUST be structured to assure the non-reuse assignment strategy MUST be structured to assure the non-reuse
properties discussed in Section 9.1. properties discussed in Section 9.1.
It is also up to the application to define a rotation schedule for The application is also responsible for defining a rotation schedule
keys. For example, one application might have an ephemeral group for for keys. For example, one application might have an ephemeral group
every call and keep rotating keys when endpoints join or leave the for every call and keep rotating keys when endpoints join or leave
call, while another application could have a persistent group that the call, while another application could have a persistent group
can be used for multiple calls and simply derives ephemeral symmetric that can be used for multiple calls and simply derives ephemeral
keys for a specific call. symmetric keys for a specific call.
It should be noted that KID values are not encrypted by SFrame and It should be noted that KID values are not encrypted by SFrame and
are thus visible to any application-layer intermediaries that might are thus visible to any application-layer intermediaries that might
handle an SFrame ciphertext. If there are application semantics handle an SFrame ciphertext. If there are application semantics
included in KID values, then this information would be exposed to included in KID values, then this information would be exposed to
intermediaries. For example, in the scheme of Section 5.1, the intermediaries. For example, in the scheme of Section 5.1, the
number of ratchet steps per sender is exposed, and in the scheme of number of ratchet steps per sender is exposed, and in the scheme of
Section 5.2, the number of epochs and the MLS sender ID of the SFrame Section 5.2, the number of epochs and the MLS sender ID of the SFrame
sender are exposed. sender are exposed.
skipping to change at page 30, line 21 skipping to change at line 1242
key and nonce. key and nonce.
It is not mandatory to implement anti-replay on the receiver side. It is not mandatory to implement anti-replay on the receiver side.
Receivers MAY apply time- or counter-based anti-replay mitigations. Receivers MAY apply time- or counter-based anti-replay mitigations.
For example, Section 3.3.2 of [RFC3711] specifies a counter-based For example, Section 3.3.2 of [RFC3711] specifies a counter-based
anti-replay mitigation, which could be adapted to use with SFrame, anti-replay mitigation, which could be adapted to use with SFrame,
using the CTR field as the counter. using the CTR field as the counter.
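As a rough illustration of such a counter-based mitigation, the sketch below keeps a sliding window over CTR values. It is loosely modeled on the replay-list idea and is not a transcription of the procedure in [RFC3711]. The ReplayWindow type, the 64-entry window size, and the method name are assumptions made for this example. A receiver would presumably keep one window per KID and update it only after the SFrame decrypt function has authenticated the ciphertext.

```rust
/// Number of CTR values tracked behind the highest one seen (assumed size).
const WINDOW: u64 = 64;

/// Hypothetical sliding-window replay tracker keyed on the CTR field.
struct ReplayWindow {
    highest: Option<u64>, // highest CTR accepted so far
    bitmap: u64,          // bit i set => CTR (highest - i) was accepted
}

impl ReplayWindow {
    fn new() -> Self {
        ReplayWindow { highest: None, bitmap: 0 }
    }

    /// Returns true and records `ctr` if it has not been seen before.
    /// Returns false if `ctr` is a replay or is older than the window.
    fn check_and_update(&mut self, ctr: u64) -> bool {
        match self.highest {
            None => {
                self.highest = Some(ctr);
                self.bitmap = 1;
                true
            }
            Some(h) if ctr > h => {
                // Slide the window forward; older entries fall off the end.
                let shift = ctr - h;
                self.bitmap = if shift >= WINDOW { 0 } else { self.bitmap << shift };
                self.bitmap |= 1;
                self.highest = Some(ctr);
                true
            }
            Some(h) => {
                let offset = h - ctr;
                if offset >= WINDOW {
                    return false; // too old to be tracked
                }
                let mask = 1u64 << offset;
                if self.bitmap & mask != 0 {
                    return false; // already accepted once
                }
                self.bitmap |= mask;
                true
            }
        }
    }
}
```

The window size trades memory for tolerance of reordering: a larger window accepts frames that arrive later, at the cost of a larger bitmap.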
9.4. Metadata 9.4. Metadata
The metadata input to SFrame operations is pure application-specified The metadata input to SFrame operations is an opaque byte string
data. As such, it is up to the application to define what specified by the application. As such, the application needs to
information should go in the metadata input and ensure that it is define what information should go in the metadata input and ensure
provided to the encryption and decryption functions at the that it is provided to the encryption and decryption functions at the
appropriate points. A receiver MUST NOT use SFrame-authenticated appropriate points. A receiver MUST NOT use SFrame-authenticated
metadata until after the SFrame decrypt function has authenticated metadata until after the SFrame decrypt function has authenticated
it, unless the purpose of such usage is to prepare an SFrame it, unless the purpose of such usage is to prepare an SFrame
ciphertext for SFrame decryption. Essentially, metadata may be used ciphertext for SFrame decryption. Essentially, metadata may be used
"upstream of SFrame" in a processing pipeline, but only to prepare "upstream of SFrame" in a processing pipeline, but only to prepare
for SFrame decryption. for SFrame decryption.
For example, consider an application where SFrame is used to encrypt For example, consider an application where SFrame is used to encrypt
audio frames that are sent over SRTP, with some application data audio frames that are sent over SRTP, with some application data
included in the RTP header extension. Suppose the application also included in the RTP header extension. Suppose the application also
skipping to change at page 30, line 52 skipping to change at line 1273
data. data.
10. References 10. References
10.1. Normative References 10.1. Normative References
[MLS-PROTO] [MLS-PROTO]
Barnes, R., Beurdouche, B., Robert, R., Millican, J., Barnes, R., Beurdouche, B., Robert, R., Millican, J.,
Omara, E., and K. Cohn-Gordon, "The Messaging Layer Omara, E., and K. Cohn-Gordon, "The Messaging Layer
Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420, Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420,
July 2023, <https://www.rfc-editor.org/rfc/rfc9420>. July 2023, <https://www.rfc-editor.org/info/rfc9420>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated [RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated
Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008, Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008,
<https://www.rfc-editor.org/rfc/rfc5116>. <https://www.rfc-editor.org/info/rfc5116>.
[RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
Key Derivation Function (HKDF)", RFC 5869, Key Derivation Function (HKDF)", RFC 5869,
DOI 10.17487/RFC5869, May 2010, DOI 10.17487/RFC5869, May 2010,
<https://www.rfc-editor.org/rfc/rfc5869>. <https://www.rfc-editor.org/info/rfc5869>.
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26, Writing an IANA Considerations Section in RFCs", BCP 26,
RFC 8126, DOI 10.17487/RFC8126, June 2017, RFC 8126, DOI 10.17487/RFC8126, June 2017,
<https://www.rfc-editor.org/rfc/rfc8126>. <https://www.rfc-editor.org/info/rfc8126>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
10.2. Informative References 10.2. Informative References
[I-D.codec-agnostic-rtp-payload-format]
Murillo, S. G. and A. Gouaillard, "Codec agnostic RTP
payload format for video", Work in Progress, Internet-
Draft, draft-codec-agnostic-rtp-payload-format-00, 19
February 2021, <https://datatracker.ietf.org/doc/html/
draft-codec-agnostic-rtp-payload-format-00>.
[I-D.ietf-moq-transport]
Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and
I. Swett, "Media over QUIC Transport", Work in Progress,
Internet-Draft, draft-ietf-moq-transport-04, 29 May 2024,
<https://datatracker.ietf.org/doc/html/draft-ietf-moq-
transport-04>.
[I-D.ietf-webtrans-overview]
Vasiliev, V., "The WebTransport Protocol Framework", Work
in Progress, Internet-Draft, draft-ietf-webtrans-overview-
07, 4 March 2024, <https://datatracker.ietf.org/doc/html/
draft-ietf-webtrans-overview-07>.
[MLS-ARCH] Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., and [MLS-ARCH] Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., and
A. Duric, "The Messaging Layer Security (MLS) A. Duric, "The Messaging Layer Security (MLS)
Architecture", Work in Progress, Internet-Draft, draft- Architecture", Work in Progress, Internet-Draft, draft-
ietf-mls-architecture-13, 22 March 2024, ietf-mls-architecture-14, 8 July 2024,
<https://datatracker.ietf.org/doc/html/draft-ietf-mls- <https://datatracker.ietf.org/doc/html/draft-ietf-mls-
architecture-13>. architecture-14>.
[MOQ-TRANSPORT]
Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and
I. Swett, Ed., "Media over QUIC Transport", Work in
Progress, Internet-Draft, draft-ietf-moq-transport-05, 8
July 2024, <https://datatracker.ietf.org/doc/html/draft-
ietf-moq-transport-05>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, DOI 10.17487/RFC3711, March 2004, RFC 3711, DOI 10.17487/RFC3711, March 2004,
<https://www.rfc-editor.org/rfc/rfc3711>. <https://www.rfc-editor.org/info/rfc3711>.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
September 2012, <https://www.rfc-editor.org/rfc/rfc6716>. September 2012, <https://www.rfc-editor.org/info/rfc6716>.
[RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
for Real-Time Transport Protocol (RTP) Sources", RFC 7656, for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
DOI 10.17487/RFC7656, November 2015, DOI 10.17487/RFC7656, November 2015,
<https://www.rfc-editor.org/rfc/rfc7656>. <https://www.rfc-editor.org/info/rfc7656>.
[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
DOI 10.17487/RFC7667, November 2015, DOI 10.17487/RFC7667, November 2015,
<https://www.rfc-editor.org/rfc/rfc7667>. <https://www.rfc-editor.org/info/rfc7667>.
[RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, [RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach,
"Double Encryption Procedures for the Secure Real-Time "Double Encryption Procedures for the Secure Real-Time
Transport Protocol (SRTP)", RFC 8723, Transport Protocol (SRTP)", RFC 8723,
DOI 10.17487/RFC8723, April 2020, DOI 10.17487/RFC8723, April 2020,
<https://www.rfc-editor.org/rfc/rfc8723>. <https://www.rfc-editor.org/info/rfc8723>.
[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
Session Description Protocol", RFC 8866, Session Description Protocol", RFC 8866,
DOI 10.17487/RFC8866, January 2021, DOI 10.17487/RFC8866, January 2021,
<https://www.rfc-editor.org/rfc/rfc8866>. <https://www.rfc-editor.org/info/rfc8866>.
[RTP-PAYLOAD]
Murillo, S. G., Fablet, Y., and A. Gouaillard, "Codec
agnostic RTP payload format for video", Work in Progress,
Internet-Draft, draft-gouaillard-avtcore-codec-agn-rtp-
payload-01, 9 March 2021,
<https://datatracker.ietf.org/doc/html/draft-gouaillard-
avtcore-codec-agn-rtp-payload-01>.
[TestVectors] [TestVectors]
"SFrame Test Vectors", commit 025d568, September 2023, "SFrame Test Vectors", commit 025d568, September 2023,
<https://github.com/sframe-wg/sframe/blob/main/test- <https://github.com/sframe-wg/sframe/blob/025d568/test-
vectors/test-vectors.json>. vectors/test-vectors.json>.
[WEBTRANSPORT]
Vasiliev, V., "The WebTransport Protocol Framework", Work
in Progress, Internet-Draft, draft-ietf-webtrans-overview-
07, 4 March 2024, <https://datatracker.ietf.org/doc/html/
draft-ietf-webtrans-overview-07>.
Appendix A. Example API Appendix A. Example API
*This section is not normative.* *This section is not normative.*
This section describes a notional API that an SFrame implementation This section describes a notional API that an SFrame implementation
might expose. The core concept is an "SFrame context", within which might expose. The core concept is an "SFrame context", within which
KID values are meaningful. In the key management scheme described in KID values are meaningful. In the key management scheme described in
Section 5.1, each sender has a different context; in the scheme Section 5.1, each sender has a different context; in the scheme
described in Section 5.2, all senders share the same context. described in Section 5.2, all senders share the same context.
skipping to change at page 33, line 18 skipping to change at line 1386
operations). A key context tracks the key and salt associated to the operations). A key context tracks the key and salt associated to the
KID, and the current CTR value. A key context to be used for sending KID, and the current CTR value. A key context to be used for sending
also tracks the next CTR value to be used. also tracks the next CTR value to be used.
The primary operations on an SFrame context are as follows: The primary operations on an SFrame context are as follows:
* *Create an SFrame context:* The context is initialized with a * *Create an SFrame context:* The context is initialized with a
cipher suite and no KID mappings. cipher suite and no KID mappings.
* *Add a key for sending:* The key and salt are derived from the * *Add a key for sending:* The key and salt are derived from the
base key, and are used to initialize a send context, together with base key and used to initialize a send context, together with a
a zero counter value. zero CTR value.
* *Add a key for receiving:* The key and salt are derived from the * *Add a key for receiving:* The key and salt are derived from the
base key, and are used to initialize a receive context. base key and used to initialize a receive context.
* *Encrypt a plaintext:* Encrypt a given plaintext using the key for * *Encrypt a plaintext:* Encrypt a given plaintext using the key for
a given KID, including the specified metadata. a given KID, including the specified metadata.
* *Decrypt an SFrame ciphertext:* Decrypt an SFrame ciphertext with * *Decrypt an SFrame ciphertext:* Decrypt an SFrame ciphertext with
the KID and CTR values specified in the SFrame header, and the the KID and CTR values specified in the SFrame header, and the
provided metadata. provided metadata.
Figure 9 shows an example of the types of structures and methods that Figure 10 shows an example of the types of structures and methods
could be used to create an SFrame API in Rust. that could be used to create an SFrame API in Rust.
type KeyId = u64; type KeyId = u64;
type Counter = u64; type Counter = u64;
type CipherSuite = u16; type CipherSuite = u16;
struct SendKeyContext { struct SendKeyContext {
key: Vec<u8>, key: Vec<u8>,
salt: Vec<u8>, salt: Vec<u8>,
next_counter: Counter, next_counter: Counter,
} }
skipping to change at page 34, line 35 skipping to change at line 1432
trait SFrameContextMethods { trait SFrameContextMethods {
fn create(cipher_suite: CipherSuite) -> Self; fn create(cipher_suite: CipherSuite) -> Self;
fn add_send_key(&self, kid: KeyId, base_key: &[u8]); fn add_send_key(&self, kid: KeyId, base_key: &[u8]);
fn add_recv_key(&self, kid: KeyId, base_key: &[u8]); fn add_recv_key(&self, kid: KeyId, base_key: &[u8]);
fn encrypt(&mut self, kid: KeyId, metadata: &[u8], fn encrypt(&mut self, kid: KeyId, metadata: &[u8],
plaintext: &[u8]) -> Vec<u8>; plaintext: &[u8]) -> Vec<u8>;
fn decrypt(&self, metadata: &[u8], ciphertext: &[u8]) -> Vec<u8>; fn decrypt(&self, metadata: &[u8], ciphertext: &[u8]) -> Vec<u8>;
} }
Figure 9: An Example SFrame API Figure 10: An Example SFrame API
Appendix B. Overhead Analysis Appendix B. Overhead Analysis
Any use of SFrame will impose overhead in terms of the amount of Any use of SFrame will impose overhead in terms of the amount of
bandwidth necessary to transmit a given media stream. Exactly how bandwidth necessary to transmit a given media stream. Exactly how
much overhead will be added depends on several factors: much overhead will be added depends on several factors:
* The number of senders involved in a conference (length of KID) * The number of senders involved in a conference (length of KID)
* The duration of the conference (length of CTR) * The duration of the conference (length of CTR)
skipping to change at page 35, line 24 skipping to change at line 1468
In the remainder of this section, we compute overhead estimates for a In the remainder of this section, we compute overhead estimates for a
collection of common scenarios. collection of common scenarios.
B.1. Assumptions B.1. Assumptions
In the below calculations, we make conservative assumptions about In the below calculations, we make conservative assumptions about
SFrame overhead so that the overhead amounts we compute here are SFrame overhead so that the overhead amounts we compute here are
likely to be an upper bound of those seen in practice. likely to be an upper bound of those seen in practice.
+==============+=======+================================+ +==============+=======+============================+
| Field | Bytes | Explanation | | Field | Bytes | Explanation |
+==============+=======+================================+ +==============+=======+============================+
| Fixed header | 1 | Fixed | | Config byte | 1 | Fixed |
+--------------+-------+--------------------------------+ +--------------+-------+----------------------------+
| Key ID (KID) | 2 | >255 senders; or MLS epoch | | Key ID (KID) | 2 | >255 senders; or MLS epoch |
| | | (E=4) and >16 senders | | | | (E=4) and >16 senders |
+--------------+-------+--------------------------------+ +--------------+-------+----------------------------+
| Counter | 3 | More than 24 hours of media in | | Counter | 3 | More than 24 hours of |
| (CTR) | | common cases | | (CTR) | | media in common cases |
+--------------+-------+--------------------------------+ +--------------+-------+----------------------------+
| Cipher | 16 | Full Galois/Counter Mode (GCM) | | Cipher | 16 | Full authentication tag |
| overhead | | tag (longest defined here) | | overhead | | (longest defined here) |
+--------------+-------+--------------------------------+ +--------------+-------+----------------------------+
Table 3: Overhead Analysis Assumptions Table 3: Overhead Analysis Assumptions
In total, then, we assume that each SFrame encryption will add 22 In total, then, we assume that each SFrame encryption will add 22
bytes of overhead. bytes of overhead.
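The 22-byte total is the sum of the four field sizes in Table 3. The sketch below reproduces that sum and gives a rough check of the "more than 24 hours" claim for a 3-byte CTR. The constant and function names are hypothetical, and the rate of 100 encryptions per second is an assumed example rate, not a value taken from Table 3.

```rust
// Field sizes in bytes, as assumed in Table 3.
const CONFIG_BYTES: u32 = 1;
const KID_BYTES: u32 = 2;
const CTR_BYTES: u32 = 3;
const TAG_BYTES: u32 = 16;

/// Total per-encryption overhead under the Table 3 assumptions.
fn total_overhead_bytes() -> u32 {
    CONFIG_BYTES + KID_BYTES + CTR_BYTES + TAG_BYTES
}

/// Hours of media that can be sent before a CTR of `ctr_bytes` bytes runs
/// out of values, at `per_second` encryptions per second.
fn ctr_lifetime_hours(ctr_bytes: u32, per_second: f64) -> f64 {
    let values = 2f64.powi(8 * ctr_bytes as i32);
    values / per_second / 3600.0
}
```

At 100 encryptions per second, a 3-byte CTR offers 2^24 values, which lasts roughly 46 hours; lower rates last proportionally longer.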
We consider two scenarios: applying SFrame per frame and per packet. We consider two scenarios: applying SFrame per frame and per packet.
In each scenario, we compute the SFrame overhead in absolute terms In each scenario, we compute the SFrame overhead in absolute terms
(kbps) and as a percentage of the base bandwidth. (kbps) and as a percentage of the base bandwidth.
B.2. Audio B.2. Audio
In audio streams, there is typically a one-to-one relationship In audio streams, there is typically a one-to-one relationship
between frames and packets, so the overhead is the same whether one between frames and packets, so the overhead is the same whether one
uses SFrame at a per-packet or per-frame level. uses SFrame at a per-packet or per-frame level.
Table 4 considers three scenarios that are based on recommended Table 4 considers three scenarios that are based on recommended
configurations of the Opus codec [RFC6716]: configurations of the Opus codec [RFC6716] (where "fps" stands for
"frames per second"):
* Narrow-band (NB) speech: 120 ms packets, 8 kbps
* Full-band (FB) speech: 20 ms packets, 32 kbps
* Full-band stereo music: 10 ms packets, 128 kbps
+================+==============+======+==========+==========+ +==============+==============+=====+======+==========+==========+
| Scenario | Frames per | Base | Overhead | Overhead | | Scenario | Frame length | fps | Base | Overhead | Overhead |
| | Second (fps) | kbps | kbps | % | | | | | kbps | kbps | % |
+================+==============+======+==========+==========+ +==============+==============+=====+======+==========+==========+
| NB speech, 120 | 8.3 | 8 | 1.4 | 17.9% | | Narrow-band | 120 ms | 8.3 | 8 | 1.4 | 17.9% |
| ms packets | | | | | | speech | | | | | |
+----------------+--------------+------+----------+----------+ +--------------+--------------+-----+------+----------+----------+
| FB speech, 20 | 50 | 32 | 8.6 | 26.9% | | Full-band | 20 ms | 50 | 32 | 8.6 | 26.9% |
| ms packets | | | | | | speech | | | | | |
+----------------+--------------+------+----------+----------+ +--------------+--------------+-----+------+----------+----------+
| FB stereo, 10 | 100 | 128 | 17.2 | 13.4% | | Full-band | 10 ms | 100 | 128 | 17.2 | 13.4% |
| ms packets | | | | | | stereo music | | | | | |
+----------------+--------------+------+----------+----------+ +--------------+--------------+-----+------+----------+----------+
Table 4: SFrame Overhead for Audio Streams Table 4: SFrame Overhead for Audio Streams
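The figures in Table 4 can be approximated with the short calculation below. The function names are hypothetical. Dividing by 1024 bits per kilobit is an inference made here because it reproduces the published values; the text does not state which convention was used.

```rust
/// Per-encryption overhead in bytes, from the assumptions in Table 3.
const SFRAME_OVERHEAD_BYTES: f64 = 22.0;

/// Overhead in kbps when SFrame is applied `per_second` times per second.
/// Assumption: 1 kbps = 1024 bit/s (inferred, see the note above).
fn overhead_kbps(per_second: f64) -> f64 {
    SFRAME_OVERHEAD_BYTES * 8.0 * per_second / 1024.0
}

/// Overhead as a percentage of the base bandwidth of the stream.
fn overhead_percent(per_second: f64, base_kbps: f64) -> f64 {
    100.0 * overhead_kbps(per_second) / base_kbps
}
```

For example, full-band speech at 50 fps and 32 kbps gives 22 * 8 * 50 / 1024, or about 8.6 kbps, which is about 26.9% of the base bandwidth.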
B.3. Video B.3. Video
Video frames can be larger than an MTU and thus are commonly split Video frames can be larger than an MTU and thus are commonly split
across multiple packets. Table 5 and Table 6 show the estimated across multiple packets. Tables 5 and 6 show the estimated overhead
overhead of encrypting a video stream, where SFrame is applied per of encrypting a video stream, where SFrame is applied per frame and
frame and per packet, respectively. The choices of resolution, per packet, respectively. The choices of resolution, frames per
frames per second, and bandwidth roughly reflect the capabilities of second, and bandwidth roughly reflect the capabilities of modern
modern video codecs across a range from very-low to very-high video codecs across a range from very low to very high quality.
quality.
+=============+=====+===========+===============+============+ +=============+=====+===========+===============+============+
| Scenario | fps | Base kbps | Overhead kbps | Overhead % | | Scenario | fps | Base kbps | Overhead kbps | Overhead % |
+=============+=====+===========+===============+============+ +=============+=====+===========+===============+============+
| 426 x 240 | 7.5 | 45 | 1.3 | 2.9% | | 426 x 240 | 7.5 | 45 | 1.3 | 2.9% |
+-------------+-----+-----------+---------------+------------+ +-------------+-----+-----------+---------------+------------+
| 640 x 360 | 15 | 200 | 2.6 | 1.3% | | 640 x 360 | 15 | 200 | 2.6 | 1.3% |
+-------------+-----+-----------+---------------+------------+ +-------------+-----+-----------+---------------+------------+
| 640 x 360 | 30 | 400 | 5.2 | 1.3% | | 640 x 360 | 30 | 400 | 5.2 | 1.3% |
+-------------+-----+-----------+---------------+------------+ +-------------+-----+-----------+---------------+------------+
skipping to change at page 38, line 9 skipping to change at line 1578
as the quality of the video improves since bandwidth is driven more as the quality of the video improves since bandwidth is driven more
by picture size than frame rate. In the per-packet case, the SFrame by picture size than frame rate. In the per-packet case, the SFrame
percentage overhead approaches the ratio between the SFrame overhead percentage overhead approaches the ratio between the SFrame overhead
per packet and the MTU (here 22 bytes of SFrame overhead divided by per packet and the MTU (here 22 bytes of SFrame overhead divided by
an assumed 1200-byte MTU, or about 1.8%). an assumed 1200-byte MTU, or about 1.8%).
B.4. Conferences B.4. Conferences
Real conferences usually involve several audio and video streams. Real conferences usually involve several audio and video streams.
The overhead of SFrame in such a conference is the aggregate of the The overhead of SFrame in such a conference is the aggregate of the
overhead of all the individual streams. Thus, while SFrame incurs a overhead across all the individual streams. Thus, while SFrame
large percentage overhead on an audio stream, if the conference also incurs a large percentage overhead on an audio stream, if the
involves a video stream, then the audio overhead is likely negligible conference also involves a video stream, then the audio overhead is
relative to the overall bandwidth of the conference. likely negligible relative to the overall bandwidth of the
conference.
For example, Table 7 shows the overhead estimates for a two-person For example, Table 7 shows the overhead estimates for a two-person
conference where one person is sending low-quality media and the conference where one person is sending low-quality media and the
other is sending high-quality media. (And we assume that SFrame is other is sending high-quality media. (And we assume that SFrame is
applied per frame.) The video streams dominate the bandwidth at the applied per frame.) The video streams dominate the bandwidth at the
SFU, so the total bandwidth overhead is only around 1%. SFU, so the total bandwidth overhead is only around 1%.
+=====================+===========+===============+============+ +=====================+===========+===============+============+
| Stream | Base Kbps | Overhead Kbps | Overhead % | | Stream | Base Kbps | Overhead Kbps | Overhead % |
+=====================+===========+===============+============+ +=====================+===========+===============+============+
skipping to change at page 38, line 48 skipping to change at line 1618
SFrame is a generic encapsulation format, but many of the SFrame is a generic encapsulation format, but many of the
applications in which it is likely to be integrated are based on RTP. applications in which it is likely to be integrated are based on RTP.
This section discusses how an integration between SFrame and RTP This section discusses how an integration between SFrame and RTP
could be done, and some of the challenges that would need to be could be done, and some of the challenges that would need to be
overcome. overcome.
As discussed in Section 4.1, there are two natural patterns for As discussed in Section 4.1, there are two natural patterns for
integrating SFrame into an application: applying SFrame per frame or integrating SFrame into an application: applying SFrame per frame or
per packet. In RTP-based applications, applying SFrame per packet per packet. In RTP-based applications, applying SFrame per packet
means that the payload of each RTP packet will be an SFrame means that the payload of each RTP packet will be an SFrame
ciphertext, starting with an SFrame header, as shown in Figure 10. ciphertext, starting with an SFrame header, as shown in Figure 11.
Applying SFrame per frame means that different RTP payloads will have Applying SFrame per frame means that different RTP payloads will have
different formats: the first payload of a frame will contain the different formats: The first payload of a frame will contain the
SFrame headers, and subsequent payloads will contain further chunks SFrame headers, and subsequent payloads will contain further chunks
of the ciphertext, as shown in Figure 11. of the ciphertext, as shown in Figure 12.
In order for these media payloads to be properly interpreted by In order for these media payloads to be properly interpreted by
receivers, receivers will need to be configured to know which of the receivers, receivers will need to be configured to know which of the
above schemes the sender has applied to a given sequence of RTP above schemes the sender has applied to a given sequence of RTP
packets. SFrame does not provide a mechanism for distributing this packets. SFrame does not provide a mechanism for distributing this
configuration information. In applications that use SDP for configuration information. In applications that use SDP for
negotiating RTP media streams [RFC8866], an appropriate extension to negotiating RTP media streams [RFC8866], an appropriate extension to
SDP could provide this function. SDP could provide this function.
Applying SFrame per frame also requires that packetization and Applying SFrame per frame also requires that packetization and
depacketization be done in a generic manner that does not depend on depacketization be done in a generic manner that does not depend on
the media content of the packets, since the content being packetized/ the media content of the packets, since the content being packetized
depacketized will be opaque ciphertext (except for the SFrame or depacketized will be opaque ciphertext (except for the SFrame
header). In order for such a generic packetization scheme to work header). In order for such a generic packetization scheme to work
interoperably, one would have to be defined, e.g., as proposed in interoperably, one would have to be defined, e.g., as proposed in
[I-D.codec-agnostic-rtp-payload-format]. [RTP-PAYLOAD].
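As an illustration of the per-frame case described above, the following is a minimal sketch of content-agnostic packetization. It is not the payload format proposed in [RTP-PAYLOAD]. The names packetize, depacketize, and max_payload are hypothetical, and the sketch assumes that the receiver already holds every chunk of a frame in order; a real RTP application would have to establish that from sequence numbers and frame boundaries.

```python
from typing import List


def packetize(sframe_ciphertext: bytes, max_payload: int) -> List[bytes]:
    """Split one opaque SFrame ciphertext into RTP-payload-sized chunks.

    The ciphertext is treated as an opaque byte string. Because the SFrame
    header is at its start, the header ends up in the first chunk as long
    as max_payload is at least the header length.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        sframe_ciphertext[i:i + max_payload]
        for i in range(0, len(sframe_ciphertext), max_payload)
    ]


def depacketize(chunks: List[bytes]) -> bytes:
    """Reassemble the SFrame ciphertext from in-order, complete chunks."""
    return b"".join(chunks)


# Hypothetical stand-in bytes: a 2-byte "header" followed by 10 bytes of
# "encrypted payload". Real SFrame ciphertexts are produced by encryption.
example_ciphertext = b"HH" + b"C" * 10
example_chunks = packetize(example_ciphertext, 5)
```

Neither function inspects the media content, which is the property that per-frame SFrame requires of the packetization layer: only the reassembled ciphertext is passed to SFrame decryption.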
+---+-+-+-------+-+-------------+-------------------------------+<-+ +---+-+-+-------+-+-----------+------------------------------+<-+
|V=2|P|X| CC |M| PT | sequence number | | |V=2|P|X| CC |M| PT | sequence number | |
+---+-+-+-------+-+-------------+-------------------------------+ | +---+-+-+-------+-+-----------+------------------------------+ |
| timestamp | | | timestamp | |
+---------------------------------------------------------------+ | +------------------------------------------------------------+ |
| synchronization source (SSRC) identifier | | | synchronization source (SSRC) identifier | |
+===============================================================+ | +============================================================+ |
| contributing source (CSRC) identifiers | | | contributing source (CSRC) identifiers | |
| .... | | | .... | |
+---------------------------------------------------------------+ | +------------------------------------------------------------+ |
| RTP extension(s) (OPTIONAL) | | | RTP extension(s) (OPTIONAL) | |
+->+--------------------+------------------------------------------+ | +->+-------------------+----------------------------------------+ |
| | SFrame header | | | | | SFrame header | | |
| +--------------------+ | | | +-------------------+ | |
| | | | | | | |
| | SFrame encrypted and authenticated payload | | | | SFrame encrypted and authenticated payload | |
| | | | | | | |
+->+---------------------------------------------------------------+<-+ +->+------------------------------------------------------------+<-+
| | SRTP authentication tag | | | | SRTP authentication tag | |
| +---------------------------------------------------------------+ | | +------------------------------------------------------------+ |
| | | |
+--- SRTP Encrypted Portion SRTP Authenticated Portion ---+ +--- SRTP Encrypted Portion SRTP Authenticated Portion ---+
Figure 10: SRTP Packet with SFrame-Protected Payload Figure 11: SRTP Packet with SFrame-Protected Payload
+----------------+ +---------------+ +----------------+ +---------------+
| frame metadata | | | | frame metadata | | |
+-------+--------+ | | +-------+--------+ | |
| | frame | | | frame |
| | | | | |
| | | | | |
| +-------+-------+ | +-------+-------+
| | | |
| | | |
skipping to change at page 40, line 43 skipping to change at line 1703
| | | | | | | |
V V V V V V V V
+---------------+ +---------------+ +---------------+ +---------------+ +---------------+ +---------------+
| SFrame header | | | | | | SFrame header | | | | |
+---------------+ | | | | +---------------+ | | | |
| | | payload 2/N | ... | payload N/N | | | | payload 2/N | ... | payload N/N |
| payload 1/N | | | | | | payload 1/N | | | | |
| | | | | | | | | | | |
+---------------+ +---------------+ +---------------+ +---------------+ +---------------+ +---------------+
Figure 11: Encryption Flow with per-Frame Encryption for RTP Figure 12: Encryption Flow with per-Frame Encryption for RTP
Appendix C. Test Vectors Appendix C. Test Vectors
This section provides a set of test vectors that implementations can This section provides a set of test vectors that implementations can
use to verify that they correctly implement SFrame encryption and use to verify that they correctly implement SFrame encryption and
decryption. In addition to test vectors for the overall process of decryption. In addition to test vectors for the overall process of
SFrame encryption/decryption, we also provide test vectors for header SFrame encryption/decryption, we also provide test vectors for header
encoding/decoding, and for AEAD encryption/decryption using the AES- encoding/decoding, and for AEAD encryption/decryption using the AES-
CTR construction defined in Section 4.5.1. CTR construction defined in Section 4.5.1.
skipping to change at page 72, line 13 skipping to change at line 3156
3c1cc24d56ceabced279 3c1cc24d56ceabced279
Acknowledgements Acknowledgements
The authors wish to specially thank Dr. Alex Gouaillard as one of the The authors wish to specially thank Dr. Alex Gouaillard as one of the
early contributors to the document. His passion and energy were key early contributors to the document. His passion and energy were key
to the design and development of SFrame. to the design and development of SFrame.
Contributors Contributors
Frederic Jacobs Frédéric Jacobs
Apple Apple
Email: frederic.jacobs@apple.com Email: frederic.jacobs@apple.com
Marta Mularczyk Marta Mularczyk
Amazon Amazon
Email: mulmarta@amazon.com Email: mulmarta@amazon.com
Suhas Nandakumar Suhas Nandakumar
Cisco Cisco
Email: snandaku@cisco.com Email: snandaku@cisco.com
skipping to change at page 72, line 40 skipping to change at line 3183
Phoenix R&D Phoenix R&D
Email: ietf@raphaelrobert.com Email: ietf@raphaelrobert.com
Authors' Addresses Authors' Addresses
Emad Omara Emad Omara
Apple Apple
Email: eomara@apple.com Email: eomara@apple.com
Justin Uberti Justin Uberti
Google Fixie.ai
Email: juberti@google.com Email: justin@fixie.ai
Sergio Garcia Murillo Sergio Garcia Murillo
CoSMo Software CoSMo Software
Email: sergio.garcia.murillo@cosmosoftware.io Email: sergio.garcia.murillo@cosmosoftware.io
Richard L. Barnes (editor) Richard Barnes (editor)
Cisco Cisco
Email: rlb@ipv.sx Email: rlb@ipv.sx
Youenn Fablet Youenn Fablet
Apple Apple
Email: youenn@apple.com Email: youenn@apple.com
 End of changes. 99 change blocks. 
324 lines changed or deleted 325 lines changed or added
