rfc9605.original | rfc9605.txt | |||
---|---|---|---|---|
Network Working Group E. Omara | Internet Engineering Task Force (IETF) E. Omara | |||
Internet-Draft Apple | Request for Comments: 9605 Apple | |||
Intended status: Standards Track J. Uberti | Category: Standards Track J. Uberti | |||
Expires: 6 October 2024 Google | ISSN: 2070-1721 Fixie.ai | |||
S. Murillo | S. G. Murillo | |||
CoSMo Software | CoSMo Software | |||
R. L. Barnes, Ed. | R. Barnes, Ed. | |||
Cisco | Cisco | |||
Y. Fablet | Y. Fablet | |||
Apple | Apple | |||
4 April 2024 | August 2024 | |||
Secure Frame (SFrame) | Secure Frame (SFrame): Lightweight Authenticated Encryption for Real- | |||
draft-ietf-sframe-enc-09 | Time Media | |||
Abstract | Abstract | |||
This document describes the Secure Frame (SFrame) end-to-end | This document describes the Secure Frame (SFrame) end-to-end | |||
encryption and authentication mechanism for media frames in a | encryption and authentication mechanism for media frames in a | |||
multiparty conference call, in which central media servers (selective | multiparty conference call, in which central media servers (Selective | |||
forwarding units or SFUs) can access the media metadata needed to | Forwarding Units or SFUs) can access the media metadata needed to | |||
make forwarding decisions without having access to the actual media. | make forwarding decisions without having access to the actual media. | |||
The proposed mechanism differs from the Secure Real-Time Protocol | This mechanism differs from the Secure Real-Time Protocol (SRTP) in | |||
(SRTP) in that it is independent of RTP (thus compatible with non-RTP | that it is independent of RTP (thus compatible with non-RTP media | |||
media transport) and can be applied to whole media frames in order to | transport) and can be applied to whole media frames in order to be | |||
be more bandwidth efficient. | more bandwidth efficient. | |||
About This Document | ||||
This note is to be removed before publishing as an RFC. | ||||
The latest revision of this draft can be found at https://sframe- | ||||
wg.github.io/sframe/draft-ietf-sframe-enc.html. Status information | ||||
for this document may be found at https://datatracker.ietf.org/doc/ | ||||
draft-ietf-sframe-enc/. | ||||
Discussion of this document takes place on the Secure Media Frames | ||||
Working Group mailing list (mailto:sframe@ietf.org), which is | ||||
archived at https://mailarchive.ietf.org/arch/browse/sframe/. | ||||
Subscribe at https://www.ietf.org/mailman/listinfo/sframe/. | ||||
Source for this draft and an issue tracker can be found at | ||||
https://github.com/sframe-wg/sframe. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 6 October 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9605. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Terminology | |||
3. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Goals | |||
4. SFrame . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 4. SFrame | |||
4.1. Application Context . . . . . . . . . . . . . . . . . . . 5 | 4.1. Application Context | |||
4.2. SFrame Ciphertext . . . . . . . . . . . . . . . . . . . . 8 | 4.2. SFrame Ciphertext | |||
4.3. SFrame Header . . . . . . . . . . . . . . . . . . . . . . 8 | 4.3. SFrame Header | |||
4.4. Encryption Schema . . . . . . . . . . . . . . . . . . . . 10 | 4.4. Encryption Schema | |||
4.4.1. Key Selection . . . . . . . . . . . . . . . . . . . . 11 | 4.4.1. Key Selection | |||
4.4.2. Key Derivation . . . . . . . . . . . . . . . . . . . 11 | 4.4.2. Key Derivation | |||
4.4.3. Encryption . . . . . . . . . . . . . . . . . . . . . 12 | 4.4.3. Encryption | |||
4.4.4. Decryption . . . . . . . . . . . . . . . . . . . . . 14 | 4.4.4. Decryption | |||
4.5. Cipher Suites . . . . . . . . . . . . . . . . . . . . . . 16 | 4.5. Cipher Suites | |||
4.5.1. AES-CTR with SHA2 . . . . . . . . . . . . . . . . . . 17 | 4.5.1. AES-CTR with SHA2 | |||
5. Key Management . . . . . . . . . . . . . . . . . . . . . . . 19 | 5. Key Management | |||
5.1. Sender Keys . . . . . . . . . . . . . . . . . . . . . . . 20 | 5.1. Sender Keys | |||
5.2. MLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | 5.2. MLS | |||
6. Media Considerations . . . . . . . . . . . . . . . . . . . . 23 | 6. Media Considerations | |||
6.1. Selective Forwarding Units . . . . . . . . . . . . . . . 23 | 6.1. Selective Forwarding Units | |||
6.1.1. LastN and RTP stream reuse . . . . . . . . . . . . . 24 | 6.1.1. RTP Stream Reuse | |||
6.1.2. Simulcast . . . . . . . . . . . . . . . . . . . . . . 24 | 6.1.2. Simulcast | |||
6.1.3. SVC . . . . . . . . . . . . . . . . . . . . . . . . . 24 | 6.1.3. Scalable Video Coding (SVC) | |||
6.2. Video Key Frames . . . . . . . . . . . . . . . . . . . . 24 | 6.2. Video Key Frames | |||
6.3. Partial Decoding . . . . . . . . . . . . . . . . . . . . 25 | 6.3. Partial Decoding | |||
7. Security Considerations . . . . . . . . . . . . . . . . . . . 25 | 7. Security Considerations | |||
7.1. No Header Confidentiality . . . . . . . . . . . . . . . . 25 | 7.1. No Header Confidentiality | |||
7.2. No Per-Sender Authentication . . . . . . . . . . . . . . 26 | 7.2. No Per-Sender Authentication | |||
7.3. Key Management . . . . . . . . . . . . . . . . . . . . . 26 | 7.3. Key Management | |||
7.4. Replay . . . . . . . . . . . . . . . . . . . . . . . . . 26 | 7.4. Replay | |||
7.5. Risks due to Short Tags . . . . . . . . . . . . . . . . . 26 | 7.5. Risks Due to Short Tags | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 | 8. IANA Considerations | |||
8.1. SFrame Cipher Suites . . . . . . . . . . . . . . . . . . 28 | 8.1. SFrame Cipher Suites | |||
9. Application Responsibilities . . . . . . . . . . . . . . . . 29 | 9. Application Responsibilities | |||
9.1. Header Value Uniqueness . . . . . . . . . . . . . . . . . 29 | 9.1. Header Value Uniqueness | |||
9.2. Key Management Framework . . . . . . . . . . . . . . . . 30 | 9.2. Key Management Framework | |||
9.3. Anti-Replay . . . . . . . . . . . . . . . . . . . . . . . 30 | 9.3. Anti-Replay | |||
9.4. Metadata . . . . . . . . . . . . . . . . . . . . . . . . 30 | 9.4. Metadata | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 31 | 10. References | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . 31 | 10.1. Normative References | |||
10.2. Informative References . . . . . . . . . . . . . . . . . 32 | 10.2. Informative References | |||
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 33 | Appendix A. Example API | |||
Appendix B. Example API . . . . . . . . . . . . . . . . . . . . 33 | Appendix B. Overhead Analysis | |||
Appendix C. Overhead Analysis . . . . . . . . . . . . . . . . . 35 | B.1. Assumptions | |||
C.1. Assumptions . . . . . . . . . . . . . . . . . . . . . . . 35 | B.2. Audio | |||
C.2. Audio . . . . . . . . . . . . . . . . . . . . . . . . . . 36 | B.3. Video | |||
C.3. Video . . . . . . . . . . . . . . . . . . . . . . . . . . 37 | B.4. Conferences | |||
C.4. Conferences . . . . . . . . . . . . . . . . . . . . . . . 38 | B.5. SFrame over RTP | |||
C.5. SFrame over RTP . . . . . . . . . . . . . . . . . . . . . 39 | Appendix C. Test Vectors | |||
Appendix D. Test Vectors . . . . . . . . . . . . . . . . . . . . 41 | C.1. Header Encoding/Decoding | |||
D.1. Header encoding/decoding . . . . . . . . . . . . . . . . 42 | C.2. AEAD Encryption/Decryption Using AES-CTR and HMAC | |||
D.2. AEAD encryption/decryption using AES-CTR and HMAC . . . . 66 | C.3. SFrame Encryption/Decryption | |||
D.3. SFrame encryption/decryption . . . . . . . . . . . . . . 68 | Acknowledgements | |||
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 73 | Contributors | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 73 | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
Modern multi-party video call systems use Selective Forwarding Unit | Modern multiparty video call systems use Selective Forwarding Unit | |||
(SFU) servers to efficiently route media streams to call endpoints | (SFU) servers to efficiently route media streams to call endpoints | |||
based on factors such as available bandwidth, desired video size, | based on factors such as available bandwidth, desired video size, | |||
codec support, and other factors. An SFU typically does not need | codec support, and other factors. An SFU typically does not need | |||
access to the media content of the conference, allowing for the media | access to the media content of the conference, which allows the media | |||
to be "end-to-end" encrypted so that it cannot be decrypted by the | to be encrypted "end to end" so that it cannot be decrypted by the | |||
SFU. In order for the SFU to work properly, though, it usually needs | SFU. In order for the SFU to work properly, though, it usually needs | |||
to be able to access RTP metadata and RTCP feedback messages, which | to be able to access RTP metadata and RTCP feedback messages, which | |||
is not possible if all RTP/RTCP traffic is end-to-end encrypted. | is not possible if all RTP/RTCP traffic is end-to-end encrypted. | |||
As such, two layers of encryption and authentication are required: | As such, two layers of encryption and authentication are required: | |||
1. Hop-by-hop (HBH) encryption of media, metadata, and feedback | 1. Hop-by-hop (HBH) encryption of media, metadata, and feedback | |||
messages between the endpoints and SFU | messages between the endpoints and SFU | |||
2. End-to-end (E2E) encryption (E2EE) of media between the endpoints | 2. End-to-end (E2E) encryption (E2EE) of media between the endpoints | |||
skipping to change at page 4, line 30 | skipping to change at line 150 | |||
This document proposes a new E2EE protection scheme known as SFrame, | This document proposes a new E2EE protection scheme known as SFrame, | |||
specifically designed to work in group conference calls with SFUs. | specifically designed to work in group conference calls with SFUs. | |||
SFrame is a general encryption framing that can be used to protect | SFrame is a general encryption framing that can be used to protect | |||
media payloads, agnostic of transport. | media payloads, agnostic of transport. | |||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
MAC: Message Authentication Code | MAC: Message Authentication Code | |||
E2EE: End to End Encryption | E2EE: End-to-End Encryption | |||
HBH: Hop By Hop | HBH: Hop-by-Hop | |||
We use "Selective Forwarding Unit (SFU)" and "media stream" in a less | We use "Selective Forwarding Unit (SFU)" and "media stream" in a less | |||
formal sense than in [RFC7656]. An SFU is a selective switching | formal sense than in [RFC7656]. An SFU is a selective switching | |||
function for media payloads, and a media stream a sequence of media | function for media payloads, and a media stream is a sequence of | |||
payloads, in both cases regardless of whether those media payloads | media payloads, regardless of whether those media payloads are | |||
are transported over RTP or some other protocol. | transported over RTP or some other protocol. | |||
3. Goals | 3. Goals | |||
SFrame is designed to be a suitable E2EE protection scheme for | SFrame is designed to be a suitable E2EE protection scheme for | |||
conference call media in a broad range of scenarios, as outlined by | conference call media in a broad range of scenarios, as outlined by | |||
the following goals: | the following goals: | |||
1. Provide a secure E2EE mechanism for audio and video in conference | 1. Provide a secure E2EE mechanism for audio and video in conference | |||
calls that can be used with arbitrary SFU servers. | calls that can be used with arbitrary SFU servers. | |||
2. Decouple media encryption from key management to allow SFrame to | 2. Decouple media encryption from key management to allow SFrame to | |||
be used with an arbitrary key management system. | be used with an arbitrary key management system. | |||
3. Minimize packet expansion to allow successful conferencing in as | 3. Minimize packet expansion to allow successful conferencing in as | |||
many network conditions as possible. | many network conditions as possible. | |||
4. Independence from the underlying transport, including use in non- | 4. Decouple the media encryption framework from the underlying | |||
RTP transports, e.g., WebTransport [I-D.ietf-webtrans-overview]. | transport, allowing use in non-RTP scenarios, e.g., WebTransport | |||
[WEBTRANSPORT]. | ||||
5. When used with RTP and its associated error resilience | 5. When used with RTP and its associated error-resilience | |||
mechanisms, i.e., RTX and FEC, require no special handling for | mechanisms, i.e., RTX and Forward Error Correction (FEC), require | |||
RTX and FEC packets. | no special handling for RTX and FEC packets. | |||
6. Minimize the changes needed in SFU servers. | 6. Minimize the changes needed in SFU servers. | |||
7. Minimize the changes needed in endpoints. | 7. Minimize the changes needed in endpoints. | |||
8. Work with the most popular audio and video codecs used in | 8. Work with the most popular audio and video codecs used in | |||
conferencing scenarios. | conferencing scenarios. | |||
4. SFrame | 4. SFrame | |||
This document defines an encryption mechanism that provides effective | This document defines an encryption mechanism that provides effective | |||
E2EE, is simple to implement, has no dependencies on RTP, and | E2EE, is simple to implement, has no dependencies on RTP, and | |||
minimizes encryption bandwidth overhead. This section describes how | minimizes encryption bandwidth overhead. This section describes how | |||
the mechanism works, including details of how applications utilize | the mechanism works and includes details of how applications utilize | |||
SFrame for media protection, as well as the actual mechanics of E2EE | SFrame for media protection as well as the actual mechanics of E2EE | |||
for protecting media. | for protecting media. | |||
4.1. Application Context | 4.1. Application Context | |||
SFrame is a general encryption framing, intended to be used as an | SFrame is a general encryption framing, intended to be used as an | |||
E2EE layer over an underlying HBH-encrypted transport such as SRTP or | E2EE layer over an underlying HBH-encrypted transport such as SRTP or | |||
QUIC [RFC3711][I-D.ietf-moq-transport]. | QUIC [RFC3711][MOQ-TRANSPORT]. | |||
The scale at which SFrame encryption is applied to media determines | The scale at which SFrame encryption is applied to media determines | |||
the overall amount of overhead that SFrame adds to the media stream, | the overall amount of overhead that SFrame adds to the media stream | |||
as well as the engineering complexity involved in integrating SFrame | as well as the engineering complexity involved in integrating SFrame | |||
into a particular environment. Two patterns are common: Either using | into a particular environment. Two patterns are common: using SFrame | |||
SFrame to encrypt whole media frames (per-frame) or individual | to encrypt either whole media frames (per frame) or individual | |||
transport-level media payloads (per-packet). | transport-level media payloads (per packet). | |||
For example, Figure 1 shows a typical media sender stack that takes | For example, Figure 1 shows a typical media sender stack that takes | |||
media from some source, encodes it into frames, divides those frames | media from some source, encodes it into frames, divides those frames | |||
into media packets, and then sends those payloads in SRTP packets. | into media packets, and then sends those payloads in SRTP packets. | |||
The receiver stack performs the reverse operations, reassembling | The receiver stack performs the reverse operations, reassembling | |||
frames from SRTP packets and decoding. Arrows indicate two different | frames from SRTP packets and decoding. Arrows indicate two different | |||
ways that SFrame protection could be integrated into this media | ways that SFrame protection could be integrated into this media | |||
stack, to encrypt whole frames or individual media packets. | stack: to encrypt whole frames or individual media packets. | |||
Applying SFrame per-frame in this system offers higher efficiency, | Applying SFrame per frame in this system offers higher efficiency but | |||
but may require a more complex integration in environments where | may require a more complex integration in environments where | |||
depacketization relies on the content of media packets. Applying | depacketization relies on the content of media packets. Applying | |||
SFrame per-packet avoids this complexity, at the cost of higher | SFrame per packet avoids this complexity at the cost of higher | |||
bandwidth consumption. Some quantitative discussion of these trade- | bandwidth consumption. Some quantitative discussion of these trade- | |||
offs is provided in Appendix C. | offs is provided in Appendix B. | |||
As noted above, however, SFrame is a general media encapsulation, and | As noted above, however, SFrame is a general media encapsulation and | |||
can be applied in other scenarios. The important thing is that the | can be applied in other scenarios. The important thing is that the | |||
sender and receivers of an SFrame-encrypted object agree on that | sender and receivers of an SFrame-encrypted object agree on that | |||
object's semantics. SFrame does not provide this agreement; it must | object's semantics. SFrame does not provide this agreement; it must | |||
be arranged by the application. | be arranged by the application. | |||
+------------------------------------------------------+ | +------------------------------------------------------+ | |||
| | | | | | |||
| +--------+ +-------------+ +-----------+ | | | +--------+ +-------------+ +-----------+ | | |||
.-. | | | | | | HBH | | | .-. | | | | | | HBH | | | |||
| | | | Encode |----->| Packetize |----->| Protect |----------+ | | | | | Encode |----->| Packetize |----->| Protect |----------+ | |||
'+' | | | ^ | | ^ | | | | | '+' | | | ^ | | ^ | | | | | |||
/|\ | +--------+ | +-------------+ | +-----------+ | | | /|\ | +--------+ | +-------------+ | +-----------+ | | | |||
/ + \ | | | ^ | | | / + \ | | | ^ | | | |||
/ \ | SFrame SFrame | | | | / \ | SFrame SFrame | | | | |||
/ \ | Protect Protect | | | | / \ | Protect Protect | | | | |||
Alice | (per-frame) (per-packet) | | | | Alice | (per frame) (per packet) | | | | |||
| ^ ^ | | | | | ^ ^ | | | | |||
| | | | | | | | | | | | | | |||
+---------------|-------------------|---------|--------+ | | +---------------|-------------------|---------|--------+ | | |||
| | | v | | | | v | |||
| | | +------+-+ | | | | +------+-+ | |||
| E2E Key | HBH Key | Media | | | E2E Key | HBH Key | Media | | |||
+---- Management ---+ Management | Server | | +---- Management ---+ Management | Server | | |||
| | | +------+-+ | | | | +------+-+ | |||
| | | | | | | | | | |||
+---------------|-------------------|---------|--------+ | | +---------------|-------------------|---------|--------+ | | |||
| | | | | | | | | | | | | | |||
| V V | | | | | V V | | | | |||
.-. | SFrame SFrame | | | | .-. | SFrame SFrame | | | | |||
| | | Unprotect Unprotect | | | | | | | Unprotect Unprotect | | | | |||
'+' | (per-frame) (per-packet) | | | | '+' | (per frame) (per packet) | | | | |||
/|\ | | | V | | | /|\ | | | V | | | |||
/ + \ | +--------+ | +-------------+ | +-----------+ | | | / + \ | +--------+ | +-------------+ | +-----------+ | | | |||
/ \ | | | V | | V | HBH | | | | / \ | | | V | | V | HBH | | | | |||
/ \ | | Decode |<-----| Depacketize |<-----| Unprotect |<---------+ | / \ | | Decode |<-----| Depacketize |<-----| Unprotect |<---------+ | |||
Bob | | | | | | | | | Bob | | | | | | | | | |||
| +--------+ +-------------+ +-----------+ | | | +--------+ +-------------+ +-----------+ | | |||
| | | | | | |||
+------------------------------------------------------+ | +------------------------------------------------------+ | |||
Figure 1 | Figure 1: Two Options for Integrating SFrame in a Typical Media Stack | |||
Like SRTP, SFrame does not define how the keys used for SFrame are | Like SRTP, SFrame does not define how the keys used for SFrame are | |||
exchanged by the parties in the conference. Keys for SFrame might be | exchanged by the parties in the conference. Keys for SFrame might be | |||
distributed over an existing E2E-secure channel (see Section 5.1), or | distributed over an existing E2E-secure channel (see Section 5.1) or | |||
derived from an E2E-secure shared secret (see Section 5.2). The key | derived from an E2E-secure shared secret (see Section 5.2). The key | |||
management system MUST ensure that each key used for encrypting media | management system MUST ensure that each key used for encrypting media | |||
is used by exactly one media sender, in order to avoid reuse of | is used by exactly one media sender in order to avoid reuse of | |||
nonces. | nonces. | |||
4.2. SFrame Ciphertext | 4.2. SFrame Ciphertext | |||
An SFrame ciphertext comprises an SFrame header followed by the | An SFrame ciphertext comprises an SFrame header followed by the | |||
output of an AEAD encryption of the plaintext [RFC5116], with the | output of an Authenticated Encryption with Associated Data (AEAD) | |||
header provided as additional authenticated data (AAD). | encryption of the plaintext [RFC5116], with the header provided as | |||
additional authenticated data (AAD). | ||||
The SFrame header is a variable-length structure described in detail | The SFrame header is a variable-length structure described in detail | |||
in Section 4.3. The structure of the encrypted data and | in Section 4.3. The structure of the encrypted data and | |||
authentication tag are determined by the AEAD algorithm in use. | authentication tag are determined by the AEAD algorithm in use. | |||
+-+----+-+----+--------------------+--------------------+<-+ | +-+----+-+----+--------------------+--------------------+<-+ | |||
|K|KLEN|C|CLEN| Key ID | Counter | | | |K|KLEN|C|CLEN| Key ID | Counter | | | |||
+->+-+----+-+----+--------------------+--------------------+ | | +->+-+----+-+----+--------------------+--------------------+ | | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
skipping to change at page 8, line 34 | skipping to change at line 314 | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
+->+-------------------------------------------------------+<-+ | +->+-------------------------------------------------------+<-+ | |||
| | Authentication Tag | | | | | Authentication Tag | | | |||
| +-------------------------------------------------------+ | | | +-------------------------------------------------------+ | | |||
| | | | | | |||
| | | | | | |||
+--- Encrypted Portion Authenticated Portion ---+ | +--- Encrypted Portion Authenticated Portion ---+ | |||
When SFrame is applied per-packet, the payload of each packet will be | Figure 2: Structure of an SFrame Ciphertext | |||
an SFrame ciphertext. When SFrame is applied per-frame, the SFrame | ||||
When SFrame is applied per packet, the payload of each packet will be | ||||
an SFrame ciphertext. When SFrame is applied per frame, the SFrame | ||||
ciphertext representing an encrypted frame will span several packets, | ciphertext representing an encrypted frame will span several packets, | |||
with the header appearing in the first packet and the authentication | with the header appearing in the first packet and the authentication | |||
tag in the last packet. It is the responsibility of the application | tag in the last packet. It is the responsibility of the application | |||
to reassemble an encrypted frame from individual packets, accounting | to reassemble an encrypted frame from individual packets, accounting | |||
for packet loss and reordering as necessary. | for packet loss and reordering as necessary. | |||
4.3. SFrame Header | 4.3. SFrame Header | |||
The SFrame header specifies two values from which encryption | The SFrame header specifies two values from which encryption | |||
parameters are derived: | parameters are derived: | |||
* A Key ID (KID) that determines which encryption key should be used | * A Key ID (KID) that determines which encryption key should be used | |||
* A counter (CTR) that is used to construct the nonce for the | * A Counter (CTR) that is used to construct the nonce for the | |||
encryption | encryption | |||
Applications MUST ensure that each (KID, CTR) combination is used for | Applications MUST ensure that each (KID, CTR) combination is used for | |||
exactly one SFrame encryption operation. A typical approach to | exactly one SFrame encryption operation. A typical approach to | |||
achieving this guarantee is outlined in Section 9.1. | achieve this guarantee is outlined in Section 9.1. | |||
Config Byte | Config Byte | |||
| | | | |||
.-----' '-----. | .-----' '-----. | |||
| | | | | | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-+-+-+-+-+-+-+-+------------+------------+ | +-+-+-+-+-+-+-+-+------------+------------+ | |||
|X| K |Y| C | KID... | CTR... | | |X| K |Y| C | KID... | CTR... | | |||
+-+-+-+-+-+-+-+-+------------+------------+ | +-+-+-+-+-+-+-+-+------------+------------+ | |||
Figure 2: SFrame header | Figure 3: SFrame Header | |||
The SFrame Header has the overall structure shown in Figure 2. The | The SFrame header has the overall structure shown in Figure 3. The | |||
first byte is a "config byte", with the following fields: | first byte is a "config byte", with the following fields: | |||
Extended Key Id Flag (X, 1 bit): Indicates if the K field contains | Extended KID Flag (X, 1 bit): Indicates if the K field contains the | |||
the key id or the Key ID length. | KID or the KID length. | |||
Key or Key Length (K, 3 bits): If the X flag is set to 0, this field | KID or KID Length (K, 3 bits): If the X flag is set to 0, this field | |||
contains the Key ID. If the X flag is set to 1, then it contains | contains the KID. If the X flag is set to 1, then it contains the | |||
the length of the Key ID, minus one. | length of the KID, minus one. | |||
Extended Counter Flag (Y, 1 bit): Indicates if the C field contains | Extended CTR Flag (Y, 1 bit): Indicates if the C field contains the | |||
the counter or the counter length. | CTR or the CTR length. | |||
Counter or Counter Length (C, 3 bits): This field contains the | CTR or CTR Length (C, 3 bits): This field contains the CTR if the Y | |||
counter (CTR) if the Y flag is set to 0, or the counter length, | flag is set to 0, or the CTR length, minus one, if set to 1. | |||
minus one, if set to 1. | ||||
The Key ID and Counter fields are encoded as compact unsigned | The KID and CTR fields are encoded as compact unsigned integers in | |||
integers in network (big-endian) byte order. If the value of one of | network (big-endian) byte order. If the value of one of these fields | |||
these fields is in the range 0-7, then the value is carried in the | is in the range 0-7, then the value is carried in the corresponding | |||
corresponding bits of the config byte (K or C) and the corresponding | bits of the config byte (K or C) and the corresponding flag (X or Y) | |||
flag (X or Y) is set to zero. Otherwise, the value MUST be encoded | is set to zero. Otherwise, the value MUST be encoded with the | |||
with the minimum number of bytes required and appended after the | minimum number of bytes required and appended after the config byte, | |||
configuration byte, with the Key ID first and Counter second. The | with the KID first and CTR second. The header field (K or C) is set | |||
header field (K or C) is set to the number of bytes in the encoded | to the number of bytes in the encoded value, minus one. The value | |||
value, minus one. The value 000 represents a length of 1, 001 a | 000 represents a length of 1, 001 a length of 2, etc. This allows a | |||
length of 2, etc. This allows a 3-bit length field to represent the | 3-bit length field to represent the value lengths 1-8. | |||
value lengths 1-8. | ||||
The SFrame header can thus take one of the four forms shown in | The SFrame header can thus take one of the four forms shown in | |||
Figure 3, depending on which of the X and Y flags are set. | Figure 4, depending on which of the X and Y flags are set. | |||
KID < 8, CTR < 8: | KID < 8, CTR < 8: | |||
+-+-----+-+-----+ | +-+-----+-+-----+ | |||
|0| KID |0| CTR | | |0| KID |0| CTR | | |||
+-+-----+-+-----+ | +-+-----+-+-----+ | |||
KID < 8, CTR >= 8: | KID < 8, CTR >= 8: | |||
+-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
|0| KID |1|CLEN | CTR... (length=CLEN) | | |0| KID |1|CLEN | CTR... (length=CLEN) | | |||
+-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
skipping to change at page 10, line 25 ¶ | skipping to change at line 399 ¶ | |||
KID >= 8, CTR < 8: | KID >= 8, CTR < 8: | |||
+-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
|1|KLEN |0| CTR | KID... (length=KLEN) | | |1|KLEN |0| CTR | KID... (length=KLEN) | | |||
+-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
KID >= 8, CTR >= 8: | KID >= 8, CTR >= 8: | |||
+-+-----+-+-----+------------------------+------------------------+ | +-+-----+-+-----+------------------------+------------------------+ | |||
|1|KLEN |1|CLEN | KID... (length=KLEN) | CTR... (length=CLEN) | | |1|KLEN |1|CLEN | KID... (length=KLEN) | CTR... (length=CLEN) | | |||
+-+-----+-+-----+------------------------+------------------------+ | +-+-----+-+-----+------------------------+------------------------+ | |||
Figure 3: Forms of Encoded SFrame Header | Figure 4: Forms of Encoded SFrame Header | |||
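
As an illustration of the four header forms above, the following Python sketch encodes a KID and CTR pair into a config byte plus optional extended fields. It assumes the config byte is laid out as X, K, Y, C from most to least significant bit, as drawn in the figure; the helper name min_bytes is hypothetical and not part of the specification.

```python
def min_bytes(value: int) -> bytes:
    """Encode a non-negative integer in the fewest big-endian bytes (at least one)."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def encode_sframe_header(ctr: int, kid: int) -> bytes:
    """Build the config byte followed by any extended KID and CTR bytes."""
    if not (0 <= kid < 1 << 64 and 0 <= ctr < 1 << 64):
        raise ValueError("KID and CTR must each fit in 8 bytes")

    extended = b""

    if kid < 8:
        k_bits = kid                             # X = 0, K carries the value
    else:
        kid_bytes = min_bytes(kid)
        k_bits = 0b1000 | (len(kid_bytes) - 1)   # X = 1, K = length - 1
        extended += kid_bytes                    # KID comes first

    if ctr < 8:
        c_bits = ctr                             # Y = 0, C carries the value
    else:
        ctr_bytes = min_bytes(ctr)
        c_bits = 0b1000 | (len(ctr_bytes) - 1)   # Y = 1, C = length - 1
        extended += ctr_bytes                    # CTR comes second

    return bytes([(k_bits << 4) | c_bits]) + extended
```

With KID = 5 and CTR = 3, both values fit in the config byte and the header is the single byte 0x53. With KID = 7 and CTR = 256, the counter needs two bytes, so the header is 0x79 followed by 0x01 0x00.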
4.4. Encryption Schema | 4.4. Encryption Schema | |||
SFrame encryption uses an AEAD encryption algorithm and hash function | SFrame encryption uses an AEAD encryption algorithm and hash function | |||
defined by the cipher suite in use (see Section 4.5). We will refer | defined by the cipher suite in use (see Section 4.5). We will refer | |||
to the following aspects of the AEAD and the hash algorithm below: | to the following aspects of the AEAD and the hash algorithm below: | |||
* AEAD.Encrypt and AEAD.Decrypt - The encryption and decryption | * AEAD.Encrypt and AEAD.Decrypt - The encryption and decryption | |||
functions for the AEAD. We follow the convention of RFC 5116 | functions for the AEAD. We follow the convention of RFC 5116 | |||
[RFC5116] and consider the authentication tag part of the | [RFC5116] and consider the authentication tag part of the | |||
skipping to change at page 10, line 48 ¶ | skipping to change at line 422 ¶ | |||
* AEAD.Nk - The size in bytes of a key for the encryption algorithm | * AEAD.Nk - The size in bytes of a key for the encryption algorithm | |||
* AEAD.Nn - The size in bytes of a nonce for the encryption | * AEAD.Nn - The size in bytes of a nonce for the encryption | |||
algorithm | algorithm | |||
* AEAD.Nt - The overhead in bytes of the encryption algorithm | * AEAD.Nt - The overhead in bytes of the encryption algorithm | |||
(typically the size of a "tag" that is added to the plaintext) | (typically the size of a "tag" that is added to the plaintext) | |||
* AEAD.Nka - For cipher suites using the compound AEAD described in | * AEAD.Nka - For cipher suites using the compound AEAD described in | |||
Section 4.5.1, the size in bytes of a key for the underlying AES- | Section 4.5.1, the size in bytes of a key for the underlying | |||
CTR algorithm | encryption algorithm | |||
* Hash.Nh - The size in bytes of the output of the hash function | * Hash.Nh - The size in bytes of the output of the hash function | |||
4.4.1. Key Selection | 4.4.1. Key Selection | |||
Each SFrame encryption or decryption operation is premised on a | Each SFrame encryption or decryption operation is premised on a | |||
single secret base_key, which is labeled with an integer KID value | single secret base_key, which is labeled with an integer KID value | |||
signaled in the SFrame header. | signaled in the SFrame header. | |||
The sender and receivers need to agree on which base_key should be | The sender and receivers need to agree on which base_key should be | |||
skipping to change at page 11, line 23 ¶ | skipping to change at line 445 ¶ | |||
on whether a base_key will be used for encryption or decryption only. | on whether a base_key will be used for encryption or decryption only. | |||
The process for provisioning base_key values and their KID values is | The process for provisioning base_key values and their KID values is | |||
beyond the scope of this specification, but its security properties | beyond the scope of this specification, but its security properties | |||
will bound the assurances that SFrame provides. For example, if | will bound the assurances that SFrame provides. For example, if | |||
SFrame is used to provide E2E security against intermediary media | SFrame is used to provide E2E security against intermediary media | |||
nodes, then SFrame keys need to be negotiated in a way that does not | nodes, then SFrame keys need to be negotiated in a way that does not | |||
make them accessible to these intermediaries. | make them accessible to these intermediaries. | |||
For each known KID value, the client stores the corresponding | For each known KID value, the client stores the corresponding | |||
symmetric key base_key. For keys that can be used for encryption, | symmetric key base_key. For keys that can be used for encryption, | |||
the client also stores the next counter value CTR to be used when | the client also stores the next CTR value to be used when encrypting | |||
encrypting (initially 0). | (initially 0). | |||
When encrypting a plaintext, the application specifies which KID is | When encrypting a plaintext, the application specifies which KID is | |||
to be used, and the counter is incremented after successful | to be used, and the CTR value is incremented after successful | |||
encryption. When decrypting, the base_key for decryption is selected | encryption. When decrypting, the base_key for decryption is selected | |||
from the available keys using the KID value in the SFrame Header. | from the available keys using the KID value in the SFrame header. | |||
A given base_key MUST NOT be used for encryption by multiple senders. | A given base_key MUST NOT be used for encryption by multiple senders. | |||
Such reuse would result in multiple encrypted frames being generated | Such reuse would result in multiple encrypted frames being generated | |||
with the same (key, nonce) pair, which harms the protections provided | with the same (key, nonce) pair, which harms the protections provided | |||
by many AEAD algorithms. Implementations MUST mark each base_key as | by many AEAD algorithms. Implementations MUST mark each base_key as | |||
usable for encryption or decryption, never both. | usable for encryption or decryption, never both. | |||
Note that the set of available keys might change over the lifetime of | Note that the set of available keys might change over the lifetime of | |||
a real-time session. In such cases, the client will need to manage | a real-time session. In such cases, the client will need to manage | |||
key usage to avoid media loss due to a key being used to encrypt | key usage to avoid media loss due to a key being used to encrypt | |||
before all receivers are able to use it to decrypt. For example, an | before all receivers are able to use it to decrypt. For example, an | |||
application may make decryption-only keys available immediately, but | application may make decryption-only keys available immediately, but | |||
delay the use of keys for encryption until (a) all receivers have | delay the use of keys for encryption until (a) all receivers have | |||
acknowledged receipt of the new key or (b) a timeout expires. | acknowledged receipt of the new key, or (b) a timeout expires. | |||
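
The key selection rules above can be sketched as a small key store. This is only one possible shape: the class and method names (KeyStore, Usage, next_counter, decryption_key) are hypothetical, and the sketch reserves the counter before encryption rather than incrementing it afterwards, a conservative variation that avoids reusing a counter if encryption fails partway.

```python
from dataclasses import dataclass
from enum import Enum


class Usage(Enum):
    ENCRYPT = "encrypt"
    DECRYPT = "decrypt"


@dataclass
class KeyEntry:
    base_key: bytes
    usage: Usage          # each key is usable for one direction only
    next_ctr: int = 0     # only meaningful for encryption keys


class KeyStore:
    def __init__(self) -> None:
        self._entries: dict[int, KeyEntry] = {}

    def add(self, kid: int, base_key: bytes, usage: Usage) -> None:
        if kid in self._entries:
            raise ValueError(f"KID {kid} is already provisioned")
        self._entries[kid] = KeyEntry(base_key, usage)

    def next_counter(self, kid: int) -> int:
        """Reserve the next CTR value for an encryption key."""
        entry = self._entries[kid]
        if entry.usage is not Usage.ENCRYPT:
            raise PermissionError("key is marked decryption-only")
        ctr = entry.next_ctr
        entry.next_ctr += 1
        return ctr

    def decryption_key(self, kid: int) -> bytes:
        """Look up the base_key for a KID taken from a received header."""
        entry = self._entries[kid]   # KeyError if the KID is unknown
        if entry.usage is not Usage.DECRYPT:
            raise PermissionError("key is marked encryption-only")
        return entry.base_key
```

A KeyError from decryption_key corresponds to the case where no key is yet available for the KID, which an application might handle by buffering the ciphertext.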
4.4.2. Key Derivation | 4.4.2. Key Derivation | |||
SFrame encryption and decryption use a key and salt derived from the | SFrame encryption and decryption use a key and salt derived from the | |||
base_key associated to a KID. Given a base_key value, the key and | base_key associated with a KID. Given a base_key value, the key and | |||
salt are derived using HKDF [RFC5869] as follows: | salt are derived using HMAC-based Key Derivation Function (HKDF) | |||
[RFC5869] as follows: | ||||
def derive_key_salt(KID, base_key): | def derive_key_salt(KID, base_key): | |||
sframe_secret = HKDF-Extract("", base_key) | sframe_secret = HKDF-Extract("", base_key) | |||
sframe_key_label = "SFrame 1.0 Secret key " + KID + cipher_suite | sframe_key_label = "SFrame 1.0 Secret key " + KID + cipher_suite | |||
sframe_key = HKDF-Expand(sframe_secret, sframe_key_label, AEAD.Nk) | sframe_key = | |||
HKDF-Expand(sframe_secret, sframe_key_label, AEAD.Nk) | ||||
sframe_salt_label = "SFrame 1.0 Secret salt " + KID + cipher_suite | sframe_salt_label = "SFrame 1.0 Secret salt " + KID + cipher_suite | |||
sframe_salt = HKDF-Expand(sframe_secret, sframe_salt_label, AEAD.Nn) | sframe_salt = | |||
HKDF-Expand(sframe_secret, sframe_salt_label, AEAD.Nn) | ||||
return sframe_key, sframe_salt | return sframe_key, sframe_salt | |||
In the derivation of sframe_secret: | In the derivation of sframe_secret: | |||
* The + operator represents concatenation of byte strings. | * The + operator represents concatenation of byte strings. | |||
* The KID value is encoded as an 8-byte big-endian integer, not the | * The KID value is encoded as an 8-byte big-endian integer, not the | |||
compressed form used in the SFrame header. | compressed form used in the SFrame header. | |||
* The cipher_suite value is a 2-byte big-endian integer representing | * The cipher_suite value is a 2-byte big-endian integer representing | |||
the cipher suite in use (see Section 8.1). | the cipher suite in use (see Section 8.1). | |||
The hash function used for HKDF is determined by the cipher suite in | The hash function used for HKDF is determined by the cipher suite in | |||
use. | use. | |||
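
For readers who want to experiment with the derivation above, here is a minimal Python sketch built on the standard hmac and hashlib modules. The HKDF helpers follow RFC 5869; the extra parameters (cipher_suite, nk, nn, hash_name) are passed explicitly here as an assumption of this sketch, whereas the pseudocode takes them from the cipher suite in use.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    """HKDF-Extract; an empty salt is replaced by HashLen zero bytes."""
    if not salt:
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int,
                hash_name: str = "sha256") -> bytes:
    """HKDF-Expand; produces `length` bytes of output keying material."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length too large for HKDF-Expand")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_salt(kid: int, base_key: bytes, cipher_suite: int,
                    nk: int, nn: int, hash_name: str = "sha256"):
    """Derive (sframe_key, sframe_salt) from a base_key and its KID."""
    sframe_secret = hkdf_extract(b"", base_key, hash_name)
    # KID is an 8-byte and cipher_suite a 2-byte big-endian integer.
    suffix = kid.to_bytes(8, "big") + cipher_suite.to_bytes(2, "big")
    sframe_key = hkdf_expand(
        sframe_secret, b"SFrame 1.0 Secret key " + suffix, nk, hash_name)
    sframe_salt = hkdf_expand(
        sframe_secret, b"SFrame 1.0 Secret salt " + suffix, nn, hash_name)
    return sframe_key, sframe_salt
```

A call such as derive_key_salt(7, base_key, 0x0004, nk=16, nn=12) would correspond to a GCM suite with a 16-byte key and 12-byte nonce; the value 0x0004 is used here only as a placeholder and should be checked against the registry in Section 8.1.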
4.4.3. Encryption | 4.4.3. Encryption | |||
SFrame encryption uses the AEAD encryption algorithm for the cipher | SFrame encryption uses the AEAD encryption algorithm for the cipher | |||
suite in use. The key for the encryption is the sframe_key and the | suite in use. The key for the encryption is the sframe_key. The | |||
nonce is formed by XORing the sframe_salt with the current counter, | nonce is formed by first XORing the sframe_salt with the current CTR | |||
encoded as a big-endian integer of length AEAD.Nn. | value, and then encoding the result as a big-endian integer of length | |||
AEAD.Nn. | ||||
The encryptor forms an SFrame header using the CTR, and KID values | The encryptor forms an SFrame header using the CTR and KID values | |||
provided. The encoded header is provided as AAD to the AEAD | provided. The encoded header is provided as AAD to the AEAD | |||
encryption operation, together with application-provided metadata | encryption operation, together with application-provided metadata | |||
about the encrypted media (see Section 9.4). | about the encrypted media (see Section 9.4). | |||
def encrypt(CTR, KID, metadata, plaintext): | def encrypt(CTR, KID, metadata, plaintext): | |||
sframe_key, sframe_salt = key_store[KID] | sframe_key, sframe_salt = key_store[KID] | |||
# encode_big_endian(x, n) produces an n-byte string encoding the integer x in | # encode_big_endian(x, n) produces an n-byte string encoding the | |||
# big-endian byte order. | # integer x in big-endian byte order. | |||
ctr = encode_big_endian(CTR, AEAD.Nn) | ctr = encode_big_endian(CTR, AEAD.Nn) | |||
nonce = xor(sframe_salt, ctr) | nonce = xor(sframe_salt, ctr) | |||
# encode_sframe_header produces a byte string encoding the provided KID and | # encode_sframe_header produces a byte string encoding the | |||
# CTR values into an SFrame Header. | # provided KID and CTR values into an SFrame header. | |||
header = encode_sframe_header(CTR, KID) | header = encode_sframe_header(CTR, KID) | |||
aad = header + metadata | aad = header + metadata | |||
ciphertext = AEAD.Encrypt(sframe_key, nonce, aad, plaintext) | ciphertext = AEAD.Encrypt(sframe_key, nonce, aad, plaintext) | |||
return header + ciphertext | return header + ciphertext | |||
For example, the metadata input to encryption allows for frame | For example, the metadata input to encryption allows for frame | |||
metadata to be authenticated when SFrame is applied per-frame. After | metadata to be authenticated when SFrame is applied per frame. After | |||
encoding the frame and before packetizing it, the necessary media | encoding the frame and before packetizing it, the necessary media | |||
metadata will be moved out of the encoded frame buffer, to be sent in | metadata will be moved out of the encoded frame buffer to be sent in | |||
some channel visible to the SFU (e.g., an RTP header extension). | some channel visible to the SFU (e.g., an RTP header extension). | |||
+---------------+ | +---------------+ | |||
| | | | | | |||
| | | | | | |||
| plaintext | | | plaintext | | |||
| | | | | | |||
| | | | | | |||
+-------+-------+ | +-------+-------+ | |||
| | | | |||
skipping to change at page 14, line 42 ¶ | skipping to change at line 572 ¶ | |||
| +---------------+ | | | +---------------+ | | |||
+-------------->| SFrame Header | | | +-------------->| SFrame Header | | | |||
+---------------+ | | +---------------+ | | |||
| | | | | | | | |||
| |<----+ | | |<----+ | |||
| ciphertext | | | ciphertext | | |||
| | | | | | |||
| | | | | | |||
+---------------+ | +---------------+ | |||
Figure 4: Encrypting an SFrame Ciphertext | Figure 5: Encrypting an SFrame Ciphertext | |||
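
The nonce formation step can be made concrete with a short Python sketch. It assumes the salt already has length AEAD.Nn and XORs it byte by byte with the big-endian encoding of the counter (the value the pseudocode computes as ctr); the function name form_nonce is hypothetical.

```python
def form_nonce(sframe_salt: bytes, ctr: int) -> bytes:
    """XOR the salt with the counter encoded as len(sframe_salt) big-endian bytes."""
    ctr_bytes = ctr.to_bytes(len(sframe_salt), "big")
    return bytes(s ^ c for s, c in zip(sframe_salt, ctr_bytes))
```

Because the counter occupies the low-order bytes, only the tail of the salt changes from one frame to the next for small counter values, and each distinct CTR value yields a distinct nonce under a given salt.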
4.4.4. Decryption | 4.4.4. Decryption | |||
Before decrypting, a receiver needs to assemble a full SFrame | Before decrypting, a receiver needs to assemble a full SFrame | |||
ciphertext. When an SFrame ciphertext may be fragmented into | ciphertext. When an SFrame ciphertext is fragmented into multiple | |||
multiple parts for transport (e.g., a whole encrypted frame sent in | parts for transport (e.g., a whole encrypted frame sent in multiple | |||
multiple SRTP packets), the receiving client collects all the | SRTP packets), the receiving client collects all the fragments of the | |||
fragments of the ciphertext, using appropriate sequencing and start/ | ciphertext, using appropriate sequencing and start/end markers in the | |||
end markers in the transport. Once all of the required fragments are | transport. Once all of the required fragments are available, the | |||
available, the client reassembles them into the SFrame ciphertext, | client reassembles them into the SFrame ciphertext and passes the | |||
then passes the ciphertext to SFrame for decryption. | ciphertext to SFrame for decryption. | |||
The KID field in the SFrame header is used to find the right key and | The KID field in the SFrame header is used to find the right key and | |||
salt for the encrypted frame, and the CTR field is used to construct | salt for the encrypted frame, and the CTR field is used to construct | |||
the nonce. The SFrame decryption procedure is as follows: | the nonce. The SFrame decryption procedure is as follows: | |||
def decrypt(metadata, sframe_ciphertext): | def decrypt(metadata, sframe_ciphertext): | |||
KID, CTR, header, ciphertext = parse_ciphertext(sframe_ciphertext) | KID, CTR, header, ciphertext = parse_ciphertext(sframe_ciphertext) | |||
sframe_key, sframe_salt = key_store[KID] | sframe_key, sframe_salt = key_store[KID] | |||
skipping to change at page 15, line 28 ¶ | skipping to change at line 607 ¶ | |||
return AEAD.Decrypt(sframe_key, nonce, aad, ciphertext) | return AEAD.Decrypt(sframe_key, nonce, aad, ciphertext) | |||
If a ciphertext fails to decrypt because there is no key available | If a ciphertext fails to decrypt because there is no key available | |||
for the KID in the SFrame header, the client MAY buffer the | for the KID in the SFrame header, the client MAY buffer the | |||
ciphertext and retry decryption once a key with that KID is received. | ciphertext and retry decryption once a key with that KID is received. | |||
If a ciphertext fails to decrypt for any other reason, the client | If a ciphertext fails to decrypt for any other reason, the client | |||
MUST discard the ciphertext. Invalid ciphertexts SHOULD be discarded | MUST discard the ciphertext. Invalid ciphertexts SHOULD be discarded | |||
in a way that is indistinguishable (to an external observer) from | in a way that is indistinguishable (to an external observer) from | |||
having processed a valid ciphertext. In other words, the SFrame | having processed a valid ciphertext. In other words, the SFrame | |||
decrypt operation should be constant-time, regardless of whether | decrypt operation should take the same amount of time regardless of | |||
decryption succeeds or fails. | whether decryption succeeds or fails. | |||
SFrame Ciphertext | SFrame Ciphertext | |||
+---------------+ | +---------------+ | |||
+---------------| SFrame Header | | +---------------| SFrame Header | | |||
| +---------------+ | | +---------------+ | |||
| | | | | | | | |||
| | |-----+ | | | |-----+ | |||
| | ciphertext | | | | | ciphertext | | | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
skipping to change at page 16, line 43 ¶ | skipping to change at line 648 ¶ | |||
| | | | |||
V | V | |||
+---------------+ | +---------------+ | |||
| | | | | | |||
| | | | | | |||
| plaintext | | | plaintext | | |||
| | | | | | |||
| | | | | | |||
+---------------+ | +---------------+ | |||
Figure 5: Decrypting an SFrame Ciphertext | Figure 6: Decrypting an SFrame Ciphertext | |||
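
The decryption pseudocode relies on a parse_ciphertext helper that is not spelled out. One possible reading, sketched in Python below, splits an SFrame ciphertext into the KID, the CTR, the raw header bytes (needed for the AAD), and the remaining AEAD ciphertext. It assumes the config byte layout X, K, Y, C from most to least significant bit and performs only basic truncation checks.

```python
def parse_ciphertext(sframe_ciphertext: bytes):
    """Return (KID, CTR, header_bytes, ciphertext) for an SFrame ciphertext."""
    if not sframe_ciphertext:
        raise ValueError("empty SFrame ciphertext")

    config = sframe_ciphertext[0]
    x_flag, k_bits = config >> 7, (config >> 4) & 0x07
    y_flag, c_bits = (config >> 3) & 0x01, config & 0x07
    offset = 1

    def read_field(flag: int, bits: int, start: int):
        # Flag clear: the value is in the 3 bits. Flag set: bits = length - 1.
        if not flag:
            return bits, start
        end = start + bits + 1
        if end > len(sframe_ciphertext):
            raise ValueError("truncated SFrame header")
        return int.from_bytes(sframe_ciphertext[start:end], "big"), end

    kid, offset = read_field(x_flag, k_bits, offset)   # KID first
    ctr, offset = read_field(y_flag, c_bits, offset)   # CTR second

    return kid, ctr, sframe_ciphertext[:offset], sframe_ciphertext[offset:]
```

A parsing failure here would fall under the "any other reason" case above, so the ciphertext would be discarded rather than buffered.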
4.5. Cipher Suites | 4.5. Cipher Suites | |||
Each SFrame session uses a single cipher suite that specifies the | Each SFrame session uses a single cipher suite that specifies the | |||
following primitives: | following primitives: | |||
* A hash function used for key derivation | * A hash function used for key derivation | |||
* An AEAD encryption algorithm [RFC5116] used for frame encryption, | * An AEAD encryption algorithm [RFC5116] used for frame encryption, | |||
optionally with a truncated authentication tag | optionally with a truncated authentication tag | |||
This document defines the following cipher suites, with the constants | This document defines the following cipher suites, with the constants | |||
defined in Section 4.4: | defined in Section 4.4: | |||
+============================+====+=====+====+====+====+ | +============================+====+=====+====+====+====+ | |||
| Name | Nh | Nka | Nk | Nn | Nt | | | Name | Nh | Nka | Nk | Nn | Nt | | |||
+============================+====+=====+====+====+====+ | +============================+====+=====+====+====+====+ | |||
| AES_128_CTR_HMAC_SHA256_80 | 32 | 16 | 48 | 12 | 10 | | | AES_128_CTR_HMAC_SHA256_80 | 32 | 16 | 48 | 12 | 10 | | |||
+----------------------------+----+-----+----+----+----+ | +----------------------------+----+-----+----+----+----+ | |||
| AES_128_CTR_HMAC_SHA256_64 | 32 | 16 | 48 | 12 | 8 | | | AES_128_CTR_HMAC_SHA256_64 | 32 | 16 | 48 | 12 | 8 | | |||
+----------------------------+----+-----+----+----+----+ | +----------------------------+----+-----+----+----+----+ | |||
| AES_128_CTR_HMAC_SHA256_32 | 32 | 16 | 48 | 12 | 4 | | | AES_128_CTR_HMAC_SHA256_32 | 32 | 16 | 48 | 12 | 4 | | |||
+----------------------------+----+-----+----+----+----+ | +----------------------------+----+-----+----+----+----+ | |||
| AES_128_GCM_SHA256_128 | 32 | n/a | 16 | 12 | 16 | | | AES_128_GCM_SHA256_128 | 32 | n/a | 16 | 12 | 16 | | |||
+----------------------------+----+-----+----+----+----+ | +----------------------------+----+-----+----+----+----+ | |||
| AES_256_GCM_SHA512_128 | 64 | n/a | 32 | 12 | 16 | | | AES_256_GCM_SHA512_128 | 64 | n/a | 32 | 12 | 16 | | |||
+----------------------------+----+-----+----+----+----+ | +----------------------------+----+-----+----+----+----+ | |||
Table 1: SFrame cipher suite constants | Table 1: SFrame Cipher Suite Constants | |||
Numeric identifiers for these cipher suites are defined in the IANA | Numeric identifiers for these cipher suites are defined in the IANA | |||
registry created in Section 8.1. | registry created in Section 8.1. | |||
In the suite names, the length of the authentication tag is indicated | In the suite names, the length of the authentication tag is indicated | |||
by the last value: "_128" indicates a hundred-twenty-eight-bit tag, | by the last value: "_128" indicates a 128-bit tag, "_80" indicates an | |||
"_80" indicates an eighty-bit tag, "_64" indicates a sixty-four-bit | 80-bit tag, "_64" indicates a 64-bit tag, and "_32" indicates a | |||
tag and "_32" indicates a thirty-two-bit tag. | 32-bit tag. | |||
In a session that uses multiple media streams, different cipher | In a session that uses multiple media streams, different cipher | |||
suites might be configured for different media streams. For example, | suites might be configured for different media streams. For example, | |||
in order to conserve bandwidth, a session might use a cipher suite | in order to conserve bandwidth, a session might use a cipher suite | |||
with eighty-bit tags for video frames and another cipher suite with | with 80-bit tags for video frames and another cipher suite with | |||
thirty-two-bit tags for audio frames. | 32-bit tags for audio frames. | |||
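
As a quick consistency check on Table 1, the constants can be transcribed into a Python mapping. The type and function names (SuiteConstants, tag_bits) are hypothetical; the numbers are copied from the table, with None standing in for "n/a".

```python
from typing import NamedTuple, Optional


class SuiteConstants(NamedTuple):
    nh: int              # hash output size in bytes
    nka: Optional[int]   # AES-CTR key size in bytes (None where n/a)
    nk: int              # AEAD key size in bytes
    nn: int              # nonce size in bytes
    nt: int              # tag size in bytes


CIPHER_SUITES = {
    "AES_128_CTR_HMAC_SHA256_80": SuiteConstants(32, 16, 48, 12, 10),
    "AES_128_CTR_HMAC_SHA256_64": SuiteConstants(32, 16, 48, 12, 8),
    "AES_128_CTR_HMAC_SHA256_32": SuiteConstants(32, 16, 48, 12, 4),
    "AES_128_GCM_SHA256_128": SuiteConstants(32, None, 16, 12, 16),
    "AES_256_GCM_SHA512_128": SuiteConstants(64, None, 32, 12, 16),
}


def tag_bits(name: str) -> int:
    """Tag length in bits, computed from the Nt column."""
    return CIPHER_SUITES[name].nt * 8
```

For every suite, tag_bits(name) matches the numeric suffix of the name, and for the CTR-based suites Nk equals Nka + Nh, which is the key layout described in Section 4.5.1.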
4.5.1. AES-CTR with SHA2 | 4.5.1. AES-CTR with SHA2 | |||
In order to allow very short tag sizes, we define a synthetic AEAD | In order to allow very short tag sizes, we define a synthetic AEAD | |||
function using the authenticated counter mode of AES together with | function using the authenticated counter mode of AES together with | |||
HMAC for authentication. We use an encrypt-then-MAC approach, as in | HMAC for authentication. We use an encrypt-then-MAC approach, as in | |||
SRTP [RFC3711]. | SRTP [RFC3711]. | |||
Before encryption or decryption, encryption and authentication | Before encryption or decryption, encryption and authentication | |||
subkeys are derived from the single AEAD key. The overall length of | subkeys are derived from the single AEAD key. The overall length of | |||
the AEAD key is Nka + Nh, where Nka represents the key size for the | the AEAD key is Nka + Nh, where Nka represents the key size for the | |||
AES block cipher in use and Nh represents the output size of the hash | AES block cipher in use and Nh represents the output size of the hash | |||
function (as in Table 2). The encryption subkey comprises the first | function (as in Section 4.4). The encryption subkey comprises the | |||
Nka bytes and the authentication subkey comprises the remaining Nh | first Nka bytes and the authentication subkey comprises the remaining | |||
bytes. | Nh bytes. | |||
def derive_subkeys(sframe_key): | def derive_subkeys(sframe_key): | |||
# The encryption key comprises the first Nka bytes | # The encryption key comprises the first Nka bytes | |||
enc_key = sframe_key[..Nka] | enc_key = sframe_key[..Nka] | |||
# The authentication key comprises Nh remaining bytes | # The authentication key comprises Nh remaining bytes | |||
auth_key = sframe_key[Nka..] | auth_key = sframe_key[Nka..] | |||
return enc_key, auth_key | return enc_key, auth_key | |||
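
The subkey split above uses slice notation that is not valid in every language. A directly runnable Python equivalent might look like the following sketch, which takes Nka and Nh as explicit arguments (an assumption of this sketch) and adds a length check that the pseudocode leaves implicit.

```python
def derive_subkeys(sframe_key: bytes, nka: int, nh: int):
    """Split the AEAD key into (enc_key, auth_key) of Nka and Nh bytes."""
    if len(sframe_key) != nka + nh:
        raise ValueError("sframe_key must be exactly Nka + Nh bytes long")
    enc_key = sframe_key[:nka]    # first Nka bytes
    auth_key = sframe_key[nka:]   # remaining Nh bytes
    return enc_key, auth_key
```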
skipping to change at page 20, line 7 ¶ | skipping to change at line 767 ¶ | |||
* Provisioning KID / base_key mappings to participating clients | * Provisioning KID / base_key mappings to participating clients | |||
* Updating the above data as clients join or leave | * Updating the above data as clients join or leave | |||
It is the responsibility of the application to provide the key | It is the responsibility of the application to provide the key | |||
management framework, as described in Section 9.2. | management framework, as described in Section 9.2. | |||
5.1. Sender Keys | 5.1. Sender Keys | |||
If the participants in a call have a pre-existing E2E-secure channel, | If the participants in a call have a preexisting E2E-secure channel, | |||
they can use it to distribute SFrame keys. Each client participating | they can use it to distribute SFrame keys. Each client participating | |||
in a call generates a fresh base_key value that it will use to | in a call generates a fresh base_key value that it will use to | |||
encrypt media. The client then uses the E2E-secure channel to send | encrypt media. The client then uses the E2E-secure channel to send | |||
their encryption key to the other participants. | their encryption key to the other participants. | |||
In this scheme, it is assumed that receivers have a signal outside of | In this scheme, it is assumed that receivers have a signal outside of | |||
SFrame for which client has sent a given frame (e.g., an RTP SSRC). | SFrame for which client has sent a given frame (e.g., an RTP | |||
SFrame KID values are then used to distinguish between versions of | synchronization source (SSRC)). SFrame KID values are then used to | |||
the sender's base_key. | distinguish between versions of the sender's base_key. | |||
Key IDs in this scheme have two parts: a "key generation" and a | KID values in this scheme have two parts: a "key generation" and a | |||
"ratchet step". Both are unsigned integers that begin at zero. The | "ratchet step". Both are unsigned integers that begin at zero. The | |||
key generation increments each time the sender distributes a new key | key generation increments each time the sender distributes a new key | |||
to receivers. The "ratchet step" is incremented each time the sender | to receivers. The ratchet step is incremented each time the sender | |||
ratchets their key forward for forward secrecy: | ratchets their key forward for forward secrecy: | |||
base_key[i+1] = HKDF-Expand( | base_key[i+1] = HKDF-Expand( | |||
HKDF-Extract("", base_key[i]), | HKDF-Extract("", base_key[i]), | |||
"SFrame 1.0 Ratchet", CipherSuite.Nh) | "SFrame 1.0 Ratchet", CipherSuite.Nh) | |||
For compactness, we do not send the whole ratchet step. Instead, we | For compactness, we do not send the whole ratchet step. Instead, we | |||
send only its low-order R bits, where R is a value set by the | send only its low-order R bits, where R is a value set by the | |||
application. Different senders may use different values of R, but | application. Different senders may use different values of R, but | |||
each receiver of a given sender needs to know what value of R is used | each receiver of a given sender needs to know what value of R is used | |||
by the sender so that they can recognize when they need to ratchet | by the sender so that they can recognize when they need to ratchet | |||
(vs. expecting a new key). R effectively defines a re-ordering | (vs. expecting a new key). R effectively defines a reordering | |||
window, since no more than 2^R ratchet steps can be active at a given | window, since no more than 2^R ratchet steps can be active at a given | |||
time. The key generation is sent in the remaining 64 - R bits of the | time. The key generation is sent in the remaining 64 - R bits of the | |||
key ID. | KID. | |||
KID = (key_generation << R) + (ratchet_step % (1 << R)) | KID = (key_generation << R) + (ratchet_step % (1 << R)) | |||
64-R bits R bits | 64-R bits R bits | |||
<---------------> <------------> | <---------------> <------------> | |||
+-----------------+--------------+ | +-----------------+--------------+ | |||
| Key Generation | Ratchet Step | | | Key Generation | Ratchet Step | | |||
+-----------------+--------------+ | +-----------------+--------------+ | |||
Figure 6: Structure of a KID in the Sender Keys scheme | Figure 7: Structure of a KID in the Sender Keys Scheme | |||
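
The KID layout for the sender keys scheme can be sketched in Python as a pair of pack and unpack helpers. The function names make_kid and split_kid are hypothetical; the arithmetic follows the formula above, with a range check added so that the key generation fits in the remaining 64 - R bits.

```python
def make_kid(key_generation: int, ratchet_step: int, r: int) -> int:
    """Pack a key generation and the low-order R bits of the ratchet step."""
    if not 0 <= r <= 64:
        raise ValueError("R must be between 0 and 64")
    if not 0 <= key_generation < 1 << (64 - r):
        raise ValueError("key generation does not fit in 64 - R bits")
    if ratchet_step < 0:
        raise ValueError("ratchet step must be non-negative")
    return (key_generation << r) + (ratchet_step % (1 << r))


def split_kid(kid: int, r: int):
    """Recover (key_generation, low-order ratchet bits) from a KID."""
    return kid >> r, kid & ((1 << r) - 1)
```

With R = 4, key generation 3 and ratchet step 5 give KID 53. Ratchet step 17 with key generation 1 gives KID 17, because only the low-order four bits of the step (value 1) are carried.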
The sender signals such a ratchet step update by sending with a KID | The sender signals such a ratchet step update by sending with a KID | |||
value in which the ratchet step has been incremented. A receiver who | value in which the ratchet step has been incremented. A receiver who | |||
receives from a sender with a new KID computes the new key as above. | receives from a sender with a new KID computes the new key as above. | |||
The old key may be kept for some time to allow for out-of-order | The old key may be kept for some time to allow for out-of-order | |||
delivery, but should be deleted promptly. | delivery, but should be deleted promptly. | |||
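
A receiver that sees a higher ratchet step can advance its copy of the key by repeating the ratchet formula. The Python sketch below assumes a SHA-256 suite by default and relies on the fact that the requested output length (Nh) equals the hash output length, so HKDF-Expand needs only a single HMAC block. The names ratchet_once and ratchet_to are hypothetical.

```python
import hashlib
import hmac


def ratchet_once(base_key: bytes, hash_name: str = "sha256") -> bytes:
    """Compute base_key[i+1] from base_key[i]."""
    hash_len = hashlib.new(hash_name).digest_size
    # HKDF-Extract with an empty salt (treated as HashLen zero bytes).
    prk = hmac.new(bytes(hash_len), base_key, hash_name).digest()
    # HKDF-Expand for exactly one block: T(1) = HMAC(PRK, info || 0x01).
    return hmac.new(prk, b"SFrame 1.0 Ratchet" + b"\x01", hash_name).digest()


def ratchet_to(base_key: bytes, current_step: int, target_step: int,
               hash_name: str = "sha256") -> bytes:
    """Advance a key from current_step to target_step; never backwards."""
    if target_step < current_step:
        raise ValueError("keys cannot be ratcheted backwards")
    for _ in range(target_step - current_step):
        base_key = ratchet_once(base_key, hash_name)
    return base_key
```

Since only the low-order R bits of the ratchet step are signaled, an application would first need to reconstruct the full target step from the KID before calling a helper like ratchet_to.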
If a new participant joins in the middle of a session, they will need | If a new participant joins in the middle of a session, they will need | |||
to receive from each sender (a) the current sender key for that | to receive from each sender (a) the current sender key for that | |||
sender and (b) the current KID value for the sender. Evicting a | sender and (b) the current KID value for the sender. Evicting a | |||
participant requires each sender to send a fresh sender key to all | participant requires each sender to send a fresh sender key to all | |||
receivers. | receivers. | |||
It is up to the application to decide when sender keys are updated. | It is the application's responsibility to decide when sender keys are | |||
A sender key may be updated by sending a new base_key (updating the | updated. A sender key may be updated by sending a new base_key | |||
key generation) or by hashing the current base_key (updating the | (updating the key generation) or by hashing the current base_key | |||
ratchet step). Ratcheting the key forward is useful when adding new | (updating the ratchet step). Ratcheting the key forward is useful | |||
receivers to an SFrame-based interaction, since it ensures that the | when adding new receivers to an SFrame-based interaction, since it | |||
new receivers can't decrypt any media encrypted before they were | ensures that the new receivers can't decrypt any media encrypted | |||
added. If a sender wishes to assure the opposite property when | before they were added. If a sender wishes to assure the opposite | |||
removing a receiver (i.e., ensuring that the receiver can't decrypt | property when removing a receiver (i.e., ensuring that the receiver | |||
media after they are removed), then the sender will need to | can't decrypt media after they are removed), then the sender will | |||
distribute a new sender key. | need to distribute a new sender key. | |||
5.2. MLS | 5.2. MLS | |||
The Messaging Layer Security (MLS) protocol provides group | The Messaging Layer Security (MLS) protocol provides group | |||
authenticated key exchange [MLS-ARCH] [MLS-PROTO]. In principle, it | authenticated key exchange [MLS-ARCH] [MLS-PROTO]. In principle, it | |||
could be used to instantiate the sender key scheme above, but it can | could be used to instantiate the sender key scheme above, but it can | |||
also be used more efficiently directly. | also be used more efficiently directly. | |||
MLS creates a linear sequence of keys, each of which is shared among | MLS creates a linear sequence of keys, each of which is shared among | |||
the members of a group at a given point in time. When a member joins | the members of a group at a given point in time. When a member joins | |||
or leaves the group, a new key is produced that is known only to the | or leaves the group, a new key is produced that is known only to the | |||
augmented or reduced group. Each step in the lifetime of the group | augmented or reduced group. Each step in the lifetime of the group | |||
is known as an "epoch", and each member of the group is assigned an | is known as an "epoch", and each member of the group is assigned an | |||
"index" that is constant for the time they are in the group. | "index" that is constant for the time they are in the group. | |||
To generate keys and nonces for SFrame, we use the MLS exporter | To generate keys and nonces for SFrame, we use the MLS exporter | |||
function to generate a base_key value for each MLS epoch. Each | function to generate a base_key value for each MLS epoch. Each | |||
member of the group is assigned a set of KID values, so that each | member of the group is assigned a set of KID values so that each | |||
member has a unique sframe_key and sframe_salt that it uses to | member has a unique sframe_key and sframe_salt that it uses to | |||
encrypt with. Senders may choose any KID value within their assigned | encrypt with. Senders may choose any KID value within their assigned | |||
set of KID values, e.g., to allow a single sender to send multiple | set of KID values, e.g., to allow a single sender to send multiple, | |||
uncoordinated outbound media streams. | uncoordinated outbound media streams. | |||
base_key = MLS-Exporter("SFrame 1.0 Base Key", "", AEAD.Nk) | base_key = MLS-Exporter("SFrame 1.0 Base Key", "", AEAD.Nk) | |||
For compactness, we do not send the whole epoch number. Instead, we | For compactness, we do not send the whole epoch number. Instead, we | |||
send only its low-order E bits, where E is a value set by the | send only its low-order E bits, where E is a value set by the | |||
application. E effectively defines a re-ordering window, since no | application. E effectively defines a reordering window, since no | |||
more than 2^E epochs can be active at a given time. Receivers MUST | more than 2^E epochs can be active at a given time. To handle | |||
be prepared for the epoch counter to roll over, removing an old epoch | rollover of the epoch counter, receivers MUST remove an old epoch | |||
when a new epoch with the same E lower bits is introduced. | when a new epoch with the same low-order E bits is introduced. | |||
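The rollover rule above can be sketched in Python as follows. This is an illustration only and is not text from either version of the document; the EpochWindow class and its method names are hypothetical, and the stored base_key is treated as an opaque byte string.

```python
class EpochWindow:
    """Tracks at most 2^E active epochs, indexed by their low-order E bits."""

    def __init__(self, epoch_bits: int):
        self.mask = (1 << epoch_bits) - 1
        self._slots = {}  # low-order E bits -> (full epoch number, base_key)

    def install(self, epoch: int, base_key: bytes) -> None:
        # A new epoch with the same low-order E bits replaces (removes)
        # whatever older epoch previously occupied that slot.
        self._slots[epoch & self.mask] = (epoch, base_key)

    def lookup(self, epoch_low_bits: int):
        # Returns (epoch, base_key), or None if no epoch is active in that slot.
        return self._slots.get(epoch_low_bits & self.mask)


window = EpochWindow(epoch_bits=4)
window.install(1, b"key for epoch 1")
window.install(17, b"key for epoch 17")  # 17 % 16 == 1, so epoch 1 is removed
```

Keying the table by the truncated epoch means the removal of the old epoch falls out of the data structure itself, rather than needing a separate cleanup step.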
Let S be the number of bits required to encode a member index in the | Let S be the number of bits required to encode a member index in the | |||
group, i.e., the smallest value such that group_size <= (1 << S). | group, i.e., the smallest value such that group_size <= (1 << S). | |||
The sender index is encoded in the S bits above the epoch. The | The sender index is encoded in the S bits above the epoch. The | |||
remaining 64 - S - E bits of the KID value are a context value chosen | remaining 64 - S - E bits of the KID value are a context value chosen | |||
by the sender (context value 0 will produce the shortest encoded | by the sender (context value 0 will produce the shortest encoded | |||
KID). | KID). | |||
KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E)) | KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E)) | |||
64-S-E bits S bits E bits | 64-S-E bits S bits E bits | |||
<-----------> <------> <------> | <-----------> <------> <------> | |||
+-------------+--------+-------+ | +-------------+--------+-------+ | |||
| Context ID | Index | Epoch | | | Context ID | Index | Epoch | | |||
+-------------+--------+-------+ | +-------------+--------+-------+ | |||
Figure 7: Structure of a KID for an MLS Sender | Figure 8: Structure of a KID for an MLS Sender | |||
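The KID formula and layout above can be written as a short Python sketch. This is an illustration only and is not text from either version of the document; the function names sender_index_bits and make_kid, and the range checks, are assumptions added here for clarity.

```python
def sender_index_bits(group_size: int) -> int:
    """Smallest S such that group_size <= (1 << S)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return (group_size - 1).bit_length()


def make_kid(context: int, sender_index: int, epoch: int, S: int, E: int) -> int:
    """KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E))."""
    if S < 0 or E < 0 or S + E > 64:
        raise ValueError("S and E must be non-negative and fit in 64 bits")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if not 0 <= sender_index < (1 << S):
        raise ValueError("sender_index does not fit in S bits")
    if not 0 <= context < (1 << (64 - S - E)):
        raise ValueError("context does not fit in the remaining 64 - S - E bits")
    return (context << (S + E)) + (sender_index << E) + (epoch % (1 << E))


S = sender_index_bits(64)                                      # 6 bits for 64 members
kid = make_kid(context=0, sender_index=3, epoch=14, S=S, E=4)  # 0x3e
```

With context 0, the KID occupies only S + E bits, which is why that choice yields the shortest encoded KID.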
Once an SFrame stack has been provisioned with the | Once an SFrame stack has been provisioned with the | |||
sframe_epoch_secret for an epoch, it can compute the required KID | sframe_epoch_secret for an epoch, it can compute the required KID | |||
values on demand (as well as the resulting SFrame keys/nonces derived | values on demand (as well as the resulting SFrame keys/nonces derived | |||
from the base_key and KID), as it needs to encrypt or decrypt for a | from the base_key and KID) as it needs to encrypt or decrypt for a | |||
given member. | given member. | |||
... | ... | |||
| | | | |||
| | | | |||
Epoch 14 +--+-- index=3 ---> KID = 0x3e | Epoch 14 +--+-- index=3 ---> KID = 0x3e | |||
| | | | | | |||
| +-- index=7 ---> KID = 0x7e | | +-- index=7 ---> KID = 0x7e | |||
| | | | | | |||
| +-- index=20 --> KID = 0x14e | | +-- index=20 --> KID = 0x14e | |||
skipping to change at page 23, line 32 | skipping to change at line 912 | |||
| +--> context = 3 --> KID = 0xc20 | | +--> context = 3 --> KID = 0xc20 | |||
| | | | |||
| | | | |||
Epoch 17 +--+-- index=33 --> KID = 0x211 | Epoch 17 +--+-- index=33 --> KID = 0x211 | |||
| | | | | | |||
| +-- index=51 --> KID = 0x331 | | +-- index=51 --> KID = 0x331 | |||
| | | | |||
| | | | |||
... | ... | |||
Figure 8: An example sequence of KIDs for an MLS-based SFrame | Figure 9: An Example Sequence of KIDs for an MLS-based SFrame | |||
session (E=4; S=6, allowing for 64 group members) | Session (E=4; S=6, Allowing for 64 Group Members) | |||
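The KID values in the figure above can be decomposed back into their fields with a few shifts and masks. The following Python sketch is an illustration only and is not text from either version of the document; parse_kid is a hypothetical helper name.

```python
def parse_kid(kid: int, S: int, E: int):
    """Split a KID into (context, sender_index, epoch_low_bits)."""
    epoch_low_bits = kid & ((1 << E) - 1)
    sender_index = (kid >> E) & ((1 << S) - 1)
    context = kid >> (S + E)
    return context, sender_index, epoch_low_bits


E, S = 4, 6
for kid in (0x3e, 0x7e, 0x14e, 0xc20, 0x211, 0x331):
    context, index, epoch_low = parse_kid(kid, S, E)
    print(f"KID={kid:#x}: context={context} index={index} epoch%16={epoch_low}")
```

Only the low-order E bits of the epoch are recoverable from a KID, so the entries for epoch 17 decode to an epoch field of 1 (17 % 16).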
6. Media Considerations | 6. Media Considerations | |||
6.1. Selective Forwarding Units | 6.1. Selective Forwarding Units | |||
Selective Forwarding Units (SFUs) (e.g., those described in | SFUs (e.g., those described in Section 3.7 of [RFC7667]) receive the | |||
Section 3.7 of [RFC7667]) receive the media streams from each | media streams from each participant and select which ones should be | |||
participant and select which ones should be forwarded to each of the | forwarded to each of the other participants. There are several | |||
other participants. There are several approaches for stream | approaches for stream selection, but in general, the SFU needs to | |||
selection, but in general, the SFU needs to access metadata | access metadata associated with each frame and modify the RTP | |||
associated to each frame and modify the RTP information of the | information of the incoming packets when they are transmitted to the | |||
incoming packets when they are transmitted to the received | received participants. | |||
participants. | ||||
This section describes how this normal SFU modes of operation | This section describes how these normal SFU modes of operation | |||
interact with the E2EE provided by SFrame. | interact with the E2EE provided by SFrame. | |||
6.1.1. LastN and RTP stream reuse | 6.1.1. RTP Stream Reuse | |||
The SFU may choose to send only a certain number of streams based on | The SFU may choose to send only a certain number of streams based on | |||
the voice activity of the participants. To avoid the overhead | the voice activity of the participants. To avoid the overhead | |||
involved in establishing new transport streams, the SFU may decide to | involved in establishing new transport streams, the SFU may decide to | |||
reuse previously existing streams or even pre-allocate a predefined | reuse previously existing streams or even pre-allocate a predefined | |||
number of streams and choose in each moment in time which participant | number of streams and choose in each moment in time which participant | |||
media will be sent through it. | media will be sent through it. | |||
This means that in the same transport-level stream (e.g., an RTP | This means that the same transport-level stream (e.g., an RTP stream | |||
stream defined by either SSRC or MID) may carry media from different | defined by either SSRC or Media Identification (MID)) may carry media | |||
streams of different participants. As different keys are used by | from different streams of different participants. Because each | |||
each participant for encoding their media, the receiver will be able | participant uses a different key to encrypt their media, the receiver | |||
to verify which is the sender of the media coming within the RTP | will be able to verify the sender of the media within the RTP stream | |||
stream at any given point in time, preventing the SFU trying to | at any given point in time. Thus the receiver will correctly | |||
impersonate any of the participants with another participant's media. | associate the media with the sender indicated by the authenticated | |||
SFrame KID value, irrespective of how the SFU transmits the media to | ||||
the client. | ||||
Note that in order to prevent impersonation by a malicious | Note that in order to prevent impersonation by a malicious | |||
participant (not the SFU), a mechanism based on digital signature | participant (not the SFU), a mechanism based on digital signature | |||
would be required. SFrame does not protect against such attacks. | would be required. SFrame does not protect against such attacks. | |||
6.1.2. Simulcast | 6.1.2. Simulcast | |||
When using simulcast, the same input image will produce N different | When using simulcast, the same input image will produce N different | |||
encoded frames (one per simulcast layer) which would be processed | encoded frames (one per simulcast layer), which would be processed | |||
independently by the frame encryptor and assigned an unique counter | independently by the frame encryptor and assigned an unique CTR value | |||
for each. | for each. | |||
6.1.3. SVC | 6.1.3. Scalable Video Coding (SVC) | |||
In both temporal and spatial scalability, the SFU may choose to drop | In both temporal and spatial scalability, the SFU may choose to drop | |||
layers in order to match a certain bitrate or forward specific media | layers in order to match a certain bitrate or to forward specific | |||
sizes or frames per second. In order to support the SFU selectively | media sizes or frames per second. In order to support the SFU | |||
removing layers, the sender MUST encapsulate each layer in a | selectively removing layers, the sender MUST encapsulate each layer | |||
different SFrame ciphertext. | in a different SFrame ciphertext. | |||
6.2. Video Key Frames | 6.2. Video Key Frames | |||
Forward Security and Post-Compromise Security require that the E2EE | Forward security and post-compromise security require that the E2EE | |||
keys (base keys) are updated any time a participant joins or leaves | keys (base keys) are updated any time a participant joins or leaves | |||
the call. | the call. | |||
The key exchange happens asynchronously and on a different path than | The key exchange happens asynchronously and on a different path than | |||
the SFU signaling and media. So it may happen that when a new | the SFU signaling and media. So it may happen that when a new | |||
participant joins the call and the SFU side requests a key frame, the | participant joins the call and the SFU side requests a key frame, the | |||
sender generates the E2EE frame with a key not known by the receiver, | sender generates the E2EE frame with a key that is not known by the | |||
so it will be discarded. When the sender updates his sending key | receiver, so it will be discarded. When the sender updates his | |||
with the new key, it will send it in a non-key frame, so the receiver | sending key with the new key, it will send it in a non-key frame, so | |||
will be able to decrypt it, but not decode it. | the receiver will be able to decrypt it, but not decode it. | |||
The new Receiver will then re-request a key frame, but due to sender | The new receiver will then re-request a key frame, but due to sender | |||
and SFU policies, that new key frame could take some time to be | and SFU policies, that new key frame could take some time to be | |||
generated. | generated. | |||
If the sender sends a key frame after the new E2EE key is in use, the | If the sender sends a key frame after the new E2EE key is in use, the | |||
time required for the new participant to display the video is | time required for the new participant to display the video is | |||
minimized. | minimized. | |||
Note that this issue does not arise for media streams that do not | Note that this issue does not arise for media streams that do not | |||
have dependencies among frames, e.g., audio streams. In these | have dependencies among frames, e.g., audio streams. In these | |||
streams, each frame is independently decodeable, so there is never a | streams, each frame is independently decodable, so a frame never | |||
need to process two frames together which might be on two sides of a | depends on another frame that might be on the other side of a key | |||
key rotation. | rotation. | |||
6.3. Partial Decoding | 6.3. Partial Decoding | |||
Some codecs support partial decoding, where individual packets can be | Some codecs support partial decoding, where individual packets can be | |||
decoded without waiting for the full frame to arrive. When SFrame is | decoded without waiting for the full frame to arrive. When SFrame is | |||
applied per-frame, this won't be possible because the decoder cannot | applied per frame, partial decoding is not possible because the | |||
access data until an entire frame has arrived and has been decrypted. | decoder cannot access data until an entire frame has arrived and has | |||
been decrypted. | ||||
7. Security Considerations | 7. Security Considerations | |||
7.1. No Header Confidentiality | 7.1. No Header Confidentiality | |||
SFrame provides integrity protection to the SFrame Header (the key ID | SFrame provides integrity protection to the SFrame header (the KID | |||
and counter values), but does not provide confidentiality protection. | and CTR values), but it does not provide confidentiality protection. | |||
Parties that can observe the SFrame header may learn, for example, | Parties that can observe the SFrame header may learn, for example, | |||
which parties are sending SFrame payloads (from KID values) and at | which parties are sending SFrame payloads (from KID values) and at | |||
what rates (from CTR values). In cases where SFrame is used for end- | what rates (from CTR values). In cases where SFrame is used for end- | |||
to-end security on top of hop-by-hop protections (e.g., running over | to-end security on top of hop-by-hop protections (e.g., running over | |||
SRTP as described in Appendix C.5), the hop-by-hop security | SRTP as described in Appendix B.5), the hop-by-hop security | |||
mechanisms provide confidentiality protection of the SFrame header | mechanisms provide confidentiality protection of the SFrame header | |||
between hops. | between hops. | |||
7.2. No Per-Sender Authentication | 7.2. No Per-Sender Authentication | |||
SFrame does not provide per-sender authentication of media data. Any | SFrame does not provide per-sender authentication of media data. Any | |||
sender in a session can send media that will be associated with any | sender in a session can send media that will be associated with any | |||
other sender. This is because SFrame uses symmetric encryption to | other sender. This is because SFrame uses symmetric encryption to | |||
protect media data, so that any receiver also has the keys required | protect media data, so that any receiver also has the keys required | |||
to encrypt packets for the sender. | to encrypt packets for the sender. | |||
7.3. Key Management | 7.3. Key Management | |||
Key exchange mechanism is out of scope of this document, however | The specifics of key management are beyond the scope of this | |||
every client SHOULD change their keys when new clients joins or | document. However, every client SHOULD change their keys when new | |||
leaves the call for forward secrecy and post compromise security. | clients join or leave the call for forward secrecy and post- | |||
compromise security. | ||||
7.4. Replay | 7.4. Replay | |||
The handling of replay is out of the scope of this document. | The handling of replay is out of the scope of this document. | |||
However, senders MUST reject requests to encrypt multiple times with | However, senders MUST reject requests to encrypt multiple times with | |||
the same key and nonce, since several AEAD algorithms fail badly in | the same key and nonce since several AEAD algorithms fail badly in | |||
such cases (see, e.g., Section 5.1.1 of [RFC5116]). | such cases (see, e.g., Section 5.1.1 of [RFC5116]). | |||
7.5. Risks due to Short Tags | 7.5. Risks Due to Short Tags | |||
The SFrame ciphersuites based on AES-CTR allow for the use of short | The SFrame cipher suites based on AES-CTR allow for the use of short | |||
authentication tags, which bring a higher risk that an attacker will | authentication tags, which bring a higher risk that an attacker will | |||
be able to cause an SFrame receiver to accept an SFrame ciphertext of | be able to cause an SFrame receiver to accept an SFrame ciphertext of | |||
the attacker's choosing. | the attacker's choosing. | |||
Assuming that the authentication properties of the ciphersuite are | Assuming that the authentication properties of the cipher suite are | |||
robust, the only attack that an attacker can mount is an attempt to | robust, the only attack that an attacker can mount is an attempt to | |||
find an acceptable (ciphertext, tag) combination through brute force. | find an acceptable (ciphertext, tag) combination through brute force. | |||
Such a brute-force attack will have an expected success rate of the | Such a brute-force attack will have an expected success rate of the | |||
following form: | following form: | |||
attacker_success_rate = attempts_per_second / 2^(8*Nt) | attacker_success_rate = attempts_per_second / 2^(8*Nt) | |||
For example, a gigabit ethernet connection is able to transmit | For example, a gigabit Ethernet connection is able to transmit | |||
roughly 2^20 packets per second. If an attacker saturated such a | roughly 2^20 packets per second. If an attacker saturated such a | |||
link with guesses against a 32-bit authentication tag (Nt=4), then | link with guesses against a 32-bit authentication tag (Nt=4), then | |||
the attacker would succeed on average roughly once every 2^12 | the attacker would succeed on average roughly once every 2^12 | |||
seconds, or about once an hour. | seconds, or about once an hour. | |||
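The arithmetic in this example can be checked with a few lines of Python. This is an illustration only and is not text from either version of the document; the function name is hypothetical, and the tag sizes other than Nt=4 are included for comparison, using the same assumed rate of 2^20 attempts per second.

```python
def seconds_per_expected_forgery(attempts_per_second: float, tag_bytes: int) -> float:
    """Inverse of attacker_success_rate = attempts_per_second / 2^(8*Nt)."""
    success_rate = attempts_per_second / 2 ** (8 * tag_bytes)
    return 1 / success_rate


for nt in (4, 8, 10):
    secs = seconds_per_expected_forgery(2 ** 20, nt)
    print(f"Nt={nt}: {secs:.3g} s, about {secs / 3600:.3g} hours")
```

For Nt=4 this gives 2^12 = 4096 seconds, a little over 68 minutes, which matches the "about once an hour" figure.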
In a typical SFrame usage in a real-time media application, there are | In a typical SFrame usage in a real-time media application, there are | |||
a few approaches to mitigating this risk: | a few approaches to mitigating this risk: | |||
* Receivers only accept SFrame ciphertexts over HBH-secure channels | * Receivers only accept SFrame ciphertexts over HBH-secure channels | |||
(e.g., SRTP security associations or QUIC connections). If this | (e.g., SRTP security associations or QUIC connections). If this | |||
skipping to change at page 27, line 19 | skipping to change at line 1078 | |||
* The expected packet rate for a media stream is very predictable | * The expected packet rate for a media stream is very predictable | |||
(and typically far lower than the above example). On the one | (and typically far lower than the above example). On the one | |||
hand, attacks at this rate will succeed even less often than the | hand, attacks at this rate will succeed even less often than the | |||
high-rate attack described above. On the other hand, the | high-rate attack described above. On the other hand, the | |||
application may use an elevated packet arrival rate as a signal of | application may use an elevated packet arrival rate as a signal of | |||
a brute-force attack. This latter approach is common in other | a brute-force attack. This latter approach is common in other | |||
settings, e.g., mitigating brute-force attacks on passwords. | settings, e.g., mitigating brute-force attacks on passwords. | |||
* Media applications typically do not provide feedback to media | * Media applications typically do not provide feedback to media | |||
senders as to which media packets failed to decrypt. When media | senders as to which media packets failed to decrypt. When media- | |||
quality feedback mechanisms are used, decryption failures will | quality feedback mechanisms are used, decryption failures will | |||
typically appear as packet losses, but only at an aggregate level. | typically appear as packet losses, but only at an aggregate level. | |||
* Anti-replay mechanisms (see Section 7.4) prevent the attacker from | * Anti-replay mechanisms (see Section 7.4) prevent the attacker from | |||
re-using valid ciphertexts (either observed or guessed by the | reusing valid ciphertexts (either observed or guessed by the | |||
attacker). A receiver applying anti-replay controls will only | attacker). A receiver applying anti-replay controls will only | |||
accept one valid plaintext per CTR value. Since the CTR value is | accept one valid plaintext per CTR value. Since the CTR value is | |||
covered by SFrame authentication, an attacker has to do a fresh | covered by SFrame authentication, an attacker has to do a fresh | |||
search for a valid tag for every forged ciphertext, even if the | search for a valid tag for every forged ciphertext, even if the | |||
encrypted content is unchanged. In other words, when the above | encrypted content is unchanged. In other words, when the above | |||
brute force attack succeeds, it only allows the attacker to send a | brute-force attack succeeds, it only allows the attacker to send a | |||
single SFrame ciphertext; the ciphertext cannot be reused because | single SFrame ciphertext; the ciphertext cannot be reused because | |||
either it will have the same CTR value and be discarded as a | either it will have the same CTR value and be discarded as a | |||
replay, or else it will have a different CTR value its tag will no | replay, or else it will have a different CTR value and its tag | |||
longer be valid. | will no longer be valid. | |||
Nonetheless, without these mitigations, an application that makes use | Nonetheless, without these mitigations, an application that makes use | |||
of short tags will be at heightened risk of forgery attacks. In many | of short tags will be at heightened risk of forgery attacks. In many | |||
cases, it is simpler to use full-size tags and tolerate slightly | cases, it is simpler to use full-size tags and tolerate slightly | |||
higher bandwidth usage rather than add the additional defenses | higher bandwidth usage rather than to add the additional defenses | |||
necessary to safely use short tags. | necessary to safely use short tags. | |||
8. IANA Considerations | 8. IANA Considerations | |||
This document requests the creation of the following new IANA | IANA has created a new registry called "SFrame Cipher Suites" | |||
registry: | (Section 8.1) under the "SFrame" group registry heading. | |||
* SFrame Cipher Suites (Section 8.1) | ||||
This registry should be under a heading of "SFrame", and assignments | ||||
are made via the Specification Required policy [RFC8126]. | ||||
RFC EDITOR: Please replace XXXX throughout with the RFC number | ||||
assigned to this document | ||||
8.1. SFrame Cipher Suites | 8.1. SFrame Cipher Suites | |||
This registry lists identifiers for SFrame cipher suites, as defined | The "SFrame Cipher Suites" registry lists identifiers for SFrame | |||
in Section 4.5. The cipher suite field is two bytes wide, so the | cipher suites as defined in Section 4.5. The cipher suite field is | |||
valid cipher suites are in the range 0x0000 to 0xFFFF. | two bytes wide, so the valid cipher suites are in the range 0x0000 to | |||
0xFFFF. Except as noted below, assignments are made via the | ||||
Specification Required policy [RFC8126]. | ||||
Template: | The registration template is as follows: | |||
* Value: The numeric value of the cipher suite | * Value: The numeric value of the cipher suite | |||
* Name: The name of the cipher suite | * Name: The name of the cipher suite | |||
* Recommended: Whether support for this cipher suite is recommended | * Recommended: Whether support for this cipher suite is recommended | |||
by the IETF. Valid values are "Y", "N", and "D", as described in | by the IETF. Valid values are "Y", "N", and "D" as described in | |||
Section 17.1 of [MLS-PROTO]. The default value of the | Section 17.1 of [MLS-PROTO]. The default value of the | |||
"Recommended" column is "N". Setting the Recommended item to "Y" | "Recommended" column is "N". Setting the Recommended item to "Y" | |||
or "D", or changing an item whose current value is "Y" or "D", | or "D", or changing an item whose current value is "Y" or "D", | |||
requires Standards Action [RFC8126]. | requires Standards Action [RFC8126]. | |||
* Reference: The document where this cipher suite is defined | * Reference: The document where this cipher suite is defined | |||
* Change Controller: Who is authorized to update the row in the | ||||
registry | ||||
Initial contents: | Initial contents: | |||
+=================+============================+===+===========+ | +========+============================+===+===========+============+ | |||
| Value | Name | R | Reference | | | Value | Name | R | Reference | Change | | |||
+=================+============================+===+===========+ | | | | | | Controller | | |||
| 0x0000 | Reserved | - | RFC XXXX | | +========+============================+===+===========+============+ | |||
+-----------------+----------------------------+---+-----------+ | | 0x0000 | Reserved | - | RFC 9605 | IETF | | |||
| 0x0001 | AES_128_CTR_HMAC_SHA256_80 | Y | RFC XXXX | | +--------+----------------------------+---+-----------+------------+ | |||
+-----------------+----------------------------+---+-----------+ | | 0x0001 | AES_128_CTR_HMAC_SHA256_80 | Y | RFC 9605 | IETF | | |||
| 0x0002 | AES_128_CTR_HMAC_SHA256_64 | Y | RFC XXXX | | +--------+----------------------------+---+-----------+------------+ | |||
+-----------------+----------------------------+---+-----------+ | | 0x0002 | AES_128_CTR_HMAC_SHA256_64 | Y | RFC 9605 | IETF | | |||
| 0x0003 | AES_128_CTR_HMAC_SHA256_32 | Y | RFC XXXX | | +--------+----------------------------+---+-----------+------------+ | |||
+-----------------+----------------------------+---+-----------+ | | 0x0003 | AES_128_CTR_HMAC_SHA256_32 | Y | RFC 9605 | IETF | | |||
| 0x0004 | AES_128_GCM_SHA256_128 | Y | RFC XXXX | | +--------+----------------------------+---+-----------+------------+ | |||
+-----------------+----------------------------+---+-----------+ | | 0x0004 | AES_128_GCM_SHA256_128 | Y | RFC 9605 | IETF | | |||
| 0x0005 | AES_256_GCM_SHA512_128 | Y | RFC XXXX | | +--------+----------------------------+---+-----------+------------+ | |||
+-----------------+----------------------------+---+-----------+ | | 0x0005 | AES_256_GCM_SHA512_128 | Y | RFC 9605 | IETF | | |||
| 0xF000 - 0xFFFF | Reserved for private use | - | RFC XXXX | | +--------+----------------------------+---+-----------+------------+ | |||
+-----------------+----------------------------+---+-----------+ | | 0xF000 | Reserved for Private Use | - | RFC 9605 | IETF | | |||
| - | | | | | | ||||
| 0xFFFF | | | | | | ||||
+--------+----------------------------+---+-----------+------------+ | ||||
Table 2: SFrame cipher suites | Table 2: SFrame Cipher Suites | |||
9. Application Responsibilities | 9. Application Responsibilities | |||
To use SFrame, an application needs to define the inputs to the | To use SFrame, an application needs to define the inputs to the | |||
SFrame encryption and decryption operations, and how SFrame | SFrame encryption and decryption operations, and how SFrame | |||
ciphertexts are delivered from sender to receiver (including any | ciphertexts are delivered from sender to receiver (including any | |||
fragmentation and reassembly). In this section, we lay out | fragmentation and reassembly). In this section, we lay out | |||
additional requirements that an integration must meet in order for | additional requirements that an application must meet in order for | |||
SFrame to operate securely. | SFrame to operate securely. | |||
In general, an application using SFrame is responsible for | In general, an application using SFrame is responsible for | |||
configuring SFrame. The application must first define when SFrame is | configuring SFrame. The application must first define when SFrame is | |||
applied at all. When SFrame is applied, the application must define | applied at all. When SFrame is applied, the application must define | |||
which cipher suite is to be used. If new versions of SFrame are | which cipher suite is to be used. If new versions of SFrame are | |||
defined in the future, it will be up to the application to determine | defined in the future, it will be the application's responsibility to | |||
which version should be used. | determine which version should be used. | |||
This division of responsibilities is similar to the way other media | This division of responsibilities is similar to the way other media | |||
parameters (e.g., codecs) are typically handled in media | parameters (e.g., codecs) are typically handled in media | |||
applications, in the sense that they are set up in some signaling | applications, in the sense that they are set up in some signaling | |||
protocol, and then not described in the media. Applications might | protocol and not described in the media. Applications might find it | |||
find it useful to extend the protocols used for negotiating other | useful to extend the protocols used for negotiating other media | |||
media parameters (e.g., SDP [RFC8866]) to also negotiate parameters | parameters (e.g., Session Description Protocol (SDP) [RFC8866]) to | |||
for SFrame. | also negotiate parameters for SFrame. | |||
9.1. Header Value Uniqueness | 9.1. Header Value Uniqueness | |||
Applications MUST ensure that each (base_key, KID, CTR) combination | Applications MUST ensure that each (base_key, KID, CTR) combination | |||
is used for at most one SFrame encryption operation. This ensures | is used for at most one SFrame encryption operation. This ensures | |||
that the (key, nonce) pairs used by the underlying AEAD algorithm are | that the (key, nonce) pairs used by the underlying AEAD algorithm are | |||
never reused. Typically this is done by assigning each sender a KID | never reused. Typically this is done by assigning each sender a KID | |||
or set of KIDs, then having each sender use the CTR field as a | or set of KIDs, then having each sender use the CTR field as a | |||
monotonic counter, incrementing for each plaintext that is encrypted. | monotonic counter, incrementing for each plaintext that is encrypted. | |||
In addition to its simplicity, this scheme minimizes overhead by | In addition to its simplicity, this scheme minimizes overhead by | |||
keeping CTR values as small as possible. | keeping CTR values as small as possible. | |||
In applications where an SFrame context might be written to | In applications where an SFrame context might be written to | |||
persistent storage, this context needs to include the last used CTR | persistent storage, this context needs to include the last-used CTR | |||
value. When the context is used later, the application should use | value. When the context is used later, the application should use | |||
the stored CTR value to determine the next CTR value to be used in an | the stored CTR value to determine the next CTR value to be used in an | |||
encryption operation, and then write the next CTR value back to | encryption operation, and then write the next CTR value back to | |||
storage before using the CTR value for encryption. Storing the CTR | storage before using the CTR value for encryption. Storing the CTR | |||
value before usage (vs. after) helps ensure that a storage failure | value before usage (vs. after) helps ensure that a storage failure | |||
will not cause reuse of the same (base_key, KID, CTR) combination. | will not cause reuse of the same (base_key, KID, CTR) combination. | |||
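The store-before-use ordering described above can be sketched in Python as follows. This is an illustration only and is not text from either version of the document; the PersistentCounter class, the file layout (a single decimal integer holding the last reserved CTR value), and the use of a temporary file with an atomic rename are all assumptions made for the example.

```python
import os
import tempfile


class PersistentCounter:
    """Hands out CTR values, persisting each one before it is used."""

    def __init__(self, path: str):
        self.path = path

    def _load_last(self) -> int:
        try:
            with open(self.path) as f:
                return int(f.read())  # a corrupt file raises instead of guessing
        except FileNotFoundError:
            return -1  # nothing reserved yet, so the first CTR is 0

    def _store(self, value: int) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(value))
            f.flush()
            os.fsync(f.fileno())   # push the data to disk before the rename
        os.replace(tmp, self.path)  # atomically swap in the new value

    def next_ctr(self) -> int:
        ctr = self._load_last() + 1
        self._store(ctr)  # write back BEFORE the value is used for encryption
        return ctr


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sframe_ctr")
    first = PersistentCounter(path).next_ctr()
    second = PersistentCounter(path).next_ctr()  # fresh object simulates a restart
```

If the process fails after the write but before encryption, the reserved CTR value is simply skipped; a gap in CTR values is harmless, whereas reuse of a (base_key, KID, CTR) combination is not.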
9.2. Key Management Framework | 9.2. Key Management Framework | |||
It is up to the application to provision SFrame with a mapping of KID | The application is responsible for provisioning SFrame with a mapping | |||
values to base_key values and the resulting keys and salts. More | of KID values to base_key values and the resulting keys and salts. | |||
importantly, the application specifies which KID values are used for | More importantly, the application specifies which KID values are used | |||
which purposes (e.g., by which senders). An application's KID | for which purposes (e.g., by which senders). An application's KID | |||
assignment strategy MUST be structured to assure the non-reuse | assignment strategy MUST be structured to assure the non-reuse | |||
properties discussed in Section 9.1. | properties discussed in Section 9.1. | |||
It is also up to the application to define a rotation schedule for | The application is also responsible for defining a rotation schedule | |||
keys. For example, one application might have an ephemeral group for | for keys. For example, one application might have an ephemeral group | |||
every call and keep rotating keys when end points join or leave the | for every call and keep rotating keys when endpoints join or leave | |||
call, while another application could have a persistent group that | the call, while another application could have a persistent group | |||
can be used for multiple calls and simply derives ephemeral symmetric | that can be used for multiple calls and simply derives ephemeral | |||
keys for a specific call. | symmetric keys for a specific call. | |||
It should be noted that KID values are not encrypted by SFrame, and | It should be noted that KID values are not encrypted by SFrame and | |||
are thus visible to any application-layer intermediaries that might | are thus visible to any application-layer intermediaries that might | |||
handle an SFrame ciphertext. If there are application semantics | handle an SFrame ciphertext. If there are application semantics | |||
included in KID values, then this information would be exposed to | included in KID values, then this information would be exposed to | |||
intermediaries. For example, in the scheme of Section 5.1, the | intermediaries. For example, in the scheme of Section 5.1, the | |||
number of ratchet steps per sender is exposed, and in the scheme of | number of ratchet steps per sender is exposed, and in the scheme of | |||
Section 5.2, the number of epochs and the MLS sender ID of the SFrame | Section 5.2, the number of epochs and the MLS sender ID of the SFrame | |||
sender are exposed. | sender are exposed. | |||
9.3. Anti-Replay | 9.3. Anti-Replay | |||
It is the responsibility of the application to handle anti-replay. | It is the responsibility of the application to handle anti-replay. | |||
Replay by network attackers is assumed to be prevented by network- | Replay by network attackers is assumed to be prevented by network- | |||
layer facilities (e.g., TLS, SRTP). As mentioned in Section 7.4, | layer facilities (e.g., TLS, SRTP). As mentioned in Section 7.4, | |||
senders MUST reject requests to encrypt multiple times with the same | senders MUST reject requests to encrypt multiple times with the same | |||
key and nonce. | key and nonce. | |||
It is not mandatory to implement anti-replay on the receiver side. | It is not mandatory to implement anti-replay on the receiver side. | |||
Receivers MAY apply time or counter based anti-replay mitigations. | Receivers MAY apply time- or counter-based anti-replay mitigations. | |||
For example, Section 3.3.2 of [RFC3711] specifies a counter-based | For example, Section 3.3.2 of [RFC3711] specifies a counter-based | |||
anti-replay mitigation, which could be adapted to use with SFrame, | anti-replay mitigation, which could be adapted to use with SFrame, | |||
using the CTR field as the counter. | using the CTR field as the counter. | |||
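A counter-based mitigation of this kind could look roughly like the sliding window below. It is a sketch loosely modeled on the approach in Section 3.3.2 of [RFC3711], not a mechanism defined by SFrame: the `ReplayWindow` type, the 64-entry window size, and the choice to keep one window per KID are all assumptions made for illustration.

```rust
/// Number of CTR values tracked behind the highest accepted one (assumed).
const WINDOW_SIZE: u64 = 64;

/// Hypothetical replay window over the CTR values seen for a single KID.
struct ReplayWindow {
    /// Highest CTR accepted so far, or None if nothing has been accepted.
    highest: Option<u64>,
    /// Bit i is set when CTR `highest - i` has already been accepted.
    seen: u64,
}

impl ReplayWindow {
    fn new() -> Self {
        ReplayWindow { highest: None, seen: 0 }
    }

    /// Returns true and records `ctr` if it has not been seen before.
    /// Returns false if `ctr` is a replay or is older than the window.
    /// Intended to be called only after the ciphertext has authenticated.
    fn check_and_update(&mut self, ctr: u64) -> bool {
        let highest = match self.highest {
            None => {
                self.highest = Some(ctr);
                self.seen = 1;
                return true;
            }
            Some(h) => h,
        };

        if ctr > highest {
            // Slide the window forward; anything shifted out is forgotten.
            let shift = ctr - highest;
            self.seen = if shift >= WINDOW_SIZE { 0 } else { self.seen << shift };
            self.seen |= 1;
            self.highest = Some(ctr);
            return true;
        }

        let offset = highest - ctr;
        if offset >= WINDOW_SIZE {
            return false; // too old to be tracked, so treated as a replay
        }
        let bit = 1u64 << offset;
        if self.seen & bit != 0 {
            return false; // already accepted once
        }
        self.seen |= bit;
        true
    }
}
```

The window tolerates reordering of up to `WINDOW_SIZE - 1` ciphertexts; anything older is rejected rather than risk accepting a replay.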
9.4. Metadata | 9.4. Metadata | |||
The metadata input to SFrame operations is pure application-specified | The metadata input to SFrame operations is an opaque byte string | |||
data. As such, it is up to the application to define what | specified by the application. As such, the application needs to | |||
information should go in the metadata input and ensure that it is | define what information should go in the metadata input and ensure | |||
provided to the encryption and decryption functions at the | that it is provided to the encryption and decryption functions at the | |||
appropriate points. A receiver MUST NOT use SFrame-authenticated | appropriate points. A receiver MUST NOT use SFrame-authenticated | |||
metadata until after the SFrame decrypt function has authenticated | metadata until after the SFrame decrypt function has authenticated | |||
it, unless the purpose of such usage is to prepare an SFrame | it, unless the purpose of such usage is to prepare an SFrame | |||
ciphertext for SFrame decryption. Essentially, metadata may be used | ciphertext for SFrame decryption. Essentially, metadata may be used | |||
"upstream of SFrame" in a processing pipeline, but only to prepare | "upstream of SFrame" in a processing pipeline, but only to prepare | |||
for SFrame decryption. | for SFrame decryption. | |||
For example, consider an application where SFrame is used to encrypt | For example, consider an application where SFrame is used to encrypt | |||
audio frames that are sent over SRTP, with some application data | audio frames that are sent over SRTP, with some application data | |||
included in the RTP header extension. Suppose the application also | included in the RTP header extension. Suppose the application also | |||
includes this application data in the SFrame metadata, so that the | includes this application data in the SFrame metadata, so that the | |||
SFU is allowed to read, but not modify the application data. A | SFU is allowed to read, but not modify, the application data. A | |||
receiver can use the application data in the RTP header extension as | receiver can use the application data in the RTP header extension as | |||
part of the standard SRTP decryption process, since this is required | part of the standard SRTP decryption process since this is required | |||
to recover the SFrame ciphertext carried in the SRTP payload. | to recover the SFrame ciphertext carried in the SRTP payload. | |||
However, the receiver MUST NOT use the application data for other | However, the receiver MUST NOT use the application data for other | |||
purposes before SFrame decryption has authenticated the application | purposes before SFrame decryption has authenticated the application | |||
data. | data. | |||
10. References | 10. References | |||
10.1. Normative References | 10.1. Normative References | |||
[MLS-PROTO] | [MLS-PROTO] | |||
Barnes, R., Beurdouche, B., Robert, R., Millican, J., | Barnes, R., Beurdouche, B., Robert, R., Millican, J., | |||
Omara, E., and K. Cohn-Gordon, "The Messaging Layer | Omara, E., and K. Cohn-Gordon, "The Messaging Layer | |||
Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420, | Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420, | |||
July 2023, <https://www.rfc-editor.org/rfc/rfc9420>. | July 2023, <https://www.rfc-editor.org/info/rfc9420>. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/rfc/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated | [RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated | |||
Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008, | Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008, | |||
<https://www.rfc-editor.org/rfc/rfc5116>. | <https://www.rfc-editor.org/info/rfc5116>. | |||
[RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand | [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand | |||
Key Derivation Function (HKDF)", RFC 5869, | Key Derivation Function (HKDF)", RFC 5869, | |||
DOI 10.17487/RFC5869, May 2010, | DOI 10.17487/RFC5869, May 2010, | |||
<https://www.rfc-editor.org/rfc/rfc5869>. | <https://www.rfc-editor.org/info/rfc5869>. | |||
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | |||
Writing an IANA Considerations Section in RFCs", BCP 26, | Writing an IANA Considerations Section in RFCs", BCP 26, | |||
RFC 8126, DOI 10.17487/RFC8126, June 2017, | RFC 8126, DOI 10.17487/RFC8126, June 2017, | |||
<https://www.rfc-editor.org/rfc/rfc8126>. | <https://www.rfc-editor.org/info/rfc8126>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
10.2. Informative References | 10.2. Informative References | |||
[I-D.codec-agnostic-rtp-payload-format] | ||||
Murillo, S. G. and A. Gouaillard, "Codec agnostic RTP | ||||
payload format for video", Work in Progress, Internet- | ||||
Draft, draft-codec-agnostic-rtp-payload-format-00, 19 | ||||
February 2021, <https://datatracker.ietf.org/doc/html/ | ||||
draft-codec-agnostic-rtp-payload-format-00>. | ||||
[I-D.ietf-moq-transport] | ||||
Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and | ||||
I. Swett, "Media over QUIC Transport", Work in Progress, | ||||
Internet-Draft, draft-ietf-moq-transport-03, 4 March 2024, | ||||
<https://datatracker.ietf.org/doc/html/draft-ietf-moq- | ||||
transport-03>. | ||||
[I-D.ietf-webtrans-overview] | ||||
Vasiliev, V., "The WebTransport Protocol Framework", Work | ||||
in Progress, Internet-Draft, draft-ietf-webtrans-overview- | ||||
07, 4 March 2024, <https://datatracker.ietf.org/doc/html/ | ||||
draft-ietf-webtrans-overview-07>. | ||||
[MLS-ARCH] Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., and | [MLS-ARCH] Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., and | |||
A. Duric, "The Messaging Layer Security (MLS) | A. Duric, "The Messaging Layer Security (MLS) | |||
Architecture", Work in Progress, Internet-Draft, draft- | Architecture", Work in Progress, Internet-Draft, draft- | |||
ietf-mls-architecture-13, 22 March 2024, | ietf-mls-architecture-15, 3 August 2024, | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-mls- | <https://datatracker.ietf.org/doc/html/draft-ietf-mls- | |||
architecture-13>. | architecture-15>. | |||
[MOQ-TRANSPORT] | ||||
Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and | ||||
I. Swett, Ed., "Media over QUIC Transport", Work in | ||||
Progress, Internet-Draft, draft-ietf-moq-transport-05, 8 | ||||
July 2024, <https://datatracker.ietf.org/doc/html/draft- | ||||
ietf-moq-transport-05>. | ||||
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | |||
Norrman, "The Secure Real-time Transport Protocol (SRTP)", | Norrman, "The Secure Real-time Transport Protocol (SRTP)", | |||
RFC 3711, DOI 10.17487/RFC3711, March 2004, | RFC 3711, DOI 10.17487/RFC3711, March 2004, | |||
<https://www.rfc-editor.org/rfc/rfc3711>. | <https://www.rfc-editor.org/info/rfc3711>. | |||
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | |||
Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, | Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, | |||
September 2012, <https://www.rfc-editor.org/rfc/rfc6716>. | September 2012, <https://www.rfc-editor.org/info/rfc6716>. | |||
[RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | |||
B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | |||
for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | |||
DOI 10.17487/RFC7656, November 2015, | DOI 10.17487/RFC7656, November 2015, | |||
<https://www.rfc-editor.org/rfc/rfc7656>. | <https://www.rfc-editor.org/info/rfc7656>. | |||
[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | |||
DOI 10.17487/RFC7667, November 2015, | DOI 10.17487/RFC7667, November 2015, | |||
<https://www.rfc-editor.org/rfc/rfc7667>. | <https://www.rfc-editor.org/info/rfc7667>. | |||
[RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, | [RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, | |||
"Double Encryption Procedures for the Secure Real-Time | "Double Encryption Procedures for the Secure Real-Time | |||
Transport Protocol (SRTP)", RFC 8723, | Transport Protocol (SRTP)", RFC 8723, | |||
DOI 10.17487/RFC8723, April 2020, | DOI 10.17487/RFC8723, April 2020, | |||
<https://www.rfc-editor.org/rfc/rfc8723>. | <https://www.rfc-editor.org/info/rfc8723>. | |||
[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | |||
Session Description Protocol", RFC 8866, | Session Description Protocol", RFC 8866, | |||
DOI 10.17487/RFC8866, January 2021, | DOI 10.17487/RFC8866, January 2021, | |||
<https://www.rfc-editor.org/rfc/rfc8866>. | <https://www.rfc-editor.org/info/rfc8866>. | |||
[TestVectors] | [RTP-PAYLOAD] | |||
"SFrame Test Vectors", 2023, | Murillo, S. G., Fablet, Y., and A. Gouaillard, "Codec | |||
<https://github.com/eomara/sframe/blob/master/test- | agnostic RTP payload format for video", Work in Progress, | |||
vectors.json>. | Internet-Draft, draft-gouaillard-avtcore-codec-agn-rtp- | |||
payload-01, 9 March 2021, | ||||
<https://datatracker.ietf.org/doc/html/draft-gouaillard- | ||||
avtcore-codec-agn-rtp-payload-01>. | ||||
Appendix A. Acknowledgements | [TestVectors] | |||
"SFrame Test Vectors", commit 025d568, September 2023, | ||||
<https://github.com/sframe-wg/sframe/blob/025d568/test- | ||||
vectors/test-vectors.json>. | ||||
The authors wish to specially thank Dr. Alex Gouaillard as one of the | [WEBTRANSPORT] | |||
early contributors to the document. His passion and energy were key | Vasiliev, V., "The WebTransport Protocol Framework", Work | |||
to the design and development of SFrame. | in Progress, Internet-Draft, draft-ietf-webtrans-overview- | |||
07, 4 March 2024, <https://datatracker.ietf.org/doc/html/ | ||||
draft-ietf-webtrans-overview-07>. | ||||
Appendix B. Example API | Appendix A. Example API | |||
*This section is not normative.* | *This section is not normative.* | |||
This section describes a notional API that an SFrame implementation | This section describes a notional API that an SFrame implementation | |||
might expose. The core concept is an "SFrame context", within which | might expose. The core concept is an "SFrame context", within which | |||
KID values are meaningful. In the key management scheme described in | KID values are meaningful. In the key management scheme described in | |||
Section 5.1, each sender has a different context; in the scheme | Section 5.1, each sender has a different context; in the scheme | |||
described in Section 5.2, all senders share the same context. | described in Section 5.2, all senders share the same context. | |||
An SFrame context stores mappings from KID values to "key contexts", | An SFrame context stores mappings from KID values to "key contexts", | |||
which are different depending on whether the KID is to be used for | which are different depending on whether the KID is to be used for | |||
sending or receiving (an SFrame key should never be used for both | sending or receiving (an SFrame key should never be used for both | |||
operations). A key context tracks the key and salt associated to the | operations). A key context tracks the key and salt associated to the | |||
KID, and the current CTR value. A key context to be used for sending | KID, and the current CTR value. A key context to be used for sending | |||
also tracks the next CTR value to be used. | also tracks the next CTR value to be used. | |||
The primary operations on an SFrame context are as follows: | The primary operations on an SFrame context are as follows: | |||
* *Create an SFrame context:* The context is initialized with a | * *Create an SFrame context:* The context is initialized with a | |||
ciphersuite and no KID mappings. | cipher suite and no KID mappings. | |||
* *Adding a key for sending:* The key and salt are derived from the | * *Add a key for sending:* The key and salt are derived from the | |||
base key, and used to initialize a send context, together with a | base key and used to initialize a send context, together with a | |||
zero counter value. | zero CTR value. | |||
* *Adding a key for receiving:* The key and salt are derived from | * *Add a key for receiving:* The key and salt are derived from the | |||
the base key, and used to initialize a receive context. | base key and used to initialize a receive context. | |||
* *Encrypt a plaintext:* Encrypt a given plaintext using the key for | * *Encrypt a plaintext:* Encrypt a given plaintext using the key for | |||
a given KID, including the specified metadata. | a given KID, including the specified metadata. | |||
* *Decrypt an SFrame ciphertext:* Decrypt an SFrame ciphertext with | * *Decrypt an SFrame ciphertext:* Decrypt an SFrame ciphertext with | |||
the KID and CTR values specified in the SFrame Header, and the | the KID and CTR values specified in the SFrame header, and the | |||
provided metadata. | provided metadata. | |||
Figure 9 shows an example of the types of structures and methods that | Figure 10 shows an example of the types of structures and methods | |||
could be used to create an SFrame API in Rust. | that could be used to create an SFrame API in Rust. | |||
type KeyId = u64; | type KeyId = u64; | |||
type Counter = u64; | type Counter = u64; | |||
type CipherSuite = u16; | type CipherSuite = u16; | |||
struct SendKeyContext { | struct SendKeyContext { | |||
key: Vec<u8>, | key: Vec<u8>, | |||
salt: Vec<u8>, | salt: Vec<u8>, | |||
next_counter: Counter, | next_counter: Counter, | |||
} | } | |||
skipping to change at page 34, line 48 ¶ | skipping to change at line 1432 ¶ | |||
trait SFrameContextMethods { | trait SFrameContextMethods { | |||
fn create(cipher_suite: CipherSuite) -> Self; | fn create(cipher_suite: CipherSuite) -> Self; | |||
fn add_send_key(&self, kid: KeyId, base_key: &[u8]); | fn add_send_key(&self, kid: KeyId, base_key: &[u8]); | |||
fn add_recv_key(&self, kid: KeyId, base_key: &[u8]); | fn add_recv_key(&self, kid: KeyId, base_key: &[u8]); | |||
fn encrypt(&mut self, kid: KeyId, metadata: &[u8], | fn encrypt(&mut self, kid: KeyId, metadata: &[u8], | |||
plaintext: &[u8]) -> Vec<u8>; | plaintext: &[u8]) -> Vec<u8>; | |||
fn decrypt(&self, metadata: &[u8], ciphertext: &[u8]) -> Vec<u8>; | fn decrypt(&self, metadata: &[u8], ciphertext: &[u8]) -> Vec<u8>; | |||
} | } | |||
Figure 9: An Example SFrame API | Figure 10: An Example SFrame API | |||
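As a companion to the notional API, the sketch below shows one way the KID-to-key-context mapping could be kept so that a KID is bound to exactly one role and a send key starts from a zero CTR value. It is illustrative only: `KeyTable`, `KeyContext`, and `next_send_counter` are hypothetical names, and key derivation and the AEAD operations are deliberately left out, so `key` and `salt` are assumed to have been derived from the base key already.

```rust
use std::collections::HashMap;

type KeyId = u64;
type Counter = u64;

/// Role-specific state for one KID (key derivation is out of scope here).
#[allow(dead_code)]
enum KeyContext {
    Send { key: Vec<u8>, salt: Vec<u8>, next_counter: Counter },
    Recv { key: Vec<u8>, salt: Vec<u8> },
}

/// Hypothetical KID table held inside an SFrame context.
#[derive(Default)]
struct KeyTable {
    keys: HashMap<KeyId, KeyContext>,
}

impl KeyTable {
    /// Registers a sending key with a zero CTR value.
    fn add_send_key(&mut self, kid: KeyId, key: Vec<u8>, salt: Vec<u8>) -> Result<(), String> {
        self.insert(kid, KeyContext::Send { key, salt, next_counter: 0 })
    }

    /// Registers a receiving key.
    fn add_recv_key(&mut self, kid: KeyId, key: Vec<u8>, salt: Vec<u8>) -> Result<(), String> {
        self.insert(kid, KeyContext::Recv { key, salt })
    }

    /// Refuses to rebind a KID, so a key is never used for both roles.
    fn insert(&mut self, kid: KeyId, ctx: KeyContext) -> Result<(), String> {
        if self.keys.contains_key(&kid) {
            return Err(format!("KID {} is already in use", kid));
        }
        self.keys.insert(kid, ctx);
        Ok(())
    }

    /// Returns the CTR for the next encryption under `kid` and advances it.
    fn next_send_counter(&mut self, kid: KeyId) -> Result<Counter, String> {
        match self.keys.get_mut(&kid) {
            Some(KeyContext::Send { next_counter, .. }) => {
                let ctr = *next_counter;
                *next_counter = ctr
                    .checked_add(1)
                    .ok_or_else(|| "CTR space exhausted".to_string())?;
                Ok(ctr)
            }
            Some(KeyContext::Recv { .. }) => Err(format!("KID {} is a receive key", kid)),
            None => Err(format!("unknown KID {}", kid)),
        }
    }
}
```

Modeling the two roles as variants of one enum makes the "never both" rule a property of the data structure rather than something each caller has to remember.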
Appendix C. Overhead Analysis | Appendix B. Overhead Analysis | |||
Any use of SFrame will impose overhead in terms of the amount of | Any use of SFrame will impose overhead in terms of the amount of | |||
bandwidth necessary to transmit a given media stream. Exactly how | bandwidth necessary to transmit a given media stream. Exactly how | |||
much overhead will be added depends on several factors: | much overhead will be added depends on several factors: | |||
* How many senders are involved in a conference (length of KID) | * The number of senders involved in a conference (length of KID) | |||
* How long the conference has been going on (length of CTR) | * The duration of the conference (length of CTR) | |||
* The cipher suite in use (length of authentication tag) | * The cipher suite in use (length of authentication tag) | |||
* Whether SFrame is used to encrypt packets, whole frames, or some | * Whether SFrame is used to encrypt packets, whole frames, or some | |||
other unit | other unit | |||
Overall, the overhead rate in kilobits per second can be estimated | Overall, the overhead rate in kilobits per second can be estimated | |||
as: | as: | |||
OverheadKbps = (1 + |CTR| + |KID| + |TAG|) * 8 * CTPerSecond / 1024 | OverheadKbps = (1 + |CTR| + |KID| + |TAG|) * 8 * CTPerSecond / 1024 | |||
Here the constant value 1 reflects the fixed SFrame header; |CTR| | Here the constant value 1 reflects the fixed SFrame header; |CTR| | |||
and |KID| reflect the lengths of those fields; |TAG| reflects the | and |KID| reflect the lengths of those fields; |TAG| reflects the | |||
cipher overhead; and CTPerSecond reflects the number of SFrame | cipher overhead; and CTPerSecond reflects the number of SFrame | |||
ciphertexts sent per second (e.g., packets or frames per second). | ciphertexts sent per second (e.g., packets or frames per second). | |||
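The estimate above translates directly into a small helper. The function name and parameter names below are illustrative rather than taken from the document; lengths are in bytes.

```rust
/// Estimated SFrame overhead in kilobits per second.
///
/// `ctr_len`, `kid_len`, and `tag_len` are field lengths in bytes, and
/// `ct_per_second` is the number of SFrame ciphertexts sent per second.
/// The leading 1 accounts for the fixed one-byte SFrame header.
fn overhead_kbps(ctr_len: u32, kid_len: u32, tag_len: u32, ct_per_second: f64) -> f64 {
    let bytes_per_ciphertext = 1 + ctr_len + kid_len + tag_len;
    f64::from(bytes_per_ciphertext) * 8.0 * ct_per_second / 1024.0
}
```

For instance, `overhead_kbps(3, 2, 16, 50.0)` evaluates to 8.59375, which rounds to the 8.6 kbps shown for 20 ms audio packets in Table 4.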
In the remainder of this secton, we compute overhead estimates for a | In the remainder of this section, we compute overhead estimates for a | |||
collection of common scenarios. | collection of common scenarios. | |||
C.1. Assumptions | B.1. Assumptions | |||
In the below calculations, we make conservative assumptions about | In the below calculations, we make conservative assumptions about | |||
SFrame overhead, so that the overhead amounts we compute here are | SFrame overhead so that the overhead amounts we compute here are | |||
likely to be an upper bound on those seen in practice. | likely to be an upper bound of those seen in practice. | |||
+==============+=======+============================+ | +==============+=======+============================+ | |||
| Field | Bytes | Explanation | | | Field | Bytes | Explanation | | |||
+==============+=======+============================+ | +==============+=======+============================+ | |||
| Fixed header | 1 | Fixed | | | Config byte | 1 | Fixed | | |||
+--------------+-------+----------------------------+ | +--------------+-------+----------------------------+ | |||
| Key ID (KID) | 2 | >255 senders; or MLS epoch | | | Key ID (KID) | 2 | >255 senders; or MLS epoch | | |||
| | | (E=4) and >16 senders | | | | | (E=4) and >16 senders | | |||
+--------------+-------+----------------------------+ | +--------------+-------+----------------------------+ | |||
| Counter | 3 | More than 24 hours of | | | Counter | 3 | More than 24 hours of | | |||
| (CTR) | | media in common cases | | | (CTR) | | media in common cases | | |||
+--------------+-------+----------------------------+ | +--------------+-------+----------------------------+ | |||
| Cipher | 16 | Full GCM tag (longest | | | Cipher | 16 | Full authentication tag | | |||
| overhead | | defined here) | | | overhead | | (longest defined here) | | |||
+--------------+-------+----------------------------+ | +--------------+-------+----------------------------+ | |||
Table 3 | Table 3: Overhead Analysis Assumptions | |||
In total, then, we assume that each SFrame encryption will add 22 | In total, then, we assume that each SFrame encryption will add 22 | |||
bytes of overhead. | bytes of overhead. | |||
We consider two scenarios, applying SFrame per-frame and per-packet. | We consider two scenarios: applying SFrame per frame and per packet. | |||
In each scenario, we compute the SFrame overhead in absolute terms | In each scenario, we compute the SFrame overhead in absolute terms | |||
(Kbps) and as a percentage of the base bandwidth. | (kbps) and as a percentage of the base bandwidth. | |||
C.2. Audio | B.2. Audio | |||
In audio streams, there is typically a one-to-one relationship | In audio streams, there is typically a one-to-one relationship | |||
between frames and packets, so the overhead is the same whether one | between frames and packets, so the overhead is the same whether one | |||
uses SFrame at a per-packet or per-frame level. | uses SFrame at a per-packet or per-frame level. | |||
The below table considers three scenarios, based on recommended | Table 4 considers three scenarios that are based on recommended | |||
configurations of the Opus codec [RFC6716]: | configurations of the Opus codec [RFC6716] (where "fps" stands for | |||
"frames per second"): | ||||
* Narrow-band speech: 120ms packets, 8Kbps | ||||
* Full-band speech: 20ms packets, 32Kbps | ||||
* Full-band stereo music: 10ms packets, 128Kbps | +==============+==============+=====+======+==========+==========+ | |||
+===============+=====+===========+===============+============+ | | Scenario | Frame length | fps | Base | Overhead | Overhead | | |||
| Scenario | fps | Base Kbps | Overhead Kbps | Overhead % | | | | | | kbps | kbps | % | | |||
+===============+=====+===========+===============+============+ | +==============+==============+=====+======+==========+==========+ | |||
| NB speech, | 8.3 | 8 | 1.4 | 17.9% | | | Narrow-band | 120 ms | 8.3 | 8 | 1.4 | 17.9% | | |||
| 120ms packets | | | | | | | speech | | | | | | | |||
+---------------+-----+-----------+---------------+------------+ | +--------------+--------------+-----+------+----------+----------+ | |||
| FB speech, | 50 | 32 | 8.6 | 26.9% | | | Full-band | 20 ms | 50 | 32 | 8.6 | 26.9% | | |||
| 20ms packets | | | | | | | speech | | | | | | | |||
+---------------+-----+-----------+---------------+------------+ | +--------------+--------------+-----+------+----------+----------+ | |||
| FB stereo, | 100 | 128 | 17.2 | 13.4% | | | Full-band | 10 ms | 100 | 128 | 17.2 | 13.4% | | |||
| 10ms packets | | | | | | | stereo music | | | | | | | |||
+---------------+-----+-----------+---------------+------------+ | +--------------+--------------+-----+------+----------+----------+ | |||
Table 4: SFrame overhead for audio streams | Table 4: SFrame Overhead for Audio Streams | |||
C.3. Video | B.3. Video | |||
Video frames can be larger than an MTU and thus are commonly split | Video frames can be larger than an MTU and thus are commonly split | |||
across multiple packets. Table 5 and Table 6 show the estimated | across multiple packets. Tables 5 and 6 show the estimated overhead | |||
overhead of encrypting a video stream, where SFrame is applied per- | of encrypting a video stream, where SFrame is applied per frame and | |||
frame and per-packet, respectively. The choices of resolution, | per packet, respectively. The choices of resolution, frames per | |||
frames per second, and bandwidth are chosen to roughly reflect the | second, and bandwidth roughly reflect the capabilities of modern | |||
capabilities of modern video codecs across a range from very low to | video codecs across a range from very low to very high quality. | |||
very high quality. | ||||
+=============+=====+===========+===============+============+ | +=============+=====+===========+===============+============+ | |||
| Scenario | fps | Base Kbps | Overhead Kbps | Overhead % | | | Scenario | fps | Base kbps | Overhead kbps | Overhead % | | |||
+=============+=====+===========+===============+============+ | +=============+=====+===========+===============+============+ | |||
| 426 x 240 | 7.5 | 45 | 1.3 | 2.9% | | | 426 x 240 | 7.5 | 45 | 1.3 | 2.9% | | |||
+-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
| 640 x 360 | 15 | 200 | 2.6 | 1.3% | | | 640 x 360 | 15 | 200 | 2.6 | 1.3% | | |||
+-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
| 640 x 360 | 30 | 400 | 5.2 | 1.3% | | | 640 x 360 | 30 | 400 | 5.2 | 1.3% | | |||
+-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
| 1280 x 720 | 30 | 1500 | 5.2 | 0.3% | | | 1280 x 720 | 30 | 1500 | 5.2 | 0.3% | | |||
+-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
| 1920 x 1080 | 60 | 7200 | 10.3 | 0.1% | | | 1920 x 1080 | 60 | 7200 | 10.3 | 0.1% | | |||
+-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
Table 5: SFrame overhead for a video stream encrypted per- | Table 5: SFrame Overhead for a Video Stream Encrypted per | |||
frame | Frame | |||
+=============+=====+=====+===========+===============+============+ | +==========+=====+==============+======+==========+==========+ | |||
| Scenario | fps | pps | Base Kbps | Overhead Kbps | Overhead % | | | Scenario | fps | Packets per | Base | Overhead | Overhead | | |||
+=============+=====+=====+===========+===============+============+ | | | | Second (pps) | kbps | kbps | % | | |||
| 426 x 240 | 7.5 | 7.5 | 45 | 1.3 | 2.9% | | +==========+=====+==============+======+==========+==========+ | |||
+-------------+-----+-----+-----------+---------------+------------+ | | 426 x | 7.5 | 7.5 | 45 | 1.3 | 2.9% | | |||
| 640 x 360 | 15 | 30 | 200 | 5.2 | 2.6% | | | 240 | | | | | | | |||
+-------------+-----+-----+-----------+---------------+------------+ | +----------+-----+--------------+------+----------+----------+ | |||
| 640 x 360 | 30 | 60 | 400 | 10.3 | 2.6% | | | 640 x | 15 | 30 | 200 | 5.2 | 2.6% | | |||
+-------------+-----+-----+-----------+---------------+------------+ | | 360 | | | | | | | |||
| 1280 x 720 | 30 | 180 | 1500 | 30.9 | 2.1% | | +----------+-----+--------------+------+----------+----------+ | |||
+-------------+-----+-----+-----------+---------------+------------+ | | 640 x | 30 | 60 | 400 | 10.3 | 2.6% | | |||
| 1920 x 1080 | 60 | 780 | 7200 | 134.1 | 1.9% | | | 360 | | | | | | | |||
+-------------+-----+-----+-----------+---------------+------------+ | +----------+-----+--------------+------+----------+----------+ | |||
| 1280 x | 30 | 180 | 1500 | 30.9 | 2.1% | | ||||
| 720 | | | | | | | ||||
+----------+-----+--------------+------+----------+----------+ | ||||
| 1920 x | 60 | 780 | 7200 | 134.1 | 1.9% | | ||||
| 1080 | | | | | | | ||||
+----------+-----+--------------+------+----------+----------+ | ||||
Table 6: SFrame overhead for a video stream encrypted per-packet | Table 6: SFrame Overhead for a Video Stream Encrypted per | |||
Packet | ||||
In the per-frame case, the SFrame percentage overhead approaches zero | In the per-frame case, the SFrame percentage overhead approaches zero | |||
as the quality of the video goes up, since bandwidth is driven more | as the quality of the video improves since bandwidth is driven more | |||
by picture size than frame rate. In the per-packet case, the SFrame | by picture size than frame rate. In the per-packet case, the SFrame | |||
percentage overhead approaches the ratio between the SFrame overhead | percentage overhead approaches the ratio between the SFrame overhead | |||
per packet and the MTU (here 22 bytes of SFrame overhead divided by | per packet and the MTU (here 22 bytes of SFrame overhead divided by | |||
an assumed 1200-byte MTU, or about 1.8%). | an assumed 1200-byte MTU, or about 1.8%). | |||
C.4. Conferences | B.4. Conferences | |||
Real conferences usually involve several audio and video streams. | Real conferences usually involve several audio and video streams. | |||
The overhead of SFrame in such a conference is the aggregate of the | The overhead of SFrame in such a conference is the aggregate of the | |||
overhead over all the individual streams. Thus, while SFrame incurs | overhead across all the individual streams. Thus, while SFrame | |||
a large percentage overhead on an audio stream, if the conference | incurs a large percentage overhead on an audio stream, if the | |||
also involves a video stream, then the audio overhead is likely | conference also involves a video stream, then the audio overhead is | |||
negligible relative to the overall bandwidth of the conference. | likely negligible relative to the overall bandwidth of the | |||
conference. | ||||
For example, Table 7 shows the overhead estimates for a two person | For example, Table 7 shows the overhead estimates for a two-person | |||
conference where one person is sending low-quality media and the | conference where one person is sending low-quality media and the | |||
other sending high-quality. (And we assume that SFrame is applied | other is sending high-quality media. (And we assume that SFrame is | |||
per-frame.) The video streams dominate the bandwidth at the SFU, so | applied per frame.) The video streams dominate the bandwidth at the | |||
the total bandwidth overhead is only around 1%. | SFU, so the total bandwidth overhead is only around 1%. | |||
+=====================+===========+===============+============+ | +=====================+===========+===============+============+ | |||
| Stream | Base Kbps | Overhead Kbps | Overhead % | | | Stream | Base Kbps | Overhead Kbps | Overhead % | | |||
+=====================+===========+===============+============+ | +=====================+===========+===============+============+ | |||
| Participant 1 audio | 8 | 1.4 | 17.9% | | | Participant 1 audio | 8 | 1.4 | 17.9% | | |||
+---------------------+-----------+---------------+------------+ | +---------------------+-----------+---------------+------------+ | |||
| Participant 1 video | 45 | 1.3 | 2.9% | | | Participant 1 video | 45 | 1.3 | 2.9% | | |||
+---------------------+-----------+---------------+------------+ | +---------------------+-----------+---------------+------------+ | |||
| Participant 2 audio | 32 | 9 | 26.9% | | | Participant 2 audio | 32 | 9 | 26.9% | | |||
+---------------------+-----------+---------------+------------+ | +---------------------+-----------+---------------+------------+ | |||
| Participant 2 video | 1500 | 5 | 0.3% | | | Participant 2 video | 1500 | 5 | 0.3% | | |||
+---------------------+-----------+---------------+------------+ | +---------------------+-----------+---------------+------------+ | |||
| Total at SFU | 1585 | 16.5 | 1.0% | | | Total at SFU | 1585 | 16.5 | 1.0% | | |||
+---------------------+-----------+---------------+------------+ | +---------------------+-----------+---------------+------------+ | |||
Table 7: SFrame overhead for a two-person conference | Table 7: SFrame Overhead for a Two-Person Conference | |||
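The aggregation described above can be sketched by summing the rows of Table 7. The names STREAMS and aggregate_overhead are hypothetical. Note that summing the rounded per-stream entries gives roughly 16.7 Kbps, slightly above the 16.5 Kbps in the table's total row, presumably because the table was computed from unrounded values; either figure is about 1% of the 1585 Kbps base.

```python
# (stream, base Kbps, overhead Kbps) rows copied from Table 7.
STREAMS = [
    ("Participant 1 audio", 8, 1.4),
    ("Participant 1 video", 45, 1.3),
    ("Participant 2 audio", 32, 9),
    ("Participant 2 video", 1500, 5),
]


def aggregate_overhead(streams):
    """Return (total base Kbps, total overhead Kbps, overhead percentage)."""
    base = sum(b for _, b, _ in streams)
    overhead = sum(o for _, _, o in streams)
    return base, overhead, 100.0 * overhead / base


base, overhead, pct = aggregate_overhead(STREAMS)
print(f"base={base} Kbps, overhead={overhead:.1f} Kbps, overhead={pct:.2f}%")
```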
C.5. SFrame over RTP | B.5. SFrame over RTP | |||
SFrame is a generic encapsulation format, but many of the | SFrame is a generic encapsulation format, but many of the | |||
applications in which it is likely to be integrated are based on RTP. | applications in which it is likely to be integrated are based on RTP. | |||
This section discusses how an integration between SFrame and RTP | This section discusses how an integration between SFrame and RTP | |||
could be done, and some of the challenges that would need to be | could be done, and some of the challenges that would need to be | |||
overcome. | overcome. | |||
As discussed in Section 4.1, there are two natural patterns for | As discussed in Section 4.1, there are two natural patterns for | |||
integrating SFrame into an application: applying SFrame per-frame or | integrating SFrame into an application: applying SFrame per frame or | |||
per-packet. In RTP-based applications, applying SFrame per-packet | per packet. In RTP-based applications, applying SFrame per packet | |||
means that the payload of each RTP packet will be an SFrame | means that the payload of each RTP packet will be an SFrame | |||
ciphertext, starting with an SFrame Header, as shown in Figure 10. | ciphertext, starting with an SFrame header, as shown in Figure 11. | |||
Applying SFrame per-frame means that different RTP payloads will have | Applying SFrame per frame means that different RTP payloads will have | |||
different formats: The first payload of a frame will contain the | different formats: The first payload of a frame will contain the | |||
SFrame headers, and subsequent payloads will contain further chunks | SFrame headers, and subsequent payloads will contain further chunks | |||
of the ciphertext, as shown in Figure 11. | of the ciphertext, as shown in Figure 12. | |||
In order for these media payloads to be properly interpreted by | In order for these media payloads to be properly interpreted by | |||
receivers, receivers will need to be configured to know which of the | receivers, receivers will need to be configured to know which of the | |||
above schemes the sender has applied to a given sequence of RTP | above schemes the sender has applied to a given sequence of RTP | |||
packets. SFrame does not provide a mechanism for distributing this | packets. SFrame does not provide a mechanism for distributing this | |||
configuration information. In applications that use SDP for | configuration information. In applications that use SDP for | |||
negotiating RTP media streams [RFC8866], an appropriate extension to | negotiating RTP media streams [RFC8866], an appropriate extension to | |||
SDP could provide this function. | SDP could provide this function. | |||
Applying SFrame per-frame also requires that packetization and | Applying SFrame per frame also requires that packetization and | |||
depacketization be done in a generic manner that does not depend on | depacketization be done in a generic manner that does not depend on | |||
the media content of the packets, since the content being packetized | the media content of the packets, since the content being packetized | |||
/ depacketized will be opaque ciphertext (except for the SFrame | or depacketized will be opaque ciphertext (except for the SFrame | |||
header). In order for such a generic packetization scheme to work | header). In order for such a generic packetization scheme to work | |||
interoperably one would have to be defined, e.g., as proposed in | interoperably, one would have to be defined, e.g., as proposed in | |||
[I-D.codec-agnostic-rtp-payload-format]. | [RTP-PAYLOAD]. | |||
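As an illustration of what "generic" means here, the following sketch splits an opaque SFrame ciphertext into payloads by size alone, without inspecting its content. It is not the format proposed in the referenced payload-format draft; the names packetize and depacketize are hypothetical, and ordering and loss handling (which RTP sequence numbers would support) are assumed to be dealt with elsewhere.

```python
from typing import List


def packetize(sframe_ciphertext: bytes, max_payload: int) -> List[bytes]:
    """Split an opaque ciphertext into payloads of at most max_payload bytes.

    The first payload starts with the SFrame header because the header is
    the first part of the ciphertext; no media parsing is involved.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        sframe_ciphertext[i:i + max_payload]
        for i in range(0, len(sframe_ciphertext), max_payload)
    ]


def depacketize(payloads: List[bytes]) -> bytes:
    """Reassemble the ciphertext; assumes payloads are complete and in order."""
    return b"".join(payloads)


# Stand-in for an SFrame header plus ciphertext (3072 opaque bytes).
frame = bytes(range(256)) * 12
payloads = packetize(frame, 1200)
```

With a 1200-byte payload limit, the 3072-byte stand-in frame yields three payloads, and concatenating them restores the original ciphertext for SFrame decryption.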
+---+-+-+-------+-+-------------+-------------------------------+<-+ | +---+-+-+-------+-+-----------+------------------------------+<-+ | |||
|V=2|P|X| CC |M| PT | sequence number | | | |V=2|P|X| CC |M| PT | sequence number | | | |||
+---+-+-+-------+-+-------------+-------------------------------+ | | +---+-+-+-------+-+-----------+------------------------------+ | | |||
| timestamp | | | | timestamp | | | |||
+---------------------------------------------------------------+ | | +------------------------------------------------------------+ | | |||
| synchronization source (SSRC) identifier | | | | synchronization source (SSRC) identifier | | | |||
+===============================================================+ | | +============================================================+ | | |||
| contributing source (CSRC) identifiers | | | | contributing source (CSRC) identifiers | | | |||
| .... | | | | .... | | | |||
+---------------------------------------------------------------+ | | +------------------------------------------------------------+ | | |||
| RTP extension(s) (OPTIONAL) | | | | RTP extension(s) (OPTIONAL) | | | |||
+->+--------------------+------------------------------------------+ | | +->+-------------------+----------------------------------------+ | | |||
| | SFrame header | | | | | | SFrame header | | | | |||
| +--------------------+ | | | | +-------------------+ | | | |||
| | | | | | | | | | |||
| | SFrame encrypted and authenticated payload | | | | | SFrame encrypted and authenticated payload | | | |||
| | | | | | | | | | |||
+->+---------------------------------------------------------------+<-+ | +->+------------------------------------------------------------+<-+ | |||
| | SRTP authentication tag | | | | | SRTP authentication tag | | | |||
| +---------------------------------------------------------------+ | | | +------------------------------------------------------------+ | | |||
| | | | | | |||
+--- SRTP Encrypted Portion SRTP Authenticated Portion ---+ | +--- SRTP Encrypted Portion SRTP Authenticated Portion ---+ | |||
Figure 10: SRTP packet with SFrame-protected payload | Figure 11: SRTP Packet with SFrame-Protected Payload | |||
+----------------+ +---------------+ | +----------------+ +---------------+ | |||
| frame metadata | | | | | frame metadata | | | | |||
+-------+--------+ | | | +-------+--------+ | | | |||
| | frame | | | | frame | | |||
| | | | | | | | |||
| | | | | | | | |||
| +-------+-------+ | | +-------+-------+ | |||
| | | | | | |||
| | | | | | |||
skipping to change at page 41, line 43 | skipping to change at line 1703 | |||
| | | | | | | | | | |||
V V V V | V V V V | |||
+---------------+ +---------------+ +---------------+ | +---------------+ +---------------+ +---------------+ | |||
| SFrame header | | | | | | | SFrame header | | | | | | |||
+---------------+ | | | | | +---------------+ | | | | | |||
| | | payload 2/N | ... | payload N/N | | | | | payload 2/N | ... | payload N/N | | |||
| payload 1/N | | | | | | | payload 1/N | | | | | | |||
| | | | | | | | | | | | | | |||
+---------------+ +---------------+ +---------------+ | +---------------+ +---------------+ +---------------+ | |||
Figure 11: Encryption flow with per-frame encryption for RTP | Figure 12: Encryption Flow with per-Frame Encryption for RTP | |||
Appendix D. Test Vectors | Appendix C. Test Vectors | |||
This section provides a set of test vectors that implementations can | This section provides a set of test vectors that implementations can | |||
use to verify that they correctly implement SFrame encryption and | use to verify that they correctly implement SFrame encryption and | |||
decryption. In addition to test vectors for the overall process of | decryption. In addition to test vectors for the overall process of | |||
SFrame encryption/decryption, we also provide test vectors for header | SFrame encryption/decryption, we also provide test vectors for header | |||
encoding/decoding, and for AEAD encryption/decryption using the AES- | encoding/decoding, and for AEAD encryption/decryption using the AES- | |||
CTR construction defined in Section 4.5.1. | CTR construction defined in Section 4.5.1. | |||
All values are either numeric or byte strings. Numeric values are | All values are either numeric or byte strings. Numeric values are | |||
represented as hex values, prefixed with 0x. Byte strings are | represented as hex values, prefixed with 0x. Byte strings are | |||
skipping to change at page 42, line 18 | skipping to change at line 1727 | |||
Line breaks and whitespace within values are inserted to conform to | Line breaks and whitespace within values are inserted to conform to | |||
the width requirements of the RFC format. They should be removed | the width requirements of the RFC format. They should be removed | |||
before use. | before use. | |||
These test vectors are also available in JSON format at | These test vectors are also available in JSON format at | |||
[TestVectors]. In the JSON test vectors, numeric values are JSON | [TestVectors]. In the JSON test vectors, numeric values are JSON | |||
numbers and byte string values are JSON strings containing the hex | numbers and byte string values are JSON strings containing the hex | |||
encoding of the byte strings. | encoding of the byte strings. | |||
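A minimal sketch of loading values in the textual form used in this appendix follows. The helper names parse_hex and parse_numeric are hypothetical; the wrapped sample is one of the header values from the header encoding/decoding vectors.

```python
def parse_hex(value: str) -> bytes:
    """Drop all whitespace (including line breaks) and decode hex into bytes."""
    return bytes.fromhex("".join(value.split()))


def parse_numeric(value: str) -> int:
    """Parse a numeric value written as hex with a 0x prefix."""
    return int(value, 16)


# A byte string wrapped across two lines, as it appears in the text.
wrapped = """ffffffffffffffffff01000000000000
             00"""
header = parse_hex(wrapped)
ctr = parse_numeric("0x0100000000000000")
```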
D.1. Header encoding/decoding | C.1. Header Encoding/Decoding | |||
For each case, we provide: | For each case, we provide: | |||
* kid: A KID value | * kid: A KID value | |||
* ctr: A CTR value | * ctr: A CTR value | |||
* header: An encoded SFrame header | * header: An encoded SFrame header | |||
An implementation should verify that: | An implementation should verify that: | |||
skipping to change at page 66, line 46 | skipping to change at line 2905 | |||
kid: 0xffffffffffffffff | kid: 0xffffffffffffffff | |||
ctr: 0x0100000000000000 | ctr: 0x0100000000000000 | |||
header: ffffffffffffffffff01000000000000 | header: ffffffffffffffffff01000000000000 | |||
00 | 00 | |||
kid: 0xffffffffffffffff | kid: 0xffffffffffffffff | |||
ctr: 0xffffffffffffffff | ctr: 0xffffffffffffffff | |||
header: ffffffffffffffffffffffffffffffff | header: ffffffffffffffffffffffffffffffff | |||
ff | ff | |||
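The two vectors above can be checked with a short encoder. This is a sketch, not a reference implementation: it assumes the config-byte layout from the SFrame header section of the document (an X flag and 3-bit K field for the KID, then a Y flag and 3-bit C field for the CTR, with values below 8 carried inline and larger values carried as minimal big-endian byte strings whose length minus one goes in the 3-bit field). The function names are hypothetical.

```python
def encode_field(value: int):
    """Return (flag, 3-bit field, extension bytes) for a KID or CTR value."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")
    if value < 8:
        # Small values are carried directly in the 3-bit field.
        return 0, value, b""
    length = (value.bit_length() + 7) // 8
    return 1, length - 1, value.to_bytes(length, "big")


def encode_header(kid: int, ctr: int) -> bytes:
    """Encode an SFrame header: config byte, then KID and CTR extensions."""
    x, k, kid_bytes = encode_field(kid)
    y, c, ctr_bytes = encode_field(ctr)
    config = (x << 7) | (k << 4) | (y << 3) | c
    return bytes([config]) + kid_bytes + ctr_bytes


h1 = encode_header(0xFFFFFFFFFFFFFFFF, 0x0100000000000000)
h2 = encode_header(0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF)
print(h1.hex())
print(h2.hex())
```

Both values need the full 8 bytes for the KID and the CTR, so the config byte is 0xff in each case and the encoded header is 17 bytes long.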
D.2. AEAD encryption/decryption using AES-CTR and HMAC | C.2. AEAD Encryption/Decryption Using AES-CTR and HMAC | |||
For each case, we provide: | For each case, we provide: | |||
* cipher_suite: The index of the cipher suite in use (see | * cipher_suite: The index of the cipher suite in use (see | |||
Section 8.1) | Section 8.1) | |||
* key: The key input to encryption/decryption | * key: The key input to encryption/decryption | |||
* enc_key: The encryption subkey produced by the derive_subkeys() | * enc_key: The encryption subkey produced by the derive_subkeys() | |||
algorithm | algorithm | |||
skipping to change at page 68, line 33 | skipping to change at line 2980 | |||
enc_key: 000102030405060708090a0b0c0d0e0f | enc_key: 000102030405060708090a0b0c0d0e0f | |||
auth_key: 101112131415161718191a1b1c1d1e1f | auth_key: 101112131415161718191a1b1c1d1e1f | |||
202122232425262728292a2b2c2d2e2f | 202122232425262728292a2b2c2d2e2f | |||
nonce: 101112131415161718191a1b | nonce: 101112131415161718191a1b | |||
aad: 4945544620534672616d65205747 | aad: 4945544620534672616d65205747 | |||
pt: 64726166742d696574662d736672616d | pt: 64726166742d696574662d736672616d | |||
652d656e63 | 652d656e63 | |||
ct: 6339af04ada1d064688a442b8dc69d5b | ct: 6339af04ada1d064688a442b8dc69d5b | |||
6bfa40f4be09480509 | 6bfa40f4be09480509 | |||
D.3. SFrame encryption/decryption | C.3. SFrame Encryption/Decryption | |||
For each case, we provide: | For each case, we provide: | |||
* cipher_suite: The index of the cipher suite in use (see | * cipher_suite: The index of the cipher suite in use (see | |||
Section 8.1) | Section 8.1) | |||
* kid: A KID value | * kid: A KID value | |||
* ctr: A CTR value | * ctr: A CTR value | |||
skipping to change at page 73, line 5 | skipping to change at line 3148 | |||
metadata: 4945544620534672616d65205747 | metadata: 4945544620534672616d65205747 | |||
nonce: 84991c167b8cd23c9370cba0 | nonce: 84991c167b8cd23c9370cba0 | |||
aad: 99012345674945544620534672616d65 | aad: 99012345674945544620534672616d65 | |||
205747 | 205747 | |||
pt: 64726166742d696574662d736672616d | pt: 64726166742d696574662d736672616d | |||
652d656e63 | 652d656e63 | |||
ct: 990123456794f509d36e9beacb0e261d | ct: 990123456794f509d36e9beacb0e261d | |||
99c7d1e972f1fed787d4049f17ca2135 | 99c7d1e972f1fed787d4049f17ca2135 | |||
3c1cc24d56ceabced279 | 3c1cc24d56ceabced279 | |||
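The structure of the vector above can be sanity-checked without any cryptographic library. The sketch below assumes the same config-byte layout as the header format (X flag, 3-bit K, Y flag, 3-bit C) and assumes that the encrypted body has the same length as the plaintext, so the remaining bytes are the authentication tag. The kid and ctr values for this case are elided from the diff, so the sketch recovers them from the header bytes instead of assuming them. The function name decode_header is hypothetical.

```python
def decode_header(data: bytes):
    """Return (kid, ctr, header_length) parsed from the start of data."""
    config = data[0]
    x, k = config >> 7, (config >> 4) & 0x07
    y, c = (config >> 3) & 0x01, config & 0x07
    pos = 1
    if x:
        # K holds the KID length in bytes, minus one.
        kid = int.from_bytes(data[pos:pos + k + 1], "big")
        pos += k + 1
    else:
        kid = k
    if y:
        # C holds the CTR length in bytes, minus one.
        ctr = int.from_bytes(data[pos:pos + c + 1], "big")
        pos += c + 1
    else:
        ctr = c
    return kid, ctr, pos


metadata = bytes.fromhex("4945544620534672616d65205747")
aad = bytes.fromhex("99012345674945544620534672616d65" "205747")
pt = bytes.fromhex("64726166742d696574662d736672616d" "652d656e63")
ct = bytes.fromhex(
    "990123456794f509d36e9beacb0e261d"
    "99c7d1e972f1fed787d4049f17ca2135"
    "3c1cc24d56ceabced279"
)

kid, ctr, header_len = decode_header(ct)
header = ct[:header_len]
tag_len = len(ct) - header_len - len(pt)
print(hex(kid), hex(ctr), header_len, tag_len)
```

For this vector, the header is 5 bytes, the AAD equals the header followed by the metadata, and the trailing 16 bytes are consistent with a full-length authentication tag.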
Acknowledgements | ||||
The authors wish to specially thank Dr. Alex Gouaillard as one of the | ||||
early contributors to the document. His passion and energy were key | ||||
to the design and development of SFrame. | ||||
Contributors | Contributors | |||
Frederic Jacobs | Frédéric Jacobs | |||
Apple | Apple | |||
Email: frederic.jacobs@apple.com | Email: frederic.jacobs@apple.com | |||
Marta Mularczyk | Marta Mularczyk | |||
Amazon | Amazon | |||
Email: mulmarta@amazon.com | Email: mulmarta@amazon.com | |||
Suhas Nandakumar | Suhas Nandakumar | |||
Cisco | Cisco | |||
Email: snandaku@cisco.com | Email: snandaku@cisco.com | |||
skipping to change at page 73, line 34 | skipping to change at line 3183 | |||
Phoenix R&D | Phoenix R&D | |||
Email: ietf@raphaelrobert.com | Email: ietf@raphaelrobert.com | |||
Authors' Addresses | Authors' Addresses | |||
Emad Omara | Emad Omara | |||
Apple | Apple | |||
Email: eomara@apple.com | Email: eomara@apple.com | |||
Justin Uberti | Justin Uberti | |||
Fixie.ai | ||||
Email: juberti@google.com | Email: justin@fixie.ai | |||
Sergio Garcia Murillo | Sergio Garcia Murillo | |||
CoSMo Software | CoSMo Software | |||
Email: sergio.garcia.murillo@cosmosoftware.io | Email: sergio.garcia.murillo@cosmosoftware.io | |||
Richard L. Barnes (editor) | Richard Barnes (editor) | |||
Cisco | Cisco | |||
Email: rlb@ipv.sx | Email: rlb@ipv.sx | |||
Youenn Fablet | Youenn Fablet | |||
Apple | Apple | |||
Email: youenn@apple.com | Email: youenn@apple.com | |||
End of changes. 198 change blocks. | ||||
515 lines changed or deleted | 511 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |