Internet Engineering Task Force (IETF) M. Thornburgh Request for Comments: 7016 Adobe Category: InformationalSeptemberNovember 2013 ISSN: 2070-1721 Adobe's Secure Real-Time Media Flow Protocol Abstract This memo describes Adobe's Secure Real-Time Media Flow Protocol (RTMFP), an endpoint-to-endpoint communication protocol designed to securely transport parallel flows of real-time video, audio, and data messages, as well as bulk data, over IP networks. RTMFP has features that make it effective for peer-to-peer (P2P) as well as client- server communications, even when Network Address Translators (NATs) are used. Status of This Memo This document is not an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Engineering Task Force (IETF). Itrepresents the consensus of the IETF community. It has received public review andhas been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7016. IESG Note This document represents technology developed outside the processes of the IETF and the IETF community has determined that it is useful to publish it as an RFC in its current form. It is a product of the IETF only in that it has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG), but the content of the document does not represent a consensus of the IETF. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. This document may not be modified, and derivative works of it may not be created, except to format it for publication as an RFC or to translate it into languages other than English. Table of Contents 1. Introduction ....................................................5 1.1. Design Highlights of RTMFP .................................6 1.2. Terminology ................................................7 2. Syntax ..........................................................8 2.1. Common Elements ............................................8 2.1.1. Elementary Types and Constructs .....................8 2.1.2. Variable Length Unsigned Integer (VLU) .............10 2.1.3. Option .............................................10 2.1.4. Option List ........................................11 2.1.5. Internet Socket Address (Address) ..................12 2.2. Network Layer .............................................13 2.2.1. Encapsulation ......................................13 2.2.2. Multiplex ..........................................13 2.2.3. Encryption .........................................14 2.2.4. Packet .............................................15 2.3. Chunks ....................................................18 2.3.1. Packet Fragment Chunk ..............................20 2.3.2. Initiator Hello Chunk (IHello) .....................21 2.3.3. Forwarded Initiator Hello Chunk (FIHello) ..........22 2.3.4. Responder Hello Chunk (RHello) .....................23 2.3.5. Responder Redirect Chunk (Redirect) ................24 2.3.6. RHello Cookie Change Chunk .........................26 2.3.7. Initiator Initial Keying Chunk (IIKeying) ..........27 2.3.8. Responder Initial Keying Chunk (RIKeying) ..........29 2.3.9. Ping Chunk .........................................31 2.3.10. Ping Reply Chunk ..................................32 2.3.11. User Data Chunk ...................................33 2.3.11.1. Options for User Data ....................35 2.3.11.1.1. User's Per-Flow Metadata ......35 2.3.11.1.2. Return Flow Association .......36 2.3.12. Next User Data Chunk ..............................37 2.3.13. Data Acknowledgement Bitmap Chunk (Bitmap Ack) ....39 2.3.14. Data Acknowledgement Ranges Chunk (Range Ack) .....41 2.3.15. Buffer Probe Chunk ................................43 2.3.16. Flow Exception Report Chunk .......................43 2.3.17. Session Close Request Chunk (Close) ...............44 2.3.18. Session Close Acknowledgement Chunk (Close Ack) ...44 3. Operation ......................................................45 3.1. Overview ..................................................45 3.2. Endpoint Identity .........................................46 3.3. Packet Multiplex ..........................................48 3.4. Packet Fragmentation ......................................48 3.5. Sessions ..................................................50 3.5.1. Startup ............................................53 3.5.1.1. Normal Handshake ..........................53 3.5.1.1.1. Initiator ......................54 3.5.1.1.2. Responder ......................55 3.5.1.2. Cookie Change .............................57 3.5.1.3. Glare .....................................59 3.5.1.4. Redirector ................................60 3.5.1.5. Forwarder .................................61 3.5.1.6. Redirector and Forwarder with NAT .........63 3.5.1.7. Load Distribution and Fault Tolerance .....66 3.5.2. Congestion Control .................................67 3.5.2.1. Time Critical Reverse Notification ........68 3.5.2.2. Retransmission Timeout ....................68 3.5.2.3. Burst Avoidance ...........................71 3.5.3. Address Mobility ...................................71 3.5.4. Ping ...............................................72 3.5.4.1. Keepalive .................................72 3.5.4.2. Address Mobility ..........................73 3.5.4.3. Path MTU Discovery ........................74 3.5.5. Close ..............................................74 3.6. Flows .....................................................75 3.6.1. Overview ...........................................75 3.6.1.1. Identity ..................................75 3.6.1.2. Messages and Sequencing ...................76 3.6.1.3. Lifetime ..................................77 3.6.2. Sender .............................................78 3.6.2.1. Startup ...................................80 3.6.2.2. Queuing Data ..............................80 3.6.2.3. Sending Data ..............................81 3.6.2.3.1. Startup Options ................83 3.6.2.3.2. Send Next Data .................83 3.6.2.4. Processing Acknowledgements ...............83 3.6.2.5. Negative Acknowledgement and Loss .........84 3.6.2.6. Timeout ...................................85 3.6.2.7. Abandoning Data ...........................86 3.6.2.7.1. Forward Sequence Number Update .........................86 3.6.2.8. Examples ..................................87 3.6.2.9. Flow Control ..............................89 3.6.2.9.1. Buffer Probe ...................89 3.6.2.10. Exception ................................89 3.6.2.11. Close ....................................90 3.6.3. Receiver ...........................................90 3.6.3.1. Startup ...................................93 3.6.3.2. Receiving Data ............................94 3.6.3.3. Buffering and Delivering Data .............95 3.6.3.4. Acknowledging Data ........................97 3.6.3.4.1. Timing .........................98 3.6.3.4.2. Size and Truncation ............99 3.6.3.4.3. Constructing ...................99 3.6.3.4.4. Delayed Acknowledgement .......100 3.6.3.4.5. Obligatory Acknowledgement ....100 3.6.3.4.6. Opportunistic Acknowledgement ...............100 3.6.3.4.7. Example .......................101 3.6.3.5. Flow Control .............................102 3.6.3.6. Receiving a Buffer Probe .................103 3.6.3.7. Rejecting a Flow .........................103 3.6.3.8. Close ....................................104 4. IANA Considerations ...........................................104 5. Security Considerations .......................................105 6. Acknowledgements ..............................................106 7. References ....................................................107 7.1. Normative References .....................................107 7.2. Informative References ...................................107 Appendix A. Example Congestion Control Algorithm .................108 A.1. Discussion ................................................108 A.2. Algorithm .................................................110 1. Introduction Adobe's Secure Real-Time Media Flow Protocol (RTMFP) is intended for use as a general purpose endpoint-to-endpoint data transport service in IP networks. It has features that make it well suited to the transport of real-time media (such as low-delay video, audio, and data) as well as bulk data, and for client-server as well as peer-to- peer (P2P) communication. These features include independent parallel message flows that may have different delivery priorities, variable message reliability (from TCP-like full reliability to UDP-like best effort), multi-point congestion control, and built-in security. Session multiplexing and facilities to support UDP hole-punching simplify Network Address Translator (NAT) traversal in peer-to-peer systems. RTMFP is implemented in Flash Player, Adobe Integrated Runtime (AIR), and Adobe Media Server (AMS, formerly Flash Media Server or FMS), all from Adobe Systems Incorporated, and is used as the foundation transport protocol for real-time video, audio, and data communication, both client-server and P2P, in those products. At the time of writing, the Adobe Flash Player runtime is installed on more than one billion end-user desktop computers. RTMFP was developed by Adobe Systems Incorporated and is not the product of an IETF activity. This memo describes the syntax and operation of the Secure Real-Time Media Flow Protocol. This memo describes a general security framework that, when combined with an application-specific Cryptography Profile, can be used to establish a confidential and authenticated session between endpoints. The application-specific Cryptography Profile, not defined herein, would detail the specific cryptographic algorithms, data formats, and semantics to be used within this framework. Interoperation between applications of RTMFP requires common or compatible Cryptography Profiles. Note to implementers: at the time of writing, the Cryptography Profile used by the above-mentioned Adobe products is not publicly described by Adobe. Implementers should investigate the availability of documentation of that Cryptography Profile prior to implementing RTMFP for the purpose of interoperation with the above-mentioned Adobe products. 1.1. Design Highlights of RTMFP Between any pair of communicating endpoints is a single, bidirectional, secured, congestion controlled session. Unidirectional flows convey messages from one end to the other within the session. An endpoint can have concurrent sessions with multiple other far endpoints. Design highlights of RTMFP include the following: o The security framework is an inherent part of the basic protocol. The application designer chooses the cryptographic formats and algorithms to suit the needs of the application, and may update them as the state of the security arts progresses. o Cryptographic Endpoint Discriminators can resist port scanning. o All header, control, and framing information, except for network addressing information and a session identifier, is encrypted according to the Cryptography Profile. o There is a single session and associated congestion control state between a pair of endpoints. o Each session may have zero or more unidirectional message-oriented flows in each direction. All of a session's sending flows share the session's congestion control state. o Return Flow Association (Section 2.3.11.1.2) generalizes bidirectional communication to arbitrarily complex trees of flows. o Messages in flows can be arbitrarily large and are fragmented for transmission. o Messages of any size may be sent with full, partial, or no reliability (sender's choice). Messages may be delivered to the receiving user in original queuing order or network arrival order (receiver's choice). o Flows are named with arbitrary, user-defined metadata (Section 2.3.11.1.1) rather than port or stream numbers. o The sequence numbers of each flow are independent of all other flows and are not permanently bound to a session-wide transmission ordering. This allows real-time priority decisions to be made at transmission or retransmission time. o Each flow has its own receive window and, therefore, independent flow control. o Round trips are expensive and are minimized or eliminated when possible. o After a session is established, flows begin by sending the flow's messages with no additional handshake (and associated round trips). o Transmitting bytes on the network is much more expensive than moving bytes in a CPU or memory. Wasted bytes are minimized or eliminated when possible and practical, and variable length encodings are used, even at the expense of breaking 32-bit alignment and making the text diagrams in this specification look awkward. o P2P lookup and peer introduction (including UDP hole-punching for NAT and firewall traversal) are supported directly by the session startup handshake. o Session identifiers allow an endpoint to multiplex many sessions over a single local transport address while allowing sessions to survive changes in transport address (as may happen in mobile or wireless deployments). The syntax of the protocol is detailed in Section 2. The operation of the protocol is detailed in Section 3. 1.2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 2. Syntax Definitions of types and structures in this specification use traditional text diagrams paired with procedural descriptions using a C-like syntax. The C-like procedural descriptions SHALL be construed as definitive. Structures are packed to take only as many bytes as explicitly indicated. There is no 32-bit alignment constraint, and fields are not padded for alignment unless explicitly indicated or described. Text diagrams may include a bit ruler across the top; this is a convenience for counting bits in individual fields and does not necessarily imply field alignment on a multiple of the ruler width. Unless specified otherwise, reserved fields SHOULD be set to 0 by a sender and MUST be ignored by a receiver. The procedural syntax of this specification defines correct and error-free encoded inputs to a parser. The procedural syntax does not describe a fully featured parser, including error detection and handling. Implementations MUST include means to identify error circumstances, including truncations causing elementary or composed types to not fit inside containing structures, fields, or elements. Unless specified otherwise, an error circumstance SHALL abort the parsing and processing of an element and its enclosing elements, up to the containing packet. 2.1. Common Elements This section lists types and structures that are used throughout this specification. 2.1.1. Elementary Types and Constructs This section lists the elementary types and constructs out of which all of the following sections' definitions are built. uint8_t var; An unsigned integer 8 bits (one byte) in length and byte aligned. uint16_t var; An unsigned integer 16 bits in length, in network byte order ("big endian") and byte aligned. uint32_t var; An unsigned integer 32 bits in length, in network byte order and byte aligned. uint128_t var; An unsigned integer 128 bits in length, in network byte order and byte aligned. uintn_t var :bitsize; An unsigned integer of any other size, potentially not byte aligned. Its size in bits is specified explicitly by bitsize. bool_t var :1; A boolean flag having the value true (1 or set) or false (0 or clear) and being one bit in length. type var[num]; A packed array of type with length num*sizeof(type)*8 bits. struct name_t { ... } name :bitsize; A packed structure. Its size in bits is specified by bitsize. remainder(); The number of bytes from the current offset to the end of the enclosing structure. type var[remainder()]; A packed array of type, its size extending to the end of the enclosing structure. Note that a bitsize of "variable" indicates that the size of the structure is determined by the sizes of its interior components. A bitsize of "n*8" indicates that the size of the structure is a whole number of bytes and is byte aligned. 2.1.2. Variable Length Unsigned Integer (VLU) A VLU encodes any finite non-negative integer into one or more bytes. For each encoded byte, if the high bit is set, the next byte is also part of the VLU. If the high bit is clear, this is the final byte of the VLU. The remaining bits encode the number, seven bits at a time, from most significant to least significant. 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 +~+~+~+~+~+~+~+~+ +-+-+-+-+-+-+-+-+ |1| digit |...............|0| digit | +~+~+~+~+~+~+~+~+ +-+-+-+-+-+-+-+-+ ^ ^ +--------- zero or more --------+ struct vlu_t { value = 0; do { bool_t more :1; uintn_t digit :7; value = (value * 128) + digit; } while(more); } :variable*8; +-------------/-+ | \ | +-------------/-+ Figure 1: VLU Depiction in Following Diagrams Unless stated otherwise in this specification, implementations SHOULD handle VLUs encoding unsigned integers at least 64 bits in length (that is, encoding a maximum value of at least 2^64 - 1). 2.1.3. Option An Option is a Length-Type-Value triplet. Length and Type are encoded in VLU format. Length is the number of bytes of payload following the Length field. The payload comprises the Type and Value fields. Type identifies the kind of option this is. The syntax of the Value field is determined by the type of option. An Option can have a length of zero, in which case it has no type and no value and is empty. An empty Option is called a "Marker". +-------------/-+~~~~~~~~~~~~~/~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | length \ | type \ | value | +-------------/-+~~~~~~~~~~~~~/~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ ^ ^ +-------- length bytes long (may be 0) ---------+ struct option_t { vlu_t length :variable*8; // "L" if(length > 0) { struct { vlu_t type :variable*8; // "T" uint8_t value[remainder()]; // "V" } payload :length*8; } } :variable*8; +---/---/-------+ | L \ T \ V | +---/---/-------+ Figure 2: Option Depiction in Following Diagrams 2.1.4. Option List An Option List is a sequence of zero or more non-empty Options terminated by a Marker. +~~~/~~~/~~~~~~~+ +~~~/~~~/~~~~~~~+-------------/-+ | L \ T \ V |...............| L \ T \ V | 0 \ | +~~~/~~~/~~~~~~~+ +~~~/~~~/~~~~~~~+-------------/-+ ^ ^ Marker +------- zero or more non-empty Options --------+ (empty Option) struct optionList_t { do { option_t option :variable*8; } while(option.length > 0); } :variable*8; 2.1.5. Internet Socket Address (Address) When communicating an Internet socket address (a combination of a 32-bit IPv4 [RFC0791] or 128-bit IPv6 [RFC2460] address and a 16-bit port number) to another RTMFP, this encoding is used. This encoding additionally allows an address to be tagged with an origin type, which an RTMFP MAY use to modify the use or disposition of the address. 1 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7|8 9 0 1 2 3 4 5 +-+-+-+-+-+-+-+-+-----/.../-----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |I| | O | Internet | | |P|0 0 0 0 0| R | address | port | |6| rsv | I |32 or 128 bits | | +-+-+-+-+-+-+-+-+-----/.../-----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ struct address_t { bool_t inet6 :1; // "IP6" uintn_t reserved :5 = 0; // "rsv" uintn_t origin :2; // "ORI" if(inet6) uint128_t ipAddress; else uint32_t ipAddress; uint16_t port; } :variable*8; inet6: If set, the Internet address is a 128-bit IPv6 address. If clear, the Internet address is a 32-bit IPv4 address. origin: The origin tag of this address. Possible values are: 0: Unknown, unspecified, or "other" 1: Address was reported by the origin as a local, directly attached interface address 2: Address was observed to be the source address from which a packet was received (a "reflexive transport address" in the terminology of [RFC5389]) 3: Address is a relay, proxy, or introducer (a Redirector and/or Forwarder) ipAddress: The Internet address, in network byte order. port: The 16-bit port number, in network byte order. 2.2. Network Layer 2.2.1. Encapsulation RTMFP Multiplex packets are usually carried in UDP [RFC0768] datagrams so that they may transit commonly deployed NATs and firewalls, and so that RTMFP may be implemented on commonly deployed operating systems without special privileges or permissions. RTMFP Multiplex packets MAY be carried by any suitable datagram transport or encapsulation where endpoints are addressed by an Internet socket address (that is, an IPv4 or IPv6 address and a 16-bit port number). The choice of port numbers is not mandated by this specification. Higher protocol layers or the application define the port numbers used. 2.2.2. Multiplex 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Scrambled Session ID (SSID) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | e first32[0] | |- - - - - - n - - - - - - - - - - - - - - - - - - - - - - - -| | c first32[1] | +- - - - - - r - - - - - - - - - - - - - - - - - - - - - - - -+ | y | | pted packet | +---------------------------------------------------------------/ struct multiplex_t { uint32_t scrambledSessionID; // "SSID" union { uint32_t first32[2]; // see note uint8_t encryptedPacket[remainder()]; } :(encapsulation.length - 4)*8; // if encryptedPacket is less than 8 bytes long, treat it // as if it were end-padded with 0s for the following: sessionID = scrambledSessionID XOR first32[0] XOR first32[1]; } :encapsulation.length*8; The 32-bit Scrambled Session ID is the 32-bit session ID modified by performing a bitwise exclusive-or with the bitwise exclusive-or of the first two 32-bit words of the encrypted packet. The session ID is a 32-bit value that the receiver has requested to be used by the sender when sending packets to this receiver (Sections 2.3.7 and 2.3.8). The session ID identifies the session to which this packet belongs and the decryption key to be used to decrypt the encrypted packet. Note: Session ID 0 (prior to scrambling) denotes the startup pseudo- session and implies the Default Session Key. Note: If the encrypted packet is less than 8 bytes long, then for the scrambling operation, perform the exclusive-or as though the encrypted packet were end-padded with enough 0-bytes to bring its length to 8. 2.2.3. Encryption RTMFP packets are encrypted according to a Cryptography Profile. This specification doesn't define a Cryptography Profile or mandate a particular choice of cryptography. The application defines the cryptographic syntax and algorithms. Packet encryption is RECOMMENDED to be a block cipher operating in Cipher Block Chaining [CBC] or similar mode. Encrypted packets MUST be decipherable without inter-packet dependency, since packets may be lost, duplicated, or reordered in the network. The packet encryption layer is responsible for data integrity and authenticity of packets, for example by means of a checksum or cryptographic message authentication code. To mitigate replay attacks, data integrity SHOULD comprise duplicate packet detection, for example by means of a session-wide packet sequence number. The packet encryption layer SHALL discard a received packet that does not pass integrity or authenticity tests. Note that the structures described below are of plain, unencrypted packets. Encrypted packets MUST be decrypted according to the Session Key associated with the Multiplex Session ID before being interpreted according to this specification. The Cryptography Profile defines a well-known Default Session Key that is used at session startup, during which per-session key(s) are negotiated by the two endpoints. A session ID of zero denotes use of the Default Session Key. The Default Session Key is also used with non-zero session IDs during the latter phases of session startup (Sections 2.3.6 and 2.3.8). See Security Considerations (Section 5) for more about the Default Session Key. 2.2.4. Packet An (unencrypted, plain) RTMFP packet consists of a variable sized common header, zero or more chunks, and padding. Padding can be inserted by the encryption layer of the sender to meet cipher block size constraints and is ignored by the receiver. A sender's encryption layer MAY pad the end of a packet with bytes with value 0xff such that the resulting packet is a natural and appropriate size for the cipher. Alternatively, the Cryptography Profile MAY define its own framing and padding scheme, if needed, such that decrypted packets are compatible with the syntax defined in this section. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+ |T|T| r |T|T| M | |C|C| s |S|S| O | | |R| v | |E| D | +-+-+-+-+-+-+-+-+ +~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+ | if(TS) timestamp | if(TSE) timestampEcho | +~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | Chunk | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ : : +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | Chunk | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | padding | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct packet_t { bool_t timeCritical :1; // "TC" bool_t timeCriticalReverse :1; // "TCR" uintn_t reserved :2; // "rsv" bool_t timestampPresent :1; // "TS" bool_t timestampEchoPresent :1; // "TSE" uintn_t mode :2; // "MOD" if(0 != mode) { if(timestampPresent) uint16_t timestamp; if(timestampEchoPresent) uint16_t timestampEcho; while(remainder() > 2) { uint8_t chunkType; uint16_t chunkLength; if(remainder() < chunkLength) break; uint8_t chunkPayload[chunkLength]; } // chunks uint8_t padding[remainder()]; } } :plainPacket.length*8; timeCritical: Time Critical Forward Notification. If set, indicates that this packet contains real-time user data. timeCriticalReverse: Time Critical Reverse Notification. If set, indicates that the sender is currently receiving packets on other sessions that have the timeCritical flag set. timestampPresent: If set, indicates that the timestamp field is present. If clear, there is no timestamp field. timestampEchoPresent: If set, indicates that the timestamp echo field is present. If clear, there is no timestamp echo field. mode: The mode of this packet. See below for additional discussion of packet modes. Possible values are: 0: Forbidden value 1: Initiator Mark 2: Responder Mark 3: Startup timestamp: If the timestampPresent flag is set, this field is present and contains the low 16 bits of the sender's 250 Hz clock (4 milliseconds per tick) at transmit time. The sender's clock MAY have its origin at any time in the past. timestampEcho: If the timestampEchoPresent flag is set, this field is present and contains the sender's estimate of what the timestamp field of a packet received from the other end would be at the time this packet was transmitted, using the method described in Section 3.5.2.2. chunks: Zero or more chunks follow the header. It is RECOMMENDED that a packet contain at least one chunk. padding: Zero or more bytes of padding follow the chunks. The following conditions indicate padding: * Fewer than three bytes (the size of a chunk header) remain in the packet. * The chunkLength field of what would be the current chunk header indicates that the hypothetical chunk payload wouldn't fit in the remaining bytes of the packet. Packet mode 0 is not allowed. Packets marked with this mode are invalid and MUST be discarded. The original initiator of a session MUST mark all non-startup packets it sends in that session with packet mode 1 ("Initiator Mark"). It SHOULD ignore any packet received in that session with packet mode 1. The original responder of a session MUST mark all non-startup packets it sends in that session with packet mode 2 ("Responder Mark"). It SHOULD ignore any packet received in that session with packet mode 2. Packet mode 3 is for session startup. Session startup chunks are only allowed in packets with this mode. Chunks that are not for session startup are only allowed in packets with modes 1 or 2. 2.3. Chunks 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | chunkType | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | chunkPayload (chunkLength bytes, may be zero) | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct chunk_t { uint8_t chunkType; uint16_t chunkLength; uint8_t chunkPayload[chunkLength]; } :variable*8; chunkType: The chunk type code. chunkLength: The size, in bytes, of the chunk payload. chunkPayload: The type-specific payload of this chunk, chunkLength bytes in length (may be empty). Defined chunk types are enumerated here in the order they might be encountered in the course of a typical session. The following chunk type codes are defined: 0x7f: Packet Fragment (Section 2.3.1) 0x30: Initiator Hello (Section 2.3.2) 0x0f: Forwarded Initiator Hello (Section 2.3.3) 0x70: Responder Hello (Section 2.3.4) 0x71: Responder Redirect (Section 2.3.5) 0x79: RHello Cookie Change (Section 2.3.6) 0x38: Initiator Initial Keying (Section 2.3.7) 0x78: Responder Initial Keying (Section 2.3.8) 0x01: Ping (Section 2.3.9) 0x41: Ping Reply (Section 2.3.10) 0x10: User Data (Section 2.3.11) 0x11: Next User Data (Section 2.3.12) 0x50: Data Acknowledgement Bitmap (Section 2.3.13) 0x51: Data Acknowledgement Ranges (Section 2.3.14) 0x18: Buffer Probe (Section 2.3.15) 0x5e: Flow Exception Report (Section 2.3.16) 0x0c: Session Close Request (Section 2.3.17) 0x4c: Session Close Acknowledgement (Section 2.3.18) 0x00: Ignore/Padding 0xff: Ignore/Padding A receiver MUST ignore a chunk having an unrecognized chunk type code. A receiver MUST ignore a chunk appearing in a packet having a mode inappropriate to that chunk type. Unless specified otherwise, if a chunk has a syntax or processing error (for example, the chunk's payload field is not long enough to contain the specified syntax elements), the chunk SHALL be ignored as though it was not present in the packet, and parsing and processing SHALL commence with the next chunk in the packet, if any. 2.3.1. Packet Fragment Chunk This chunk is used to divide a plain RTMFP packet (Section 2.2.4) that is unavoidably larger than the path MTU (such as session startup packets containing Responder Hello (Section 2.3.4) or Initiator Initial Keying (Section 2.3.7) chunks with large certificates) into segments that do not exceed the path MTU, and to allow the segments to be sent through the network at a moderated rate to avoid jamming interfaces, links, or paths. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x7f | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-------------/-+-------------/-+ |M| reserved | packetID \ | fragmentNum \ | +-+-+-+-+-+-+-+-+-------------/-+-------------/-+ +---------------------------------------------------------------+ | packetFragment | +---------------------------------------------------------------/ struct fragmentChunkPayload_t { bool_t moreFragments :1; // M uintn_t reserved :7; vlu_t packetID :variable*8; vlu_t fragmentNum :variable*8; uint8_t packetFragment[remainder()]; } :chunkLength*8; moreFragments: If set, the indicated packet comprises additional fragments. If clear, this fragment is the final fragment of the packet. reserved: Reserved for future use. packetID: VLU, the identifier of this segmented packet. All fragments of the same packet have the same packetID. fragmentNum: VLU, the index of this fragment of the indicated packet. The first fragment of the packet MUST be index 0. Fragments are numbered consecutively. packetFragment: The bytes of the indicated segment of the indicated original plain RTMFP packet. A packetFragment MUST NOT be empty. The use of this mechanism is detailed in Section 3.4. 2.3.2. Initiator Hello Chunk (IHello) This chunk is sent by the initiator of a new session to begin the startup handshake. This chunk is only allowed in a packet with Session ID 0, encrypted with the Default Session Key, and having packet mode 3 (Startup). 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x30 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-----------------------------------------------+ | epdLength \ | endpointDiscriminator (epdLength bytes) | +-------------/-+-----------------------------------------------/ +---------------------------------------------------------------+ | tag | +---------------------------------------------------------------/ struct ihelloChunkPayload_t { vlu_t epdLength :variable*8; uint8_t endpointDiscriminator[epdLength]; uint8_t tag[remainder()]; } :chunkLength*8; epdLength: VLU, the length of the following endpointDiscriminator field in bytes. endpointDiscriminator: The Endpoint Discriminator for the identity with which the initiator wants to communicate. tag: Initiator-provided data to be returned in a Responder Hello's tagEcho field. The tag/tagEcho is used to match Responder Hellos to the initiator's session startup state independent of the responder's address. The use of IHello is detailed in Section 3.5.1. 2.3.3. Forwarded Initiator Hello Chunk (FIHello) This chunk is sent on behalf of an initiator by a Forwarder. It is only allowed in packets of an established session having packet mode 1 or 2. A receiver MAY treat this chunk as though it was an Initiator Hello received directly from replyAddress. Alternatively, if the receiver is selected by the Endpoint Discriminator, it MAY respond to replyAddress with an Implied Redirect (Section 2.3.5). 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x0f | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-----------------------------------------------+ | epdLength \ | endpointDiscriminator (epdLength bytes) | +-------------/-+-----------------------------------------------/ +---------------------------------------------------------------+ | replyAddress | +---------------------------------------------------------------/ +---------------------------------------------------------------+ | tag | +---------------------------------------------------------------/ struct fihelloChunkPayload_t { vlu_t epdLength :variable*8; uint8_t endpointDiscriminator[epdLength]; address_t replyAddress :variable*8; uint8_t tag[remainder()]; } :chunkLength*8; epdLength: VLU, the length of the following endpointDiscriminator field in bytes. endpointDiscriminator: The Endpoint Discriminator for the identity with which the original initiator wants to communicate, copied from the original Initiator Hello. replyAddress: Address format (Section 2.1.5), the address that the forwarding node derived from the received Initiator Hello, to which the receiver should respond. tag: Copied from the original Initiator Hello. The use of FIHello is detailed in Section 3.5.1.5. 2.3.4. Responder Hello Chunk (RHello) This chunk is sent by a responder in response to an Initiator Hello or Forwarded Initiator Hello if the Endpoint Discriminator indicates the responder's identity. This chunk is only allowed in a packet with Session ID 0, encrypted with the Default Session Key, and having packet mode 3 (Startup). 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x70 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-----------------------------------------------+ | tagLength \ | tagEcho (tagLength bytes) | +-------------/-+-----------------------------------------------/ +-------------/-+-----------------------------------------------+ | cookieLength\ | cookie (cookieLength bytes) | +-------------/-+-----------------------------------------------/ +---------------------------------------------------------------+ | responderCertificate | +---------------------------------------------------------------/ struct rhelloChunkPayload_t { vlu_t tagLength :variable*8; uint8_t tagEcho[tagLength]; vlu_t cookieLength :variable*8; uint8_t cookie[cookieLength]; uint8_t responderCertificate[remainder()]; } :chunkLength*8; tagLength: VLU, the length of the following tagEcho field in bytes. tagEcho: The tag from the Initiator Hello, unaltered. cookieLength: VLU, the length of the following cookie field in bytes. cookie: Responder-created state data to authenticate a future Initiator Initial Keying message (in order to prevent denial-of- service attacks). responderCertificate: The responder's cryptographic credentials. Note: This specification doesn't mandate a specific choice of certificate format. The Cryptography Profile determines the syntax, algorithms, and interpretation of the responderCertificate. The use of RHello is detailed in Section 3.5.1. 2.3.5. Responder Redirect Chunk (Redirect) This chunk is sent in response to an Initiator Hello or Forwarded Initiator Hello to indicate that the requested endpoint can be reached at one or more of the indicated addresses. A receiver can add none, some, or all of the indicated addresses to the set of addresses to which it is sending Initiator Hello messages for the opening session associated with tagEcho. This chunk is only allowed in a packet with Session ID 0, encrypted with the Default Session Key, and having packet mode 3 (Startup). 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x71 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-----------------------------------------------+ | tagLength \ | tagEcho (tagLength bytes) | +-------------/-+-----------------------------------------------/ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | redirectDestination 1 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ : : +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | redirectDestination N | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct responderRedirectChunkPayload_t { vlu_t tagLength :variable*8; uint8_t tagEcho[tagLength]; addressCount = 0; while(remainder() > 0) { address_t redirectDestination :variable*8; addressCount++; } if(0 == addressCount) redirectDestination = packetSourceAddress(); } :chunkLength*8; tagLength: VLU, the length of the following tagEcho field in bytes. tagEcho: The tag from the Initiator Hello, unaltered. redirectDestination: (Zero or more) Address format (Section 2.1.5) addresses to add to the opening set for the indicated session. If this chunk lists zero redirectDestination addresses, then this is an Implied Redirect, and the indicated address is the address from which the packet containing this chunk was received. The use of Redirect is detailed in Sections 3.5.1.1.1, 3.5.1.1.2, and 3.5.1.4. 2.3.6. RHello Cookie Change Chunk This chunk SHOULD be sent by a responder to an initiator in response to an Initiator Initial Keying if that chunk's cookie appears to have been created by the responder but the cookie is incorrect (for example, it includes a hash of the initiator's address, but the initiator's address is different than the one that elicited the Responder Hello containing the original cookie). This chunk is only allowed in a packet encrypted with the Default Session Key and having packet mode 3, and with the session ID indicated in the initiatorSessionID field of the Initiator Initial Keying to which this is a response. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x79 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-----------------------------------------------+ | oldCookieLen\ | oldCookie (oldCookieLen bytes) | +-------------/-+-----------------------------------------------/ +---------------------------------------------------------------+ | newCookie | +---------------------------------------------------------------/ struct rhelloCookieChangeChunkPayload_t { vlu_t oldCookieLen :variable*8; uint8_t oldCookie[oldCookieLen]; uint8_t newCookie[remainder()]; } :chunkLength*8; oldCookieLen: VLU, the length of the following oldCookie field in bytes. oldCookie: The cookie that was sent in a previous Responder Hello and Initiator Initial Keying. newCookie: The new cookie that the responder would like sent (and signed) in a replacement Initiator Initial Keying. The old and new cookies need not have the same lengths. On receipt of this chunk, the initiator SHOULD compute, sign, and send a new Initiator Initial Keying having newCookie in place of oldCookie. The use of this chunk is detailed in Section 3.5.1.2. 2.3.7. Initiator Initial Keying Chunk (IIKeying) This chunk is sent by an initiator to establish a session with a responder. The initiator MUST have obtained a valid cookie to use with the responder, typically by receiving a Responder Hello from it. This chunk is only allowed in a packet with Session ID 0, encrypted with the Default Session Key, and having packet mode 3 (Startup). 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x38 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | initiatorSessionID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-----------------------------------------------+ | cookieLength\ | cookieEcho | +-------------/-+-----------------------------------------------/ +-------------/-+-----------------------------------------------+ | certLength \ | initiatorCertificate | +-------------/-+-----------------------------------------------/ +-------------/-+-----------------------------------------------+ | skicLength \ | sessionKeyInitiatorComponent | +-------------/-+-----------------------------------------------/ +---------------------------------------------------------------+ | signature | +---------------------------------------------------------------/ struct iikeyingChunkPayload_t { struct { uint32_t initiatorSessionID; vlu_t cookieLength :variable*8; uint8_t cookieEcho[cookieLength]; vlu_t certLength :variable*8; uint8_t initiatorCertificate[certLength]; vlu_t skicLength :variable*8; uint8_t sessionKeyInitiatorComponent[skicLength]; } initiatorSignedParameters :variable*8; uint8_t signature[remainder()]; } :chunkLength*8; initiatorSessionID: The session ID to be used by the responder when sending packets to the initiator. cookieLength: VLU, the length of the following cookieEcho field in bytes. cookieEcho: The cookie from the Responder Hello, unaltered. certLength: VLU, the length of the following initiatorCertificate field in bytes. initiatorCertificate: The initiator's identity credentials. skicLength: VLU, the length of the following sessionKeyInitiatorComponent field in bytes. sessionKeyInitiatorComponent: The initiator's portion of the session key negotiation according to the Cryptography Profile. initiatorSignedParameters: The payload portion of this chunk up to the signature field. signature: The initiator's digital signature of the initiatorSignedParameters according to the Cryptography Profile. Note: This specification doesn't mandate a specific choice of cryptography. The Cryptography Profile determines the syntax, algorithms, and interpretation of the initiatorCertificate, responderCertificate, sessionKeyInitiatorComponent, sessionKeyResponderComponent, and signature, and how the sessionKeyInitiatorComponent and sessionKeyResponderComponent are combined to derive the session keys. The use of IIKeying is detailed in Section 3.5.1. 2.3.8. Responder Initial Keying Chunk (RIKeying) This chunk is sent by a responder in response to an Initiator Initial Keying as the final phase of session startup. This chunk is only allowed in a packet encrypted with the Default Session Key, having packet mode 3 (Startup), and sent to the initiator with the session ID specified by the initiatorSessionID field from the Initiator Initial Keying. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x78 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | responderSessionID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-----------------------------------------------+ | skrcLength \ | sessionKeyResponderComponent | +-------------/-+-----------------------------------------------/ +---------------------------------------------------------------+ | signature | +---------------------------------------------------------------/ struct rikeyingChunkPayload_t { struct { uint32_t responderSessionID; vlu_t skrcLength :variable*8; uint8_t sessionKeyResponderComponent[skrcLength]; } responderSignedParametersPortion :variable*8; uint8_t signature[remainder()]; } :chunkLength*8; struct { responderSignedParametersPortion; sessionKeyInitiatorComponent; } responderSignedParameters; responderSessionID: The session ID to be used by the initiator when sending packets to the responder. skrcLength: VLU, the length of the following sessionKeyResponderComponent field in bytes. sessionKeyResponderComponent: The responder's portion of the session key negotiation according to the Cryptography Profile. responderSignedParametersPortion: The payload portion of this chunk up to the signature field. signature: The responder's digital signature of the responderSignedParameters (see below) according to the Cryptography Profile. responderSignedParameters: The concatenation of the responderSignedParametersPortion (the payload portion of this chunk up to the signature field) and the sessionKeyInitiatorComponent from the Initiator Initial Keying to which this chunk is a response. Note: This specification doesn't mandate a specific choice of cryptography. The Cryptography Profile determines the syntax, algorithms, and interpretation of the initiatorCertificate, responderCertificate, sessionKeyInitiatorComponent, sessionKeyResponderComponent, and signature, and how the sessionKeyInitiatorComponent and sessionKeyResponderComponent are combined to derive the session keys. Once the responder has computed the sessionKeyResponderComponent, it has all of the information and state necessary for an established session with the initiator. Once the responder has sent this chunk to the initiator, the session is established and ready to carry flows of user data. Once the initiator receives, verifies, and processes this chunk, it has all of the information and state necessary for an established session with the responder. The session is established and ready to carry flows of user data. The use of RIKeying is detailed in Section 3.5.1. 2.3.9. Ping Chunk This chunk is sent in order to elicit a Ping Reply from the receiver. It is only allowed in a packet belonging to an established session and having packet mode 1 or 2. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x01 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | message | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct pingChunkPayload_t { uint8_t message[chunkLength]; } :chunkLength*8; message: The (potentially empty) message that is expected to be returned by the other end of the session in a Ping Reply. The receiver of this chunk SHOULD reply as immediately as is practical with a Ping Reply. Ping and the expected Ping Reply are typically used for session keepalive, endpoint address change verification, and path MTU discovery. See Section 3.5.4 for details. 2.3.10. Ping Reply Chunk This chunk is sent in response to a Ping chunk. It is only allowed in a packet belonging to an established session and having packet mode 1 or 2. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x41 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | messageEcho | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct pingReplyChunkPayload_t { uint8_t messageEcho[chunkLength]; } :chunkLength*8; messageEcho: The message from the Ping to which this is a response, unaltered. 2.3.11. User Data Chunk This chunk is the basic unit of transmission for the user messages of a flow. A user message comprises one or more fragments. Each fragment is carried in its own chunk and has a unique sequence number in its flow. It is only allowed in a packet belonging to an established session and having packet mode 1 or 2. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x10 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ |O|r| F | r |A|F| |P|s| R | s |B|I| |T|v| A | v |N|N| +-+-+-+-+-+-+-+-+ +-------------/-+-------------/-+-------------/-+ | flowID \ | seq# \ | fsnOffset \ | +-------------/-+-------------/-+-------------/-+ +~~~/~~~/~~~~~~~+ +~~~/~~~/~~~~~~~+-------------/-+ | L \ T \ V |... options ...| L \ T \ V | 0 \ | \~~~/~~~/~~~~~~~+ [if(OPT)] +~~~/~~~/~~~~~~~+-------------/-/ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | userData | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct userDataChunkPayload_t { bool_t optionsPresent :1; // "OPT" uintn_t reserved1 :1; // "rsv" uintn_t fragmentControl :2; // "FRA" // 0=whole, 1=begin, 2=end, 3=middle uintn_t reserved2 :2; // "rsv" bool_t abandon :1; // "ABN" bool_t final :1; // "FIN" vlu_t flowID :variable*8; vlu_t sequenceNumber :variable*8; // "seq#" vlu_t fsnOffset :variable*8; forwardSequenceNumber = sequenceNumber - fsnOffset; if(optionsPresent) optionList_t options :variable*8; uint8_t userData[remainder()]; } :chunkLength*8; optionsPresent: If set, indicates the presence of an option list before the user data. If clear, there is no option list in this chunk. fragmentControl: Indicates how this fragment is assembled, potentially with others, into a complete user message. Possible values: 0: This fragment is a complete message. 1: This fragment is the first of a multi-fragment message. 2: This fragment is the last of a multi-fragment message. 3: This fragment is in the middle of a multi-fragment message. A single-fragment user message has a fragment control of "0-whole". When a message has more than one fragment, the first fragment has a fragment control of "1-begin", then zero or more "3-middle" fragments, and finally a "2-end" fragment. The sequence numbers of a multi-fragment message MUST be contiguous. abandon: If set, this sequence number has been abandoned by the sender. The userData, if any, MUST be ignored. final: If set, this is the last sequence number of the flow. flowID: VLU, the flow identifier. sequenceNumber: VLU, the sequence number of this fragment. Fragments are assigned contiguous increasing sequence numbers in a flow. The first sequence number of a flow SHOULD be 1. The first sequence number of a flow MUST be greater than zero. Sequence numbers are unbounded and do not wrap. fsnOffset: VLU, the difference between the sequence number and the Forward Sequence Number. This field MUST NOT be zero if the abandon flag is not set. This field MUST NOT be greater than sequenceNumber. forwardSequenceNumber: The flow sender will not send (or resend) any fragment with a sequence number less than or equal to the Forward Sequence Number. options: If the optionsPresent flag is set, a list of zero or more Options terminated by a Marker is present. See Section 2.3.11.1 for defined options. userData: The actual user data for this fragment. The use of User Data is detailed in Section 3.6.2. 2.3.11.1. Options for User Data This section lists options that may appear in User Data option lists. A conforming implementation MUST support the options in this section. A flow receiver MUST reject a flow containing a flow option that is not understood if the option type is less than 8192 (0x2000). A flow receiver MUST ignore any flow option that is not understood if the option type is 8192 or greater. The following option type codes are defined for User Data: 0x00: User's Per-Flow Metadata (Section 2.3.11.1.1) 0x0a: Return Flow Association (Section 2.3.11.1.2) 2.3.11.1.1. User's Per-Flow Metadata This option conveys the user's per-flow metadata for the flow to which it's attached. +-------------/-+-------------/-+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | length \ | 0x00 \ | userMetadata | +-------------/-+-------------/-+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct userMetadataOptionValue_t { uint8_t userMetadata[remainder()]; } :remainder()*8; The user associates application-defined metadata with each flow. The metadata does not change over the life of the flow. Every flow MUST have metadata. A flow sender MUST send this option with the first User Data chunk for this flow in each packet until an acknowledgement for this flow is received. A flow sender SHOULD NOT send this option more than once for each flow in any one packet. A flow sender SHOULD NOT send this option for a flow once the flow has been acknowledged. This specification doesn't mandate the encoding, syntax, or interpretation of the user's per-flow metadata; this is determined by the application. The userMetadata SHOULD NOT exceed 512 bytes. The userMetadata MAY be 0 bytes in length. 2.3.11.1.2. Return Flow Association A new flow can be considered to be in return (or response) to a flow sent by the other endpoint. This option encodes the receive flow identifier to which this new sending flow is a response. +-------------/-+-------------/-+-------------/-+ | length \ | 0x0a \ | flowID \ | +-------------/-+-------------/-+-------------/-+ struct returnFlowAssociationOptionValue_t { vlu_t flowID :variable*8; } :variable*8; Consider endpoints A and B. Endpoint A begins a flow with identifier 5 to endpoint B. A is the flow sender for A's flowID=5, and B is the flow receiver for A's flowID=5. B begins a return flow with identifier 7 to A in response to A's flowID=5. B is the flow sender for B's flowID=7, and A is the flow receiver for B's flowID=7. B sends this option with flowID set to 5 to indicate that B's flowID=7 is in response to and associated with A's flowID=5. If there is a return association, the flow sender MUST send this option with the first User Data chunk for this flow in each packet until an acknowledgement for this flow is received. A flow sender SHOULD NOT send this option more than once for each flow in any one packet. A flow sender SHOULD NOT send this option for a flow once the flow has been acknowledged. A flow MUST NOT indicate more than one return association. A flow MUST indicate its return association, if any, upon its first transmission of a User Data chunk. A return association can't be added to a sending flow after it begins. A flow receiver MUST reject a new receiving flow having a return flow association that does not indicate an F_OPEN sending flow. 2.3.12. Next User Data Chunk This chunk is equivalent to the User Data chunk for purposes of sending the user messages of a flow. When used, it MUST follow a User Data chunk or another Next User Data chunk in the same packet. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x11 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ |O|r| F | r |A|F| |P|s| R | s |B|I| |T|v| A | v |N|N| +-+-+-+-+-+-+-+-+ +~~~/~~~/~~~~~~~+ +~~~/~~~/~~~~~~~+-------------/-+ | L \ T \ V |... options ...| L \ T \ V | 0 \ | \~~~/~~~/~~~~~~~+ [if(OPT)] +~~~/~~~/~~~~~~~+-------------/-/ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ | userData | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~/ struct nextUserDataChunkPayload_t { bool_t optionsPresent :1; // "OPT" uintn_t reserved1 :1; // "rsv" uintn_t fragmentControl :2; // "FRA" // 0=whole, 1=begin, 2=end, 3=middle uintn_t reserved2 :2; // "rsv" bool_t abandon :1; // "ABN" bool_t final :1; // "FIN" if(optionsPresent) optionList_t options :variable*8; uint8_t userData[remainder()]; } :chunkLength*8; This chunk is considered to be for the same flowID as the most recently preceding User Data or Next User Data chunk in the same packet, having the same Forward Sequence Number, and having the next sequence number. The optionsPresent, fragmentControl, abandon, and final flags, and the options (if present), have the same interpretation as for the User Data chunk. ... ----------+------------------------------------ 10 00 07 | User Data chunk, length=7 00 | OPT=0, FRA=0 "whole", ABN=0, FIN=0 02 05 03 | flowID=2, seq#=5, fsn=(5-3)=2 00 01 02 | data 3 bytes: 00, 01, 02 ----------+------------------------------------ 11 00 04 | Next User Data chunk,length=4 00 | OPT=0, FRA=0 "whole", ABN=0, FIN=0 | flowID=2, seq#=6, fsn=2 03 04 05 | data 3 bytes: 03, 04, 05 ----------+------------------------------------ 11 00 04 | Next User Data chunk, length=4 00 | OPT=0, FRA=0 "whole", ABN=0, FIN=0 | flowID=2, seq#=7, fsn=2 06 07 08 | data 3 bytes: 06, 07, 08 ----------+------------------------------------ Figure 3: Sequential Messages in One Packet Using Next User Data The use of Next User Data is detailed in Section 3.6.2.3.2. 2.3.13. Data Acknowledgement Bitmap Chunk (Bitmap Ack) This chunk is sent by the flow receiver to indicate to the flow sender the User Data fragment sequence numbers that have been received for one flow. It is only allowed in a packet belonging to an established session and having packet mode 1 or 2. The flow receiver can choose to acknowledge User Data with this chunk or with a Range Ack. It SHOULD choose whichever format has the most compact encoding of the sequence numbers received. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x50 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-------------/-+-------------/-+ | flowID \ | bufAvail \ | cumAck \ | +-------------/-+-------------/-+-------------/-+ +~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+ |C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C|C| |+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+|+| |9|8|7|6|5|4|3|2|1|1|1|1|1|1|1|1|2|2|2|2|2|2|1|1| .... | | | | | | | | |7|6|5|4|3|2|1|0|5|4|3|2|1|0|9|8| +~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+~+ struct dataAckBitmapChunkPayload_t { vlu_t flowID :variable*8; vlu_t bufferBlocksAvailable :variable*8; // "bufAvail" vlu_t cumulativeAck :variable*8; // "cumAck" bufferBytesAvailable = bufferBlocksAvailable * 1024; acknowledge(0 through cumulativeAck); ackCursor = cumulativeAck + 1; while(remainder() > 0) { for(bitPosition = 8; bitPosition > 0; bitPosition--) { bool_t bit :1; if(bit) acknowledge(ackCursor + bitPosition); } ackCursor += 8; } } :chunkLength*8; flowID: VLU, the flow identifier. bufferBlocksAvailable: VLU, the number of 1024-byte blocks of User Data that the receiver is currently able to accept. Section 3.6.3.5 describes how to calculate this value. cumulativeAck: VLU, the acknowledgement of every fragment sequence number in this flow that is less than or equal to this value. This MUST NOT be less than the highest Forward Sequence Number received in this flow. bit field: A sequence of zero or more bytes representing a bit field of received fragment sequence numbers after the cumulative acknowledgement, least significant bit first. A set bit indicates receipt of a sequence number. A clear bit indicates that sequence number was not received. The least significant bit of the first byte is the second sequence number following the cumulative acknowledgement, the next bit is the third sequence number following, and so on. Figure 4 shows an example Bitmap Ack indicating acknowledgement of fragment sequence numbers 0 through 16, 18, 21 through 24, 27, and 28. 50 00 05 | Bitmap Ack, length=5 bytes 05 7f 10 | flowID=5, bufAvail=127*1024 bytes, cumAck=0..16 79 06 | 01111001 00000110 = 18, 21, 22, 23, 24, 27, 28 Figure 4: Example Bitmap Ack 2.3.14. Data Acknowledgement Ranges Chunk (Range Ack) This chunk is sent by the flow receiver to indicate to the flow sender the User Data fragment sequence numbers that have been received for one flow. It is only allowed in a packet belonging to an established session and having packet mode 1 or 2. The flow receiver can choose to acknowledge User Data with this chunk or with a Bitmap Ack. It SHOULD choose whichever format has the most compact encoding of the sequence numbers received. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x51 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-------------/-+-------------/-+ | flowID \ | bufAvail \ | cumAck \ | +-------------/-+-------------/-+-------------/-+ +~~~~~~~~~~~~~/~+~~~~~~~~~~~~~/~+ | #holes-1 \ | #recv-1 \ | +~~~~~~~~~~~~~/~+~~~~~~~~~~~~~/~+ : : +~~~~~~~~~~~~~/~+~~~~~~~~~~~~~/~+ | #holes-1 \ | #recv-1 \ | +~~~~~~~~~~~~~/~+~~~~~~~~~~~~~/~+ struct dataAckRangesChunkPayload_t { vlu_t flowID :variable*8; vlu_t bufferBlocksAvailable :variable*8; // "bufAvail" vlu_t cumulativeAck :variable*8; // "cumAck" bufferBytesAvailable = bufferBlocksAvailable * 1024; acknowledge(0 through cumulativeAck); ackCursor = cumulativeAck; while(remainder() > 0) { vlu_t holesMinusOne :variable*8; // "#holes-1" vlu_t receivedMinusOne :variable*8; // "#recv-1" ackCursor++; rangeFrom = ackCursor + holesMinusOne + 1; rangeTo = rangeFrom + receivedMinusOne; acknowledge(rangeFrom through rangeTo); ackCursor = rangeTo; } } :chunkLength*8; flowID: VLU, the flow identifier. bufferBlocksAvailable: VLU, the number of 1024-byte blocks of User Data that the receiver is currently able to accept. Section 3.6.3.5 describes how to calculate this value. cumulativeAck: VLU, the acknowledgement of every fragment sequence number in this flow that is less than or equal to this value. This MUST NOT be less than the highest Forward Sequence Number received in this flow. holesMinusOne / receivedMinusOne: Zero or more acknowledgement ranges, run-length encoded. Runs are encoded as zero or more pairs of VLUs indicating the number (minus one) of missing sequence numbers followed by the number (minus one) of received sequence numbers, starting at the cumulative acknowledgement. NOTE: If a parser syntax error is encountered here (that is, if the chunk is truncated such that not enough bytes remain to completely encode both VLUs of the acknowledgement range), then treat and process this chunk as though it was properly formed up to the last completely encoded range. Figure 5 shows an example Range Ack indicating acknowledgement of fragment sequence numbers 0 through 16, 18, 21, 22, 23, and 24. 51 00 07 | Range Ack, length=7 05 7f 10 | flowID=5, bufAvail=127*1024 bytes, cumAck=0..16 00 00 | holes=1, received=1 -- missing 17, received 18 01 03 | holes=2, received=4 -- missing 19..20, received 21..24 Figure 5: Example Range Ack Figure 6 shows an example Range Ack indicating acknowledgement of fragment sequence numbers 0 through 16 and 18, with a truncated last range. Note that the truncation and parse error does not abort the entire chunk in this case. 51 00 07 | Range Ack, length=9 05 7f 10 | flowID=5, bufAvail=127*1024 bytes, cumAck=0..16 00 00 | holes=1, received=1 -- missing 17, received 18 01 83 | holes=2, received=VLU parse error, ignore this range Figure 6: Example Truncated Range Ack 2.3.15. Buffer Probe Chunk This chunk is sent by the flow sender in order to request the current available receive buffer (in the form of a Data Acknowledgement) for a flow. It is only allowed in a packet belonging to an established session and having packet mode 1 or 2. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x18 | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+ | flowID \ | +-------------/-+ struct bufferProbeChunkPayload_t { vlu_t flowID :variable*8; } :chunkLength*8; flowID: VLU, the flow identifier. The receiver of this chunk SHOULD reply as immediately as is practical with a Data Acknowledgement. 2.3.16. Flow Exception Report Chunk This chunk is sent by the flow receiver to indicate that it is not (or is no longer) interested in the flow and would like the flow sender to close the flow. This chunk SHOULD precede every Data Acknowledgement chunk for the same flow in this condition. This chunk is only allowed in a packet belonging to an established session and having packet mode 1 or 2. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x5e | chunkLength | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-------------/-+-------------/-+ | flowID \ | exception \ | +-------------/-+-------------/-+ struct flowExceptionReportChunkPayload_t { vlu_t flowID :variable*8; vlu_t exception :variable*8; } :chunkLength*8; flowID: VLU, the flow identifier. exception: VLU, the application-defined exception code being reported. A receiving RTMFP might reject a flow automatically, for example if it is missing metadata, or if an invalid return association is specified. In circumstances where an RTMFP rejects a flow automatically, the exception code MUST be 0. The application can specify any exception code, including 0, when rejecting a flow. All non-zero exception codes are reserved for the application. 2.3.17. Session Close Request Chunk (Close) This chunk is sent to cleanly terminate a session. It is only allowed in a packet belonging to an established or closing session and having packet mode 1 or 2. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x0c | 0 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ This chunk has no payload. The use of Close is detailed in Section 3.5.5. 2.3.18. Session Close Acknowledgement Chunk (Close Ack) This chunk is sent in response to a Session Close Request to indicate that the sender has terminated the session. It is only allowed in a packet belonging to an established or closing session and having packet mode 1 or 2. 0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 0x4c | 0 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ This chunk has no payload. The use of Close Ack is detailed in Section 3.5.5. 3. Operation 3.1. Overview +--------+ +--------+ | Peer A | S E S S I O N | Peer B | | /=============================\ | | || Flows || | | ||---------------------------->|| | | ||---------------------------->|| | | ||<----------------------------|| | | ||<----------------------------|| | | ||<----------------------------|| | | \=============================/ | | | | | | | +--------+ | | | | +--------+ | | S E S S I O N | Peer C | | /=============================\ | | || Flows || | | ||---------------------------->|| | | ||<----------------------------|| | | ||<----------------------------|| | | \=============================/ | | | | | +--------+ +--------+ Figure 7: Sessions between Pairs of Communicating Endpoints Between any pair of communicating endpoints is a single, bidirectional, secured, congestion controlled session. Unidirectional flows convey messages from one end to the other within the session. An endpoint initiates a session to a far end when communication is desired. An initiator begins with one or more candidate destination socket addresses, and it may learn and try more candidate addresses during startup handshaking. Eventually, a first suitable response is received, and that endpoint is selected. Startup proceeds to the selected endpoint. In the case of session startup glare, one endpoint is the prevailing initiator and the other assumes the role of responder. Encryption keys and session identifiers are negotiated between the endpoints, and the session is established. Each endpoint may begin sending message flows to the other end. For each flow, the far end may accept it and deliver its messages to the user, or it may reject the flow and transmit an exception to the sender. The flow receiver may close and reject a flow at a later time, after first accepting it. The flow receiver acknowledges all data sent to it, regardless of whether the flow was accepted. Acknowledgements drive a congestion control mechanism. An endpoint may have concurrent sessions with other far endpoints. The multiple sessions are distinguished by a session identifier rather than by socket address. This allows an endpoint's address to change mid-session without having to tear down and re-establish a session. The existing cryptographic state for a session can be used to verify a change of address while protecting against session hijacking or denial of service. A sender may indicate to a receiver that some user messages are of a time critical or real-time nature. A receiver may indicate to senders on concurrent sessions that it is receiving time critical messages from another endpoint. The other senders SHOULD modify their congestion control parameters to yield capacity to the session carrying time critical messages. A sender may close a flow. The flow is completed when the receiver has no outstanding gaps before the final fragment of the flow. The sender and receiver reserve a completed flow's identifier for a time to allow in-flight messages to drain from the network. Eventually, neither end will have any flows open to the other. The session will be idle and quiescent. Either end may reliably close the session to recover its resources. In certain circumstances, an endpoint may be ceasing operation and not have time to wait for acknowledgement of a reliable session close. In this case, the halting endpoint may send an abrupt session close to advise the far end that it is halting immediately. 3.2. Endpoint Identity Each RTMFP endpoint has an identity. The identity is encoded in a certificate. This specification doesn't mandate any particular certificate format, cryptographic algorithms, or cryptographic properties for certificates. An endpoint is named by an Endpoint Discriminator. This specification doesn't mandate any particular format for Endpoint Discriminators. An Endpoint Discriminator MAY select more than one identity and MAY match more than one distinct certificate. Multiple distinct Endpoint Discriminators MAY match one certificate. It is RECOMMENDED that multiple endpoints not have the same identity. Entities with the same identity are indistinguishable during session startup; this situation could be undesirable in some applications. An endpoint MAY have more than one address. The Cryptography Profile implements the following functions for identities, certificates, and Endpoint Discriminators, whose operation MUST be deterministic: o Test whether a given certificate is authentic. Authenticity can comprise verifying an issuer signature chain in a public key infrastructure. o Test whether a given Endpoint Discriminator selects a given certificate. o Test whether a given Endpoint Discriminator selects the local endpoint. o Generate a Canonical Endpoint Discriminator for a given certificate. Canonical Endpoint Discriminators for distinct identities SHOULD be distinct. If two distinct identities have the same Canonical Endpoint Discriminator, an initiator might abort a new opening session to the second identity (Section 3.5.1.1.1); this behavior might not be desirable. o Given a certificate, a message, and a digital signature over the message, test whether the signature is valid and generated by the owner of the certificate. o Generate a digital signature for a given message corresponding to the near identity. o Given the near identity and a far certificate, determine which one shall prevail as Initiator and which shall assume the Responder role in the case of startup glare. The far end MUST arrive at the same conclusion. A comparison function can comprise performing a lexicographic ordering of the binary certificates, declaring the far identity the prevailing endpoint if the far certificate is ordered before the near certificate, and otherwise declaring the near identity to be the prevailing endpoint. o Given a first certificate and a second certificate, test whether a new incoming session from the second shall override an existing session with the first. It is RECOMMENDED that the test comprise testing whether the certificates are bitwise identical. All other semantics for certificates and Endpoint Discriminators are determined by the Cryptography Profile and the application. 3.3. Packet Multiplex An RTMFP typically has one or more interfaces through which it communicates with other RTMFP endpoints. RTMFP can communicate with multiple distinct other RTMFP endpoints through each local interface. Session multiplexing over a shared interface can facilitate peer-to- peer communications through a NAT, by enabling third-party endpoints such as Forwarders (Section 3.5.1.5) and Redirectors (Section 3.5.1.4) to observe the translated public address and inform peers of the translation. An interface is typically a UDP socket (Section 2.2.1) but MAY be any suitable datagram transport service where endpoints can be addressed by IPv4 or IPv6 socket addresses. RTMFP uses a session ID to multiplex and demultiplex communications with distinct endpoints (Section 2.2.2), in addition to the endpoint socket address. This allows an RTMFP to detect a far-end address change (as might happen, for example, in mobile and wireless scenarios) and allows communication sessions to survive address changes. This also allows an RTMFP to act as a Forwarder or Redirector for an endpoint with which it has an active session, by distinguishing startup packets from those of the active session. On receiving a packet, an RTMFP decodes the session ID to look up the corresponding session information context and decryption key. Session ID 0 is reserved for session startup and MUST NOT be used for an active session. A packet for Session ID 0 uses the Default Session Key as defined by the Cryptography Profile. 3.4. Packet Fragmentation When an RTMFP packet (Section 2.2.4) is unavoidably larger than the path MTU (such as a startup packet containing an RHello (Section 2.3.4) or IIKeying (Section 2.3.7) chunk with a large certificate), it can be fragmented into segments that do not exceed the path MTU by using the Packet Fragment chunk (Section 2.3.1). The packet fragmentation mechanism SHOULD be used only to segment unavoidably large packets. Accordingly, this mechanism SHOULD be employed only during session startup with Session ID 0. This mechanism MUST NOT be used instead of the natural fragmentation mechanism of the User Data (Section 2.3.11) and Next User Data (Section 2.3.12) chunks for dividing the messages of the user's data flows into segments that do not exceed the path MTU. A fragmented plain RTMFP packet is reassembled by concatenating the packetFragment fields of the fragments for the packet in contiguous ascending order, starting from index 0 through and including the final fragment. When reassembling packets for Session ID 0, a receiver SHOULD identify the packets by the socket address from which the packet containing the fragment was received, as well as the indicated packetID. A receiver SHOULD allow up to 60 seconds to completely receive a fragmented packet for which progress is being made. A packet is progressing if at least one new fragment for it was received in the last second. A receiver MUST discard a Packet Fragment chunk having an empty packetFragment field. The mode of each packet containing Packet Fragments for the same fragmented packet MUST match the mode of the fragmented packet. A receiver MUST discard any new Packet Fragment chunk received in a packet with a mode different from the mode of the packet containing the first received fragment. A receiver MUST discard any reassembled packet with a mode different than the packets containing its fragments. In order to avoid jamming the network, the sender MUST rate limit packet transmission. In the absence of specific path capacity information (for instance, during session startup), a sender SHOULD NOT send more than 4380 bytes nor more than four packets per distinct endpoint every 200 ms. To avoid resource exhaustion, a receiver SHOULD limit the number of concurrent packet reassembly buffers and the size of each buffer. Limits can depend, for example, on the expected size of reassembled packets, on the rate at which fragmented packets are expected to be received, on the expected degree of interleaving, and on the expected function of the receiver. Limits can depend on the available resources of the receiver. There can be different limits for packets with Session ID 0 and packets for established sessions. For example, a busy server might need to allow for several hundred concurrent packet reassembly buffers to accommodate hundreds of connection requests per second with potentially interleaved fragments, but a client device with constrained resources could allow just a few reassembly buffers. In the absence of specific information regarding the expected size of reassembled packets, a receiver should set the limit for each packet reassembly buffer to 65536 bytes. 3.5. Sessions A session is the protocol relationship between a pair of communicating endpoints, comprising the shared and endpoint-specific information context necessary to carry out the communication. The session context at each end includes at least: o TS_RX: the last timestamp received from the far end; o TS_RX_TIME: the time at which TS_RX was first observed to be different than its previous value; o TS_ECHO_TX: the last timestamp echo sent to the far end; o MRTO: the measured retransmission timeout; o ERTO: the effective retransmission timeout; o Cryptographic keys for encrypting and decrypting packets, and for verifying the validity of packets, according to the Cryptography Profile; o Cryptographic near and far nonces according to the Cryptography Profile, where the near nonce is the far end's far nonce, and vice versa; o The certificate of the far end; o The receive session identifier, used by the far end when sending packets to this end; o The send session identifier to use when sending packets to the far end; o DESTADDR: the destination socket address to use when sending packets to the far end; o The set of all sending flow contexts (Section 3.6.2); o The set of all receiving flow contexts (Section 3.6.3); o The transmission budget, which controls the rate at which data is sent into the network (for example, a congestion window); o S_OUTSTANDING_BYTES: the total amount of user message data outstanding, or in flight, in the network -- that is, the sum of the F_OUTSTANDING_BYTES of each sending flow in the session; o RX_DATA_PACKETS: a count of the number of received packets containing at least one User Data chunk since the last acknowledgement was sent, initially 0; o ACK_NOW: a boolean flag indicating whether an acknowledgement should be sent immediately, initially false; o DELACK_ALARM: an alarm to trigger an acknowledgement after a delay, initially unset; o The state, at any time being one of the following values: the opening states S_IHELLO_SENT and S_KEYING_SENT, the open state S_OPEN, the closing states S_NEARCLOSE and S_FARCLOSE_LINGER, and the closed states S_CLOSED and S_OPEN_FAILED; and o The role -- either Initiator or Responder -- of this end of the session. Note: The following diagram is only a summary of state transitions and their causing events, and is not a complete operational specification. rcv IIKeying Glare far prevails +-------------+ ultimate open timeout +--------------|S_IHELLO_SENT|-------------+ | +-------------+ | | |rcv RHello | | | v | v +-------------+ |<-----------(duplicate session?) |S_OPEN_FAILED| | yes |no +-------------+ | | ^ | rcv IIKeying Glare v | | far prevails +-------------+ | |<-------------|S_KEYING_SENT|-------------+ | +-------------+ ultimate open timeout | |rcv RIKeying | | | rcv v | +-+ IIKeying +--------+ rcv Close Request | |X|---------->| S_OPEN |--------------------+ | +-+ +--------+ | | | |ABRUPT CLOSE | | ORDERLY CLOSE| |or rcv Close Ack | | | |or rcv IIKeying | | | | session override | | | +-------+ | | v | v | +-----------+ | +-----------------+ | |S_NEARCLOSE| | |S_FARCLOSE_LINGER| | +-----------+ | +-----------------+ | rcv Close Ack| | |rcv Close Ack | or 90 seconds| v |or 19 seconds | | +--------+ | | +------>|S_CLOSED|<---------+ +-------------------------->| | +--------+ Figure 8: Session State Diagram 3.5.1. Startup 3.5.1.1. Normal Handshake RTMFP sessions are established with a 4-way handshake in two round trips. The initiator begins by sending an IHello to one or more candidate addresses for the desired destination endpoint. A responder statelessly sends an RHello in response. The first correct RHello received at the initiator is selected; all others are ignored. The initiator computes its half of the session keying and sends an IIKeying. The responder receives the IIKeying and, if it is acceptable, computes its half of the session keying, at which point it can also compute the shared session keying and session nonces. The responder creates a new S_OPEN session with the initiator and sends an RIKeying. The initiator receives the RIKeying and, if it is acceptable, computes the shared session keying and session nonces. The initiator's session is now S_OPEN. . Initiator Responder . | IHello | |(EPD,Tag) | S_IHELLO_SENT |(SID=0) | |------------------------------->| | | | RHello | | (Tag,Cookie,RCert)| | (SID=0)| |<-------------------------------| S_KEYING_SENT | | | IIKeying | |(ISID,Cookie,ICert,SKIC,ISig) | |(SID=0) | |------------------------------->| | | | RIKeying | | (RSID,SKRC,RSig)| | (SID=ISID,Key=Default)| S_OPEN |<-------------------------------| S_OPEN | | | S E S S I O N | |<-------------------(SID=ISID)--| |--(SID=RSID)------------------->| Figure 9: Normal Handshake In the following sections, the handshake is detailed from the perspectives of the initiator and responder. 3.5.1.1.1. Initiator The initiator determines that a session is needed for an Endpoint Discriminator. The initiator creates state for a new opening session and begins with a candidate endpoint address set containing at least one address. The new session is placed in the S_IHELLO_SENT state. If the session does not move to the S_OPEN state before an ultimate open timeout, the session has failed and moves to the S_OPEN_FAILED state. The RECOMMENDED ultimate open timeout is 95 seconds. The initiator chooses a new, unique tag not used by any currently opening session. It is RECOMMENDED that the tag be cryptographically pseudorandom and be at least 8 bytes in length, so that it is hard to guess. The initiator constructs an IHello chunk (Section 2.3.2) with the Endpoint Discriminator and the tag. While the initiator is in the S_IHELLO_SENT state, it sends the IHello to each candidate endpoint address in the set, on a backoff schedule. The backoff SHOULD NOT be less than multiplicative, with not less than 1.5 seconds added to the interval between each attempt. The backoff SHOULD be scheduled separately for each candidate address, since new candidates can be added over time. If the initiator receives a Redirect chunk (Section 2.3.5) with a tag echo matching this session, AND this session is in the S_IHELLO_SENT state, then for each redirect destination indicated in the Redirect: if the candidate endpoint address set contains fewer than REDIRECT_THRESHOLD addresses, add the indicated redirect destination to the candidate endpoint address set. REDIRECT_THRESHOLD SHOULD NOT be more than 24. If the initiator receives an RHello chunk (Section 2.3.4) with a tag echo matching this session, AND this session is in the S_IHELLO_SENT state, AND the responder certificate matches the desired Endpoint Discriminator, AND the certificate is authentic according to the Cryptography Profile, then: 1. If the Canonical Endpoint Discriminator for the responder certificate matches the Canonical Endpoint Discriminator of another existing session in the S_KEYING_SENT or S_OPEN states, AND the certificate of the other opening session matches the desired Endpoint Discriminator, then this session is a duplicate and SHOULD be aborted in favor of the other existing session; otherwise, 2. Move to the S_KEYING_SENT state. Set DESTADDR, the far-end address for the session, to the address from which this RHello was received. The initiator chooses a new, unique receive session ID, not used by any other session, for the responder to use when sending packets to the initiator. It computes a Session Key Initiator Component appropriate to the responder's certificate according to the Cryptography Profile. Using this data and the cookie from the RHello, the initiator constructs and signs an IIKeying chunk (Section 2.3.7). While the initiator is in the S_KEYING_SENT state, it sends the IIKeying to DESTADDR on a backoff schedule. The backoff SHOULD NOT be less than multiplicative, with not less than 1.5 seconds added to the interval between each attempt. If the initiator receives an RIKeying chunk (Section 2.3.8) in a packet with this session's receive session identifier, AND this session is in the S_KEYING_SENT state, AND the signature in the chunk is authentic according to the far end's certificate (from the RHello), AND the Session Key Responder Component successfully combines with the Session Key Initiator Component and the near and far certificates to form the shared session keys and nonces according to the Cryptography Profile, then the session has opened successfully. The session moves to the S_OPEN state. The send session identifier is set from the RIKeying. Packet encryption, decryption, and verification now use the newly computed shared session keys, and the session nonces are available for application- layer cryptographic challenges. 3.5.1.1.2. Responder On receipt of an IHello chunk (Section 2.3.2) with an Endpoint Discriminator that selects its identity, an endpoint SHOULD construct an RHello chunk (Section 2.3.4) and send it to the address from which the IHello was received. To avoid a potential resource exhaustion denial of service, the endpoint SHOULD NOT create any persistent state associated with the IHello. The endpoint MUST generate the cookie for the RHello in such a way that it can be recognized as authentic and valid when echoed in an IIKeying. The endpoint SHOULD use the address from which the IHello was received as part of the cookie generation formula. Cookies SHOULD be valid only for a limited time; that lifetime SHOULD NOT be less than 95 seconds (the recommended ultimate session open timeout). On receipt of an FIHello chunk (Section 2.3.3) from a Forwarder (Section 3.5.1.5) where the Endpoint Discriminator selects its identity, an endpoint SHOULD do one of the following: 1. Compute, construct, and send an RHello as though the FIHello was an IHello received from the indicated reply address; or 2. Construct and send an Implied Redirect (Section 2.3.5) to the FIHello's reply address; or 3. Ignore this FIHello. On receipt of an IIKeying chunk (Section 2.3.7), if the cookie is not authentic or if it has expired, ignore this IIKeying; otherwise, On receipt of an IIKeying chunk, if the cookie appears authentic but does not match the address from which the IIKeying's packet was received, perform the special processing at Cookie Change (Section 3.5.1.2); otherwise, On receipt of an IIKeying with an authentic and valid cookie, if the certificate is authentic according to the Cryptography Profile, AND the signature in the chunk is authentic according to the far end's certificate and the Cryptography Profile, AND the Session Key Initiator Component is acceptable, then: 1. If the address from which this IIKeying was received corresponds to an opening session in the S_IHELLO_SENT or S_KEYING_SENT state, perform the special processing at Glare (Section 3.5.1.3); otherwise, 2. If the address from which this IIKeying was received corresponds to a session in the S_OPEN state, then: 1. If the receiver was the Responder for the S_OPEN session and the session identifier, certificate, and Session Key Initiator Component are identical to those of the S_OPEN session, this IIKeying is a retransmission, so resend the S_OPEN session's RIKeying using the Default Session Key as specified below; otherwise, 2. If the certificate from this IIKeying does not override the certificate of the S_OPEN session, ignore this IIKeying; otherwise, 3. The certificate from this IIKeying overrides the certificate of the S_OPEN session; this is a new opening session from the same identity, and the existing S_OPEN session is stale. Move the existing S_OPEN session to S_CLOSED and abort all of its flows (signaling exceptions to the user), then continue processing this IIKeying. Otherwise, 3. Compute a Session Key Responder Component and choose a new, unique receive session ID not used by any other session for the initiator to use when sending packets to the responder. Using this data, construct and, with the Session Key Initiator Component, sign an RIKeying chunk (Section 2.3.8). Using the Session Key Initiator and Responder Components and the near and far certificates, the responder combines and computes the shared session keys and nonces according to the Cryptography Profile. The responder creates a new session in the S_OPEN state, with the far-endpoint address DESTADDR taken from the source address of the packet containing the IIKeying and the send session identifier taken from the IIKeying. The responder sends the RIKeying to the initiator using the Default Session Key and the requested send session identifier. Packet encryption, decryption, and verification of all future packets for this session use the newly computed keys, and the session nonces are available for application-layer cryptographic challenges. 3.5.1.2. Cookie Change In some circumstances, the responder may generate an RHello cookie for an initiator's address that isn't the address the initiator would use when sending packets directly to the responder. This can happen, for example, when the initiator has multiple local addresses and uses one address to reach a Forwarder (Section 3.5.1.5) but another to reach the responder. Consider the following example: Initiator Forwarder Responder | IHello | | |(Src=Ix) | | |------------------------------->| | | | FIHello | | |(RA=Ix) | | |-------------------------------->| | | | RHello | | (Cookie:Ix)| |<-----------------------------------------------------------------| | | | IIKeying | |(Cookie:Ix,Src=Iy) | |----------------------------------------------------------------->| | | | RHello Cookie Change | | (Cookie:Ix,Cookie:Iy)| |<-----------------------------------------------------------------| | | | IIKeying | |(Cookie:Iy) | |----------------------------------------------------------------->| | | | RIKeying | |<-----------------------------------------------------------------| | | |<======================== S E S S I O N =========================>| Figure 10: Handshake with Cookie Change The initiator has two network interfaces: a first preferred interface with address Ix = 192.0.2.100:50000, and a second with address Iy = 198.51.100.101:50001. The responder has one interface with address Ry = 198.51.100.200:51000, on the same network as the initiator's second interface. The initiator uses its first interface to reach a Forwarder. The Forwarder observes the initiator's address of Ix and sends a Forwarded IHello (Section 2.3.3) to the responder. The responder treats this as if it were an IHello from Ix, calculates a corresponding cookie, and sends an RHello to Ix. The initiator receives this RHello from Ry and selects that address as the destination for the session. It then sends an IIKeying, copying the cookie from the RHello. However, since the source of the RHello is Ry, on a network to which the initiator is directly connected, the initiator uses its second interface Iy to send the IIKeying. The responder, on receiving the IIKeying, will compare the cookie to the expected value based on the source address of the packet, and since the IIKeying source doesn't match the IHello source used to generate the cookie, the responder will reject the IIKeying. If the responder determines that it generated the cookie in the IIKeying but the cookie doesn't match the sender's address (for example, if the cookie is in two parts, with a first part generated independently of the initiator's address and a second part dependent on the address), the responder SHOULD generate a new cookie based on the address from which the IIKeying was received and send an RHello Cookie Change chunk (Section 2.3.6) to the source of the IIKeying, using the session ID from the IIKeying and the Default Session Key. If the initiator receives an RHello Cookie Change chunk for a session in the S_KEYING_SENT state, AND the old cookie matches the one originally sent to the responder, then the initiator adopts the new cookie, constructs and signs a new IIKeying chunk, and sends the new IIKeying to the responder. The initiator SHOULD NOT change the cookie for a session more than once. 3.5.1.3. Glare Glare occurs when two endpoints attempt to initiate sessions to each other concurrently. Glare is detected by receipt of a valid and authentic IIKeying from an endpoint address that is a destination for an opening session. Only one session is allowed between a pair of endpoints. Glare is resolved by comparing the certificate in the received IIKeying with the near end's certificate. The Cryptography Profile defines a certificate comparison function to determine the prevailing endpoint when there is glare. If the near end prevails, discard and ignore the received IIKeying. The far end will abort its opening session on receipt of IIKeying from the near end. Otherwise, the far end prevails: 1. If the certificate in the IIKeying overrides the certificate associated with the near opening session according to the Cryptography Profile, then abort and destroy the near opening session. Then, 2. Continue with normal Responder IIKeying processing (Section 3.5.1.1.2). 3.5.1.4. Redirector +-----------+ +------------+ +-----------+ | Initiator |---------->| Redirector | | Responder | | |<----------| | | | | | +------------+ | | | |<=================================>| | +-----------+ +-----------+ Figure 11: Redirector A Redirector acts like a name server for Endpoint Discriminators. An initiator MAY use a Redirector to discover additional candidate endpoint addresses for a desired endpoint. On receipt of an IHello chunk with an Endpoint Discriminator that does not select the Redirector's identity, the Redirector constructs and sends back to the initiator a Responder Redirect chunk (Section 2.3.5) containing one or more additional candidate addresses for the indicated endpoint. Initiator Redirector Responder | IHello | | |------------------------------->| | | | | | Redirect | | |<-------------------------------| | | | | IHello | |----------------------------------------------------------------->| | | | RHello | |<-----------------------------------------------------------------| | | | IIKeying | |----------------------------------------------------------------->| | | | RIKeying | |<-----------------------------------------------------------------| | | |<======================== S E S S I O N =========================>| Figure 12: Handshake Using a Redirector Deployment Design Note: Redirectors SHOULD NOT initiate new sessions to endpoints that might use the Redirector's address as a candidate for another endpoint, since the far end might interpret the Redirector's IIKeying as glare for the far end's initiation to the other endpoint. 3.5.1.5. Forwarder +-----------+ +-----------+ +---+ +-----------+ | Initiator |---->| Forwarder |<===>| N |<===>| Responder | | | +-----------+ | A | | | | |<=====================>| T |<===>| | +-----------+ +---+ +-----------+ Figure 13: Forwarder A responder might be behind a NAT or firewall that doesn't allow inbound packets to reach the endpoint until it first sends an outbound packet for a particular far-endpoint address. A Forwarder's endpoint address MAY be a candidate address for another endpoint. A responder MAY use a Forwarder to receive FIHello chunks sent on behalf of an initiator. On receipt of an IHello chunk with an Endpoint Discriminator that does not select the Forwarder's identity, if the Forwarder has an S_OPEN session with an endpoint whose certificate matches the desired Endpoint Discriminator, the Forwarder constructs and sends an FIHello chunk (Section 2.3.3) to the selected endpoint over the S_OPEN session, using the tag and Endpoint Discriminator from the IHello chunk and the source address of the packet containing the IHello for the corresponding fields of the FIHello. On receipt of an FIHello chunk, a responder might send an RHello or Implied Redirect to the original source of the IHello (Section 3.5.1.1.2), potentially allowing future packets to flow directly between the initiator and responder through the NAT or firewall. Initiator Forwarder NAT Responder | IHello | | | |------------------------------->| | | | | FIHello | | | |--------------->|--------------->| | | | | | RHello | | :<---------------| |<------------------------------------------------: | | : | | IIKeying : | |-------------------------------------------------:--------------->| | : | | : RIKeying | | :<---------------| |<------------------------------------------------: | | : | |<======================== S E S S I O N ========>:<==============>| Figure 14: Forwarder Handshake where Responder Sends an RHello Initiator Forwarder NAT Responder | IHello | | | |------------------------------->| | | | | FIHello | | | |--------------->|--------------->| | | | | | Redirect | | | (Implied,RD={})| | :<---------------| |<------------------------------------------------: | | : | | IHello : | |------------------------------------------------>:--------------->| | : | | : RHello | | :<---------------| |<------------------------------------------------: | | : | | IIKeying : | |------------------------------------------------>:--------------->| | : | | : RIKeying | | :<---------------| |<------------------------------------------------: | | : | |<======================== S E S S I O N ========>:<==============>| Figure 15: Forwarder Handshake where Responder Sends an Implied Redirect 3.5.1.6. Redirector and Forwarder with NAT +---+ +---+ +---+ +---+ +---+ | I | | N | | I | | N | | R | | n |------>| A |------>| n | | A | | e | | i | | T | | t |<====>| T |<====>| s | | t |<------| |<------| r | | | | p | | i | | | | o | | | | o | | a | | | +---+ | | | n | | t | | | | | | d | | o |<=====>| |<================>| |<====>| e | | r | | | | | | r | +---+ +---+ +---+ +---+ Figure 16: Introduction Service for Initiator and Responder behind NATs An initiator and responder might each be behind distinct NATs or firewalls that don't allow inbound packets to reach the respective endpoints until each first sends an outbound packet for a particular far-endpoint address. An introduction service comprising Redirector and Forwarder functions may facilitate direct communication between endpoints each behind a NAT. The responder is registered with the introduction service via an S_OPEN session to it. The service observes and records the responder's public NAT address as the DESTADDR of the S_OPEN session. The service MAY record other addresses for the responder, for example addresses that the responder self-reports as being directly attached. The initiator begins with an address of the introduction service as an initial candidate. The Redirector portion of the service sends to the initiator a Responder Redirect containing at least the responder's public NAT address as previously recorded. The Forwarder portion of the service sends to the responder a Forwarded IHello containing the initiator's public NAT address as observed to be the source of the IHello. The responder sends an RHello to the initiator's public NAT address in response to the FIHello. This will allow inbound packets to the responder through its NAT from the initiator's public NAT address. The initiator sends an IHello to the responder's public NAT address in response to the Responder Redirect. This will allow inbound packets to the initiator through its NAT from the responder's public NAT address. With transit paths created in both NATs, normal session startup can proceed. Initiator NAT-I Redirector+Forwarder NAT-R Responder | | | | | | IHello | | | | |(Dst=Intro) | | | | |-------------->| | | | | |--------------->| | | | | | FIHello | | | | |(RA=NAT-I-Pub) | | | | |--------------->|--------------->| | | Redirect | | | | | (RD={NAT-R-Pub,| | | | | ...})| | | |<--------------|<---------------| | | | | | RHello | | | | (Dst=NAT-I-Pub)| | | :<---------------| | | (*) <--------------------------: | | IHello | : | |(Dst=NAT-R-Pub)| : | |-------------->: : | | :-------------------------------->:--------------->| | : : | | : : RHello | | : :<---------------| |<--------------:<--------------------------------: | | : : | | IIKeying : : | |-------------->: : | | :-------------------------------->:--------------->| | : : | | : : RIKeying | | : :<---------------| |<--------------:<--------------------------------: | | : : | |<=============>:<======== S E S S I O N ========>:<==============>| Figure 17: Handshake with Redirector and Forwarder At the point in Figure 17 marked (*), the responder's RHello from the FIHello might arrive at the initiator's NAT before or after the initiator's IHello is sent outbound to the responder's public NAT address. If it arrives before, it may be dropped by the NAT. If it arrives after, it will transit the NAT and trigger keying without waiting for another round-trip time. The timing of this race depends, among other factors, on the relative distances of the initiator and responder from each other and from the introduction service. 3.5.1.7. Load Distribution and Fault Tolerance +---+ IHello/RHello +-------------+ | I |<------------------->| Responder 1 | | n | +-------------+ | i | SESSION +-------------+ | t |<=========>| Responder 2 | | i | +-------------+ | a | IHello... +----------------+ | t |-------------------------> X | Dead Responder | | o | +----------------+ | r | IHello/RHello +-------------+ | |<---------------->| Responder N | +---+ +-------------+ Figure 18: Parallel Open to Multiple Endpoints As specified in Section 3.2, more than one endpoint is allowed to be selected by one Endpoint Discriminator. This will typically be the case for a set of servers, any of which could accommodate a connecting client. As specified in Section 3.5.1.1.1, an initiator is allowed to use multiple candidate endpoint addresses when starting a session, and the sender of the first acceptable RHello chunk to be received is selected to complete the session, with later responses ignored. An initiator can start with the multiple candidate endpoint addresses, or it may learn them during startup from one or more Redirectors (Section 3.5.1.4). Parallel open to multiple endpoints for the same Endpoint Discriminator, combined with selection by earliest RHello, can be used for load distribution and fault tolerance. The cost at each endpoint that is not selected is limited to receiving and processing an IHello, and generating and sending an RHello. In one circumstance, multiple servers of similar processing and networking capacity may be located in near proximity to each other, such as in a data center. In this circumstance, a less heavily loaded server can respond to an IHello more quickly than more heavily loaded servers and will tend to be selected by a client. In another circumstance, multiple servers may be located in different physical locations, such as different data centers. In this circumstance, a server that is located nearer (in terms of network distance) to the client can respond earlier than more distant servers and will tend to be selected by the client. Multiple servers, in proximity or distant from one another, can form a redundant pool of servers. A client can perform a parallel open to the multiple servers. In normal operation, the multiple servers will all respond, and the client will select one of them as described above. If one of the multiple servers fails, other servers in the pool can still respond to the client, allowing the client to succeed to an S_OPEN session with one of them. 3.5.2. Congestion Control An RTMFP MUST implement congestion control and avoidance algorithms that are "TCP compatible", in accordance with Internet best current practice [RFC2914]. The algorithms SHOULD NOT be more aggressive in sending data than those described in "TCP Congestion Control" [RFC5681] and MUST NOT be more aggressive in sending data than the "slow start algorithm" described in Section 3.1 of RFC 5681. An endpoint maintains a transmission budget in the session information context of each S_OPEN session (Section 3.5), controlling the rate at which the endpoint sends data into the network. For window-based congestion control and avoidance algorithms, the transmission budget is the congestion window, which is the amount of user data that is allowed to be outstanding, or in flight, in the network. Transmission is allowed when S_OUTSTANDING_BYTES (Section 3.5) is less than the congestion window (Section 3.6.2.3). See Appendix A for an experimental window-based congestion control algorithm for real-time and bulk data. An endpoint avoids sending large bursts of data or packets into the network (Section 3.5.2.3). A sending endpoint increases and decreases its transmission budget in response to acknowledgements (Section 3.6.2.4) and loss according to the congestion control and avoidance algorithms. Loss is detected by negative acknowledgement (Section 3.6.2.5) and timeout (Section 3.6.2.6). Timeout is determined by the Effective Retransmission Timeout (ERTO) (Section 3.5.2.2). The ERTO is measured using the Timestamp and Timestamp Echo packet header fields (Section 2.2.4). A receiving endpoint acknowledges all received data (Section 3.6.3.4) to enable the sender to measure receipt of data, or lack thereof. A receiving endpoint may be receiving time critical (or real-time) data from a first sender while receiving data from other senders. The receiving endpoint can signal its other senders (Section 2.2.4) to cause them to decrease the aggressiveness of their congestion control and avoidance algorithms, in order to yield network capacity to the time critical data (Section 3.5.2.1). 3.5.2.1. Time Critical Reverse Notification A sender can increase its transmission budget at a rate compatible with (but not exceeding) the "slow start algorithm" specified in RFC 5681 (with which the transmission rate is doubled every round trip when beginning or restarting transmission, until loss is detected). However, a sender MUST behave as though the slow start threshold SSTHRESH is clamped to 0 (disabling the slow start algorithm's exponential increase behavior) on a session where a Time Critical Reverse Notification (Section 2.2.4) indication has been received from the far end within the last 800 milliseconds, unless the sender is itself currently sending time critical data to the far end. During each round trip, a sender SHOULD NOT increase the transmission budget by more than 0.5% or by 384 bytes per round trip (whichever is greater) on a session where a Time Critical Reverse Notification indication has been received from the far end within the last 800 milliseconds, unless the sender is itself currently sending time critical data to the far end. 3.5.2.2. Retransmission Timeout RTMFP uses the ERTO to detect when a user data fragment has been lost in the network. The ERTO is typically calculated in a manner similar to that specified in "Requirements for Internet Hosts - Communication Layers" [RFC1122] and is a function of round-trip time measurements and persistent timeout behavior. The ERTO SHOULD be at least 250 milliseconds and SHOULD allow for the receiver to delay sending an acknowledgement for up to 200 milliseconds (Section 3.6.3.4.4). The ERTO MUST NOT be less than the round-trip time. To facilitate round-trip time measurement, an endpoint MUST implement the Timestamp Echo facility: o On a session entering the S_OPEN state, initialize TS_RX_TIME to negative infinity, and initialize TS_RX and TS_ECHO_TX to have no value. o On receipt of a packet in an S_OPEN session with the timestampPresent (Section 2.2.4) flag set, if the timestamp field in the packet is different than TS_RX, set TS_RX to the value of the timestamp field in the packet, and set TS_RX_TIME to the current time. o When sending a packet to the far end in an S_OPEN session: 1. Calculate TS_RX_ELAPSED = current time - TS_RX_TIME. If TS_RX_ELAPSED is more than 128 seconds, then set TS_RX and TS_ECHO_TX to have no value, and do not include a timestamp echo; otherwise, 2. Calculate TS_RX_ELAPSED_TICKS to be the number of whole 4-millisecond periods in TS_RX_ELAPSED; then 3. Calculate TS_ECHO = (TS_RX + TS_RX_ELAPSED_TICKS) MODULO 65536; then 4. If TS_ECHO is not equal to TS_ECHO_TX, then set TS_ECHO_TX to TS_ECHO, set the timestampEchoPresent flag, and set the timestampEcho field to TS_ECHO_TX. The remainder of this section describes an OPTIONAL method for calculating the ERTO. Real-time applications and P2P mesh applications often require knowing the round-trip time and RTT variance. This section additionally describes a method for measuring the round-trip time and RTT variance, and calculating a smoothed round-trip time. Let the session information context contain additional variables: o TS_TX: the last timestamp sent to the far end, initialized to have no value; o TS_ECHO_RX: the last timestamp echo received from the far end, initialized to have no value; o SRTT: the smoothed round-trip time, initialized to have no value; o RTTVAR: the round-trip time variance, initialized to 0. Initialize MRTO to 250 milliseconds. Initialize ERTO to 3 seconds. On sending a packet to the far end of an S_OPEN session, if the current send timestamp is not equal to TS_TX, then set TS_TX to the current send timestamp, set the timestampPresent flag in the packet header, and set the timestamp field to TS_TX. On receipt of a packet from the far end of an S_OPEN session, if the timestampEchoPresent flag is set in the packet header, AND the timestampEcho field is not equal to TS_ECHO_RX, then: 1. Set TS_ECHO_RX to timestampEcho; 2. Calculate RTT_TICKS = (current send timestamp - timestampEcho) MODULO 65536; 3. If RTT_TICKS is greater than 32767, the measurement is invalid, so discard this measurement; otherwise, 4. Calculate RTT = RTT_TICKS * 4 milliseconds; 5. If SRTT has a value, then calculate new values of RTTVAR and SRTT: 1. RTT_DELTA = | SRTT - RTT |; 2. RTTVAR = ((3 * RTTVAR) + RTT_DELTA) / 4; 3. SRTT = ((7 * SRTT) + RTT) / 8. 6. If SRTT has no value, then set SRTT = RTT and RTTVAR = RTT / 2; 7. Set MRTO = SRTT + 4 * RTTVAR + 200 milliseconds; 8. Set ERTO to MRTO or 250 milliseconds, whichever is greater. A retransmission timeout occurs when the most recently transmitted user data fragment has remained outstanding in the network for ERTO. When this timeout occurs, increase ERTO on an exponential backoff with an ultimate backoff cap of 10 seconds: 1. Calculate ERTO_BACKOFF = ERTO * 1.4142; 2. Calculate ERTO_CAPPED to be ERTO_BACKOFF or 10 seconds, whichever is less; 3. Set ERTO to ERTO_CAPPED or MRTO, whichever is greater. 3.5.2.3. Burst Avoidance An application's sending patterns may cause the transmission budget to grow to a large value, but at times its sending patterns will result in a comparatively small amount of data outstanding in the network. In this circumstance, especially with a window-based congestion avoidance algorithm, if the application then has a large amount of new data to send (for example, a new bulk data transfer), it could send data into the network all at once to fill the window. This kind of transmission burst is undesirable, however, because it can jam interfaces, links, and buffers. Accordingly, in any session, an endpoint SHOULD NOT send more than six packets containing user data between receiving any acknowledgements or retransmission timeouts. The following describes an OPTIONAL method to avoid bursting large numbers of packets into the network: Let the session information context contain an additional variable DATA_PACKET_COUNT, initialized to 0. Transmission of a user data fragment on this session is not allowed if DATA_PACKET_COUNT is greater than or equal to 6, regardless of any other allowance of the congestion control algorithm. On transmission of a packet containing at least one User Data chunk (Section 2.3.11), set DATA_PACKET_COUNT = DATA_PACKET_COUNT + 1. On receipt of an acknowledgement chunk (Sections 2.3.13 and 2.3.14), set DATA_PACKET_COUNT to 0. On a retransmission timeout, set DATA_PACKET_COUNT to 0. 3.5.3. Address Mobility Sessions are demultiplexed with a 32-bit session ID, rather than by endpoint address. This allows an endpoint's address to change during an S_OPEN session. This can happen, for example, when switching from a wireless to a wired network, or when moving from one wireless base station to another, or when a NAT restarts. If the near end receives a valid packet for an S_OPEN session from a source address that doesn't match DESTADDR, the far end might have changed addresses. The near end SHOULD verify that the far end is definitively at the new address before changing DESTADDR. A suggested verification method is described in Section 3.5.4.2. 3.5.4. Ping If an endpoint receives a Ping chunk (Section 2.3.9) in a session in the S_OPEN state, it SHOULD construct and send a Ping Reply chunk (Section 2.3.10) in response if possible, copying the message unaltered. The Ping Reply SHOULD be sent as quickly as possible following receipt of a Ping. The semantics of a Ping's message is reserved for the sender; a receiver SHOULD NOT interpret the Ping's message. Endpoints can use the mechanism of the Ping chunk and the expected Ping Reply for any purpose. This specification doesn't mandate any specific constraints on the format or semantics of a Ping message. A Ping Reply MUST be sent only as a response to a Ping. Receipt of a Ping Reply implies live bidirectional connectivity. This specification doesn't mandate any other semantics for a Ping Reply. 3.5.4.1. Keepalive An endpoint can use a Ping to test for live bidirectional connectivity, to test that the far end of a session is still in the S_OPEN state, to keep NAT translations alive, and to keep firewall holes open. An endpoint can use a Ping to hasten detection of a near-end address change by the far end. An endpoint may declare a session to be defunct and dead after a persistent failure by the far end to return Ping Replies in response to Pings. If used for these purposes, a Keepalive Ping SHOULD have an empty message. A Keepalive Ping SHOULD NOT be sent more often than once per ERTO. If a corresponding Ping Reply is not received within ERTO of sending the Ping, ERTO SHOULD be increased according to Section 3.5.2 ("Congestion Control"). 3.5.4.2. Address Mobility This section describes an OPTIONAL but suggested method for processing and verifying a far-end address change. Let the session context contain additional variables MOB_TX_TS, MOB_RX_TS, and MOB_SECRET. MOB_TX_TS and MOB_RX_TS have initial values of negative infinity. MOB_SECRET should be a cryptographically pseudorandom value not less than 128 bits in length and known only to this end. On receipt of a packet for an S_OPEN session, after processing all chunks in the packet: if the session is still in the S_OPEN state, AND the source address of the packet does not match DESTADDR, AND MOB_TX_TS is at least one second in the past, then: 1. Set MOB_TX_TS to the current time; 2. Construct a Ping message comprising the following: a marking to indicate (to this end when returned in a Ping Reply) that it is a mobility check (for example, the first byte being ASCII 'M' for "Mobility"), a timestamp set to MOB_TX_TS, and a cryptographic hash over the following: the preceding items, the address from which the packet was received, and MOB_SECRET; and 3. Send this Ping to the address from which the packet was received, instead of DESTADDR. On receipt of a Ping Reply in an S_OPEN session, if the Ping Reply's message satisfies all of these conditions: o it has this end's expected marking to indicate that it is a mobility check, and o the timestamp in the message is not more than 120 seconds in the past, and o the timestamp in the message is greater than MOB_RX_TS, and o the cryptographic hash matches the expected value according to the contents of the message plus the source address of the packet containing this Ping Reply and MOB_SECRET, then: 1. Set MOB_RX_TS to the timestamp in the message; and 2. Set DESTADDR to the source address of the packet containing this Ping Reply. 3.5.4.3. Path MTU Discovery "Packetization Layer Path MTU Discovery" [RFC4821] describes a method for measuring the path MTU between communicating endpoints. An RTMFP SHOULD perform path MTU discovery. The method described in RFC 4821 can be adapted for use in RTMFP by sending a probe packet comprising one of the Padding chunk types (type 0x00 or 0xff) and a Ping. The Ping chunk SHOULD come after the Padding chunk, to guard against a false positive response in case the probe packet is truncated. 3.5.5. Close An endpoint may close a session at any time. Typically, an endpoint will close a session when there have been no open flows in either direction for a time. In another circumstance, an endpoint may be ceasing operation and will close all of its sessions even if they have open flows. To close an S_OPEN session in a reliable and orderly fashion, an endpoint moves the session to the S_NEARCLOSE state. On a session transitioning from S_OPEN to S_NEARCLOSE and every 5 seconds thereafter while still in the S_NEARCLOSE state, send a Session Close Request chunk (Section 2.3.17). A session that has been in the S_NEARCLOSE state for at least 90 seconds (allowing time to retransmit the Session Close Request multiple times) SHOULD move to the S_CLOSED state. On a session transitioning from S_OPEN to the S_NEARCLOSE, S_FARCLOSE_LINGER or S_CLOSED state, immediately abort and terminate all open or closing flows. Flows only exist in S_OPEN sessions. To close an S_OPEN session abruptly, send a Session Close Acknowledgement chunk (Section 2.3.18), then move to the S_CLOSED state. On receipt of a Session Close Request chunk for a session in the S_OPEN, S_NEARCLOSE, or S_FARCLOSE_LINGER states, send a Session Close Acknowledgement chunk; then, if the session is in the S_OPEN state, move to the S_FARCLOSE_LINGER state. A session that has been in the S_FARCLOSE_LINGER state for at least 19 seconds (allowing time to answer 3 retransmissions of a Session Close Request) SHOULD move to the S_CLOSED state. On receipt of a Session Close Acknowledgement chunk for a session in the S_OPEN, S_NEARCLOSE, or S_FARCLOSE_LINGER states, move to the S_CLOSED state. 3.6. Flows A flow is a unidirectional communication channel in a session for transporting a correlated series of user messages from a sender to a receiver. Each end of a session may have zero or more sending flows to the other end. Each sending flow at one end has a corresponding receiving flow at the other end. 3.6.1. Overview 3.6.1.1. Identity Flows are multiplexed in a session by a flow identifier. Each end of a session chooses its sending flow identifiers independently of the other end. The choice of similar flow identifiers by both ends does not imply an association. A sender MAY choose any identifier for any flow; therefore, a flow receiver MUST NOT ascribe any semantic meaning, role, or name to a flow based only on its identifier. There are no "well known" or reserved flow identifiers. Bidirectional flow association is indicated at flow startup with the Return Flow Association option (Section 2.3.11.1.2). An endpoint can indicate that a new sending flow is in return (or response) to a receiving flow from the other end. A sending flow MUST NOT indicate more than one return association. A receiving flow can be specified as the return association for any number of sending flows. The return flow association, if any, is fixed for the lifetime of the sending flow. Note: Closure of one flow in an association does not automatically close other flows in the association, except as specified in Section 3.6.3.1. Flows are named with arbitrary user metadata. This specification doesn't mandate any particular encoding, syntax, or semantics for the user metadata, except for the encoded size (Section 2.3.11.1.1); the user metadata is entirely reserved for the application. The user metadata is fixed for the lifetime of the flow. 3.6.1.2. Messages and Sequencing Flows provide message-oriented framing. Large messages are fragmented for transport in the network. Receivers reassemble fragmented messages and only present complete messages to the user. A sender queues messages on a sending flow one after another. A receiver can recover the original queuing order of the messages, even when they are reordered in transit by the network or as a result of loss and retransmission, by means of the messages' fragment sequence numbers. Flows are the basic units of message sequencing; each flow is sequenced independently of all other flows; inter-flow message arrival and delivery sequencing are not guaranteed. Independent flow sequencing allows a sender to prioritize the transmission or retransmission of the messages of one flow over those of other flows in a session, allocating capacity from the transmission budget according to priority. RTMFP is designed for flows to be the basic unit of prioritization. In any flow, fragment sequence numbers are unique and monotonically increasing; that is, the fragment sequence numbers for any message MUST be greater than the fragment sequence numbers of all messages previously queued in that flow. Receipt of fragments out of sequence number order within a flow creates discontiguous gaps at the receiver, causing it to send an acknowledgement for every packet and also causing the size of the encoded acknowledgements to grow. Therefore, for any flow, the sender SHOULD send lower sequence numbers first. A sender can abandon a queued message at any time, even if some fragments of that message have been received by the other end. A receiver MUST be able to detect a gap in the flow when a message is abandoned; therefore, each message SHOULD take at least one sequence number from the sequence space even if no fragments for that message are ever sent. The sender will transmit the fragments of all messages not abandoned, and retransmit any lost fragments of all messages not abandoned, until all the fragments of all messages not abandoned are acknowledged by the receiver. A sender indicates a Forward Sequence Number (FSN) to instruct the receiver that sequence numbers less than or equal to the FSN will not be transmitted or retransmitted. This allows the receiver to move forward over gaps and continue sequenced delivery of completely received messages to the user. Any incomplete messages missing fragments with sequence numbers less than or equal to the FSN were abandoned by the sender and will never be completed. A gap indication MUST be communicated to the receiving user. 3.6.1.3. Lifetime A sender begins a flow by sending user message fragments to the other end, and including the user metadata and, if any, the return flow association. The sender continues to include the user metadata and return flow association until the flow is acknowledged by the far end, at which point the sender knows that the receiver has received the user metadata and, if any, the return flow association. After that point, the flow identifier alone is sufficient. Flow receivers SHOULD acknowledge all sequence numbers received for any flow, whether the flow is accepted or rejected. Flow receivers MUST NOT acknowledge sequence numbers higher than the FSN that were not received. Acknowledgements drive the congestion control and avoidance algorithms and therefore must be accurate. An endpoint can reject a receiving flow at any time in the flow's lifetime. To reject the flow, the receiving endpoint sends a Flow Exception Report chunk (Section 2.3.16) immediately preceding every acknowledgement chunk for the rejected receiving flow. An endpoint may eventually conclude and close a sending flow. The last sequence number of the flow is marked with the Final flag. The sending flow is complete when all sequence numbers of the flow, including the final sequence number, have been cumulatively acknowledged by the receiver. The receiving flow is complete when every sequence number from the FSN to the final sequence number has been received. The sending flow and corresponding receiving flow at the respective ends hold the flow identifier of a completed flow in reserve for a time to allow delayed or duplicated fragments and acknowledgements to drain from the network without erroneously initiating a new receiving flow or erroneously acknowledging a new sending flow. If a flow sender receives a Flow Exception indication from the other end, the flow sender SHOULD close the flow and abandon all of the undelivered queued messages. The flow sender SHOULD indicate an exception to the user. 3.6.2. Sender Each sending flow comprises the flow-specific information context necessary to transfer that flow's messages to the other end. Each sending flow context includes at least: o F_FLOW_ID: this flow's identifier; o STARTUP_OPTIONS: the set of options to send to the receiver until this flow is acknowledged, including the User's Per-Flow Metadata and, if set, the Return Flow Association; o SEND_QUEUE: the unacknowledged message fragments queued in this flow, initially empty; each message fragment entry comprising the following: * SEQUENCE_NUMBER: the sequence number of this fragment; * DATA: this fragment's user data; * FRA: the fragment control value for this message fragment, having one of the values enumerated for that purpose in Section 2.3.11 ("User Data Chunk"); * ABANDONED: a boolean flag indicating whether this fragment has been abandoned; * SENT_ABANDONED: a boolean flag indicating whether this fragment was abandoned when sent; * EVER_SENT: a boolean flag indicating whether this fragment has been sent at least once, initially false; * NAK_COUNT: a count of the number of negative acknowledgements detected for this fragment, initially 0; * IN_FLIGHT: a boolean flag indicating whether this fragment is currently outstanding, or in flight, in the network, initially false; * TRANSMIT_SIZE: the size, in bytes, of the encoded User Data chunk (including the chunk header) for this fragment when it was transmitted into the network. o F_OUTSTANDING_BYTES: the sum of the TRANSMIT_SIZE of each entry in SEND_QUEUE where entry.IN_FLIGHT is true; o RX_BUFFER_SIZE: the most recent available buffer advertisement from the other end (Sections 2.3.13 and 2.3.14), initially 65536 bytes; o NEXT_SN: the next sequence number to assign to a message fragment, initially 1; o F_FINAL_SN: the sequence number assigned to the final message fragment of the flow, initially having no value; o EXCEPTION: a boolean flag indicating whether an exception has been reported by the receiver, initially false; o The state, at any time being one of the following values: the open state F_OPEN; the closing states F_CLOSING and F_COMPLETE_LINGER; and the closed state F_CLOSED. Note: The following diagram is only a summary of state transitions and their causing events, and is not a complete operational specification. +--------+ | F_OPEN | +--------+ |CLOSE or |rcv Flow Exception | v +---------+ |F_CLOSING| +---------+ |rcv Data Ack | 0..F_FINAL_SN v +-----------------+ |F_COMPLETE_LINGER| +-----------------+ | 130 seconds v +--------+ |F_CLOSED| +--------+ Figure 19: Sending Flow State Diagram 3.6.2.1. Startup The application opens a new sending flow to the other end in an S_OPEN session. The implementation chooses a new flow ID that is not assigned to any other sending flow in that session in the F_OPEN, F_CLOSING, or F_COMPLETE_LINGER states. The flow starts in the F_OPEN state. The STARTUP_OPTIONS for the new flow is set with the User's Per-Flow Metadata (Section 2.3.11.1.1). If this flow is in return (or response) to a receiving flow from the other end, that flow's ID is encoded in a Return Flow Association (Section 2.3.11.1.2) option and added to STARTUP_OPTIONS. A new sending flow SHOULD NOT be opened in response to a receiving flow from the other end that is not in the RF_OPEN state when the sending flow is opened. At this point, the flow exists in the sender but not in the receiver. The flow begins when user data fragments are transmitted to the receiver. A sender can begin a flow in the absence of immediate user data by sending a Forward Sequence Number Update (Section 3.6.2.7.1), by queuing and transmitting a user data fragment that is already abandoned. 3.6.2.2. Queuing Data The application queues messages in an F_OPEN sending flow for transmission to the far end. The implementation divides each message into one or more fragments for transmission in User Data chunks (Section 2.3.11). Each fragment MUST be small enough so that, if assembled into a packet (Section 2.2.4) with a maximum-size common header, User Data chunk header, and, if not empty, this flow's STARTUP_OPTIONS, the packet will not exceed the path MTU (Section 3.5.4.3). For each fragment, create a fragment entry and set fragmentEntry.SEQUENCE_NUMBER to flow.NEXT_SN, and increment flow.NEXT_SN by one. Set fragmentEntry.FRA according to the encoding in User Data chunks: 0: This fragment is a complete message. 1: This fragment is the first of a multi-fragment message. 2: This fragment is the last of a multi-fragment message. 3: This fragment is in the middle of a multi-fragment message. Append fragmentEntry to flow.SEND_QUEUE. 3.6.2.3. Sending Data A sending flow is ready to transmit if the SEND_QUEUE contains at least one entry that is eligible to send, and if either RX_BUFFER_SIZE is greater than F_OUTSTANDING_BYTES or EXCEPTION is set to true. A SEND_QUEUE entry is eligible to send if it is not IN_FLIGHT, AND at least one of the following conditions holds: o The entry is not ABANDONED; or o The entry is the first one in the SEND_QUEUE; or o The entry's SEQUENCE_NUMBER is equal to flow.F_FINAL_SN. If the session's transmission budget allows, a flow that is ready to transmit is selected for transmission according to the implementation's prioritization scheme. The manner of flow prioritization is not mandated by this specification. Trim abandoned messages from the front of the queue, and find the Forward Sequence Number (FSN): 1. While the SEND_QUEUE contains at least two entries, AND the first entry is not IN_FLIGHT, AND the first entry is ABANDONED, remove and discard the first entry from the SEND_QUEUE; 2. If the first entry in the SEND_QUEUE is not abandoned, set FSN to entry.SEQUENCE_NUMBER - 1; otherwise, 3. If the first entry in the SEND_QUEUE is IN_FLIGHT, AND entry.SENT_ABANDONED is false, set FSN to entry.SEQUENCE_NUMBER - 1; otherwise, 4. The first entry in the SEND_QUEUE is abandoned and either is not IN_FLIGHT or was already abandoned when sent; set FSN to entry.SEQUENCE_NUMBER. The FSN MUST NOT be greater than any sequence number currently outstanding. The FSN MUST NOT be equal to any sequence number currently outstanding that was not abandoned when sent. Assemble user data chunks for this flow into a packet to send to the receiver. While enough space remains in the packet and the flow is ready to transmit: 1. Starting at the head of the SEND_QUEUE, find the first eligible fragment entry; 2. Encode the entry into a User Data chunk (Section 2.3.11) or, if possible (Section 3.6.2.3.2), a Next User Data chunk (Section 2.3.12); 3. If present, set chunk.flowID to flow.F_FLOW_ID; 4. If present, set chunk.sequenceNumber to entry.SEQUENCE_NUMBER; 5. If present, set chunk.fsnOffset to entry.SEQUENCE_NUMBER - FSN; 6. Set chunk.fragmentControl to entry.FRA; 7. Set chunk.abandon to entry.ABANDONED; 8. If entry.SEQUENCE_NUMBER equals flow.F_FINAL_SN, set chunk.final to true; else set chunk.final to false; 9. If any options are being sent with this chunk, set chunk.optionsPresent to true, assemble the options into the chunk, and assemble a Marker to terminate the option list; 10. If entry.ABANDONED is true, set chunk.userData to empty; otherwise, set chunk.userData to entry.DATA; 11. If adding the assembled chunk to the packet would cause the packet to exceed the path MTU, do not assemble this chunk into the packet; enough space no longer remains in the packet; stop. Otherwise, continue: 12. Set entry.IN_FLIGHT to true; 13. Set entry.EVER_SENT to true; 14. Set entry.NAK_COUNT to 0; 15. Set entry.SENT_ABANDONED to entry.ABANDONED; 16. Set entry.TRANSMIT_SIZE to the size of the assembled chunk, including the chunk header; 17. Assemble this chunk into the packet; and 18. If this flow or entry is considered Time Critical (real-time), set the timeCritical flag in the packet header (Section 2.2.4). Complete any other appropriate packet processing, and transmit the packet to the far end. 3.6.2.3.1. Startup Options If STARTUP_OPTIONS is not empty, then when assembling the FIRST User Data chunk for this flow into a packet, add the encoded STARTUP_OPTIONS to that chunk's option list. 3.6.2.3.2. Send Next Data The Next User Data chunk (Section 2.3.12) is a compact encoding for a user message fragment when multiple contiguous fragments are assembled into one packet. Using this chunk where possible can conserve space in a packet, potentially reducing transmission overhead or allowing additional information to be sent in a packet. If, after assembling a user message fragment of a flow into a packet (Section 3.6.2.3), the next eligible fragment to be selected for assembly into that packet belongs to the same flow, AND its sequence number is one greater than that of the fragment just assembled, it is RECOMMENDED that an implementation encode a Next User Data chunk instead of a User Data chunk. The FIRST fragment of a flow assembled into a packet MUST be encoded as a User Data chunk. 3.6.2.4. Processing Acknowledgements A Data Acknowledgement Bitmap chunk (Section 2.3.13) or a Data Acknowledgement Ranges chunk (Section 2.3.14) encodes the acknowledgement of receipt of one or more sequence numbers of a flow, as well as the receiver's current receive window advertisement. On receipt of an acknowledgement chunk for a sending flow: 1. Set PRE_ACK_OUTSTANDING_BYTES to flow.F_OUTSTANDING_BYTES; 2. Set flow.STARTUP_OPTIONS to empty; 3. Set flow.RX_BUFFER_SIZE to chunk.bufferBytesAvailable; 4. For each sequence number encoded in the acknowledgement, if there is an entry in flow.SEND_QUEUE with that sequence number and its IN_FLIGHT is true, then remove the entry from flow.SEND_QUEUE; and 5. Notify the congestion control and avoidance algorithms that PRE_ACK_OUTSTANDING_BYTES - flow.F_OUTSTANDING_BYTES were acknowledged. Note that negative acknowledgements (Section 3.6.2.5) affect "TCP friendly" congestion control. 3.6.2.5. Negative Acknowledgement and Loss A negative acknowledgement is inferred for an outstanding fragment if an acknowledgement is received for any other fragments sent after it in the same session. An implementation SHOULD consider a fragment to be lost once that fragment receives three negative acknowledgements. A lost fragment is no longer outstanding in the network. The following describes an OPTIONAL method for detecting negative acknowledgements. Let the session track the order in which fragments are transmitted across all its sending flows by way of a monotonically increasing Transmission Sequence Number (TSN) recorded with each fragment queue entry each time that fragment is transmitted. Let the session information context contain additional variables: o NEXT_TSN: the next TSN to record with a fragment's queue entry when it is transmitted, initially 1; o MAX_TSN_ACK: the highest acknowledged TSN, initially 0. Let each fragment queue entry contain an additional variable TSN, initially 0, to track its transmission order. On transmission of a message fragment into the network, set its entry.TSN to session.NEXT_TSN, and increment session.NEXT_TSN. On acknowledgement of an outstanding fragment, if its entry.TSN is greater than session.MAX_TSN_ACK, set session.MAX_TSN_ACK to entry.TSN. After processing all acknowledgements in a packet containing at least one acknowledgement, then for each sending flow in that session, for each entry in that flow's SEND_QUEUE, if entry.IN_FLIGHT is true and entry.TSN is less than session.MAX_TSN_ACK, increment entry.NAK_COUNT and notify the congestion control and avoidance algorithms that a negative acknowledgement was detected in this packet. For each sending flow in that session, for each entry in that flow's SEND_QUEUE, if entry.IN_FLIGHT is true and entry.NAK_COUNT is at least 3, that fragment was lost in the network and is no longer considered to be in flight. Set entry.IN_FLIGHT to false. Notify the congestion control and avoidance algorithms of the loss. 3.6.2.6. Timeout A fragment is considered lost and no longer in flight in the network if it has remained outstanding for at least ERTO. The following describes an OPTIONAL method to manage transmission timeouts. This method REQUIRES that either burst avoidance (Section 3.5.2.3) is implemented or the implementation's congestion control and avoidance algorithms will eventually stop sending new fragments into the network if acknowledgements are persistently not received. Let the session information context contain an alarm TIMEOUT_ALARM, initially unset. On sending a packet containing at least one User Data chunk, set or reset TIMEOUT_ALARM to fire in ERTO. On receiving a packet containing at least one acknowledgement, reset TIMEOUT_ALARM (if already set) to fire in ERTO. When TIMEOUT_ALARM fires: 1. Set WAS_LOSS = false; 2. For each sending flow in the session, and for each entry in that flow's SEND_QUEUE: 1. If entry.IN_FLIGHT is true, set WAS_LOSS = true; and 2. Set entry.IN_FLIGHT to false. 3. If WAS_LOSS is true, perform ERTO backoff (Section 3.5.2.2); and 4. Notify the congestion control and avoidance algorithms of the timeout and, if WAS_LOSS is true, that there was loss. 3.6.2.7. Abandoning Data The application can abandon queued messages at any time and for any reason. Example reasons include (but are not limited to) the following: one or more fragments of a message have remained in the SEND_QUEUE for longer than a specified message lifetime; a fragment has been retransmitted more than a specified retransmission limit; a prior message on which this message depends (such as a key frame in a prediction chain) was abandoned and not delivered. To abandon a message fragment, set its SEND_QUEUE entry's ABANDON flag to true. When abandoning a message fragment, abandon all fragments of the message to which it belongs. An abandoned fragment MUST NOT be un-abandoned. 3.6.2.7.1. Forward Sequence Number Update Abandoned data may leave gaps in the sequence number space of a flow. Gaps may cause the receiver to hold completely received messages for ordered delivery to allow for retransmission of the missing fragments. User Data chunks (Section 2.3.11) encode a Forward Sequence Number (FSN) to instruct the receiver that fragments with sequence numbers less than or equal to the FSN will not be transmitted or retransmitted. When the receiver has gaps in the received sequence number space and no non-abandoned message fragments remain in the SEND_QUEUE, the sender SHOULD transmit a Forward Sequence Number Update (FSN Update) comprising a User Data chunk marked abandoned, whose sequence number is the FSN and whose fsnOffset is 0. An FSN Update allows the receiver to skip gaps that will not be repaired and deliver received messages to the user. An FSN Update may be thought of as a transmission or retransmission of abandoned sequence numbers without actually sending the data. The method described in Section 3.6.2.3 ("Sending Data") generates FSN Updates when appropriate. 3.6.2.8. Examples Sender | : 1 |<--- Ack ID=2, seq:0-16 2 |---> Data ID=2, seq#=25, fsnOff=9 (fsn=16) 3 |---> Data ID=2, seq#=26, fsnOff=10 (fsn=16) 4 |<--- Ack ID=2, seq:0-18 5 |---> Data ID=2, seq#=27, fsnOff=9 (fsn=18) 6 |---> Data ID=2, seq#=28, fsnOff=10 (fsn=18) | : There are 9 sequence numbers in flight with delayed acknowledgements. Figure 20: Normal Flow with No Loss Sender | : 1 |<--- Ack ID=3, seq:0-30 2 |---> Data ID=3, seq#=45, fsnOff=15 (fsn=30) 3 |<--- Ack ID=3, seq:0-30, 32 (nack 31:1) 4 |---> Data ID=3, seq#=46, fsnOff=16 (fsn=30) 5 |<--- Ack ID=3, seq:0-30, 32, 34 (nack 31:2, 33:1) 6 |<--- Ack ID=3, seq:0-30, 32, 34-35 (nack 31:3=lost, 33:2) 7 |---> Data ID=3, seq#=47, fsnOff=15 (fsn=32, abandon 31) 8 |<--- Ack ID=3, seq:0-30, 32, 34-36 (nack 33:3=lost) 9 |---> Data ID=3, seq#=33, fsnOff=1 (fsn=32, retransmit 33) 10 |<--- Ack ID=3, seq:0-30, 32, 34-37 11 |---> Data ID=3, seq#=48, fsnOff=16 (fsn=32) | : | (continues through seq#=59) | : 12 |---> Data ID=3, seq#=60, fsnOff=28(fsn=32) 13 |<--- Ack ID=3, seq:0-30, 34-46 14 |---> Data ID=3, seq#=61, fsnOff=29 (fsn=32) 15 |<--- Ack ID=3, seq:0-32, 34-47 16 |---> Data ID=3, seq#=62, fsnOff=30 (fsn=32) 17 |<--- Ack ID=3, seq:0-47 18 |---> Data ID=3, seq#=63, fsnOff=16 (fsn=47) 19 |<--- Ack ID=3, seq:0-49 20 |---> Data ID=3, seq#=64, fsnOff=15 (fsn=49) | : 21 |<--- Ack ID=3, seq:0-59 22 |<--- Ack ID=3, seq:0-59, 61 (nack 60:1) 23 |<--- Ack ID=3, seq:0-59, 61-62 (nack 60:2) 24 |<--- Ack ID=3, seq:0-59, 61-63 (nack 60:3=lost) 25 |---> Data ID=3, ABN=1, seq#=60, fsnOff=0 (fsn=60, abandon 60) 26 |<--- Ack ID=3, seq:0-59, 61-64 | : 27 |<--- Ack ID=3, seq:0-64 Flow with sequence numbers 31, 33, and 60 lost in transit, and a pause at 64. 33 is retransmitted; 31 and 60 are abandoned. Note that line 25 is a Forward Sequence Number Update (Section 3.6.2.7.1). Figure 21: Flow with Loss 3.6.2.9. Flow Control The flow receiver advertises the amount of new data it's willing to accept from the flow sender with the bufferBytesAvailable derived field of an acknowledgement (Sections 2.3.13 and 2.3.14). The flow sender MUST NOT send new data into the network if flow.F_OUTSTANDING_BYTES is greater than or equal to the most recently received buffer advertisement, unless flow.EXCEPTION is true (Section 3.6.2.3). 3.6.2.9.1. Buffer Probe The flow sender is suspended if the most recently received buffer advertisement is zero and the flow hasn't been rejected by the receiver -- that is, while RX_BUFFER_SIZE is zero AND EXCEPTION is false. To guard against potentially lost acknowledgements that might reopen the receive window, a suspended flow sender SHOULD send a packet comprising a Buffer Probe chunk (Section 2.3.15) for this flow from time to time. If the receive window advertisement transitions from non-zero to zero, the flow sender MAY send a Buffer Probe immediately and SHOULD send a probe within one second. The initial period between Buffer Probes SHOULD be at least one second or ERTO, whichever is greater. The period between probes SHOULD increase over time, but the period between probes SHOULD NOT be more than one minute or ERTO, whichever is greater. The flow sender SHOULD stop sending Buffer Probes if it is no longer suspended. 3.6.2.10. Exception The flow receiver can reject the flow at any time and for any reason. The flow receiver sends a Flow Exception Report (Section 2.3.16) when it has rejected a flow. On receiving a Flow Exception Report for a sending flow: 1. If the flow is F_OPEN, close the flow (Section 3.6.2.11) and notify the user that the far end reported an exception with the encoded exception code; 2. Set the EXCEPTION flag to true; and 3. For each entry in SEND_QUEUE, set entry.ABANDONED = true. 3.6.2.11. Close A sending flow is closed by the user or as a result of an exception. To close an F_OPEN flow: 1. Move to the F_CLOSING state; 2. If the SEND_QUEUE is not empty, AND the tail entry of the SEND_QUEUE has a sequence number of NEXT_SN - 1, AND the tail entry.EVER_SENT is false, set F_FINAL_SN to entry.SEQUENCE_NUMBER; else 3. The SEND_QUEUE is empty, OR the tail entry does not have a sequence number of NEXT_SN - 1, OR the tail entry.EVER_SENT is true: enqueue a new SEND_QUEUE entry with entry.SEQUENCE_NUMBER = flow.NEXT_SN, entry.FRA = 0, and entry.ABANDONED = true, and set flow.F_FINAL_SN to entry.SEQUENCE_NUMBER. An F_CLOSING sending flow is complete when its SEND_QUEUE transitions to empty, indicating that all sequence numbers, including the FINAL_SN, have been acknowledged by the other end. When an F_CLOSING sending flow becomes complete, move to the F_COMPLETE_LINGER state. A sending flow MUST remain in the F_COMPLETE_LINGER state for at least 130 seconds. After at least 130 seconds, move to the F_CLOSED state. The sending flow is now closed, its resources can be reclaimed, and its F_FLOW_ID MAY be used for a new sending flow. 3.6.3. Receiver Each receiving flow comprises the flow-specific information context necessary to receive that flow's messages from the sending end and deliver completed messages to the user. Each receiving flow context includes at least: o RF_FLOW_ID: this flow's identifier; o SEQUENCE_SET: the set of all fragment sequence numbers seen in this receiving flow, whether received or abandoned, initially empty; o RF_FINAL_SN: the final fragment sequence number of the flow, initially having no value; o RECV_BUFFER: the message fragments waiting to be delivered to the user, sorted by sequence number in ascending order, initially empty; each message fragment entry comprising the following: * SEQUENCE_NUMBER: the sequence number of this fragment; * DATA: this fragment's user data; and * FRA: the fragment control value for this message fragment, having one of the values enumerated for that purpose in Section 2.3.11 ("User Data Chunk"). o BUFFERED_SIZE: the sum of the lengths of each fragment in RECV_BUFFER plus any additional storage overhead for the fragments incurred by the implementation, in bytes; o BUFFER_CAPACITY: the desired maximum size for the receive buffer, in bytes; o PREV_RWND: the most recent receive window advertisement sent in an acknowledgement, in 1024-byte blocks, initially having no value; o SHOULD_ACK: whether or not an acknowledgement should be sent for this flow, initially false; o EXCEPTION_CODE: the exception code to report to the sender when the flow has been rejected, initially 0; o The state, at any time being one of the following values: the open state RF_OPEN; the closing states RF_REJECTED and RF_COMPLETE_LINGER; and the closed state RF_CLOSED. Note: The following diagram is only a summary of state transitions and their causing events, and is not a complete operational specification. +-+ |X| +-+ |rcv User Data for | no existing flow v +---------+ | RF_OPEN | +---------+ rcv all sequence numbers| |user reject, 0..RF_FINAL_SN | |rcv bad option, | |no metadata at open, | |association specified | | but not F_OPEN at open +---+ | | v | +-----------+ | |RF_REJECTED| | +-----------+ | |rcv all sequence numbers | | 0..RF_FINAL_SN v v +------------------+ |RF_COMPLETE_LINGER| +------------------+ | 120 seconds v +---------+ |RF_CLOSED| +---------+ Figure 22: Receiving Flow State Diagram 3.6.3.1. Startup A new receiving flow starts on receipt of a User Data chunk (Section 2.3.11) encoding a flow ID not belonging to any other receiving flow in the same session in the RF_OPEN, RF_REJECTED, or RF_COMPLETE_LINGER states. On receipt of such a User Data chunk: 1. Set temporary variables METADATA, ASSOCIATED_FLOWID, and ASSOCIATION to each have no value; 2. Create a new receiving flow context in this session, setting its RF_FLOW_ID to the flow ID encoded in the opening User Data chunk, and set to the RF_OPEN state; 3. If the opening User Data chunk encodes a User's Per-Flow Metadata option (Section 2.3.11.1.1), set METADATA to option.userMetadata; 4. If the opening User Data chunk encodes a Return Flow Association option (Section 2.3.11.1.2), set ASSOCIATED_FLOWID to option.flowID; 5. If METADATA has no value, the receiver MUST reject the flow (Section 3.6.3.7), moving it to the RF_REJECTED state; 6. If ASSOCIATED_FLOWID has a value, then if there is no sending flow in the same session with a flow ID of ASSOCIATED_FLOWID, the receiver MUST reject the flow, moving it to the RF_REJECTED state; otherwise, set ASSOCIATION to the indicated sending flow; 7. If ASSOCIATION indicates a sending flow, AND that sending flow's state is not F_OPEN, the receiver MUST reject this receiving flow, moving it to the RF_REJECTED state; 8. If the opening User Data chunk encodes any unrecognized option with a type code less than 8192 (Section 2.3.11.1), the receiver MUST reject the flow, moving it to the RF_REJECTED state; 9. If this new receiving flow is still RF_OPEN, then notify the user that a new receiving flow has opened, including the METADATA and, if present, the ASSOCIATION, and set flow.BUFFER_CAPACITY according to the user; 10. Perform the normal data processing (Section 3.6.3.2) for the opening User Data chunk; and 11. Set this session's ACK_NOW to true. 3.6.3.2. Receiving Data A User Data chunk (Section 2.3.11) or a Next User Data chunk (Section 2.3.12) encodes one fragment of a user data message of a flow, as well as the flow's Forward Sequence Number and potentially optional parameters (Section 2.3.11.1). On receipt of a User Data or Next User Data chunk: 1. If chunk.flowID doesn't indicate an existing receiving flow in the same session in the RF_OPEN, RF_REJECTED, or RF_COMPLETE_LINGER state, perform the steps of Section 3.6.3.1 ("Startup") to start a new receiving flow; 2. Retrieve the receiving flow context for the flow indicated by chunk.flowID; 3. Set flow.SHOULD_ACK to true; 4. If the flow is RF_OPEN, AND the chunk encodes any unrecognized option with a type code less than 8192 (Section 2.3.11.1), the flow MUST be rejected: notify the user of an exception, and reject the flow (Section 3.6.3.7), moving it to the RF_REJECTED state; 5. If the flow is not in the RF_OPEN state, set session.ACK_NOW to true; 6. If flow.PREV_RWND has a value and that value is less than 2 blocks, set session.ACK_NOW to true; 7. If chunk.abandon is true, set session.ACK_NOW to true; 8. If flow.SEQUENCE_SET has any gaps (that is, if it doesn't contain every sequence number from 0 through and including the highest sequence number in the set), set session.ACK_NOW to true; 9. If flow.SEQUENCE_SET contains chunk.sequenceNumber, then this chunk is a duplicate: set session.ACK_NOW to true; 10. If flow.SEQUENCE_SET doesn't contain chunk.sequenceNumber, AND chunk.final is true, AND flow.RF_FINAL_SN has no value, then set flow.RF_FINAL_SN to chunk.sequenceNumber, and set session.ACK_NOW to true; 11. If the flow is in the RF_OPEN state, AND flow.SEQUENCE_SET doesn't contain chunk.sequenceNumber, AND chunk.abandon is false, then create a new RECV_BUFFER entry for this chunk's data and set entry.SEQUENCE_NUMBER to chunk.sequenceNumber, entry.DATA to chunk.userData, and entry.FRA to chunk.fragmentControl, and insert this new entry into flow.RECV_BUFFER; 12. Add to flow.SEQUENCE_SET the range of sequence numbers from 0 through and including the chunk.forwardSequenceNumber derived field; 13. Add chunk.sequenceNumber to flow.SEQUENCE_SET; 14. If flow.SEQUENCE_SET now has any gaps, set session.ACK_NOW to true; 15. If session.ACK_NOW is false and session.DELACK_ALARM is not set, set session.DELACK_ALARM to fire in 200 milliseconds; and 16. Attempt delivery of completed messages in this flow's RECV_BUFFER to the user (Section 3.6.3.3). After processing all chunks in a packet containing at least one User Data chunk, increment session.RX_DATA_PACKETS by one. If session.RX_DATA_PACKETS is at least two, set session.ACK_NOW to true. A receiving flow that is not in the RF_CLOSED state is ready to send an acknowledgement if its SHOULD_ACK flag is set. Acknowledgements for receiving flows that are ready are sent either opportunistically by piggybacking on a packet that's already sending user data or an acknowledgement (Section 3.6.3.4.6), or when the session's ACK_NOW flag is set (Section 3.6.3.4.5). 3.6.3.3. Buffering and Delivering Data A receiving flow's information context contains a RECV_BUFFER for reordering, reassembling, and holding the user data messages of the flow. Only complete messages are delivered to the user; an implementation MUST NOT deliver partially received messages, except by special arrangement with the user. Let the Cumulative Acknowledgement Sequence Number (CSN) be the highest number in the contiguous range of numbers in SEQUENCE_SET starting with 0. For example, if SEQUENCE_SET contains {0, 1, 2, 3, 5, 6}, the contiguous range starting with 0 is 0..3, so the CSN is 3. A message is complete if all of its fragments are present in the RECV_BUFFER. The fragments of one message have contiguous sequence numbers. A message can be either a single fragment, whose fragment control value is 0-whole, or two or more fragments where the first's fragment control value is 1-begin, followed by zero or more fragments with control value 3-middle, and terminated by a last fragment with control value 2-end. An incomplete message segment is a contiguous sequence of one or more fragments that do not form a complete message -- that is, a 1-begin followed by zero or more 3-middle fragments but with no 2-end, or zero or more 3-middle fragments followed by a 2-end but with no 1-begin, or one or more 3-middle fragments with neither a 1-begin nor a 2-end. Incomplete message segments can either be in progress or abandoned. An incomplete segment is abandoned in the following cases: o The sequence number of the segment's first fragment is less than or equal to the CSN, AND that fragment's control value is not 1-begin; or o The sequence number of the segment's last fragment is less than the CSN. Abandoned message segments will never be completed, so they SHOULD be removed from the RECV_BUFFER to make room in the advertised receive window and the receiver's memory for messages that can be completed. The user can suspend delivery of a flow's messages. A suspended receiving flow holds completed messages in its RECV_BUFFER until the user resumes delivery. A suspended flow can cause the receive window advertisement to go to zero even when the BUFFER_CAPACITY is non-zero; this is described in detail in Section 3.6.3.5 ("Flow Control"). When the receiving flow is not suspended, the original queuing order of the messages is recovered by delivering, in ascending sequence number order, complete messages in the RECV_BUFFER whose sequence numbers are less than or equal to the CSN. The following describes a method for discarding abandoned message segments and delivering complete messages in original queuing order when the receiving flow is not suspended. While the first fragment entry in the RECV_BUFFER has a sequence number less than or equal to the CSN and delivery is still possible: 1. If entry.FRA is 0-whole, deliver entry.DATA to the user, and remove this entry from RECV_BUFFER; otherwise, 2. If entry.FRA is 2-end or 3-middle, this entry belongs to an abandoned segment, so remove and discard this entry from RECV_BUFFER; otherwise, 3. Entry.FRA is 1-begin. Let LAST_ENTRY be the last RECV_BUFFER entry that is part of this message segment (LAST_ENTRY can be entry if the segment has only one fragment so far). Then: 1. If LAST_ENTRY.FRA is 2-end, this segment is a complete message, so concatenate the DATA fields of each fragment entry of this segment in ascending sequence number order and deliver the complete message to the user, then remove the entries for this complete message from RECV_BUFFER; otherwise, 2. If LAST_ENTRY.SEQUENCE_NUMBER is less than CSN, this segment is incomplete and abandoned, so remove and discard the entries for this segment from RECV_BUFFER; otherwise, 3. LAST_ENTRY.SEQUENCE_NUMBER is equal to CSN and LAST_ENTRY.FRA is not 2-end: this segment is incomplete but still in progress. Ordered delivery is no longer possible until at least one more fragment is received. Stop. If flow.RF_FINAL_SN has a value and is equal to the CSN, AND RECV_BUFFER is empty, all complete messages have been delivered to the user, so notify the user that the flow is complete. 3.6.3.4. Acknowledging Data A flow receiver SHOULD acknowledge all user data fragment sequence numbers seen in that flow. Acknowledgements drive the sender's congestion control and avoidance algorithms, clear data from the sender's buffers, and in some sender implementations clock new data into the network; therefore, the acknowledgements must be accurate and timely. 3.6.3.4.1. Timing For reasons similar to those discussed in Section 4.2.3.2 of RFC 1122 [RFC1122], it is advantageous to delay sending acknowledgements for a short time, so that multiple data fragments can be acknowledged in a single transmission. However, it is also advantageous for a sender to receive timely notification about the receiver's disposition of the flow, particularly in unusual or exceptional circumstances, so that the circumstances can be addressed if possible. Therefore, a flow receiver SHOULD send an acknowledgement for a flow as soon as is practical in any of the following circumstances: o On receipt of a User Data chunk that starts a new flow; o On receipt of a User Data or Next User Data chunk if the flow is not in the RF_OPEN state; o On receipt of a User Data chunk where, before processing the chunk, the SEQUENCE_SET of the indicated flow does not contain every sequence number between 0 and the highest sequence number in the set (that is, if there was a sequence number gap before processing the chunk); o On receipt of a User Data chunk where, after processing the chunk, the flow's SEQUENCE_SET does not contain every sequence number between 0 and the highest sequence number in the set (that is, if this chunk causes a sequence number gap); o On receipt of a Buffer Probe for the flow; o On receipt of a User Data chunk if the last acknowledgement sent for the flow indicated fewer than two bufferBlocksAvailable; o On receipt of a User Data or Next User Data chunk for the flow if, after processing the chunk, the flow's BUFFER_CAPACITY is not at least 1024 bytes greater than BUFFERED_SIZE; o On receipt of a User Data or Next User Data chunk for any sequence number that was already seen (that is, on receipt of a duplicate); o On the first receipt of the final sequence number of the flow; o On receipt of two packets in the session that contain user data for any flows since an acknowledgement was last sent, the new acknowledgements being for the flows having any User Data chunks in the received packets (that is, for every second packet containing user data); o After receipt of a User Data chunk for the flow, if an acknowledgement for any other flow is being sent (that is, consolidate acknowledgements); o After receipt of a User Data chunk for the flow, if any user data for a sending flow is being sent in a packet and if there is space available in the same packet (that is, attempt to piggyback an acknowledgement with user data if possible); o No longer than 200 milliseconds after receipt of a User Data chunk for the flow. 3.6.3.4.2. Size and Truncation Including an encoded acknowledgement in a packet might cause the packet to exceed the path MTU. In that case: o If the packet is being sent primarily to send an acknowledgement, AND this is the first acknowledgement in the packet, truncate the acknowledgement so that the packet does not exceed the path MTU; otherwise, o The acknowledgement is being piggybacked in a packet with user data or with an acknowledgement for another flow: do not include this acknowledgement in the packet, and send it later. 3.6.3.4.3. Constructing The Data Acknowledgement Bitmap chunk (Section 2.3.13) and Data Acknowledgement Ranges chunk (Section 2.3.14) encode a receiving flow's SEQUENCE_SET and its receive window advertisement. The two chunks are semantically equivalent; implementations SHOULD send whichever provides the most compact encoding of the SEQUENCE_SET. When assembling an acknowledgement for a receiving flow: 1. If the flow's state is RF_REJECTED, first assemble a Flow Exception Report chunk (Section 2.3.16) for flow.flowID; 2. Choose the acknowledgement chunk type that most compactly encodes flow.SEQUENCE_SET; 3. Use the method described in Section 3.6.3.5 ("Flow Control") to determine the value for the acknowledgement chunk's bufferBlocksAvailable field. 3.6.3.4.4. Delayed Acknowledgement As discussed in Section 3.6.3.4.1 ("Timing"), a flow receiver can delay sending an acknowledgement for up to 200 milliseconds after receiving user data. The method described in Section 3.6.3.2 ("Receiving Data") sets the session's DELACK_ALARM. When DELACK_ALARM fires, set ACK_NOW to true. 3.6.3.4.5. Obligatory Acknowledgement One or more acknowledgements should be sent as soon as is practical when the session's ACK_NOW flag is set. While the ACK_NOW flag is set: 1. Choose a receiving flow that is ready to send an acknowledgement; 2. If there is no such flow, there is no work to do, set ACK_NOW to false, set RX_DATA_PACKETS to 0, clear the DELACK_ALARM, and stop; otherwise, 3. Start a new packet; 4. Assemble an acknowledgement for the flow and include it in the packet, truncating it if necessary so that the packet doesn't exceed the path MTU; 5. Set flow.SHOULD_ACK to false; 6. Set flow.PREV_RWND to the bufferBlocksAvailable field of the included acknowledgement chunk; 7. Attempt to piggyback acknowledgements for any other flows that are ready to send an acknowledgement into the packet, as described below; and 8. Send the packet. 3.6.3.4.6. Opportunistic Acknowledgement When sending a packet with user data or an acknowledgement, any other receiving flows that are ready to send an acknowledgement should include their acknowledgements in the packet if possible. To piggyback acknowledgements in a packet that is already being sent, where the packet contains user data or an acknowledgement, while there is at least one receiving flow that is ready to send an acknowledgement: 1. Assemble an acknowledgement for the flow; 2. If the acknowledgement cannot be included in the packet without exceeding the path MTU, the packet is full; stop. Otherwise, 3. Include the acknowledgement in the packet; 4. Set flow.SHOULD_ACK to false; 5. Set flow.PREV_RWND to the bufferBlocksAvailable field of the included acknowledgement chunk; and 6. If there are no longer any receiving flows in the session that are ready to send an acknowledgement, set session.ACK_NOW to false, set session.RX_DATA_PACKETS to 0, and clear session.DELACK_ALARM. 3.6.3.4.7. Example Figure 23 shows an example flow with sequence numbers 31 and 33 lost in transit; 31 is abandoned, and 33 is retransmitted. Receiver 1 |<--- Data ID=3, seq#=29, fsnOff=11 (fsn=18) 2 |<--- Data ID=3, seq#=30, fsnOff=12 (fsn=18) 3 |---> Ack ID=3, seq:0-30 4 |<--- Data ID=3, seq#=32, fsnOff=12 (fsn=20) 5 |---> Ack ID=3, seq:0-30, 32 6 |<--- Data ID=3, seq#=34, fsnOff=12 (fsn=22) 7 |---> Ack ID=3, seq:0-30, 32, 34 | : 8 |<--- Data ID=3, seq#=46, fsnOff=16 (fsn=30) 9 |---> Ack ID=3, seq:0-30, 32, 34-46 10 |<--- Data ID=3, seq#=47, fsnOff=15 (fsn=32) 11 |---> Ack ID=3, seq:0-32, 34-47 12 |<--- Data ID=3, seq#=33, fsnOff=1 (fsn=32) 13 |---> Ack ID=3, seq#=0-47 14 |<--- Data ID=3, seq#=48, fsnOff=16 (fsn=32) 15 |<--- Data ID=3, seq#=49, fsnOff=17 (fsn=32) 16 |---> Ack ID=3, seq#=0-49 | : Figure 23: Flow Example with Loss 3.6.3.5. Flow Control The flow receiver maintains a buffer for reassembling and reordering messages for delivery to the user (Section 3.6.3.3). The implementation and the user may wish to limit the amount of resources (including buffer memory) that a flow is allowed to use. RTMFP provides a means for each receiving flow to govern the amount of data sent by the sender, by way of the bufferBytesAvailable derived field of acknowledgement chunks (Sections 2.3.13 and 2.3.14). This derived field indicates the amount of data that the sender is allowed to have outstanding in the network, until instructed otherwise. This amount is also called the receive window. The flow receiver can suspend the sender by advertising a closed (zero length) receive window. The user can suspend delivery of messages from the receiving flow (Section 3.6.3.3). This can cause the receive buffer to fill. In order for progress to be made on completing a fragmented message or repairing a gap for sequenced delivery in a flow, the flow receiver MUST advertise at least one buffer block in an acknowledgement if it is not suspended, even if the amount of data in the buffer exceeds the buffer capacity, unless the buffer capacity is 0. Otherwise, deadlock can occur, as the receive buffer will stay full and won't drain because of a gap or incomplete message, and the gap or incomplete message can't be repaired or completed because the sender is suspended. The receive window is advertised in units of 1024-byte blocks. For example, advertisements for 1 byte, 1023 bytes, and 1024 bytes each require one block. An advertisement for 1025 bytes requires two blocks. The following describes the RECOMMENDED method of calculating the bufferBlocksAvailable field of an acknowledgement chunk for a receiving flow: 1. If BUFFERED_SIZE is greater than or equal to BUFFER_CAPACITY, set ADVERTISE_BYTES to 0; 2. If BUFFERED_SIZE is less than BUFFER_CAPACITY, set ADVERTISE_BYTES to BUFFER_CAPACITY - BUFFERED_SIZE; 3. Set ADVERTISE_BLOCKS to CEIL(ADVERTISE_BYTES / 1024); 4. If ADVERTISE_BLOCKS is 0, AND BUFFER_CAPACITY is greater than 0, AND delivery to the user is not suspended, set ADVERTISE_BLOCKS to 1; and 5. Set the acknowledgement's bufferBlocksAvailable field to ADVERTISE_BLOCKS. 3.6.3.6. Receiving a Buffer Probe A Buffer Probe chunk (Section 2.3.15) is sent by the flow sender (Section 3.6.2.9.1) to request the current receive window advertisement (in the form of an acknowledgement) from the flow receiver. On receipt of a Buffer Probe chunk: 1. If chunk.flowID doesn't belong to a receiving flow in the same session in the RF_OPEN, RF_REJECTED, or RF_COMPLETE_LINGER state, ignore this Buffer Probe; otherwise, 2. Retrieve the receiving flow context for the flow indicated by chunk.flowID; then 3. Set flow.SHOULD_ACK to true; and 4. Set session.ACK_NOW to true. 3.6.3.7. Rejecting a Flow A receiver can reject an RF_OPEN flow at any time and for any reason. To reject a receiving flow in the RF_OPEN state: 1. Move to the RF_REJECTED state; 2. Discard all entries in flow.RECV_BUFFER, as they are no longer relevant; 3. If the user rejected the flow, set flow.EXCEPTION_CODE to the exception code indicated by the user; otherwise, the flow was rejected automatically by the implementation, so the exception code is 0; 4. Set flow.SHOULD_ACK to true; and 5. Set session.ACK_NOW to true. The receiver indicates that it has rejected a flow by sending a Flow Exception Report chunk (Section 2.3.16) with every acknowledgement (Section 3.6.3.4.3) for a flow in the RF_REJECTED state. 3.6.3.8. Close A receiving flow is complete when every sequence number from 0 through and including the final sequence number has been received -- that is, when flow.RF_FINAL_SN has a value and flow.SEQUENCE_SET contains every sequence number from 0 through flow.RF_FINAL_SN, inclusive. When an RF_OPEN or RF_REJECTED receiving flow becomes complete, move to the RF_COMPLETE_LINGER state, set flow.SHOULD_ACK to true, and set session.ACK_NOW to true. A receiving flow SHOULD remain in the RF_COMPLETE_LINGER state for 120 seconds. After 120 seconds, move to the RF_CLOSED state. The receiving flow is now closed, and its resources can be reclaimed once all complete messages in flow.RECV_BUFFER have been delivered to the user (Section 3.6.3.3). The same flow ID might be used for a new flow by the sender after this point. Discussion: The flow sender detects that the flow is complete on receiving an acknowledgement of all fragment sequence numbers of the flow. This can't happen until after the receiver has detected that the flow is complete and acknowledged all of the sequence numbers. The receiver's RF_COMPLETE_LINGER period is two minutes (one Maximum Segment Lifetime (MSL)); this period allows any in-flight packets to drain from the network without being misidentified and gives the sender an opportunity to retransmit any sequence numbers if the completing acknowledgement is lost. The sender's F_COMPLETE_LINGER period is at least two minutes plus 10 seconds and doesn't begin until the completing acknowledgement is received; therefore, the same flow identifier won't be reused by the flow sender for a new sending flow for at least 10 seconds after the flow receiver has closed the receiving flow context. This ensures correct operation independent of network delay, even when the sender's clock runs up to 8 percent faster than the receiver's. 4. IANA Considerations This memo specifies chunk type code values (Section 2.3) and User Data option type code values (Section 2.3.11.1). These type code values are assigned and maintained by Adobe. Therefore, this memo has no IANA actions. 5. Security Considerations This memo specifies a general framework that can be used to establish a confidential and authenticated session between endpoints. A Cryptography Profile, not specified herein, defines the cryptographic algorithms, data formats, and semantics as used within this framework. Designing a Cryptography Profile to ensure that communications are protected to the degree required by the application-specific threat model is outside the scope of this specification. A block cipher in CBC mode is RECOMMENDED for packet encryption (Section 2.2.3). An attacker can predict the values of some fields from one plain RTMFP packet to the next or predict that some fields may be the same from one packet to the next. This SHOULD be considered in choosing and implementing a packet encryption cipher and mode. The well-known Default Session Key of a Cryptography Profile serves multiple purposes, including the scrambling of session startup packets to protect interior fields from undesirable modification by middleboxes such as NATs, increasing the effort required for casual passive observation of startup packets, and allowing different applications of RTMFP using different Default Session Keys to (intentionally or not) share network transport addresses without interference. The Default Session Key, being well known, MUST NOT be construed to contribute to the security of session startup; session startup is essentially in the clear. Section 3.5.4.2 describes an OPTIONAL method for processing a change of network address of a communicating peer. Securely processing address mobility using that method, or any substantially similar method, REQUIRES at least that the packet encryption function of the Cryptography Profile (Section 2.2.3) employs a cryptographic verification mechanism comprising secret information known only to the two endpoints. Without this constraint, that method, or any substantially similar method, becomes "session hijacking support". Flows and packet fragmentation imply semantics that could cause unbounded resource utilization in receivers, causing a denial of service. Implementations SHOULD guard against unbounded or excessive resource use and abort sessions that appear abusive. A rogue but popular Redirector (Section 3.5.1.4) could direct session initiators to flood a victim address or network with Initiator Hello packets, potentially causing a denial of service. An attacker that can passively observe an IHello and that possesses a certificate matching the Endpoint Discriminator (without having to know the private key, if any, associated with it) can deny the initiator access to the desired responder by sending an RHello before the desired responder does, since only the first received RHello is selected by the initiator. The attacker needn't forge the desired responder's source address, since the RHello is selected based on the tag echo and not the packet's source address. This can simplify the attack in some network or host configurations. An attacker that can passively observe and record the packets of an established session can use traffic analysis techniques to infer the start and completion of flows without decrypting the packets. The User Data fragments of flows have unique sequence numbers, so flows are immune to replay while they are open. However, once a flow has completed and the linger period has concluded, the attacker could replay the recorded packets, opening a new flow in the receiver and duplicating the flow's data; this replay might have undesirable effects on the receiver's application. The attacker could also infer that a new flow has begun reusing the recorded flow's identifier and replay the final sequence number or any of the other fragments in the flow, potentially denying or interfering with legitimate traffic to the receiver. Therefore, the data integrity aspect of packet encryption SHOULD comprise anti-replay measures. 6. Acknowledgements Special thanks go to Matthew Kaufman for his contributions to the creation and design of RTMFP. Thanks to Jari Arkko, Ben Campbell, Wesley Eddy, Stephen Farrell, Philipp Hancke, Bela Lubkin, Hilarie Orman, Richard Scheffenegger, and Martin Stiemerling for their detailed reviews of this memo. 7. References 7.1. Normative References [CBC] Dworkin, M., "Recommendation for Block Cipher Modes of Operation", NIST Special Publication 800-38A, December 2001, <http://csrc.nist.gov/publications/ nistpubs/800-38a/sp800-38a.pdf>. [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, August 1980. [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981. [RFC1122] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998. [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, September 2000. [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, March 2007. [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009. 7.2. Informative References [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, "Session Traversal Utilities for NAT (STUN)", RFC 5389, October 2008. [ScalableTCP] Kelly, T., "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", December 2002, <http://datatag.web.cern.ch/datatag/papers/ pfldnet2003-ctk.pdf>. Appendix A. Example Congestion Control Algorithm As mandated in Section 3.5.2, an RTMFP is required to use TCP- compatible congestion control, but flexibility in exact implementation is allowed, within certain limits. This section describes an experimental window-based congestion control algorithm that is appropriate for real-time and bulk data transport in RTMFP. The algorithm includes slow start and congestion avoidance phases, including modified increase and decrease parameters. These parameters are further adjusted according to whether real-time data is being sent and whether Time Critical Reverse Notifications are received. A.1. Discussion RFC 5681 defines the standard window-based congestion control algorithms for TCP. These algorithms are appropriate for delay- insensitive bulk data transport but have undesirable behaviors for delay- and loss-sensitive applications. Among the undesirable behaviors are the cutting of the congestion window in half during a loss event, and the rapidity of the slow start algorithm's exponential growth. Cutting the congestion window in half requires a large channel headroom to support a real-time application and can cause a large amount of jitter from sender-side buffering. Doubling the congestion window during the slow start phase can lead to the congestion window temporarily growing to twice the size it should be, causing a period of excessive loss in the path. We found that a number of deployed TCP implementations use the method of equation (3) from Section 3.1 of RFC 5681; this method, when combined with the recommended behavior of acknowledging every other packet, causes the congestion window to grow at approximately half the rate that the recommended method specifies. In order to compete fairly with these deployed TCPs, we choose 768 bytes per round trip as the increment during the normal congestion avoidance phase; this is approximately half of the typical maximum segment size of 1500 bytes and is also easily subdivided. The sender may be sending real-time data to the far end. When sending real-time data, a smoother response to congestion is desired while still competing with reasonable fairness to other flows in the Internet. In order to scale the sending rate quickly, the slow start algorithm is desired, but slow start's normal rate of increase can cause excessive loss in the last round trip. Accordingly, slow start's exponential increase rate is adjusted to double approximately every 3 round trips instead of every round trip. The multiplicative decrease cuts the congestion window by one eighth on loss to maintain a smoother sending rate. The additive increase is done at half the normal rate (incrementing at 384 bytes per round trip), to both compensate for the less aggressive loss response and probe the path capacity more gently. The far end may report that it is receiving real-time data from other peers, or the sender may be sending real-time data to other far ends. In these circumstances (if not sending real-time data to this far end), it is desirable to respond differently than the standard TCP algorithms specify, to both yield capacity to the real-time flows and avoid excessive losses while probing the path capacity. Slow start's exponential increase is disabled, and the additive increase is done at half the normal rate (incrementing at 384 bytes per round trip). Multiplicative decrease is left at the normal rate (cutting by half) to yield to other flows. Since real-time messages may be small, and sent regularly, it is advantageous to spread congestion window increases out across the round-trip time instead of doing them all at once. We divide the round trip into 16 segments with an additive increase of a useful size (48 bytes) per segment. Scalable TCP [ScalableTCP] describes experimental methods of modifying the additive increase and multiplicative decrease of the congestion window in large delay-bandwidth scenarios. The congestion window is increased by 1% each round trip and decreased by one eighth on loss in the congestion avoidance phase in certain circumstances (specifically, when a 1% increase is larger than the normal additive- increase amount). Those methods are adapted here. The scalable increase amount is 48 bytes for every 4800 bytes acknowledged, to spread the increase out over the round trip. The congestion window is decreased by one eighth on loss when it is at least 67200 bytes per round trip, which is seven eighths of 76800 (the point at which 1% is greater than 768 bytes per round trip). When sending real-time data to the far end, the scalable increase is 1% or 384 bytes per round trip, whichever is greater. Otherwise, when notified that the far end is receiving real-time data from other peers, the scaled increase is adjusted to 0.5% or 384 bytes per round trip, whichever is greater. A.2. Algorithm Let SMSS denote the Sender Maximum Segment Size [RFC5681], for example 1460 bytes. Let CWND_INIT denote the Initial Congestion Window (IW) according to Section 3.1 of RFC 5681, for example 4380 bytes. Let CWND_TIMEDOUT denote the congestion window after a timeout indicating lost data, being 1*SMSS (for example, 1460 bytes). Let the session information context contain additional variables: o CWND: the congestion window, initialized to CWND_INIT; o SSTHRESH: the slow start threshold, initialized to positive infinity; o ACKED_BYTES_ACCUMULATOR: a count of acknowledged bytes, initialized to 0; o ACKED_BYTES_THIS_PACKET: a count of acknowledged bytes observed in the current packet; o PRE_ACK_OUTSTANDING: the number of bytes outstanding in the network before processing any acknowledgements in the current packet; o ANY_LOSS: an indication of whether any loss has been detected in the current packet; o ANY_NAKS: an indication of whether any negative acknowledgements have been detected in the current packet; o ANY_ACKS: an indication of whether any acknowledgement chunks have been received in the current packet. Let FASTGROW_ALLOWED indicate whether the congestion window is allowed to grow at the normal rate versus a slower rate, being false if a Time Critical Reverse Notification has been received on this session within the last 800 milliseconds (Sections 2.2.4 and 3.5.2.1) or if a Time Critical Forward Notification has been sent on ANY session in the last 800 milliseconds, and otherwise being true. Let TC_SENT indicate whether a Time Critical Forward Notification has been sent on this session within the last 800 milliseconds. Implement the method described in Section 3.6.2.6 to manage transmission timeouts, including setting the TIMEOUT_ALARM. On being notified that the TIMEOUT_ALARM has fired, perform the function shown in Figure 24: on TimeoutNotification(WAS_LOSS): set SSTHRESH to MAX(SSTHRESH, CWND * 3/4). set ACKED_BYTES_ACCUMULATOR to 0. if WAS_LOSS is true: set CWND to CWND_TIMEDOUT. else: set CWND to CWND_INIT. Figure 24: Pseudocode for Handling a Timeout Notification Before processing each received packet in this session: 1. Set ANY_LOSS to false; 2. Set ANY_NAKS to false; 3. Set ACKED_BYTES_THIS_PACKET to 0; and 4. Set PRE_ACK_OUTSTANDING to S_OUTSTANDING_BYTES. On notification of loss (Section 3.6.2.5), set ANY_LOSS to true. On notification of negative acknowledgement (Section 3.6.2.5), set ANY_NAKS to true. On notification of acknowledgement of data (Section 3.6.2.4), set ANY_ACKS to true, and add the count of acknowledged bytes to ACKED_BYTES_THIS_PACKET. After processing all chunks in each received packet for this session, perform the function shown in Figure 25: if ANY_LOSS is true: if (TC_SENT is true) OR (PRE_ACK_OUTSTANDING > 67200 AND \ FASTGROW_ALLOWED is true): set SSTHRESH to MAX(PRE_ACK_OUTSTANDING * 7/8, CWND_INIT). else: set SSTHRESH to MAX(PRE_ACK_OUTSTANDING * 1/2, CWND_INIT). set CWND to SSTHRESH. set ACKED_BYTES_ACCUMULATOR to 0. else if (ANY_ACKS is true) AND (ANY_NAKS is false) AND \ (PRE_ACK_OUTSTANDING >= CWND): set var INCREASE to 0. var AITHRESH. if FASTGROW_ALLOWED is true: if CWND < SSTHRESH: set INCREASE to ACKED_BYTES_THIS_PACKET. else: add ACKED_BYTES_THIS_PACKET to ACKED_BYTES_ACCUMULATOR. set AITHRESH to MIN(MAX(CWND / 16, 64), 4800). while ACKED_BYTES_ACCUMULATOR >= AITHRESH: subtract AITHRESH from ACKED_BYTES_ACCUMULATOR. add 48 to INCREASE. else FASTGROW_ALLOWED is false: if CWND < SSTHRESH AND TC_SENT is true: set INCREASE to CEIL(ACKED_BYTES_THIS_PACKET / 4). else: var AITHRESH_CAP. if TC_SENT is true: set AITHRESH_CAP to 2400. else: set AITHRESH_CAP to 4800. add ACKED_BYTES_THIS_PACKET to ACKED_BYTES_ACCUMULATOR. set AITHRESH to MIN(MAX(CWND / 16, 64), AITHRESH_CAP). while ACKED_BYTES_ACCUMULATOR >= AITHRESH: subtract AITHRESH from ACKED_BYTES_ACCUMULATOR. add 24 to INCREASE. set CWND to MAX(CWND + MIN(INCREASE, SMSS), CWND_INIT). Figure 25: Pseudocode for Congestion Window Adjustment after Processing a Packet Author's Address Michael C. Thornburgh Adobe Systems Incorporated 345 Park Avenue San Jose, CA 95110-2704 US Phone: +1 408 536 6000 EMail: mthornbu@adobe.com URI: http://www.adobe.com/