rfc9040.original | rfc9040.txt | |||
---|---|---|---|---|
TCPM WG J. Touch | ||||
Internet Draft Independent | ||||
Intended status: Informational M. Welzl | ||||
Obsoletes: 2140 S. Islam | ||||
Expires: October 2021 University of Oslo | ||||
April 12, 2021 | ||||
TCP Control Block Interdependence | ||||
draft-ietf-tcpm-2140bis-11.txt | ||||
Status of this Memo | Internet Engineering Task Force (IETF) J. Touch | |||
Request for Comments: 9040 Independent | ||||
Obsoletes: 2140 M. Welzl | ||||
Category: Informational S. Islam | ||||
ISSN: 2070-1721 University of Oslo | ||||
July 2021 | ||||
This Internet-Draft is submitted in full conformance with the | TCP Control Block Interdependence | |||
provisions of BCP 78 and BCP 79. | ||||
This document may contain material from IETF Documents or IETF | Abstract | |||
Contributions published or made publicly available before November | ||||
10, 2008. The person(s) controlling the copyright in some of this | ||||
material may not have granted the IETF Trust the right to allow | ||||
modifications of such material outside the IETF Standards Process. | ||||
Without obtaining an adequate license from the person(s) controlling | ||||
the copyright in such materials, this document may not be modified | ||||
outside the IETF Standards Process, and derivative works of it may | ||||
not be created outside the IETF Standards Process, except to format | ||||
it for publication as an RFC or to translate it into languages other | ||||
than English. | ||||
Internet-Drafts are working documents of the Internet Engineering | This memo provides guidance to TCP implementers that is intended to | |||
Task Force (IETF), its areas, and its working groups. Note that | help improve connection convergence to steady-state operation without | |||
other groups may also distribute working documents as Internet- | affecting interoperability. It updates and replaces RFC 2140's | |||
Drafts. | description of sharing TCP state, as typically represented in TCP | |||
Control Blocks, among similar concurrent or consecutive connections. | ||||
Internet-Drafts are draft documents valid for a maximum of six | Status of This Memo | |||
months and may be updated, replaced, or obsoleted by other documents | ||||
at any time. It is inappropriate to use Internet-Drafts as | ||||
reference material or to cite them other than as "work in progress." | ||||
The list of current Internet-Drafts can be accessed at | This document is not an Internet Standards Track specification; it is | |||
http://www.ietf.org/ietf/1id-abstracts.txt | published for informational purposes. | |||
The list of Internet-Draft Shadow Directories can be accessed at | This document is a product of the Internet Engineering Task Force | |||
http://www.ietf.org/shadow.html | (IETF). It represents the consensus of the IETF community. It has | |||
received public review and has been approved for publication by the | ||||
Internet Engineering Steering Group (IESG). Not all documents | ||||
approved by the IESG are candidates for any level of Internet | ||||
Standard; see Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on October 12, 2021. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9040. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with | carefully, as they describe your rights and restrictions with respect | |||
respect to this document. Code Components extracted from this | to this document. Code Components extracted from this document must | |||
document must include Simplified BSD License text as described in | include Simplified BSD License text as described in Section 4.e of | |||
Section 4.e of the Trust Legal Provisions and are provided | the Trust Legal Provisions and are provided without warranty as | |||
without warranty as described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Abstract | ||||
This memo provides guidance to TCP implementers that is intended to | ||||
help improve connection convergence to steady-state operation | ||||
without affecting interoperability. It updates and replaces RFC | ||||
2140's description of sharing TCP state, as typically represented in | ||||
TCP Control Blocks, among similar concurrent or consecutive | ||||
connections. | ||||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
2. Conventions Used in This Document | 2. Conventions Used in This Document | |||
3. Terminology | 3. Terminology | |||
4. The TCP Control Block (TCB) | 4. The TCP Control Block (TCB) | |||
5. TCB Interdependence | 5. TCB Interdependence | |||
6. Temporal Sharing | 6. Temporal Sharing | |||
6.1. Initialization of a new TCB | 6.1. Initialization of a New TCB | |||
6.2. Updates to the TCB cache | 6.2. Updates to the TCB Cache | |||
6.3. Discussion | 6.3. Discussion | |||
7. Ensemble Sharing | 7. Ensemble Sharing | |||
7.1. Initialization of a new TCB | 7.1. Initialization of a New TCB | |||
7.2. Updates to the TCB cache | 7.2. Updates to the TCB Cache | |||
7.3. Discussion | 7.3. Discussion | |||
8. Issues with TCB information sharing | 8. Issues with TCB Information Sharing | |||
8.1. Traversing the same network path | 8.1. Traversing the Same Network Path | |||
8.2. State dependence | 8.2. State Dependence | |||
8.3. Problems with sharing based on IP address | 8.3. Problems with Sharing Based on IP Address | |||
9. Implications | 9. Implications | |||
9.1. Layering | 9.1. Layering | |||
9.2. Other possibilities | 9.2. Other Possibilities | |||
10. Implementation Observations | 10. Implementation Observations | |||
11. Changes Compared to RFC 2140 | 11. Changes Compared to RFC 2140 | |||
12. Security Considerations | 12. Security Considerations | |||
13. IANA Considerations | 13. IANA Considerations | |||
14. References | 14. References | |||
14.1. Normative References | 14.1. Normative References | |||
14.2. Informative References | 14.2. Informative References | |||
15. Acknowledgments | Appendix A. TCB Sharing History | |||
16. Change log | Appendix B. TCP Option Sharing and Caching | |||
Appendix A : TCB Sharing History | Appendix C. Automating the Initial Window in TCP over Long | |||
Appendix B : TCP Option Sharing and Caching | Timescales | |||
Appendix C : Automating the Initial Window in TCP over Long | C.1. Introduction | |||
Timescales | C.2. Design Considerations | |||
C.1. Introduction | C.3. Proposed IW Algorithm | |||
C.2. Design Considerations | C.4. Discussion | |||
C.3. Proposed IW Algorithm | C.5. Observations | |||
C.4. Discussion | Acknowledgments | |||
C.5. Observations | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
TCP is a connection-oriented reliable transport protocol layered | TCP is a connection-oriented reliable transport protocol layered over | |||
over IP [RFC793]. Each TCP connection maintains state, usually in a | IP [RFC0793]. Each TCP connection maintains state, usually in a data | |||
data structure called the TCP Control Block (TCB). The TCB contains | structure called the "TCP Control Block (TCB)". The TCB contains | |||
information about the connection state, its associated local | information about the connection state, its associated local process, | |||
process, and feedback parameters about the connection's transmission | and feedback parameters about the connection's transmission | |||
properties. As originally specified and usually implemented, most | properties. As originally specified and usually implemented, most | |||
TCB information is maintained on a per-connection basis. Some | TCB information is maintained on a per-connection basis. Some | |||
implementations share certain TCB information across connections to | implementations share certain TCB information across connections to | |||
the same host [RFC2140]. Such sharing is intended to lead to better | the same host [RFC2140]. Such sharing is intended to lead to better | |||
overall transient performance, especially for numerous short-lived | overall transient performance, especially for numerous short-lived | |||
and simultaneous connections, as can be used in the World-Wide Web | and simultaneous connections, as can be used in the World Wide Web | |||
and other applications [Be94][Br02]. This sharing of state is | and other applications [Be94] [Br02]. This sharing of state is | |||
intended to help TCP connections converge to long term behavior | intended to help TCP connections converge to long-term behavior | |||
(assuming stable application load, i.e., so-called "steady-state") | (assuming stable application load, i.e., so-called "steady-state") | |||
more quickly without affecting TCP interoperability. | more quickly without affecting TCP interoperability. | |||
This document updates RFC 2140's discussion of TCB state sharing and | This document updates RFC 2140's discussion of TCB state sharing and | |||
provides a complete replacement for that document. This state | provides a complete replacement for that document. This state | |||
sharing affects only TCB initialization [RFC2140] and thus has no | sharing affects only TCB initialization [RFC2140] and thus has no | |||
effect on the long-term behavior of TCP after a connection has been | effect on the long-term behavior of TCP after a connection has been | |||
established nor on interoperability. Path information shared across | established or on interoperability. Path information shared across | |||
SYN destination port numbers assumes that TCP segments having the | SYN destination port numbers assumes that TCP segments having the | |||
same host-pair experience the same path properties, i.e., that | same host-pair experience the same path properties, i.e., that | |||
traffic is not routed differently based on port numbers or other | traffic is not routed differently based on port numbers or other | |||
connection parameters (also addressed further in Section 8.1). The | connection parameters (also addressed further in Section 8.1). The | |||
observations about TCB sharing in this document apply similarly to | observations about TCB sharing in this document apply similarly to | |||
any protocol with congestion state, including SCTP [RFC4960] and | any protocol with congestion state, including the Stream Control | |||
DCCP [RFC4340], as well as for individual subflows in Multipath TCP | Transmission Protocol (SCTP) [RFC4960] and the Datagram Congestion | |||
[RFC8684]. | Control Protocol (DCCP) [RFC4340], as well as to individual subflows | |||
in Multipath TCP [RFC8684]. | ||||
2. Conventions Used in This Document | 2. Conventions Used in This Document | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
The core of this document describes behavior that is already | The core of this document describes behavior that is already | |||
permitted by TCP standards. As a result, it provides informative | permitted by TCP standards. As a result, this document provides | |||
guidance but does not use normative language, except when quoting | informative guidance but does not use normative language except when | |||
other documents. Normative language is used in Appendix C as | quoting other documents. Normative language is used in Appendix C as | |||
examples of requirements for future consideration. | examples of requirements for future consideration. | |||
3. Terminology | 3. Terminology | |||
The following terminology is used frequently in this document. Items | The following terminology is used frequently in this document. Items | |||
preceded with a "+" may be part of the state maintained as TCP | preceded with a "+" may be part of the state maintained as TCP | |||
connection state in the associated connections TCB and are the focus | connection state in the TCB of associated connections and are the | |||
of sharing as described in this document. Note that terms are used | focus of sharing as described in this document. Note that terms are | |||
as originally introduced where possible; in some cases, direction is | used as originally introduced where possible; in some cases, | |||
indicated with a suffix (_S for send, _R for receive) and in other | direction is indicated with a suffix (_S for send, _R for receive) | |||
cases spelled out (sendcwnd). | and in other cases spelled out (sendcwnd). | |||
+cwnd - TCP congestion window size [RFC5681] | +cwnd: TCP congestion window size [RFC5681] | |||
host - a source or sink of TCP segments associated with a single IP | host: a source or sink of TCP segments associated with a single IP | |||
address | address | |||
host-pair - a pair of hosts and their corresponding IP addresses | host-pair: a pair of hosts and their corresponding IP addresses | |||
+MMS_R - maximum message size that can be received, the largest | ISN: Initial Sequence Number | |||
received transport payload of an IP datagram [RFC1122] | ||||
+MMS_S - maximum message size that can be sent, the largest | +MMS_R: maximum message size that can be received, the largest | |||
transmitted transport payload of an IP datagram [RFC1122] | received transport payload of an IP datagram [RFC1122] | |||
path - an Internet path between the IP addresses of two hosts | +MMS_S: maximum message size that can be sent, the largest | |||
transmitted transport payload of an IP datagram [RFC1122] | ||||
PCB - protocol control block, the data associated with a protocol as | path: an Internet path between the IP addresses of two hosts | |||
maintained by an endpoint; a TCP PCB is called a TCB | ||||
PLPMTUD - packetization-layer path MTU discovery, a mechanism that | ||||
uses transport packets to discover the PMTU [RFC4821] | ||||
+PMTU - largest IP datagram that can traverse a path | PCB: protocol control block, the data associated with a protocol as | |||
[RFC1191][RFC8201] | maintained by an endpoint; a TCP PCB is called a "TCB" | |||
PMTUD - path-layer MTU discovery, a mechanism that relies on ICMP | PLPMTUD: packetization-layer path MTU discovery, a mechanism that | |||
error messages to discover the PMTU [RFC1191][RFC8201] | uses transport packets to discover the Path Maximum | |||
Transmission Unit (PMTU) [RFC4821] | ||||
+RTT - round-trip time of a TCP packet exchange [RFC793] | +PMTU: largest IP datagram that can traverse a path [RFC1191] | |||
[RFC8201] | ||||
+RTTVAR - variation of round-trip times of a TCP packet exchange | PMTUD: path-layer MTU discovery, a mechanism that relies on ICMP | |||
[RFC6298] | error messages to discover the PMTU [RFC1191] [RFC8201] | |||
+rwnd - TCP receive window size [RFC5681] | +RTT: round-trip time of a TCP packet exchange [RFC0793] | |||
+sendcwnd - TCP send-side congestion window (cwnd) size [RFC5681] | +RTTVAR: variation of round-trip times of a TCP packet exchange | |||
[RFC6298] | ||||
+sendMSS - TCP maximum segment size, a value transmitted in a TCP | +rwnd: TCP receive window size [RFC5681] | |||
option that represents the largest TCP user data payload that can be | ||||
received [RFC6691] | ||||
+ssthresh - TCP slow-start threshold [RFC5681] | +sendcwnd: TCP send-side congestion window (cwnd) size [RFC5681] | |||
TCB - TCP Control Block, the data associated with a TCP connection | +sendMSS: TCP maximum segment size, a value transmitted in a TCP | |||
as maintained by an endpoint | option that represents the largest TCP user data payload that | |||
can be received [RFC6691] | ||||
TCP-AO - TCP Authentication Option [RFC5925] | +ssthresh: TCP slow-start threshold [RFC5681] | |||
TFO - TCP Fast Open option [RFC7413] | TCB: TCP Control Block, the data associated with a TCP connection as | |||
maintained by an endpoint | ||||
+TFO_cookie - TCP Fast Open cookie, state that is used as part of | TCP-AO: TCP Authentication Option [RFC5925] | |||
the TFO mechanism, when TFO is supported [RFC7413] | ||||
+TFO_failure - an indication of when TFO option negotiation failed, | TFO: TCP Fast Open option [RFC7413] | |||
when TFO is supported | ||||
+TFOinfo - information cached when a TFO connection is established, | +TFO_cookie: TCP Fast Open cookie, state that is used as part of the | |||
which includes the TFO_cookie [RFC7413] | TFO mechanism, when TFO is supported [RFC7413] | |||
4. The TCP Control Block (TCB) | +TFO_failure: an indication of when TFO option negotiation failed, | |||
when TFO is supported | ||||
+TFOinfo: information cached when a TFO connection is established, | ||||
which includes the TFO_cookie [RFC7413] | ||||
4. The TCP Control Block (TCB) | ||||
A TCB describes the data associated with each connection, i.e., with | A TCB describes the data associated with each connection, i.e., with | |||
each association of a pair of applications across the network. The | each association of a pair of applications across the network. The | |||
TCB contains at least the following information [RFC793]: | TCB contains at least the following information [RFC0793]: | |||
Local process state | Local process state | |||
pointers to send and receive buffers | ||||
pointers to retransmission queue and current segment | ||||
pointers to Internet Protocol (IP) PCB | ||||
Per-connection shared state | ||||
macro-state | ||||
connection state | ||||
timers | ||||
flags | ||||
local and remote host numbers and ports | ||||
TCP option state | ||||
micro-state | ||||
send and receive window state (size*, current number) | ||||
congestion window size (sendcwnd)* | ||||
congestion window size threshold (ssthresh)* | ||||
max window size seen* | ||||
sendMSS# | ||||
MMS_S# | ||||
MMS_R# | ||||
PMTU# | ||||
round-trip time and its variation# | ||||
The per-connection information is shown as split into macro-state | pointers to send and receive buffers | |||
and micro-state, terminology borrowed from [Co91]. Macro-state | pointers to retransmission queue and current segment | |||
describes the protocol for establishing the initial shared state | pointers to Internet Protocol (IP) PCB | |||
about the connection; we include the endpoint numbers and components | ||||
(timers, flags) required upon commencement that are later used to | Per-connection shared state | |||
help maintain that state. Micro-state describes the protocol after a | ||||
macro-state | ||||
connection state | ||||
timers | ||||
flags | ||||
local and remote host numbers and ports | ||||
TCP option state | ||||
micro-state | ||||
send and receive window state (size*, current number) | ||||
congestion window size (sendcwnd)* | ||||
congestion window size threshold (ssthresh)* | ||||
max window size seen* | ||||
sendMSS# | ||||
MMS_S# | ||||
MMS_R# | ||||
PMTU# | ||||
round-trip time and its variation# | ||||
The per-connection information is shown as split into macro-state and | ||||
micro-state, terminology borrowed from [Co91]. Macro-state describes | ||||
the protocol for establishing the initial shared state about the | ||||
connection; we include the endpoint numbers and components (timers, | ||||
flags) required upon commencement that are later used to help | ||||
maintain that state. Micro-state describes the protocol after a | ||||
connection has been established, to maintain the reliability and | connection has been established, to maintain the reliability and | |||
congestion control of the data transferred in the connection. | congestion control of the data transferred in the connection. | |||
We distinguish two other classes of shared micro-state that are | We distinguish two other classes of shared micro-state that are | |||
associated more with host-pairs than with application pairs. One | associated more with host-pairs than with application pairs. One | |||
class is clearly host-pair dependent (shown above as "#", e.g., | class is clearly host-pair dependent (shown above as "#", e.g., | |||
sendMSS, MMS_R, MMS_S, PMTU, RTT), because these parameters are | sendMSS, MMS_R, MMS_S, PMTU, RTT), because these parameters are | |||
defined by the endpoint or endpoint pair (sendMSS, MMS_R, MMS_S, | defined by the endpoint or endpoint pair (of the given example: | |||
RTT) or are already cached and shared on that basis (PMTU | sendMSS, MMS_R, MMS_S, RTT) or are already cached and shared on that | |||
[RFC1191][RFC4821]). The other is host-pair dependent in its | basis (of the given example: PMTU [RFC1191] [RFC4821]). The other is | |||
aggregate (shown above as "*", e.g., congestion window information, | host-pair dependent in its aggregate (shown above as "*", e.g., | |||
current window sizes, etc.) because they depend on the total | congestion window information, current window sizes, etc.) because | |||
capacity between the two endpoints. | they depend on the total capacity between the two endpoints. | |||
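The two sharing classes described above can be made concrete with a small sketch. The Python fragment below is illustrative only: the class name `TcbMicroState`, the field name `max_window`, and the `SHARING_CLASS` map are hypothetical and are not taken from RFC 793 or from any particular TCP implementation. It simply tags each micro-state field with the "#" (host-pair dependent) or "*" (host-pair dependent in aggregate) marker used in the TCB listing.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Markers mirror the TCB listing:
#   "#" = host-pair dependent, "*" = host-pair dependent in aggregate.
HOST_PAIR = "#"
AGGREGATE = "*"


@dataclass
class TcbMicroState:
    """Hypothetical container for the potentially shareable micro-state."""
    sendcwnd: Optional[int] = None    # congestion window size
    ssthresh: Optional[int] = None    # congestion window size threshold
    max_window: Optional[int] = None  # max window size seen
    sendMSS: Optional[int] = None
    MMS_S: Optional[int] = None
    MMS_R: Optional[int] = None
    PMTU: Optional[int] = None
    RTT: Optional[float] = None       # round-trip time
    RTTVAR: Optional[float] = None    # round-trip time variation


# Assumed mapping from field name to sharing class, following the listing.
SHARING_CLASS = {
    "sendcwnd": AGGREGATE,
    "ssthresh": AGGREGATE,
    "max_window": AGGREGATE,
    "sendMSS": HOST_PAIR,
    "MMS_S": HOST_PAIR,
    "MMS_R": HOST_PAIR,
    "PMTU": HOST_PAIR,
    "RTT": HOST_PAIR,
    "RTTVAR": HOST_PAIR,
}


def fields_in_class(marker: str) -> List[str]:
    """Return the micro-state field names tagged with the given marker."""
    return [f.name for f in fields(TcbMicroState)
            if SHARING_CLASS[f.name] == marker]
```

Under these assumptions, `fields_in_class("#")` lists the parameters that a single connection can share directly, while `fields_in_class("*")` lists those whose shared value has to reflect the total capacity between the two endpoints.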
Not all of the TCB state is necessarily sharable. In particular, | Not all of the TCB state is necessarily shareable. In particular, | |||
some TCP options are negotiated only upon request by the application | some TCP options are negotiated only upon request by the application | |||
layer, so their use may not be correlated across connections. Other | layer, so their use may not be correlated across connections. Other | |||
options negotiate connection-specific parameters, which are | options negotiate connection-specific parameters, which are similarly | |||
similarly not shareable. These are discussed further in Appendix B. | not shareable. These are discussed further in Appendix B. | |||
Finally, we exclude rwnd from further discussion because its value | Finally, we exclude rwnd from further discussion because its value | |||
should depend on the send window size, so it is already addressed by | should depend on the send window size, so it is already addressed by | |||
send window sharing and is not independently affected by sharing. | send window sharing and is not independently affected by sharing. | |||
5. TCB Interdependence | 5. TCB Interdependence | |||
There are two cases of TCB interdependence. Temporal sharing occurs | There are two cases of TCB interdependence. Temporal sharing occurs | |||
when the TCB of an earlier (now CLOSED) connection to a host is used | when the TCB of an earlier (now CLOSED) connection to a host is used | |||
to initialize some parameters of a new connection to that same host, | to initialize some parameters of a new connection to that same host, | |||
i.e., in sequence. Ensemble sharing occurs when a currently active | i.e., in sequence. Ensemble sharing occurs when a currently active | |||
connection to a host is used to initialize another (concurrent) | connection to a host is used to initialize another (concurrent) | |||
connection to that host. | connection to that host. | |||
6. Temporal Sharing | 6. Temporal Sharing | |||
The TCB data cache is accessed in two ways: it is read to initialize | The TCB data cache is accessed in two ways: it is read to initialize | |||
new TCBs and written when more current per-host state is available. | new TCBs and written when more current per-host state is available. | |||
6.1. Initialization of a new TCB | 6.1. Initialization of a New TCB | |||
TCBs for new connections can be initialized using cached context | ||||
from past connections as follows: | ||||
TEMPORAL SHARING - TCB Initialization | ||||
Cached TCB New TCB | ||||
-------------------------------------- | ||||
old_MMS_S old_MMS_S or not cached* | ||||
old_MMS_R old_MMS_R or not cached* | ||||
old_sendMSS old_sendMSS | ||||
old_PMTU old_PMTU+ | ||||
old_RTT old_RTT | ||||
old_RTTVAR old_RTTVAR | ||||
old_option (option specific) | TCBs for new connections can be initialized using cached context from | |||
past connections as follows: | ||||
old_ssthresh old_ssthresh | +==============+=============================+ | |||
| Cached TCB | New TCB | | ||||
+==============+=============================+ | ||||
| old_MMS_S | old_MMS_S or not cached (2) | | ||||
+--------------+-----------------------------+ | ||||
| old_MMS_R | old_MMS_R or not cached (2) | | ||||
+--------------+-----------------------------+ | ||||
| old_sendMSS | old_sendMSS | | ||||
+--------------+-----------------------------+ | ||||
| old_PMTU | old_PMTU (1) | | ||||
+--------------+-----------------------------+ | ||||
| old_RTT | old_RTT | | ||||
+--------------+-----------------------------+ | ||||
| old_RTTVAR | old_RTTVAR | | ||||
+--------------+-----------------------------+ | ||||
| old_option | (option specific) | | ||||
+--------------+-----------------------------+ | ||||
| old_ssthresh | old_ssthresh | | ||||
+--------------+-----------------------------+ | ||||
| old_sendcwnd | old_sendcwnd | | ||||
+--------------+-----------------------------+ | ||||
old_sendcwnd old_sendcwnd | Table 1: Temporal Sharing - TCB Initialization | |||
+Note that PMTU is cached at the IP layer [RFC1191][RFC4821]. | (1) Note that PMTU is cached at the IP layer [RFC1191] [RFC4821]. | |||
*Note that some values are not cached when they are computed locally | ||||
(MMS_R) or indicated in the connection itself (MMS_S in the SYN). | ||||
The table below gives an overview of option-specific information | (2) Note that some values are not cached when they are computed | |||
that can be shared. Additional information on some specific TCP | locally (MMS_R) or indicated in the connection itself (MMS_S in | |||
options and sharing is provided in Appendix B. | the SYN). | |||
TEMPORAL SHARING - Option Info Initialization | Table 2 gives an overview of option-specific information that can be | |||
shared. Additional information on some specific TCP options and | ||||
sharing is provided in Appendix B. | ||||
Cached New | +=================+=================+ | |||
------------------------------------ | | Cached | New | | |||
old_TFO_cookie old_TFO_cookie | +=================+=================+ | |||
| old_TFO_cookie | old_TFO_cookie | | ||||
+-----------------+-----------------+ | ||||
| old_TFO_failure | old_TFO_failure | | ||||
+-----------------+-----------------+ | ||||
old_TFO_failure old_TFO_failure | Table 2: Temporal Sharing - | |||
Option Info Initialization | ||||
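As a rough illustration of the temporal-sharing initialization described in this section, the following Python sketch copies cached values into a new TCB. It is a minimal sketch under stated assumptions, not an implementation from this document: the function `init_tcb_from_cache`, the dictionary-based cache keyed by remote IP address, and the `defaults` argument are all hypothetical, and option-specific state (e.g., TFO) is left out.

```python
from typing import Any, Dict

# Fields copied from the cached TCB into the new TCB when present.
# (PMTU is noted in the text as being cached at the IP layer; it is
# kept in the same dictionary here only to keep the sketch short.)
COPIED_FIELDS = ("sendMSS", "PMTU", "RTT", "RTTVAR", "ssthresh", "sendcwnd")

# Fields that may not be cached at all: per the notes in the text,
# MMS_R is computed locally and MMS_S is indicated in the connection.
MAYBE_CACHED_FIELDS = ("MMS_S", "MMS_R")


def init_tcb_from_cache(cache: Dict[str, Dict[str, Any]],
                        remote_host: str,
                        defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Build a new TCB, preferring cached values for the same remote host.

    `defaults` holds whatever the stack would use without any cache
    (hypothetical values supplied by the caller).
    """
    new_tcb = dict(defaults)          # start from the uncached defaults
    cached = cache.get(remote_host)
    if cached is None:
        return new_tcb                # no earlier connection to this host
    for name in COPIED_FIELDS + MAYBE_CACHED_FIELDS:
        if name in cached:
            new_tcb[name] = cached[name]
    return new_tcb
```

A caller would look up the cache once per new connection; hosts with no cache entry simply fall back to the defaults, so the sketch never changes behavior for first-time peers.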
6.2. Updates to the TCB cache | 6.2. Updates to the TCB Cache | |||
During a connection, the TCB cache can be updated based on events of | During a connection, the TCB cache can be updated based on events of | |||
current connections and their TCBs as they progress over time, as | current connections and their TCBs as they progress over time, as | |||
shown below: | shown in Table 3. | |||
TEMPORAL SHARING - Cache Updates | ||||
Cached TCB Current TCB when? New Cached TCB | ||||
---------------------------------------------------------- | ||||
old_MMS_S curr_MMS_S OPEN curr_MMS_S | ||||
old_MMS_R curr_MMS_R OPEN curr_MMS_R | ||||
old_sendMSS curr_sendMSS MSSopt curr_sendMSS | ||||
old_PMTU curr_PMTU PMTUD+ / curr_PMTU | ||||
PLPMTUD+ | ||||
old_RTT curr_RTT CLOSE merge(curr,old) | ||||
old_RTTVAR curr_RTTVAR CLOSE merge(curr,old) | ||||
old_option curr_option ESTAB (depends on option) | ||||
old_ssthresh curr_ssthresh CLOSE merge(curr,old) | +==============+===============+=============+=================+ | |||
| Cached TCB | Current TCB | When? | New Cached TCB | | ||||
+==============+===============+=============+=================+ | ||||
| old_MMS_S | curr_MMS_S | OPEN | curr_MMS_S | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_MMS_R | curr_MMS_R | OPEN | curr_MMS_R | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_sendMSS | curr_sendMSS | MSSopt | curr_sendMSS | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_PMTU | curr_PMTU | PMTUD (1) / | curr_PMTU | | ||||
| | | PLPMTUD (1) | | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_RTT | curr_RTT | CLOSE | merge(curr,old) | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_RTTVAR | curr_RTTVAR | CLOSE | merge(curr,old) | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_option | curr_option | ESTAB | (depends on | | ||||
| | | | option) | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_ssthresh | curr_ssthresh | CLOSE | merge(curr,old) | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
| old_sendcwnd | curr_sendcwnd | CLOSE | merge(curr,old) | | ||||
+--------------+---------------+-------------+-----------------+ | ||||
old_sendcwnd curr_sendcwnd CLOSE merge(curr,old) | Table 3: Temporal Sharing - Cache Updates | |||
+Note that PMTU is cached at the IP layer [RFC1191][RFC4821]. | (1) Note that PMTU is cached at the IP layer [RFC1191] [RFC4821]. | |||
Merge() is the function that combines the current and previous (old) | Merge() is the function that combines the current and previous (old) | |||
values and may vary for each parameter of the TCB cache. The | values and may vary for each parameter of the TCB cache. The | |||
particular function is not specified in this document; examples | particular function is not specified in this document; examples | |||
include windowed averages (mean of the past N values, for some N) | include windowed averages (mean of the past N values, for some N) and | |||
and exponential decay (new = (1-alpha)*old + alpha*new, where alpha | exponential decay (new = (1-alpha)*old + alpha*new, where alpha is | |||
is in the range [0..1]). | in the range [0..1]). | |||
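
As an illustration only, the two example merge() functions named above
could be sketched as follows. The function and class names, and the
default alpha of 1/8, are assumptions made for this sketch and are not
specified by this document. The sketch also assumes that the "new" on
the right-hand side of the decay formula denotes the current
connection's value.

```python
from collections import deque


def merge_exponential(old, curr, alpha=0.125):
    """Exponential decay: weight alpha on the current value and
    (1 - alpha) on the previously cached (old) value.

    The default alpha is an illustrative assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0..1]")
    return (1.0 - alpha) * old + alpha * curr


class WindowedAverage:
    """Windowed average: mean of the most recent n merged values."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        # A bounded deque silently discards the oldest value once full.
        self._values = deque(maxlen=n)

    def merge(self, curr):
        self._values.append(curr)
        return sum(self._values) / len(self._values)
```

A larger alpha makes the cache track recent connections more closely; a
smaller alpha (or a larger window) damps transient effects.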
The table below gives an overview of option-specific information | ||||
that can be similarly shared. The TFO cookie is maintained until the | ||||
client explicitly requests it be updated as a separate event. | ||||
TEMPORAL SHARING - Option Info Updates | Table 4 gives an overview of option-specific information that can be | |||
similarly shared. The TFO cookie is maintained until the client | ||||
explicitly requests it be updated as a separate event. | ||||
Cached Current when? New Cached | +=================+=================+=======+=================+ | |||
--------------------------------------------------------- | | Cached | Current | When? | New Cached | | |||
old_TFO_cookie old_TFO_cookie ESTAB old_TFO_cookie | +=================+=================+=======+=================+ | |||
| old_TFO_cookie | old_TFO_cookie | ESTAB | old_TFO_cookie | | ||||
+-----------------+-----------------+-------+-----------------+ | ||||
| old_TFO_failure | old_TFO_failure | ESTAB | old_TFO_failure | | ||||
+-----------------+-----------------+-------+-----------------+ | ||||
old_TFO_failure old_TFO_failure ESTAB old_TFO_failure | Table 4: Temporal Sharing - Option Info Updates | |||
6.3. Discussion | 6.3. Discussion | |||
As noted, there is no particular benefit to caching MMS_S and MMS_R | As noted, there is no particular benefit to caching MMS_S and MMS_R | |||
as these are reported by the local IP stack. Caching sendMSS and | as these are reported by the local IP stack. Caching sendMSS and | |||
PMTU is trivial; reported values are cached (PMTU at the IP layer), | PMTU is trivial; reported values are cached (PMTU at the IP layer), | |||
and the most recent values are used. The cache is updated when the | and the most recent values are used. The cache is updated when the | |||
MSS option is received in a SYN or after PMTUD (i.e., when an ICMPv4 | MSS option is received in a SYN or after PMTUD (i.e., when an ICMPv4 | |||
Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is | Fragmentation Needed [RFC1191] or ICMPv6 Packet Too Big message is | |||
received [RFC8201] or the equivalent is inferred, e.g., as from | received [RFC8201] or the equivalent is inferred, e.g., as from | |||
PLPMTUD [RFC4821]), respectively, so the cache always has the most | PLPMTUD [RFC4821]), respectively, so the cache always has the most | |||
recent values from any connection. For sendMSS, the cache is | recent values from any connection. For sendMSS, the cache is | |||
consulted only at connection establishment and not otherwise | consulted only at connection establishment and not otherwise updated, | |||
updated, which means that MSS options do not affect current | which means that MSS options do not affect current connections. The | |||
connections. The default sendMSS is never saved; only reported MSS | default sendMSS is never saved; only reported MSS values update the | |||
values update the cache, so an explicit override is required to | cache, so an explicit override is required to reduce the sendMSS. | |||
reduce the sendMSS. Cached sendMSS affects only data sent in the SYN | Cached sendMSS affects only data sent in the SYN segment, i.e., | |||
segment, i.e., during client connection initiation or during | during client connection initiation or during simultaneous open; the | |||
simultaneous open; all other segment MSS are based on the value | MSS of all other segments is constrained by the value updated as | |||
updated as included in the SYN. | included in the SYN. | |||
RTT values are updated by formulae that merges the old and new | RTT values are updated by formulae that merge the old and new values, | |||
values, as noted in Section 6.2. Dynamic RTT estimation requires a | as noted in Section 6.2. Dynamic RTT estimation requires a sequence | |||
sequence of RTT measurements. As a result, the cached RTT (and its | of RTT measurements. As a result, the cached RTT (and its variation) | |||
variation) is an average of its previous value with the contents of | is an average of its previous value with the contents of the | |||
the currently active TCB for that host, when a TCB is closed. RTT | currently active TCB for that host, when a TCB is closed. RTT values | |||
values are updated only when a connection is closed. The method for | are updated only when a connection is closed. The method for merging | |||
merging old and current values needs to attempt to reduce the | old and current values needs to attempt to reduce the transient | |||
transient effects of the new connections. | effects of the new connections. | |||
The updates for RTT, RTTVAR and ssthresh rely on existing | The updates for RTT, RTTVAR, and ssthresh rely on existing | |||
information, i.e., old values. Should no such values exist, the | information, i.e., old values. Should no such values exist, the | |||
current values are cached instead. | current values are cached instead. | |||
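
The fallback described above (cache the current values when no old
values exist) might be sketched as follows. This is a minimal sketch
under stated assumptions: the cache layout (a dictionary keyed by host),
the field names, and the function name update_on_close are hypothetical,
and the merge function is supplied by the caller because this document
does not specify one.

```python
# Fields that are merged at CLOSE and depend on previously cached values.
MERGED_FIELDS = ("rtt", "rttvar", "ssthresh")


def update_on_close(cache, host, curr, merge):
    """Update the cached entry for `host` when a connection closes.

    cache: dict mapping a host identifier to a dict of cached values
    curr:  dict holding the closing connection's values
    merge: callable merge(curr_value, old_value) -> new cached value
    """
    old = cache.get(host)
    if old is None:
        # No old values exist, so the current values are cached instead.
        cache[host] = {field: curr[field] for field in MERGED_FIELDS}
    else:
        for field in MERGED_FIELDS:
            old[field] = merge(curr[field], old[field])
    return cache[host]
```

For example, calling update_on_close() twice for the same host with a
simple mean as the merge function stores the first connection's values
unchanged and then replaces them with the mean of the two connections.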
TCP options are copied or merged depending on the details of each | TCP options are copied or merged depending on the details of each | |||
option. E.g., TFO state is updated when a connection is established | option. For example, TFO state is updated when a connection is | |||
and read before establishing a new connection. | established and read before establishing a new connection. | |||
Sections 8 and 9 discuss compatibility issues and implications of | Sections 8 and 9 discuss compatibility issues and implications of | |||
sharing the specific information listed above. Section 10 gives an | sharing the specific information listed above. Section 10 gives an | |||
overview of known implementations. | overview of known implementations. | |||
Most cached TCB values are updated when a connection closes. The | Most cached TCB values are updated when a connection closes. The | |||
exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122], | exceptions are MMS_R and MMS_S, which are reported by IP [RFC1122]; | |||
PMTU which is updated after Path MTU Discovery and also reported by | PMTU, which is updated after Path MTU Discovery and also reported by | |||
IP [RFC1191][RFC4821][RFC8201], and sendMSS, which is updated if the | IP [RFC1191] [RFC4821] [RFC8201]; and sendMSS, which is updated if | |||
MSS option is received in the TCP SYN header. | the MSS option is received in the TCP SYN header. | |||
Sharing sendMSS information affects only data in the SYN of the next | Sharing sendMSS information affects only data in the SYN of the next | |||
connection, because sendMSS information is typically included in | connection, because sendMSS information is typically included in most | |||
most TCP SYN segments. Caching PMTU can accelerate the efficiency of | TCP SYN segments. Caching PMTU can accelerate the efficiency of | |||
PMTUD but can also result in black-holing until corrected if in | PMTUD but can also result in black-holing until corrected if in | |||
error. Caching MMS_R and MMS_S may be of little direct value as they | error. Caching MMS_R and MMS_S may be of little direct value as they | |||
are reported by the local IP stack anyway. | are reported by the local IP stack anyway. | |||
The way in which other TCP option state can be shared depends on the | The way in which state related to other TCP options can be shared | |||
details of that option. E.g., TFO state includes the TCP Fast Open | depends on the details of that option. For example, TFO state | |||
Cookie [RFC7413] or, in case TFO fails, a negative TCP Fast Open | includes the TCP Fast Open cookie [RFC7413] or, in case TFO fails, a | |||
response. RFC 7413 states, "The client MUST cache negative responses | negative TCP Fast Open response. RFC 7413 states, | |||
from the server in order to avoid potential connection failures. | ||||
Negative responses include the server not acknowledging the data in | ||||
the SYN, ICMP error messages, and (most importantly) no response | ||||
(SYN-ACK) from the server at all, i.e., connection timeout." [RFC | ||||
7413]. TFOinfo is cached when a connection is established. | ||||
Other TCP option state might not be as readily cached. E.g., TCP-AO | | The client MUST cache negative responses from the server in order | |||
[RFC5925] success or failure between a host pair for a single SYN | | to avoid potential connection failures. Negative responses | |||
destination port might be usefully cached. TCP-AO success or failure | | include the server not acknowledging the data in the SYN, ICMP | |||
to other SYN destination ports on that host pair is never useful to | | error messages, and (most importantly) no response (SYN-ACK) from | |||
cache because TCP-AO security parameters can vary per service. | | the server at all, i.e., connection timeout. | |||
7. Ensemble Sharing | TFOinfo is cached when a connection is established. | |||
State related to other TCP options might not be as readily cached. | ||||
For example, TCP-AO [RFC5925] success or failure between a host-pair | ||||
for a single SYN destination port might be usefully cached. TCP-AO | ||||
success or failure to other SYN destination ports on that host-pair | ||||
is never useful to cache because TCP-AO security parameters can vary | ||||
per service. | ||||
7. Ensemble Sharing | ||||
Sharing cached TCB data across concurrent connections requires | Sharing cached TCB data across concurrent connections requires | |||
attention to the aggregate nature of some of the shared state. For | attention to the aggregate nature of some of the shared state. For | |||
example, although MSS and RTT values can be shared by copying, it | example, although MSS and RTT values can be shared by copying, it may | |||
may not be appropriate to simply copy congestion window or ssthresh | not be appropriate to simply copy congestion window or ssthresh | |||
information; instead, the new values can be a function (f) of the | information; instead, the new values can be a function (f) of the | |||
cumulative values and the number of connections (N). | cumulative values and the number of connections (N). | |||
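
This document does not define f. Purely as an illustration, one
hypothetical choice is an equal share of the aggregate value. In the
sketch below, the name equal_share, the one-segment floor, and the
convention that N counts the new connection are all assumptions.

```python
def equal_share(total, n, minimum=1):
    """One hypothetical f(sum, N): an equal share of the aggregate.

    total:   cumulative congestion window or ssthresh, in segments
    n:       number of connections sharing it (assumed here to
             include the new connection)
    minimum: floor so that a new connection can always send something
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return max(minimum, total // n)
```

With an aggregate of 40 segments and a second connection joining, this
choice would start the new TCB at 20 segments. Other choices of f may be
more appropriate, depending on how the ensemble is meant to share these
variables.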
7.1. Initialization of a new TCB | 7.1. Initialization of a New TCB | |||
TCBs for new connections can be initialized using cached context | ||||
from concurrent connections as follows: | ||||
ENSEMBLE SHARING - TCB Initialization | ||||
Cached TCB New TCB | ||||
------------------------------------------ | ||||
old_MMS_S old_MMS_S | ||||
old_MMS_R old_MMS_R | ||||
old_sendMSS old_sendMSS | ||||
old_PMTU old_PMTU+ | ||||
old_RTT old_RTT | ||||
old_RTTVAR old_RTTVAR | ||||
sum(old_ssthresh) f(sum(old_ssthresh), N) | ||||
sum(old_sendcwnd) f(sum(old_sendcwnd), N) | ||||
_ | ||||
old_option (option specific) | ||||
+Note that PMTU is cached at the IP layer [RFC1191][RFC4821]. | ||||
In the table, the cached sum() is a total across all active | ||||
connections because these parameters act in aggregate; similarly f() | ||||
is a function that updates that sum based on the new connection's | ||||
values, represented as "N". | ||||
The table below gives an overview of option-specific information | ||||
that can be similarly shared. Again, The TFO_cookie is updated upon | ||||
explicit client request, which is a separate event. | ||||
ENSEMBLE SHARING - Option Info Initialization | ||||
Cached New | ||||
------------------------------------ | ||||
old_TFO_cookie old_TFO_cookie | ||||
old_TFO_failure old_TFO_failure | ||||
7.2. Updates to the TCB cache | TCBs for new connections can be initialized using cached context from | |||
concurrent connections as follows: | ||||
During a connection, the TCB cache can be updated based on changes | +===================+=========================+ | |||
to concurrent connections and their TCBs, as shown below: | | Cached TCB | New TCB | | |||
+===================+=========================+ | ||||
| old_MMS_S | old_MMS_S | | ||||
+-------------------+-------------------------+ | ||||
| old_MMS_R | old_MMS_R | | ||||
+-------------------+-------------------------+ | ||||
| old_sendMSS | old_sendMSS | | ||||
+-------------------+-------------------------+ | ||||
| old_PMTU | old_PMTU (1) | | ||||
+-------------------+-------------------------+ | ||||
| old_RTT | old_RTT | | ||||
+-------------------+-------------------------+ | ||||
| old_RTTVAR | old_RTTVAR | | ||||
+-------------------+-------------------------+ | ||||
| sum(old_ssthresh) | f(sum(old_ssthresh), N) | | ||||
+-------------------+-------------------------+ | ||||
| sum(old_sendcwnd) | f(sum(old_sendcwnd), N) | | ||||
+-------------------+-------------------------+ | ||||
| old_option | (option specific) | | ||||
+-------------------+-------------------------+ | ||||
ENSEMBLE SHARING - Cache Updates | Table 5: Ensemble Sharing - TCB Initialization | |||
Cached TCB Current TCB when? New Cached TCB | (1) Note that PMTU is cached at the IP layer [RFC1191] [RFC4821]. | |||
--------------------------------------------------------------- | ||||
old_MMS_S curr_MMS_S OPEN curr_MMS_S | ||||
old_MMS_R curr_MMS_R OPEN curr_MMS_R | In Table 5, the cached sum() is a total across all active connections | |||
because these parameters act in aggregate; similarly, f() is a | ||||
function that updates that sum based on the new connection's values, | ||||
represented as "N". | ||||
old_sendMSS curr_sendMSS MSSopt curr_sendMSS | Table 6 gives an overview of option-specific information that can be | |||
similarly shared. Again, the TFO_cookie is updated upon explicit | ||||
client request, which is a separate event. | ||||
old_PMTU curr_PMTU PMTUD+ / curr_PMTU | +=================+=================+ | |||
PLPMTUD+ | | Cached | New | | |||
+=================+=================+ | ||||
| old_TFO_cookie | old_TFO_cookie | | ||||
+-----------------+-----------------+ | ||||
| old_TFO_failure | old_TFO_failure | | ||||
+-----------------+-----------------+ | ||||
old_RTT curr_RTT update rtt_update(old, curr) | Table 6: Ensemble Sharing - | |||
Option Info Initialization | ||||
old_RTTVAR curr_RTTVAR update rtt_update(old, curr) | 7.2. Updates to the TCB Cache | |||
old_ssthresh curr_ssthresh update adjust sum as appropriate | During a connection, the TCB cache can be updated based on changes to | |||
concurrent connections and their TCBs, as shown below: | ||||
old_sendcwnd curr_sendcwnd update adjust sum as appropriate | +==============+===============+===========+=================+ | |||
| Cached TCB | Current TCB | When? | New Cached TCB | | ||||
+==============+===============+===========+=================+ | ||||
| old_MMS_S | curr_MMS_S | OPEN | curr_MMS_S | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_MMS_R | curr_MMS_R | OPEN | curr_MMS_R | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_sendMSS | curr_sendMSS | MSSopt | curr_sendMSS | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_PMTU | curr_PMTU | PMTUD+ / | curr_PMTU | | ||||
| | | PLPMTUD+ | | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_RTT | curr_RTT | update | rtt_update(old, | | ||||
| | | | curr) | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_RTTVAR | curr_RTTVAR | update | rtt_update(old, | | ||||
| | | | curr) | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_ssthresh | curr_ssthresh | update | adjust sum as | | ||||
| | | | appropriate | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_sendcwnd | curr_sendcwnd | update | adjust sum as | | ||||
| | | | appropriate | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
| old_option | curr_option | (depends) | (option | | ||||
| | | | specific) | | ||||
+--------------+---------------+-----------+-----------------+ | ||||
old_option curr_option (depends) (option specific) | Table 7: Ensemble Sharing - Cache Updates | |||
+Note that the PMTU is cached at the IP layer [RFC1191][RFC4821]. | + Note that the PMTU is cached at the IP layer [RFC1191] [RFC4821]. | |||
In the table, rtt_update() is the function used to combine old and | In Table 7, rtt_update() is the function used to combine old and | |||
current values, e.g., as a windowed average or exponentially decayed | current values, e.g., as a windowed average or exponentially decayed | |||
average. | average. | |||
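
A minimal sketch of an exponentially decayed rtt_update() follows. It
assumes the smoothing rules and gains of the standard TCP retransmission
timer computation (alpha = 1/8, beta = 1/4, with the first sample
initializing the variance to half the sample). This document does not
mandate that choice, and the class name SharedRttState is hypothetical.

```python
class SharedRttState:
    """RTT state kept once in the cache and shared by an ensemble."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha    # gain for the smoothed RTT (assumed)
        self.beta = beta      # gain for the RTT variation (assumed)
        self.srtt = None      # shared smoothed RTT
        self.rttvar = None    # shared RTT variation

    def rtt_update(self, sample):
        """Fold one RTT sample from any connection into the shared state."""
        if self.srtt is None:
            # First measurement for this ensemble.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variation is updated first, using the previous srtt.
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(self.srtt - sample))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.srtt, self.rttvar
```

Each connection in the ensemble would call rtt_update() wherever it
would otherwise have updated its own RTT variables.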
The table below gives an overview of option-specific information | Table 8 gives an overview of option-specific information that can be | |||
that can be similarly shared. | similarly shared. | |||
ENSEMBLE SHARING - Option Info Updates | ||||
Cached Current when? New Cached | +=================+=================+=======+=================+ | |||
---------------------------------------------------------- | | Cached | Current | When? | New Cached | | |||
old_TFO_cookie old_TFO_cookie ESTAB old_TFO_cookie | +=================+=================+=======+=================+ | |||
| old_TFO_cookie | old_TFO_cookie | ESTAB | old_TFO_cookie | | ||||
+-----------------+-----------------+-------+-----------------+ | ||||
| old_TFO_failure | old_TFO_failure | ESTAB | old_TFO_failure | | ||||
+-----------------+-----------------+-------+-----------------+ | ||||
old_TFO_failure old_TFO_failure ESTAB old_TFO_failure | Table 8: Ensemble Sharing - Option Info Updates | |||
7.3. Discussion | 7.3. Discussion | |||
For ensemble sharing, TCB information should be cached as early as | For ensemble sharing, TCB information should be cached as early as | |||
possible, sometimes before a connection is closed. Otherwise, | possible, sometimes before a connection is closed. Otherwise, | |||
opening multiple concurrent connections may not result in TCB data | opening multiple concurrent connections may not result in TCB data | |||
sharing if no connection closes before others open. The amount of | sharing if no connection closes before others open. The amount of | |||
work involved in updating the aggregate average should be minimized, | work involved in updating the aggregate average should be minimized, | |||
but the resulting value should be equivalent to having all values | but the resulting value should be equivalent to having all values | |||
measured within a single connection. The function "rtt_update" in | measured within a single connection. The function "rtt_update" in | |||
the ensemble sharing table indicates this operation, which occurs | Table 7 indicates this operation, which occurs whenever the RTT would | |||
whenever the RTT would have been updated in the individual TCP | have been updated in the individual TCP connection. As a result, the | |||
connection. As a result, the cache contains the shared RTT | cache contains the shared RTT variables, which no longer need to | |||
variables, which no longer need to reside in the TCB. | reside in the TCB. | |||
Congestion window size and ssthresh aggregation are more complicated | Congestion window size and ssthresh aggregation are more complicated | |||
in the concurrent case. When there is an ensemble of connections, we | in the concurrent case. When there is an ensemble of connections, we | |||
need to decide how that ensemble would have shared these variables, | need to decide how that ensemble would have shared these variables, | |||
in order to derive initial values for new TCBs. | in order to derive initial values for new TCBs. | |||
Sections 8 and 9 discuss compatibility issues and implications of | Sections 8 and 9 discuss compatibility issues and implications of | |||
sharing the specific information listed above. | sharing the specific information listed above. | |||
There are several ways to initialize the congestion window in a new | There are several ways to initialize the congestion window in a new | |||
TCB among an ensemble of current connections to a host. Current TCP | TCB among an ensemble of current connections to a host. Current TCP | |||
implementations initialize it to four segments as standard [RFC3390] | implementations initialize it to 4 segments as standard [RFC3390] and | |||
and 10 segments experimentally [RFC6928]. These approaches assume | 10 segments experimentally [RFC6928]. These approaches assume that | |||
that new connections should behave as conservatively as possible. | new connections should behave as conservatively as possible. The | |||
The algorithm described in [Ba12] adjusts the initial cwnd depending | algorithm described in [Ba12] adjusts the initial cwnd depending on | |||
on the cwnd values of ongoing connections. It is also possible to | the cwnd values of ongoing connections. It is also possible to use | |||
use sharing mechanisms over long timescales to adapt TCP's initial | sharing mechanisms over long timescales to adapt TCP's initial window | |||
window automatically, as described further in Appendix C. | automatically, as described further in Appendix C. | |||
8. Issues with TCB information sharing | 8. Issues with TCB Information Sharing | |||
Here, we discuss various types of problems that may arise with TCB | Here, we discuss various types of problems that may arise with TCB | |||
information sharing. | information sharing. | |||
For the congestion and current window information, the initial | For the congestion and current window information, the initial values | |||
values computed by TCB interdependence may not be consistent with | computed by TCB interdependence may not be consistent with the long- | |||
the long-term aggregate behavior of a set of concurrent connections | term aggregate behavior of a set of concurrent connections between | |||
between the same endpoints. Under conventional TCP congestion | the same endpoints. Under conventional TCP congestion control, if | |||
control, if the congestion window of a single existing connection | the congestion window of a single existing connection has converged | |||
has converged to 40 segments, two newly joining concurrent | to 40 segments, two newly joining concurrent connections will assume | |||
connections assume initial windows of 10 segments [RFC6928], and the | initial windows of 10 segments [RFC6928] and the existing | |||
current connection's window doesn't decrease to accommodate this | connection's window will not decrease to accommodate this additional | |||
additional load and connections can mutually interfere. One example | load. As a consequence, the three connections can mutually | |||
of this is seen on low-bandwidth, high-delay links, where concurrent | interfere. One example of this is seen on low-bandwidth, high-delay | |||
connections supporting Web traffic can collide because their initial | links, where concurrent connections supporting Web traffic can | |||
windows were too large, even when set at one segment. | collide because their initial windows were too large, even when set | |||
at 1 segment. | ||||
The authors of [Hu12] recommend caching ssthresh for temporal | The authors of [Hu12] recommend caching ssthresh for temporal sharing | |||
sharing only when flows are long. Some studies suggest that sharing | only when flows are long. Some studies suggest that sharing ssthresh | |||
ssthresh between short flows can deteriorate the performance of | between short flows can deteriorate the performance of individual | |||
individual connections [Hu12, Du16], although this may benefit | connections [Hu12] [Du16], although this may benefit aggregate | |||
aggregate network performance. | network performance. | |||
8.1. Traversing the same network path | 8.1. Traversing the Same Network Path | |||
TCP is sometimes used in situations where packets of the same host- | TCP is sometimes used in situations where packets of the same host- | |||
pair do not always take the same path, such as when connection- | pair do not always take the same path, such as when connection- | |||
specific parameters are used for routing (e.g., for load balancing). | specific parameters are used for routing (e.g., for load balancing). | |||
Multipath routing that relies on examining transport headers, such | Multipath routing that relies on examining transport headers, such as | |||
as ECMP and LAG [RFC7424], may not result in repeatable path | ECMP and Link Aggregation Group (LAG) [RFC7424], may not result in | |||
selection when TCP segments are encapsulated, encrypted, or altered | repeatable path selection when TCP segments are encapsulated, | |||
- for example, in some Virtual Private Network (VPN) tunnels that | encrypted, or altered -- for example, in some Virtual Private Network | |||
rely on proprietary encapsulation. Similarly, such approaches cannot | (VPN) tunnels that rely on proprietary encapsulation. Similarly, | |||
operate deterministically when the TCP header is encrypted, e.g., | such approaches cannot operate deterministically when the TCP header | |||
when using IPsec ESP (although TCB interdependence among the entire | is encrypted, e.g., when using IPsec Encapsulating Security Payload | |||
set sharing the same endpoint IP addresses should work without | (ESP) (although TCB interdependence among the entire set sharing the | |||
problems when the TCP header is encrypted). Measures to increase the | same endpoint IP addresses should work without problems when the TCP | |||
probability that connections use the same path could be applied: | header is encrypted). Measures to increase the probability that | |||
e.g., the connections could be given the same IPv6 flow label | connections use the same path could be applied; for example, the | |||
[RFC6437]. TCB interdependence can also be extended to sets of host | connections could be given the same IPv6 flow label [RFC6437]. TCB | |||
IP address pairs that share the same network path conditions, such | interdependence can also be extended to sets of host IP address pairs | |||
as when a group of addresses is on the same LAN (see Section 9). | that share the same network path conditions, such as when a group of | |||
addresses is on the same LAN (see Section 9). | ||||
Traversing the same path is not important for host-specific | Traversing the same path is not important for host-specific | |||
information such as rwnd and TCP option state, such as TFOinfo, or | information (e.g., rwnd), TCP option state (e.g., TFOinfo), or for | |||
for information that is already cached per-host, such as path MTU. | information that is already cached per-host (e.g., path MTU). When | |||
When TCB information is shared across different SYN destination | TCB information is shared across different SYN destination ports, | |||
ports, path-related information can be incorrect; however, the | path-related information can be incorrect; however, the impact of | |||
impact of this error is potentially diminished if (as discussed | this error is potentially diminished if (as discussed here) TCB | |||
here) TCB sharing affects only the transient event of a connection | sharing affects only the transient event of a connection start or if | |||
start or if TCB information is shared only within connections to the | TCB information is shared only within connections to the same SYN | |||
same SYN destination port. | destination port. | |||
In case of Temporal Sharing, TCB information could also become | In the case of temporal sharing, TCB information could also become | |||
invalid over time, i.e., indicating that although the path remains | invalid over time, i.e., indicating that although the path remains | |||
the same, path properties have changed. Because this is similar to | the same, path properties have changed. Because this is similar to | |||
the case when a connection becomes idle, mechanisms that address | the case when a connection becomes idle, mechanisms that address idle | |||
idle TCP connections (e.g., [RFC7661]) could also be applied to TCB | TCP connections (e.g., [RFC7661]) could also be applied to TCB cache | |||
cache management, especially when TCP Fast Open is used [RFC7413]. | management, especially when TCP Fast Open is used [RFC7413]. | |||
8.2. State dependence | 8.2. State Dependence | |||
There may be additional considerations to the way in which TCB | There may be additional considerations to the way in which TCB | |||
interdependence rebalances congestion feedback among the current | interdependence rebalances congestion feedback among the current | |||
connections, e.g., it may be appropriate to consider the impact of a | connections. For example, it may be appropriate to consider the | |||
connection being in Fast Recovery [RFC5681] or some other similar | impact of a connection being in Fast Recovery [RFC5681] or some other | |||
unusual feedback state, e.g., as inhibiting or affecting the | similar unusual feedback state that could inhibit or affect the | |||
calculations described herein. | calculations described herein. | |||
8.3. Problems with sharing based on IP address | 8.3. Problems with Sharing Based on IP Address | |||
It can be wrong to share TCB information between TCP connections on | It can be wrong to share TCB information between TCP connections on | |||
the same host as identified by the IP address if an IP address is | the same host as identified by the IP address if an IP address is | |||
assigned to a new host (e.g., IP address spinning, as is used by | assigned to a new host (e.g., IP address spinning, as is used by ISPs | |||
ISPs to inhibit running servers). It can be wrong if Network Address | to inhibit running servers). It can be wrong if Network Address | |||
(and Port) Translation (NA(P)T) [RFC2663] or any other IP sharing | Translation (NAT) [RFC2663], Network Address and Port Translation | |||
mechanism is used. Such mechanisms are less likely to be used with | (NAPT) [RFC2663], or any other IP sharing mechanism is used. Such | |||
IPv6. Other methods to identify a host could also be considered to | mechanisms are less likely to be used with IPv6. Other methods to | |||
make correct TCB sharing more likely. Moreover, some TCB information | identify a host could also be considered to make correct TCB sharing | |||
is about dominant path properties rather than the specific host. IP | more likely. Moreover, some TCB information is about dominant path | |||
addresses may differ, yet the relevant part of the path may be the | properties rather than the specific host. IP addresses may differ, | |||
same. | yet the relevant part of the path may be the same. | |||
9. Implications | 9. Implications | |||
There are several implications to incorporating TCB interdependence | There are several implications to incorporating TCB interdependence | |||
in TCP implementations. First, it may reduce the need for | in TCP implementations. First, it may reduce the need for | |||
application-layer multiplexing for performance enhancement | application-layer multiplexing for performance enhancement [RFC7231]. | |||
[RFC7231]. Protocols like HTTP/2 [RFC7540] avoid connection | Protocols like HTTP/2 [RFC7540] avoid connection re-establishment | |||
reestablishment costs by serializing or multiplexing a set of per- | costs by serializing or multiplexing a set of per-host connections | |||
host connections across a single TCP connection. This avoids TCP's | across a single TCP connection. This avoids TCP's per-connection | |||
per-connection OPEN handshake and also avoids recomputing the MSS, | OPEN handshake and also avoids recomputing the MSS, RTT, and | |||
RTT, and congestion window values. By avoiding the so-called "slow- | congestion window values. By avoiding the so-called "slow-start | |||
start restart", performance can be optimized [Hu01]. TCB | restart", performance can be optimized [Hu01]. TCB interdependence | |||
interdependence can provide the "slow-start restart avoidance" of | can provide the "slow-start restart avoidance" of multiplexing, | |||
multiplexing, without requiring a multiplexing mechanism at the | without requiring a multiplexing mechanism at the application layer. | |||
application layer. | ||||
Like the initial version of this document [RFC2140], this update's | Like the initial version of this document [RFC2140], this update's | |||
approach to TCB interdependence focuses on sharing a set of TCBs by | approach to TCB interdependence focuses on sharing a set of TCBs by | |||
updating the TCB state to reduce the impact of transients when | updating the TCB state to reduce the impact of transients when | |||
connections begin, end, or otherwise significantly change state. | connections begin, end, or otherwise significantly change state. | |||
Other mechanisms have since been proposed to continuously share | Other mechanisms have since been proposed to continuously share | |||
information between all ongoing communication (including | information between all ongoing communication (including | |||
connectionless protocols), updating the congestion state during any | connectionless protocols) and update the congestion state during any | |||
congestion-related event (e.g., timeout, loss confirmation, etc.) | congestion-related event (e.g., timeout, loss confirmation, etc.) | |||
[RFC3124]. By dealing exclusively with transients, the approach in | [RFC3124]. By dealing exclusively with transients, the approach in | |||
this document is more likely to exhibit the "steady-state" behavior | this document is more likely to exhibit the "steady-state" behavior | |||
as unmodified, independent TCP connections. | as unmodified, independent TCP connections. | |||
9.1. Layering | 9.1. Layering | |||
TCB interdependence pushes some of the TCP implementation from the | TCB interdependence pushes some of the TCP implementation from its | |||
traditional transport layer (in the ISO model), to the network | typical placement solely within the transport layer (in the ISO | |||
layer. This acknowledges that some state is in fact per-host-pair or | model) to the network layer. This acknowledges that some components | |||
can be per-path as indicated solely by that host-pair. Transport | of state are, in fact, per-host-pair or can be per-path as indicated | |||
protocols typically manage per-application-pair associations (per | solely by that host-pair. Transport protocols typically manage per- | |||
stream), and network protocols manage per-host-pair and path | application-pair associations (per stream), and network protocols | |||
associations (routing). Round-trip time, MSS, and congestion | manage per-host-pair and path associations (routing). Round-trip | |||
information could be more appropriately handled at the network | time, MSS, and congestion information could be more appropriately | |||
layer, aggregated among concurrent connections, and shared across | handled at the network layer, aggregated among concurrent | |||
connection instances [RFC3124]. | connections, and shared across connection instances [RFC3124]. | |||
An earlier version of RTT sharing suggested implementing RTT state | An earlier version of RTT sharing suggested implementing RTT state at | |||
at the IP layer, rather than at the TCP layer. Our observations | the IP layer rather than at the TCP layer. Our observations describe | |||
describe sharing state among TCP connections, which avoids some of | sharing state among TCP connections, which avoids some of the | |||
the difficulties in an IP-layer solution. One such problem of an IP | difficulties in an IP-layer solution. One such problem of an IP- | |||
layer solution is determining the correspondence between packet | layer solution is determining the correspondence between packet | |||
exchanges using IP header information alone, where such | exchanges using IP header information alone, where such | |||
correspondence is needed to compute RTT. Because TCB sharing | correspondence is needed to compute RTT. Because TCB sharing | |||
computes RTTs inside the TCP layer using TCP header information, it | computes RTTs inside the TCP layer using TCP header information, it | |||
can be implemented more directly and simply than at the IP layer. | can be implemented more directly and simply than at the IP layer. | |||
This is a case where information should be computed at the transport | This is a case where information should be computed at the transport | |||
layer but could be shared at the network layer. | layer but could be shared at the network layer. | |||
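The split described above, where RTT is computed inside TCP but the result is stored per host-pair, can be sketched as follows. This is only an illustration: the class names `PathMetrics` and `HostPairStore` and the method names are hypothetical, and the smoothing constants are those of RFC 6298 rather than anything this document mandates.

```python
from typing import Dict, Optional, Tuple


class PathMetrics:
    """Hypothetical per-host-pair record, updated by TCP connections."""

    ALPHA = 1 / 8  # SRTT gain from RFC 6298
    BETA = 1 / 4   # RTTVAR gain from RFC 6298

    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, sample: float) -> None:
        """Fold in one RTT sample measured by a TCP connection."""
        if self.srtt is None:
            # First measurement: initialize as in RFC 6298.
            self.srtt = sample
            self.rttvar = sample / 2
            return
        # RTTVAR is updated first, using the old SRTT.
        self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
        self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample


class HostPairStore:
    """Hypothetical store keyed by (local_ip, remote_ip), not by connection."""

    def __init__(self) -> None:
        self._paths: Dict[Tuple[str, str], PathMetrics] = {}

    def metrics(self, local_ip: str, remote_ip: str) -> PathMetrics:
        return self._paths.setdefault((local_ip, remote_ip), PathMetrics())


store = HostPairStore()
# Two connections to the same peer feed the same record.
store.metrics("192.0.2.10", "198.51.100.5").update(100.0)
store.metrics("192.0.2.10", "198.51.100.5").update(200.0)
shared = store.metrics("192.0.2.10", "198.51.100.5")
```

In this sketch, the samples still come from TCP, which can match segments to acknowledgments; only the storage key is per host-pair.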
9.2. Other possibilities | 9.2. Other Possibilities | |||
Per-host-pair associations are not the limit of these techniques. It | Per-host-pair associations are not the limit of these techniques. It | |||
is possible that TCBs could be similarly shared between hosts on a | is possible that TCBs could be similarly shared between hosts on a | |||
subnet or within a cluster, because the predominant path can be | subnet or within a cluster, because the predominant path can be | |||
subnet-subnet, rather than host-host. Additionally, TCB | subnet-subnet rather than host-host. Additionally, TCB | |||
interdependence can be applied to any protocol with congestion | interdependence can be applied to any protocol with congestion state, | |||
state, including SCTP [RFC4960] and DCCP [RFC4340], as well as for | including SCTP [RFC4960] and DCCP [RFC4340], as well as to individual | |||
individual subflows in Multipath TCP [RFC8684]. | subflows in Multipath TCP [RFC8684]. | |||
There may be other information that can be shared between concurrent | There may be other information that can be shared between concurrent | |||
connections. For example, knowing that another connection has just | connections. For example, knowing that another connection has just | |||
tried to expand its window size and failed, a connection may not | tried to expand its window size and failed, a connection may not | |||
attempt to do the same for some period. The idea is that existing | attempt to do the same for some period. The idea is that existing | |||
TCP implementations infer the behavior of all competing connections, | TCP implementations infer the behavior of all competing connections, | |||
including those within the same host or subnet. One possible | including those within the same host or subnet. One possible | |||
optimization is to make that implicit feedback explicit, via | optimization is to make that implicit feedback explicit, via extended | |||
extended information associated with the endpoint IP address and its | information associated with the endpoint IP address and its TCP | |||
TCP implementation, rather than per-connection state in the TCB. | implementation, rather than per-connection state in the TCB. | |||
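As a rough sketch of making that implicit feedback explicit, the fragment below keeps a per-endpoint hint outside any single TCB. The names `SharedFeedback`, `report_failed_expand`, and `may_expand`, as well as the hold-off period, are assumptions made for illustration; no existing implementation is implied.

```python
from typing import Dict


class SharedFeedback:
    """Hypothetical per-endpoint hints shared by all connections to that endpoint."""

    def __init__(self, holdoff_s: float = 1.0) -> None:
        # Assumed hold-off: how long other connections avoid a window increase.
        self.holdoff_s = holdoff_s
        self._last_failed: Dict[str, float] = {}  # remote IP -> time of last failure

    def report_failed_expand(self, remote_ip: str, now: float) -> None:
        """A connection tried to expand its window and failed."""
        self._last_failed[remote_ip] = now

    def may_expand(self, remote_ip: str, now: float) -> bool:
        """Return False while a recent failure to the same endpoint is on record."""
        failed_at = self._last_failed.get(remote_ip)
        if failed_at is None:
            return True
        return now - failed_at >= self.holdoff_s


fb = SharedFeedback(holdoff_s=2.0)
fb.report_failed_expand("192.0.2.1", now=10.0)
```

A connection would consult `may_expand` before probing for more bandwidth; connections to other endpoints are unaffected.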
This document focuses on sharing TCB information at connection | This document focuses on sharing TCB information at connection | |||
initialization. Subsequent to RFC 2140, there have been numerous | initialization. Subsequent to RFC 2140, there have been numerous | |||
approaches that attempt to coordinate ongoing state across | approaches that attempt to coordinate ongoing state across concurrent | |||
concurrent connections, both within TCP and other congestion- | connections, both within TCP and other congestion-reactive protocols, | |||
reactive protocols, which are summarized in [Is18]. These approaches | which are summarized in [Is18]. These approaches are more complex to | |||
are more complex to implement and their comparison to steady-state | implement, and their comparison to steady-state TCP equivalence can | |||
TCP equivalence can be more difficult to establish, sometimes | be more difficult to establish, sometimes intentionally (i.e., they | |||
intentionally (i.e., they sometimes intend to provide a different | sometimes intend to provide a different kind of "fairness" than | |||
kind of "fairness" than emerges from TCP operation). | emerges from TCP operation). | |||
10. Implementation Observations | ||||
The observation that some TCB state is host-pair specific rather | ||||
than application-pair dependent is not new and is a common | ||||
engineering decision in layered protocol implementations. Although | ||||
now deprecated, T/TCP [RFC1644] was the first to propose using | ||||
caches in order to maintain TCB states (see Appendix A). | ||||
The table below describes the current implementation status for TCB | ||||
temporal sharing in Windows as of December 2020, Apple variants | ||||
(macOS, iOS, iPadOS, tvOS, watchOS) as of January 2021, Linux kernel | ||||
version 5.10.3, and FreeBSD 12. Ensemble sharing is not yet | ||||
implemented. | ||||
KNOWN IMPLEMENTATION STATUS | ||||
TCB data Status | ||||
------------------------------------------------------------ | ||||
old_MMS_S Not shared | ||||
old_MMS_R Not shared | ||||
old_sendMSS Cached and shared in Apple, Linux (MSS) | ||||
old_PMTU Cached and shared in Apple, FreeBSD, Windows (PMTU) | ||||
old_RTT Cached and shared in Apple, FreeBSD, Linux, Windows | ||||
old_RTTVAR Cached and shared in Apple, FreeBSD, Windows | 10. Implementation Observations | |||
old_TFOinfo Cached and shared in Apple, Linux, Windows | The observation that some TCB state is host-pair specific rather than | |||
application-pair dependent is not new and is a common engineering | ||||
decision in layered protocol implementations. Although now | ||||
deprecated, T/TCP [RFC1644] was the first to propose using caches in | ||||
order to maintain TCB states (see Appendix A). | ||||
old_sendcwnd Not shared | Table 9 describes the current implementation status for TCB temporal | |||
sharing in Windows as of December 2020, Apple variants (macOS, iOS, | ||||
iPadOS, tvOS, and watchOS) as of January 2021, Linux kernel version | ||||
5.10.3, and FreeBSD 12. Ensemble sharing is not yet implemented. | ||||
old_ssthresh Cached and shared in Apple, FreeBSD*, Linux* | +==============+=========================================+ | |||
| TCB data | Status | | ||||
+==============+=========================================+ | ||||
| old_MMS_S | Not shared | | ||||
+--------------+-----------------------------------------+ | ||||
| old_MMS_R | Not shared | | ||||
+--------------+-----------------------------------------+ | ||||
| old_sendMSS | Cached and shared in Apple, Linux (MSS) | | ||||
+--------------+-----------------------------------------+ | ||||
| old_PMTU | Cached and shared in Apple, FreeBSD, | | ||||
| | Windows (PMTU) | | ||||
+--------------+-----------------------------------------+ | ||||
| old_RTT | Cached and shared in Apple, FreeBSD, | | ||||
| | Linux, Windows | | ||||
+--------------+-----------------------------------------+ | ||||
| old_RTTVAR | Cached and shared in Apple, FreeBSD, | | ||||
| | Windows | | ||||
+--------------+-----------------------------------------+ | ||||
| old_TFOinfo | Cached and shared in Apple, Linux, | | ||||
| | Windows | | ||||
+--------------+-----------------------------------------+ | ||||
| old_sendcwnd | Not shared | | ||||
+--------------+-----------------------------------------+ | ||||
| old_ssthresh | Cached and shared in Apple, FreeBSD*, | | ||||
| | Linux* | | ||||
+--------------+-----------------------------------------+ | ||||
| TFO failure | Cached and shared in Apple | | ||||
+--------------+-----------------------------------------+ | ||||
TFO failure Cached and shared in Apple | Table 9: KNOWN IMPLEMENTATION STATUS | |||
In the table above, "Apple" refers to all Apple OSes, i.e., | * Note: In FreeBSD, new ssthresh is the mean of curr_ssthresh and | |||
desktop/laptop macOS, phone iOS, pad iPadOS, video player tvOS, and | its previous value if a previous value exists; in Linux, the | |||
watch watchOS, which all share the same Internet protocol stack. | calculation depends on state and is max(curr_cwnd/2, old_ssthresh) | |||
in most cases. | ||||
*Note: In FreeBSD, new ssthresh is the mean of curr_ssthresh and | In Table 9, "Apple" refers to all Apple OSes, i.e., macOS (desktop/ | |||
previous value if a previous value exists; in Linux, the calculation | laptop), iOS (phone), iPadOS (tablet), tvOS (video player), and | |||
depends on state and is max(curr_cwnd/2, old_ssthresh) in most | watchOS (smart watch), which all share the same Internet protocol | |||
cases. | stack. | |||
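The two ssthresh rules in the note can be written out as a small sketch. The function names are hypothetical, the values are treated as plain integers, and only the common Linux case from the note is covered. Storing the current value when no previous value exists is an assumption, since the note does not say what FreeBSD does in that case.

```python
from typing import Optional


def freebsd_style_ssthresh(curr_ssthresh: int, cached: Optional[int] = None) -> int:
    """Mean of the current and previously cached ssthresh, if one exists."""
    if cached is None:
        # Assumption: with no previous value, the current value is cached as-is.
        return curr_ssthresh
    return (curr_ssthresh + cached) // 2


def linux_style_ssthresh(curr_cwnd: int, old_ssthresh: int) -> int:
    """max(curr_cwnd/2, old_ssthresh), the common case described in the note."""
    return max(curr_cwnd // 2, old_ssthresh)
```

For example, with a cached value of 10 and a current ssthresh of 20, the FreeBSD-style rule yields 15; with a current cwnd of 30 and an old ssthresh of 10, the Linux-style rule also yields 15.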
11. Changes Compared to RFC 2140 | 11. Changes Compared to RFC 2140 | |||
This document updates the description of TCB sharing in RFC 2140 and | This document updates the description of TCB sharing in RFC 2140 and | |||
its associated impact on existing and new connection state, | its associated impact on existing and new connection state, providing | |||
providing a complete replacement for that document [RFC2140]. It | a complete replacement for that document [RFC2140]. It clarifies the | |||
clarifies the previous description and terminology and extends the | previous description and terminology and extends the mechanism to its | |||
mechanism to its impact on new protocols and mechanisms, including | impact on new protocols and mechanisms, including multipath TCP, Fast | |||
multipath TCP, fast open, PLPMTUD, NAT, and the TCP Authentication | Open, PLPMTUD, NAT, and the TCP Authentication Option. | |||
Option. | ||||
The detailed impact on TCB state addresses TCB parameters in greater | The detailed impact on TCB state addresses TCB parameters with | |||
detail, addressing MSS in both the send and receive direction, MSS | greater specificity. It separates the way MSS is used in both send | |||
and sendMSS separately, adds path MTU and ssthresh, and addresses | and receive directions, it separates the way both of these MSS values | |||
the impact on TCP option state. | differ from sendMSS, it adds both path MTU and ssthresh, and it | |||
addresses the impact on state associated with TCP options. | ||||
New sections have been added to address compatibility issues and | New sections have been added to address compatibility issues and | |||
implementation observations. The relation of this work to T/TCP has | implementation observations. The relation of this work to T/TCP has | |||
been moved to Appendix A on history, partly to reflect the deprecation of | been moved to Appendix A (which describes the history to TCB sharing) | |||
that protocol. | partly to reflect the deprecation of that protocol. | |||
Appendix C has been added to discuss the potential to use temporal | Appendix C has been added to discuss the potential to use temporal | |||
sharing over long timescales to adapt TCP's initial window | sharing over long timescales to adapt TCP's initial window | |||
automatically, avoiding the need to periodically revise a single | automatically, avoiding the need to periodically revise a single | |||
global constant value. | global constant value. | |||
Finally, this document updates and significantly expands the | Finally, this document updates and significantly expands the | |||
referenced literature. | referenced literature. | |||
12. Security Considerations | 12. Security Considerations | |||
These presented implementation methods do not have additional | These presented implementation methods do not have additional | |||
ramifications for direct (connection-aborting or information | ramifications for direct (connection-aborting or information- | |||
injecting) attacks on individual connections. Individual | injecting) attacks on individual connections. Individual | |||
connections, whether using sharing or not, also may be susceptible | connections, whether using sharing or not, also may be susceptible to | |||
to denial-of-service attacks that reduce performance or completely | denial-of-service attacks that reduce performance or completely deny | |||
deny connections and transfers if not otherwise secured. | connections and transfers if not otherwise secured. | |||
TCB sharing may create additional denial-of-service attacks that | TCB sharing may create additional denial-of-service attacks that | |||
affect the performance of other connections by polluting the cached | affect the performance of other connections by polluting the cached | |||
information. This can occur across whatever set of connections where | information. This can occur across any set of connections in which | |||
the TCB is shared, between connections in a single host, or between | the TCB is shared, between connections in a single host, or between | |||
hosts if TCB sharing is implemented within a subnet (see | hosts if TCB sharing is implemented within a subnet (see | |||
Implications section). Some shared TCB parameters are used only to | "Implications" (Section 9)). Some shared TCB parameters are used | |||
create new TCBs, others are shared among the TCBs of ongoing | only to create new TCBs; others are shared among the TCBs of ongoing | |||
connections. New connections can join the ongoing set, e.g., to | connections. New connections can join the ongoing set, e.g., to | |||
optimize send window size among a set of connections to the same | optimize send window size among a set of connections to the same | |||
host. PMTU is defined as shared at the IP layer, and is already | host. PMTU is defined as shared at the IP layer and is already | |||
susceptible in this way. | susceptible in this way. | |||
Options in client SYNs can be easier to forge than complete, two-way | Options in client SYNs can be easier to forge than complete, two-way | |||
connections. As a result, their values may not be safely | connections. As a result, their values may not be safely | |||
incorporated in shared values until after the three-way handshake | incorporated in shared values until after the three-way handshake | |||
completes. | completes. | |||
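One way to act on this is to hold option values seen in a SYN in a pending area and copy them into the shared state only once the handshake completes. The sketch below assumes a hypothetical `SharedTcbCache` with simple overwrite semantics; a real cache would likely aggregate values rather than replace them.

```python
from typing import Dict, Tuple


class SharedTcbCache:
    """Hypothetical cache that commits SYN-derived values only after the handshake."""

    def __init__(self) -> None:
        self.shared: Dict[str, Dict[str, int]] = {}               # remote IP -> committed values
        self._pending: Dict[int, Tuple[str, Dict[str, int]]] = {}  # conn id -> unverified values

    def on_syn(self, conn_id: int, remote_ip: str, options: Dict[str, int]) -> None:
        """Record options from a SYN without touching shared state."""
        self._pending[conn_id] = (remote_ip, dict(options))

    def on_established(self, conn_id: int) -> None:
        """Three-way handshake completed: the values may now be shared."""
        entry = self._pending.pop(conn_id, None)
        if entry is None:
            return
        remote_ip, options = entry
        self.shared.setdefault(remote_ip, {}).update(options)

    def on_abort(self, conn_id: int) -> None:
        """Handshake never completed: discard the unverified values."""
        self._pending.pop(conn_id, None)


cache = SharedTcbCache()
cache.on_syn(1, "192.0.2.1", {"mss": 1460})
cache.on_established(1)
# A SYN that never completes (possibly forged) leaves shared state unchanged.
cache.on_syn(2, "192.0.2.1", {"mss": 64})
cache.on_abort(2)
```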
Attacks on parameters used only for initialization affect only the | Attacks on parameters used only for initialization affect only the | |||
transient performance of a TCP connection. For short connections, | transient performance of a TCP connection. For short connections, | |||
the performance ramification can approach that of a denial-of- | the performance ramification can approach that of a denial-of-service | |||
service attack. E.g., if an application changes its TCB to have a | attack. For example, if an application changes its TCB to have a | |||
false and small window size, subsequent connections will experience | false and small window size, subsequent connections will experience | |||
performance degradation until their window grew appropriately. | performance degradation until their window grows appropriately. | |||
TCB sharing reuses and mixes information from past and current | TCB sharing reuses and mixes information from past and current | |||
connections. Although reusing information could create a potential | connections. Although reusing information could create a potential | |||
for fingerprinting to identify hosts, the mixing reduces that | for fingerprinting to identify hosts, the mixing reduces that | |||
potential. There has been no evidence of fingerprinting based on | potential. There has been no evidence of fingerprinting based on | |||
this technique and it is currently considered safe in that regard. | this technique, and it is currently considered safe in that regard. | |||
Further, information about the performance of a TCP connection has | Further, information about the performance of a TCP connection has | |||
not been considered as private. | not been considered as private. | |||
13. IANA Considerations | 13. IANA Considerations | |||
There are no IANA implications or requests in this document. | ||||
This section should be removed upon final publication as an RFC. | ||||
14. References | ||||
14.1. Normative References | ||||
[RFC793] Postel, J., "Transmission Control Protocol," Network | ||||
Working Group RFC-793/STD-7, ISI, Sept. 1981. | ||||
[RFC1122] Braden, R. (ed), "Requirements for Internet Hosts -- | ||||
Communication Layers", RFC-1122, Oct. 1989. | ||||
[RFC1191] Mogul, J., Deering, S., "Path MTU Discovery," RFC 1191, | ||||
Nov. 1990. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
Requirement Levels", BCP 14, RFC 2119, March 1997. | ||||
[RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU | ||||
Discovery," RFC 4821, Mar. 2007. | ||||
[RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion | ||||
Control," RFC 5681 (Standards Track), Sep. 2009. | ||||
[RFC6298] Paxson, V., Allman, M., Chu, J., Sargent, M., "Computing | ||||
TCP's Retransmission Timer," RFC 6298, June 2011. | ||||
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast | ||||
Open", RFC 7413, Dec. 2014. | ||||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | ||||
2119 Key Words", RFC 8174, May 2017. | ||||
[RFC8201] McCann, J., Deering. S., Mogul, J., Hinden, R. (Ed.), | ||||
"Path MTU Discovery for IP version 6," RFC 8201, Jul. | ||||
2017. | ||||
14.2. Informative References | ||||
[Al10] Allman, M., "Initial Congestion Window Specification", | ||||
(work in progress), draft-allman-tcpm-bump-initcwnd-00, | ||||
Nov. 2010. | ||||
[Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., " LISA: A | ||||
Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala | ||||
Lumpur, Malaysia, May 23-27 2016. | ||||
[Ba20] Bagnulo, M., Briscoe, B., "ECN++: Adding Explicit | ||||
Congestion Notification (ECN) to TCP Control Packets", | ||||
draft-ietf-tcpm-generalized-ecn-07, Feb. 2021. | ||||
[Be94] Berners-Lee, T., et al., "The World-Wide Web," | ||||
Communications of the ACM, V37, Aug. 1994, pp. 76-82. | ||||
[Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for | ||||
Sun OS 4.1.3,", Release 1.0, USC/ISI, September 14, 1994. | ||||
[Br02] Brownlee, N., Claffy, K., "Understanding Internet Traffic | ||||
Streams: Dragonflies and Tortoises", IEEE Communications | ||||
Magazine p110-117, 2002. | ||||
[Co91] Comer, D., Stevens, D., Internetworking with TCP/IP, V2, | ||||
Prentice-Hall, NJ, 1991. | ||||
[Du16] Dukkipati, N., Yuchung C., Amin V., "Research Impacting | ||||
the Practice of Congestion Control." ACM SIGCOMM CCR | ||||
(editorial), on-line post, July 2016. | ||||
[FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/ | ||||
[Hu01] Hughes, A., Touch, J., Heidemann, J., "Issues in Slow- | ||||
Start Restart After Idle", draft-hughes-restart-00 | ||||
(expired), Dec. 2001. | ||||
[Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for | ||||
short TCP flows," 2012 IEEE International Conference on | ||||
Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213. | ||||
[IANA] IANA TCP Parameters (options) registry, | ||||
https://www.iana.org/assignments/tcp-parameters | ||||
[Is18] Islam, S., Welzl, M., Hiorth, K., Hayes, D., Armitage, G., | ||||
Gjessing, S., "ctrlTCP: Reducing Latency through Coupled, | ||||
Heterogeneous Multi-Flow TCP Congestion Control," Proc. | ||||
IEEE INFOCOM Global Internet Symposium (GI) workshop (GI | ||||
2018), Honolulu, HI, April 2018. | ||||
[Ja88] Jacobson, V., Karels, M., "Congestion Avoidance and | ||||
Control", Proc. Sigcomm 1988. | ||||
[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions | ||||
Functional Specification," RFC-1644, July 1994. | ||||
[RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379, | ||||
September 1992. | ||||
[RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast | ||||
Retransmit, and Fast Recovery Algorithms", RFC2001 | ||||
(Standards Track), Jan. 1997. | ||||
[RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, | ||||
April 1997. | ||||
[RFC2414] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's | ||||
Initial Window", RFC 2414 (Experimental), Sept. 1998. | ||||
[RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address | ||||
Translator (NAT) Terminology and Considerations", RFC- | ||||
2663, August 1999. | ||||
[RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's | ||||
Initial Window," RFC 3390, Oct. 2002. | ||||
[RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager," | ||||
RFC 3124, June 2001. | ||||
[RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion | ||||
Control Protocol (DCCP)," RFC 4340, Mar. 2006. | ||||
[RFC4960] Stewart, R., (Ed.), "Stream Control Transmission | ||||
Protocol," RFC4960, Sept. 2007. | ||||
[RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication | ||||
Option," RFC 5925, June 2010. | ||||
[RFC6437] Amante, S., Carpenter, B., Jiang, S., Rajajalme, J., "IPv6 | ||||
Flow Label Specification," RFC 6437, Nov. 2011. | ||||
[RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS)," | ||||
RFC 6691, July 2012. | ||||
[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing | ||||
TCP's Initial Window," RFC 6928, Apr. 2013. | ||||
[RFC7231] Fielding, R., Reshke, J., Eds., "HTTP/1.1 Semantics and | ||||
Content," RFC-7231, June 2014. | ||||
[RFC7323] Borman, D., Braden, B., Jacobson, V., Scheffenegger, R., | ||||
(Ed.), "TCP Extensions for High Performance," RFC 7323, | ||||
Sept. 2014. | ||||
[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish, | ||||
B., "Mechanisms for Optimizing Link Aggregation Group | ||||
(LAG) and Equal-Cost Multipath (ECMP) Component Link | ||||
Utilization in Networks", RFC 7424, Jan. 2015 | ||||
[RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer | ||||
Protocol Version 2 (HTTP/2)", RFC 7540, May 2015. | ||||
[RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP | ||||
to Support Rate-Limited Traffic", RFC 7661, Oct. 2015. | ||||
[RFC8684] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., | ||||
Paasch, C., "TCP Extensions for Multipath Operation with | ||||
Multiple Addresses," RFC 8684, Mar. 2020. | ||||
15. Acknowledgments | ||||
The authors would like to thank Praveen Balasubramanian for | ||||
information regarding TCB sharing in Windows, Christoph Paasch for | ||||
information regarding TCB sharing in Apple OSes, and Yuchung Cheng, | ||||
Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on | ||||
earlier versions of the draft, as well as members of the TCPM WG. | ||||
Earlier revisions of this work received funding from a collaborative | ||||
research project between the University of Oslo and Huawei | ||||
Technologies Co., Ltd. and were partly supported by USC/ISI's Postel | ||||
Center. | ||||
This document was prepared using 2-Word-v2.0.template.dot. | This document has no IANA actions. | |||
16. Change log | 14. References | |||
This section should be removed upon final publication as an RFC. | 14.1. Normative References | |||
ietf-11: | [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, | |||
RFC 793, DOI 10.17487/RFC0793, September 1981, | ||||
<https://www.rfc-editor.org/info/rfc793>. | ||||
- Addressed gen-art review and IESG feedback | [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - | |||
Communication Layers", STD 3, RFC 1122, | ||||
DOI 10.17487/RFC1122, October 1989, | ||||
<https://www.rfc-editor.org/info/rfc1122>. | ||||
ietf-10: | [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | |||
DOI 10.17487/RFC1191, November 1990, | ||||
<https://www.rfc-editor.org/info/rfc1191>. | ||||
- Addressed IETF last call feedback | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | ||||
DOI 10.17487/RFC2119, March 1997, | ||||
<https://www.rfc-editor.org/info/rfc2119>. | ||||
ietf-09: | [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | |||
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | ||||
<https://www.rfc-editor.org/info/rfc4821>. | ||||
- Correction of typographic errors | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | ||||
<https://www.rfc-editor.org/info/rfc5681>. | ||||
ietf-08: | [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, | |||
"Computing TCP's Retransmission Timer", RFC 6298, | ||||
DOI 10.17487/RFC6298, June 2011, | ||||
<https://www.rfc-editor.org/info/rfc6298>. | ||||
- Address TSV AD comments, add Apple OS implementation status | [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | |||
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | ||||
<https://www.rfc-editor.org/info/rfc7413>. | ||||
ietf-07: | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
- Update per id-nits and normative language for consistency | [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., | |||
"Path MTU Discovery for IP version 6", STD 87, RFC 8201, | ||||
DOI 10.17487/RFC8201, July 2017, | ||||
<https://www.rfc-editor.org/info/rfc8201>. | ||||
ietf-06: | 14.2. Informative References | |||
- Address WGLC comments | [Al10] Allman, M., "Initial Congestion Window Specification", | |||
Work in Progress, Internet-Draft, draft-allman-tcpm-bump- | ||||
initcwnd-00, 15 November 2010, | ||||
<https://datatracker.ietf.org/doc/html/draft-allman-tcpm- | ||||
bump-initcwnd-00>. | ||||
ietf-05: | [Ba12] Barik, R., Welzl, M., Ferlin, S., and O. Alay, "LISA: A | |||
linked slow-start algorithm for MPTCP", IEEE ICC, | ||||
DOI 10.1109/ICC.2016.7510786, May 2016, | ||||
<https://doi.org/10.1109/ICC.2016.7510786>. | ||||
- Correction of typographic errors, expansion of terminology | [Ba20] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | |||
Congestion Notification (ECN) to TCP Control Packets", | ||||
Work in Progress, Internet-Draft, draft-ietf-tcpm- | ||||
generalized-ecn-07, 16 February 2021, | ||||
<https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | ||||
generalized-ecn-07>. | ||||
ietf-04: | [Be94] Berners-Lee, T., Cailliau, C., Luotonen, A., Nielsen, H., | |||
and A. Secret, "The World-Wide Web", Communications of the | ||||
ACM V37, pp. 76-82, DOI 10.1145/179606.179671, August | ||||
1994, <https://doi.org/10.1145/179606.179671>. | ||||
- Fix internal cross-reference errors that appeared in ietf-02 | [Br02] Brownlee, N. and KC. Claffy, "Understanding Internet | |||
- Updated tables to re-center; clarified text | traffic streams: dragonflies and tortoises", IEEE | |||
Communications Magazine, pp. 110-117, | ||||
DOI 10.1109/MCOM.2002.1039865, 2002, | ||||
<https://doi.org/10.1109/MCOM.2002.1039865>. | ||||
ietf-03: | [Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for | |||
Sun OS 4.1.3", USC/ISI Release 1.0, September 1994. | ||||
- Correction of typographic errors, minor rewording in appendices | [Co91] Comer, D. and D. Stevens, "Internetworking with TCP/IP", | |||
ISBN 10: 0134685059, ISBN 13: 9780134685052, 1991. | ||||
ietf-02: | [Du16] Dukkipati, N., Cheng, Y., and A. Vahdat, "Research | |||
Impacting the Practice of Congestion Control", Computer | ||||
Communication Review, The ACM SIGCOMM newsletter, July | ||||
2016. | ||||
- Minor reorganization and correction of typographic errors | [FreeBSD] FreeBSD, "The FreeBSD Project", | |||
- Added text to address fingerprinting in Security section | <https://www.freebsd.org/>. | |||
- Now retains Appendix B and body option tables upon publication | ||||
ietf-01: | [Hu01] Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP | |||
Slow-Start Restart After Idle", Work in Progress, | ||||
Internet-Draft, draft-hughes-restart-00, December 2001, | ||||
<https://datatracker.ietf.org/doc/html/draft-hughes- | ||||
restart-00>. | ||||
- Added Appendix C to address long-timescale temporal adaptation | [Hu12] Hurtig, P. and A. Brunstrom, "Enhanced metric caching for | |||
short TCP flows", IEEE International Conference on | ||||
Communications, DOI 10.1109/ICC.2012.6364516, 2012, | ||||
<https://doi.org/10.1109/ICC.2012.6364516>. | ||||
ietf-00: | [IANA] IANA, "Transmission Control Protocol (TCP) Parameters", | |||
<https://www.iana.org/assignments/tcp-parameters>. | ||||
- Re-issued as draft-ietf-tcpm-2140bis due to WG adoption. | [Is18] Islam, S., Welzl, M., Hiorth, K., Hayes, D., Armitage, G., | |||
- Cleaned orphan references to T/TCP, removed incomplete refs | and S. Gjessing, "ctrlTCP: Reducing latency through | |||
- Moved references to informative section and updated Sec 2 | coupled, heterogeneous multi-flow TCP congestion control", | |||
- Updated to clarify no impact to interoperability | IEEE INFOCOM 2018 - IEEE Conference on Computer | |||
- Updated appendix B to avoid 2119 language | Communications Workshops (INFOCOM WKSHPS), | |||
DOI 10.1109/INFCOMW.2018.8406887, April 2018, | ||||
<https://doi.org/10.1109/INFCOMW.2018.8406887>. | ||||
06: | [Ja88] Jacobson, V. and M. Karels, "Congestion Avoidance and | |||
Control", SIGCOMM Symposium proceedings on Communications | ||||
architectures and protocols, November 1988. | ||||
- Changed to update 2140, cite it normatively, and summarize the | [RFC1379] Braden, R., "Extending TCP for Transactions -- Concepts", | |||
updates in a separate section | RFC 1379, DOI 10.17487/RFC1379, November 1992, | |||
<https://www.rfc-editor.org/info/rfc1379>. | ||||
05: | [RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions | |||
Functional Specification", RFC 1644, DOI 10.17487/RFC1644, | ||||
July 1994, <https://www.rfc-editor.org/info/rfc1644>. | ||||
- Fixed some TBDs | [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast | |||
Retransmit, and Fast Recovery Algorithms", RFC 2001, | ||||
DOI 10.17487/RFC2001, January 1997, | ||||
<https://www.rfc-editor.org/info/rfc2001>. | ||||
04: | [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, | |||
DOI 10.17487/RFC2140, April 1997, | ||||
<https://www.rfc-editor.org/info/rfc2140>. | ||||
- Removed BCP-style recommendations and fixed some TBDs | [RFC2414] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's | |||
Initial Window", RFC 2414, DOI 10.17487/RFC2414, September | ||||
1998, <https://www.rfc-editor.org/info/rfc2414>. | ||||
03: | [RFC2663] Srisuresh, P. and M. Holdrege, "IP Network Address | |||
Translator (NAT) Terminology and Considerations", | ||||
RFC 2663, DOI 10.17487/RFC2663, August 1999, | ||||
<https://www.rfc-editor.org/info/rfc2663>. | ||||
- Updated Touch's affiliation and address information | [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | |||
RFC 3124, DOI 10.17487/RFC3124, June 2001, | ||||
<https://www.rfc-editor.org/info/rfc3124>. | ||||
02: | [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's | |||
Initial Window", RFC 3390, DOI 10.17487/RFC3390, October | ||||
2002, <https://www.rfc-editor.org/info/rfc3390>. | ||||
- Stated that our OS implementation overview table only covers | [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram | |||
temporal sharing. | Congestion Control Protocol (DCCP)", RFC 4340, | |||
DOI 10.17487/RFC4340, March 2006, | ||||
<https://www.rfc-editor.org/info/rfc4340>. | ||||
- Correctly reflected sharing of old_RTT in Linux in the | [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", | |||
implementation overview table. | RFC 4960, DOI 10.17487/RFC4960, September 2007, | |||
<https://www.rfc-editor.org/info/rfc4960>. | ||||
- Marked entries that are considered safe to share with an | [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP | |||
asterisk (suggestion was to split the table) | Authentication Option", RFC 5925, DOI 10.17487/RFC5925, | |||
June 2010, <https://www.rfc-editor.org/info/rfc5925>. | ||||
- Discussed correct host identification: NATs may make IP | [RFC6437] Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme, | |||
addresses the wrong input, could e.g., use HTTP cookie. | "IPv6 Flow Label Specification", RFC 6437, | |||
DOI 10.17487/RFC6437, November 2011, | ||||
<https://www.rfc-editor.org/info/rfc6437>. | ||||
- Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and | [RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS)", | |||
MTU | RFC 6691, DOI 10.17487/RFC6691, July 2012, | |||
<https://www.rfc-editor.org/info/rfc6691>. | ||||
- Added information about option sharing, listed options in | [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | |||
Appendix B | "Increasing TCP's Initial Window", RFC 6928, | |||
DOI 10.17487/RFC6928, April 2013, | ||||
<https://www.rfc-editor.org/info/rfc6928>. | ||||
Authors' Addresses | [RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | |||
Protocol (HTTP/1.1): Semantics and Content", RFC 7231, | ||||
DOI 10.17487/RFC7231, June 2014, | ||||
<https://www.rfc-editor.org/info/rfc7231>. | ||||
Joe Touch | [RFC7323] Borman, D., Braden, B., Jacobson, V., and R. | |||
Manhattan Beach, CA 90266 | Scheffenegger, Ed., "TCP Extensions for High Performance", | |||
USA | RFC 7323, DOI 10.17487/RFC7323, September 2014, | |||
<https://www.rfc-editor.org/info/rfc7323>. | ||||
Phone: +1 (310) 560-0334 | [RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., and B. | |||
Email: touch@strayalpha.com | Khasnabish, "Mechanisms for Optimizing Link Aggregation | |||
Group (LAG) and Equal-Cost Multipath (ECMP) Component Link | ||||
Utilization in Networks", RFC 7424, DOI 10.17487/RFC7424, | ||||
January 2015, <https://www.rfc-editor.org/info/rfc7424>. | ||||
Michael Welzl | [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext | |||
University of Oslo | Transfer Protocol Version 2 (HTTP/2)", RFC 7540, | |||
PO Box 1080 Blindern | DOI 10.17487/RFC7540, May 2015, | |||
Oslo N-0316 | <https://www.rfc-editor.org/info/rfc7540>. | |||
Norway | ||||
Phone: +47 22 85 24 20 | [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating | |||
Email: michawe@ifi.uio.no | TCP to Support Rate-Limited Traffic", RFC 7661, | |||
Safiqul Islam | DOI 10.17487/RFC7661, October 2015, | |||
University of Oslo | <https://www.rfc-editor.org/info/rfc7661>. | |||
PO Box 1080 Blindern | ||||
Oslo N-0316 | ||||
Norway | ||||
Phone: +47 22 84 08 37 | [RFC8684] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., and C. | |||
Email: safiquli@ifi.uio.no | Paasch, "TCP Extensions for Multipath Operation with | |||
Multiple Addresses", RFC 8684, DOI 10.17487/RFC8684, March | ||||
2020, <https://www.rfc-editor.org/info/rfc8684>. | ||||
Appendix A: TCB Sharing History | Appendix A. TCB Sharing History | |||
T/TCP proposed using caches to maintain TCB information across | T/TCP proposed using caches to maintain TCB information across | |||
instances (temporal sharing), e.g., smoothed RTT, RTT variation, | instances (temporal sharing), e.g., smoothed RTT, RTT variation, | |||
congestion avoidance threshold, and MSS [RFC1644]. These values were | congestion-avoidance threshold, and MSS [RFC1644]. These values were | |||
in addition to connection counts used by T/TCP to accelerate data | in addition to connection counts used by T/TCP to accelerate data | |||
delivery prior to the full three-way handshake during an OPEN. The | delivery prior to the full three-way handshake during an OPEN. The | |||
goal was to aggregate TCB components where they reflect one | goal was to aggregate TCB components where they reflect one | |||
association - that of the host-pair, rather than artificially | association -- that of the host-pair rather than artificially | |||
separating those components by connection. | separating those components by connection. | |||
At least one T/TCP implementation saved the MSS and aggregated the | At least one T/TCP implementation saved the MSS and aggregated the | |||
RTT parameters across multiple connections but omitted caching the | RTT parameters across multiple connections but omitted caching the | |||
congestion window information [Br94], as originally specified in | congestion window information [Br94], as originally specified in | |||
[RFC1379]. Some T/TCP implementations immediately updated MSS when | [RFC1379]. Some T/TCP implementations immediately updated MSS when | |||
the TCP MSS header option was received [Br94], although this was not | the TCP MSS header option was received [Br94], although this was not | |||
addressed specifically in the concepts or functional specification | addressed specifically in the concepts or functional specification | |||
[RFC1379][RFC1644]. In later T/TCP implementations, RTT values were | [RFC1379] [RFC1644]. In later T/TCP implementations, RTT values were | |||
updated only after a CLOSE, which does not benefit concurrent | updated only after a CLOSE, which does not benefit concurrent | |||
sessions. | sessions. | |||
Temporal sharing of cached TCB data was originally implemented in | Temporal sharing of cached TCB data was originally implemented in the | |||
the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same | Sun OS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same | |||
[FreeBSD]. As mentioned before, only the MSS and RTT parameters were | [FreeBSD]. As mentioned before, only the MSS and RTT parameters were | |||
cached, as originally specified in [RFC1379]. Later discussion of | cached, as originally specified in [RFC1379]. Later discussion of T/ | |||
T/TCP suggested including congestion control parameters in this | TCP suggested including congestion control parameters in this cache; | |||
cache; for example, [RFC1644] (Section 3.1) hints at initializing | for example, Section 3.1 of [RFC1644] hints at initializing the | |||
the congestion window to the old window size. | congestion window to the old window size. | |||
Appendix B: TCP Option Sharing and Caching | Appendix B. TCP Option Sharing and Caching | |||
In addition to the options that can be cached and shared, this memo | In addition to the options that can be cached and shared, this memo | |||
also lists known TCP options [IANA] for which state is unsafe to be | also lists known TCP options [IANA] for which state is unsafe to be | |||
kept. This list is not intended to be authoritative or exhaustive. | kept. This list is not intended to be authoritative or exhaustive. | |||
Obsolete (unsafe to keep state): | Obsolete (unsafe to keep state): | |||
ECHO | Echo | |||
ECHO REPLY | Echo Reply | |||
PO Conn permitted | Partial Order Connection Permitted | |||
PO service profile | Partial Order Service Profile | |||
CC | CC | |||
CC.NEW | CC.NEW | |||
CC.ECHO | CC.ECHO | |||
Alt CS req | TCP Alternate Checksum Request | |||
Alt CS data | TCP Alternate Checksum Data | |||
No state to keep: | No state to keep: | |||
EOL | End of Option List (EOL) | |||
NOP | No-Operation (NOP) | |||
WS | Window Scale (WS) | |||
SACK | SACK | |||
TS | Timestamps (TS) | |||
MD5 | MD5 Signature Option | |||
TCP-AO | TCP Authentication Option (TCP-AO) | |||
EXP1 | RFC3692-style Experiment 1 | |||
EXP2 | RFC3692-style Experiment 2 | |||
Unsafe to keep state: | Unsafe to keep state: | |||
Skeeter (DH exchange, known to be vulnerable) | Skeeter (DH exchange, known to be vulnerable) | |||
Bubba (DH exchange, known to be vulnerable) | Bubba (DH exchange, known to be vulnerable) | |||
Trailer CS | Trailer Checksum Option | |||
SCPS capabilities | SCPS capabilities | |||
S-NACK | Selective Negative Acknowledgements (S-NACK) | |||
Records boundaries | Records Boundaries | |||
Corruption experienced | Corruption experienced | |||
SNAP | SNAP | |||
TCP Compression | TCP Compression Filter | |||
Quickstart response | Quick-Start Response | |||
UTO | User Timeout Option (UTO) | |||
MPTCP negotiation success (see below for negotiation failure) | Multipath TCP (MPTCP) negotiation success (see below for | |||
negotiation failure) | ||||
TFO negotiation success (see below for negotiation failure) | TCP Fast Open (TFO) negotiation success (see below for negotiation | |||
failure) | ||||
Safe but optional to keep state: | Safe but optional to keep state: | |||
MPTCP negotiation failure (to avoid negotiation retries) | Multipath TCP (MPTCP) negotiation failure (to avoid negotiation | |||
retries) | ||||
MSS | Maximum Segment Size (MSS) | |||
TFO negotiation failure (to avoid negotiation retries) | TCP Fast Open (TFO) negotiation failure (to avoid negotiation | |||
retries) | ||||
Safe and necessary to keep state: | Safe and necessary to keep state: | |||
TFO cookie (if TFO succeeded in the past) | TCP Fast Open (TFO) Cookie (if TFO succeeded in the past) | |||
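
The classification above can be expressed as a small lookup table. The following Python sketch is illustrative only: the names Policy, OPTION_POLICY, and may_cache are hypothetical, the table covers just a subset of the listed options, and treating unknown options as not cacheable is a conservative assumption rather than something this memo specifies.

```python
from enum import Enum


class Policy(Enum):
    OBSOLETE = "obsolete (unsafe to keep state)"
    NO_STATE = "no state to keep"
    UNSAFE = "unsafe to keep state"
    SAFE_OPTIONAL = "safe but optional to keep state"
    SAFE_NECESSARY = "safe and necessary to keep state"


# Partial table; keys follow the names used in the lists above.
OPTION_POLICY = {
    "CC": Policy.OBSOLETE,
    "WS": Policy.NO_STATE,
    "SACK": Policy.NO_STATE,
    "TS": Policy.NO_STATE,
    "UTO": Policy.UNSAFE,
    "MPTCP negotiation success": Policy.UNSAFE,
    "MPTCP negotiation failure": Policy.SAFE_OPTIONAL,
    "MSS": Policy.SAFE_OPTIONAL,
    "TFO negotiation failure": Policy.SAFE_OPTIONAL,
    "TFO cookie": Policy.SAFE_NECESSARY,
}


def may_cache(option: str) -> bool:
    """Return True only for options listed as safe to keep state for.

    Assumption: options missing from the table are treated as not cacheable.
    """
    policy = OPTION_POLICY.get(option)
    return policy in (Policy.SAFE_OPTIONAL, Policy.SAFE_NECESSARY)
```

A cache implementation could consult such a table before storing per-host option state, e.g., may_cache("MSS") before saving a peer's MSS.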
Appendix C: Automating the Initial Window in TCP over Long Timescales | Appendix C. Automating the Initial Window in TCP over Long Timescales | |||
C.1. Introduction | C.1. Introduction | |||
Temporal sharing, as described earlier in this document, builds on | Temporal sharing, as described earlier in this document, builds on | |||
the assumption that multiple consecutive connections between the | the assumption that multiple consecutive connections between the same | |||
same host pair are somewhat likely to be exposed to similar | host-pair are somewhat likely to be exposed to similar environment | |||
environment characteristics. The stored information can become less | characteristics. The stored information can become less accurate | |||
accurate over time and suitable precautions should take this ageing | over time and suitable precautions should take this aging into | |||
into consideration (this is discussed further in section 8.1). | consideration (this is discussed further in Section 8.1). However, | |||
However, there are also cases where it can make sense to track these | there are also cases where it can make sense to track these values | |||
values over longer periods, observing properties of TCP connections | over longer periods, observing properties of TCP connections to | |||
to gradually influence evolving trends in TCP parameters. This | gradually influence evolving trends in TCP parameters. This appendix | |||
appendix describes an example of such a case. | describes an example of such a case. | |||
TCP's congestion control algorithm uses an initial window value | TCP's congestion control algorithm uses an initial window value (IW) | |||
(IW), both as a starting point for new connections and as an upper | both as a starting point for new connections and as an upper limit | |||
limit for restarting after an idle period [RFC5681][RFC7661]. This | for restarting after an idle period [RFC5681] [RFC7661]. This value | |||
value has evolved over time, originally one maximum segment size | has evolved over time; it was originally 1 maximum segment size (MSS) | |||
(MSS), and increased to the lesser of four MSS or 4,380 bytes | and increased to the lesser of 4 MSSs or 4,380 bytes [RFC3390] | |||
[RFC3390][RFC5681]. For a typical Internet connection with a maximum | [RFC5681]. For a typical Internet connection with a maximum | |||
transmission unit (MTU) of 1500 bytes, this permits three segments | transmission unit (MTU) of 1500 bytes, this permits 3 segments of | |||
of 1,460 bytes each. | 1,460 bytes each. | |||
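
As a worked check of the arithmetic above, the following minimal Python sketch evaluates the RFC 3390 upper bound, min(4*MSS, max(2*MSS, 4380 bytes)). The function name rfc3390_initial_window is illustrative, and the 40-byte header overhead (IPv4 plus TCP, no options) used to derive the MSS from the MTU is an assumption.

```python
def rfc3390_initial_window(mss: int) -> int:
    """Upper bound on the initial window, in bytes, per RFC 3390."""
    return min(4 * mss, max(2 * mss, 4380))


# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
mtu = 1500
mss = mtu - 40

iw_bytes = rfc3390_initial_window(mss)
iw_segments = iw_bytes // mss
print(iw_bytes, iw_segments)
```

For an MSS of 1,460 bytes, the bound is 4,380 bytes, i.e., 3 full segments, which matches the figure given in the text.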
The IW value was originally implied in the original TCP congestion | The IW value was originally implied in the original TCP congestion | |||
control description and documented as a standard in 1997 | control description and documented as a standard in 1997 [RFC2001] | |||
[RFC2001][Ja88]. The value was updated in 1998 experimentally and | [Ja88]. The value was updated in 1998 experimentally and moved to | |||
moved to the standards track in 2002 [RFC2414][RFC3390]. In 2013, it | the Standards Track in 2002 [RFC2414] [RFC3390]. In 2013, it was | |||
was experimentally increased to 10 [RFC6928]. | experimentally increased to 10 [RFC6928]. | |||
This appendix discusses how TCP can objectively measure when an IW | This appendix discusses how TCP can objectively measure when an IW is | |||
is too large, and that such feedback should be used over long | too large and that such feedback should be used over long timescales | |||
timescales to adjust the IW automatically. The result should be | to adjust the IW automatically. The result should be safer to deploy | |||
safer to deploy and might avoid the need to repeatedly revisit IW | and might avoid the need to repeatedly revisit IW over time. | |||
over time. | ||||
Note that this mechanism attempts to make the IW more adaptive over | Note that this mechanism attempts to make the IW more adaptive over | |||
time. It can increase the IW beyond that which is currently | time. It can increase the IW beyond that which is currently | |||
recommended for widescale deployment, and so its use should be | recommended for wide-scale deployment, so its use should be carefully | |||
carefully monitored. | monitored. | |||
C.2. Design Considerations | C.2. Design Considerations | |||
TCP's IW value has existed statically for over two decades, so any | TCP's IW value has existed statically for over two decades, so any | |||
solution to adjusting the IW dynamically should have similarly | solution to adjusting the IW dynamically should have similarly | |||
stable, non-invasive effects on the performance and complexity of | stable, non-invasive effects on the performance and complexity of | |||
TCP. In order to be fair, the IW should be similar for most machines | TCP. In order to be fair, the IW should be similar for most machines | |||
on the public Internet. Finally, a desirable goal is to develop a | on the public Internet. Finally, a desirable goal is to develop a | |||
self-correcting algorithm, so that IW values that cause network | self-correcting algorithm so that IW values that cause network | |||
problems can be avoided. To that end, we propose the following | problems can be avoided. To that end, we propose the following | |||
design goals: | design goals: | |||
o Impart little to no impact to TCP in the absence of loss, i.e., | * Impart little to no impact to TCP in the absence of loss, i.e., it | |||
it should not increase the complexity of default packet | should not increase the complexity of default packet processing in | |||
processing in the normal case. | the normal case. | |||
o Adapt to network feedback over long timescales, avoiding values | * Adapt to network feedback over long timescales, avoiding values | |||
that persistently cause network problems. | that persistently cause network problems. | |||
o Decrease the IW in the presence of sustained loss of IW segments, | * Decrease the IW in the presence of sustained loss of IW segments, | |||
as determined over a number of different connections. | as determined over a number of different connections. | |||
o Increase the IW in the absence of sustained loss of IW segments, | * Increase the IW in the absence of sustained loss of IW segments, | |||
as determined over a number of different connections. | as determined over a number of different connections. | |||
o Operate conservatively, i.e., tend towards leaving the IW the | * Operate conservatively, i.e., tend towards leaving the IW the same | |||
same in the absence of sufficient information, and give greater | in the absence of sufficient information, and give greater | |||
consideration to IW segment loss than IW segment success. | consideration to IW segment loss than IW segment success. | |||
We expect that, without other context, a good IW algorithm will | We expect that, without other context, a good IW algorithm will | |||
converge to a single value, but this is not required. An endpoint | converge to a single value, but this is not required. An endpoint | |||
with additional context or information, or deployed in a constrained | with additional context or information, or deployed in a constrained | |||
environment, can always use a different value. In particular, | environment, can always use a different value. In particular, | |||
information from previous connections, or sets of connections with a | information from previous connections, or sets of connections with a | |||
similar path, can already be used as context for such decisions (as | similar path, can already be used as context for such decisions (as | |||
noted in the core of this document). | noted in the core of this document). | |||
However, if a given IW value persistently causes packet loss during | However, if a given IW value persistently causes packet loss during | |||
the initial burst of packets, it is clearly inappropriate and could | the initial burst of packets, it is clearly inappropriate and could | |||
be inducing unnecessary loss in other competing connections. This | be inducing unnecessary loss in other competing connections. This | |||
might happen for sites behind very slow boxes with small buffers, | might happen for sites behind very slow boxes with small buffers, | |||
which may or may not be the first hop. | which may or may not be the first hop. | |||
C.3. Proposed IW Algorithm | C.3. Proposed IW Algorithm | |||
Below is a simple description of the proposed IW algorithm. It | Below is a simple description of the proposed IW algorithm. It | |||
relies on the following parameters: | relies on the following parameters: | |||
o MinIW = 3 MSS or 4,380 bytes (as per [RFC3390]) | * MinIW = 3 MSS or 4,380 bytes (as per [RFC3390]) | |||
o MaxIW = 10 MSS (as per [RFC6928]) | * MaxIW = 10 MSS (as per [RFC6928]) | |||
o MulDecr = 0.5 | * MulDecr = 0.5 | |||
o AddIncr = 2 MSS | ||||
o Threshold = 0.05 | * AddIncr = 2 MSS | |||
* Threshold = 0.05 | ||||
We assume that the minimum IW (MinIW) should be as currently | We assume that the minimum IW (MinIW) should be as currently | |||
specified as standard [RFC3390]. The maximum IW can be set to a | specified as standard [RFC3390]. The maximum IW (MaxIW) can be set | |||
fixed value (we suggest using the experimental and now somewhat de- | to a fixed value (we suggest using the experimental and now somewhat | |||
facto standard in [RFC6928]) or set based on a schedule if trusted | de facto standard in [RFC6928]) or set based on a schedule if trusted | |||
time references are available [Al10]; here we prefer a fixed value. | time references are available [Al10]; here, we prefer a fixed value. | |||
We also propose to use an AIMD algorithm, with increases and | We also propose to use an Additive Increase Multiplicative Decrease | |||
decreases as noted. | (AIMD) algorithm, with increases and decreases as noted. | |||
Although these parameters are somewhat arbitrary, their initial | Although these parameters are somewhat arbitrary, their initial | |||
values are not important except that the algorithm is AIMD and the | values are not important except that the algorithm is AIMD and the | |||
MaxIW should not exceed that recommended for other systems on the | MaxIW should not exceed that recommended for other systems on the | |||
Internet (here we selected the current de-facto standard rather than | Internet (here, we selected the current de facto standard rather than | |||
the actual standard). Current proposals, including default current | the actual standard). Current proposals, including default current | |||
operation, are degenerate cases of the algorithm below for given | operation, are degenerate cases of the algorithm below for given | |||
parameters - notably MulDecr = 1.0 and AddIncr = 0 MSS, thus | parameters, notably MulDecr = 1.0 and AddIncr = 0 MSS, thus disabling | |||
disabling the automatic part of the algorithm. | the automatic part of the algorithm. | |||
The proposed algorithm is as follows: | The proposed algorithm is as follows: | |||
1. On boot: | 1. On boot: | |||
IW = MaxIW; # assume this is in bytes, and indicates an integer | IW = MaxIW; # assume this is in bytes and indicates an integer | |||
multiple of 2 MSS (an even number to support ACK compression) | # multiple of 2 MSS (an even number to support | |||
# ACK compression) | ||||
2. Upon starting a new connection: | 2. Upon starting a new connection: | |||
CWND = IW; | CWND = IW; | |||
conncount++; | conncount++; | |||
IWnotchecked = 1; # true | IWnotchecked = 1; # true | |||
3. During a connection's SYN-ACK processing, if SYN-ACK includes ECN | 3. During a connection's SYN-ACK processing, if SYN-ACK includes ECN | |||
(as similarly addressed in Sec 5 of ECN++ for TCP [Ba20]), treat | (as similarly addressed in Section 5 of ECN++ for TCP [Ba20]), | |||
as if the IW is too large: | treat as if the IW is too large: | |||
if (IWnotchecked && (synackecn == 1)) { | if (IWnotchecked && (synackecn == 1)) { | |||
losscount++; | losscount++; | |||
IWnotchecked = 0; # never check again | IWnotchecked = 0; # never check again | |||
} | } | |||
4. During a connection, if retransmission occurs, check the seqno of | 4. During a connection, if retransmission occurs, check the seqno of | |||
the outgoing packet (in bytes) to see if the resent segment fixes | the outgoing packet (in bytes) to see if the re-sent segment | |||
an IW loss: | fixes an IW loss: | |||
if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) { | if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) { | |||
losscount++; | losscount++; | |||
IWnotchecked = 0; # never do this entire "if" again | IWnotchecked = 0; # never do this entire "if" again | |||
} else { | } else { | |||
IWnotchecked = 0; # you're beyond the IW so stop checking | IWnotchecked = 0; # you're beyond the IW so stop checking | |||
} | } | |||
5. Once every 1000 connections, as a separate process (i.e., not as | 5. Once every 1000 connections, as a separate process (i.e., not as | |||
part of processing a given connection): | part of processing a given connection): | |||
if (conncount > 1000) { | if (conncount > 1000) { | |||
if (losscount/conncount > Threshold) { | if (losscount/conncount > Threshold) { | |||
# the number of connections with errors is too high | # the number of connections with errors is too high | |||
IW = IW * MulDecr; | IW = IW * MulDecr; | |||
} else { | } else { | |||
IW = IW + AddIncr; | IW = IW + AddIncr; | |||
} | ||||
} | } | |||
} | ||||
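
The five steps above can be sketched as runnable Python. This is a minimal illustration, not a reference implementation. The class and method names are hypothetical, and the MSS of 1,460 bytes is an assumption. Three behaviors are also assumptions not taken from the pseudocode: clamping the result to [MinIW, MaxIW], resetting the counters after each evaluation, and comparing against the IW in effect when the connection started. The sketch does not round the IW to an even multiple of the MSS.

```python
from dataclasses import dataclass

MSS = 1460            # assumption: typical MSS in bytes
MIN_IW = 3 * MSS      # MinIW
MAX_IW = 10 * MSS     # MaxIW
MUL_DECR = 0.5        # MulDecr
ADD_INCR = 2 * MSS    # AddIncr
THRESHOLD = 0.05      # Threshold
EVAL_CONNS = 1000     # evaluate once more than this many connections are seen


@dataclass
class Connection:
    isn: int                    # initial sequence number
    cwnd: int                   # congestion window, in bytes
    iw: int                     # IW in effect when the connection started
    iw_notchecked: bool = True


class AutoIW:
    def __init__(self) -> None:
        self.iw = MAX_IW        # step 1: on boot
        self.conncount = 0
        self.losscount = 0

    def new_connection(self, isn: int) -> Connection:
        # Step 2: start from the current IW and count the connection.
        self.conncount += 1
        return Connection(isn=isn, cwnd=self.iw, iw=self.iw)

    def on_synack(self, conn: Connection, ecn_marked: bool) -> None:
        # Step 3: an ECN indication on the SYN-ACK counts as an IW loss.
        if conn.iw_notchecked and ecn_marked:
            self.losscount += 1
            conn.iw_notchecked = False

    def on_retransmit(self, conn: Connection, seqno: int) -> None:
        # Step 4: count a loss if the re-sent byte lies within the IW.
        offset = (seqno - conn.isn) % 2**32   # 32-bit modular offset
        if conn.iw_notchecked and offset < conn.iw:
            self.losscount += 1
        conn.iw_notchecked = False            # stop checking either way

    def evaluate(self) -> None:
        # Step 5: run as a separate process, not per connection.
        if self.conncount <= EVAL_CONNS:
            return
        if self.losscount / self.conncount > THRESHOLD:
            self.iw = int(self.iw * MUL_DECR)
        else:
            self.iw = self.iw + ADD_INCR
        # Assumptions: clamp to the configured bounds and restart counting.
        self.iw = max(MIN_IW, min(MAX_IW, self.iw))
        self.conncount = 0
        self.losscount = 0
```

With these parameters, an evaluation in which roughly 10% of connections saw IW loss halves the IW from 10 MSS to 5 MSS; a following loss-free interval raises it by 2 MSS.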
As presented, this algorithm can yield a false positive when the | As presented, this algorithm can yield a false positive when the | |||
sequence number wraps around, e.g., the code might increment | sequence number wraps around, e.g., the code might increment | |||
losscount in step 4 when no loss occurred or fail to increment | losscount in step 4 when no loss occurred or fail to increment | |||
losscount when a loss did occur. This can be avoided using either | losscount when a loss did occur. This can be avoided using either | |||
PAWS [RFC7323] context or internal extended sequence number | Protection Against Wrapped Sequences (PAWS) [RFC7323] context or | |||
representations (as in TCP-AO [RFC5925]). Alternatively, false | internal extended sequence number representations (as in TCP | |||
positives can be tolerated because they are expected to be | Authentication Option (TCP-AO) [RFC5925]). Alternatively, false | |||
infrequent and thus will not significantly impact the algorithm. | positives can be tolerated because they are expected to be infrequent | |||
and thus will not significantly impact the algorithm. | ||||
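
One way to picture the extended sequence number approach is sketched below in Python. It is an illustration only and is not the TCP-AO sequence number extension procedure. The names ExtendedSeq and is_iw_retransmission are hypothetical, and the sketch assumes extend() is called on sent sequence numbers often enough that consecutive calls are less than 2^31 bytes apart.

```python
SEQ_MOD = 2**32  # TCP sequence numbers are 32 bits wide


class ExtendedSeq:
    """Tracks byte offsets from the ISN without 32-bit wraparound."""

    def __init__(self, isn: int) -> None:
        self.isn = isn
        self.highest = 0  # highest extended offset seen so far

    def extend(self, seqno: int) -> int:
        """Map a 32-bit seqno to the extended offset nearest to `highest`."""
        low = (seqno - self.isn) % SEQ_MOD
        base = self.highest - (self.highest % SEQ_MOD) + low
        candidates = [c for c in (base - SEQ_MOD, base, base + SEQ_MOD)
                      if c >= 0]
        ext = min(candidates, key=lambda c: abs(c - self.highest))
        self.highest = max(self.highest, ext)
        return ext


def is_iw_retransmission(tracker: ExtendedSeq, seqno: int, iw: int) -> bool:
    """True only if the re-sent byte lies within the first `iw` bytes."""
    return tracker.extend(seqno) < iw
```

After more than 2^32 bytes have been sent, a retransmitted sequence number whose 32-bit offset from the ISN is small maps to an extended offset above 2^32, so it is not mistaken for an IW loss.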
A number of additional constraints need to be imposed if this | A number of additional constraints need to be imposed if this | |||
mechanism is implemented to ensure that it defaults to values that | mechanism is implemented to ensure that it defaults to values that | |||
comply with current Internet standards, is conservative in how it | comply with current Internet standards, is conservative in how it | |||
extends those values, and returns to those values in the absence of | extends those values, and returns to those values in the absence of | |||
positive feedback (i.e., success). To that end, we recommend the | positive feedback (i.e., success). To that end, we recommend the | |||
following list of example constraints: | following list of example constraints: | |||
>> The automatic IW algorithm MUST initialize MaxIW to a value no | * The automatic IW algorithm MUST initialize MaxIW to a value no larger | |||
larger than the currently recommended Internet default, in the | than the currently recommended Internet default in the absence of | |||
absence of other context information. | other context information. | |||
Thus, if there are too few connections to make a decision or if | Thus, if there are too few connections to make a decision or if | |||
there is otherwise insufficient information to increase the IW, then | there is otherwise insufficient information to increase the IW, | |||
the MaxIW defaults to the current recommended value. | then the MaxIW defaults to the current recommended value. | |||
>> An implementation MAY allow the MaxIW to grow beyond the | * An implementation MAY allow the MaxIW to grow beyond the currently | |||
currently recommended Internet default, but not more than 2 segments | recommended Internet default but not more than 2 segments per | |||
per calendar year. | calendar year. | |||
Thus, if an endpoint has a persistent history of successfully | Thus, if an endpoint has a persistent history of successfully | |||
transmitting IW segments without loss, then it is allowed to probe | transmitting IW segments without loss, then it is allowed to probe | |||
the Internet to determine if larger IW values have similar success. | the Internet to determine if larger IW values have similar | |||
This probing is limited and requires a trusted time source, | success. This probing is limited and requires a trusted time | |||
otherwise the MaxIW remains constant. | source; otherwise, the MaxIW remains constant. | |||
>> An implementation MUST adjust the IW based on loss statistics at | * An implementation MUST adjust the IW based on loss statistics at | |||
least once every 1000 connections. | least once every 1000 connections. | |||
An endpoint needs to be sufficiently reactive to IW loss. | An endpoint needs to be sufficiently reactive to IW loss. | |||
>> An implementation MUST decrease the IW by at least one MSS when | * An implementation MUST decrease the IW by at least 1 MSS when | |||
indicated during an evaluation interval. | indicated during an evaluation interval. | |||
An endpoint that detects loss needs to decrease its IW by at least | An endpoint that detects loss needs to decrease its IW by at least | |||
one MSS, otherwise it is not participating in an automatic reactive | 1 MSS; otherwise, it is not participating in an automatic reactive | |||
algorithm. | algorithm. | |||
>> An implementation MUST increase by no more than 2 MSS per | * An implementation MUST increase by no more than 2 MSSs per | |||
evaluation interval. | evaluation interval. | |||
An endpoint that does not experience IW loss needs to probe the | An endpoint that does not experience IW loss needs to probe the | |||
network incrementally. | network incrementally. | |||
>> An implementation SHOULD use an IW that is an integer multiple of | * An implementation SHOULD use an IW that is an integer multiple of | |||
2 MSS. | 2 MSSs. | |||
The IW should remain a multiple of 2 MSS segments, to enable | The IW should remain a multiple of 2 MSS segments to enable | |||
efficient ACK compression without incurring unnecessary timeouts. | efficient ACK compression without incurring unnecessary timeouts. | |||
>> An implementation MUST decrease the IW if more than 95% of | * An implementation MUST decrease the IW if more than 95% of | |||
connections have IW losses. | connections have IW losses. | |||
Again, this is to ensure an implementation is sufficiently reactive. | Again, this is to ensure an implementation is sufficiently | |||
reactive. | ||||
>> An implementation MAY group IW values and statistics within | * An implementation MAY group IW values and statistics within | |||
subsets of connections. Such grouping MAY use any information about | subsets of connections. Such grouping MAY use any information | |||
connections to form groups except loss statistics. | about connections to form groups except loss statistics. | |||
There are some TCP connections which might not be counted at all, | There are some TCP connections that might not be counted at all, such | |||
such as those to/from loopback addresses, or those within the same | as those to/from loopback addresses or those within the same subnet | |||
subnet as that of a local interface (for which congestion control is | as that of a local interface (for which congestion control is | |||
sometimes disabled anyway). This may also include connections that | sometimes disabled anyway). This may also include connections that | |||
terminate before the IW is full, i.e., as a separate check at the | terminate before the IW is full, i.e., as a separate check at the | |||
time of the connection closing. | time of the connection closing. | |||
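
As an illustration of the exclusions described above, a counting filter might look like the following sketch. The function name `is_counted` and the `local_subnets` argument are hypothetical. The check for connections that terminate before the IW is full is omitted because it depends on per-connection state not modeled here.

```python
import ipaddress


def is_counted(remote, local_subnets):
    """Return True if a connection to `remote` should enter the IW statistics.

    `local_subnets` is an assumed list of CIDR strings, one per local
    interface subnet.
    """
    addr = ipaddress.ip_address(remote)
    if addr.is_loopback:
        return False  # loopback connections are not counted
    # Connections within a local interface's subnet are not counted either.
    return not any(addr in ipaddress.ip_network(net) for net in local_subnets)
```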
The period over which the IW is updated is intended to be a long | The period over which the IW is updated is intended to be a long | |||
timescale, e.g., a month or so, or 1,000 connections, whichever is | timescale, e.g., a month or so, or 1,000 connections, whichever is | |||
longer. An implementation might check the IW once a month, and | longer. An implementation might check the IW once a month and simply | |||
simply not update the IW or clear the connection counts in months | not update the IW or clear the connection counts in months where the | |||
where the number of connections is too small. | number of connections is too small. | |||
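
"Whichever is longer" means that both the time bound and the connection-count bound have to be met before the IW is re-evaluated. A minimal sketch of that check follows. The function name and parameters are assumptions, and the defaults simply mirror the "a month or so" and "1,000 connections" examples in the text.

```python
from datetime import datetime, timedelta


def interval_complete(start, now, counted_conns,
                      min_period=timedelta(days=30), min_conns=1000):
    """Return True once the evaluation interval may end.

    Both conditions must hold, which makes the interval the longer of
    the two bounds.
    """
    return (now - start) >= min_period and counted_conns >= min_conns
```

An implementation that checks monthly would call this at each check and, when it returns False, keep both the current IW and the accumulated counts.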
C.4. Discussion | C.4. Discussion | |||
There are numerous parameters to the above algorithm that are | There are numerous parameters to the above algorithm that are | |||
compliant with the given requirements; this is intended to allow | compliant with the given requirements; this is intended to allow | |||
variation in configuration and implementation while ensuring that | variation in configuration and implementation while ensuring that all | |||
all such algorithms are reactive and safe. | such algorithms are reactive and safe. | |||
This algorithm continues to assume segments because that is the | This algorithm continues to assume segments because that is the basis | |||
basis of most TCP implementations. It might be useful to consider | of most TCP implementations. It might be useful to consider revising | |||
revising the specifications to allow byte-based congestion given | the specifications to allow byte-based congestion given sufficient | |||
sufficient experience. | experience. | |||
The algorithm checks for IW losses only during the first IW after a | The algorithm checks for IW losses only during the first IW after a | |||
connection start; it does not check for IW losses elsewhere the IW | connection start; it does not check for IW losses elsewhere the IW is | |||
is used, e.g., during slow-start restarts. | used, e.g., during slow-start restarts. | |||
>> An implementation MAY detect IW losses during slow-start restarts | * An implementation MAY detect IW losses during slow-start restarts | |||
in addition to losses during the first IW of a connection. In this | in addition to losses during the first IW of a connection. In | |||
case, the implementation MUST count each restart as a "connection" | this case, the implementation MUST count each restart as a | |||
for the purposes of connection counts and periodic rechecking of the | "connection" for the purposes of connection counts and periodic | |||
IW value. | rechecking of the IW value. | |||
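
One way to honor the counting requirement above is to route connection starts and slow-start restarts through the same counter. This is a sketch only. The class and method names are hypothetical, and a real stack would keep this state alongside its other per-host statistics.

```python
from dataclasses import dataclass


@dataclass
class IwStats:
    """Counts of IW uses and of IW uses that saw loss, for one interval."""
    events: int = 0
    events_with_loss: int = 0

    def _record(self, iw_loss):
        self.events += 1
        if iw_loss:
            self.events_with_loss += 1

    def record_connection_start(self, iw_loss):
        self._record(iw_loss)

    def record_restart(self, iw_loss):
        # A slow-start restart is counted exactly like a new "connection".
        self._record(iw_loss)
```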
False positives can occur during some kinds of segment reordering, | False positives can occur during some kinds of segment reordering, | |||
e.g., that might trigger spurious retransmissions even without a | e.g., that might trigger spurious retransmissions even without a true | |||
true segment loss. These are not expected to be sufficiently common | segment loss. These are not expected to be sufficiently common to | |||
to dominate the algorithm and its conclusions. | dominate the algorithm and its conclusions. | |||
This mechanism does require additional per-connection state, which | This mechanism does require additional per-connection state, which is | |||
is currently common in some implementations, and is useful for other | currently common in some implementations and is useful for other | |||
reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism | reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism | |||
also benefits from persistent state kept across reboots, as would be | in this appendix also benefits from persistent state kept across | |||
other state sharing mechanisms (e.g., TCP Control Block Sharing per | reboots, which would also be useful to other state sharing mechanisms | |||
the main body of this document). | (e.g., TCP Control Block Sharing per the main body of this document). | |||
The receive window (rwnd) is not involved in this calculation. The | The receive window (rwnd) is not involved in this calculation. The | |||
size of rwnd is determined by receiver resources and provides space | size of rwnd is determined by receiver resources and provides space | |||
to accommodate segment reordering. It is not involved with | to accommodate segment reordering. Also, rwnd is not involved with | |||
congestion control, which is the focus of this document and its | congestion control, which is the focus of the way this appendix | |||
management of the IW. | manages the IW. | |||
C.5. Observations | C.5. Observations | |||
The IW may not converge to a single, global value. It also may not | The IW may not converge to a single global value. It also may not | |||
converge at all, but rather may oscillate by a few MSS as it | converge at all but rather may oscillate by a few MSSs as it | |||
repeatedly probes the Internet for larger IWs and fails. Both | repeatedly probes the Internet for larger IWs and fails. Both | |||
properties are consistent with TCP behavior during each individual | properties are consistent with TCP behavior during each individual | |||
connection. | connection. | |||
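
The oscillation described above can be seen in a toy model. The sketch below assumes a single fixed path `capacity` (in MSS) such that every connection loses segments when the IW exceeds it and none do otherwise. This is a deliberate oversimplification for illustration, not a model of the Internet.

```python
def simulate(iw, capacity, intervals):
    """Return the IW used in each evaluation interval of the toy model."""
    trace = []
    for _ in range(intervals):
        trace.append(iw)
        if iw > capacity:
            iw -= 2  # IW loss seen: back off by 2 MSS
        else:
            iw += 2  # no IW loss: probe upward by 2 MSS
    return trace
```

With `simulate(10, 11, 6)` the IW alternates between 10 and 12 MSS. It never settles because each successful interval triggers another probe.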
This mechanism assumes that losses during the IW are due to IW size. | This mechanism assumes that losses during the IW are due to IW size. | |||
Persistent errors that drop packets for other reasons - e.g., OS | Persistent errors that drop packets for other reasons, e.g., OS bugs, | |||
bugs, can cause false positives. Again, this is consistent with | can cause false positives. Again, this is consistent with TCP's | |||
TCP's basic assumption that loss is caused by congestion and | basic assumption that loss is caused by congestion and requires | |||
requires backoff. This algorithm treats the IW of new connections as | backoff. This algorithm treats the IW of new connections as a long- | |||
a long-timescale backoff system. | timescale backoff system. | |||
Acknowledgments | ||||
The authors would like to thank Praveen Balasubramanian for | ||||
information regarding TCB sharing in Windows; Christoph Paasch for | ||||
information regarding TCB sharing in Apple OSs; Yuchung Cheng, Lars | ||||
Eggert, Ilpo Jarvinen, and Michael Scharf for comments on earlier | ||||
draft versions of this document; as well as members of the TCPM WG. | ||||
Earlier revisions of this work received funding from a collaborative | ||||
research project between the University of Oslo and Huawei | ||||
Technologies Co., Ltd. and were partly supported by USC/ISI's Postel | ||||
Center. | ||||
This document was prepared using 2-Word-v2.0.template.dot. | ||||
Authors' Addresses | ||||
Joe Touch | ||||
Manhattan Beach, CA 90266 | ||||
United States of America | ||||
Phone: +1 (310) 560-0334 | ||||
Email: touch@strayalpha.com | ||||
Michael Welzl | ||||
University of Oslo | ||||
PO Box 1080 Blindern | ||||
N-0316 Oslo | ||||
Norway | ||||
Phone: +47 22 85 24 20 | ||||
Email: michawe@ifi.uio.no | ||||
Safiqul Islam | ||||
University of Oslo | ||||
PO Box 1080 Blindern | ||||
Oslo N-0316 | ||||
Norway | ||||
Phone: +47 22 84 08 37 | ||||
Email: safiquli@ifi.uio.no | ||||
End of changes. 316 change blocks. | ||||
1057 lines changed or deleted | 1009 lines changed or added | |||