rfc9293.original | rfc9293.txt | |||
---|---|---|---|---|
Internet Engineering Task Force W. Eddy, Ed. | Internet Engineering Task Force (IETF) W. Eddy, Ed. | |||
Internet-Draft MTI Systems | STD: 7 MTI Systems | |||
Obsoletes: 793, 879, 2873, 6093, 6429, 6528, 7 March 2022 | Request for Comments: 9293 August 2022 | |||
6691 (if approved) | Obsoletes: 793, 879, 2873, 6093, 6429, 6528, | |||
Updates: 5961, 1011, 1122 (if approved) | 6691 | |||
Intended status: Standards Track | Updates: 1011, 1122, 5961 | |||
Expires: 8 September 2022 | Category: Standards Track | |||
ISSN: 2070-1721 | ||||
Transmission Control Protocol (TCP) Specification | Transmission Control Protocol (TCP) | |||
draft-ietf-tcpm-rfc793bis-28 | ||||
Abstract | Abstract | |||
This document specifies the Transmission Control Protocol (TCP). TCP | This document specifies the Transmission Control Protocol (TCP). TCP | |||
is an important transport layer protocol in the Internet protocol | is an important transport-layer protocol in the Internet protocol | |||
stack, and has continuously evolved over decades of use and growth of | stack, and it has continuously evolved over decades of use and growth | |||
the Internet. Over this time, a number of changes have been made to | of the Internet. Over this time, a number of changes have been made | |||
TCP as it was specified in RFC 793, though these have only been | to TCP as it was specified in RFC 793, though these have only been | |||
documented in a piecemeal fashion. This document collects and brings | documented in a piecemeal fashion. This document collects and brings | |||
those changes together with the protocol specification from RFC 793. | those changes together with the protocol specification from RFC 793. | |||
This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093, | This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093, | |||
6429, 6528, and 6691 that updated parts of RFC 793. It updates RFCs | 6429, 6528, and 6691 that updated parts of RFC 793. It updates RFCs | |||
1011 and 1122, and should be considered as a replacement for the | 1011 and 1122, and it should be considered as a replacement for the | |||
portions of those document dealing with TCP requirements. It also | portions of those documents dealing with TCP requirements. It also | |||
updates RFC 5961 by adding a small clarification in reset handling | updates RFC 5961 by adding a small clarification in reset handling | |||
while in the SYN-RECEIVED state. The TCP header control bits from | while in the SYN-RECEIVED state. The TCP header control bits from | |||
RFC 793 have also been updated based on RFC 3168. | RFC 793 have also been updated based on RFC 3168. | |||
RFC EDITOR NOTE: If approved for publication as an RFC, this should | ||||
be marked additionally as "STD: 7" and replace RFC 793 in that role. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 8 September 2022. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9293. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
This document may contain material from IETF Documents or IETF | This document may contain material from IETF Documents or IETF | |||
Contributions published or made publicly available before November | Contributions published or made publicly available before November | |||
10, 2008. The person(s) controlling the copyright in some of this | 10, 2008. The person(s) controlling the copyright in some of this | |||
material may not have granted the IETF Trust the right to allow | material may not have granted the IETF Trust the right to allow | |||
modifications of such material outside the IETF Standards Process. | modifications of such material outside the IETF Standards Process. | |||
Without obtaining an adequate license from the person(s) controlling | Without obtaining an adequate license from the person(s) controlling | |||
the copyright in such materials, this document may not be modified | the copyright in such materials, this document may not be modified | |||
outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
than English. | than English. | |||
Table of Contents | Table of Contents | |||
1. Purpose and Scope | 1. Purpose and Scope | |||
2. Introduction | 2. Introduction | |||
2.1. Requirements Language | 2.1. Requirements Language | |||
2.2. Key TCP Concepts | 2.2. Key TCP Concepts | |||
3. Functional Specification | 3. Functional Specification | |||
3.1. Header Format | 3.1. Header Format | |||
3.2. Specific Option Definitions | 3.2. Specific Option Definitions | |||
3.2.1. Other Common Options | 3.2.1. Other Common Options | |||
3.2.2. Experimental TCP Options | 3.2.2. Experimental TCP Options | |||
3.3. TCP Terminology Overview | 3.3. TCP Terminology Overview | |||
3.3.1. Key Connection State Variables | 3.3.1. Key Connection State Variables | |||
3.3.2. State Machine Overview | 3.3.2. State Machine Overview | |||
3.4. Sequence Numbers | 3.4. Sequence Numbers | |||
3.4.1. Initial Sequence Number Selection | 3.4.1. Initial Sequence Number Selection | |||
3.4.2. Knowing When to Keep Quiet | 3.4.2. Knowing When to Keep Quiet | |||
3.4.3. The TCP Quiet Time Concept | 3.4.3. The TCP Quiet Time Concept | |||
3.5. Establishing a connection | 3.5. Establishing a Connection | |||
3.5.1. Half-Open Connections and Other Anomalies | 3.5.1. Half-Open Connections and Other Anomalies | |||
3.5.2. Reset Generation | 3.5.2. Reset Generation | |||
3.5.3. Reset Processing | 3.5.3. Reset Processing | |||
3.6. Closing a Connection | 3.6. Closing a Connection | |||
3.6.1. Half-Closed Connections | 3.6.1. Half-Closed Connections | |||
3.7. Segmentation | 3.7. Segmentation | |||
3.7.1. Maximum Segment Size Option | 3.7.1. Maximum Segment Size Option | |||
3.7.2. Path MTU Discovery | 3.7.2. Path MTU Discovery | |||
3.7.3. Interfaces with Variable MTU Values | 3.7.3. Interfaces with Variable MTU Values | |||
3.7.4. Nagle Algorithm | 3.7.4. Nagle Algorithm | |||
3.7.5. IPv6 Jumbograms | 3.7.5. IPv6 Jumbograms | |||
3.8. Data Communication | 3.8. Data Communication | |||
3.8.1. Retransmission Timeout | 3.8.1. Retransmission Timeout | |||
3.8.2. TCP Congestion Control | 3.8.2. TCP Congestion Control | |||
3.8.3. TCP Connection Failures | 3.8.3. TCP Connection Failures | |||
3.8.4. TCP Keep-Alives | 3.8.4. TCP Keep-Alives | |||
3.8.5. The Communication of Urgent Information | 3.8.5. The Communication of Urgent Information | |||
3.8.6. Managing the Window | 3.8.6. Managing the Window | |||
3.9. Interfaces | 3.9. Interfaces | |||
3.9.1. User/TCP Interface | 3.9.1. User/TCP Interface | |||
3.9.2. TCP/Lower-Level Interface | 3.9.2. TCP/Lower-Level Interface | |||
3.10. Event Processing | 3.10. Event Processing | |||
3.10.1. OPEN Call | 3.10.1. OPEN Call | |||
3.10.2. SEND Call | 3.10.2. SEND Call | |||
3.10.3. RECEIVE Call | 3.10.3. RECEIVE Call | |||
3.10.4. CLOSE Call | 3.10.4. CLOSE Call | |||
3.10.5. ABORT Call | 3.10.5. ABORT Call | |||
3.10.6. STATUS Call | 3.10.6. STATUS Call | |||
3.10.7. SEGMENT ARRIVES | 3.10.7. SEGMENT ARRIVES | |||
3.10.8. Timeouts | 3.10.8. Timeouts | |||
4. Glossary | 4. Glossary | |||
5. Changes from RFC 793 | 5. Changes from RFC 793 | |||
6. IANA Considerations | 6. IANA Considerations | |||
7. Security and Privacy Considerations | 7. Security and Privacy Considerations | |||
8. Acknowledgements | ||||
9. References | 8. References | |||
9.1. Normative References | 8.1. Normative References | |||
9.2. Informative References | 8.2. Informative References | |||
Appendix A. Other Implementation Notes | Appendix A. Other Implementation Notes | |||
A.1. IP Security Compartment and Precedence | A.1. IP Security Compartment and Precedence | |||
A.1.1. Precedence | A.1.1. Precedence | |||
A.1.2. MLS Systems | A.1.2. MLS Systems | |||
A.2. Sequence Number Validation | A.2. Sequence Number Validation | |||
A.3. Nagle Modification | A.3. Nagle Modification | |||
A.4. Low Watermark Settings | A.4. Low Watermark Settings | |||
Appendix B. TCP Requirement Summary | Appendix B. TCP Requirement Summary | |||
 | Acknowledgments | |||
Author's Address | Author's Address | |||
1. Purpose and Scope | 1. Purpose and Scope | |||
In 1981, RFC 793 [16] was released, documenting the Transmission | In 1981, RFC 793 [16] was released, documenting the Transmission | |||
Control Protocol (TCP), and replacing earlier specifications for TCP | Control Protocol (TCP) and replacing earlier published specifications | |||
that had been published in the past. | for TCP. | |||
Since then, TCP has been widely implemented, and has been used as a | Since then, TCP has been widely implemented, and it has been used as | |||
transport protocol for numerous applications on the Internet. | a transport protocol for numerous applications on the Internet. | |||
For several decades, RFC 793 plus a number of other documents have | For several decades, RFC 793 plus a number of other documents have | |||
combined to serve as the core specification for TCP [50]. Over time, | combined to serve as the core specification for TCP [49]. Over time, | |||
a number of errata have been filed against RFC 793. There have also | a number of errata have been filed against RFC 793. There have also | |||
been deficiencies found and resolved in security, performance, and | been deficiencies found and resolved in security, performance, and | |||
many other aspects. The number of enhancements has grown over time | many other aspects. The number of enhancements has grown over time | |||
across many separate documents. These were never accumulated | across many separate documents. These were never accumulated | |||
together into a comprehensive update to the base specification. | together into a comprehensive update to the base specification. | |||
The purpose of this document is to bring together all of the IETF | The purpose of this document is to bring together all of the IETF | |||
Standards Track changes and other clarifications that have been made | Standards Track changes and other clarifications that have been made | |||
to the base TCP functional specification and unify them into an | to the base TCP functional specification (RFC 793) and to unify them | |||
updated version of RFC 793. | into an updated version of the specification. | |||
Some companion documents are referenced for important algorithms that | Some companion documents are referenced for important algorithms that | |||
are used by TCP (e.g. for congestion control), but have not been | are used by TCP (e.g., for congestion control) but have not been | |||
completely included in this document. This is a conscious choice, as | completely included in this document. This is a conscious choice, as | |||
this base specification can be used with multiple additional | this base specification can be used with multiple additional | |||
algorithms that are developed and incorporated separately. This | algorithms that are developed and incorporated separately. This | |||
document focuses on the common basis all TCP implementations must | document focuses on the common basis that all TCP implementations | |||
support in order to interoperate. Since some additional TCP features | must support in order to interoperate. Since some additional TCP | |||
have become quite complicated themselves (e.g. advanced loss recovery | features have become quite complicated themselves (e.g., advanced | |||
and congestion control), future companion documents may attempt to | loss recovery and congestion control), future companion documents may | |||
similarly bring these together. | attempt to similarly bring these together. | |||
In addition to the protocol specification that describes the TCP | In addition to the protocol specification that describes the TCP | |||
segment format, generation, and processing rules that are to be | segment format, generation, and processing rules that are to be | |||
implemented in code, RFC 793 and other updates also contain | implemented in code, RFC 793 and other updates also contain | |||
informative and descriptive text for readers to understand aspects of | informative and descriptive text for readers to understand aspects of | |||
the protocol design and operation. This document does not attempt to | the protocol design and operation. This document does not attempt to | |||
alter or update this informative text, and is focused only on | alter or update this informative text and is focused only on updating | |||
updating the normative protocol specification. This document | the normative protocol specification. This document preserves | |||
preserves references to the documentation containing the important | references to the documentation containing the important explanations | |||
explanations and rationale, where appropriate. | and rationale, where appropriate. | |||
This document is intended to be useful both in checking existing TCP | This document is intended to be useful both in checking existing TCP | |||
implementations for conformance purposes, as well as in writing new | implementations for conformance purposes, as well as in writing new | |||
implementations. | implementations. | |||
2. Introduction | 2. Introduction | |||
RFC 793 contains a discussion of the TCP design goals and provides | RFC 793 contains a discussion of the TCP design goals and provides | |||
examples of its operation, including examples of connection | examples of its operation, including examples of connection | |||
establishment, connection termination, and packet retransmission to | establishment, connection termination, and packet retransmission to | |||
repair losses. | repair losses. | |||
This document describes the basic functionality expected in modern | This document describes the basic functionality expected in modern | |||
TCP implementations, and replaces the protocol specification in RFC | TCP implementations and replaces the protocol specification in RFC | |||
793. It does not replicate or attempt to update the introduction and | 793. It does not replicate or attempt to update the introduction and | |||
philosophy content in Sections 1 and 2 of RFC 793. Other documents | philosophy content in Sections 1 and 2 of RFC 793. Other documents | |||
are referenced to provide explanation of the theory of operation, | are referenced to provide explanations of the theory of operation, | |||
rationale, and detailed discussion of design decisions. This | rationale, and detailed discussion of design decisions. This | |||
document only focuses on the normative behavior of the protocol. | document only focuses on the normative behavior of the protocol. | |||
The "TCP Roadmap" [50] provides a more extensive guide to the RFCs | The "TCP Roadmap" [49] provides a more extensive guide to the RFCs | |||
that define TCP and describe various important algorithms. The TCP | that define TCP and describe various important algorithms. The TCP | |||
Roadmap contains sections on strongly encouraged enhancements that | Roadmap contains sections on strongly encouraged enhancements that | |||
improve performance and other aspects of TCP beyond the basic | improve performance and other aspects of TCP beyond the basic | |||
operation specified in this document. As one example, implementing | operation specified in this document. As one example, implementing | |||
congestion control (e.g. [8]) is a TCP requirement, but is a complex | congestion control (e.g., [8]) is a TCP requirement, but it is a | |||
topic on its own, and not described in detail in this document, as | complex topic on its own and not described in detail in this | |||
there are many options and possibilities that do not impact basic | document, as there are many options and possibilities that do not | |||
interoperability. Similarly, most TCP implementations today include | impact basic interoperability. Similarly, most TCP implementations | |||
the high-performance extensions in [48], but these are not strictly | today include the high-performance extensions in [47], but these are | |||
required or discussed in this document. Multipath considerations for | not strictly required or discussed in this document. Multipath | |||
TCP are also specified separately in [59]. | considerations for TCP are also specified separately in [59]. | |||
A list of changes from RFC 793 is contained in Section 5. | A list of changes from RFC 793 is contained in Section 5. | |||
2.1. Requirements Language | 2.1. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [3][12] when, and only when, they appear in all capitals, as shown | BCP 14 [3] [12] when, and only when, they appear in all capitals, as | |||
here. | shown here. | |||
Each use of RFC 2119 keywords in the document is individually labeled | Each use of RFC 2119 keywords in the document is individually labeled | |||
and referenced in Appendix B that summarizes implementation | and referenced in Appendix B, which summarizes implementation | |||
requirements. | requirements. | |||
Sentences using "MUST" are labeled as "MUST-X" with X being a numeric | Sentences using "MUST" are labeled as "MUST-X" with X being a numeric | |||
identifier enabling the requirement to be located easily when | identifier enabling the requirement to be located easily when | |||
referenced from Appendix B. | referenced from Appendix B. | |||
Similarly, sentences using "SHOULD" are labeled with "SHLD-X", "MAY" | Similarly, sentences using "SHOULD" are labeled with "SHLD-X", "MAY" | |||
with "MAY-X", and "RECOMMENDED" with "REC-X". | with "MAY-X", and "RECOMMENDED" with "REC-X". | |||
For the purposes of this labeling, "SHOULD NOT" and "MUST NOT" are | For the purposes of this labeling, "SHOULD NOT" and "MUST NOT" are | |||
skipping to change at page 6, line 21 ¶ | skipping to change at line 250 ¶ | |||
applications. | applications. | |||
The application byte-stream is conveyed over the network via TCP | The application byte-stream is conveyed over the network via TCP | |||
segments, with each TCP segment sent as an Internet Protocol (IP) | segments, with each TCP segment sent as an Internet Protocol (IP) | |||
datagram. | datagram. | |||
TCP reliability consists of detecting packet losses (via sequence | TCP reliability consists of detecting packet losses (via sequence | |||
numbers) and errors (via per-segment checksums), as well as | numbers) and errors (via per-segment checksums), as well as | |||
correction via retransmission. | correction via retransmission. | |||
TCP supports unicast delivery of data. Anycast applications exist | TCP supports unicast delivery of data. There are anycast | |||
that successfully use TCP without modifications, though there is some | applications that can successfully use TCP without modifications, | |||
risk of instability due to changes of lower-layer forwarding behavior | though there is some risk of instability due to changes of lower- | |||
[47]. | layer forwarding behavior [46]. | |||
TCP is connection-oriented, though does not inherently include a | TCP is connection oriented, though it does not inherently include a | |||
liveness detection capability. | liveness detection capability. | |||
Data flow is supported bidirectionally over TCP connections, though | Data flow is supported bidirectionally over TCP connections, though | |||
applications are free to send data only unidirectionally, if they so | applications are free to send data only unidirectionally, if they so | |||
choose. | choose. | |||
TCP uses port numbers to identify application services and to | TCP uses port numbers to identify application services and to | |||
multiplex distinct flows between hosts. | multiplex distinct flows between hosts. | |||
A more detailed description of TCP features compared to other | A more detailed description of TCP features compared to other | |||
transport protocols can be found in Section 3.1 of [53]. Further | transport protocols can be found in Section 3.1 of [52]. Further | |||
description of the motivations for developing TCP and its role in the | description of the motivations for developing TCP and its role in the | |||
Internet protocol stack can be found in Section 2 of [16] and earlier | Internet protocol stack can be found in Section 2 of [16] and earlier | |||
versions of the TCP specification. | versions of the TCP specification. | |||
3. Functional Specification | 3. Functional Specification | |||
3.1. Header Format | 3.1. Header Format | |||
TCP segments are sent as internet datagrams. The Internet Protocol | TCP segments are sent as internet datagrams. The Internet Protocol | |||
(IP) header carries several information fields, including the source | (IP) header carries several information fields, including the source | |||
and destination host addresses [1] [13]. A TCP header follows the IP | and destination host addresses [1] [13]. A TCP header follows the IP | |||
headers, supplying information specific to the TCP protocol. This | headers, supplying information specific to TCP. This division allows | |||
division allows for the existence of host level protocols other than | for the existence of host-level protocols other than TCP. In the | |||
TCP. In early development of the Internet suite of protocols, the IP | early development of the Internet suite of protocols, the IP header | |||
header fields had been a part of TCP. | fields had been a part of TCP. | |||
This document describes the TCP protocol. The TCP protocol uses TCP | This document describes TCP, which uses TCP headers. | |||
Headers. | ||||
A TCP Header, followed by any user data in the segment, is formatted | A TCP header, followed by any user data in the segment, is formatted | |||
as follows, using the style from [67]: | as follows, using the style from [66]: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Source Port | Destination Port | | | Source Port | Destination Port | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Sequence Number | | | Sequence Number | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Acknowledgment Number | | | Acknowledgment Number | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
skipping to change at page 7, line 39 ¶ | skipping to change at line 316 ¶ | |||
: Data : | : Data : | |||
: | | : | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Note that one tick mark represents one bit position. | Note that one tick mark represents one bit position. | |||
Figure 1: TCP Header Format | Figure 1: TCP Header Format | |||
where: | where: | |||
Source Port: 16 bits. | Source Port: 16 bits | |||
The source port number. | The source port number. | |||
Destination Port: 16 bits. | Destination Port: 16 bits | |||
The destination port number. | The destination port number. | |||
Sequence Number: 32 bits. | Sequence Number: 32 bits | |||
The sequence number of the first data octet in this segment (except | The sequence number of the first data octet in this segment (except | |||
when the SYN flag is set). If SYN is set the sequence number is | when the SYN flag is set). If SYN is set, the sequence number is | |||
the initial sequence number (ISN) and the first data octet is | the initial sequence number (ISN) and the first data octet is | |||
ISN+1. | ISN+1. | |||
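As a small illustration of the Sequence Number rule above, the following Python sketch returns the sequence number of the first data octet for a SYN and a non-SYN segment. The function name `first_data_seq` is hypothetical and not part of the specification. The wraparound modulo 2**32 follows from the field being 32 bits wide.

```python
SEQ_MODULUS = 2 ** 32  # the Sequence Number field is 32 bits wide


def first_data_seq(seq_field: int, syn: bool) -> int:
    """Return the sequence number of the first data octet in a segment.

    With SYN set, the field carries the ISN and the first data octet
    is ISN+1; otherwise the field already names the first data octet.
    (Hypothetical helper, for illustration only.)
    """
    if not 0 <= seq_field < SEQ_MODULUS:
        raise ValueError("sequence number must fit in 32 bits")
    if syn:
        # The SYN itself occupies one sequence number, so data starts at ISN+1.
        return (seq_field + 1) % SEQ_MODULUS
    return seq_field
```

The modulo step matters only when the ISN is the largest 32-bit value, in which case the first data octet wraps to sequence number 0.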
Acknowledgment Number: 32 bits. | Acknowledgment Number: 32 bits | |||
If the ACK control bit is set, this field contains the value of the | If the ACK control bit is set, this field contains the value of the | |||
next sequence number the sender of the segment is expecting to | next sequence number the sender of the segment is expecting to | |||
receive. Once a connection is established, this is always sent. | receive. Once a connection is established, this is always sent. | |||
Data Offset (DOffset): 4 bits. | Data Offset (DOffset): 4 bits | |||
The number of 32 bit words in the TCP Header. This indicates where | The number of 32-bit words in the TCP header. This indicates where | |||
the data begins. The TCP header (even one including options) is an | the data begins. The TCP header (even one including options) is an | |||
integer multiple of 32 bits long. | integer multiple of 32 bits long. | |||
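The Data Offset description can be sketched in Python as follows. This is a minimal illustration, not a reference parser: the function name `tcp_header_length` is hypothetical, and the input is assumed to be the raw octets of one TCP segment starting at the Source Port field. The lower bound of 5 words reflects the 20-octet header that carries no options.

```python
import struct

MIN_HEADER_WORDS = 5  # a header without options is 5 words (20 octets)


def tcp_header_length(segment: bytes) -> int:
    """Return the TCP header length in octets, derived from Data Offset.

    Hypothetical helper: `segment` is assumed to start at the first
    octet of the TCP header.
    """
    if len(segment) < MIN_HEADER_WORDS * 4:
        raise ValueError("segment shorter than the minimum TCP header")
    # Octet 12 holds Data Offset in its upper 4 bits and Rsrvd in the lower 4.
    (offset_octet,) = struct.unpack_from("!B", segment, 12)
    words = offset_octet >> 4
    if words < MIN_HEADER_WORDS:
        raise ValueError("Data Offset smaller than the fixed header")
    header_len = words * 4  # 32-bit words to octets
    if header_len > len(segment):
        raise ValueError("Data Offset points past the end of the segment")
    return header_len
```

The user data, if any, begins at `segment[tcp_header_length(segment):]`.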
Reserved (Rsrvd): 4 bits. | Reserved (Rsrvd): 4 bits | |||
A set of control bits reserved for future use. Must be zero in | A set of control bits reserved for future use. Must be zero in | |||
generated segments and must be ignored in received segments, if | generated segments and must be ignored in received segments if the | |||
corresponding future features are unimplemented by the sending or | corresponding future features are not implemented by the sending or | |||
receiving host. | receiving host. | |||
The control bits are also known as "flags". Assignment is managed | Control bits: The control bits are also known as "flags". | |||
by IANA from the "TCP Header Flags" registry [63]. The currently | Assignment is managed by IANA from the "TCP Header Flags" registry | |||
assigned control bits are CWR, ECE, URG, ACK, PSH, RST, SYN, and | [62]. The currently assigned control bits are CWR, ECE, URG, ACK, | |||
FIN. | PSH, RST, SYN, and FIN. | |||
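The eight assigned control bits share a single octet of the header (octet 13, counting from 0), with CWR as the most significant bit and FIN as the least significant. A minimal Python sketch of decoding that octet is shown here. The names `FLAG_MASKS` and `decode_flags` are hypothetical and chosen only for illustration.

```python
from typing import List

# Bit masks within the flags octet, from most to least significant bit.
FLAG_MASKS = {
    "CWR": 0x80,
    "ECE": 0x40,
    "URG": 0x20,
    "ACK": 0x10,
    "PSH": 0x08,
    "RST": 0x04,
    "SYN": 0x02,
    "FIN": 0x01,
}


def decode_flags(flags_octet: int) -> List[str]:
    """Return the names of the control bits set in the flags octet."""
    if not 0 <= flags_octet <= 0xFF:
        raise ValueError("flags must fit in one octet")
    return [name for name, mask in FLAG_MASKS.items() if flags_octet & mask]
```

For example, the value 0x12 decodes to ACK and SYN, the combination carried by the second segment of a connection handshake.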
CWR: 1 bit. | CWR: 1 bit | |||
Congestion Window Reduced (see [6]). | ||||
ECE: 1 bit. | Congestion Window Reduced (see [6]). | |||
ECN-Echo (see [6]). | ||||
URG: 1 bit. | ECE: 1 bit | |||
Urgent Pointer field is significant. | ||||
ACK: 1 bit. | ECN-Echo (see [6]). | |||
Acknowledgment field is significant. | ||||
PSH: 1 bit. | URG: 1 bit | |||
Push Function (see the Send Call description in Section 3.9.1). | ||||
RST: 1 bit. | Urgent pointer field is significant. | |||
Reset the connection. | ||||
SYN: 1 bit. | ACK: 1 bit | |||
Synchronize sequence numbers. | ||||
FIN: 1 bit. | Acknowledgment field is significant. | |||
No more data from sender. | ||||
Window: 16 bits. | PSH: 1 bit | |||
Push function (see the Send Call description in Section 3.9.1). | ||||
RST: 1 bit | ||||
Reset the connection. | ||||
SYN: 1 bit | ||||
Synchronize sequence numbers. | ||||
FIN: 1 bit | ||||
No more data from sender. | ||||
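The eight control bits can be decoded with a small flag type, sketched here under the assumption that they occupy octet 13 of the header with CWR as the most significant bit and FIN as the least significant. The names `TcpFlags` and `parse_flags` are illustrative only.

```python
from enum import IntFlag


class TcpFlags(IntFlag):
    """Currently assigned TCP control bits (octet 13, assumed layout)."""
    FIN = 0x01
    SYN = 0x02
    RST = 0x04
    PSH = 0x08
    ACK = 0x10
    URG = 0x20
    ECE = 0x40
    CWR = 0x80


def parse_flags(segment: bytes) -> TcpFlags:
    """Return the control bits set in a raw TCP segment."""
    if len(segment) < 14:
        raise ValueError("segment too short to contain the control bits")
    return TcpFlags(segment[13])
```

For example, a segment whose octet 13 is 0x12 decodes to `TcpFlags.SYN | TcpFlags.ACK`.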
Window: 16 bits | ||||
The number of data octets beginning with the one indicated in the | The number of data octets beginning with the one indicated in the | |||
acknowledgment field that the sender of this segment is willing to | acknowledgment field that the sender of this segment is willing to | |||
accept. The value is shifted when the Window Scaling extension is | accept. The value is shifted when the window scaling extension is | |||
used [48]. | used [47]. | |||
The window size MUST be treated as an unsigned number, or else | The window size MUST be treated as an unsigned number, or else | |||
large window sizes will appear like negative windows and TCP will | large window sizes will appear like negative windows and TCP will | |||
not work (MUST-1). It is RECOMMENDED that implementations will | not work (MUST-1). It is RECOMMENDED that implementations will | |||
reserve 32-bit fields for the send and receive window sizes in the | reserve 32-bit fields for the send and receive window sizes in the | |||
connection record and do all window computations with 32 bits (REC- | connection record and do all window computations with 32 bits (REC- | |||
1). | 1). | |||
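The unsigned-window requirement can be illustrated with a short sketch. It assumes the Window field sits in octets 14 and 15 of the header; `read_window` and its `shift` parameter (standing in for a scale factor negotiated through the window scaling extension) are hypothetical names.

```python
import struct


def read_window(segment: bytes, shift: int = 0) -> int:
    """Return the advertised window in octets as an unsigned value."""
    if len(segment) < 16:
        raise ValueError("segment too short to contain the Window field")
    # "!H" reads a 16-bit UNSIGNED integer in network byte order.
    # Using the signed "!h" would turn 0xFFFF into -1.
    (raw,) = struct.unpack_from("!H", segment, 14)
    # Python integers do not overflow; in a fixed-width language the
    # result would be held in a field of at least 32 bits (REC-1).
    return raw << shift
```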
Checksum: 16 bits. | Checksum: 16 bits | |||
The checksum field is the 16 bit ones' complement of the ones' | ||||
complement sum of all 16 bit words in the header and text. The | The checksum field is the 16-bit ones' complement of the ones' | |||
complement sum of all 16-bit words in the header and text. The | ||||
checksum computation needs to ensure the 16-bit alignment of the | checksum computation needs to ensure the 16-bit alignment of the | |||
data being summed. If a segment contains an odd number of header | data being summed. If a segment contains an odd number of header | |||
and text octets, alignment can be achieved by padding the last | and text octets, alignment can be achieved by padding the last | |||
octet with zeros on its right to form a 16 bit word for checksum | octet with zeros on its right to form a 16-bit word for checksum | |||
purposes. The pad is not transmitted as part of the segment. | purposes. The pad is not transmitted as part of the segment. | |||
While computing the checksum, the checksum field itself is replaced | While computing the checksum, the checksum field itself is replaced | |||
with zeros. | with zeros. | |||
The checksum also covers a pseudo header (Figure 2) conceptually | The checksum also covers a pseudo-header (Figure 2) conceptually | |||
prefixed to the TCP header. The pseudo header is 96 bits for IPv4 | prefixed to the TCP header. The pseudo-header is 96 bits for IPv4 | |||
and 320 bits for IPv6. Including the pseudo header in the checksum | and 320 bits for IPv6. Including the pseudo-header in the checksum | |||
gives the TCP connection protection against misrouted segments. | gives the TCP connection protection against misrouted segments. | |||
This information is carried in IP headers and is transferred across | This information is carried in IP headers and is transferred across | |||
the TCP/Network interface in the arguments or results of calls by | the TCP/network interface in the arguments or results of calls by | |||
the TCP implementation on the IP layer. | the TCP implementation on the IP layer. | |||
+--------+--------+--------+--------+ | +--------+--------+--------+--------+ | |||
| Source Address | | | Source Address | | |||
+--------+--------+--------+--------+ | +--------+--------+--------+--------+ | |||
| Destination Address | | | Destination Address | | |||
+--------+--------+--------+--------+ | +--------+--------+--------+--------+ | |||
| zero | PTCL | TCP Length | | | zero | PTCL | TCP Length | | |||
+--------+--------+--------+--------+ | +--------+--------+--------+--------+ | |||
Figure 2: IPv4 Pseudo Header | Figure 2: IPv4 Pseudo-header | |||
Pseudo header components for IPv4: | Pseudo-header components for IPv4: | |||
Source Address: the IPv4 source address in network byte order | Source Address: the IPv4 source address in network byte order | |||
Destination Address: the IPv4 destination address in network | Destination Address: the IPv4 destination address in network | |||
byte order | byte order | |||
zero: bits set to zero | ||||
PTCL: the protocol number from the IP header | zero: bits set to zero | |||
TCP Length: the TCP header length plus the data length in | PTCL: the protocol number from the IP header | |||
octets (this is not an explicitly transmitted quantity, but is | ||||
computed), and it does not count the 12 octets of the pseudo | TCP Length: the TCP header length plus the data length in octets | |||
(this is not an explicitly transmitted quantity but is | ||||
computed), and it does not count the 12 octets of the pseudo- | ||||
header. | header. | |||
For IPv6, the pseudo header is defined in Section 8.1 of RFC 8200 | For IPv6, the pseudo-header is defined in Section 8.1 of RFC 8200 | |||
[13], and contains the IPv6 Source Address and Destination | [13] and contains the IPv6 Source Address and Destination Address, | |||
Address, an Upper Layer Packet Length (a 32-bit value otherwise | an Upper-Layer Packet Length (a 32-bit value otherwise equivalent | |||
equivalent to TCP Length in the IPv4 pseudo header), three bytes | to TCP Length in the IPv4 pseudo-header), three bytes of zero | |||
of zero-padding, and a Next Header value (differing from the IPv6 | padding, and a Next Header value, which differs from the IPv6 | |||
header value in the case of extension headers present in between | header value if there are extension headers present between IPv6 | |||
IPv6 and TCP). | and TCP. | |||
The TCP checksum is never optional. The sender MUST generate it | The TCP checksum is never optional. The sender MUST generate it | |||
(MUST-2) and the receiver MUST check it (MUST-3). | (MUST-2) and the receiver MUST check it (MUST-3). | |||
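The checksum procedure described above can be sketched for IPv4 as follows. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the protocol number 6 (TCP) is filled in for PTCL, and the checksum field is assumed to occupy octets 16 and 17 of the TCP header.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Ones' complement sum of all 16-bit words in data."""
    if len(data) % 2:
        data += b"\x00"  # pad on the right; the pad is never transmitted
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum_ipv4(src: str, dst: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header, TCP header, and text."""
    # 96-bit pseudo-header: source, destination, zero, PTCL, TCP Length.
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, 6, len(segment)))
    # The checksum field itself is replaced with zeros while computing.
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return ~ones_complement_sum(pseudo + zeroed) & 0xFFFF
```

A receiver can verify a segment by summing the pseudo-header and the segment as received (checksum included); for an undamaged segment the ones' complement sum is 0xFFFF.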
Urgent Pointer: 16 bits | ||||
Urgent Pointer: 16 bits. | ||||
This field communicates the current value of the urgent pointer as | This field communicates the current value of the urgent pointer as | |||
a positive offset from the sequence number in this segment. The | a positive offset from the sequence number in this segment. The | |||
urgent pointer points to the sequence number of the octet following | urgent pointer points to the sequence number of the octet following | |||
the urgent data. This field is only to be interpreted in segments | the urgent data. This field is only to be interpreted in segments | |||
with the URG control bit set. | with the URG control bit set. | |||
Options: [TCP Option]; size(Options) == (DOffset-5)*32; present | Options: [TCP Option]; size(Options) == (DOffset-5)*32; present only | |||
only when DOffset > 5. Note that this size expression also | when DOffset > 5. Note that this size expression also includes any | |||
includes any padding trailing the actual options present. | padding trailing the actual options present. | |||
Options may occupy space at the end of the TCP header and are a | Options may occupy space at the end of the TCP header and are a | |||
multiple of 8 bits in length. All options are included in the | multiple of 8 bits in length. All options are included in the | |||
checksum. An option may begin on any octet boundary. There are | checksum. An option may begin on any octet boundary. There are | |||
two cases for the format of an option: | two cases for the format of an option: | |||
Case 1: A single octet of option-kind. | Case 1: A single octet of option-kind. | |||
Case 2: An octet of option-kind (Kind), an octet of option- | Case 2: An octet of option-kind (Kind), an octet of option-length, | |||
length, and the actual option-data octets. | and the actual option-data octets. | |||
The option-length counts the two octets of option-kind and option- | The option-length counts the two octets of option-kind and option- | |||
length as well as the option-data octets. | length as well as the option-data octets. | |||
Note that the list of options may be shorter than the data offset | Note that the list of options may be shorter than the Data Offset | |||
field might imply. The content of the header beyond the End-of- | field might imply. The content of the header beyond the End of | |||
Option option MUST be header padding of zeros (MUST-69). | Option List Option MUST be header padding of zeros (MUST-69). | |||
The list of all currently defined options is managed by IANA [62], | The list of all currently defined options is managed by IANA [62], | |||
and each option is defined in other RFCs, as indicated there. That | and each option is defined in other RFCs, as indicated there. That | |||
set includes experimental options that can be extended to support | set includes experimental options that can be extended to support | |||
multiple concurrent usages [46]. | multiple concurrent usages [45]. | |||
A given TCP implementation can support any currently defined | A given TCP implementation can support any currently defined | |||
options, but the following options MUST be supported (MUST-4 - note | options, but the following options MUST be supported (MUST-4 -- | |||
Maximum Segment Size option support is also part of MUST-19 in | note Maximum Segment Size Option support is also part of MUST-14 in | |||
Section 3.7.2): | Section 3.7.1): | |||
Kind Length Meaning | +======+========+============================+ | |||
---- ------ ------- | | Kind | Length | Meaning | | |||
0 - End of option list. | +======+========+============================+ | |||
1 - No-Operation. | | 0 | - | End of Option List Option. | | |||
2 4 Maximum Segment Size. | +------+--------+----------------------------+ | |||
| 1 | - | No-Operation. | | ||||
+------+--------+----------------------------+ | ||||
| 2 | 4 | Maximum Segment Size. | | ||||
+------+--------+----------------------------+ | ||||
Table 1: Mandatory Option Set | ||||
These options are specified in detail in Section 3.2. | These options are specified in detail in Section 3.2. | |||
A TCP implementation MUST be able to receive a TCP option in any | A TCP implementation MUST be able to receive a TCP Option in any | |||
segment (MUST-5). | segment (MUST-5). | |||
A TCP implementation MUST (MUST-6) ignore without error any TCP | A TCP implementation MUST (MUST-6) ignore without error any TCP | |||
option it does not implement, assuming that the option has a length | Option it does not implement, assuming that the option has a length | |||
field. All TCP options except End of option list and No-Operation | field. All TCP Options except End of Option List Option (EOL) and | |||
MUST have length fields, including all future options (MUST-68). | No-Operation (NOP) MUST have length fields, including all future | |||
TCP implementations MUST be prepared to handle an illegal option | options (MUST-68). TCP implementations MUST be prepared to handle | |||
length (e.g., zero); a suggested procedure is to reset the | an illegal option length (e.g., zero); a suggested procedure is to | |||
connection and log the error cause (MUST-7). | reset the connection and log the error cause (MUST-7). | |||
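The option-handling rules above can be sketched as a simple walk over the options area. The function name `parse_options` is hypothetical, and raising `ValueError` merely stands in for whatever an implementation does on an illegal length (the text suggests resetting the connection and logging the cause).

```python
from typing import List, Tuple


def parse_options(options: bytes) -> List[Tuple[int, bytes]]:
    """Split the options area into (kind, option-data) pairs."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:   # End of Option List: the rest is padding
            break
        if kind == 1:   # No-Operation: a single octet, no length
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("option truncated before its length octet")
        length = options[i + 1]
        # option-length counts the kind and length octets themselves,
        # so values below 2 (including zero) are illegal.
        if length < 2 or i + length > len(options):
            raise ValueError("illegal option length %d" % length)
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed
```

Because every option other than EOL and NOP carries a length, a caller can skip any kind it does not implement by ignoring the corresponding pair in the returned list.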
Note: There is ongoing work to extend the space available for TCP | Note: There is ongoing work to extend the space available for TCP | |||
options, such as [66]. | Options, such as [65]. | |||
Data: variable length | ||||
Data: variable length. | ||||
User data carried by the TCP segment. | User data carried by the TCP segment. | |||
3.2. Specific Option Definitions | 3.2. Specific Option Definitions | |||
A TCP Option, in the mandatory option set, is one of: an End of | A TCP Option, in the mandatory option set, is one of an End of Option | |||
Option List Option, a No-Operation Option, or a Maximum Segment Size | List Option, a No-Operation Option, or a Maximum Segment Size Option. | |||
Option. | ||||
An End of Option List Option is formatted as follows: | An End of Option List Option is formatted as follows: | |||
0 | 0 | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
| 0 | | | 0 | | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
where: | where: | |||
Kind: 1 byte; Kind == 0. | Kind: 1 byte; Kind == 0. | |||
This option code indicates the end of the option list. This might | This option code indicates the end of the option list. This might | |||
not coincide with the end of the TCP header according to the Data | not coincide with the end of the TCP header according to the Data | |||
Offset field. This is used at the end of all options, not the end | Offset field. This is used at the end of all options, not the end | |||
of each option, and need only be used if the end of the options | of each option, and need only be used if the end of the options | |||
would not otherwise coincide with the end of the TCP header. | would not otherwise coincide with the end of the TCP header. | |||
A No-Operation Option is formatted as follows: | A No-Operation Option is formatted as follows: | |||
0 | 0 | |||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
| 1 | | | 1 | | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
where: | where: | |||
Kind: 1 byte; Kind == 1. | Kind: 1 byte; Kind == 1. | |||
This option code can be used between options, for example, to align | This option code can be used between options, for example, to align | |||
the beginning of a subsequent option on a word boundary. There is | the beginning of a subsequent option on a word boundary. There is | |||
no guarantee that senders will use this option, so receivers MUST | no guarantee that senders will use this option, so receivers MUST | |||
be prepared to process options even if they do not begin on a word | be prepared to process options even if they do not begin on a word | |||
boundary (MUST-64). | boundary (MUST-64). | |||
A Maximum Segment Size Option is formatted as follows: | A Maximum Segment Size Option is formatted as follows: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| 2 | Length | Maximum Segment Size (MSS) | | | 2 | Length | Maximum Segment Size (MSS) | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
where: | where: | |||
Kind: 1 byte; Kind == 2. | Kind: 1 byte; Kind == 2. | |||
If this option is present, then it communicates the maximum receive | If this option is present, then it communicates the maximum receive | |||
segment size at the TCP endpoint that sends this segment. This | segment size at the TCP endpoint that sends this segment. This | |||
value is limited by the IP reassembly limit. This field may be | value is limited by the IP reassembly limit. This field may be | |||
sent in the initial connection request (i.e., in segments with the | sent in the initial connection request (i.e., in segments with the | |||
SYN control bit set) and MUST NOT be sent in other segments (MUST- | SYN control bit set) and MUST NOT be sent in other segments (MUST- | |||
65). If this option is not used, any segment size is allowed. A | 65). If this option is not used, any segment size is allowed. A | |||
more complete description of this option is provided in | more complete description of this option is provided in | |||
Section 3.7.1. | Section 3.7.1. | |||
Length: 1 byte; Length == 4. | Length: 1 byte; Length == 4. | |||
Length of the option in bytes. | Length of the option in bytes. | |||
Maximum Segment Size (MSS): 2 bytes. | Maximum Segment Size (MSS): 2 bytes. | |||
The maximum receive segment size at the TCP endpoint that sends | The maximum receive segment size at the TCP endpoint that sends | |||
this segment. | this segment. | |||
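As a small illustration of the format above, the Maximum Segment Size Option can be encoded and decoded as four octets in network byte order. The helper names are hypothetical.

```python
import struct


def build_mss_option(mss: int) -> bytes:
    """Encode Kind == 2, Length == 4, and a 2-byte MSS value."""
    if not 0 <= mss <= 0xFFFF:
        raise ValueError("MSS must fit in 16 bits")
    return struct.pack("!BBH", 2, 4, mss)


def parse_mss_option(option: bytes) -> int:
    """Decode a 4-octet MSS option and return the MSS value."""
    if len(option) != 4:
        raise ValueError("MSS option must be exactly 4 octets")
    kind, length, mss = struct.unpack("!BBH", option)
    if kind != 2 or length != 4:
        raise ValueError("not a well-formed MSS option")
    return mss
```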
3.2.1. Other Common Options | 3.2.1. Other Common Options | |||
Additional RFCs define some other commonly used options that are | Additional RFCs define some other commonly used options that are | |||
recommended to implement for high performance, but not necessary for | recommended to implement for high performance but are not necessary | |||
basic TCP interoperability. These are the TCP Selective | for basic TCP interoperability. These are the TCP Selective | |||
Acknowledgement (SACK) option [23][27], TCP Timestamp (TS) option | Acknowledgment (SACK) Option [22] [26], TCP Timestamp (TS) Option | |||
[48], and TCP Window Scaling (WS) option [48]. | [47], and TCP Window Scale (WS) Option [47]. | |||
3.2.2. Experimental TCP Options | 3.2.2. Experimental TCP Options | |||
Experimental TCP option values are defined in [31], and [46] | Experimental TCP Option values are defined in [30], and [45] | |||
describes the current recommended usage for these experimental | describes the current recommended usage for these experimental | |||
values. | values. | |||
3.3. TCP Terminology Overview | 3.3. TCP Terminology Overview | |||
This section includes an overview of key terms needed to understand | This section includes an overview of key terms needed to understand | |||
the detailed protocol operation in the rest of the document. There | the detailed protocol operation in the rest of the document. There | |||
is a glossary of terms in Section 4. | is a glossary of terms in Section 4. | |||
3.3.1. Key Connection State Variables | 3.3.1. Key Connection State Variables | |||
Before we can discuss very much about the operation of the TCP | Before we can discuss the operation of the TCP implementation in | |||
implementation we need to introduce some detailed terminology. The | detail, we need to introduce some detailed terminology. The | |||
maintenance of a TCP connection requires maintaining state for | maintenance of a TCP connection requires maintaining state for | |||
several variables. We conceive of these variables being stored in a | several variables. We conceive of these variables being stored in a | |||
connection record called a Transmission Control Block or TCB. Among | connection record called a Transmission Control Block or TCB. Among | |||
the variables stored in the TCB are the local and remote IP addresses | the variables stored in the TCB are the local and remote IP addresses | |||
and port numbers, the IP security level and compartment of the | and port numbers, the IP security level, and compartment of the | |||
connection (see Appendix A.1), pointers to the user's send and | connection (see Appendix A.1), pointers to the user's send and | |||
receive buffers, pointers to the retransmit queue and to the current | receive buffers, pointers to the retransmit queue and to the current | |||
segment. In addition, several variables relating to the send and | segment. In addition, several variables relating to the send and | |||
receive sequence numbers are stored in the TCB. | receive sequence numbers are stored in the TCB. | |||
Send Sequence Variables: | +==========+=====================================================+ | |||
| Variable | Description | | ||||
+==========+=====================================================+ | ||||
| SND.UNA | send unacknowledged | | ||||
+----------+-----------------------------------------------------+ | ||||
| SND.NXT | send next | | ||||
+----------+-----------------------------------------------------+ | ||||
| SND.WND | send window | | ||||
+----------+-----------------------------------------------------+ | ||||
| SND.UP | send urgent pointer | | ||||
+----------+-----------------------------------------------------+ | ||||
| SND.WL1 | segment sequence number used for last window update | | ||||
+----------+-----------------------------------------------------+ | ||||
| SND.WL2 | segment acknowledgment number used for last window | | ||||
| | update | | ||||
+----------+-----------------------------------------------------+ | ||||
| ISS | initial send sequence number | | ||||
+----------+-----------------------------------------------------+ | ||||
SND.UNA - send unacknowledged | Table 2: Send Sequence Variables | |||
SND.NXT - send next | ||||
SND.WND - send window | ||||
SND.UP - send urgent pointer | ||||
SND.WL1 - segment sequence number used for last window update | ||||
SND.WL2 - segment acknowledgment number used for last window | ||||
update | ||||
ISS - initial send sequence number | ||||
Receive Sequence Variables: | +==========+=================================+ | |||
| Variable | Description | | ||||
+==========+=================================+ | ||||
| RCV.NXT | receive next | | ||||
+----------+---------------------------------+ | ||||
| RCV.WND | receive window | | ||||
+----------+---------------------------------+ | ||||
| RCV.UP | receive urgent pointer | | ||||
+----------+---------------------------------+ | ||||
| IRS | initial receive sequence number | | ||||
+----------+---------------------------------+ | ||||
RCV.NXT - receive next | Table 3: Receive Sequence Variables | |||
RCV.WND - receive window | ||||
RCV.UP - receive urgent pointer | ||||
IRS - initial receive sequence number | ||||
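One way to picture the sequence variables is as fields of a per-connection record. The sketch below is illustrative only: the class name `SequenceState`, the lowercase field names, and the `in_flight` helper are assumptions, and a real TCB holds much more (addresses, ports, buffers, queues).

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 32  # sequence numbers are 32-bit values


@dataclass
class SequenceState:
    # Send sequence variables
    snd_una: int = 0  # send unacknowledged
    snd_nxt: int = 0  # send next
    snd_wnd: int = 0  # send window
    snd_up: int = 0   # send urgent pointer
    snd_wl1: int = 0  # segment sequence number used for last window update
    snd_wl2: int = 0  # segment acknowledgment number used for last window update
    iss: int = 0      # initial send sequence number
    # Receive sequence variables
    rcv_nxt: int = 0  # receive next
    rcv_wnd: int = 0  # receive window
    rcv_up: int = 0   # receive urgent pointer
    irs: int = 0      # initial receive sequence number

    def in_flight(self) -> int:
        """Octets sent but not yet acknowledged (hypothetical helper)."""
        return (self.snd_nxt - self.snd_una) % SEQ_MOD
```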
The following diagrams may help to relate some of these variables to | The following diagrams may help to relate some of these variables to | |||
the sequence space. | the sequence space. | |||
1 2 3 4 | 1 2 3 4 | |||
----------|----------|----------|---------- | ----------|----------|----------|---------- | |||
SND.UNA SND.NXT SND.UNA | SND.UNA SND.NXT SND.UNA | |||
+SND.WND | +SND.WND | |||
1 - old sequence numbers that have been acknowledged | 1 - old sequence numbers that have been acknowledged | |||
skipping to change at page 15, line 22 ¶ | skipping to change at line 703 ¶ | |||
3 - future sequence numbers that are not yet allowed | 3 - future sequence numbers that are not yet allowed | |||
Figure 4: Receive Sequence Space | Figure 4: Receive Sequence Space | |||
The receive window is the portion of the sequence space labeled 2 in | The receive window is the portion of the sequence space labeled 2 in | |||
Figure 4. | Figure 4. | |||
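A membership test for the receive window can be sketched as below, assuming the window spans the sequence numbers from RCV.NXT up to but not including RCV.NXT+RCV.WND and that sequence arithmetic wraps modulo 2**32. The function name `in_receive_window` is hypothetical.

```python
SEQ_MOD = 1 << 32  # the sequence space is assumed to wrap at 2**32


def in_receive_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [RCV.NXT, RCV.NXT + RCV.WND) modulo 2**32."""
    # Measuring the distance from RCV.NXT modulo 2**32 keeps the test
    # correct when the window straddles the wrap from 2**32 - 1 to 0.
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd
```

With `rcv_nxt = 0xFFFFFFF0` and `rcv_wnd = 100`, sequence number 5 is inside the window even though it is numerically smaller than RCV.NXT.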
There are also some variables used frequently in the discussion that | There are also some variables used frequently in the discussion that | |||
take their values from the fields of the current segment. | take their values from the fields of the current segment. | |||
Current Segment Variables: | +==========+===============================+ | |||
| Variable | Description | | ||||
+==========+===============================+ | ||||
| SEG.SEQ | segment sequence number | | ||||
+----------+-------------------------------+ | ||||
| SEG.ACK | segment acknowledgment number | | ||||
+----------+-------------------------------+ | ||||
| SEG.LEN | segment length | | ||||
+----------+-------------------------------+ | ||||
| SEG.WND | segment window | | ||||
+----------+-------------------------------+ | ||||
| SEG.UP | segment urgent pointer | | ||||
+----------+-------------------------------+ | ||||
SEG.SEQ - segment sequence number | Table 4: Current Segment Variables | |||
SEG.ACK - segment acknowledgment number | ||||
SEG.LEN - segment length | ||||
SEG.WND - segment window | ||||
SEG.UP - segment urgent pointer | ||||
3.3.2. State Machine Overview | 3.3.2. State Machine Overview | |||
A connection progresses through a series of states during its | A connection progresses through a series of states during its | |||
lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED, | lifetime. The states are: LISTEN, SYN-SENT, SYN-RECEIVED, | |||
ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, | ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, | |||
TIME-WAIT, and the fictional state CLOSED. CLOSED is fictional | TIME-WAIT, and the fictional state CLOSED. CLOSED is fictional | |||
because it represents the state when there is no TCB, and therefore, | because it represents the state when there is no TCB, and therefore, | |||
no connection. Briefly the meanings of the states are: | no connection. Briefly the meanings of the states are: | |||
LISTEN - represents waiting for a connection request from any | LISTEN - represents waiting for a connection request from any remote | |||
remote TCP peer and port. | TCP peer and port. | |||
SYN-SENT - represents waiting for a matching connection request | SYN-SENT - represents waiting for a matching connection request | |||
after having sent a connection request. | after having sent a connection request. | |||
SYN-RECEIVED - represents waiting for a confirming connection | SYN-RECEIVED - represents waiting for a confirming connection | |||
request acknowledgment after having both received and sent a | request acknowledgment after having both received and sent a | |||
connection request. | connection request. | |||
ESTABLISHED - represents an open connection, data received can be | ESTABLISHED - represents an open connection, data received can be | |||
delivered to the user. The normal state for the data transfer | delivered to the user. The normal state for the data transfer | |||
phase of the connection. | phase of the connection. | |||
FIN-WAIT-1 - represents waiting for a connection termination | FIN-WAIT-1 - represents waiting for a connection termination request | |||
request from the remote TCP peer, or an acknowledgment of the | from the remote TCP peer, or an acknowledgment of the connection | |||
connection termination request previously sent. | termination request previously sent. | |||
FIN-WAIT-2 - represents waiting for a connection termination | FIN-WAIT-2 - represents waiting for a connection termination request | |||
request from the remote TCP peer. | from the remote TCP peer. | |||
CLOSE-WAIT - represents waiting for a connection termination | CLOSE-WAIT - represents waiting for a connection termination request | |||
request from the local user. | from the local user. | |||
CLOSING - represents waiting for a connection termination request | CLOSING - represents waiting for a connection termination request | |||
acknowledgment from the remote TCP peer. | acknowledgment from the remote TCP peer. | |||
LAST-ACK - represents waiting for an acknowledgment of the | LAST-ACK - represents waiting for an acknowledgment of the | |||
connection termination request previously sent to the remote TCP | connection termination request previously sent to the remote TCP | |||
peer (this termination request sent to the remote TCP peer already | peer (this termination request sent to the remote TCP peer already | |||
included an acknowledgment of the termination request sent from | included an acknowledgment of the termination request sent from | |||
the remote TCP peer). | the remote TCP peer). | |||
TIME-WAIT - represents waiting for enough time to pass to be sure | TIME-WAIT - represents waiting for enough time to pass to be sure | |||
the remote TCP peer received the acknowledgment of its connection | the remote TCP peer received the acknowledgment of its connection | |||
termination request, and to avoid new connections being impacted | termination request and to avoid new connections being impacted by | |||
by delayed segments from previous connections. | delayed segments from previous connections. | |||
CLOSED - represents no connection state at all. | CLOSED - represents no connection state at all. | |||
A TCP connection progresses from one state to another in response to | A TCP connection progresses from one state to another in response to | |||
events. The events are the user calls, OPEN, SEND, RECEIVE, CLOSE, | events. The events are the user calls, OPEN, SEND, RECEIVE, CLOSE, | |||
ABORT, and STATUS; the incoming segments, particularly those | ABORT, and STATUS; the incoming segments, particularly those | |||
containing the SYN, ACK, RST and FIN flags; and timeouts. | containing the SYN, ACK, RST, and FIN flags; and timeouts. | |||
The OPEN call specifies whether connection establishment is to be | The OPEN call specifies whether connection establishment is to be | |||
actively pursued, or to be passively waited for. | actively pursued, or to be passively waited for. | |||
A passive OPEN request means that the process wants to accept | A passive OPEN request means that the process wants to accept | |||
incoming connection requests, in contrast to an active OPEN | incoming connection requests, in contrast to an active OPEN | |||
attempting to initiate a connection. | attempting to initiate a connection. | |||
The state diagram in Figure 5 illustrates only state changes, | The state diagram in Figure 5 illustrates only state changes, | |||
together with the causing events and resulting actions, but addresses | together with the causing events and resulting actions, but addresses | |||
neither error conditions nor actions that are not connected with | neither error conditions nor actions that are not connected with | |||
state changes. In a later section, more detail is offered with | state changes. In a later section, more detail is offered with | |||
respect to the reaction of the TCP implementation to events. Some | respect to the reaction of the TCP implementation to events. Some | |||
state names are abbreviated or hyphenated differently in the diagram | state names are abbreviated or hyphenated differently in the diagram | |||
from how they appear elsewhere in the document. | from how they appear elsewhere in the document. | |||
NOTA BENE: This diagram is only a summary and must not be taken as | NOTA BENE: This diagram is only a summary and must not be taken as | |||
the total specification. Many details are not included. | the total specification. Many details are not included. | |||
+---------+ ---------\ active OPEN | +---------+ ---------\ active OPEN | |||
| CLOSED | \ ----------- | | CLOSED | \ ----------- | |||
+---------+<---------\ \ create TCB | +---------+<---------\ \ create TCB | |||
| ^ \ \ snd SYN | | ^ \ \ snd SYN | |||
passive OPEN | | CLOSE \ \ | passive OPEN | | CLOSE \ \ | |||
------------ | | ---------- \ \ | ------------ | | ---------- \ \ | |||
create TCB | | delete TCB \ \ | create TCB | | delete TCB \ \ | |||
V | \ \ | V | \ \ | |||
rcv RST (note 1) +---------+ CLOSE | \ | rcv RST (note 1) +---------+ CLOSE | \ | |||
skipping to change at page 18, line 5 ¶ | skipping to change at line 840 ¶ | |||
| rcv FIN -------------- | Timeout=2MSL -------------- | | | rcv FIN -------------- | Timeout=2MSL -------------- | | |||
| ------- x V ------------ x V | | ------- x V ------------ x V | |||
\ snd ACK +---------+delete TCB +---------+ | \ snd ACK +---------+delete TCB +---------+ | |||
-------------------->|TIME-WAIT|------------------->| CLOSED | | -------------------->|TIME-WAIT|------------------->| CLOSED | | |||
+---------+ +---------+ | +---------+ +---------+ | |||
Figure 5: TCP Connection State Diagram | Figure 5: TCP Connection State Diagram | |||
The following notes apply to Figure 5: | The following notes apply to Figure 5: | |||
Note 1: The transition from SYN-RECEIVED to LISTEN on receiving a | Note 1: The transition from SYN-RECEIVED to LISTEN on receiving a | |||
RST is conditional on having reached SYN-RECEIVED after a passive | RST is conditional on having reached SYN-RECEIVED after a passive | |||
open. | OPEN. | |||
Note 2: The figure omits a transition from FIN-WAIT-1 to TIME-WAIT | Note 2: The figure omits a transition from FIN-WAIT-1 to TIME-WAIT | |||
if a FIN is received and the local FIN is also acknowledged. | if a FIN is received and the local FIN is also acknowledged. | |||
Note 3: A RST can be sent from any state with a corresponding | Note 3: A RST can be sent from any state with a corresponding | |||
transition to TIME-WAIT (see [71] for rationale). These | transition to TIME-WAIT (see [70] for rationale). These | |||
transitions are not explicitly shown, otherwise the diagram would | transitions are not explicitly shown; otherwise, the diagram would | |||
become very difficult to read. Similarly, receipt of a RST from | become very difficult to read. Similarly, receipt of a RST from | |||
any state results in a transition to LISTEN or CLOSED, though this | any state results in a transition to LISTEN or CLOSED, though this | |||
is also omitted from the diagram for legibility. | is also omitted from the diagram for legibility. | |||
3.4. Sequence Numbers | 3.4. Sequence Numbers | |||
A fundamental notion in the design is that every octet of data sent | A fundamental notion in the design is that every octet of data sent | |||
over a TCP connection has a sequence number. Since every octet is | over a TCP connection has a sequence number. Since every octet is | |||
sequenced, each of them can be acknowledged. The acknowledgment | sequenced, each of them can be acknowledged. The acknowledgment | |||
mechanism employed is cumulative so that an acknowledgment of | mechanism employed is cumulative so that an acknowledgment of | |||
sequence number X indicates that all octets up to but not including X | sequence number X indicates that all octets up to but not including X | |||
have been received. This mechanism allows for straight-forward | have been received. This mechanism allows for straightforward | |||
duplicate detection in the presence of retransmission. Numbering of | duplicate detection in the presence of retransmission. The numbering | |||
octets within a segment is that the first data octet immediately | scheme of octets within a segment is as follows: the first data octet | |||
following the header is the lowest numbered, and the following octets | immediately following the header is the lowest numbered, and the | |||
are numbered consecutively. | following octets are numbered consecutively. | |||
It is essential to remember that the actual sequence number space is | It is essential to remember that the actual sequence number space is | |||
finite, though large. This space ranges from 0 to 2**32 - 1. Since | finite, though large. This space ranges from 0 to 2^32 - 1. Since | |||
the space is finite, all arithmetic dealing with sequence numbers | the space is finite, all arithmetic dealing with sequence numbers | |||
must be performed modulo 2**32. This unsigned arithmetic preserves | must be performed modulo 2^32. This unsigned arithmetic preserves | |||
the relationship of sequence numbers as they cycle from 2**32 - 1 to | the relationship of sequence numbers as they cycle from 2^32 - 1 to 0 | |||
0 again. There are some subtleties to computer modulo arithmetic, so | again. There are some subtleties to computer modulo arithmetic, so | |||
great care should be taken in programming the comparison of such | great care should be taken in programming the comparison of such | |||
values. The symbol "=<" means "less than or equal" (modulo 2**32). | values. The symbol "=<" means "less than or equal" (modulo 2^32). | |||
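Editorial sketch (not part of either compared text): the wraparound subtlety mentioned above can be illustrated with a small comparison helper. The names `seq_lt` and `seq_leq` are hypothetical, and the half-space convention (a precedes b when the forward distance from a to b is less than 2^31) is an assumption commonly used by implementations; the passage itself only requires that comparisons be done modulo 2^32.

```python
MOD = 1 << 32          # size of the sequence number space (0 .. 2^32 - 1)
HALF = 1 << 31         # assumed cutoff: distances below this count as "ahead"


def seq_lt(a: int, b: int) -> bool:
    """Return True if sequence number a precedes b, modulo 2^32.

    Assumption: a and b are less than 2^31 apart, so the forward
    distance from a to b decides the ordering.
    """
    return a != b and ((b - a) % MOD) < HALF


def seq_leq(a: int, b: int) -> bool:
    """Modular counterpart of the "=<" symbol used in the text."""
    return a == b or seq_lt(a, b)
```

With these helpers, 2^32 - 1 compares as preceding 0, which a plain integer comparison would get wrong.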
The typical kinds of sequence number comparisons that the TCP | The typical kinds of sequence number comparisons that the TCP | |||
implementation must perform include: | implementation must perform include: | |||
(a) Determining that an acknowledgment refers to some sequence | (a) Determining that an acknowledgment refers to some sequence | |||
number sent but not yet acknowledged. | number sent but not yet acknowledged. | |||
(b) Determining that all sequence numbers occupied by a segment | (b) Determining that all sequence numbers occupied by a segment have | |||
have been acknowledged (e.g., to remove the segment from a | been acknowledged (e.g., to remove the segment from a | |||
retransmission queue). | retransmission queue). | |||
(c) Determining that an incoming segment contains sequence numbers | (c) Determining that an incoming segment contains sequence numbers | |||
that are expected (i.e., that the segment "overlaps" the receive | that are expected (i.e., that the segment "overlaps" the receive | |||
window). | window). | |||
In response to sending data the TCP endpoint will receive | In response to sending data, the TCP endpoint will receive | |||
acknowledgments. The following comparisons are needed to process the | acknowledgments. The following comparisons are needed to process the | |||
acknowledgments. | acknowledgments: | |||
SND.UNA = oldest unacknowledged sequence number | SND.UNA = oldest unacknowledged sequence number | |||
SND.NXT = next sequence number to be sent | SND.NXT = next sequence number to be sent | |||
SEG.ACK = acknowledgment from the receiving TCP peer (next | SEG.ACK = acknowledgment from the receiving TCP peer (next | |||
sequence number expected by the receiving TCP peer) | sequence number expected by the receiving TCP peer) | |||
SEG.SEQ = first sequence number of a segment | SEG.SEQ = first sequence number of a segment | |||
SEG.LEN = the number of octets occupied by the data in the segment | SEG.LEN = the number of octets occupied by the data in the segment | |||
(counting SYN and FIN) | (counting SYN and FIN) | |||
SEG.SEQ+SEG.LEN-1 = last sequence number of a segment | SEG.SEQ+SEG.LEN-1 = last sequence number of a segment | |||
A new acknowledgment (called an "acceptable ack"), is one for which | A new acknowledgment (called an "acceptable ack") is one for which | |||
the inequality below holds: | the inequality below holds: | |||
SND.UNA < SEG.ACK =< SND.NXT | SND.UNA < SEG.ACK =< SND.NXT | |||
A segment on the retransmission queue is fully acknowledged if the | A segment on the retransmission queue is fully acknowledged if the | |||
sum of its sequence number and length is less or equal than the | sum of its sequence number and length is less than or equal to the | |||
acknowledgment value in the incoming segment. | acknowledgment value in the incoming segment. | |||
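Editorial sketch (not part of either compared text): one way to evaluate the two tests above without being tripped up by wraparound is to measure every value as a distance from SND.UNA. The function names are hypothetical, and the second helper assumes the segment really is on the retransmission queue (its first octet is at or after SND.UNA) and that SND.UNA is the value from before the incoming ACK is applied.

```python
MOD = 1 << 32  # sequence number space size


def is_acceptable_ack(snd_una: int, seg_ack: int, snd_nxt: int) -> bool:
    """Check SND.UNA < SEG.ACK =< SND.NXT using distances from SND.UNA."""
    outstanding = (snd_nxt - snd_una) % MOD   # octets sent but not yet acked
    newly_acked = (seg_ack - snd_una) % MOD   # octets this ACK would cover
    return 0 < newly_acked <= outstanding


def is_fully_acked(seg_seq: int, seg_len: int, seg_ack: int, snd_una: int) -> bool:
    """Check SEG.SEQ + SEG.LEN =< SEG.ACK for a queued segment.

    Assumption: the segment starts at or after snd_una, and snd_una is
    the value held before this ACK was processed.
    """
    segment_end = (seg_seq + seg_len - snd_una) % MOD
    newly_acked = (seg_ack - snd_una) % MOD
    return segment_end <= newly_acked
```

For example, with SND.UNA = 0xFFFFFFF0 and SND.NXT = 0x10, an ACK of 0 is acceptable even though it is numerically smaller than SND.UNA.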
When data is received the following comparisons are needed: | When data is received, the following comparisons are needed: | |||
RCV.NXT = next sequence number expected on an incoming segment, | RCV.NXT = next sequence number expected on an incoming segment, | |||
and is the left or lower edge of the receive window | and is the left or lower edge of the receive window | |||
RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming | RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming | |||
segment, and is the right or upper edge of the receive window | segment, and is the right or upper edge of the receive window | |||
SEG.SEQ = first sequence number occupied by the incoming segment | SEG.SEQ = first sequence number occupied by the incoming segment | |||
SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming | SEG.SEQ+SEG.LEN-1 = last sequence number occupied by the incoming | |||
skipping to change at page 20, line 12 ¶ | skipping to change at line 942 ¶ | |||
RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | |||
or | or | |||
RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND | RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND | |||
The first part of this test checks to see if the beginning of the | The first part of this test checks to see if the beginning of the | |||
segment falls in the window, the second part of the test checks to | segment falls in the window, the second part of the test checks to | |||
see if the end of the segment falls in the window; if the segment | see if the end of the segment falls in the window; if the segment | |||
passes either part of the test it contains data in the window. | passes either part of the test, it contains data in the window. | |||
Actually, it is a little more complicated than this. Due to zero | Actually, it is a little more complicated than this. Due to zero | |||
windows and zero length segments, we have four cases for the | windows and zero-length segments, we have four cases for the | |||
acceptability of an incoming segment: | acceptability of an incoming segment: | |||
Segment Receive Test | +=========+=========+======================================+ | |||
Length Window | | Segment | Receive | Test | | |||
------- ------- ------------------------------------------- | | Length | Window | | | |||
+=========+=========+======================================+ | ||||
0 0 SEG.SEQ = RCV.NXT | | 0 | 0 | SEG.SEQ = RCV.NXT | | |||
+---------+---------+--------------------------------------+ | ||||
0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | | 0 | >0 | RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | | |||
+---------+---------+--------------------------------------+ | ||||
>0 0 not acceptable | | >0 | 0 | not acceptable | | |||
+---------+---------+--------------------------------------+ | ||||
| >0 | >0 | RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | | ||||
| | | | | ||||
| | | or | | ||||
| | | | | ||||
| | | RCV.NXT =< SEG.SEQ+SEG.LEN-1 < | | ||||
| | | RCV.NXT+RCV.WND | | ||||
+---------+---------+--------------------------------------+ | ||||
>0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | Table 5: Segment Acceptability Tests | |||
or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND | ||||
Note that when the receive window is zero no segments should be | Note that when the receive window is zero no segments should be | |||
acceptable except ACK segments. Thus, it is possible for a TCP | acceptable except ACK segments. Thus, it is possible for a TCP | |||
implementation to maintain a zero receive window while transmitting | implementation to maintain a zero receive window while transmitting | |||
data and receiving ACKs. A TCP receiver MUST process the RST and URG | data and receiving ACKs. A TCP receiver MUST process the RST and URG | |||
fields of all incoming segments, even when the receive window is zero | fields of all incoming segments, even when the receive window is zero | |||
(MUST-66). | (MUST-66). | |||
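Editorial sketch (not part of either compared text): the four acceptability cases in the table above can be written as a single function. The names `in_window` and `segment_acceptable` are hypothetical. The sketch covers only the sequence-number test; the separate requirement to process RST and URG fields even with a zero window (MUST-66) is not modeled.

```python
MOD = 1 << 32  # sequence number space size


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RCV.NXT =< seq < RCV.NXT+RCV.WND, evaluated modulo 2^32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Apply the four-case test keyed on segment length and receive window."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt            # case: length 0, window 0
        return in_window(seg_seq, rcv_nxt, rcv_wnd)  # case: length 0, window >0
    if rcv_wnd == 0:
        return False                             # case: length >0, window 0
    # case: length >0, window >0: first or last octet must fall in the window
    last = (seg_seq + seg_len - 1) % MOD
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)
```

A segment that begins before RCV.NXT but whose last octet lands inside the window passes through the second half of the final case.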
We have taken advantage of the numbering scheme to protect certain | We have taken advantage of the numbering scheme to protect certain | |||
control information as well. This is achieved by implicitly | control information as well. This is achieved by implicitly | |||
skipping to change at page 20, line 52 ¶ | skipping to change at line 989 ¶ | |||
one copy of the control will be acted upon). Control information is | one copy of the control will be acted upon). Control information is | |||
not physically carried in the segment data space. Consequently, we | not physically carried in the segment data space. Consequently, we | |||
must adopt rules for implicitly assigning sequence numbers to | must adopt rules for implicitly assigning sequence numbers to | |||
control. The SYN and FIN are the only controls requiring this | control. The SYN and FIN are the only controls requiring this | |||
protection, and these controls are used only at connection opening | protection, and these controls are used only at connection opening | |||
and closing. For sequence number purposes, the SYN is considered to | and closing. For sequence number purposes, the SYN is considered to | |||
occur before the first actual data octet of the segment in which it | occur before the first actual data octet of the segment in which it | |||
occurs, while the FIN is considered to occur after the last actual | occurs, while the FIN is considered to occur after the last actual | |||
data octet in a segment in which it occurs. The segment length | data octet in a segment in which it occurs. The segment length | |||
(SEG.LEN) includes both data and sequence space-occupying controls. | (SEG.LEN) includes both data and sequence space-occupying controls. | |||
When a SYN is present then SEG.SEQ is the sequence number of the SYN. | When a SYN is present, then SEG.SEQ is the sequence number of the | |||
SYN. | ||||
3.4.1. Initial Sequence Number Selection | 3.4.1. Initial Sequence Number Selection | |||
A connection is defined by a pair of sockets. Connections can be | A connection is defined by a pair of sockets. Connections can be | |||
reused. New instances of a connection will be referred to as | reused. New instances of a connection will be referred to as | |||
incarnations of the connection. The problem that arises from this is | incarnations of the connection. The problem that arises from this is | |||
-- "how does the TCP implementation identify duplicate segments from | -- "how does the TCP implementation identify duplicate segments from | |||
previous incarnations of the connection?" This problem becomes | previous incarnations of the connection?" This problem becomes | |||
apparent if the connection is being opened and closed in quick | apparent if the connection is being opened and closed in quick | |||
succession, or if the connection breaks with loss of memory and is | succession, or if the connection breaks with loss of memory and is | |||
then reestablished. To support this, the TIME-WAIT state limits the | then reestablished. To support this, the TIME-WAIT state limits the | |||
rate of connection reuse, while the initial sequence number selection | rate of connection reuse, while the initial sequence number selection | |||
described below further protects against ambiguity about what | described below further protects against ambiguity about which | |||
incarnation of a connection an incoming packet corresponds to. | incarnation of a connection an incoming packet corresponds to. | |||
To avoid confusion we must prevent segments from one incarnation of a | To avoid confusion, we must prevent segments from one incarnation of | |||
connection from being used while the same sequence numbers may still | a connection from being used while the same sequence numbers may | |||
be present in the network from an earlier incarnation. We want to | still be present in the network from an earlier incarnation. We want | |||
assure this, even if a TCP endpoint loses all knowledge of the | to assure this even if a TCP endpoint loses all knowledge of the | |||
sequence numbers it has been using. When new connections are | sequence numbers it has been using. When new connections are | |||
created, an initial sequence number (ISN) generator is employed that | created, an initial sequence number (ISN) generator is employed that | |||
selects a new 32 bit ISN. There are security issues that result if | selects a new 32-bit ISN. There are security issues that result if | |||
an off-path attacker is able to predict or guess ISN values [43]. | an off-path attacker is able to predict or guess ISN values [42]. | |||
TCP Initial Sequence Numbers are generated from a number sequence | TCP initial sequence numbers are generated from a number sequence | |||
that monotonically increases until it wraps, known loosely as a | that monotonically increases until it wraps, known loosely as a | |||
"clock". This clock is a 32-bit counter that typically increments at | "clock". This clock is a 32-bit counter that typically increments at | |||
least once every roughly 4 microseconds, although it is neither | least once every roughly 4 microseconds, although it is neither | |||
assumed to be realtime nor precise, and need not persist across | assumed to be realtime nor precise, and need not persist across | |||
reboots. The clock component is intended to ensure that with a | reboots. The clock component is intended to ensure that with a | |||
Maximum Segment Lifetime (MSL), generated ISNs will be unique, since | Maximum Segment Lifetime (MSL), generated ISNs will be unique since | |||
it cycles approximately every 4.55 hours, which is much longer than | it cycles approximately every 4.55 hours, which is much longer than | |||
the MSL. | the MSL. Please note that for modern networks that support high data | |||
rates where the connection might start and quickly advance sequence | ||||
numbers to overlap within the MSL, it is recommended to implement the | ||||
Timestamp Option as mentioned later in Section 3.4.3. | ||||
A TCP implementation MUST use the above type of "clock" for clock- | A TCP implementation MUST use the above type of "clock" for clock- | |||
driven selection of initial sequence numbers (MUST-8), and SHOULD | driven selection of initial sequence numbers (MUST-8), and SHOULD | |||
generate its Initial Sequence Numbers with the expression: | generate its initial sequence numbers with the expression: | |||
ISN = M + F(localip, localport, remoteip, remoteport, secretkey) | ISN = M + F(localip, localport, remoteip, remoteport, secretkey) | |||
where M is the 4 microsecond timer, and F() is a pseudorandom | where M is the 4 microsecond timer, and F() is a pseudorandom | |||
function (PRF) of the connection's identifying parameters ("localip, | function (PRF) of the connection's identifying parameters ("localip, | |||
localport, remoteip, remoteport") and a secret key ("secretkey") | localport, remoteip, remoteport") and a secret key ("secretkey") | |||
(SHLD-1). F() MUST NOT be computable from the outside (MUST-9), or | (SHLD-1). F() MUST NOT be computable from the outside (MUST-9), or | |||
an attacker could still guess at sequence numbers from the ISN used | an attacker could still guess at sequence numbers from the ISN used | |||
for some other connection. The PRF could be implemented as a | for some other connection. The PRF could be implemented as a | |||
cryptographic hash of the concatenation of the TCP connection | cryptographic hash of the concatenation of the TCP connection | |||
parameters and some secret data. For discussion of the selection of | parameters and some secret data. For discussion of the selection of | |||
a specific hash algorithm and management of the secret key data, | a specific hash algorithm and management of the secret key data, | |||
please see Section 3 of [43]. | please see Section 3 of [42]. | |||
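Editorial sketch (not part of either compared text): the expression ISN = M + F(...) might be realized roughly as follows. Several choices here are assumptions made for illustration only: HMAC-SHA-256 as the PRF (the text only suggests a cryptographic hash over the connection parameters and secret data), a monotonic nanosecond clock divided by 4000 as the 4 microsecond timer M, a random per-boot key, and all function names. Consult the cited reference for actual algorithm and key-management guidance.

```python
import hashlib
import hmac
import ipaddress
import secrets
import time

MOD = 1 << 32
SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-boot secret


def isn_clock(now_ns=None) -> int:
    """M: a 32-bit counter that ticks once every 4 microseconds (assumed source)."""
    if now_ns is None:
        now_ns = time.monotonic_ns()
    return (now_ns // 4000) % MOD


def isn_offset(localip, localport, remoteip, remoteport, secretkey) -> int:
    """F(): keyed hash of the connection identifiers, truncated to 32 bits."""
    msg = b"".join([
        ipaddress.ip_address(localip).packed,
        localport.to_bytes(2, "big"),
        ipaddress.ip_address(remoteip).packed,
        remoteport.to_bytes(2, "big"),
    ])
    digest = hmac.new(secretkey, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def generate_isn(localip, localport, remoteip, remoteport,
                 secretkey=SECRET_KEY, now_ns=None) -> int:
    """ISN = M + F(localip, localport, remoteip, remoteport, secretkey) mod 2^32."""
    m = isn_clock(now_ns)
    f = isn_offset(localip, localport, remoteip, remoteport, secretkey)
    return (m + f) % MOD
```

For a fixed key and connection, two ISNs generated 4 microseconds apart differ by exactly one, while an observer without the key cannot relate the ISN of one connection to that of another.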
For each connection there is a send sequence number and a receive | For each connection there is a send sequence number and a receive | |||
sequence number. The initial send sequence number (ISS) is chosen by | sequence number. The initial send sequence number (ISS) is chosen by | |||
the data sending TCP peer, and the initial receive sequence number | the data sending TCP peer, and the initial receive sequence number | |||
(IRS) is learned during the connection establishing procedure. | (IRS) is learned during the connection-establishing procedure. | |||
For a connection to be established or initialized, the two TCP peers | For a connection to be established or initialized, the two TCP peers | |||
must synchronize on each other's initial sequence numbers. This is | must synchronize on each other's initial sequence numbers. This is | |||
done in an exchange of connection establishing segments carrying a | done in an exchange of connection-establishing segments carrying a | |||
control bit called "SYN" (for synchronize) and the initial sequence | control bit called "SYN" (for synchronize) and the initial sequence | |||
numbers. As a shorthand, segments carrying the SYN bit are also | numbers. As a shorthand, segments carrying the SYN bit are also | |||
called "SYNs". Hence, the solution requires a suitable mechanism for | called "SYNs". Hence, the solution requires a suitable mechanism for | |||
picking an initial sequence number and a slightly involved handshake | picking an initial sequence number and a slightly involved handshake | |||
to exchange the ISNs. | to exchange the ISNs. | |||
The synchronization requires each side to send its own initial | The synchronization requires each side to send its own initial | |||
sequence number and to receive a confirmation of it in acknowledgment | sequence number and to receive a confirmation of it in acknowledgment | |||
from the remote TCP peer. Each side must also receive the remote | from the remote TCP peer. Each side must also receive the remote | |||
peer's initial sequence number and send a confirming acknowledgment. | peer's initial sequence number and send a confirming acknowledgment. | |||
skipping to change at page 22, line 49 ¶ | skipping to change at line 1079 ¶ | |||
Because steps 2 and 3 can be combined in a single message this is | Because steps 2 and 3 can be combined in a single message this is | |||
called the three-way (or three message) handshake (3WHS). | called the three-way (or three message) handshake (3WHS). | |||
A 3WHS is necessary because sequence numbers are not tied to a global | A 3WHS is necessary because sequence numbers are not tied to a global | |||
clock in the network, and TCP implementations may have different | clock in the network, and TCP implementations may have different | |||
mechanisms for picking the ISNs. The receiver of the first SYN has | mechanisms for picking the ISNs. The receiver of the first SYN has | |||
no way of knowing whether the segment was an old one or not, unless | no way of knowing whether the segment was an old one or not, unless | |||
it remembers the last sequence number used on the connection (which | it remembers the last sequence number used on the connection (which | |||
is not always possible), and so it must ask the sender to verify this | is not always possible), and so it must ask the sender to verify this | |||
SYN. The three-way handshake and the advantages of a clock-driven | SYN. The three-way handshake and the advantages of a clock-driven | |||
scheme for ISN selection are discussed in [70]. | scheme for ISN selection are discussed in [69]. | |||
3.4.2. Knowing When to Keep Quiet | 3.4.2. Knowing When to Keep Quiet | |||
A theoretical problem exists where data could be corrupted due to | A theoretical problem exists where data could be corrupted due to | |||
confusion between old segments in the network and new ones after a | confusion between old segments in the network and new ones after a | |||
host reboots, if the same port numbers and sequence space are reused. | host reboots if the same port numbers and sequence space are reused. | |||
The "Quiet Time" concept discussed below addresses this and the | The "quiet time" concept discussed below addresses this, and the | |||
discussion of it is included for situations where it might be | discussion of it is included for situations where it might be | |||
relevant, although it is not felt to be necessary in most current | relevant, although it is not felt to be necessary in most current | |||
implementations. The problem was more relevant earlier in the | implementations. The problem was more relevant earlier in the | |||
history of TCP. In practical use on the Internet today, the error- | history of TCP. In practical use on the Internet today, the error- | |||
prone conditions are sufficiently unlikely that it is felt safe to | prone conditions are sufficiently unlikely that it is safe to ignore. | |||
ignore. Reasons why it is now negligible include: (a) ISS and | Reasons why it is now negligible include: (a) ISS and ephemeral port | |||
ephemeral port randomization have reduced likelihood of reuse of port | randomization have reduced likelihood of reuse of port numbers and | |||
numbers and sequence numbers after reboots, (b) the effective MSL of | sequence numbers after reboots, (b) the effective MSL of the Internet | |||
the Internet has declined as links have become faster, and (c) | has declined as links have become faster, and (c) reboots often | |||
reboots often taking longer than an MSL anyways. | taking longer than an MSL anyways. | |||
To be sure that a TCP implementation does not create a segment | To be sure that a TCP implementation does not create a segment | |||
carrying a sequence number that may be duplicated by an old segment | carrying a sequence number that may be duplicated by an old segment | |||
remaining in the network, the TCP endpoint must keep quiet for an MSL | remaining in the network, the TCP endpoint must keep quiet for an MSL | |||
before assigning any sequence numbers upon starting up or recovering | before assigning any sequence numbers upon starting up or recovering | |||
from a situation where memory of sequence numbers in use was lost. | from a situation where memory of sequence numbers in use was lost. | |||
For this specification the MSL is taken to be 2 minutes. This is an | For this specification the MSL is taken to be 2 minutes. This is an | |||
engineering choice, and may be changed if experience indicates it is | engineering choice, and may be changed if experience indicates it is | |||
desirable to do so. Note that if a TCP endpoint is reinitialized in | desirable to do so. Note that if a TCP endpoint is reinitialized in | |||
some sense, yet retains its memory of sequence numbers in use, then | some sense, yet retains its memory of sequence numbers in use, then | |||
it need not wait at all; it must only be sure to use sequence numbers | it need not wait at all; it must only be sure to use sequence numbers | |||
larger than those recently used. | larger than those recently used. | |||
3.4.3. The TCP Quiet Time Concept | 3.4.3. The TCP Quiet Time Concept | |||
Hosts that for any reason lose knowledge of the last sequence numbers | Hosts that for any reason lose knowledge of the last sequence numbers | |||
transmitted on each active (i.e., not closed) connection shall delay | transmitted on each active (i.e., not closed) connection shall delay | |||
emitting any TCP segments for at least the agreed MSL in the internet | emitting any TCP segments for at least the agreed MSL in the internet | |||
system that the host is a part of. In the paragraphs below, an | system that the host is a part of. In the paragraphs below, an | |||
explanation for this specification is given. TCP implementors may | explanation for this specification is given. TCP implementers may | |||
violate the "quiet time" restriction, but only at the risk of causing | violate the "quiet time" restriction, but only at the risk of causing | |||
some old data to be accepted as new or new data rejected as old | some old data to be accepted as new or new data rejected as old | |||
duplicated data by some receivers in the internet system. | duplicated data by some receivers in the internet system. | |||
TCP endpoints consume sequence number space each time a segment is | TCP endpoints consume sequence number space each time a segment is | |||
formed and entered into the network output queue at a source host. | formed and entered into the network output queue at a source host. | |||
The duplicate detection and sequencing algorithm in the TCP protocol | The duplicate detection and sequencing algorithm in TCP relies on the | |||
relies on the unique binding of segment data to sequence space to the | unique binding of segment data to sequence space to the extent that | |||
extent that sequence numbers will not cycle through all 2**32 values | sequence numbers will not cycle through all 2^32 values before the | |||
before the segment data bound to those sequence numbers has been | segment data bound to those sequence numbers has been delivered and | |||
delivered and acknowledged by the receiver and all duplicate copies | acknowledged by the receiver and all duplicate copies of the segments | |||
of the segments have "drained" from the internet. Without such an | have "drained" from the internet. Without such an assumption, two | |||
assumption, two distinct TCP segments could conceivably be assigned | distinct TCP segments could conceivably be assigned the same or | |||
the same or overlapping sequence numbers, causing confusion at the | overlapping sequence numbers, causing confusion at the receiver as to | |||
receiver as to which data is new and which is old. Remember that | which data is new and which is old. Remember that each segment is | |||
each segment is bound to as many consecutive sequence numbers as | bound to as many consecutive sequence numbers as there are octets of | |||
there are octets of data and SYN or FIN flags in the segment. | data and SYN or FIN flags in the segment. | |||
Under normal conditions, TCP implementations keep track of the next | Under normal conditions, TCP implementations keep track of the next | |||
sequence number to emit and the oldest awaiting acknowledgment so as | sequence number to emit and the oldest awaiting acknowledgment so as | |||
to avoid mistakenly using a sequence number over before its first use | to avoid mistakenly reusing a sequence number before its first use | |||
has been acknowledged. This alone does not guarantee that old | has been acknowledged. This alone does not guarantee that old | |||
duplicate data is drained from the net, so the sequence space has | duplicate data is drained from the net, so the sequence space has | |||
been made large to reduce the probability that a wandering duplicate | been made large to reduce the probability that a wandering duplicate | |||
will cause trouble upon arrival. At 2 megabits/sec. it takes 4.5 | will cause trouble upon arrival. At 2 megabits/sec., it takes 4.5 | |||
hours to use up 2**32 octets of sequence space. Since the maximum | hours to use up 2^32 octets of sequence space. Since the maximum | |||
segment lifetime in the net is not likely to exceed a few tens of | segment lifetime in the net is not likely to exceed a few tens of | |||
seconds, this is deemed ample protection for foreseeable nets, even | seconds, this is deemed ample protection for foreseeable nets, even | |||
if data rates escalate to 10s of megabits/sec. At 100 megabits/sec, | if data rates escalate to 10s of megabits/sec. At 100 megabits/sec., | |||
the cycle time is 5.4 minutes, which may be a little short, but still | the cycle time is 5.4 minutes, which may be a little short but still | |||
within reason. Much higher data rates are possible today, with | within reason. Much higher data rates are possible today, with | |||
implications described in the final paragraph of this subsection. | implications described in the final paragraph of this subsection. | |||
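Editorial sketch (not part of either compared text): the cycle times quoted above follow from dividing the 2^32-octet sequence space by the data rate. The helper name is hypothetical. One assumption is needed to reproduce the quoted figures: a megabit is read as 2^20 bits. With 10^6 bits per megabit the results come out slightly larger, near 4.77 hours and 5.73 minutes.

```python
SEQ_SPACE_OCTETS = 1 << 32   # octets before sequence numbers wrap
MEGABIT = 1 << 20            # assumption: binary megabit, matches quoted figures


def wrap_time_seconds(bits_per_second: float) -> float:
    """Seconds needed to consume the whole sequence space at a given rate."""
    return SEQ_SPACE_OCTETS * 8 / bits_per_second


hours_at_2_mbit = wrap_time_seconds(2 * MEGABIT) / 3600        # about 4.55 hours
minutes_at_100_mbit = wrap_time_seconds(100 * MEGABIT) / 60    # about 5.46 minutes
seconds_at_10_gbit = wrap_time_seconds(10e9)                   # about 3.4 seconds
```

The last line uses a decimal 10 gigabits/sec as an illustrative modern rate: the sequence space wraps in a few seconds, far less than the MSL.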
The basic duplicate detection and sequencing algorithm in TCP can be | The basic duplicate detection and sequencing algorithm in TCP can be | |||
defeated, however, if a source TCP endpoint does not have any memory | defeated, however, if a source TCP endpoint does not have any memory | |||
of the sequence numbers it last used on a given connection. For | of the sequence numbers it last used on a given connection. For | |||
example, if the TCP implementation were to start all connections with | example, if the TCP implementation were to start all connections with | |||
sequence number 0, then upon the host rebooting, a TCP peer might re- | sequence number 0, then upon the host rebooting, a TCP peer might re- | |||
form an earlier connection (possibly after half-open connection | form an earlier connection (possibly after half-open connection | |||
resolution) and emit packets with sequence numbers identical to or | resolution) and emit packets with sequence numbers identical to or | |||
overlapping with packets still in the network, which were emitted on | overlapping with packets still in the network, which were emitted on | |||
an earlier incarnation of the same connection. In the absence of | an earlier incarnation of the same connection. In the absence of | |||
knowledge about the sequence numbers used on a particular connection, | knowledge about the sequence numbers used on a particular connection, | |||
the TCP specification recommends that the source delay for MSL | the TCP specification recommends that the source delay for MSL | |||
seconds before emitting segments on the connection, to allow time for | seconds before emitting segments on the connection, to allow time for | |||
segments from the earlier connection incarnation to drain from the | segments from the earlier connection incarnation to drain from the | |||
system. | system. | |||
Even hosts that can remember the time of day and used it to select | Even hosts that can remember the time of day and use it to select | |||
initial sequence number values are not immune from this problem | initial sequence number values are not immune from this problem | |||
(i.e., even if time of day is used to select an initial sequence | (i.e., even if time of day is used to select an initial sequence | |||
number for each new connection incarnation). | number for each new connection incarnation). | |||
Suppose, for example, that a connection is opened starting with | Suppose, for example, that a connection is opened starting with | |||
sequence number S. Suppose that this connection is not used much and | sequence number S. Suppose that this connection is not used much and | |||
that eventually the initial sequence number function (ISN(t)) takes | that eventually the initial sequence number function (ISN(t)) takes | |||
on a value equal to the sequence number, say S1, of the last segment | on a value equal to the sequence number, say S1, of the last segment | |||
sent by this TCP endpoint on a particular connection. Now suppose, | sent by this TCP endpoint on a particular connection. Now suppose, | |||
at this instant, the host reboots and establishes a new incarnation | at this instant, the host reboots and establishes a new incarnation | |||
skipping to change at page 25, line 15 ¶ | skipping to change at line 1188 ¶ | |||
the recovery occurs quickly enough, any old duplicates in the net | the recovery occurs quickly enough, any old duplicates in the net | |||
bearing sequence numbers in the neighborhood of S1 may arrive and be | bearing sequence numbers in the neighborhood of S1 may arrive and be | |||
treated as new packets by the receiver of the new incarnation of the | treated as new packets by the receiver of the new incarnation of the | |||
connection. | connection. | |||
The problem is that the recovering host may not know for how long it | The problem is that the recovering host may not know for how long it | |||
was down between rebooting nor does it know whether there are still | was down between rebooting nor does it know whether there are still | |||
old duplicates in the system from earlier connection incarnations. | old duplicates in the system from earlier connection incarnations. | |||
One way to deal with this problem is to deliberately delay emitting | One way to deal with this problem is to deliberately delay emitting | |||
segments for one MSL after recovery from a reboot - this is the | segments for one MSL after recovery from a reboot -- this is the | |||
"quiet time" specification. Hosts that prefer to avoid waiting and | "quiet time" specification. Hosts that prefer to avoid waiting and | |||
are willing to risk possible confusion of old and new packets at a | are willing to risk possible confusion of old and new packets at a | |||
given destination may choose not to wait for the "quiet time". | given destination may choose not to wait for the "quiet time". | |||
Implementors may provide TCP users with the ability to select on a | Implementers may provide TCP users with the ability to select on a | |||
connection by connection basis whether to wait after a reboot, or may | connection-by-connection basis whether to wait after a reboot, or may | |||
informally implement the "quiet time" for all connections. | informally implement the "quiet time" for all connections. | |||
Obviously, even where a user selects to "wait," this is not necessary | Obviously, even where a user selects to "wait", this is not necessary | |||
after the host has been "up" for at least MSL seconds. | after the host has been "up" for at least MSL seconds. | |||
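The quiet-time rule above reduces to a comparison between host uptime and MSL. The following is a minimal sketch of that check, not a definitive implementation. The function name and the idea of passing MSL as a parameter are illustrative assumptions; the text above does not prescribe an interface.

```python
def quiet_time_remaining(uptime_seconds: float, msl_seconds: float) -> float:
    """Seconds a rebooted host should still wait before emitting segments.

    Hypothetical helper: returns 0.0 once the host has been up for at
    least MSL seconds, since waiting is then no longer necessary.
    """
    return max(0.0, msl_seconds - uptime_seconds)


# Example with an assumed MSL of 120 seconds.
print(quiet_time_remaining(uptime_seconds=30, msl_seconds=120))
print(quiet_time_remaining(uptime_seconds=500, msl_seconds=120))
```

A host that chooses to skip the quiet time would simply not consult such a check, accepting the risk of confusion described above.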
To summarize: every segment emitted occupies one or more sequence | To summarize: every segment emitted occupies one or more sequence | |||
numbers in the sequence space, the numbers occupied by a segment are | numbers in the sequence space, and the numbers occupied by a segment | |||
"busy" or "in use" until MSL seconds have passed, upon rebooting a | are "busy" or "in use" until MSL seconds have passed. Upon | |||
block of space-time is occupied by the octets and SYN or FIN flags of | rebooting, a block of space-time is occupied by the octets and SYN or | |||
any potentially still in-flight segments, and if a new connection is | FIN flags of any potentially still in-flight segments. If a new | |||
started too soon and uses any of the sequence numbers in the space- | connection is started too soon and uses any of the sequence numbers | |||
time footprint of those potentially still in-flight segments of the | in the space-time footprint of those potentially still in-flight | |||
previous connection incarnation, there is a potential sequence number | segments of the previous connection incarnation, there is a potential | |||
overlap area that could cause confusion at the receiver. | sequence number overlap area that could cause confusion at the | |||
receiver. | ||||
High performance cases will have shorter cycle times than those in | High-performance cases will have shorter cycle times than those in | |||
the megabits per second that the base TCP design described above | the megabits per second that the base TCP design described above | |||
considers. At 1 Gbps, the cycle time is 34 seconds, only 3 seconds | considers. At 1 Gbps, the cycle time is 34 seconds, only 3 seconds | |||
at 10 Gbps, and around a third of a second at 100 Gbps. In these | at 10 Gbps, and around a third of a second at 100 Gbps. In these | |||
higher performance cases, TCP Timestamp options and Protection | higher-performance cases, TCP Timestamp Options and Protection | |||
Against Wrapped Sequences (PAWS) [48] provide the needed capability | Against Wrapped Sequences (PAWS) [47] provide the needed capability | |||
to detect and discard old duplicates. | to detect and discard old duplicates. | |||
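The cycle times quoted above follow from dividing the 32-bit sequence space (one sequence number per octet) by the sending rate. The sketch below reproduces that arithmetic, assuming a sender that consumes sequence numbers continuously at the stated rate. The function name is illustrative only.

```python
SEQ_SPACE_OCTETS = 2 ** 32  # sequence numbers are 32 bits, one per octet


def cycle_time_seconds(rate_bits_per_second: float) -> float:
    """Time to consume the entire sequence space at a sustained rate."""
    return SEQ_SPACE_OCTETS * 8 / rate_bits_per_second


for label, rate in [("1 Gbps", 1e9), ("10 Gbps", 1e10), ("100 Gbps", 1e11)]:
    print(f"{label}: {cycle_time_seconds(rate):.2f} s")
```

The results are roughly 34 seconds, 3 seconds, and a third of a second, matching the figures in the paragraph above.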
3.5. Establishing a connection | 3.5. Establishing a Connection | |||
The "three-way handshake" is the procedure used to establish a | The "three-way handshake" is the procedure used to establish a | |||
connection. This procedure normally is initiated by one TCP peer and | connection. This procedure normally is initiated by one TCP peer and | |||
responded to by another TCP peer. The procedure also works if two | responded to by another TCP peer. The procedure also works if two | |||
TCP peers simultaneously initiate the procedure. When simultaneous | TCP peers simultaneously initiate the procedure. When simultaneous | |||
open occurs, each TCP peer receives a "SYN" segment that carries no | open occurs, each TCP peer receives a SYN segment that carries no | |||
acknowledgment after it has sent a "SYN". Of course, the arrival of | acknowledgment after it has sent a SYN. Of course, the arrival of an | |||
an old duplicate "SYN" segment can potentially make it appear, to the | old duplicate SYN segment can potentially make it appear, to the | |||
recipient, that a simultaneous connection initiation is in progress. | recipient, that a simultaneous connection initiation is in progress. | |||
Proper use of "reset" segments can disambiguate these cases. | Proper use of "reset" segments can disambiguate these cases. | |||
Several examples of connection initiation follow. Although these | Several examples of connection initiation follow. Although these | |||
examples do not show connection synchronization using data-carrying | examples do not show connection synchronization using data-carrying | |||
segments, this is perfectly legitimate, so long as the receiving TCP | segments, this is perfectly legitimate, so long as the receiving TCP | |||
endpoint doesn't deliver the data to the user until it is clear the | endpoint doesn't deliver the data to the user until it is clear the | |||
data is valid (e.g., the data is buffered at the receiver until the | data is valid (e.g., the data is buffered at the receiver until the | |||
connection reaches the ESTABLISHED state, given that the three-way | connection reaches the ESTABLISHED state, given that the three-way | |||
handshake reduces the possibility of false connections). It is a | handshake reduces the possibility of false connections). It is a | |||
trade-off between memory and messages to provide information for this | trade-off between memory and messages to provide information for this | |||
checking. | checking. | |||
The simplest 3WHS is shown in Figure 6. The figures should be | The simplest 3WHS is shown in Figure 6. The figures should be | |||
interpreted in the following way. Each line is numbered for | interpreted in the following way. Each line is numbered for | |||
reference purposes. Right arrows (-->) indicate departure of a TCP | reference purposes. Right arrows (-->) indicate departure of a TCP | |||
segment from TCP peer A to TCP peer B, or arrival of a segment at B | segment from TCP Peer A to TCP Peer B or arrival of a segment at B | |||
from A. Left arrows (<--), indicate the reverse. Ellipsis (...) | from A. Left arrows (<--) indicate the reverse. Ellipses (...) | |||
indicates a segment that is still in the network (delayed). Comments | indicate a segment that is still in the network (delayed). Comments | |||
appear in parentheses. TCP connection states represent the state | appear in parentheses. TCP connection states represent the state | |||
AFTER the departure or arrival of the segment (whose contents are | AFTER the departure or arrival of the segment (whose contents are | |||
shown in the center of each line). Segment contents are shown in | shown in the center of each line). Segment contents are shown in | |||
abbreviated form, with sequence number, control flags, and ACK field. | abbreviated form, with sequence number, control flags, and ACK field. | |||
Other fields such as window, addresses, lengths, and text have been | Other fields such as window, addresses, lengths, and text have been | |||
left out in the interest of clarity. | left out in the interest of clarity. | |||
TCP Peer A TCP Peer B | TCP Peer A TCP Peer B | |||
1. CLOSED LISTEN | 1. CLOSED LISTEN | |||
2. SYN-SENT --> <SEQ=100><CTL=SYN> --> SYN-RECEIVED | 2. SYN-SENT --> <SEQ=100><CTL=SYN> --> SYN-RECEIVED | |||
3. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED | 3. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED | |||
4. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK> --> ESTABLISHED | 4. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK> --> ESTABLISHED | |||
5. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK><DATA> --> ESTABLISHED | 5. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK><DATA> --> ESTABLISHED | |||
Figure 6: Basic 3-Way Handshake for Connection Synchronization | Figure 6: Basic Three-Way Handshake for Connection Synchronization | |||
In line 2 of Figure 6, TCP Peer A begins by sending a SYN segment | In line 2 of Figure 6, TCP Peer A begins by sending a SYN segment | |||
indicating that it will use sequence numbers starting with sequence | indicating that it will use sequence numbers starting with sequence | |||
number 100. In line 3, TCP Peer B sends a SYN and acknowledges the | number 100. In line 3, TCP Peer B sends a SYN and acknowledges the | |||
SYN it received from TCP Peer A. Note that the acknowledgment field | SYN it received from TCP Peer A. Note that the acknowledgment field | |||
indicates TCP Peer B is now expecting to hear sequence 101, | indicates TCP Peer B is now expecting to hear sequence 101, | |||
acknowledging the SYN that occupied sequence 100. | acknowledging the SYN that occupied sequence 100. | |||
At line 4, TCP Peer A responds with an empty segment containing an | At line 4, TCP Peer A responds with an empty segment containing an | |||
ACK for TCP Peer B's SYN; and in line 5, TCP Peer A sends some data. | ACK for TCP Peer B's SYN; and in line 5, TCP Peer A sends some data. | |||
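The sequence and acknowledgment numbers in lines 2 through 4 of Figure 6 can be derived mechanically from the two initial sequence numbers. The sketch below illustrates that derivation only. The Segment type and the function name are hypothetical, and the code models no timers, retransmission, or state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MOD = 2 ** 32  # sequence arithmetic wraps at 32 bits


@dataclass(frozen=True)
class Segment:
    seq: int
    ctl: Tuple[str, ...]
    ack: Optional[int] = None


def three_way_handshake(isn_a: int, isn_b: int):
    """Return the three segments of a basic handshake, in order."""
    # A's SYN occupies sequence number isn_a.
    syn = Segment(seq=isn_a, ctl=("SYN",))
    # B's SYN occupies isn_b and acknowledges A's SYN.
    syn_ack = Segment(seq=isn_b, ctl=("SYN", "ACK"), ack=(isn_a + 1) % MOD)
    # A's empty ACK carries A's next unused sequence number.
    ack = Segment(seq=(isn_a + 1) % MOD, ctl=("ACK",), ack=(isn_b + 1) % MOD)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(100, 300):
    print(seg)
```

With initial sequence numbers 100 and 300, the final segment has SEQ=101 and ACK=301, as in line 4 of Figure 6.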
skipping to change at page 28, line 34 ¶ | skipping to change at line 1347 ¶ | |||
As a simple example of recovery from old duplicates, consider | As a simple example of recovery from old duplicates, consider | |||
Figure 8. At line 3, an old duplicate SYN arrives at TCP Peer B. | Figure 8. At line 3, an old duplicate SYN arrives at TCP Peer B. | |||
TCP Peer B cannot tell that this is an old duplicate, so it responds | TCP Peer B cannot tell that this is an old duplicate, so it responds | |||
normally (line 4). TCP Peer A detects that the ACK field is | normally (line 4). TCP Peer A detects that the ACK field is | |||
incorrect and returns a RST (reset) with its SEQ field selected to | incorrect and returns a RST (reset) with its SEQ field selected to | |||
make the segment believable. TCP Peer B, on receiving the RST, | make the segment believable. TCP Peer B, on receiving the RST, | |||
returns to the LISTEN state. When the original SYN finally arrives | returns to the LISTEN state. When the original SYN finally arrives | |||
at line 6, the synchronization proceeds normally. If the SYN at line | at line 6, the synchronization proceeds normally. If the SYN at line | |||
6 had arrived before the RST, a more complex exchange might have | 6 had arrived before the RST, a more complex exchange might have | |||
occurred with RST's sent in both directions. | occurred with RSTs sent in both directions. | |||
3.5.1. Half-Open Connections and Other Anomalies | 3.5.1. Half-Open Connections and Other Anomalies | |||
An established connection is said to be "half-open" if one of the TCP | An established connection is said to be "half-open" if one of the TCP | |||
peers has closed or aborted the connection at its end without the | peers has closed or aborted the connection at its end without the | |||
knowledge of the other, or if the two ends of the connection have | knowledge of the other, or if the two ends of the connection have | |||
become desynchronized owing to a failure or reboot that resulted in | become desynchronized owing to a failure or reboot that resulted in | |||
loss of memory. Such connections will automatically become reset if | loss of memory. Such connections will automatically become reset if | |||
an attempt is made to send data in either direction. However, half- | an attempt is made to send data in either direction. However, half- | |||
open connections are expected to be unusual. | open connections are expected to be unusual. | |||
skipping to change at page 29, line 17 ¶ | skipping to change at line 1377 ¶ | |||
TCP implementation. Depending on the operating system supporting A's | TCP implementation. Depending on the operating system supporting A's | |||
TCP implementation, it is likely that some error recovery mechanism | TCP implementation, it is likely that some error recovery mechanism | |||
exists. When the TCP endpoint is up again, A is likely to start | exists. When the TCP endpoint is up again, A is likely to start | |||
again from the beginning or from a recovery point. As a result, A | again from the beginning or from a recovery point. As a result, A | |||
will probably try to OPEN the connection again or try to SEND on the | will probably try to OPEN the connection again or try to SEND on the | |||
connection it believes open. In the latter case, it receives the | connection it believes open. In the latter case, it receives the | |||
error message "connection not open" from the local (A's) TCP | error message "connection not open" from the local (A's) TCP | |||
implementation. In an attempt to establish the connection, A's TCP | implementation. In an attempt to establish the connection, A's TCP | |||
implementation will send a segment containing SYN. This scenario | implementation will send a segment containing SYN. This scenario | |||
leads to the example shown in Figure 9. After TCP Peer A reboots, | leads to the example shown in Figure 9. After TCP Peer A reboots, | |||
the user attempts to re-open the connection. TCP Peer B, in the | the user attempts to reopen the connection. TCP Peer B, in the | |||
meantime, thinks the connection is open. | meantime, thinks the connection is open. | |||
TCP Peer A TCP Peer B | TCP Peer A TCP Peer B | |||
1. (REBOOT) (send 300,receive 100) | 1. (REBOOT) (send 300,receive 100) | |||
2. CLOSED ESTABLISHED | 2. CLOSED ESTABLISHED | |||
3. SYN-SENT --> <SEQ=400><CTL=SYN> --> (??) | 3. SYN-SENT --> <SEQ=400><CTL=SYN> --> (??) | |||
skipping to change at page 29, line 45 ¶ | skipping to change at line 1405 ¶ | |||
Figure 9: Half-Open Connection Discovery | Figure 9: Half-Open Connection Discovery | |||
When the SYN arrives at line 3, TCP Peer B, being in a synchronized | When the SYN arrives at line 3, TCP Peer B, being in a synchronized | |||
state, and the incoming segment outside the window, responds with an | state, and the incoming segment outside the window, responds with an | |||
acknowledgment indicating what sequence it next expects to hear (ACK | acknowledgment indicating what sequence it next expects to hear (ACK | |||
100). TCP Peer A sees that this segment does not acknowledge | 100). TCP Peer A sees that this segment does not acknowledge | |||
anything it sent and, being unsynchronized, sends a reset (RST) | anything it sent and, being unsynchronized, sends a reset (RST) | |||
because it has detected a half-open connection. TCP Peer B aborts at | because it has detected a half-open connection. TCP Peer B aborts at | |||
line 5. TCP Peer A will continue to try to establish the connection; | line 5. TCP Peer A will continue to try to establish the connection; | |||
the problem is now reduced to the basic 3-way handshake of Figure 6. | the problem is now reduced to the basic three-way handshake of | |||
Figure 6. | ||||
An interesting alternative case occurs when TCP Peer A reboots and | An interesting alternative case occurs when TCP Peer A reboots and | |||
TCP Peer B tries to send data on what it thinks is a synchronized | TCP Peer B tries to send data on what it thinks is a synchronized | |||
connection. This is illustrated in Figure 10. In this case, the | connection. This is illustrated in Figure 10. In this case, the | |||
data arriving at TCP Peer A from TCP Peer B (line 2) is unacceptable | data arriving at TCP Peer A from TCP Peer B (line 2) is unacceptable | |||
because no such connection exists, so TCP Peer A sends a RST. The | because no such connection exists, so TCP Peer A sends a RST. The | |||
RST is acceptable so TCP Peer B processes it and aborts the | RST is acceptable so TCP Peer B processes it and aborts the | |||
connection. | connection. | |||
TCP Peer A TCP Peer B | TCP Peer A TCP Peer B | |||
skipping to change at page 30, line 41 ¶ | skipping to change at line 1444 ¶ | |||
1. LISTEN LISTEN | 1. LISTEN LISTEN | |||
2. ... <SEQ=Z><CTL=SYN> --> SYN-RECEIVED | 2. ... <SEQ=Z><CTL=SYN> --> SYN-RECEIVED | |||
3. (??) <-- <SEQ=X><ACK=Z+1><CTL=SYN,ACK> <-- SYN-RECEIVED | 3. (??) <-- <SEQ=X><ACK=Z+1><CTL=SYN,ACK> <-- SYN-RECEIVED | |||
4. --> <SEQ=Z+1><CTL=RST> --> (return to LISTEN!) | 4. --> <SEQ=Z+1><CTL=RST> --> (return to LISTEN!) | |||
5. LISTEN LISTEN | 5. LISTEN LISTEN | |||
Figure 11: Old Duplicate SYN Initiates a Reset on two Passive Sockets | Figure 11: Old Duplicate SYN Initiates a Reset on Two Passive Sockets | |||
A variety of other cases are possible, all of which are accounted for | A variety of other cases are possible, all of which are accounted for | |||
by the following rules for RST generation and processing. | by the following rules for RST generation and processing. | |||
3.5.2. Reset Generation | 3.5.2. Reset Generation | |||
A TCP user or application can issue a reset on a connection at any | A TCP user or application can issue a reset on a connection at any | |||
time, though reset events are also generated by the protocol itself | time, though reset events are also generated by the protocol itself | |||
when various error conditions occur, as described below. The side of | when various error conditions occur, as described below. The side of | |||
a connection issuing a reset should enter the TIME-WAIT state, as | a connection issuing a reset should enter the TIME-WAIT state, as | |||
this generally helps to reduce the load on busy servers for reasons | this generally helps to reduce the load on busy servers for reasons | |||
described in [71]. | described in [70]. | |||
As a general rule, reset (RST) is sent whenever a segment arrives | As a general rule, reset (RST) is sent whenever a segment arrives | |||
that apparently is not intended for the current connection. A reset | that apparently is not intended for the current connection. A reset | |||
must not be sent if it is not clear that this is the case. | must not be sent if it is not clear that this is the case. | |||
There are three groups of states: | There are three groups of states: | |||
1. If the connection does not exist (CLOSED) then a reset is sent | 1. If the connection does not exist (CLOSED), then a reset is sent | |||
in response to any incoming segment except another reset. A SYN | in response to any incoming segment except another reset. A SYN | |||
segment that does not match an existing connection is rejected by | segment that does not match an existing connection is rejected by | |||
this means. | this means. | |||
If the incoming segment has the ACK bit set, the reset takes its | If the incoming segment has the ACK bit set, the reset takes its | |||
sequence number from the ACK field of the segment, otherwise the | sequence number from the ACK field of the segment; otherwise, the | |||
reset has sequence number zero and the ACK field is set to the sum | reset has sequence number zero and the ACK field is set to the | |||
of the sequence number and segment length of the incoming segment. | sum of the sequence number and segment length of the incoming | |||
The connection remains in the CLOSED state. | segment. The connection remains in the CLOSED state. | |||
2. If the connection is in any non-synchronized state (LISTEN, | 2. If the connection is in any non-synchronized state (LISTEN, SYN- | |||
SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges | SENT, SYN-RECEIVED), and the incoming segment acknowledges | |||
something not yet sent (the segment carries an unacceptable ACK), | something not yet sent (the segment carries an unacceptable ACK), | |||
or if an incoming segment has a security level or compartment | or if an incoming segment has a security level or compartment | |||
Appendix A.1 that does not exactly match the level and compartment | (Appendix A.1) that does not exactly match the level and | |||
requested for the connection, a reset is sent. | compartment requested for the connection, a reset is sent. | |||
If the incoming segment has an ACK field, the reset takes its | If the incoming segment has an ACK field, the reset takes its | |||
sequence number from the ACK field of the segment, otherwise the | sequence number from the ACK field of the segment; otherwise, the | |||
reset has sequence number zero and the ACK field is set to the sum | reset has sequence number zero and the ACK field is set to the | |||
of the sequence number and segment length of the incoming segment. | sum of the sequence number and segment length of the incoming | |||
The connection remains in the same state. | segment. The connection remains in the same state. | |||
3. If the connection is in a synchronized state (ESTABLISHED, | 3. If the connection is in a synchronized state (ESTABLISHED, FIN- | |||
FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), | WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), | |||
any unacceptable segment (out of window sequence number or | any unacceptable segment (out-of-window sequence number or | |||
unacceptable acknowledgment number) must be responded to with an | unacceptable acknowledgment number) must be responded to with an | |||
empty acknowledgment segment (without any user data) containing | empty acknowledgment segment (without any user data) containing | |||
the current send-sequence number and an acknowledgment indicating | the current send sequence number and an acknowledgment indicating | |||
the next sequence number expected to be received, and the | the next sequence number expected to be received, and the | |||
connection remains in the same state. | connection remains in the same state. | |||
If an incoming segment has a security level or compartment that | If an incoming segment has a security level or compartment that | |||
does not exactly match the level and compartment requested for the | does not exactly match the level and compartment requested for | |||
connection, a reset is sent and the connection goes to the CLOSED | the connection, a reset is sent and the connection goes to the | |||
state. The reset takes its sequence number from the ACK field of | CLOSED state. The reset takes its sequence number from the ACK | |||
the incoming segment. | field of the incoming segment. | |||
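For the first two groups of states, the choice of sequence and acknowledgment numbers for the reset depends only on whether the incoming segment carries an ACK. The sketch below is one way to express that rule under stated assumptions. The function name and dictionary layout are hypothetical. The segment length is assumed to count SYN and FIN flags as well as data octets, and the ACK control bit is assumed to be set on the reset whenever its ACK field is filled in.

```python
from typing import Optional

MOD = 2 ** 32  # sequence arithmetic wraps at 32 bits


def reset_fields(seg_seq: int, seg_len: int, seg_ack: Optional[int]) -> dict:
    """Fields of the RST sent in reply to an incoming segment.

    seg_ack is None when the incoming segment has no ACK.
    """
    if seg_ack is not None:
        # The reset takes its sequence number from the incoming ACK field.
        return {"seq": seg_ack, "ack": None, "ctl": ("RST",)}
    # Otherwise: sequence number zero, ACK = incoming SEQ + segment length.
    return {"seq": 0, "ack": (seg_seq + seg_len) % MOD, "ctl": ("RST", "ACK")}


# An incoming ACK of 100, as in the half-open discussion around Figure 9.
print(reset_fields(seg_seq=300, seg_len=0, seg_ack=100))
# A bare SYN (length 1) with SEQ=400 arriving for a nonexistent connection.
print(reset_fields(seg_seq=400, seg_len=1, seg_ack=None))
```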
3.5.3. Reset Processing | 3.5.3. Reset Processing | |||
In all states except SYN-SENT, all reset (RST) segments are validated | In all states except SYN-SENT, all reset (RST) segments are validated | |||
by checking their SEQ-fields. A reset is valid if its sequence | by checking their SEQ fields. A reset is valid if its sequence | |||
number is in the window. In the SYN-SENT state (a RST received in | number is in the window. In the SYN-SENT state (a RST received in | |||
response to an initial SYN), the RST is acceptable if the ACK field | response to an initial SYN), the RST is acceptable if the ACK field | |||
acknowledges the SYN. | acknowledges the SYN. | |||
The receiver of a RST first validates it, then changes state. If the | The receiver of a RST first validates it, then changes state. If the | |||
receiver was in the LISTEN state, it ignores it. If the receiver was | receiver was in the LISTEN state, it ignores it. If the receiver was | |||
in SYN-RECEIVED state and had previously been in the LISTEN state, | in SYN-RECEIVED state and had previously been in the LISTEN state, | |||
then the receiver returns to the LISTEN state, otherwise the receiver | then the receiver returns to the LISTEN state; otherwise, the | |||
aborts the connection and goes to the CLOSED state. If the receiver | receiver aborts the connection and goes to the CLOSED state. If the | |||
was in any other state, it aborts the connection and advises the user | receiver was in any other state, it aborts the connection and advises | |||
and goes to the CLOSED state. | the user and goes to the CLOSED state. | |||
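The state change described above can be written as a small lookup. The sketch below assumes the RST has already passed validation. The function name and the opened_passively flag (true when SYN-RECEIVED was entered from LISTEN) are illustrative, and notifying the user is left out.

```python
def state_after_valid_rst(state: str, opened_passively: bool = False) -> str:
    """Connection state after processing an already-validated RST."""
    if state == "LISTEN":
        # A RST received while listening is ignored.
        return "LISTEN"
    if state == "SYN-RECEIVED" and opened_passively:
        # The connection attempt came in on a passive OPEN.
        return "LISTEN"
    # Every other case aborts the connection.
    return "CLOSED"


print(state_after_valid_rst("SYN-RECEIVED", opened_passively=True))
print(state_after_valid_rst("ESTABLISHED"))
```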
TCP implementations SHOULD allow a received RST segment to include | TCP implementations SHOULD allow a received RST segment to include | |||
data (SHLD-2). It has been suggested that a RST segment could | data (SHLD-2). It has been suggested that a RST segment could | |||
contain diagnostic data that explains the cause of the RST. No | contain diagnostic data that explains the cause of the RST. No | |||
standard has yet been established for such data. | standard has yet been established for such data. | |||
3.6. Closing a Connection | 3.6. Closing a Connection | |||
CLOSE is an operation meaning "I have no more data to send." The | CLOSE is an operation meaning "I have no more data to send." The | |||
notion of closing a full-duplex connection is subject to ambiguous | notion of closing a full-duplex connection is subject to ambiguous | |||
skipping to change at page 33, line 5 ¶ | skipping to change at line 1546 ¶ | |||
closed, so the user can terminate their side gracefully. A TCP | closed, so the user can terminate their side gracefully. A TCP | |||
implementation will reliably deliver all buffers SENT before the | implementation will reliably deliver all buffers SENT before the | |||
connection was CLOSED so a user who expects no data in return need | connection was CLOSED so a user who expects no data in return need | |||
only wait to hear the connection was CLOSED successfully to know that | only wait to hear the connection was CLOSED successfully to know that | |||
all their data was received at the destination TCP endpoint. Users | all their data was received at the destination TCP endpoint. Users | |||
must keep reading connections they close for sending until the TCP | must keep reading connections they close for sending until the TCP | |||
implementation indicates there is no more data. | implementation indicates there is no more data. | |||
There are essentially three cases: | There are essentially three cases: | |||
1) The user initiates by telling the TCP implementation to CLOSE | 1) The user initiates by telling the TCP implementation to CLOSE the | |||
the connection (TCP Peer A in Figure 12). | connection (TCP Peer A in Figure 12). | |||
2) The remote TCP endpoint initiates by sending a FIN control | 2) The remote TCP endpoint initiates by sending a FIN control signal | |||
signal (TCP Peer B in Figure 12). | (TCP Peer B in Figure 12). | |||
3) Both users CLOSE simultaneously (Figure 13). | 3) Both users CLOSE simultaneously (Figure 13). | |||
Case 1: Local user initiates the close | Case 1: Local user initiates the close | |||
In this case, a FIN segment can be constructed and placed on the | In this case, a FIN segment can be constructed and placed on the | |||
outgoing segment queue. No further SENDs from the user will be | outgoing segment queue. No further SENDs from the user will be | |||
accepted by the TCP implementation, and it enters the FIN-WAIT-1 | accepted by the TCP implementation, and it enters the FIN-WAIT-1 | |||
state. RECEIVEs are allowed in this state. All segments | state. RECEIVEs are allowed in this state. All segments | |||
preceding and including FIN will be retransmitted until | preceding and including FIN will be retransmitted until | |||
acknowledged. When the other TCP peer has both acknowledged the | acknowledged. When the other TCP peer has both acknowledged the | |||
FIN and sent a FIN of its own, the first TCP peer can ACK this | FIN and sent a FIN of its own, the first TCP peer can ACK this | |||
FIN. Note that a TCP endpoint receiving a FIN will ACK but not | FIN. Note that a TCP endpoint receiving a FIN will ACK but not | |||
send its own FIN until its user has CLOSED the connection also. | send its own FIN until its user has CLOSED the connection also. | |||
Case 2: TCP endpoint receives a FIN from the network | Case 2: TCP endpoint receives a FIN from the network | |||
If an unsolicited FIN arrives from the network, the receiving TCP | If an unsolicited FIN arrives from the network, the receiving TCP | |||
endpoint can ACK it and tell the user that the connection is | endpoint can ACK it and tell the user that the connection is | |||
closing. The user will respond with a CLOSE, upon which the TCP | closing. The user will respond with a CLOSE, upon which the TCP | |||
endpoint can send a FIN to the other TCP peer after sending any | endpoint can send a FIN to the other TCP peer after sending any | |||
remaining data. The TCP endpoint then waits until its own FIN is | remaining data. The TCP endpoint then waits until its own FIN is | |||
acknowledged whereupon it deletes the connection. If an ACK is | acknowledged whereupon it deletes the connection. If an ACK is | |||
not forthcoming, after the user timeout the connection is aborted | not forthcoming, after the user timeout the connection is aborted | |||
and the user is told. | and the user is told. | |||
Case 3: Both users close simultaneously | Case 3: Both users close simultaneously | |||
A simultaneous CLOSE by users at both ends of a connection causes | A simultaneous CLOSE by users at both ends of a connection causes | |||
FIN segments to be exchanged (Figure 13). When all segments | FIN segments to be exchanged (Figure 13). When all segments | |||
preceding the FINs have been processed and acknowledged, each TCP | preceding the FINs have been processed and acknowledged, each TCP | |||
peer can ACK the FIN it has received. Both will, upon receiving | peer can ACK the FIN it has received. Both will, upon receiving | |||
these ACKs, delete the connection. | these ACKs, delete the connection. | |||
TCP Peer A TCP Peer B | TCP Peer A TCP Peer B | |||
1. ESTABLISHED ESTABLISHED | 1. ESTABLISHED ESTABLISHED | |||
skipping to change at page 35, line 9 ¶ | skipping to change at line 1635 ¶ | |||
which one or more RST segments are sent and the connection state is | which one or more RST segments are sent and the connection state is | |||
immediately discarded. If the local TCP connection is closed by the | immediately discarded. If the local TCP connection is closed by the | |||
remote side due to a FIN or RST received from the remote side, then | remote side due to a FIN or RST received from the remote side, then | |||
the local application MUST be informed whether it closed normally or | the local application MUST be informed whether it closed normally or | |||
was aborted (MUST-12). | was aborted (MUST-12). | |||
3.6.1. Half-Closed Connections | 3.6.1. Half-Closed Connections | |||
The normal TCP close sequence delivers buffered data reliably in both | The normal TCP close sequence delivers buffered data reliably in both | |||
directions. Since the two directions of a TCP connection are closed | directions. Since the two directions of a TCP connection are closed | |||
independently, it is possible for a connection to be "half closed," | independently, it is possible for a connection to be "half closed", | |||
i.e., closed in only one direction, and a host is permitted to | i.e., closed in only one direction, and a host is permitted to | |||
continue sending data in the open direction on a half-closed | continue sending data in the open direction on a half-closed | |||
connection. | connection. | |||
A host MAY implement a "half-duplex" TCP close sequence, so that an | A host MAY implement a "half-duplex" TCP close sequence, so that an | |||
application that has called CLOSE cannot continue to read data from | application that has called CLOSE cannot continue to read data from | |||
the connection (MAY-1). If such a host issues a CLOSE call while | the connection (MAY-1). If such a host issues a CLOSE call while | |||
received data is still pending in the TCP connection, or if new data | received data is still pending in the TCP connection, or if new data | |||
is received after CLOSE is called, its TCP implementation SHOULD send | is received after CLOSE is called, its TCP implementation SHOULD send | |||
a RST to show that data was lost (SHLD-3). See [24] section 2.17 for | a RST to show that data was lost (SHLD-3). See [23], Section 2.17 | |||
discussion. | for discussion. | |||
When a connection is closed actively, it MUST linger in the TIME-WAIT | When a connection is closed actively, it MUST linger in the TIME-WAIT | |||
state for a time 2xMSL (Maximum Segment Lifetime) (MUST-13). | state for a time 2xMSL (Maximum Segment Lifetime) (MUST-13). | |||
However, it MAY accept a new SYN from the remote TCP endpoint to | However, it MAY accept a new SYN from the remote TCP endpoint to | |||
reopen the connection directly from TIME-WAIT state (MAY-2), if it: | reopen the connection directly from TIME-WAIT state (MAY-2), if it: | |||
(1) assigns its initial sequence number for the new connection to | (1) assigns its initial sequence number for the new connection to be | |||
be larger than the largest sequence number it used on the previous | larger than the largest sequence number it used on the previous | |||
connection incarnation, and | connection incarnation, and | |||
(2) returns to TIME-WAIT state if the SYN turns out to be an old | (2) returns to TIME-WAIT state if the SYN turns out to be an old | |||
duplicate. | duplicate. | |||
When the TCP Timestamp options are available, an improved algorithm | When the TCP Timestamp Options are available, an improved algorithm | |||
is described in [41] in order to support higher connection | is described in [40] in order to support higher connection | |||
establishment rates. This algorithm for reducing TIME-WAIT is a Best | establishment rates. This algorithm for reducing TIME-WAIT is a Best | |||
Current Practice that SHOULD be implemented, since timestamp options | Current Practice that SHOULD be implemented since Timestamp Options | |||
are commonly used, and using them to reduce TIME-WAIT provides | are commonly used, and using them to reduce TIME-WAIT provides | |||
benefits for busy Internet servers (SHLD-4). | benefits for busy Internet servers (SHLD-4). | |||
3.7. Segmentation | 3.7. Segmentation | |||
The term "segmentation" refers to the activity TCP performs when | The term "segmentation" refers to the activity TCP performs when | |||
ingesting a stream of bytes from a sending application and | ingesting a stream of bytes from a sending application and | |||
packetizing that stream of bytes into TCP segments. Individual TCP | packetizing that stream of bytes into TCP segments. Individual TCP | |||
segments often do not correspond one-for-one to individual send (or | segments often do not correspond one-for-one to individual send (or | |||
socket write) calls from the application. Applications may perform | socket write) calls from the application. Applications may perform | |||
writes at the granularity of messages in the upper layer protocol, | writes at the granularity of messages in the upper-layer protocol, | |||
but TCP guarantees no boundary coherence between the TCP segments | but TCP guarantees no correlation between the boundaries of TCP | |||
sent and received versus user application data read or write buffer | segments sent and received and the boundaries of the read or write | |||
boundaries. In some specific protocols, such as Remote Direct Memory | buffers of user application data. In some specific protocols, such | |||
Access (RDMA) using Direct Data Placement (DDP) and Marker PDU | as Remote Direct Memory Access (RDMA) using Direct Data Placement | |||
Aligned Framing (MPA) [35], there are performance optimizations | (DDP) and Marker PDU Aligned Framing (MPA) [34], there are | |||
possible when the relation between TCP segments and application data | performance optimizations possible when the relation between TCP | |||
units can be controlled, and MPA includes a specific mechanism for | segments and application data units can be controlled, and MPA | |||
detecting and verifying this relationship between TCP segments and | includes a specific mechanism for detecting and verifying this | |||
application message data structures, but this is specific to | relationship between TCP segments and application message data | |||
applications like RDMA. In general, multiple goals influence the | structures, but this is specific to applications like RDMA. In | |||
sizing of TCP segments created by a TCP implementation. | general, multiple goals influence the sizing of TCP segments created | |||
by a TCP implementation. | ||||
Goals driving the sending of larger segments include: | Goals driving the sending of larger segments include: | |||
* Reducing the number of packets in flight within the network. | * Reducing the number of packets in flight within the network. | |||
* Increasing processing efficiency and potential performance by | * Increasing processing efficiency and potential performance by | |||
enabling a smaller number of interrupts and inter-layer | enabling a smaller number of interrupts and inter-layer | |||
interactions. | interactions. | |||
* Limiting the overhead of TCP headers. | * Limiting the overhead of TCP headers. | |||
skipping to change at page 36, line 32 ¶ | skipping to change at line 1708 ¶ | |||
Note that the performance benefits of sending larger segments may | Note that the performance benefits of sending larger segments may | |||
decrease as the size increases, and there may be boundaries where | decrease as the size increases, and there may be boundaries where | |||
advantages are reversed. For instance, on some implementation | advantages are reversed. For instance, on some implementation | |||
architectures, 1025 bytes within a segment could lead to worse | architectures, 1025 bytes within a segment could lead to worse | |||
performance than 1024 bytes, due purely to data alignment on copy | performance than 1024 bytes, due purely to data alignment on copy | |||
operations. | operations. | |||
Goals driving the sending of smaller segments include: | Goals driving the sending of smaller segments include: | |||
* Avoiding sending a TCP segment that would result in an IP datagram | * Avoiding sending a TCP segment that would result in an IP datagram | |||
larger than the smallest MTU along an IP network path, because | larger than the smallest MTU along an IP network path because this | |||
this results in either packet loss or packet fragmentation. | results in either packet loss or packet fragmentation. Making | |||
Making matters worse, some firewalls or middleboxes may drop | matters worse, some firewalls or middleboxes may drop fragmented | |||
fragmented packets or ICMP messages related to fragmentation. | packets or ICMP messages related to fragmentation. | |||
* Preventing delays to the application data stream, especially when | * Preventing delays to the application data stream, especially when | |||
TCP is waiting on the application to generate more data, or when | TCP is waiting on the application to generate more data, or when | |||
the application is waiting on an event or input from its peer in | the application is waiting on an event or input from its peer in | |||
order to generate more data. | order to generate more data. | |||
* Enabling "fate sharing" between TCP segments and lower-layer data | * Enabling "fate sharing" between TCP segments and lower-layer data | |||
units (e.g. below IP, for links with cell or frame sizes smaller | units (e.g., below IP, for links with cell or frame sizes smaller | |||
than the IP MTU). | than the IP MTU). | |||
Towards meeting these competing sets of goals, TCP includes several | Towards meeting these competing sets of goals, TCP includes several | |||
mechanisms, including the Maximum Segment Size option, Path MTU | mechanisms, including the Maximum Segment Size Option, Path MTU | |||
Discovery, the Nagle algorithm, and support for IPv6 Jumbograms, as | Discovery, the Nagle algorithm, and support for IPv6 Jumbograms, as | |||
discussed in the following subsections. | discussed in the following subsections. | |||
3.7.1. Maximum Segment Size Option | 3.7.1. Maximum Segment Size Option | |||
TCP endpoints MUST implement both sending and receiving the MSS | TCP endpoints MUST implement both sending and receiving the MSS | |||
option (MUST-14). | Option (MUST-14). | |||
TCP implementations SHOULD send an MSS option in every SYN segment | TCP implementations SHOULD send an MSS Option in every SYN segment | |||
when its receive MSS differs from the default 536 for IPv4 or 1220 | when its receive MSS differs from the default 536 for IPv4 or 1220 | |||
for IPv6 (SHLD-5), and MAY send it always (MAY-3). | for IPv6 (SHLD-5), and MAY send it always (MAY-3). | |||
If an MSS option is not received at connection setup, TCP | If an MSS Option is not received at connection setup, TCP | |||
implementations MUST assume a default send MSS of 536 (576 - 40) for | implementations MUST assume a default send MSS of 536 (576 - 40) for | |||
IPv4 or 1220 (1280 - 60) for IPv6 (MUST-15). | IPv4 or 1220 (1280 - 60) for IPv6 (MUST-15). | |||
The maximum size of a segment that TCP endpoint really sends, the | The maximum size of a segment that a TCP endpoint really sends, the | |||
"effective send MSS," MUST be the smaller (MUST-16) of the send MSS | "effective send MSS", MUST be the smaller (MUST-16) of the send MSS | |||
(that reflects the available reassembly buffer size at the remote | (that reflects the available reassembly buffer size at the remote | |||
host, the EMTU_R [20]) and the largest transmission size permitted by | host, the EMTU_R [19]) and the largest transmission size permitted by | |||
the IP layer (EMTU_S [20]): | the IP layer (EMTU_S [19]): | |||
Eff.snd.MSS = | ||||
min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize | Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize | |||
where: | where: | |||
* SendMSS is the MSS value received from the remote host, or the | * SendMSS is the MSS value received from the remote host, or the | |||
default 536 for IPv4 or 1220 for IPv6, if no MSS option is | default 536 for IPv4 or 1220 for IPv6, if no MSS Option is | |||
received. | received. | |||
* MMS_S is the maximum size for a transport-layer message that TCP | * MMS_S is the maximum size for a transport-layer message that TCP | |||
may send. | may send. | |||
* TCPhdrsize is the size of the fixed TCP header and any options. | * TCPhdrsize is the size of the fixed TCP header and any options. | |||
This is 20 in the (rare) case that no options are present, but may | This is 20 in the (rare) case that no options are present but may | |||
be larger if TCP options are to be sent. Note that some options | be larger if TCP Options are to be sent. Note that some options | |||
might not be included on all segments, but that for each segment | might not be included on all segments, but that for each segment | |||
sent, the sender should adjust the data length accordingly, within | sent, the sender should adjust the data length accordingly, within | |||
the Eff.snd.MSS. | the Eff.snd.MSS. | |||
* IPoptionsize is the size of any IPv4 options or IPv6 extension | * IPoptionsize is the size of any IPv4 options or IPv6 extension | |||
headers associated with a TCP connection. Note that some options | headers associated with a TCP connection. Note that some options | |||
or extension headers might not be included on all packets, but | or extension headers might not be included on all packets, but | |||
that for each segment sent, the sender should adjust the data | that for each segment sent, the sender should adjust the data | |||
length accordingly, within the Eff.snd.MSS. | length accordingly, within the Eff.snd.MSS. | |||
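As an illustration only, the Eff.snd.MSS formula above can be sketched in Python. The function name, parameter names, and the example MMS_S value of 1480 are assumptions made for this sketch and are not part of the specification; the defaults 536 and 1220 are those stated in MUST-15.

```python
# Minimal sketch of the Eff.snd.MSS formula. All names are hypothetical.
DEFAULT_SEND_MSS = {4: 536, 6: 1220}  # 576 - 40 for IPv4, 1280 - 60 for IPv6


def effective_send_mss(mms_s, tcp_hdr_size=20, ip_option_size=0,
                       received_mss=None, ip_version=4):
    """Return min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize."""
    # SendMSS is the value from the peer's MSS Option, or the default
    # for the IP version when no MSS Option was received.
    if received_mss is None:
        send_mss = DEFAULT_SEND_MSS[ip_version]
    else:
        send_mss = received_mss
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size


# Peer advertised 1460; assumed MMS_S of 1480; a 32-byte TCP header
# (20 fixed bytes plus 12 bytes of options) and no IP options.
print(effective_send_mss(1480, tcp_hdr_size=32, received_mss=1460))
```

Because TCPhdrsize and IPoptionsize can differ from segment to segment, a real sender would evaluate this per segment rather than once per connection.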
The MSS value to be sent in an MSS option should be equal to the | The MSS value to be sent in an MSS Option should be equal to the | |||
effective MTU minus the fixed IP and TCP headers. By ignoring both | effective MTU minus the fixed IP and TCP headers. By ignoring both | |||
IP and TCP options when calculating the value for the MSS option, if | IP and TCP Options when calculating the value for the MSS Option, if | |||
there are any IP or TCP options to be sent in a packet, then the | there are any IP or TCP Options to be sent in a packet, then the | |||
sender must decrease the size of the TCP data accordingly. RFC 6691 | sender must decrease the size of the TCP data accordingly. RFC 6691 | |||
[44] discusses this in greater detail. | [43] discusses this in greater detail. | |||
The MSS value to be sent in an MSS option must be less than or equal | The MSS value to be sent in an MSS Option must be less than or equal | |||
to: | to: | |||
MMS_R - 20 | MMS_R - 20 | |||
where MMS_R is the maximum size for a transport-layer message that | where MMS_R is the maximum size for a transport-layer message that | |||
can be received (and reassembled at the IP layer) (MUST-67). TCP | can be received (and reassembled at the IP layer) (MUST-67). TCP | |||
obtains MMS_R and MMS_S from the IP layer; see the generic call | obtains MMS_R and MMS_S from the IP layer; see the generic call | |||
GET_MAXSIZES in Section 3.4 of RFC 1122. These are defined in terms | GET_MAXSIZES in Section 3.4 of RFC 1122. These are defined in terms | |||
of their IP MTU equivalents, EMTU_R and EMTU_S [20]. | of their IP MTU equivalents, EMTU_R and EMTU_S [19]. | |||
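The two rules for the value placed in an MSS Option (effective MTU minus the fixed IP and TCP headers, and no more than MMS_R - 20) can be combined as in the following sketch. The function and parameter names are hypothetical; the fixed header sizes of 20 bytes for IPv4, 40 bytes for IPv6, and 20 bytes for TCP are the standard fixed sizes, and the cap at 65,535 reflects the 16-bit MSS field.

```python
# Illustrative sketch only; names are assumptions, not from the specification.
FIXED_IP_HEADER = {4: 20, 6: 40}   # fixed IP header size in bytes
FIXED_TCP_HEADER = 20              # fixed TCP header size in bytes
MSS_FIELD_MAX = 65535              # the MSS Option carries a 16-bit value


def mss_option_value(effective_mtu, mms_r, ip_version=4):
    """Value to advertise in the MSS Option, ignoring IP and TCP options."""
    # Effective MTU minus the fixed IP and TCP headers.
    from_mtu = effective_mtu - FIXED_IP_HEADER[ip_version] - FIXED_TCP_HEADER
    # Upper bound from the IP layer's receive limit (MMS_R - 20).
    bound = mms_r - 20
    return min(from_mtu, bound, MSS_FIELD_MAX)


# A 1500-byte effective MTU with an assumed MMS_R of 1480 (IPv4).
print(mss_option_value(1500, 1480))
```

Options are deliberately left out of this calculation; as the text notes, the sender reduces the data in each packet by the size of any IP or TCP options actually sent.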
When TCP is used in a situation where either the IP or TCP headers | When TCP is used in a situation where either the IP or TCP headers | |||
are not fixed, the sender must reduce the amount of TCP data in any | are not fixed, the sender must reduce the amount of TCP data in any | |||
given packet by the number of octets used by the IP and TCP options. | given packet by the number of octets used by the IP and TCP options. | |||
This has been a point of confusion historically, as explained in RFC | This has been a point of confusion historically, as explained in RFC | |||
6691, Section 3.1. | 6691, Section 3.1. | |||
3.7.2. Path MTU Discovery | 3.7.2. Path MTU Discovery | |||
A TCP implementation may be aware of the MTU on directly connected | A TCP implementation may be aware of the MTU on directly connected | |||
skipping to change at page 38, line 40 ¶ | skipping to change at line 1809 ¶ | |||
effective MTU of less than or equal to 576 for destinations not | effective MTU of less than or equal to 576 for destinations not | |||
directly connected, and for IPv6 this would be 1280. Using these | directly connected, and for IPv6 this would be 1280. Using these | |||
fixed values limits TCP connection performance and efficiency. | fixed values limits TCP connection performance and efficiency. | |||
Instead, implementation of Path MTU Discovery (PMTUD) and | Instead, implementation of Path MTU Discovery (PMTUD) and | |||
Packetization Layer Path MTU Discovery (PLPMTUD) is strongly | Packetization Layer Path MTU Discovery (PLPMTUD) is strongly | |||
recommended in order for TCP to improve segmentation decisions. Both | recommended in order for TCP to improve segmentation decisions. Both | |||
PMTUD and PLPMTUD help TCP choose segment sizes that avoid both on- | PMTUD and PLPMTUD help TCP choose segment sizes that avoid both on- | |||
path (for IPv4) and source fragmentation (IPv4 and IPv6). | path (for IPv4) and source fragmentation (IPv4 and IPv6). | |||
PMTUD for IPv4 [2] or IPv6 [14] is implemented in conjunction between | PMTUD for IPv4 [2] or IPv6 [14] is implemented in conjunction between | |||
TCP, IP, and ICMP protocols. It relies both on avoiding source | TCP, IP, and ICMP. It relies both on avoiding source fragmentation | |||
fragmentation and setting the IPv4 DF (don't fragment) flag, the | and setting the IPv4 DF (don't fragment) flag, the latter to inhibit | |||
latter to inhibit on-path fragmentation. It relies on ICMP errors | on-path fragmentation. It relies on ICMP errors from routers along | |||
from routers along the path, whenever a segment is too large to | the path whenever a segment is too large to traverse a link. Several | |||
traverse a link. Several adjustments to a TCP implementation with | adjustments to a TCP implementation with PMTUD are described in RFC | |||
PMTUD are described in RFC 2923 in order to deal with problems | 2923 in order to deal with problems experienced in practice [27]. | |||
experienced in practice [28]. PLPMTUD [32] is a Standards Track | PLPMTUD [31] is a Standards Track improvement to PMTUD that relaxes | |||
improvement to PMTUD that relaxes the requirement for ICMP support | the requirement for ICMP support across a path, and improves | |||
across a path, and improves performance in cases where ICMP is not | performance in cases where ICMP is not consistently conveyed, but | |||
consistently conveyed, but still tries to avoid source fragmentation. | still tries to avoid source fragmentation. The mechanisms in all | |||
The mechanisms in all four of these RFCs are recommended to be | four of these RFCs are recommended to be included in TCP | |||
included in TCP implementations. | implementations. | |||
The TCP MSS option specifies an upper bound for the size of packets | The TCP MSS Option specifies an upper bound for the size of packets | |||
that can be received (see [44]). Hence, setting the value in the MSS | that can be received (see [43]). Hence, setting the value in the MSS | |||
option too small can impact the ability for PMTUD or PLPMTUD to find | Option too small can impact the ability for PMTUD or PLPMTUD to find | |||
a larger path MTU. RFC 1191 discusses this implication of many older | a larger path MTU. RFC 1191 discusses this implication of many older | |||
TCP implementations setting the TCP MSS to 536 (corresponding to the | TCP implementations setting the TCP MSS to 536 (corresponding to the | |||
IPv4 576 byte default MTU) for non-local destinations, rather than | IPv4 576 byte default MTU) for non-local destinations, rather than | |||
deriving it from the MTUs of connected interfaces as recommended. | deriving it from the MTUs of connected interfaces as recommended. | |||
3.7.3. Interfaces with Variable MTU Values | 3.7.3. Interfaces with Variable MTU Values | |||
The effective MTU can sometimes vary, as when used with variable | The effective MTU can sometimes vary, as when used with variable | |||
compression, e.g., RObust Header Compression (ROHC) [38]. It is | compression, e.g., RObust Header Compression (ROHC) [37]. It is | |||
tempting for a TCP implementation to advertise the largest possible | tempting for a TCP implementation to advertise the largest possible | |||
MSS, to support the most efficient use of compressed payloads. | MSS, to support the most efficient use of compressed payloads. | |||
Unfortunately, some compression schemes occasionally need to transmit | Unfortunately, some compression schemes occasionally need to transmit | |||
full headers (and thus smaller payloads) to resynchronize state at | full headers (and thus smaller payloads) to resynchronize state at | |||
their endpoint compressors/decompressors. If the largest MTU is used | their endpoint compressors/decompressors. If the largest MTU is used | |||
to calculate the value to advertise in the MSS option, TCP | to calculate the value to advertise in the MSS Option, TCP | |||
retransmission may interfere with compressor resynchronization. | retransmission may interfere with compressor resynchronization. | |||
As a result, when the effective MTU of an interface varies packet-to- | As a result, when the effective MTU of an interface varies packet-to- | |||
packet, TCP implementations SHOULD use the smallest effective MTU of | packet, TCP implementations SHOULD use the smallest effective MTU of | |||
the interface to calculate the value to advertise in the MSS option | the interface to calculate the value to advertise in the MSS Option | |||
(SHLD-6). | (SHLD-6). | |||
3.7.4. Nagle Algorithm | 3.7.4. Nagle Algorithm | |||
The "Nagle algorithm" was described in RFC 896 [18] and was | The "Nagle algorithm" was described in RFC 896 [17] and was | |||
recommended in RFC 1122 [20] for mitigation of an early problem of | recommended in RFC 1122 [19] for mitigation of an early problem of | |||
too many small packets being generated. It has been implemented in | too many small packets being generated. It has been implemented in | |||
most current TCP code bases, sometimes with minor variations (see | most current TCP code bases, sometimes with minor variations (see | |||
Appendix A.3). | Appendix A.3). | |||
If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the | If there is unacknowledged data (i.e., SND.NXT > SND.UNA), then the | |||
sending TCP endpoint buffers all user data (regardless of the PSH | sending TCP endpoint buffers all user data (regardless of the PSH | |||
bit), until the outstanding data has been acknowledged or until the | bit) until the outstanding data has been acknowledged or until the | |||
TCP endpoint can send a full-sized segment (Eff.snd.MSS bytes). | TCP endpoint can send a full-sized segment (Eff.snd.MSS bytes). | |||
A TCP implementation SHOULD implement the Nagle Algorithm to coalesce | A TCP implementation SHOULD implement the Nagle algorithm to coalesce | |||
short segments (SHLD-7). However, there MUST be a way for an | short segments (SHLD-7). However, there MUST be a way for an | |||
application to disable the Nagle algorithm on an individual | application to disable the Nagle algorithm on an individual | |||
connection (MUST-17). In all cases, sending data is also subject to | connection (MUST-17). In all cases, sending data is also subject to | |||
the limitation imposed by the Slow Start algorithm [8]. | the limitation imposed by the slow start algorithm [8]. | |||
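The send decision described above can be sketched as a single predicate. This is a simplified illustration, not a complete send path: it ignores the send window and congestion control, and the function name, parameter names, and the nagle_enabled switch (standing in for the per-connection disable required by MUST-17) are assumptions.

```python
# Simplified sketch of the Nagle decision; all names are hypothetical.
def nagle_may_send(snd_nxt, snd_una, buffered_bytes, eff_snd_mss,
                   nagle_enabled=True):
    """Return True if buffered user data may be sent now."""
    if buffered_bytes == 0:
        return False              # nothing to send
    if not nagle_enabled:
        return True               # application disabled Nagle
    if buffered_bytes >= eff_snd_mss:
        return True               # a full-sized segment is available
    # SND.NXT is never behind SND.UNA, so inequality means that
    # unacknowledged data is outstanding (and is safe across wraparound).
    return snd_nxt == snd_una


# 10 bytes buffered while 100 bytes are unacknowledged: keep buffering.
print(nagle_may_send(snd_nxt=1000, snd_una=900, buffered_bytes=10,
                     eff_snd_mss=1448))
```

Note that the PSH bit does not appear as an input, matching the statement that buffering happens regardless of PSH.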
Since there can be problematic interactions between the Nagle | Since there can be problematic interactions between the Nagle | |||
Algorithm and delayed acknowledgements, some implementations use | algorithm and delayed acknowledgments, some implementations use minor | |||
minor variations of the Nagle algorithm, such as the one described in | variations of the Nagle algorithm, such as the one described in | |||
Appendix A.3. | Appendix A.3. | |||
3.7.5. IPv6 Jumbograms | 3.7.5. IPv6 Jumbograms | |||
In order to support TCP over IPv6 Jumbograms, implementations need to | In order to support TCP over IPv6 Jumbograms, implementations need to | |||
be able to send TCP segments larger than the 64KB limit that the MSS | be able to send TCP segments larger than the 64-KB limit that the MSS | |||
option can convey. RFC 2675 [25] defines that an MSS value of 65,535 | Option can convey. RFC 2675 [24] defines that an MSS value of 65,535 | |||
bytes is to be treated as infinity, and Path MTU Discovery [14] is | bytes is to be treated as infinity, and Path MTU Discovery [14] is | |||
used to determine the actual MSS. | used to determine the actual MSS. | |||
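A minimal sketch of the "treated as infinity" rule follows. The function names, the jumbograms_enabled flag, and the pmtu_segment_limit parameter (the largest segment that Path MTU Discovery currently permits) are hypothetical and introduced only for illustration.

```python
# Illustrative sketch; names and parameters are assumptions.
import math

JUMBO_MSS_SENTINEL = 65535  # MSS value treated as infinity for Jumbograms


def peer_mss_bound(received_mss, jumbograms_enabled):
    """Interpret the MSS value received from the peer."""
    if jumbograms_enabled and received_mss == JUMBO_MSS_SENTINEL:
        return math.inf
    return received_mss


def segment_size_limit(received_mss, jumbograms_enabled, pmtu_segment_limit):
    """Combine the peer's bound with the limit found by Path MTU Discovery."""
    return min(peer_mss_bound(received_mss, jumbograms_enabled),
               pmtu_segment_limit)


# With Jumbograms enabled, the path MTU alone determines the limit.
print(segment_size_limit(65535, True, 100000))
```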
The Jumbo Payload option need not be implemented or understood by | The Jumbo Payload Option need not be implemented or understood by | |||
IPv6 nodes that do not support attachment to links with a MTU greater | IPv6 nodes that do not support attachment to links with an MTU | |||
than 65,575 [25], and the present IPv6 Node Requirements does not | greater than 65,575 [24], and the present IPv6 Node Requirements does | |||
include support for Jumbograms [55]. | not include support for Jumbograms [55]. | |||
3.8. Data Communication | 3.8. Data Communication | |||
Once the connection is established data is communicated by the | Once the connection is established, data is communicated by the | |||
exchange of segments. Because segments may be lost due to errors | exchange of segments. Because segments may be lost due to errors | |||
(checksum test failure), or network congestion, TCP uses | (checksum test failure) or network congestion, TCP uses | |||
retransmission to ensure delivery of every segment. Duplicate | retransmission to ensure delivery of every segment. Duplicate | |||
segments may arrive due to network or TCP retransmission. As | segments may arrive due to network or TCP retransmission. As | |||
discussed in the section on sequence numbers, the TCP implementation | discussed in the section on sequence numbers (Section 3.4), the TCP | |||
performs certain tests on the sequence and acknowledgment numbers in | implementation performs certain tests on the sequence and | |||
the segments to verify their acceptability. | acknowledgment numbers in the segments to verify their acceptability. | |||
The sender of data keeps track of the next sequence number to use in | The sender of data keeps track of the next sequence number to use in | |||
the variable SND.NXT. The receiver of data keeps track of the next | the variable SND.NXT. The receiver of data keeps track of the next | |||
sequence number to expect in the variable RCV.NXT. The sender of | sequence number to expect in the variable RCV.NXT. The sender of | |||
data keeps track of the oldest unacknowledged sequence number in the | data keeps track of the oldest unacknowledged sequence number in the | |||
variable SND.UNA. If the data flow is momentarily idle and all data | variable SND.UNA. If the data flow is momentarily idle and all data | |||
sent has been acknowledged then the three variables will be equal. | sent has been acknowledged, then the three variables will be equal. | |||
When the sender creates a segment and transmits it the sender | When the sender creates a segment and transmits it, the sender | |||
advances SND.NXT. When the receiver accepts a segment it advances | advances SND.NXT. When the receiver accepts a segment, it advances | |||
RCV.NXT and sends an acknowledgment. When the data sender receives | RCV.NXT and sends an acknowledgment. When the data sender receives | |||
an acknowledgment it advances SND.UNA. The extent to which the | an acknowledgment, it advances SND.UNA. The extent to which the | |||
values of these variables differ is a measure of the delay in the | values of these variables differ is a measure of the delay in the | |||
communication. The amount by which the variables are advanced is the | communication. The amount by which the variables are advanced is the | |||
length of the data and SYN or FIN flags in the segment. Note that | length of the data and SYN or FIN flags in the segment. Note that, | |||
once in the ESTABLISHED state all segments must carry current | once in the ESTABLISHED state, all segments must carry current | |||
acknowledgment information. | acknowledgment information. | |||
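The bookkeeping for SND.UNA, SND.NXT, and RCV.NXT can be illustrated with the following sketch. It assumes in-order delivery with no loss and omits windows and retransmission; the class and method names are hypothetical. Sequence numbers are 32-bit values, so all arithmetic is modulo 2**32.

```python
# Simplified sketch of sequence variable bookkeeping; names are assumptions.
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def segment_length(data_len, syn=False, fin=False):
    """Sequence space consumed: data octets plus one each for SYN and FIN."""
    return data_len + int(syn) + int(fin)


class Sender:
    def __init__(self, initial_seq):
        self.snd_una = initial_seq  # oldest unacknowledged sequence number
        self.snd_nxt = initial_seq  # next sequence number to use

    def on_send(self, data_len, syn=False, fin=False):
        advance = segment_length(data_len, syn, fin)
        self.snd_nxt = (self.snd_nxt + advance) % SEQ_MOD

    def on_ack(self, ack):
        # Accept only acknowledgments in the range (SND.UNA, SND.NXT].
        acked = (ack - self.snd_una) % SEQ_MOD
        outstanding = (self.snd_nxt - self.snd_una) % SEQ_MOD
        if 0 < acked <= outstanding:
            self.snd_una = ack


class Receiver:
    def __init__(self, next_expected):
        self.rcv_nxt = next_expected  # next sequence number expected

    def on_segment(self, seq, data_len, fin=False):
        # Only the in-order case is modeled; the return value is the
        # acknowledgment number to send back.
        if seq == self.rcv_nxt:
            advance = segment_length(data_len, fin=fin)
            self.rcv_nxt = (self.rcv_nxt + advance) % SEQ_MOD
        return self.rcv_nxt


# One 100-byte segment sent across the wraparound point.
start = SEQ_MOD - 50
sender, receiver = Sender(start), Receiver(start)
sender.on_send(100)
sender.on_ack(receiver.on_segment(start, 100))
print(sender.snd_una, sender.snd_nxt, receiver.rcv_nxt)
```

Once the acknowledgment is processed, all three variables hold the same value, which is the idle condition described above.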
The CLOSE user call implies a push function (see Section 3.9.1), as | The CLOSE user call implies a push function (see Section 3.9.1), as | |||
does the FIN control flag in an incoming segment. | does the FIN control flag in an incoming segment. | |||
3.8.1. Retransmission Timeout | 3.8.1. Retransmission Timeout | |||
Because of the variability of the networks that compose an | Because of the variability of the networks that compose an | |||
internetwork system and the wide range of uses of TCP connections the | internetwork system and the wide range of uses of TCP connections, | |||
retransmission timeout (RTO) must be dynamically determined. | the retransmission timeout (RTO) must be dynamically determined. | |||
The RTO MUST be computed according to the algorithm in [10], | The RTO MUST be computed according to the algorithm in [10], | |||
including Karn's algorithm for taking RTT samples (MUST-18). | including Karn's algorithm for taking RTT samples (MUST-18). | |||
RFC 793 contains an early example procedure for computing the RTO, | RFC 793 contains an early example procedure for computing the RTO, | |||
based on work mentioned in IEN 177 [72]. This was then replaced by | based on work mentioned in IEN 177 [71]. This was then replaced by | |||
the algorithm described in RFC 1122, and subsequently updated in RFC | the algorithm described in RFC 1122, which was subsequently updated | |||
2988, and then again in RFC 6298. | in RFC 2988 and then again in RFC 6298. | |||
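For orientation, the following is a sketch of the RTO computation as commonly summarized from RFC 6298; that document remains the normative description. The class name, method names, and the default clock granularity are assumptions. The constants alpha = 1/8, beta = 1/4, and K = 4, the initial RTO of 1 second, the 1-second lower bound, and the doubling on timeout follow RFC 6298. The 60-second cap is one permitted choice of upper bound.

```python
# Illustrative sketch of an RFC 6298 style RTO estimator (times in seconds).
class RtoEstimator:
    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, clock_granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.g = clock_granularity  # assumed timer granularity G
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None            # smoothed round-trip time
        self.rttvar = None          # round-trip time variation
        self.rto = 1.0              # initial RTO before any measurement

    def on_rtt_sample(self, r, was_retransmitted=False):
        # Karn's algorithm: do not take RTT samples from segments that
        # were retransmitted, since the ACK is ambiguous.
        if was_retransmitted:
            return
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)

    def on_timeout(self):
        # Exponential backoff: double the RTO, up to the upper bound.
        self.rto = min(self.rto * 2, self.max_rto)


est = RtoEstimator()
est.on_rtt_sample(0.5)
print(est.rto)
```

The sketch does not model RTT sampling with the Timestamp Option, which allows samples to be taken in cases that Karn's algorithm would otherwise exclude.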
RFC 1122 allows that if a retransmitted packet is identical to the | RFC 1122 allows that if a retransmitted packet is identical to the | |||
original packet (which implies not only that the data boundaries have | original packet (which implies not only that the data boundaries have | |||
not changed, but also that none of the headers have changed), then | not changed, but also that none of the headers have changed), then | |||
the same IPv4 Identification field MAY be used (see Section 3.2.1.5 | the same IPv4 Identification field MAY be used (see Section 3.2.1.5 | |||
of RFC 1122) (MAY-4). The same IP identification field may be reused | of RFC 1122) (MAY-4). The same IP Identification field may be reused | |||
anyways, since it is only meaningful when a datagram is fragmented | anyways since it is only meaningful when a datagram is fragmented | |||
[45]. TCP implementations should not rely on or typically interact | [44]. TCP implementations should not rely on or typically interact | |||
with this IPv4 header field in any way. It is not a reasonable way | with this IPv4 header field in any way. It is not a reasonable way | |||
to either indicate duplicate sent segments, nor to identify duplicate | to indicate duplicate sent segments nor to identify duplicate | |||
received segments. | received segments. | |||
3.8.2. TCP Congestion Control | 3.8.2. TCP Congestion Control | |||
RFC 2914 [5] explains the importance of congestion control for the | RFC 2914 [5] explains the importance of congestion control for the | |||
Internet. | Internet. | |||
RFC 1122 required implementation of Van Jacobson's congestion control | RFC 1122 required implementation of Van Jacobson's congestion control | |||
algorithms slow start and congestion avoidance together with | algorithms slow start and congestion avoidance together with | |||
exponential back-off for successive RTO values for the same segment. | exponential backoff for successive RTO values for the same segment. | |||
RFC 2581 provided an IETF Standards Track description of slow start and | RFC 2581 provided an IETF Standards Track description of slow start and | |||
congestion avoidance, along with fast retransmit and fast recovery. | congestion avoidance, along with fast retransmit and fast recovery. | |||
RFC 5681 is the current description of these algorithms and is the | RFC 5681 is the current description of these algorithms and is the | |||
current Standards Track specification providing guidelines for TCP | current Standards Track specification providing guidelines for TCP | |||
congestion control. RFC 6298 describes exponential back-off of RTO | congestion control. RFC 6298 describes exponential backoff of RTO | |||
values, including keeping the backed-off value until a subsequent | values, including keeping the backed-off value until a subsequent | |||
segment with new data has been sent and acknowledged without | segment with new data has been sent and acknowledged without | |||
retransmission. | retransmission. | |||
A TCP endpoint MUST implement the basic congestion control algorithms | A TCP endpoint MUST implement the basic congestion control algorithms | |||
slow start, congestion avoidance, and exponential back-off of RTO to | slow start, congestion avoidance, and exponential backoff of RTO to | |||
avoid creating congestion collapse conditions (MUST-19). RFC 5681 | avoid creating congestion collapse conditions (MUST-19). RFC 5681 | |||
and RFC 6298 describe the basic algorithms on the IETF Standards | and RFC 6298 describe the basic algorithms on the IETF Standards | |||
Track that are broadly applicable. Multiple other suitable | Track that are broadly applicable. Multiple other suitable | |||
algorithms exist and have been widely used. Many TCP implementations | algorithms exist and have been widely used. Many TCP implementations | |||
support a set of alternative algorithms that can be configured for | support a set of alternative algorithms that can be configured for | |||
use on the endpoint. An endpoint MAY implement such alternative | use on the endpoint. An endpoint MAY implement such alternative | |||
algorithms provided that the algorithms are conformant with the TCP | algorithms provided that the algorithms are conformant with the TCP | |||
specifications from the IETF Standards Track as described in RFC | specifications from the IETF Standards Track as described in RFC | |||
2914, RFC 5033 [7], and RFC 8961 [15] (MAY-18). | 2914, RFC 5033 [7], and RFC 8961 [15] (MAY-18). | |||
Explicit Congestion Notification (ECN) was defined in RFC 3168 and is | Explicit Congestion Notification (ECN) was defined in RFC 3168 and is | |||
an IETF Standards Track enhancement that has many benefits [52]. | an IETF Standards Track enhancement that has many benefits [51]. | |||
A TCP endpoint SHOULD implement ECN as described in RFC 3168 (SHLD- | A TCP endpoint SHOULD implement ECN as described in RFC 3168 (SHLD- | |||
8). | 8). | |||
3.8.3. TCP Connection Failures | 3.8.3. TCP Connection Failures | |||
Excessive retransmission of the same segment by a TCP endpoint | Excessive retransmission of the same segment by a TCP endpoint | |||
indicates some failure of the remote host or the Internet path. This | indicates some failure of the remote host or the internetwork path. | |||
failure may be of short or long duration. The following procedure | This failure may be of short or long duration. The following | |||
MUST be used to handle excessive retransmissions of data segments | procedure MUST be used to handle excessive retransmissions of data | |||
(MUST-20): | segments (MUST-20): | |||
(a) There are two thresholds R1 and R2 measuring the amount of | (a) There are two thresholds R1 and R2 measuring the amount of | |||
retransmission that has occurred for the same segment. R1 and R2 | retransmission that has occurred for the same segment. R1 and | |||
might be measured in time units or as a count of retransmissions | R2 might be measured in time units or as a count of | |||
(with the current RTO and corresponding backoffs as a conversion | retransmissions (with the current RTO and corresponding backoffs | |||
factor, if needed). | as a conversion factor, if needed). | |||
(b) When the number of transmissions of the same segment reaches | (b) When the number of transmissions of the same segment reaches or | |||
or exceeds threshold R1, pass negative advice (see Section 3.3.1.4 | exceeds threshold R1, pass negative advice (see Section 3.3.1.4 | |||
of [20]) to the IP layer, to trigger dead-gateway diagnosis. | of [19]) to the IP layer, to trigger dead-gateway diagnosis. | |||
(c) When the number of transmissions of the same segment reaches a | (c) When the number of transmissions of the same segment reaches a | |||
threshold R2 greater than R1, close the connection. | threshold R2 greater than R1, close the connection. | |||
(d) An application MUST (MUST-21) be able to set the value for R2 | (d) An application MUST (MUST-21) be able to set the value for R2 | |||
for a particular connection. For example, an interactive | for a particular connection. For example, an interactive | |||
application might set R2 to "infinity," giving the user control | application might set R2 to "infinity", giving the user control | |||
over when to disconnect. | over when to disconnect. | |||
(e) TCP implementations SHOULD inform the application of the | (e) TCP implementations SHOULD inform the application of the | |||
delivery problem (unless such information has been disabled by the | delivery problem (unless such information has been disabled by | |||
application; see Asynchronous Reports section), when R1 is reached | the application; see the "Asynchronous Reports" section | |||
and before R2 (SHLD-9). This will allow a remote login | (Section 3.9.1.8)), when R1 is reached and before R2 (SHLD-9). | |||
application program to inform the user, for example. | This will allow a remote login application program to inform the | |||
user, for example. | ||||
The value of R1 SHOULD correspond to at least 3 retransmissions, at | The value of R1 SHOULD correspond to at least 3 retransmissions, at | |||
the current RTO (SHLD-10). The value of R2 SHOULD correspond to at | the current RTO (SHLD-10). The value of R2 SHOULD correspond to at | |||
least 100 seconds (SHLD-11). | least 100 seconds (SHLD-11). | |||
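The R1/R2 procedure above can be sketched as a small decision function. This is only an illustrative sketch, not part of either document version: the names RetransmitPolicy and check_thresholds and the returned action strings are hypothetical, and the choice to measure R1 as a retransmission count and R2 in seconds is an assumption (the text allows either unit for both). The sketch does not verify that R2 is greater than R1 once both are converted to the same unit.

```python
from dataclasses import dataclass


@dataclass
class RetransmitPolicy:
    # Hypothetical per-connection settings; defaults follow SHLD-10 and SHLD-11.
    r1_retransmissions: int = 3   # R1 as a count of retransmissions
    r2_seconds: float = 100.0     # R2 as elapsed time; float("inf") never closes


def check_thresholds(policy: RetransmitPolicy,
                     retransmissions: int,
                     elapsed_seconds: float) -> str:
    """Return the action to take for one segment that keeps timing out."""
    if elapsed_seconds >= policy.r2_seconds:
        # Step (c): R2 reached, close the connection.
        return "close_connection"
    if retransmissions >= policy.r1_retransmissions:
        # Steps (b) and (e): negative advice to IP, report to the application.
        return "advise_ip_and_notify_app"
    return "keep_retransmitting"


if __name__ == "__main__":
    policy = RetransmitPolicy()
    print(check_thresholds(policy, retransmissions=3, elapsed_seconds=10.0))
```

Setting r2_seconds to infinity models the interactive application in step (d) that leaves the decision to disconnect with the user.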
An attempt to open a TCP connection could fail with excessive | An attempt to open a TCP connection could fail with excessive | |||
retransmissions of the SYN segment or by receipt of a RST segment or | retransmissions of the SYN segment or by receipt of a RST segment or | |||
an ICMP Port Unreachable. SYN retransmissions MUST be handled in the | an ICMP Port Unreachable. SYN retransmissions MUST be handled in the | |||
general way just described for data retransmissions, including | general way just described for data retransmissions, including | |||
notification of the application layer. | notification of the application layer. | |||
skipping to change at page 43, line 23 ¶ | skipping to change at line 2030 ¶ | |||
enough to provide retransmission of the segment for at least 3 | enough to provide retransmission of the segment for at least 3 | |||
minutes (MUST-23). The application can close the connection (i.e., | minutes (MUST-23). The application can close the connection (i.e., | |||
give up on the open attempt) sooner, of course. | give up on the open attempt) sooner, of course. | |||
3.8.4. TCP Keep-Alives | 3.8.4. TCP Keep-Alives | |||
A TCP connection is said to be "idle" if for some long amount of time | A TCP connection is said to be "idle" if for some long amount of time | |||
there have been no incoming segments received and there is no new or | there have been no incoming segments received and there is no new or | |||
unacknowledged data to be sent. | unacknowledged data to be sent. | |||
Implementors MAY include "keep-alives" in their TCP implementations | Implementers MAY include "keep-alives" in their TCP implementations | |||
(MAY-5), although this practice is not universally accepted. Some | (MAY-5), although this practice is not universally accepted. Some | |||
TCP implementations, however, have included a keep-alive mechanism. | TCP implementations, however, have included a keep-alive mechanism. | |||
To confirm that an idle connection is still active, these | To confirm that an idle connection is still active, these | |||
implementations send a probe segment designed to elicit a response | implementations send a probe segment designed to elicit a response | |||
from the TCP peer. Such a segment generally contains SEG.SEQ = | from the TCP peer. Such a segment generally contains SEG.SEQ = | |||
SND.NXT-1 and may or may not contain one garbage octet of data. If | SND.NXT-1 and may or may not contain one garbage octet of data. If | |||
keep-alives are included, the application MUST be able to turn them | keep-alives are included, the application MUST be able to turn them | |||
on or off for each TCP connection (MUST-24), and they MUST default to | on or off for each TCP connection (MUST-24), and they MUST default to | |||
off (MUST-25). | off (MUST-25). | |||
Keep-alive packets MUST only be sent when no sent data is | Keep-alive packets MUST only be sent when no sent data is | |||
outstanding, and no data or acknowledgement packets have been | outstanding, and no data or acknowledgment packets have been received | |||
received for the connection within an interval (MUST-26). This | for the connection within an interval (MUST-26). This interval MUST | |||
interval MUST be configurable (MUST-27) and MUST default to no less | be configurable (MUST-27) and MUST default to no less than two hours | |||
than two hours (MUST-28). | (MUST-28). | |||
It is extremely important to remember that ACK segments that contain | It is extremely important to remember that ACK segments that contain | |||
no data are not reliably transmitted by TCP. Consequently, if a | no data are not reliably transmitted by TCP. Consequently, if a | |||
keep-alive mechanism is implemented it MUST NOT interpret failure to | keep-alive mechanism is implemented it MUST NOT interpret failure to | |||
respond to any specific probe as a dead connection (MUST-29). | respond to any specific probe as a dead connection (MUST-29). | |||
An implementation SHOULD send a keep-alive segment with no data | An implementation SHOULD send a keep-alive segment with no data | |||
(SHLD-12); however, it MAY be configurable to send a keep-alive | (SHLD-12); however, it MAY be configurable to send a keep-alive | |||
segment containing one garbage octet (MAY-6), for compatibility with | segment containing one garbage octet (MAY-6), for compatibility with | |||
erroneous TCP implementations. | erroneous TCP implementations. | |||
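The keep-alive rules in this section can be summarized in a short sketch. It is illustrative only: KeepAliveConfig, should_send_keepalive, and keepalive_probe_seq are hypothetical names, and tracking idleness as "seconds since the last received segment" is an assumption about how an implementation might keep that state.

```python
from dataclasses import dataclass

TWO_HOURS = 2 * 60 * 60  # seconds; the default interval is no less than this (MUST-28)


@dataclass
class KeepAliveConfig:
    enabled: bool = False             # keep-alives default to off (MUST-25)
    idle_seconds: float = TWO_HOURS   # configurable interval (MUST-27)


def should_send_keepalive(cfg: KeepAliveConfig,
                          unacked_bytes: int,
                          seconds_since_last_rx: float) -> bool:
    """Decide whether a keep-alive probe may be sent now (MUST-26)."""
    if not cfg.enabled:
        return False
    if unacked_bytes > 0:
        # Sent data is still outstanding; normal retransmission applies.
        return False
    return seconds_since_last_rx >= cfg.idle_seconds


def keepalive_probe_seq(snd_nxt: int) -> int:
    """Sequence number of the probe: SEG.SEQ = SND.NXT-1, modulo 2**32."""
    return (snd_nxt - 1) % 2**32


if __name__ == "__main__":
    cfg = KeepAliveConfig(enabled=True)
    print(should_send_keepalive(cfg, unacked_bytes=0, seconds_since_last_rx=7200))
```

A missing reply to one probe is deliberately not modeled as a failure here, since MUST-29 forbids treating any single unanswered probe as a dead connection.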
3.8.5. The Communication of Urgent Information | 3.8.5. The Communication of Urgent Information | |||
As a result of implementation differences and middlebox interactions, | As a result of implementation differences and middlebox interactions, | |||
new applications SHOULD NOT employ the TCP urgent mechanism (SHLD- | new applications SHOULD NOT employ the TCP urgent mechanism (SHLD- | |||
13). However, TCP implementations MUST still include support for the | 13). However, TCP implementations MUST still include support for the | |||
urgent mechanism (MUST-30). Information on how some TCP | urgent mechanism (MUST-30). Information on how some TCP | |||
implementations interpret the urgent pointer can be found in RFC 6093 | implementations interpret the urgent pointer can be found in RFC 6093 | |||
[40]. | [39]. | |||
The objective of the TCP urgent mechanism is to allow the sending | The objective of the TCP urgent mechanism is to allow the sending | |||
user to stimulate the receiving user to accept some urgent data and | user to stimulate the receiving user to accept some urgent data and | |||
to permit the receiving TCP endpoint to indicate to the receiving | to permit the receiving TCP endpoint to indicate to the receiving | |||
user when all the currently known urgent data has been received by | user when all the currently known urgent data has been received by | |||
the user. | the user. | |||
This mechanism permits a point in the data stream to be designated as | This mechanism permits a point in the data stream to be designated as | |||
the end of urgent information. Whenever this point is in advance of | the end of urgent information. Whenever this point is in advance of | |||
the receive sequence number (RCV.NXT) at the receiving TCP endpoint, | the receive sequence number (RCV.NXT) at the receiving TCP endpoint, | |||
that TCP must tell the user to go into "urgent mode"; when the | then the TCP implementation must tell the user to go into "urgent | |||
receive sequence number catches up to the urgent pointer, the TCP | mode"; when the receive sequence number catches up to the urgent | |||
implementation must tell the user to go into "normal mode". If the | pointer, the TCP implementation must tell the user to go into "normal | |||
urgent pointer is updated while the user is in "urgent mode", the | mode". If the urgent pointer is updated while the user is in "urgent | |||
update will be invisible to the user. | mode", the update will be invisible to the user. | |||
The method employs an urgent field that is carried in all segments | The method employs an urgent field that is carried in all segments | |||
transmitted. The URG control flag indicates that the urgent field is | transmitted. The URG control flag indicates that the urgent field is | |||
meaningful and must be added to the segment sequence number to yield | meaningful and must be added to the segment sequence number to yield | |||
the urgent pointer. The absence of this flag indicates that there is | the urgent pointer. The absence of this flag indicates that there is | |||
no urgent data outstanding. | no urgent data outstanding. | |||
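As a worked illustration of the two paragraphs above, the following sketch derives the urgent pointer from a segment and tests whether the receiver is in "urgent mode". The function names are hypothetical, and the comparison assumes the usual modulo-2**32 sequence space in which "in advance of" means less than 2**31 ahead.

```python
from typing import Optional

SEQ_MOD = 2**32


def urgent_pointer(seg_seq: int, urg_flag: bool, urgent_field: int) -> Optional[int]:
    """Urgent pointer = segment sequence number + urgent field, if URG is set."""
    if not urg_flag:
        return None  # no urgent data outstanding
    return (seg_seq + urgent_field) % SEQ_MOD


def in_urgent_mode(rcv_nxt: int, urg_ptr: Optional[int]) -> bool:
    """True while the urgent pointer is in advance of RCV.NXT."""
    if urg_ptr is None:
        return False
    ahead = (urg_ptr - rcv_nxt) % SEQ_MOD
    return 0 < ahead < 2**31


if __name__ == "__main__":
    ptr = urgent_pointer(seg_seq=1000, urg_flag=True, urgent_field=5)
    print(ptr, in_urgent_mode(rcv_nxt=1000, urg_ptr=ptr))
```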
To send an urgent indication the user must also send at least one | To send an urgent indication, the user must also send at least one | |||
data octet. If the sending user also indicates a push, timely | data octet. If the sending user also indicates a push, timely | |||
delivery of the urgent information to the destination process is | delivery of the urgent information to the destination process is | |||
enhanced. Note that because changes in the urgent pointer correspond | enhanced. Note that because changes in the urgent pointer correspond | |||
to data being written by a sending application, the urgent pointer | to data being written by a sending application, the urgent pointer | |||
can not "recede" in the sequence space, but a TCP receiver should be | cannot "recede" in the sequence space, but a TCP receiver should be | |||
robust to invalid urgent pointer values. | robust to invalid urgent pointer values. | |||
A TCP implementation MUST support a sequence of urgent data of any | A TCP implementation MUST support a sequence of urgent data of any | |||
length (MUST-31). [20] | length (MUST-31) [19]. | |||
The urgent pointer MUST point to the sequence number of the octet | The urgent pointer MUST point to the sequence number of the octet | |||
following the urgent data (MUST-62). | following the urgent data (MUST-62). | |||
A TCP implementation MUST (MUST-32) inform the application layer | A TCP implementation MUST (MUST-32) inform the application layer | |||
asynchronously whenever it receives an Urgent pointer and there was | asynchronously whenever it receives an urgent pointer and there was | |||
previously no pending urgent data, or whenever the Urgent pointer | previously no pending urgent data, or whenever the urgent pointer | |||
advances in the data stream. The TCP implementation MUST (MUST-33) | advances in the data stream. The TCP implementation MUST (MUST-33) | |||
provide a way for the application to learn how much urgent data | provide a way for the application to learn how much urgent data | |||
remains to be read from the connection, or at least to determine | remains to be read from the connection, or at least to determine | |||
whether more urgent data remains to be read [20]. | whether more urgent data remains to be read [19]. | |||
3.8.6. Managing the Window | 3.8.6. Managing the Window | |||
The window sent in each segment indicates the range of sequence | The window sent in each segment indicates the range of sequence | |||
numbers the sender of the window (the data receiver) is currently | numbers the sender of the window (the data receiver) is currently | |||
prepared to accept. There is an assumption that this is related to | prepared to accept. There is an assumption that this is related to | |||
the currently available data buffer space available for this | the data buffer space currently available for this connection. | |||
connection. | ||||
The sending TCP endpoint packages the data to be transmitted into | The sending TCP endpoint packages the data to be transmitted into | |||
segments that fit the current window, and may repackage segments on | segments that fit the current window, and may repackage segments on | |||
the retransmission queue. Such repackaging is not required, but may | the retransmission queue. Such repackaging is not required but may | |||
be helpful. | be helpful. | |||
In a connection with a one-way data flow, the window information will | In a connection with a one-way data flow, the window information will | |||
be carried in acknowledgment segments that all have the same sequence | be carried in acknowledgment segments that all have the same sequence | |||
number, so there will be no way to reorder them if they arrive out of | number, so there will be no way to reorder them if they arrive out of | |||
order. This is not a serious problem, but it will allow the window | order. This is not a serious problem, but it will allow the window | |||
information to be on occasion temporarily based on old reports from | information to be on occasion temporarily based on old reports from | |||
the data receiver. A refinement to avoid this problem is to act on | the data receiver. A refinement to avoid this problem is to act on | |||
the window information from segments that carry the highest | the window information from segments that carry the highest | |||
acknowledgment number (that is segments with acknowledgment number | acknowledgment number (that is, segments with an acknowledgment | |||
equal or greater than the highest previously received). | number equal to or greater than the highest previously received). | |||
Indicating a large window encourages transmissions. If more data | Indicating a large window encourages transmissions. If more data | |||
arrives than can be accepted, it will be discarded. This will result | arrives than can be accepted, it will be discarded. This will result | |||
in excessive retransmissions, adding unnecessarily to the load on the | in excessive retransmissions, adding unnecessarily to the load on the | |||
network and the TCP endpoints. Indicating a small window may | network and the TCP endpoints. Indicating a small window may | |||
restrict the transmission of data to the point of introducing a round | restrict the transmission of data to the point of introducing a | |||
trip delay between each new segment transmitted. | round-trip delay between each new segment transmitted. | |||
The mechanisms provided allow a TCP endpoint to advertise a large | The mechanisms provided allow a TCP endpoint to advertise a large | |||
window and to subsequently advertise a much smaller window without | window and to subsequently advertise a much smaller window without | |||
having accepted that much data. This, so-called "shrinking the | having accepted that much data. This so-called "shrinking the | |||
window," is strongly discouraged. The robustness principle [20] | window" is strongly discouraged. The robustness principle [19] | |||
dictates that TCP peers will not shrink the window themselves, but | dictates that TCP peers will not shrink the window themselves, but | |||
will be prepared for such behavior on the part of other TCP peers. | will be prepared for such behavior on the part of other TCP peers. | |||
A TCP receiver SHOULD NOT shrink the window, i.e., move the right | A TCP receiver SHOULD NOT shrink the window, i.e., move the right | |||
window edge to the left (SHLD-14). However, a sending TCP peer MUST | window edge to the left (SHLD-14). However, a sending TCP peer MUST | |||
be robust against window shrinking, which may cause the "usable | be robust against window shrinking, which may cause the "usable | |||
window" (see Section 3.8.6.2.1) to become negative (MUST-34). | window" (see Section 3.8.6.2.1) to become negative (MUST-34). | |||
If this happens, the sender SHOULD NOT send new data (SHLD-15), but | If this happens, the sender SHOULD NOT send new data (SHLD-15), but | |||
SHOULD retransmit normally the old unacknowledged data between | SHOULD retransmit normally the old unacknowledged data between | |||
SND.UNA and SND.UNA+SND.WND (SHLD-16). The sender MAY also | SND.UNA and SND.UNA+SND.WND (SHLD-16). The sender MAY also | |||
retransmit old data beyond SND.UNA+SND.WND (MAY-7), but SHOULD NOT | retransmit old data beyond SND.UNA+SND.WND (MAY-7), but SHOULD NOT | |||
time out the connection if data beyond the right window edge is not | time out the connection if data beyond the right window edge is not | |||
acknowledged (SHLD-17). If the window shrinks to zero, the TCP | acknowledged (SHLD-17). If the window shrinks to zero, the TCP | |||
implementation MUST probe it in the standard way (described below) | implementation MUST probe it in the standard way (described below) | |||
(MUST-35). | (MUST-35). | |||
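The effect of window shrinking on the sender can be illustrated with the "usable window" expression U = SND.UNA + SND.WND - SND.NXT from Section 3.8.6.2.1. The sketch below is an assumption-laden illustration, not an implementation: the function names are hypothetical, and the in-flight amount is computed modulo 2**32 so that sequence number wraparound does not distort the result.

```python
SEQ_MOD = 2**32


def usable_window(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """U = SND.UNA + SND.WND - SND.NXT; may be negative after a shrink."""
    in_flight = (snd_nxt - snd_una) % SEQ_MOD  # sent but not yet acknowledged
    return snd_wnd - in_flight


def may_send_new_data(snd_una: int, snd_wnd: int, snd_nxt: int) -> bool:
    """SHLD-15: no new data while the usable window is zero or negative."""
    return usable_window(snd_una, snd_wnd, snd_nxt) > 0


if __name__ == "__main__":
    # 2000 octets in flight, then the peer shrinks its window from 4000 to 1000.
    print(usable_window(1000, 4000, 3000))
    print(usable_window(1000, 1000, 3000))
```

With a negative result the sender keeps retransmitting the data inside the offered window and simply withholds new data, rather than treating the condition as an error.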
3.8.6.1. Zero Window Probing | 3.8.6.1. Zero-Window Probing | |||
The sending TCP peer must regularly transmit at least one octet of | The sending TCP peer must regularly transmit at least one octet of | |||
new data (if available) or retransmit to the receiving TCP peer even | new data (if available), or retransmit to the receiving TCP peer even | |||
if the send window is zero, in order to "probe" the window. This | if the send window is zero, in order to "probe" the window. This | |||
retransmission is essential to guarantee that when either TCP peer | retransmission is essential to guarantee that when either TCP peer | |||
has a zero window the re-opening of the window will be reliably | has a zero window the reopening of the window will be reliably | |||
reported to the other. This is referred to as Zero-Window Probing | reported to the other. This is referred to as Zero-Window Probing | |||
(ZWP) in other documents. | (ZWP) in other documents. | |||
Probing of zero (offered) windows MUST be supported (MUST-36). | Probing of zero (offered) windows MUST be supported (MUST-36). | |||
A TCP implementation MAY keep its offered receive window closed | A TCP implementation MAY keep its offered receive window closed | |||
indefinitely (MAY-8). As long as the receiving TCP peer continues to | indefinitely (MAY-8). As long as the receiving TCP peer continues to | |||
send acknowledgments in response to the probe segments, the sending | send acknowledgments in response to the probe segments, the sending | |||
TCP peer MUST allow the connection to stay open (MUST-37). This | TCP peer MUST allow the connection to stay open (MUST-37). This | |||
enables TCP to function in scenarios such as the "printer ran out of | enables TCP to function in scenarios such as the "printer ran out of | |||
paper" situation described in Section 4.2.2.17 of [20]. The behavior | paper" situation described in Section 4.2.2.17 of [19]. The behavior | |||
is subject to the implementation's resource management concerns, as | is subject to the implementation's resource management concerns, as | |||
noted in [42]. | noted in [41]. | |||
When the receiving TCP peer has a zero window and a segment arrives | When the receiving TCP peer has a zero window and a segment arrives, | |||
it must still send an acknowledgment showing its next expected | it must still send an acknowledgment showing its next expected | |||
sequence number and current window (zero). | sequence number and current window (zero). | |||
The transmitting host SHOULD send the first zero-window probe when a | The transmitting host SHOULD send the first zero-window probe when a | |||
zero window has existed for the retransmission timeout period (SHLD- | zero window has existed for the retransmission timeout period (SHLD- | |||
29) (Section 3.8.1), and SHOULD increase exponentially the interval | 29) (Section 3.8.1), and SHOULD increase exponentially the interval | |||
between successive probes (SHLD-30). | between successive probes (SHLD-30). | |||
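A possible probe schedule following SHLD-29 and SHLD-30 is sketched below. The doubling factor and the 60-second cap are assumptions chosen for illustration; the text only requires that the first probe wait for the retransmission timeout and that the interval grow exponentially. The function name probe_intervals is hypothetical.

```python
from typing import List


def probe_intervals(rto: float, count: int, max_interval: float = 60.0) -> List[float]:
    """Return the waits (in seconds) before each of the first `count` probes."""
    intervals = []
    interval = rto  # the first probe is sent after one RTO of zero window
    for _ in range(count):
        intervals.append(interval)
        # Exponential increase between successive probes, capped (assumed cap).
        interval = min(interval * 2, max_interval)
    return intervals


if __name__ == "__main__":
    print(probe_intervals(rto=1.0, count=5))
```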
3.8.6.2. Silly Window Syndrome Avoidance | 3.8.6.2. Silly Window Syndrome Avoidance | |||
skipping to change at page 47, line 9 ¶ | skipping to change at line 2203 ¶ | |||
performance. Algorithms to avoid SWS are described below for both | performance. Algorithms to avoid SWS are described below for both | |||
the sending side and the receiving side. RFC 1122 contains more | the sending side and the receiving side. RFC 1122 contains more | |||
detailed discussion of the SWS problem. Note that the Nagle | detailed discussion of the SWS problem. Note that the Nagle | |||
algorithm and the sender SWS avoidance algorithm play complementary | algorithm and the sender SWS avoidance algorithm play complementary | |||
roles in improving performance. The Nagle algorithm discourages | roles in improving performance. The Nagle algorithm discourages | |||
sending tiny segments when the data to be sent increases in small | sending tiny segments when the data to be sent increases in small | |||
increments, while the SWS avoidance algorithm discourages small | increments, while the SWS avoidance algorithm discourages small | |||
segments resulting from the right window edge advancing in small | segments resulting from the right window edge advancing in small | |||
increments. | increments. | |||
3.8.6.2.1. Sender's Algorithm - When to Send Data | 3.8.6.2.1. Sender's Algorithm -- When to Send Data | |||
A TCP implementation MUST include a SWS avoidance algorithm in the | A TCP implementation MUST include a SWS avoidance algorithm in the | |||
sender (MUST-38). | sender (MUST-38). | |||
The Nagle algorithm from Section 3.7.4 additionally describes how to | The Nagle algorithm from Section 3.7.4 additionally describes how to | |||
coalesce short segments. | coalesce short segments. | |||
The sender's SWS avoidance algorithm is more difficult than the | The sender's SWS avoidance algorithm is more difficult than the | |||
receiver's, because the sender does not know (directly) the | receiver's because the sender does not know (directly) the receiver's | |||
receiver's total buffer space RCV.BUFF. An approach that has been | total buffer space (RCV.BUFF). An approach that has been found to | |||
found to work well is for the sender to calculate Max(SND.WND), the | work well is for the sender to calculate Max(SND.WND), which is the | |||
maximum send window it has seen so far on the connection, and to use | maximum send window it has seen so far on the connection, and to use | |||
this value as an estimate of RCV.BUFF. Unfortunately, this can only | this value as an estimate of RCV.BUFF. Unfortunately, this can only | |||
be an estimate; the receiver may at any time reduce the size of | be an estimate; the receiver may at any time reduce the size of | |||
RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a | RCV.BUFF. To avoid a resulting deadlock, it is necessary to have a | |||
timeout to force transmission of data, overriding the SWS avoidance | timeout to force transmission of data, overriding the SWS avoidance | |||
algorithm. In practice, this timeout should seldom occur. | algorithm. In practice, this timeout should seldom occur. | |||
The "usable window" is: | The "usable window" is: | |||
U = SND.UNA + SND.WND - SND.NXT | U = SND.UNA + SND.WND - SND.NXT | |||
skipping to change at page 47, line 46 ¶ | skipping to change at line 2240 ¶ | |||
Send data: | Send data: | |||
(1) if a maximum-sized segment can be sent, i.e., if: | (1) if a maximum-sized segment can be sent, i.e., if: | |||
min(D,U) >= Eff.snd.MSS; | min(D,U) >= Eff.snd.MSS; | |||
(2) or if the data is pushed and all queued data can be sent now, | (2) or if the data is pushed and all queued data can be sent now, | |||
i.e., if: | i.e., if: | |||
[SND.NXT = SND.UNA and] PUSHED and D <= U | [SND.NXT = SND.UNA and] PUSHed and D <= U | |||
(the bracketed condition is imposed by the Nagle algorithm); | (the bracketed condition is imposed by the Nagle algorithm); | |||
(3) or if at least a fraction Fs of the maximum window can be sent, | (3) or if at least a fraction Fs of the maximum window can be sent, | |||
i.e., if: | i.e., if: | |||
[SND.NXT = SND.UNA and] | [SND.NXT = SND.UNA and] | |||
min(D,U) >= Fs * Max(SND.WND); | min(D,U) >= Fs * Max(SND.WND); | |||
(4) or if the override timeout occurs. | (4) or if the override timeout occurs. | |||
Here Fs is a fraction whose recommended value is 1/2. The override | Here Fs is a fraction whose recommended value is 1/2. The override | |||
timeout should be in the range 0.1 - 1.0 seconds. It may be | timeout should be in the range 0.1 - 1.0 seconds. It may be | |||
convenient to combine this timer with the timer used to probe zero | convenient to combine this timer with the timer used to probe zero | |||
windows (Section 3.8.6.1). | windows (Section 3.8.6.1). | |||
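The four send conditions above can be collected into one predicate. This is a sketch under stated assumptions rather than a reference implementation: sws_should_send and its parameter names are hypothetical, D is assumed to be the amount of data queued but not yet sent, U is the usable window, and the bracketed Nagle condition (SND.NXT = SND.UNA) is passed in as the flag nothing_outstanding. The early return for an empty queue is an addition for safety, not a rule from the text.

```python
def sws_should_send(d: int, u: int, eff_snd_mss: int, max_snd_wnd: int,
                    pushed: bool, nothing_outstanding: bool,
                    override_timeout: bool, fs: float = 0.5) -> bool:
    """Sender-side SWS avoidance decision with the Nagle conditions applied."""
    if d <= 0:
        return False  # nothing queued (guard added in this sketch)
    sendable = min(d, u)
    # (1) a maximum-sized segment can be sent
    if sendable >= eff_snd_mss:
        return True
    # (2) data is pushed and all queued data can be sent now
    if nothing_outstanding and pushed and d <= u:
        return True
    # (3) at least a fraction Fs of the maximum window can be sent
    if nothing_outstanding and sendable >= fs * max_snd_wnd:
        return True
    # (4) the override timeout has occurred
    return override_timeout


if __name__ == "__main__":
    print(sws_should_send(d=100, u=2000, eff_snd_mss=1460, max_snd_wnd=8000,
                          pushed=True, nothing_outstanding=True,
                          override_timeout=False))
```

Passing nothing_outstanding=True unconditionally would correspond to disabling the Nagle algorithm, leaving only the SWS avoidance conditions.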
3.8.6.2.2. Receiver's Algorithm - When to Send a Window Update | 3.8.6.2.2. Receiver's Algorithm -- When to Send a Window Update | |||
A TCP implementation MUST include a SWS avoidance algorithm in the | A TCP implementation MUST include a SWS avoidance algorithm in the | |||
receiver (MUST-39). | receiver (MUST-39). | |||
The receiver's SWS avoidance algorithm determines when the right | The receiver's SWS avoidance algorithm determines when the right | |||
window edge may be advanced; this is customarily known as "updating | window edge may be advanced; this is customarily known as "updating | |||
the window". This algorithm combines with the delayed ACK algorithm | the window". This algorithm combines with the delayed ACK algorithm | |||
(Section 3.8.6.3) to determine when an ACK segment containing the | (Section 3.8.6.3) to determine when an ACK segment containing the | |||
current window will really be sent to the receiver. | current window will really be sent to the receiver. | |||
skipping to change at page 49, line 23 ¶ | skipping to change at line 2314 ¶ | |||
Eff.snd.MSS is the effective send MSS for the connection (see | Eff.snd.MSS is the effective send MSS for the connection (see | |||
Section 3.7.1). When the inequality is satisfied, RCV.WND is set to | Section 3.7.1). When the inequality is satisfied, RCV.WND is set to | |||
RCV.BUFF-RCV.USER. | RCV.BUFF-RCV.USER. | |||
Note that the general effect of this algorithm is to advance RCV.WND | Note that the general effect of this algorithm is to advance RCV.WND | |||
in increments of Eff.snd.MSS (for realistic receive buffers: | in increments of Eff.snd.MSS (for realistic receive buffers: | |||
Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its | Eff.snd.MSS < RCV.BUFF/2). Note also that the receiver must use its | |||
own Eff.snd.MSS, making the assumption that it is the same as the | own Eff.snd.MSS, making the assumption that it is the same as the | |||
sender's. | sender's. | |||
3.8.6.3. Delayed Acknowledgements - When to Send an ACK Segment | 3.8.6.3. Delayed Acknowledgments -- When to Send an ACK Segment | |||
A host that is receiving a stream of TCP data segments can increase | A host that is receiving a stream of TCP data segments can increase | |||
efficiency in both the Internet and the hosts by sending fewer than | efficiency in both the network and the hosts by sending fewer than | |||
one ACK (acknowledgment) segment per data segment received; this is | one ACK (acknowledgment) segment per data segment received; this is | |||
known as a "delayed ACK". | known as a "delayed ACK". | |||
A TCP endpoint SHOULD implement a delayed ACK (SHLD-18), but an ACK | A TCP endpoint SHOULD implement a delayed ACK (SHLD-18), but an ACK | |||
should not be excessively delayed; in particular, the delay MUST be | should not be excessively delayed; in particular, the delay MUST be | |||
less than 0.5 seconds (MUST-40). An ACK SHOULD be generated for at | less than 0.5 seconds (MUST-40). An ACK SHOULD be generated for at | |||
least every second full-sized segment or 2*RMSS bytes of new data | least every second full-sized segment or 2*RMSS bytes of new data | |||
(where RMSS is the MSS specified by the TCP endpoint receiving the | (where RMSS is the MSS specified by the TCP endpoint receiving the | |||
segments to be acknowledged, or the default value if not specified) | segments to be acknowledged, or the default value if not specified) | |||
(SHLD-19). Excessive delays on ACKs can disturb the round-trip | (SHLD-19). Excessive delays on ACKs can disturb the round-trip | |||
timing and packet "clocking" algorithms. More complete discussion of | timing and packet "clocking" algorithms. More complete discussion of | |||
delayed ACK behavior is in Section 4.2 of RFC 5681 [8], including | delayed ACK behavior is in Section 4.2 of RFC 5681 [8], including | |||
recommendations to immediately acknowledge out-of-order segments, | recommendations to immediately acknowledge out-of-order segments, | |||
segments above a gap in sequence space, or segments that fill all or | segments above a gap in sequence space, or segments that fill all or | |||
part of a gap, in order to accelerate loss recovery. | part of a gap, in order to accelerate loss recovery. | |||
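The delayed ACK rules above can be sketched as a small decision helper. This is an illustrative sketch under stated assumptions, not an implementation from the document: the class name, method names, and the 0.2-second default delay are hypothetical; only the 0.5-second upper bound (MUST-40), the 2*RMSS threshold (SHLD-19), and the immediate ACK for out-of-order or gap-filling segments come from the text.

```python
class DelayedAck:
    """Decide when a receiver should emit an ACK (hypothetical helper)."""

    def __init__(self, rmss, max_delay=0.2):
        # MUST-40: the delay must be less than 0.5 seconds.
        # The 0.2 s default is an assumption chosen for illustration.
        if not 0 < max_delay < 0.5:
            raise ValueError("ACK delay must be below 0.5 seconds")
        self.rmss = rmss
        self.max_delay = max_delay
        self.unacked = 0        # bytes of new data not yet acknowledged
        self.deadline = None    # time by which a pending ACK must be sent

    def _reset(self):
        self.unacked = 0
        self.deadline = None

    def on_data(self, now, nbytes, needs_immediate_ack=False):
        """Return True if an ACK should be sent right away.

        needs_immediate_ack is set by the caller for out-of-order segments
        or segments that fill all or part of a gap.
        """
        self.unacked += nbytes
        if needs_immediate_ack or self.unacked >= 2 * self.rmss:
            self._reset()
            return True
        if self.deadline is None:
            self.deadline = now + self.max_delay
        return False

    def on_timer(self, now):
        """Return True if the delayed ACK timer has expired."""
        if self.deadline is not None and now >= self.deadline:
            self._reset()
            return True
        return False
```

A caller would invoke on_data() for each arriving data segment and on_timer() from its timer tick, sending an ACK whenever either returns True.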
Note that there are several current practices that further lead to a | Note that there are several current practices that further lead to a | |||
reduced number of ACKs, including generic receive offload (GRO) [73], | reduced number of ACKs, including generic receive offload (GRO) [72], | |||
ACK compression, and ACK decimation [29]. | ACK compression, and ACK decimation [28]. | |||
3.9. Interfaces | 3.9. Interfaces | |||
There are of course two interfaces of concern: the user/TCP interface | There are of course two interfaces of concern: the user/TCP interface | |||
and the TCP/lower level interface. We have a fairly elaborate model | and the TCP/lower-level interface. We have a fairly elaborate model | |||
of the user/TCP interface, but the interface to the lower level | of the user/TCP interface, but the interface to the lower-level | |||
protocol module is left unspecified here, since it will be specified | protocol module is left unspecified here since it will be specified | |||
in detail by the specification of the lower level protocol. For the | in detail by the specification of the lower-level protocol. For the | |||
case that the lower level is IP we note some of the parameter values | case that the lower level is IP, we note some of the parameter values | |||
that TCP implementations might use. | that TCP implementations might use. | |||
3.9.1. User/TCP Interface | 3.9.1. User/TCP Interface | |||
The following functional description of user commands to the TCP | The following functional description of user commands to the TCP | |||
implementation is, at best, fictional, since every operating system | implementation is, at best, fictional, since every operating system | |||
will have different facilities. Consequently, we must warn readers | will have different facilities. Consequently, we must warn readers | |||
that different TCP implementations may have different user | that different TCP implementations may have different user | |||
interfaces. However, all TCP implementations must provide a certain | interfaces. However, all TCP implementations must provide a certain | |||
minimum set of services to guarantee that all TCP implementations can | minimum set of services to guarantee that all TCP implementations can | |||
support the same protocol hierarchy. This section specifies the | support the same protocol hierarchy. This section specifies the | |||
functional interfaces required of all TCP implementations. | functional interfaces required of all TCP implementations. | |||
Section 3.1 of [54] also identifies primitives provided by TCP, and | Section 3.1 of [53] also identifies primitives provided by TCP and | |||
could be used as an additional reference for implementers. | could be used as an additional reference for implementers. | |||
The following sections functionally characterize a USER/TCP | The following sections functionally characterize a user/TCP | |||
interface. The notation used is similar to most procedure or | interface. The notation used is similar to most procedure or | |||
function calls in high level languages, but this usage is not meant | function calls in high-level languages, but this usage is not meant | |||
to rule out trap type service calls. | to rule out trap-type service calls. | |||
The user commands described below specify the basic functions the TCP | The user commands described below specify the basic functions the TCP | |||
implementation must perform to support interprocess communication. | implementation must perform to support interprocess communication. | |||
Individual implementations must define their own exact format, and | Individual implementations must define their own exact format and may | |||
may provide combinations or subsets of the basic functions in single | provide combinations or subsets of the basic functions in single | |||
calls. In particular, some implementations may wish to automatically | calls. In particular, some implementations may wish to automatically | |||
OPEN a connection on the first SEND or RECEIVE issued by the user for | OPEN a connection on the first SEND or RECEIVE issued by the user for | |||
a given connection. | a given connection. | |||
In providing interprocess communication facilities, the TCP | In providing interprocess communication facilities, the TCP | |||
implementation must not only accept commands, but must also return | implementation must not only accept commands, but must also return | |||
information to the processes it serves. The latter consists of: | information to the processes it serves. The latter consists of: | |||
(a) general information about a connection (e.g., interrupts, | (a) general information about a connection (e.g., interrupts, remote | |||
remote close, binding of unspecified remote socket). | close, binding of unspecified remote socket). | |||
(b) replies to specific user commands indicating success or | (b) replies to specific user commands indicating success or various | |||
various types of failure. | types of failure. | |||
3.9.1.1. Open | 3.9.1.1. Open | |||
Format: OPEN (local port, remote socket, active/passive [, | Format: OPEN (local port, remote socket, active/passive [, timeout] | |||
timeout] [, DiffServ field] [, security/compartment] [local IP | [, Diffserv field] [, security/compartment] [, local IP address] [, | |||
address,] [, options]) -> local connection name | options]) -> local connection name | |||
If the active/passive flag is set to passive, then this is a call | If the active/passive flag is set to passive, then this is a call to | |||
to LISTEN for an incoming connection. A passive open may have | LISTEN for an incoming connection. A passive OPEN may have either a | |||
either a fully specified remote socket to wait for a particular | fully specified remote socket to wait for a particular connection or | |||
connection or an unspecified remote socket to wait for any call. | an unspecified remote socket to wait for any call. A fully specified | |||
A fully specified passive call can be made active by the | passive call can be made active by the subsequent execution of a | |||
subsequent execution of a SEND. | SEND. | |||
A transmission control block (TCB) is created and partially filled | A transmission control block (TCB) is created and partially filled in | |||
in with data from the OPEN command parameters. | with data from the OPEN command parameters. | |||
Every passive OPEN call either creates a new connection record in | Every passive OPEN call either creates a new connection record in | |||
LISTEN state, or it returns an error; it MUST NOT affect any | LISTEN state, or it returns an error; it MUST NOT affect any | |||
previously created connection record (MUST-41). | previously created connection record (MUST-41). | |||
A TCP implementation that supports multiple concurrent connections | A TCP implementation that supports multiple concurrent connections | |||
MUST provide an OPEN call that will functionally allow an | MUST provide an OPEN call that will functionally allow an application | |||
application to LISTEN on a port while a connection block with the | to LISTEN on a port while a connection block with the same local port | |||
same local port is in SYN-SENT or SYN-RECEIVED state (MUST-42). | is in SYN-SENT or SYN-RECEIVED state (MUST-42). | |||
On an active OPEN command, the TCP endpoint will begin the | On an active OPEN command, the TCP endpoint will begin the procedure | |||
procedure to synchronize (i.e., establish) the connection at once. | to synchronize (i.e., establish) the connection at once. | |||
The timeout, if present, permits the caller to set up a timeout | The timeout, if present, permits the caller to set up a timeout for | |||
for all data submitted to TCP. If data is not successfully | all data submitted to TCP. If data is not successfully delivered to | |||
delivered to the destination within the timeout period, the TCP | the destination within the timeout period, the TCP endpoint will | |||
endpoint will abort the connection. The present global default is | abort the connection. The present global default is five minutes. | |||
five minutes. | ||||
The TCP implementation or some component of the operating system | The TCP implementation or some component of the operating system will | |||
will verify the user's authority to open a connection with the | verify the user's authority to open a connection with the specified | |||
specified DiffServ field value or security/compartment. The | Diffserv field value or security/compartment. The absence of a | |||
absence of a DiffServ field value or security/compartment | Diffserv field value or security/compartment specification in the | |||
specification in the OPEN call indicates the default values must | OPEN call indicates the default values must be used. | |||
be used. | ||||
TCP will accept incoming requests as matching only if the | TCP will accept incoming requests as matching only if the security/ | |||
security/compartment information is exactly the same as that | compartment information is exactly the same as that requested in the | |||
requested in the OPEN call. | OPEN call. | |||
The DiffServ field value indicated by the user only impacts | The Diffserv field value indicated by the user only impacts outgoing | |||
outgoing packets, may be altered en route through the network, and | packets, may be altered en route through the network, and has no | |||
has no direct bearing or relation to received packets. | direct bearing or relation to received packets. | |||
A local connection name will be returned to the user by the TCP | A local connection name will be returned to the user by the TCP | |||
implementation. The local connection name can then be used as a | implementation. The local connection name can then be used as a | |||
short-hand term for the connection defined by the <local socket, | shorthand term for the connection defined by the <local socket, | |||
remote socket> pair. | remote socket> pair. | |||
The optional "local IP address" parameter MUST be supported to | The optional "local IP address" parameter MUST be supported to allow | |||
allow the specification of the local IP address (MUST-43). This | the specification of the local IP address (MUST-43). This enables | |||
enables applications that need to select the local IP address used | applications that need to select the local IP address used when | |||
when multihoming is present. | multihoming is present. | |||
A passive OPEN call with a specified "local IP address" parameter | A passive OPEN call with a specified "local IP address" parameter | |||
will await an incoming connection request to that address. If the | will await an incoming connection request to that address. If the | |||
parameter is unspecified, a passive OPEN will await an incoming | parameter is unspecified, a passive OPEN will await an incoming | |||
connection request to any local IP address, and then bind the | connection request to any local IP address and then bind the local IP | |||
local IP address of the connection to the particular address that | address of the connection to the particular address that is used. | |||
is used. | ||||
For an active OPEN call, a specified "local IP address" parameter | For an active OPEN call, a specified "local IP address" parameter | |||
will be used for opening the connection. If the parameter is | will be used for opening the connection. If the parameter is | |||
unspecified, the host will choose an appropriate local IP address | unspecified, the host will choose an appropriate local IP address | |||
(see RFC 1122 section 3.3.4.2). | (see RFC 1122, Section 3.3.4.2). | |||
If an application on a multihomed host does not specify the local | If an application on a multihomed host does not specify the local IP | |||
IP address when actively opening a TCP connection, then the TCP | address when actively opening a TCP connection, then the TCP | |||
implementation MUST ask the IP layer to select a local IP address | implementation MUST ask the IP layer to select a local IP address | |||
before sending the (first) SYN (MUST-44). See the function | before sending the (first) SYN (MUST-44). See the function | |||
GET_SRCADDR() in Section 3.4 of RFC 1122. | GET_SRCADDR() in Section 3.4 of RFC 1122. | |||
At all other times, a previous segment has either been sent or | At all other times, a previous segment has either been sent or | |||
received on this connection, and TCP implementations MUST use the | received on this connection, and TCP implementations MUST use the | |||
same local address that was used in those previous segments (MUST- | same local address that was used in those previous segments (MUST- | |||
45). | 45). | |||
A TCP implementation MUST reject as an error a local OPEN call for | A TCP implementation MUST reject as an error a local OPEN call for an | |||
an invalid remote IP address (e.g., a broadcast or multicast | invalid remote IP address (e.g., a broadcast or multicast address) | |||
address) (MUST-46). | (MUST-46). | |||
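The OPEN behavior described above can be sketched as follows. This is a simplified illustration, not the interface of any real TCP stack: the TCB fields, the function name, and the error type are hypothetical. The five-minute default timeout and the rejection of broadcast or multicast remote addresses (MUST-46) come from the text. The check for an unspecified remote socket on an active OPEN is an assumption based on the event processing rules elsewhere in the specification. Only the limited broadcast address is checked, since detecting a directed broadcast would require knowledge of the remote subnet.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


@dataclass
class TCB:
    """Partially filled transmission control block (hypothetical fields)."""
    local_port: int
    remote: Optional[Tuple[str, int]]
    state: str
    timeout: float
    local_ip: Optional[str] = None


def open_connection(local_port, remote, active, timeout=300.0, local_ip=None):
    """Create a TCB from OPEN parameters and return it.

    remote is an (address, port) tuple, or None for an unspecified
    remote socket (only meaningful for a passive OPEN).
    """
    if remote is not None:
        addr = ipaddress.ip_address(remote[0])
        # MUST-46: reject an invalid remote IP address.
        if addr.is_multicast or addr == LIMITED_BROADCAST:
            raise ValueError("invalid remote IP address")
    elif active:
        # Assumption: an active OPEN needs a fully specified remote socket.
        raise ValueError("remote socket unspecified")

    # A passive OPEN is a call to LISTEN; an active OPEN starts
    # synchronizing the connection at once.
    state = "SYN-SENT" if active else "LISTEN"
    return TCB(local_port, remote, state, timeout, local_ip)
```

Each call returns a new record and never modifies an existing one, which mirrors the requirement that a passive OPEN must not affect previously created connection records (MUST-41).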
3.9.1.2. Send | 3.9.1.2. Send | |||
Format: SEND (local connection name, buffer address, byte count, | Format: SEND (local connection name, buffer address, byte count, | |||
PUSH flag (optional), URGENT flag [,timeout]) | URGENT flag [, PUSH flag] [, timeout]) | |||
This call causes the data contained in the indicated user buffer | ||||
to be sent on the indicated connection. If the connection has not | ||||
been opened, the SEND is considered an error. Some | ||||
implementations may allow users to SEND first; in which case, an | ||||
automatic OPEN would be done. For example, this might be one way | ||||
for application data to be included in SYN segments. If the | ||||
calling process is not authorized to use this connection, an error | ||||
is returned. | ||||
A TCP endpoint MAY implement PUSH flags on SEND calls (MAY-15). | This call causes the data contained in the indicated user buffer to | |||
If PUSH flags are not implemented, then the sending TCP peer: (1) | be sent on the indicated connection. If the connection has not been | |||
MUST NOT buffer data indefinitely (MUST-60), and (2) MUST set the | opened, the SEND is considered an error. Some implementations may | |||
PSH bit in the last buffered segment (i.e., when there is no more | allow users to SEND first; in which case, an automatic OPEN would be | |||
queued data to be sent) (MUST-61). The remaining description | done. For example, this might be one way for application data to be | |||
below assumes the PUSH flag is supported on SEND calls. | included in SYN segments. If the calling process is not authorized | |||
to use this connection, an error is returned. | ||||
If the PUSH flag is set, the application intends the data to be | A TCP endpoint MAY implement PUSH flags on SEND calls (MAY-15). If | |||
transmitted promptly to the receiver, and the PUSH bit will be set | PUSH flags are not implemented, then the sending TCP peer: (1) MUST | |||
in the last TCP segment created from the buffer. | NOT buffer data indefinitely (MUST-60), and (2) MUST set the PSH bit | |||
in the last buffered segment (i.e., when there is no more queued data | ||||
to be sent) (MUST-61). The remaining description below assumes the | ||||
PUSH flag is supported on SEND calls. | ||||
The PSH bit is not a record marker and is independent of segment | If the PUSH flag is set, the application intends the data to be | |||
boundaries. The transmitter SHOULD collapse successive bits when | transmitted promptly to the receiver, and the PSH bit will be set in | |||
it packetizes data, to send the largest possible segment (SHLD- | the last TCP segment created from the buffer. | |||
27). | ||||
If the PUSH flag is not set, the data may be combined with data | The PSH bit is not a record marker and is independent of segment | |||
from subsequent SENDs for transmission efficiency. When an | boundaries. The transmitter SHOULD collapse successive bits when it | |||
application issues a series of SEND calls without setting the PUSH | packetizes data, to send the largest possible segment (SHLD-27). | |||
flag, the TCP implementation MAY aggregate the data internally | ||||
without sending it (MAY-16). Note that when the Nagle algorithm | ||||
is in use, TCP implementations may buffer the data before sending, | ||||
without regard to the PUSH flag (see Section 3.7.4). | ||||
An application program is logically required to set the PUSH flag | If the PUSH flag is not set, the data may be combined with data from | |||
in a SEND call whenever it needs to force delivery of the data to | subsequent SENDs for transmission efficiency. When an application | |||
avoid a communication deadlock. However, a TCP implementation | issues a series of SEND calls without setting the PUSH flag, the TCP | |||
SHOULD send a maximum-sized segment whenever possible (SHLD-28), | implementation MAY aggregate the data internally without sending it | |||
to improve performance (see Section 3.8.6.2.1). | (MAY-16). Note that when the Nagle algorithm is in use, TCP | |||
implementations may buffer the data before sending, without regard to | ||||
the PUSH flag (see Section 3.7.4). | ||||
New applications SHOULD NOT set the URGENT flag [40] due to | An application program is logically required to set the PUSH flag in | |||
implementation differences and middlebox issues (SHLD-13). | a SEND call whenever it needs to force delivery of the data to avoid | |||
a communication deadlock. However, a TCP implementation SHOULD send | ||||
a maximum-sized segment whenever possible (SHLD-28) to improve | ||||
performance (see Section 3.8.6.2.1). | ||||
If the URGENT flag is set, segments sent to the destination TCP | New applications SHOULD NOT set the URGENT flag [39] due to | |||
peer will have the urgent pointer set. The receiving TCP peer | implementation differences and middlebox issues (SHLD-13). | |||
will signal the urgent condition to the receiving process if the | ||||
urgent pointer indicates that data preceding the urgent pointer | ||||
has not been consumed by the receiving process. The purpose of | ||||
urgent is to stimulate the receiver to process the urgent data and | ||||
to indicate to the receiver when all the currently known urgent | ||||
data has been received. The number of times the sending user's | ||||
TCP implementation signals urgent will not necessarily be equal to | ||||
the number of times the receiving user will be notified of the | ||||
presence of urgent data. | ||||
If no remote socket was specified in the OPEN, but the connection | If the URGENT flag is set, segments sent to the destination TCP peer | |||
is established (e.g., because a LISTENing connection has become | will have the urgent pointer set. The receiving TCP peer will signal | |||
specific due to a remote segment arriving for the local socket), | the urgent condition to the receiving process if the urgent pointer | |||
then the designated buffer is sent to the implied remote socket. | indicates that data preceding the urgent pointer has not been | |||
Users who make use of OPEN with an unspecified remote socket can | consumed by the receiving process. The purpose of the URGENT flag is | |||
make use of SEND without ever explicitly knowing the remote socket | to stimulate the receiver to process the urgent data and to indicate | |||
address. | to the receiver when all the currently known urgent data has been | |||
received. The number of times the sending user's TCP implementation | ||||
signals urgent will not necessarily be equal to the number of times | ||||
the receiving user will be notified of the presence of urgent data. | ||||
However, if a SEND is attempted before the remote socket becomes | If no remote socket was specified in the OPEN, but the connection is | |||
specified, an error will be returned. Users can use the STATUS | established (e.g., because a LISTENing connection has become specific | |||
call to determine the status of the connection. Some TCP | due to a remote segment arriving for the local socket), then the | |||
implementations may notify the user when an unspecified socket is | designated buffer is sent to the implied remote socket. Users who | |||
bound. | make use of OPEN with an unspecified remote socket can make use of | |||
SEND without ever explicitly knowing the remote socket address. | ||||
If a timeout is specified, the current user timeout for this | However, if a SEND is attempted before the remote socket becomes | |||
connection is changed to the new one. | specified, an error will be returned. Users can use the STATUS call | |||
to determine the status of the connection. Some TCP implementations | ||||
may notify the user when an unspecified socket is bound. | ||||
In the simplest implementation, SEND would not return control to | If a timeout is specified, the current user timeout for this | |||
the sending process until either the transmission was complete or | connection is changed to the new one. | |||
the timeout had been exceeded. However, this simple method is | ||||
both subject to deadlocks (for example, both sides of the | ||||
connection might try to do SENDs before doing any RECEIVEs) and | ||||
offers poor performance, so it is not recommended. A more | ||||
sophisticated implementation would return immediately to allow the | ||||
process to run concurrently with network I/O, and, furthermore, to | ||||
allow multiple SENDs to be in progress. Multiple SENDs are served | ||||
in first come, first served order, so the TCP endpoint will queue | ||||
those it cannot service immediately. | ||||
We have implicitly assumed an asynchronous user interface in which | In the simplest implementation, SEND would not return control to the | |||
a SEND later elicits some kind of SIGNAL or pseudo-interrupt from | sending process until either the transmission was complete or the | |||
the serving TCP endpoint. An alternative is to return a response | timeout had been exceeded. However, this simple method is both | |||
immediately. For instance, SENDs might return immediate local | subject to deadlocks (for example, both sides of the connection might | |||
acknowledgment, even if the segment sent had not been acknowledged | try to do SENDs before doing any RECEIVEs) and offers poor | |||
by the distant TCP endpoint. We could optimistically assume | performance, so it is not recommended. A more sophisticated | |||
eventual success. If we are wrong, the connection will close | implementation would return immediately to allow the process to run | |||
anyway due to the timeout. In implementations of this kind | concurrently with network I/O, and, furthermore, to allow multiple | |||
(synchronous), there will still be some asynchronous signals, but | SENDs to be in progress. Multiple SENDs are served in first come, | |||
these will deal with the connection itself, and not with specific | first served order, so the TCP endpoint will queue those it cannot | |||
segments or buffers. | service immediately. | |||
In order for the process to distinguish among error or success | We have implicitly assumed an asynchronous user interface in which a | |||
indications for different SENDs, it might be appropriate for the | SEND later elicits some kind of SIGNAL or pseudo-interrupt from the | |||
buffer address to be returned along with the coded response to the | serving TCP endpoint. An alternative is to return a response | |||
SEND request. TCP-to-user signals are discussed below, indicating | immediately. For instance, SENDs might return immediate local | |||
the information that should be returned to the calling process. | acknowledgment, even if the segment sent had not been acknowledged by | |||
the distant TCP endpoint. We could optimistically assume eventual | ||||
success. If we are wrong, the connection will close anyway due to | ||||
the timeout. In implementations of this kind (synchronous), there | ||||
will still be some asynchronous signals, but these will deal with the | ||||
connection itself, and not with specific segments or buffers. | ||||
| In order for the process to distinguish among error or success | |||
| indications for different SENDs, it might be appropriate for the | |||
| buffer address to be returned along with the coded response to the | |||
| SEND request. TCP-to-user signals are discussed below, indicating | |||
| the information that should be returned to the calling process. | |||
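The interaction between the PUSH flag and segmentation described in this section can be sketched as follows. This is a minimal sketch that ignores flow control, the Nagle algorithm, and retransmission. The function name and the (data, push) tuple representation are hypothetical.

```python
def packetize(sends, mss):
    """Turn a series of SEND calls into segments.

    sends -- iterable of (data: bytes, push: bool), one entry per SEND call
    mss   -- maximum segment payload size in bytes

    Returns (segments, pending), where segments is a list of
    (payload, psh_bit) tuples and pending is data still buffered because
    no PUSH has been requested for it yet.
    """
    segments = []
    pending = bytearray()
    for data, push in sends:
        # Data without PUSH may be combined with data from later SENDs.
        pending += data
        # SHLD-28: send a maximum-sized segment whenever possible.
        while len(pending) >= mss:
            segments.append([bytes(pending[:mss]), False])
            del pending[:mss]
        if push:
            if pending:
                # Flush the remainder; PSH goes on the last segment
                # created from this buffer.
                segments.append([bytes(pending), True])
                pending.clear()
            elif segments and data:
                # The buffer ended exactly on a segment boundary.
                segments[-1][1] = True
    return [tuple(s) for s in segments], bytes(pending)
```

For example, a 3-byte SEND without PUSH followed by a 4-byte SEND with PUSH and an MSS of 5 yields one full segment without PSH and a 2-byte segment with PSH, showing that the PSH bit is independent of SEND boundaries. A real implementation must not hold the pending data indefinitely (MUST-60).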
3.9.1.3. Receive | 3.9.1.3. Receive | |||
Format: RECEIVE (local connection name, buffer address, byte | Format: RECEIVE (local connection name, buffer address, byte count) | |||
count) -> byte count, urgent flag, push flag (optional) | -> byte count, URGENT flag [, PUSH flag] | |||
This command allocates a receiving buffer associated with the | This command allocates a receiving buffer associated with the | |||
specified connection. If no OPEN precedes this command or the | specified connection. If no OPEN precedes this command or the | |||
calling process is not authorized to use this connection, an error | calling process is not authorized to use this connection, an error is | |||
is returned. | returned. | |||
In the simplest implementation, control would not return to the | In the simplest implementation, control would not return to the | |||
calling program until either the buffer was filled, or some error | calling program until either the buffer was filled or some error | |||
occurred, but this scheme is highly subject to deadlocks. A more | occurred, but this scheme is highly subject to deadlocks. A more | |||
sophisticated implementation would permit several RECEIVEs to be | sophisticated implementation would permit several RECEIVEs to be | |||
outstanding at once. These would be filled as segments arrive. | outstanding at once. These would be filled as segments arrive. This | |||
This strategy permits increased throughput at the cost of a more | strategy permits increased throughput at the cost of a more elaborate | |||
elaborate scheme (possibly asynchronous) to notify the calling | scheme (possibly asynchronous) to notify the calling program that a | |||
program that a PUSH has been seen or a buffer filled. | PUSH has been seen or a buffer filled. | |||
A TCP receiver MAY pass a received PSH flag to the application | A TCP receiver MAY pass a received PSH bit to the application layer | |||
layer via the PUSH flag in the interface (MAY-17), but it is not | via the PUSH flag in the interface (MAY-17), but it is not required | |||
required (this was clarified in RFC 1122 section 4.2.2.2). The | (this was clarified in RFC 1122, Section 4.2.2.2). The remainder of | |||
remainder of text describing the RECEIVE call below assumes that | text describing the RECEIVE call below assumes that passing the PUSH | |||
passing the PUSH indication is supported. | indication is supported. | |||
If enough data arrive to fill the buffer before a PUSH is seen, | If enough data arrive to fill the buffer before a PUSH is seen, the | |||
the PUSH flag will not be set in the response to the RECEIVE. The | PUSH flag will not be set in the response to the RECEIVE. The buffer | |||
buffer will be filled with as much data as it can hold. If a PUSH | will be filled with as much data as it can hold. If a PUSH is seen | |||
is seen before the buffer is filled the buffer will be returned | before the buffer is filled, the buffer will be returned partially | |||
partially filled and PUSH indicated. | filled and PUSH indicated. | |||
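The two cases above (buffer filled before a PUSH is seen, and PUSH seen before the buffer is filled) can be sketched as follows. This is an illustrative, non-blocking sketch that assumes the PUSH indication is passed to the application. The function name and the queue representation are hypothetical, and urgent data is not modeled.

```python
def receive(queue, byte_count):
    """Fill a user buffer of byte_count bytes from in-order segments.

    queue -- list of (payload: bytes, psh: bool) segments, oldest first;
             consumed data is removed from it in place
    Returns (data, push_flag).
    """
    out = bytearray()
    while queue and len(out) < byte_count:
        payload, psh = queue[0]
        room = byte_count - len(out)
        if len(payload) <= room:
            out += payload
            queue.pop(0)
            if psh:
                # PUSH seen: return the buffer, possibly partially
                # filled, with PUSH indicated.
                return bytes(out), True
        else:
            # Buffer fills before the PUSH is reached: keep the rest
            # of the segment (and its PSH bit) for the next RECEIVE.
            out += payload[:room]
            queue[0] = (payload[room:], psh)
    # Buffer is full (or the queue is empty) without a PUSH being seen.
    return bytes(out), False
```

A blocking implementation would wait for more segments instead of returning when the queue runs empty; the sketch returns what is available so that it stays self-contained.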
If there is urgent data the user will have been informed as soon | If there is urgent data, the user will have been informed as soon as | |||
as it arrived via a TCP-to-user signal. The receiving user should | it arrived via a TCP-to-user signal. The receiving user should thus | |||
thus be in "urgent mode". If the URGENT flag is on, additional | be in "urgent mode". If the URGENT flag is on, additional urgent | |||
urgent data remains. If the URGENT flag is off, this call to | data remains. If the URGENT flag is off, this call to RECEIVE has | |||
RECEIVE has returned all the urgent data, and the user may now | returned all the urgent data, and the user may now leave "urgent | |||
leave "urgent mode". Note that data following the urgent pointer | mode". Note that data following the urgent pointer (non-urgent data) | |||
(non-urgent data) cannot be delivered to the user in the same | cannot be delivered to the user in the same buffer with preceding | |||
buffer with preceding urgent data unless the boundary is clearly | urgent data unless the boundary is clearly marked for the user. | |||
marked for the user. | ||||
To distinguish among several outstanding RECEIVEs and to take care | To distinguish among several outstanding RECEIVEs and to take care of | |||
of the case that a buffer is not completely filled, the return | the case that a buffer is not completely filled, the return code is | |||
code is accompanied by both a buffer pointer and a byte count | accompanied by both a buffer pointer and a byte count indicating the | |||
indicating the actual length of the data received. | actual length of the data received. | |||
Alternative implementations of RECEIVE might have the TCP endpoint | Alternative implementations of RECEIVE might have the TCP endpoint | |||
allocate buffer storage, or the TCP endpoint might share a ring | allocate buffer storage, or the TCP endpoint might share a ring | |||
buffer with the user. | buffer with the user. | |||
3.9.1.4. Close | 3.9.1.4. Close | |||
Format: CLOSE (local connection name) | Format: CLOSE (local connection name) | |||
This command causes the connection specified to be closed. If the | This command causes the connection specified to be closed. If the | |||
connection is not open or the calling process is not authorized to | connection is not open or the calling process is not authorized to | |||
use this connection, an error is returned. Closing connections is | use this connection, an error is returned. Closing connections is | |||
intended to be a graceful operation in the sense that outstanding | intended to be a graceful operation in the sense that outstanding | |||
SENDs will be transmitted (and retransmitted), as flow control | SENDs will be transmitted (and retransmitted), as flow control | |||
permits, until all have been serviced. Thus, it should be | permits, until all have been serviced. Thus, it should be acceptable | |||
acceptable to make several SEND calls, followed by a CLOSE, and | to make several SEND calls, followed by a CLOSE, and expect all the | |||
expect all the data to be sent to the destination. It should also | data to be sent to the destination. It should also be clear that | |||
be clear that users should continue to RECEIVE on CLOSING | users should continue to RECEIVE on CLOSING connections since the | |||
connections, since the remote peer may be trying to transmit the | remote peer may be trying to transmit the last of its data. Thus, | |||
last of its data. Thus, CLOSE means "I have no more to send" but | CLOSE means "I have no more to send" but does not mean "I will not | |||
does not mean "I will not receive any more." It may happen (if | receive any more." It may happen (if the user-level protocol is not | |||
the user level protocol is not well-thought-out) that the closing | well thought out) that the closing side is unable to get rid of all | |||
side is unable to get rid of all its data before timing out. In | its data before timing out. In this event, CLOSE turns into ABORT, | |||
this event, CLOSE turns into ABORT, and the closing TCP peer gives | and the closing TCP peer gives up. | |||
up. | ||||
The user may CLOSE the connection at any time on their own | The user may CLOSE the connection at any time on their own | |||
initiative, or in response to various prompts from the TCP | initiative, or in response to various prompts from the TCP | |||
implementation (e.g., remote close executed, transmission timeout | implementation (e.g., remote close executed, transmission timeout | |||
exceeded, destination inaccessible). | exceeded, destination inaccessible). | |||
Because closing a connection requires communication with the | Because closing a connection requires communication with the remote | |||
remote TCP peer, connections may remain in the closing state for a | TCP peer, connections may remain in the closing state for a short | |||
short time. Attempts to reopen the connection before the TCP peer | time. Attempts to reopen the connection before the TCP peer replies | |||
replies to the CLOSE command will result in error responses. | to the CLOSE command will result in error responses. | |||
CLOSE also implies the push function. | CLOSE also implies the push function. | |||
3.9.1.5. Status | 3.9.1.5. Status | |||
Format: STATUS (local connection name) -> status data | Format: STATUS (local connection name) -> status data | |||
This is an implementation dependent user command and could be | ||||
excluded without adverse effect. Information returned would | ||||
typically come from the TCB associated with the connection. | ||||
This command returns a data block containing the following | This is an implementation-dependent user command and could be | |||
information: | excluded without adverse effect. Information returned would | |||
typically come from the TCB associated with the connection. | ||||
- local socket, | This command returns a data block containing the following | |||
information: | ||||
remote socket, | local socket, | |||
local connection name, | remote socket, | |||
receive window, | local connection name, | |||
send window, | receive window, | |||
connection state, | send window, | |||
number of buffers awaiting acknowledgment, | connection state, | |||
number of buffers pending receipt, | number of buffers awaiting acknowledgment, | |||
urgent state, | number of buffers pending receipt, | |||
DiffServ field value, | urgent state, | |||
security/compartment, | Diffserv field value, | |||
and transmission timeout. | security/compartment, and | |||
Depending on the state of the connection, or on the implementation | transmission timeout. | |||
itself, some of this information may not be available or | ||||
meaningful. If the calling process is not authorized to use this | Depending on the state of the connection, or on the implementation | |||
connection, an error is returned. This prevents unauthorized | itself, some of this information may not be available or meaningful. | |||
processes from gaining information about a connection. | If the calling process is not authorized to use this connection, an | |||
error is returned. This prevents unauthorized processes from gaining | ||||
information about a connection. | ||||
3.9.1.6. Abort | 3.9.1.6. Abort | |||
Format: ABORT (local connection name) | Format: ABORT (local connection name) | |||
This command causes all pending SENDs and RECEIVEs to be aborted, | This command causes all pending SENDs and RECEIVEs to be aborted, the | |||
the TCB to be removed, and a special RESET message to be sent to | TCB to be removed, and a special RST message to be sent to the remote | |||
the remote TCP peer of the connection. Depending on the | TCP peer of the connection. Depending on the implementation, users | |||
implementation, users may receive abort indications for each | may receive abort indications for each outstanding SEND or RECEIVE, | |||
outstanding SEND or RECEIVE, or may simply receive an ABORT- | or may simply receive an ABORT-acknowledgment. | |||
acknowledgment. | ||||
3.9.1.7. Flush | 3.9.1.7. Flush | |||
Some TCP implementations have included a FLUSH call, which will | Some TCP implementations have included a FLUSH call, which will empty | |||
empty the TCP send queue of any data that the user has issued SEND | the TCP send queue of any data that the user has issued SEND calls | |||
calls for but is still to the right of the current send window. | for but is still to the right of the current send window. That is, | |||
That is, it flushes as much queued send data as possible without | it flushes as much queued send data as possible without losing | |||
losing sequence number synchronization. The FLUSH call MAY be | sequence number synchronization. The FLUSH call MAY be implemented | |||
implemented (MAY-14). | (MAY-14). | |||
3.9.1.8. Asynchronous Reports | 3.9.1.8. Asynchronous Reports | |||
There MUST be a mechanism for reporting soft TCP error conditions | There MUST be a mechanism for reporting soft TCP error conditions to | |||
to the application (MUST-47). Generically, we assume this takes | the application (MUST-47). Generically, we assume this takes the | |||
the form of an application-supplied ERROR_REPORT routine that may | form of an application-supplied ERROR_REPORT routine that may be | |||
be upcalled asynchronously from the transport layer: | upcalled asynchronously from the transport layer: | |||
- ERROR_REPORT(local connection name, reason, subreason) | ERROR_REPORT(local connection name, reason, subreason) | |||
The precise encoding of the reason and subreason parameters is not | The precise encoding of the reason and subreason parameters is not | |||
specified here. However, the conditions that are reported | specified here. However, the conditions that are reported | |||
asynchronously to the application MUST include: | asynchronously to the application MUST include: | |||
- * ICMP error message arrived (see Section 3.9.2.2 for | * ICMP error message arrived (see Section 3.9.2.2 for description of | |||
description of handling each ICMP message type, since some | handling each ICMP message type since some message types need to | |||
message types need to be suppressed from generating reports to | be suppressed from generating reports to the application) | |||
the application) | ||||
- * Excessive retransmissions (see Section 3.8.3) | * Excessive retransmissions (see Section 3.8.3) | |||
- * Urgent pointer advance (see Section 3.8.5) | * Urgent pointer advance (see Section 3.8.5) | |||
However, an application program that does not want to receive such | However, an application program that does not want to receive such | |||
ERROR_REPORT calls SHOULD be able to effectively disable these | ERROR_REPORT calls SHOULD be able to effectively disable these calls | |||
calls (SHLD-20). | (SHLD-20). | |||
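As an illustration only, the asynchronous reporting mechanism described above could be modeled as a per-connection upcall table with an enable switch. The class name AsyncReporter, its methods, and the reason strings are hypothetical; the text above deliberately leaves the encoding of reason and subreason unspecified.

```python
from typing import Callable, Dict

# Signature of the application-supplied routine:
# ERROR_REPORT(local connection name, reason, subreason)
ErrorReport = Callable[[str, str, str], None]


class AsyncReporter:
    """Hypothetical registry of ERROR_REPORT upcalls, one per connection."""

    def __init__(self) -> None:
        self._handlers: Dict[str, ErrorReport] = {}
        self._enabled: Dict[str, bool] = {}

    def register(self, conn: str, handler: ErrorReport) -> None:
        """Install the application's routine; reports start enabled."""
        self._handlers[conn] = handler
        self._enabled[conn] = True

    def set_enabled(self, conn: str, enabled: bool) -> None:
        """Let the application switch reports off or on (SHLD-20)."""
        self._enabled[conn] = enabled

    def report(self, conn: str, reason: str, subreason: str) -> bool:
        """Upcall the handler; return True only if it was actually called."""
        handler = self._handlers.get(conn)
        if handler is None or not self._enabled.get(conn, False):
            return False
        handler(conn, reason, subreason)
        return True
```

A transport layer built this way would call report() for an arriving ICMP error, for excessive retransmissions, and for an urgent pointer advance, and the application could silence all three with set_enabled(conn, False).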
3.9.1.9. Set Differentiated Services Field (IPv4 TOS or IPv6 Traffic | 3.9.1.9. Set Differentiated Services Field (IPv4 TOS or IPv6 Traffic | |||
Class) | Class) | |||
The application layer MUST be able to specify the Differentiated | The application layer MUST be able to specify the Differentiated | |||
Services field for segments that are sent on a connection (MUST- | Services field for segments that are sent on a connection (MUST-48). | |||
48). The Differentiated Services field includes the 6-bit | The Differentiated Services field includes the 6-bit Differentiated | |||
Differentiated Services Code Point (DSCP) value. It is not | Services Codepoint (DSCP) value. It is not required, but the | |||
required, but the application SHOULD be able to change the | application SHOULD be able to change the Differentiated Services | |||
Differentiated Services field during the connection lifetime | field during the connection lifetime (SHLD-21). TCP implementations | |||
(SHLD-21). TCP implementations SHOULD pass the current | SHOULD pass the current Differentiated Services field value without | |||
Differentiated Services field value without change to the IP | change to the IP layer, when it sends segments on the connection | |||
layer, when it sends segments on the connection (SHLD-22). | (SHLD-22). | |||
The Differentiated Services field will be specified independently | The Differentiated Services field will be specified independently in | |||
in each direction on the connection, so that the receiver | each direction on the connection, so that the receiver application | |||
application will specify the Differentiated Services field used | will specify the Differentiated Services field used for ACK segments. | |||
for ACK segments. | ||||
TCP implementations MAY pass the most recently received | TCP implementations MAY pass the most recently received | |||
Differentiated Services field up to the application (MAY-9). | Differentiated Services field up to the application (MAY-9). | |||
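The following sketch shows how a 6-bit DSCP value maps onto the IPv4 TOS octet that an application might hand to the stack. The function names are hypothetical. The set_dscp helper assumes a platform whose sockets API exposes the IP_TOS option for IPv4, which is common but not guaranteed, and whether the stack honors the value is implementation dependent.

```python
import socket


def dscp_to_octet(dscp: int, ecn: int = 0) -> int:
    """Place a 6-bit DSCP in the upper bits of the TOS / Traffic Class octet.

    The low two bits are the ECN field; they default to 0 here.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def octet_to_dscp(octet: int) -> int:
    """Extract the 6-bit DSCP from a TOS / Traffic Class octet."""
    return (octet >> 2) & 0x3F


def set_dscp(sock: socket.socket, dscp: int) -> None:
    """Request a DSCP for segments sent on an IPv4 socket.

    Assumption: socket.IP_TOS is available on this platform.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_octet(dscp))
```

For example, DSCP 46 corresponds to the octet value 0xB8. Because each direction is specified independently, the receiving application would make its own call for the segments it sends, including ACKs.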
3.9.2. TCP/Lower-Level Interface | 3.9.2. TCP/Lower-Level Interface | |||
The TCP endpoint calls on a lower level protocol module to actually | The TCP endpoint calls on a lower-level protocol module to actually | |||
send and receive information over a network. The two current | send and receive information over a network. The two current | |||
standard Internet Protocol (IP) versions layered below TCP are IPv4 | standard Internet Protocol (IP) versions layered below TCP are IPv4 | |||
[1] and IPv6 [13]. | [1] and IPv6 [13]. | |||
If the lower level protocol is IPv4 it provides arguments for a type | If the lower-level protocol is IPv4, it provides arguments for a type | |||
of service (used within the Differentiated Services field) and for a | of service (used within the Differentiated Services field) and for a | |||
time to live. TCP uses the following settings for these parameters: | time to live. TCP uses the following settings for these parameters: | |||
DiffServ field: The IP header value for the DiffServ field is | Diffserv field: The IP header value for the Diffserv field is given | |||
given by the user. This includes the bits of the DiffServ Code | by the user. This includes the bits of the Diffserv Codepoint | |||
Point (DSCP). | (DSCP). | |||
Time to Live (TTL): The TTL value used to send TCP segments MUST | Time to Live (TTL): The TTL value used to send TCP segments MUST be | |||
be configurable (MUST-49). | configurable (MUST-49). | |||
- Note that RFC 793 specified one minute (60 seconds) as a | * Note that RFC 793 specified one minute (60 seconds) as a | |||
constant for the TTL, because the assumed maximum segment | constant for the TTL because the assumed maximum segment | |||
lifetime was two minutes. This was intended to explicitly ask | lifetime was two minutes. This was intended to explicitly ask | |||
that a segment be destroyed if it cannot be delivered by the | that a segment be destroyed if it could not be delivered by the | |||
internet system within one minute. RFC 1122 changed this | internet system within one minute. RFC 1122 updated RFC 793 to | |||
specification to require that the TTL be configurable. | require that the TTL be configurable. | |||
- Note that the DiffServ field is permitted to change during a | * Note that the Diffserv field is permitted to change during a | |||
connection (Section 4.2.4.2 of RFC 1122). However, the | connection (Section 4.2.4.2 of RFC 1122). However, the | |||
application interface might not support this ability, and the | application interface might not support this ability, and the | |||
application does not have knowledge about individual TCP | application does not have knowledge about individual TCP | |||
segments, so this can only be done on a coarse granularity, at | segments, so this can only be done on a coarse granularity, at | |||
best. This limitation is further discussed in RFC 7657 (sec | best. This limitation is further discussed in RFC 7657 | |||
5.1, 5.3, and 6) [51]. Generally, an application SHOULD NOT | (Sections 5.1, 5.3, and 6) [50]. Generally, an application | |||
change the DiffServ field value during the course of a | SHOULD NOT change the Diffserv field value during the course of | |||
connection (SHLD-23). | a connection (SHLD-23). | |||
Any lower level protocol will have to provide the source address, | Any lower-level protocol will have to provide the source address, | |||
destination address, and protocol fields, and some way to determine | destination address, and protocol fields, and some way to determine | |||
the "TCP length", both to provide the functional equivalent service | the "TCP length", both to provide the functional equivalent service | |||
of IP and to be used in the TCP checksum. | of IP and to be used in the TCP checksum. | |||
When received options are passed up to TCP from the IP layer, a TCP | When received options are passed up to TCP from the IP layer, a TCP | |||
implementation MUST ignore options that it does not understand (MUST- | implementation MUST ignore options that it does not understand (MUST- | |||
50). | 50). | |||
A TCP implementation MAY support the Time Stamp (MAY-10) and Record | A TCP implementation MAY support the Timestamp (MAY-10) and Record | |||
Route (MAY-11) options. | Route (MAY-11) Options. | |||
3.9.2.1. Source Routing | 3.9.2.1. Source Routing | |||
If the lower level is IP (or other protocol that provides this | If the lower level is IP (or other protocol that provides this | |||
feature) and source routing is used, the interface must allow the | feature) and source routing is used, the interface must allow the | |||
route information to be communicated. This is especially important | route information to be communicated. This is especially important | |||
so that the source and destination addresses used in the TCP checksum | so that the source and destination addresses used in the TCP checksum | |||
be the originating source and ultimate destination. It is also | be the originating source and ultimate destination. It is also | |||
important to preserve the return route to answer connection requests. | important to preserve the return route to answer connection requests. | |||
An application MUST be able to specify a source route when it | An application MUST be able to specify a source route when it | |||
actively opens a TCP connection (MUST-51), and this MUST take | actively opens a TCP connection (MUST-51), and this MUST take | |||
precedence over a source route received in a datagram (MUST-52). | precedence over a source route received in a datagram (MUST-52). | |||
When a TCP connection is OPENed passively and a packet arrives with a | When a TCP connection is OPENed passively and a packet arrives with a | |||
completed IP Source Route option (containing a return route), TCP | completed IP Source Route Option (containing a return route), TCP | |||
implementations MUST save the return route and use it for all | implementations MUST save the return route and use it for all | |||
segments sent on this connection (MUST-53). If a different source | segments sent on this connection (MUST-53). If a different source | |||
route arrives in a later segment, the later definition SHOULD | route arrives in a later segment, the later definition SHOULD | |||
override the earlier one (SHLD-24). | override the earlier one (SHLD-24). | |||
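The precedence rules above can be summarized in a small sketch. The class SourceRouteState and its methods are hypothetical names chosen for illustration, and a route is modeled simply as a list of address strings.

```python
from typing import List, Optional


class SourceRouteState:
    """Hypothetical per-connection record of which source route to use."""

    def __init__(self, app_route: Optional[List[str]] = None) -> None:
        # Route given by the application on an active OPEN (MUST-51).
        self.app_route = list(app_route) if app_route is not None else None
        # Return route learned from a received datagram (MUST-53).
        self.return_route: Optional[List[str]] = None

    def on_received_route(self, route: List[str]) -> None:
        """Save a received return route; a later one replaces an earlier one."""
        self.return_route = list(route)

    def route_for_sending(self) -> Optional[List[str]]:
        """The application's route takes precedence over a received one."""
        if self.app_route is not None:
            return self.app_route
        return self.return_route
```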
3.9.2.2. ICMP Messages | 3.9.2.2. ICMP Messages | |||
TCP implementations MUST act on an ICMP error message passed up from | TCP implementations MUST act on an ICMP error message passed up from | |||
the IP layer, directing it to the connection that created the error | the IP layer, directing it to the connection that created the error | |||
(MUST-54). The necessary demultiplexing information can be found in | (MUST-54). The necessary demultiplexing information can be found in | |||
the IP header contained within the ICMP message. | the IP header contained within the ICMP message. | |||
This applies to ICMPv6 in addition to IPv4 ICMP. | This applies to ICMPv6 in addition to IPv4 ICMP. | |||
[36] contains discussion of specific ICMP and ICMPv6 messages | [35] contains discussion of specific ICMP and ICMPv6 messages | |||
classified as either "soft" or "hard" errors that may bear different | classified as either "soft" or "hard" errors that may bear different | |||
responses. Treatment for classes of ICMP messages is described | responses. Treatment for classes of ICMP messages is described | |||
below: | below: | |||
Source Quench | Source Quench | |||
TCP implementations MUST silently discard any received ICMP Source | TCP implementations MUST silently discard any received ICMP Source | |||
Quench messages (MUST-55). See [11] for discussion. | Quench messages (MUST-55). See [11] for discussion. | |||
Soft Errors | Soft Errors | |||
For IPv4 ICMP these include: Destination Unreachable -- codes 0, 1, | For IPv4 ICMP, these include: Destination Unreachable -- codes 0, | |||
5; Time Exceeded -- codes 0, 1; and Parameter Problem. | 1, 5; Time Exceeded -- codes 0, 1; and Parameter Problem. | |||
For ICMPv6 these include: Destination Unreachable -- codes 0, 3; | For ICMPv6, these include: Destination Unreachable -- codes 0, 3; | |||
Time Exceeded -- codes 0, 1; and Parameter Problem -- codes 0, 1, | Time Exceeded -- codes 0, 1; and Parameter Problem -- codes 0, 1, | |||
2. | 2. | |||
Since these Unreachable messages indicate soft error conditions, | Since these Unreachable messages indicate soft error conditions, a | |||
TCP implementations MUST NOT abort the connection (MUST-56), and it | TCP implementation MUST NOT abort the connection (MUST-56), and it | |||
SHOULD make the information available to the application (SHLD-25). | SHOULD make the information available to the application (SHLD-25). | |||
Hard Errors | Hard Errors | |||
For ICMP these include Destination Unreachable -- codes 2-4. | For ICMP these include Destination Unreachable -- codes 2-4. | |||
These are hard error conditions, so TCP implementations SHOULD | These are hard error conditions, so TCP implementations SHOULD | |||
abort the connection (SHLD-26). [36] notes that some | abort the connection (SHLD-26). [35] notes that some | |||
implementations do not abort connections when an ICMP hard error is | implementations do not abort connections when an ICMP hard error is | |||
received for a connection that is in any of the synchronized | received for a connection that is in any of the synchronized | |||
states. | states. | |||
Note that [36] section 4 describes widespread implementation behavior | Note that [35], Section 4 describes widespread implementation | |||
that treats soft errors as hard errors during connection | behavior that treats soft errors as hard errors during connection | |||
establishment. | establishment. | |||
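The classification above can be expressed as a lookup on ICMP type and code. This is a minimal sketch: the function names and the returned labels are hypothetical, the type numbers are the standard IANA assignments, and any combination not listed in this section is returned as "other" rather than guessed at.

```python
# ICMP (IPv4) type numbers.
V4_DEST_UNREACH = 3
V4_SOURCE_QUENCH = 4
V4_TIME_EXCEEDED = 11
V4_PARAM_PROBLEM = 12

# ICMPv6 type numbers.
V6_DEST_UNREACH = 1
V6_TIME_EXCEEDED = 3
V6_PARAM_PROBLEM = 4


def classify_icmpv4(icmp_type: int, code: int) -> str:
    """Return 'discard', 'soft', 'hard', or 'other' for an IPv4 ICMP error."""
    if icmp_type == V4_SOURCE_QUENCH:
        return "discard"              # MUST-55: silently discard
    if icmp_type == V4_DEST_UNREACH:
        if code in (0, 1, 5):
            return "soft"             # MUST-56: do not abort
        if code in (2, 3, 4):
            return "hard"             # SHLD-26: abort the connection
    if icmp_type == V4_TIME_EXCEEDED and code in (0, 1):
        return "soft"
    if icmp_type == V4_PARAM_PROBLEM:
        return "soft"
    return "other"                    # not classified in this section


def classify_icmpv6(icmp_type: int, code: int) -> str:
    """Return 'soft' or 'other' for an ICMPv6 error, per the list above."""
    if icmp_type == V6_DEST_UNREACH and code in (0, 3):
        return "soft"
    if icmp_type == V6_TIME_EXCEEDED and code in (0, 1):
        return "soft"
    if icmp_type == V6_PARAM_PROBLEM and code in (0, 1, 2):
        return "soft"
    return "other"                    # not classified in this section
```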
3.9.2.3. Source Address Validation | 3.9.2.3. Source Address Validation | |||
RFC 1122 requires addresses to be validated in incoming SYN packets: | RFC 1122 requires addresses to be validated in incoming SYN packets: | |||
An incoming SYN with an invalid source address MUST be ignored | | An incoming SYN with an invalid source address MUST be ignored | |||
either by TCP or by the IP layer (MUST-63) (Section 3.2.1.3 of | | either by TCP or by the IP layer [(MUST-63)] (see | |||
[20]). | | Section 3.2.1.3). | |||
| | ||||
A TCP implementation MUST silently discard an incoming SYN segment | | A TCP implementation MUST silently discard an incoming SYN segment | |||
that is addressed to a broadcast or multicast address (MUST-57). | | that is addressed to a broadcast or multicast address [(MUST-57)]. | |||
This prevents connection state and replies from being erroneously | This prevents connection state and replies from being erroneously | |||
generated, and implementers should note that this guidance is | generated, and implementers should note that this guidance is | |||
applicable to all incoming segments, not just SYNs, as specifically | applicable to all incoming segments, not just SYNs, as specifically | |||
indicated in RFC 1122. | indicated in RFC 1122. | |||
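A simplified check along these lines is sketched below. The function names are hypothetical, and the test for an invalid source address is an assumption covering only multicast and broadcast sources; RFC 1122 lists further cases. Directed broadcasts can only be recognized for networks the host knows about, so those are passed in explicitly.

```python
import ipaddress
from typing import Iterable

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def _is_broadcast(addr, networks) -> bool:
    """True for the IPv4 limited broadcast or a known directed broadcast."""
    if addr.version != 4:
        return False                  # IPv6 has no broadcast addresses
    if addr == LIMITED_BROADCAST:
        return True
    for net in networks:
        # /31 and /32 prefixes have no broadcast address.
        if net.version == 4 and net.prefixlen < 31:
            if addr == net.broadcast_address:
                return True
    return False


def should_discard_syn(src: str, dst: str,
                       local_networks: Iterable[str] = ()) -> bool:
    """Decide whether an incoming SYN should be silently discarded."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    nets = [ipaddress.ip_network(n) for n in local_networks]
    # MUST-57: addressed to a broadcast or multicast address.
    if d.is_multicast or _is_broadcast(d, nets):
        return True
    # MUST-63 (simplified): source is a multicast or broadcast address.
    if s.is_multicast or _is_broadcast(s, nets):
        return True
    return False
```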
3.10. Event Processing | 3.10. Event Processing | |||
The processing depicted in this section is an example of one possible | The processing depicted in this section is an example of one possible | |||
implementation. Other implementations may have slightly different | implementation. Other implementations may have slightly different | |||
processing sequences, but they should differ from those in this | processing sequences, but they should differ from those in this | |||
section only in detail, not in substance. | section only in detail, not in substance. | |||
The activity of the TCP endpoint can be characterized as responding | The activity of the TCP endpoint can be characterized as responding | |||
to events. The events that occur can be cast into three categories: | to events. The events that occur can be cast into three categories: | |||
user calls, arriving segments, and timeouts. This section describes | user calls, arriving segments, and timeouts. This section describes | |||
the processing the TCP endpoint does in response to each of the | the processing the TCP endpoint does in response to each of the | |||
events. In many cases the processing required depends on the state | events. In many cases, the processing required depends on the state | |||
of the connection. | of the connection. | |||
Events that occur: | Events that occur: | |||
User Calls | User Calls | |||
- OPEN | OPEN | |||
SEND | SEND | |||
RECEIVE | RECEIVE | |||
CLOSE | CLOSE | |||
ABORT | ABORT | |||
STATUS | STATUS | |||
Arriving Segments | Arriving Segments | |||
- SEGMENT ARRIVES | SEGMENT ARRIVES | |||
Timeouts | Timeouts | |||
- USER TIMEOUT | USER TIMEOUT | |||
RETRANSMISSION TIMEOUT | RETRANSMISSION TIMEOUT | |||
TIME-WAIT TIMEOUT | TIME-WAIT TIMEOUT | |||
The model of the TCP/user interface is that user commands receive an | The model of the TCP/user interface is that user commands receive an | |||
immediate return and possibly a delayed response via an event or | immediate return and possibly a delayed response via an event or | |||
pseudo interrupt. In the following descriptions, the term "signal" | pseudo-interrupt. In the following descriptions, the term "signal" | |||
means cause a delayed response. | means cause a delayed response. | |||
Error responses in this document are identified by character strings. | Error responses in this document are identified by character strings. | |||
For example, user commands referencing connections that do not exist | For example, user commands referencing connections that do not exist | |||
receive "error: connection not open". | receive "error: connection not open". | |||
Please note in the following that all arithmetic on sequence numbers, | Please note in the following that all arithmetic on sequence numbers, | |||
acknowledgment numbers, windows, et cetera, is modulo 2**32 (the size | acknowledgment numbers, windows, et cetera, is modulo 2^32 (the size | |||
of the sequence number space). Also note that "=<" means less than | of the sequence number space). Also note that "=<" means less than | |||
or equal to (modulo 2**32). | or equal to (modulo 2^32). | |||
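To make the modular arithmetic concrete, the following sketch shows one common way to advance and compare sequence numbers. The helper names are hypothetical, and reading "=<" as "at most 2^31 - 1 ahead" is an assumption that many implementations make, not something this section states.

```python
SEQ_MOD = 2**32   # size of the sequence number space
HALF = 2**31      # assumed comparison horizon (not taken from the text)


def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n, wrapping modulo 2**32."""
    return (seq + n) % SEQ_MOD


def seq_leq(a: int, b: int) -> bool:
    """a =< b in the modular sense: b is fewer than HALF steps ahead of a."""
    return (b - a) % SEQ_MOD < HALF


def in_receive_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seg_seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seg_seq - rcv_nxt) % SEQ_MOD < rcv_wnd
```

With these helpers, a sequence number just past the wrap point still compares as later than one just before it, which a plain integer comparison would get wrong.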
A natural way to think about processing incoming segments is to | A natural way to think about processing incoming segments is to | |||
imagine that they are first tested for proper sequence number (i.e., | imagine that they are first tested for proper sequence number (i.e., | |||
that their contents lie in the range of the expected "receive window" | that their contents lie in the range of the expected "receive window" | |||
in the sequence number space) and then that they are generally queued | in the sequence number space) and then that they are generally queued | |||
and processed in sequence number order. | and processed in sequence number order. | |||
When a segment overlaps other already received segments we | When a segment overlaps other already received segments, we | |||
reconstruct the segment to contain just the new data, and adjust the | reconstruct the segment to contain just the new data and adjust the | |||
header fields to be consistent. | header fields to be consistent. | |||
Note that if no state change is mentioned the TCP connection stays in | Note that if no state change is mentioned, the TCP connection stays | |||
the same state. | in the same state. | |||
3.10.1. OPEN Call | 3.10.1. OPEN Call | |||
CLOSED STATE (i.e., TCB does not exist) | CLOSED STATE (i.e., TCB does not exist) | |||
- Create a new transmission control block (TCB) to hold | * Create a new transmission control block (TCB) to hold connection | |||
connection state information. Fill in local socket identifier, | state information. Fill in local socket identifier, remote | |||
remote socket, DiffServ field, security/compartment, and user | socket, Diffserv field, security/compartment, and user timeout | |||
timeout information. Note that some parts of the remote socket | information. Note that some parts of the remote socket may be | |||
may be unspecified in a passive OPEN and are to be filled in by | unspecified in a passive OPEN and are to be filled in by the | |||
the parameters of the incoming SYN segment. Verify the | parameters of the incoming SYN segment. Verify the security and | |||
security and DiffServ value requested are allowed for this | Diffserv value requested are allowed for this user, if not, return | |||
user, if not return "error: DiffServ value not allowed" or | "error: Diffserv value not allowed" or "error: security/ | |||
"error: security/compartment not allowed." If passive enter | compartment not allowed". If passive, enter the LISTEN state and | |||
the LISTEN state and return. If active and the remote socket | return. If active and the remote socket is unspecified, return | |||
is unspecified, return "error: remote socket unspecified"; if | "error: remote socket unspecified"; if active and the remote | |||
active and the remote socket is specified, issue a SYN segment. | socket is specified, issue a SYN segment. An initial send | |||
An initial send sequence number (ISS) is selected. A SYN | sequence number (ISS) is selected. A SYN segment of the form | |||
segment of the form <SEQ=ISS><CTL=SYN> is sent. Set SND.UNA to | <SEQ=ISS><CTL=SYN> is sent. Set SND.UNA to ISS, SND.NXT to ISS+1, | |||
ISS, SND.NXT to ISS+1, enter SYN-SENT state, and return. | enter SYN-SENT state, and return. | |||
- If the caller does not have access to the local socket | * If the caller does not have access to the local socket specified, | |||
specified, return "error: connection illegal for this process". | return "error: connection illegal for this process". If there is | |||
If there is no room to create a new connection, return "error: | no room to create a new connection, return "error: insufficient | |||
insufficient resources". | resources". | |||
LISTEN STATE | LISTEN STATE | |||
- If the OPEN call is active and the remote socket is specified, | ||||
then change the connection from passive to active, select an | ||||
ISS. Send a SYN segment, set SND.UNA to ISS, SND.NXT to ISS+1. | ||||
Enter SYN-SENT state. Data associated with SEND may be sent | ||||
with SYN segment or queued for transmission after entering | ||||
ESTABLISHED state. The urgent bit if requested in the command | ||||
must be sent with the data segments sent as a result of this | ||||
command. If there is no room to queue the request, respond | ||||
with "error: insufficient resources". If the remote socket was | ||||
not specified, then return "error: remote socket unspecified". | ||||
SYN-SENT STATE | * If the OPEN call is active and the remote socket is specified, | |||
then change the connection from passive to active, select an ISS. | ||||
Send a SYN segment, set SND.UNA to ISS, SND.NXT to ISS+1. Enter | ||||
SYN-SENT state. Data associated with SEND may be sent with SYN | ||||
segment or queued for transmission after entering ESTABLISHED | ||||
state. The urgent bit if requested in the command must be sent | ||||
with the data segments sent as a result of this command. If there | ||||
is no room to queue the request, respond with "error: insufficient | ||||
resources". If the remote socket was not specified, then return | ||||
"error: remote socket unspecified". | ||||
SYN-RECEIVED STATE | SYN-SENT STATE | |||
ESTABLISHED STATE | SYN-RECEIVED STATE | |||
FIN-WAIT-1 STATE | ESTABLISHED STATE | |||
FIN-WAIT-2 STATE | FIN-WAIT-1 STATE | |||
CLOSE-WAIT STATE | FIN-WAIT-2 STATE | |||
CLOSING STATE | CLOSE-WAIT STATE | |||
LAST-ACK STATE | CLOSING STATE | |||
TIME-WAIT STATE | LAST-ACK STATE | |||
- Return "error: connection already exists". | TIME-WAIT STATE | |||
* Return "error: connection already exists". | ||||
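The OPEN call processing above can be sketched as a function of the current state. This is a simplified model under stated assumptions: the TCB class, the open_call signature, and the use of None for the CLOSED state are hypothetical; the access, security, Diffserv, and resource checks are omitted; and a passive OPEN on a connection already in LISTEN, which the text does not address, is treated here as "connection already exists".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SEQ_MOD = 2**32
Socket = Tuple[str, int]          # (address, port)


@dataclass
class TCB:
    state: str
    remote: Optional[Socket] = None
    snd_una: int = 0
    snd_nxt: int = 0


def _send_syn(tcb: TCB, iss: int, out: List[str]) -> None:
    """Emit <SEQ=ISS><CTL=SYN>, set SND.UNA and SND.NXT, enter SYN-SENT."""
    out.append(f"<SEQ={iss}><CTL=SYN>")
    tcb.snd_una = iss
    tcb.snd_nxt = (iss + 1) % SEQ_MOD
    tcb.state = "SYN-SENT"


def open_call(tcb: Optional[TCB], active: bool, remote: Optional[Socket],
              iss: int, out: List[str]) -> Tuple[Optional[TCB], str]:
    """Process an OPEN call; tcb=None models CLOSED (no TCB exists)."""
    if tcb is None:                                   # CLOSED STATE
        if not active:
            return TCB("LISTEN", remote), "ok"
        if remote is None:
            return None, "error: remote socket unspecified"
        tcb = TCB("CLOSED", remote)
        _send_syn(tcb, iss, out)
        return tcb, "ok"
    if tcb.state == "LISTEN" and active:              # LISTEN STATE
        if remote is None:
            return tcb, "error: remote socket unspecified"
        tcb.remote = remote                           # passive becomes active
        _send_syn(tcb, iss, out)
        return tcb, "ok"
    # All other states (and, by assumption, a passive OPEN in LISTEN).
    return tcb, "error: connection already exists"
```

The out list stands in for the lower-level send interface, so the segments an OPEN would emit can be inspected directly.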
3.10.2. SEND Call | 3.10.2. SEND Call | |||
CLOSED STATE (i.e., TCB does not exist) | CLOSED STATE (i.e., TCB does not exist) | |||
- If the user does not have access to such a connection, then | * If the user does not have access to such a connection, then return | |||
return "error: connection illegal for this process". | "error: connection illegal for this process". | |||
- Otherwise, return "error: connection does not exist". | * Otherwise, return "error: connection does not exist". | |||
LISTEN STATE | LISTEN STATE | |||
- If the remote socket is specified, then change the connection | * If the remote socket is specified, then change the connection from | |||
from passive to active, select an ISS. Send a SYN segment, set | passive to active, select an ISS. Send a SYN segment, set SND.UNA | |||
SND.UNA to ISS, SND.NXT to ISS+1. Enter SYN-SENT state. Data | to ISS, SND.NXT to ISS+1. Enter SYN-SENT state. Data associated | |||
associated with SEND may be sent with SYN segment or queued for | with SEND may be sent with SYN segment or queued for transmission | |||
transmission after entering ESTABLISHED state. The urgent bit | after entering ESTABLISHED state. The urgent bit if requested in | |||
if requested in the command must be sent with the data segments | the command must be sent with the data segments sent as a result | |||
sent as a result of this command. If there is no room to queue | of this command. If there is no room to queue the request, | |||
the request, respond with "error: insufficient resources". If | respond with "error: insufficient resources". If the remote | |||
the remote socket was not specified, then return "error: remote | socket was not specified, then return "error: remote socket | |||
socket unspecified". | unspecified". | |||
SYN-SENT STATE | SYN-SENT STATE | |||
SYN-RECEIVED STATE | SYN-RECEIVED STATE | |||
- Queue the data for transmission after entering ESTABLISHED | * Queue the data for transmission after entering ESTABLISHED state. | |||
state. If no space to queue, respond with "error: insufficient | If no space to queue, respond with "error: insufficient | |||
resources". | resources". | |||
ESTABLISHED STATE | ESTABLISHED STATE | |||
CLOSE-WAIT STATE | CLOSE-WAIT STATE | |||
- Segmentize the buffer and send it with a piggybacked | * Segmentize the buffer and send it with a piggybacked | |||
acknowledgment (acknowledgment value = RCV.NXT). If there is | acknowledgment (acknowledgment value = RCV.NXT). If there is | |||
insufficient space to remember this buffer, simply return | insufficient space to remember this buffer, simply return "error: | |||
"error: insufficient resources". | insufficient resources". | |||
- If the urgent flag is set, then SND.UP <- SND.NXT and set the | * If the URGENT flag is set, then SND.UP <- SND.NXT and set the | |||
urgent pointer in the outgoing segments. | urgent pointer in the outgoing segments. | |||
FIN-WAIT-1 STATE | FIN-WAIT-1 STATE | |||
FIN-WAIT-2 STATE | FIN-WAIT-2 STATE | |||
CLOSING STATE | CLOSING STATE | |||
LAST-ACK STATE | LAST-ACK STATE | |||
TIME-WAIT STATE | TIME-WAIT STATE | |||
- Return "error: connection closing" and do not service request. | * Return "error: connection closing" and do not service request. | |||
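The SEND rules above amount to a dispatch on the connection state. The following is a minimal Python sketch under stated assumptions: `send_call`, its flag parameters, and the action strings are hypothetical names chosen for illustration, and only the quoted error strings come from the text. The order of the two LISTEN checks is also an assumption, since the text does not rank them.

```python
# States in which a SEND is refused because the local side has already closed.
CLOSING_STATES = {"FIN-WAIT-1", "FIN-WAIT-2", "CLOSING", "LAST-ACK", "TIME-WAIT"}


def send_call(state, has_access=True, remote_specified=True, queue_full=False):
    """Return the error string or a short action description for a SEND call."""
    if state == "CLOSED":
        if not has_access:
            return "error: connection illegal for this process"
        return "error: connection does not exist"
    if state == "LISTEN":
        # Assumption: the unspecified-remote check is made before the queue check.
        if not remote_specified:
            return "error: remote socket unspecified"
        if queue_full:
            return "error: insufficient resources"
        # Passive -> active: select ISS, SND.UNA = ISS, SND.NXT = ISS+1.
        return "send SYN; enter SYN-SENT"
    if state in ("SYN-SENT", "SYN-RECEIVED"):
        if queue_full:
            return "error: insufficient resources"
        return "queue data until ESTABLISHED"
    if state in ("ESTABLISHED", "CLOSE-WAIT"):
        if queue_full:
            return "error: insufficient resources"
        return "segmentize and send with ACK=RCV.NXT"
    if state in CLOSING_STATES:
        return "error: connection closing"
    raise ValueError(f"unknown state: {state}")
```

A real stack would return the error to the caller and perform the action against the TCB; the strings here only make the mapping visible.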
3.10.3. RECEIVE Call | 3.10.3. RECEIVE Call | |||
CLOSED STATE (i.e., TCB does not exist) | CLOSED STATE (i.e., TCB does not exist) | |||
- If the user does not have access to such a connection, return | * If the user does not have access to such a connection, return | |||
"error: connection illegal for this process". | "error: connection illegal for this process". | |||
- Otherwise return "error: connection does not exist". | * Otherwise, return "error: connection does not exist". | |||
LISTEN STATE | LISTEN STATE | |||
SYN-SENT STATE | ||||
SYN-RECEIVED STATE | SYN-SENT STATE | |||
- Queue for processing after entering ESTABLISHED state. If | SYN-RECEIVED STATE | |||
there is no room to queue this request, respond with "error: | ||||
insufficient resources". | ||||
ESTABLISHED STATE | * Queue for processing after entering ESTABLISHED state. If there | |||
is no room to queue this request, respond with "error: | ||||
insufficient resources". | ||||
FIN-WAIT-1 STATE | ESTABLISHED STATE | |||
FIN-WAIT-2 STATE | FIN-WAIT-1 STATE | |||
- If insufficient incoming segments are queued to satisfy the | FIN-WAIT-2 STATE | |||
request, queue the request. If there is no queue space to | ||||
remember the RECEIVE, respond with "error: insufficient | ||||
resources". | ||||
- Reassemble queued incoming segments into receive buffer and | * If insufficient incoming segments are queued to satisfy the | |||
return to user. Mark "push seen" (PUSH) if this is the case. | request, queue the request. If there is no queue space to | |||
remember the RECEIVE, respond with "error: insufficient | ||||
resources". | ||||
- If RCV.UP is in advance of the data currently being passed to | * Reassemble queued incoming segments into receive buffer and return | |||
the user notify the user of the presence of urgent data. | to user. Mark "push seen" (PUSH) if this is the case. | |||
- When the TCP endpoint takes responsibility for delivering data | * If RCV.UP is in advance of the data currently being passed to the | |||
to the user that fact must be communicated to the sender via an | user, notify the user of the presence of urgent data. | |||
acknowledgment. The formation of such an acknowledgment is | ||||
described below in the discussion of processing an incoming | ||||
segment. | ||||
CLOSE-WAIT STATE | * When the TCP endpoint takes responsibility for delivering data to | |||
the user, that fact must be communicated to the sender via an | ||||
acknowledgment. The formation of such an acknowledgment is | ||||
described below in the discussion of processing an incoming | ||||
segment. | ||||
- Since the remote side has already sent FIN, RECEIVEs must be | CLOSE-WAIT STATE | |||
satisfied by data already on hand, but not yet delivered to the | ||||
user. If no text is awaiting delivery, the RECEIVE will get an | ||||
"error: connection closing" response. Otherwise, any remaining | ||||
data can be used to satisfy the RECEIVE. | ||||
CLOSING STATE | * Since the remote side has already sent FIN, RECEIVEs must be | |||
satisfied by data already on hand, but not yet delivered to the | ||||
user. If no text is awaiting delivery, the RECEIVE will get an | ||||
"error: connection closing" response. Otherwise, any remaining | ||||
data can be used to satisfy the RECEIVE. | ||||
LAST-ACK STATE | CLOSING STATE | |||
TIME-WAIT STATE | LAST-ACK STATE | |||
- Return "error: connection closing". | TIME-WAIT STATE | |||
* Return "error: connection closing". | ||||
3.10.4. CLOSE Call | 3.10.4. CLOSE Call | |||
CLOSED STATE (i.e., TCB does not exist) | CLOSED STATE (i.e., TCB does not exist) | |||
- If the user does not have access to such a connection, return | * If the user does not have access to such a connection, return | |||
"error: connection illegal for this process". | "error: connection illegal for this process". | |||
- Otherwise, return "error: connection does not exist". | * Otherwise, return "error: connection does not exist". | |||
LISTEN STATE | LISTEN STATE | |||
- Any outstanding RECEIVEs are returned with "error: closing" | * Any outstanding RECEIVEs are returned with "error: closing" | |||
responses. Delete TCB, enter CLOSED state, and return. | responses. Delete TCB, enter CLOSED state, and return. | |||
SYN-SENT STATE | SYN-SENT STATE | |||
- Delete the TCB and return "error: closing" responses to any | * Delete the TCB and return "error: closing" responses to any queued | |||
queued SENDs, or RECEIVEs. | SENDs, or RECEIVEs. | |||
SYN-RECEIVED STATE | SYN-RECEIVED STATE | |||
- If no SENDs have been issued and there is no pending data to | * If no SENDs have been issued and there is no pending data to send, | |||
send, then form a FIN segment and send it, and enter FIN-WAIT-1 | then form a FIN segment and send it, and enter FIN-WAIT-1 state; | |||
state; otherwise queue for processing after entering | otherwise, queue for processing after entering ESTABLISHED state. | |||
ESTABLISHED state. | ||||
ESTABLISHED STATE | ESTABLISHED STATE | |||
- Queue this until all preceding SENDs have been segmentized, | * Queue this until all preceding SENDs have been segmentized, then | |||
then form a FIN segment and send it. In any case, enter FIN- | form a FIN segment and send it. In any case, enter FIN-WAIT-1 | |||
WAIT-1 state. | state. | |||
FIN-WAIT-1 STATE | FIN-WAIT-1 STATE | |||
FIN-WAIT-2 STATE | FIN-WAIT-2 STATE | |||
- Strictly speaking, this is an error and should receive an | * Strictly speaking, this is an error and should receive an "error: | |||
"error: connection closing" response. An "ok" response would | connection closing" response. An "ok" response would be | |||
be acceptable, too, as long as a second FIN is not emitted (the | acceptable, too, as long as a second FIN is not emitted (the first | |||
first FIN may be retransmitted though). | FIN may be retransmitted, though). | |||
CLOSE-WAIT STATE | CLOSE-WAIT STATE | |||
- Queue this request until all preceding SENDs have been | * Queue this request until all preceding SENDs have been | |||
segmentized; then send a FIN segment, enter LAST-ACK state. | segmentized; then send a FIN segment, enter LAST-ACK state. | |||
CLOSING STATE | CLOSING STATE | |||
LAST-ACK STATE | ||||
TIME-WAIT STATE | LAST-ACK STATE | |||
- Respond with "error: connection closing". | TIME-WAIT STATE | |||
* Respond with "error: connection closing". | ||||
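The CLOSE rules can be read as a small state-transition table. The sketch below is illustrative only: `close_call`, the `sends_pending` flag, and the note strings are hypothetical, and treating "delete the TCB" as a move to CLOSED relies on the text's own definition of CLOSED as "TCB does not exist".

```python
def close_call(state, sends_pending=False, has_access=True):
    """Return (next_state, note) for a CLOSE call issued in `state`."""
    if state == "CLOSED":
        if not has_access:
            return state, "error: connection illegal for this process"
        return state, "error: connection does not exist"
    if state in ("LISTEN", "SYN-SENT"):
        # The TCB is deleted; queued SENDs/RECEIVEs get "error: closing".
        return "CLOSED", "TCB deleted"
    if state == "SYN-RECEIVED":
        if sends_pending:
            return state, "CLOSE queued until ESTABLISHED"
        return "FIN-WAIT-1", "FIN sent"
    if state == "ESTABLISHED":
        # FIN-WAIT-1 is entered in any case; the FIN follows preceding SENDs.
        return "FIN-WAIT-1", "FIN queued behind preceding SENDs"
    if state == "CLOSE-WAIT":
        return "LAST-ACK", "FIN queued behind preceding SENDs"
    if state in ("FIN-WAIT-1", "FIN-WAIT-2", "CLOSING", "LAST-ACK", "TIME-WAIT"):
        # For FIN-WAIT-1/2 the text also tolerates "ok", provided no second FIN is sent.
        return state, "error: connection closing"
    raise ValueError(f"unknown state: {state}")
```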
3.10.5. ABORT Call | 3.10.5. ABORT Call | |||
CLOSED STATE (i.e., TCB does not exist) | CLOSED STATE (i.e., TCB does not exist) | |||
- If the user should not have access to such a connection, return | * If the user should not have access to such a connection, return | |||
"error: connection illegal for this process". | "error: connection illegal for this process". | |||
- Otherwise return "error: connection does not exist". | * Otherwise, return "error: connection does not exist". | |||
LISTEN STATE | LISTEN STATE | |||
- Any outstanding RECEIVEs should be returned with "error: | * Any outstanding RECEIVEs should be returned with "error: | |||
connection reset" responses. Delete TCB, enter CLOSED state, | connection reset" responses. Delete TCB, enter CLOSED state, and | |||
and return. | return. | |||
SYN-SENT STATE | SYN-SENT STATE | |||
- All queued SENDs and RECEIVEs should be given "connection | * All queued SENDs and RECEIVEs should be given "connection reset" | |||
reset" notification, delete the TCB, enter CLOSED state, and | notification. Delete the TCB, enter CLOSED state, and return. | |||
return. | ||||
SYN-RECEIVED STATE | SYN-RECEIVED STATE | |||
ESTABLISHED STATE | ESTABLISHED STATE | |||
FIN-WAIT-1 STATE | FIN-WAIT-1 STATE | |||
FIN-WAIT-2 STATE | FIN-WAIT-2 STATE | |||
CLOSE-WAIT STATE | CLOSE-WAIT STATE | |||
- Send a reset segment: | * Send a reset segment: | |||
o <SEQ=SND.NXT><CTL=RST> | <SEQ=SND.NXT><CTL=RST> | |||
- All queued SENDs and RECEIVEs should be given "connection | * All queued SENDs and RECEIVEs should be given "connection reset" | |||
reset" notification; all segments queued for transmission | notification; all segments queued for transmission (except for the | |||
(except for the RST formed above) or retransmission should be | RST formed above) or retransmission should be flushed. Delete the | |||
flushed, delete the TCB, enter CLOSED state, and return. | TCB, enter CLOSED state, and return. | |||
CLOSING STATE LAST-ACK STATE TIME-WAIT STATE | CLOSING STATE | |||
- Respond with "ok" and delete the TCB, enter CLOSED state, and | ||||
return. | LAST-ACK STATE | |||
TIME-WAIT STATE | ||||
* Respond with "ok" and delete the TCB, enter CLOSED state, and | ||||
return. | ||||
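The ABORT rules differ from CLOSE mainly in whether a reset segment is emitted. A minimal sketch, assuming a hypothetical `abort_call` helper that reports whether `<SEQ=SND.NXT><CTL=RST>` should be sent, the resulting state, and the response or notification text; the tuple layout is an illustration, not part of the specification:

```python
# States in which ABORT sends <SEQ=SND.NXT><CTL=RST> before deleting the TCB.
RST_STATES = {"SYN-RECEIVED", "ESTABLISHED", "FIN-WAIT-1", "FIN-WAIT-2", "CLOSE-WAIT"}


def abort_call(state, has_access=True):
    """Return (send_rst, next_state, response) for an ABORT call."""
    if state == "CLOSED":
        if not has_access:
            return False, "CLOSED", "error: connection illegal for this process"
        return False, "CLOSED", "error: connection does not exist"
    if state in ("LISTEN", "SYN-SENT"):
        # No RST is sent; queued calls are told the connection was reset.
        return False, "CLOSED", "connection reset"
    if state in RST_STATES:
        # Queued segments other than the RST itself are flushed.
        return True, "CLOSED", "connection reset"
    if state in ("CLOSING", "LAST-ACK", "TIME-WAIT"):
        return False, "CLOSED", "ok"
    raise ValueError(f"unknown state: {state}")
```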
3.10.6. STATUS Call | 3.10.6. STATUS Call | |||
CLOSED STATE (i.e., TCB does not exist) | CLOSED STATE (i.e., TCB does not exist) | |||
- If the user should not have access to such a connection, return | * If the user should not have access to such a connection, return | |||
"error: connection illegal for this process". | "error: connection illegal for this process". | |||
- Otherwise return "error: connection does not exist". | * Otherwise, return "error: connection does not exist". | |||
LISTEN STATE | LISTEN STATE | |||
- Return "state = LISTEN", and the TCB pointer. | * Return "state = LISTEN" and the TCB pointer. | |||
SYN-SENT STATE | SYN-SENT STATE | |||
- Return "state = SYN-SENT", and the TCB pointer. | * Return "state = SYN-SENT" and the TCB pointer. | |||
SYN-RECEIVED STATE | SYN-RECEIVED STATE | |||
- Return "state = SYN-RECEIVED", and the TCB pointer. | * Return "state = SYN-RECEIVED" and the TCB pointer. | |||
ESTABLISHED STATE | ESTABLISHED STATE | |||
- Return "state = ESTABLISHED", and the TCB pointer. | * Return "state = ESTABLISHED" and the TCB pointer. | |||
FIN-WAIT-1 STATE | FIN-WAIT-1 STATE | |||
- Return "state = FIN-WAIT-1", and the TCB pointer. | * Return "state = FIN-WAIT-1" and the TCB pointer. | |||
FIN-WAIT-2 STATE | FIN-WAIT-2 STATE | |||
- Return "state = FIN-WAIT-2", and the TCB pointer. | * Return "state = FIN-WAIT-2" and the TCB pointer. | |||
CLOSE-WAIT STATE | CLOSE-WAIT STATE | |||
- Return "state = CLOSE-WAIT", and the TCB pointer. | * Return "state = CLOSE-WAIT" and the TCB pointer. | |||
CLOSING STATE | CLOSING STATE | |||
- Return "state = CLOSING", and the TCB pointer. | * Return "state = CLOSING" and the TCB pointer. | |||
LAST-ACK STATE | LAST-ACK STATE | |||
- Return "state = LAST-ACK", and the TCB pointer. | * Return "state = LAST-ACK" and the TCB pointer. | |||
TIME-WAIT STATE | TIME-WAIT STATE | |||
- Return "state = TIME-WAIT", and the TCB pointer. | * Return "state = TIME-WAIT" and the TCB pointer. | |||
3.10.7. SEGMENT ARRIVES | 3.10.7. SEGMENT ARRIVES | |||
3.10.7.1. CLOSED State | 3.10.7.1. CLOSED STATE | |||
If the state is CLOSED (i.e., TCB does not exist) then | If the state is CLOSED (i.e., TCB does not exist), then | |||
all data in the incoming segment is discarded. An incoming | all data in the incoming segment is discarded. An incoming | |||
segment containing a RST is discarded. An incoming segment not | segment containing a RST is discarded. An incoming segment not | |||
containing a RST causes a RST to be sent in response. The | containing a RST causes a RST to be sent in response. The | |||
acknowledgment and sequence field values are selected to make the | acknowledgment and sequence field values are selected to make the | |||
reset sequence acceptable to the TCP endpoint that sent the | reset sequence acceptable to the TCP endpoint that sent the | |||
offending segment. | offending segment. | |||
If the ACK bit is off, sequence number zero is used, | If the ACK bit is off, sequence number zero is used, | |||
- <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> | <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> | |||
If the ACK bit is on, | If the ACK bit is on, | |||
- <SEQ=SEG.ACK><CTL=RST> | <SEQ=SEG.ACK><CTL=RST> | |||
Return. | Return. | |||
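The reset formed in the CLOSED state can be sketched as follows. This is an illustration under stated assumptions: `reset_for_closed` and the dictionary representation of a segment are hypothetical, and `seg_len` is taken to be SEG.LEN as the specification defines it, that is, counting SYN and FIN in addition to data octets. Sequence arithmetic is done modulo 2**32.

```python
MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


def reset_for_closed(seg_seq, seg_len, ack_bit, seg_ack=0, rst_bit=False):
    """Return the RST to send for a segment arriving in CLOSED, or None."""
    if rst_bit:
        # An incoming segment containing a RST is discarded silently.
        return None
    if not ack_bit:
        # <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>
        return {"SEQ": 0, "ACK": (seg_seq + seg_len) % MOD, "CTL": ("RST", "ACK")}
    # <SEQ=SEG.ACK><CTL=RST>
    return {"SEQ": seg_ack, "CTL": ("RST",)}
```

For example, a bare SYN with SEG.SEQ=100 has SEG.LEN=1, so the reply acknowledges 101.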
3.10.7.2. LISTEN State | 3.10.7.2. LISTEN STATE | |||
If the state is LISTEN then | If the state is LISTEN, then | |||
first check for an RST | First, check for a RST: | |||
- An incoming RST segment could not be valid, since it could not | - An incoming RST segment could not be valid since it could not | |||
have been sent in response to anything sent by this incarnation | have been sent in response to anything sent by this incarnation | |||
of the connection. An incoming RST should be ignored. Return. | of the connection. An incoming RST should be ignored. Return. | |||
second check for an ACK | Second, check for an ACK: | |||
- Any acknowledgment is bad if it arrives on a connection still | - Any acknowledgment is bad if it arrives on a connection still | |||
in the LISTEN state. An acceptable reset segment should be | in the LISTEN state. An acceptable reset segment should be | |||
formed for any arriving ACK-bearing segment. The RST should be | formed for any arriving ACK-bearing segment. The RST should be | |||
formatted as follows: | formatted as follows: | |||
o <SEQ=SEG.ACK><CTL=RST> | <SEQ=SEG.ACK><CTL=RST> | |||
- Return. | - Return. | |||
third check for a SYN | Third, check for a SYN: | |||
- If the SYN bit is set, check the security. If the security/ | - If the SYN bit is set, check the security. If the security/ | |||
compartment on the incoming segment does not exactly match the | compartment on the incoming segment does not exactly match the | |||
security/compartment in the TCB then send a reset and return. | security/compartment in the TCB, then send a reset and return. | |||
o <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> | <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> | |||
- Set RCV.NXT to SEG.SEQ+1, IRS is set to SEG.SEQ and any other | - Set RCV.NXT to SEG.SEQ+1, IRS is set to SEG.SEQ, and any other | |||
control or text should be queued for processing later. ISS | control or text should be queued for processing later. ISS | |||
should be selected and a SYN segment sent of the form: | should be selected and a SYN segment sent of the form: | |||
o <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK> | <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK> | |||
- SND.NXT is set to ISS+1 and SND.UNA to ISS. The connection | - SND.NXT is set to ISS+1 and SND.UNA to ISS. The connection | |||
state should be changed to SYN-RECEIVED. Note that any other | state should be changed to SYN-RECEIVED. Note that any other | |||
incoming control or data (combined with SYN) will be processed | incoming control or data (combined with SYN) will be processed | |||
in the SYN-RECEIVED state, but processing of SYN and ACK should | in the SYN-RECEIVED state, but processing of SYN and ACK should | |||
not be repeated. If the listen was not fully specified (i.e., | not be repeated. If the listen was not fully specified (i.e., | |||
the remote socket was not fully specified), then the | the remote socket was not fully specified), then the | |||
unspecified fields should be filled in now. | unspecified fields should be filled in now. | |||
fourth other data or control | Fourth, other data or control: | |||
- This should not be reached. Drop the segment and return. Any | - This should not be reached. Drop the segment and return. Any | |||
other control or data-bearing segment (not containing SYN) must | other control or data-bearing segment (not containing SYN) must | |||
have an ACK and thus would have been discarded by the ACK | have an ACK and thus would have been discarded by the ACK | |||
processing in the second step, unless it was first discarded by | processing in the second step, unless it was first discarded by | |||
RST checking in the first step. | RST checking in the first step. | |||
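The four LISTEN checks run in a fixed order, which the following sketch makes explicit. It is a simplified illustration: `on_segment_in_listen`, its parameters, and the dictionary shapes are hypothetical, the security/compartment comparison is reduced to a boolean, and queuing of any data carried on the SYN is omitted.

```python
MOD = 2**32  # sequence numbers wrap at 32 bits


def on_segment_in_listen(rst, ack, syn, seg_seq, seg_len, iss,
                         seg_ack=0, security_ok=True):
    """Return (next_state, reply_segment_or_None, tcb_updates)."""
    # First: a RST cannot be valid for this incarnation; ignore it.
    if rst:
        return "LISTEN", None, {}
    # Second: any ACK is bad in LISTEN; answer with <SEQ=SEG.ACK><CTL=RST>.
    if ack:
        return "LISTEN", {"SEQ": seg_ack, "CTL": ("RST",)}, {}
    # Third: a SYN either fails the security check or starts the handshake.
    if syn:
        if not security_ok:
            reply = {"SEQ": 0, "ACK": (seg_seq + seg_len) % MOD,
                     "CTL": ("RST", "ACK")}
            return "LISTEN", reply, {}
        tcb = {"IRS": seg_seq, "RCV.NXT": (seg_seq + 1) % MOD,
               "SND.UNA": iss, "SND.NXT": (iss + 1) % MOD}
        reply = {"SEQ": iss, "ACK": tcb["RCV.NXT"], "CTL": ("SYN", "ACK")}
        return "SYN-RECEIVED", reply, tcb
    # Fourth: should not be reached; drop the segment.
    return "LISTEN", None, {}
```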
3.10.7.3. SYN-SENT State | 3.10.7.3. SYN-SENT STATE | |||
If the state is SYN-SENT then | If the state is SYN-SENT, then | |||
first check the ACK bit | First, check the ACK bit: | |||
- If the ACK bit is set | - If the ACK bit is set, | |||
o If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send a reset | o If SEG.ACK =< ISS or SEG.ACK > SND.NXT, send a reset (unless | |||
(unless the RST bit is set, if so drop the segment and | the RST bit is set, if so drop the segment and return) | |||
return) | ||||
+ <SEQ=SEG.ACK><CTL=RST> | <SEQ=SEG.ACK><CTL=RST> | |||
o and discard the segment. Return. | o and discard the segment. Return. | |||
o If SND.UNA < SEG.ACK =< SND.NXT then the ACK is acceptable. | o If SND.UNA < SEG.ACK =< SND.NXT, then the ACK is acceptable. | |||
Some deployed TCP code has used the check SEG.ACK == SND.NXT | Some deployed TCP code has used the check SEG.ACK == SND.NXT | |||
(using "==" rather than "=<", but this is not appropriate | (using "==" rather than "=<"), but this is not appropriate | |||
when the stack is capable of sending data on the SYN, | when the stack is capable of sending data on the SYN because | |||
because the TCP peer may not accept and acknowledge all of | the TCP peer may not accept and acknowledge all of the data | |||
the data on the SYN. | on the SYN. | |||
second check the RST bit | Second, check the RST bit: | |||
- If the RST bit is set | - If the RST bit is set, | |||
o A potential blind reset attack is described in RFC 5961 [9]. | o A potential blind reset attack is described in RFC 5961 [9]. | |||
The mitigation described in that document has specific | The mitigation described in that document has specific | |||
applicability explained therein, and is not a substitute for | applicability explained therein, and is not a substitute for | |||
cryptographic protection (e.g. IPsec or TCP-AO). A TCP | cryptographic protection (e.g., IPsec or TCP-AO). A TCP | |||
implementation that supports the RFC 5961 mitigation SHOULD | implementation that supports the mitigation described in RFC | |||
first check that the sequence number exactly matches RCV.NXT | 5961 SHOULD first check that the sequence number exactly | |||
prior to executing the action in the next paragraph. | matches RCV.NXT prior to executing the action in the next | |||
paragraph. | ||||
o If the ACK was acceptable then signal the user "error: | o If the ACK was acceptable, then signal to the user "error: | |||
connection reset", drop the segment, enter CLOSED state, | connection reset", drop the segment, enter CLOSED state, | |||
delete TCB, and return. Otherwise (no ACK), drop the | delete TCB, and return. Otherwise (no ACK), drop the | |||
segment and return. | segment and return. | |||
third check the security | Third, check the security: | |||
- If the security/compartment in the segment does not exactly | - If the security/compartment in the segment does not exactly | |||
match the security/compartment in the TCB, send a reset | match the security/compartment in the TCB, send a reset: | |||
o If there is an ACK | o If there is an ACK, | |||
+ <SEQ=SEG.ACK><CTL=RST> | <SEQ=SEG.ACK><CTL=RST> | |||
o Otherwise | o Otherwise, | |||
+ <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> | <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> | |||
- If a reset was sent, discard the segment and return. | - If a reset was sent, discard the segment and return. | |||
fourth check the SYN bit | Fourth, check the SYN bit: | |||
- This step should be reached only if the ACK is ok, or there is | - This step should be reached only if the ACK is ok, or there is | |||
no ACK, and the segment did not contain a RST. | no ACK, and the segment did not contain a RST. | |||
- If the SYN bit is on and the security/compartment is acceptable | - If the SYN bit is on and the security/compartment is | |||
then, RCV.NXT is set to SEG.SEQ+1, IRS is set to SEG.SEQ. | acceptable, then RCV.NXT is set to SEG.SEQ+1, IRS is set to | |||
SND.UNA should be advanced to equal SEG.ACK (if there is an | SEG.SEQ. SND.UNA should be advanced to equal SEG.ACK (if there | |||
ACK), and any segments on the retransmission queue that are | is an ACK), and any segments on the retransmission queue that | |||
thereby acknowledged should be removed. | are thereby acknowledged should be removed. | |||
- If SND.UNA > ISS (our SYN has been ACKed), change the | - If SND.UNA > ISS (our SYN has been ACKed), change the | |||
connection state to ESTABLISHED, form an ACK segment | connection state to ESTABLISHED, form an ACK segment | |||
o <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | |||
- and send it. Data or controls that were queued for | - and send it. Data or controls that were queued for | |||
transmission MAY be included. Some TCP implementations | transmission MAY be included. Some TCP implementations | |||
suppress sending this segment when the received segment | suppress sending this segment when the received segment | |||
contains data that will anyways generate an acknowledgement in | contains data that will anyways generate an acknowledgment in | |||
the later processing steps, saving this extra acknowledgement | the later processing steps, saving this extra acknowledgment of | |||
of the SYN from being sent. If there are other controls or | the SYN from being sent. If there are other controls or text | |||
text in the segment then continue processing at the sixth step | in the segment, then continue processing at the sixth step | |||
under Section 3.10.7.4 where the URG bit is checked, otherwise | under Section 3.10.7.4 where the URG bit is checked; otherwise, | |||
return. | return. | |||
- Otherwise enter SYN-RECEIVED, form a SYN,ACK segment | - Otherwise, enter SYN-RECEIVED, form a SYN,ACK segment | |||
o <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK> | <SEQ=ISS><ACK=RCV.NXT><CTL=SYN,ACK> | |||
- and send it. Set the variables: | - and send it. Set the variables: | |||
o SND.WND <- SEG.WND | SND.WND <- SEG.WND | |||
SND.WL1 <- SEG.SEQ | SND.WL1 <- SEG.SEQ | |||
SND.WL2 <- SEG.ACK | SND.WL2 <- SEG.ACK | |||
If there are other controls or text in the segment, queue them | If there are other controls or text in the segment, queue them | |||
for processing after the ESTABLISHED state has been reached, | for processing after the ESTABLISHED state has been reached, | |||
return. | return. | |||
- Note that it is legal to send and receive application data on | - Note that it is legal to send and receive application data on | |||
SYN segments (this is the "text in the segment" mentioned | SYN segments (this is the "text in the segment" mentioned | |||
above. There has been significant misinformation and | above). There has been significant misinformation and | |||
misunderstanding of this topic historically. Some firewalls | misunderstanding of this topic historically. Some firewalls | |||
and security devices consider this suspicious. However, the | and security devices consider this suspicious. However, the | |||
capability was used in T/TCP [22] and is used in TCP Fast Open | capability was used in T/TCP [21] and is used in TCP Fast Open | |||
(TFO) [49], so is important for implementations and network | (TFO) [48], so is important for implementations and network | |||
devices to permit. | devices to permit. | |||
fifth, if neither of the SYN or RST bits is set then drop the | Fifth, if neither of the SYN or RST bits is set, then drop the | |||
segment and return. | segment and return. | |||
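The ACK check in the SYN-SENT state, including the "=<" versus "==" point noted above, can be sketched with wraparound-safe comparisons. The helper names (`seq_lt`, `seq_leq`, `syn_sent_ack_check`) and the returned strings are hypothetical; the comparison helpers assume the two values are less than 2**31 apart, which is the usual convention for 32-bit sequence arithmetic.

```python
MOD = 2**32
HALF = 2**31


def seq_leq(a, b):
    """a =< b in modulo-2**32 sequence space."""
    return ((b - a) % MOD) < HALF


def seq_lt(a, b):
    """a < b in modulo-2**32 sequence space."""
    return a != b and seq_leq(a, b)


def syn_sent_ack_check(seg_ack, iss, snd_una, snd_nxt, rst_bit=False):
    """Classify the ACK field of a segment received in SYN-SENT."""
    if seq_leq(seg_ack, iss) or seq_lt(snd_nxt, seg_ack):
        # Unacceptable ACK: answer with <SEQ=SEG.ACK><CTL=RST> unless RST is set.
        return "drop" if rst_bit else "send reset"
    if seq_lt(snd_una, seg_ack) and seq_leq(seg_ack, snd_nxt):
        return "acceptable"
    return "drop"
```

With data sent on the SYN (say ISS=1000 and SND.NXT=1101), a peer that acknowledges only the SYN returns SEG.ACK=1001. The range test accepts it, whereas a strict `SEG.ACK == SND.NXT` test would not.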
3.10.7.4. Other States | 3.10.7.4. Other States | |||
Otherwise, | Otherwise, | |||
first check sequence number | First, check sequence number: | |||
- SYN-RECEIVED STATE | ||||
ESTABLISHED STATE | ||||
FIN-WAIT-1 STATE | - SYN-RECEIVED STATE | |||
FIN-WAIT-2 STATE | - ESTABLISHED STATE | |||
CLOSE-WAIT STATE | - FIN-WAIT-1 STATE | |||
CLOSING STATE | - FIN-WAIT-2 STATE | |||
LAST-ACK STATE | - CLOSE-WAIT STATE | |||
TIME-WAIT STATE | - CLOSING STATE | |||
o Segments are processed in sequence. Initial tests on | - LAST-ACK STATE | |||
arrival are used to discard old duplicates, but further | ||||
processing is done in SEG.SEQ order. If a segment's | ||||
contents straddle the boundary between old and new, only the | ||||
new parts are processed. | ||||
o In general, the processing of received segments MUST be | - TIME-WAIT STATE | |||
implemented to aggregate ACK segments whenever possible | ||||
(MUST-58). For example, if the TCP endpoint is processing a | ||||
series of queued segments, it MUST process them all before | ||||
sending any ACK segments (MUST-59). | ||||
o There are four cases for the acceptability test for an | o Segments are processed in sequence. Initial tests on | |||
incoming segment: | arrival are used to discard old duplicates, but further | |||
processing is done in SEG.SEQ order. If a segment's | ||||
contents straddle the boundary between old and new, only the | ||||
new parts are processed. | ||||
Segment Receive Test | o In general, the processing of received segments MUST be | |||
Length Window | implemented to aggregate ACK segments whenever possible | |||
------- ------- ------------------------------------------- | (MUST-58). For example, if the TCP endpoint is processing a | |||
series of queued segments, it MUST process them all before | ||||
sending any ACK segments (MUST-59). | ||||
0 0 SEG.SEQ = RCV.NXT | o There are four cases for the acceptability test for an | |||
incoming segment: | ||||
0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | +=========+=========+======================================+ | |||
| Segment | Receive | Test | | ||||
| Length | Window | | | ||||
+=========+=========+======================================+ | ||||
| 0 | 0 | SEG.SEQ = RCV.NXT | | ||||
+---------+---------+--------------------------------------+ | ||||
| 0 | >0 | RCV.NXT =< SEG.SEQ < | | ||||
| | | RCV.NXT+RCV.WND | | ||||
+---------+---------+--------------------------------------+ | ||||
| >0 | 0 | not acceptable | | ||||
+---------+---------+--------------------------------------+ | ||||
| >0 | >0 | RCV.NXT =< SEG.SEQ < | | ||||
| | | RCV.NXT+RCV.WND | | ||||
| | | | | ||||
| | | or | | ||||
| | | | | ||||
| | | RCV.NXT =< SEG.SEQ+SEG.LEN-1 | | ||||
| | | < RCV.NXT+RCV.WND | | ||||
+---------+---------+--------------------------------------+ | ||||
>0 0 not acceptable | Table 6: Segment Acceptability Tests | |||
>0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND | o In implementing sequence number validation as described | |||
or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND | here, please note Appendix A.2. | |||
o In implementing sequence number validation as described | o If the RCV.WND is zero, no segments will be acceptable, but | |||
here, please note Appendix A.2. | special allowance should be made to accept valid ACKs, URGs, | |||
and RSTs. | ||||
o If the RCV.WND is zero, no segments will be acceptable, but | o If an incoming segment is not acceptable, an acknowledgment | |||
special allowance should be made to accept valid ACKs, URGs | should be sent in reply (unless the RST bit is set, if so | |||
and RSTs. | drop the segment and return): | |||
o If an incoming segment is not acceptable, an acknowledgment | <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | |||
should be sent in reply (unless the RST bit is set, if so | ||||
drop the segment and return): | ||||
+ <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | o After sending the acknowledgment, drop the unacceptable | |||
segment and return. | ||||
o After sending the acknowledgment, drop the unacceptable | o Note that for the TIME-WAIT state, there is an improved | |||
segment and return. | algorithm described in [40] for handling incoming SYN | |||
segments that utilizes timestamps rather than relying on the | ||||
sequence number check described here. When the improved | ||||
algorithm is implemented, the logic above is not applicable | ||||
for incoming SYN segments with Timestamp Options, received | ||||
on a connection in the TIME-WAIT state. | ||||
o Note that for the TIME-WAIT state, there is an improved | o In the following it is assumed that the segment is the | |||
algorithm described in [41] for handling incoming SYN | idealized segment that begins at RCV.NXT and does not exceed | |||
segments, that utilizes timestamps rather than relying on | the window. One could tailor actual segments to fit this | |||
the sequence number check described here. When the improved | assumption by trimming off any portions that lie outside the | |||
algorithm is implemented, the logic above is not applicable | window (including SYN and FIN) and only processing further | |||
for incoming SYN segments with timestamp options, received | if the segment then begins at RCV.NXT. Segments with higher | |||
on a connection in the TIME-WAIT state. | beginning sequence numbers SHOULD be held for later | |||
processing (SHLD-31). | ||||
o In the following it is assumed that the segment is the | Second, check the RST bit: | |||
idealized segment that begins at RCV.NXT and does not exceed | ||||
the window. One could tailor actual segments to fit this | ||||
assumption by trimming off any portions that lie outside the | ||||
window (including SYN and FIN), and only processing further | ||||
if the segment then begins at RCV.NXT. Segments with higher | ||||
beginning sequence numbers SHOULD be held for later | ||||
processing (SHLD-31). | ||||
- second check the RST bit, | - RFC 5961 [9], Section 3 describes a potential blind reset | |||
o RFC 5961 [9] section 3 describes a potential blind reset | attack and optional mitigation approach. This does not provide | |||
attack and optional mitigation approach. This does not | a cryptographic protection (e.g., as in IPsec or TCP-AO) but | |||
provide a cryptographic protection (e.g. as in IPsec or TCP- | can be applicable in situations described in RFC 5961. For | |||
AO), but can be applicable in situations described in RFC | stacks implementing the protection described in RFC 5961, the | |||
5961. For stacks implementing the RFC 5961 protection, the | three checks below apply; otherwise, processing for these | |||
three checks below apply, otherwise processing for these | ||||
states is indicated further below. | states is indicated further below. | |||
+ 1) If the RST bit is set and the sequence number is | 1) If the RST bit is set and the sequence number is outside | |||
outside the current receive window, silently drop the | the current receive window, silently drop the segment. | |||
segment. | ||||
+ 2) If the RST bit is set and the sequence number exactly | 2) If the RST bit is set and the sequence number exactly | |||
matches the next expected sequence number (RCV.NXT), then | matches the next expected sequence number (RCV.NXT), then | |||
TCP endpoints MUST reset the connection in the manner | TCP endpoints MUST reset the connection in the manner | |||
prescribed below according to the connection state. | prescribed below according to the connection state. | |||
+ 3) If the RST bit is set and the sequence number does not | 3) If the RST bit is set and the sequence number does not | |||
exactly match the next expected sequence value, yet is | exactly match the next expected sequence value, yet is | |||
within the current receive window, TCP endpoints MUST | within the current receive window, TCP endpoints MUST send | |||
send an acknowledgement (challenge ACK): | an acknowledgment (challenge ACK): | |||
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | |||
After sending the challenge ACK, TCP endpoints MUST drop | After sending the challenge ACK, TCP endpoints MUST drop | |||
the unacceptable segment and stop processing the incoming | the unacceptable segment and stop processing the incoming | |||
packet further. Note that RFC 5961 and Errata ID 4772 | packet further. Note that RFC 5961 and Errata ID 4772 [99] | |||
contain additional considerations for ACK throttling in | contain additional considerations for ACK throttling in an | |||
an implementation. | implementation. | |||
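
The three RFC 5961 checks above can be sketched as a small classifier. This is an illustration only and appears in neither version of the text: the names `in_receive_window` and `classify_rst` and the returned strings are hypothetical. Testing the exact match first, so that a zero receive window still accepts SEG.SEQ = RCV.NXT, is also an assumption. Sequence numbers are compared modulo 2**32.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def in_receive_window(seg_seq, rcv_nxt, rcv_wnd):
    """True if RCV.NXT =< SEG.SEQ < RCV.NXT + RCV.WND in modular arithmetic."""
    return (seg_seq - rcv_nxt) % MOD < rcv_wnd


def classify_rst(seg_seq, rcv_nxt, rcv_wnd):
    """Classify a segment with RST set (hypothetical helper, not from the RFC)."""
    if seg_seq % MOD == rcv_nxt % MOD:
        return "reset"          # check 2: exact match, reset the connection
    if in_receive_window(seg_seq, rcv_nxt, rcv_wnd):
        return "challenge_ack"  # check 3: in window but not exact
    return "drop"               # check 1: outside the window, silently drop
```

A caller would act on the returned string: reset the connection per its state, send `<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>` and drop the segment, or drop it silently.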
o SYN-RECEIVED STATE | - SYN-RECEIVED STATE | |||
+ If the RST bit is set | o If the RST bit is set, | |||
* If this connection was initiated with a passive OPEN | + If this connection was initiated with a passive OPEN | |||
(i.e., came from the LISTEN state), then return this | (i.e., came from the LISTEN state), then return this | |||
connection to LISTEN state and return. The user need | connection to LISTEN state and return. The user need not | |||
not be informed. If this connection was initiated | be informed. If this connection was initiated with an | |||
with an active OPEN (i.e., came from SYN-SENT state) | active OPEN (i.e., came from SYN-SENT state), then the | |||
then the connection was refused, signal the user | connection was refused; signal the user "connection | |||
"connection refused". In either case, the | refused". In either case, the retransmission queue | |||
retransmission queue should be flushed. And in the | should be flushed. And in the active OPEN case, enter | |||
active OPEN case, enter the CLOSED state and delete | the CLOSED state and delete the TCB, and return. | |||
the TCB, and return. | ||||
o ESTABLISHED | - ESTABLISHED STATE | |||
FIN-WAIT-1 | - FIN-WAIT-1 STATE | |||
FIN-WAIT-2 | ||||
CLOSE-WAIT | - FIN-WAIT-2 STATE | |||
+ If the RST bit is set then, any outstanding RECEIVEs and | - CLOSE-WAIT STATE | |||
SEND should receive "reset" responses. All segment | ||||
queues should be flushed. Users should also receive an | ||||
unsolicited general "connection reset" signal. Enter the | ||||
CLOSED state, delete the TCB, and return. | ||||
o CLOSING STATE | o If the RST bit is set, then any outstanding RECEIVEs and | |||
SEND should receive "reset" responses. All segment queues | ||||
should be flushed. Users should also receive an unsolicited | ||||
general "connection reset" signal. Enter the CLOSED state, | ||||
delete the TCB, and return. | ||||
LAST-ACK STATE | - CLOSING STATE | |||
TIME-WAIT | - LAST-ACK STATE | |||
+ If the RST bit is set then, enter the CLOSED state, | - TIME-WAIT STATE | |||
delete the TCB, and return. | ||||
- third check security | o If the RST bit is set, then enter the CLOSED state, delete | |||
the TCB, and return. | ||||
o SYN-RECEIVED | Third, check security: | |||
+ If the security/compartment in the segment does not | - SYN-RECEIVED STATE | |||
exactly match the security/compartment in the TCB then | ||||
send a reset, and return. | ||||
o ESTABLISHED | o If the security/compartment in the segment does not exactly | |||
match the security/compartment in the TCB, then send a reset | ||||
and return. | ||||
FIN-WAIT-1 | - ESTABLISHED STATE | |||
FIN-WAIT-2 | - FIN-WAIT-1 STATE | |||
CLOSE-WAIT | - FIN-WAIT-2 STATE | |||
CLOSING | - CLOSE-WAIT STATE | |||
LAST-ACK | - CLOSING STATE | |||
TIME-WAIT | - LAST-ACK STATE | |||
+ If the security/compartment in the segment does not | - TIME-WAIT STATE | |||
exactly match the security/compartment in the TCB then | ||||
send a reset, any outstanding RECEIVEs and SEND should | ||||
receive "reset" responses. All segment queues should be | ||||
flushed. Users should also receive an unsolicited | ||||
general "connection reset" signal. Enter the CLOSED | ||||
state, delete the TCB, and return. | ||||
o Note this check is placed following the sequence check to | o If the security/compartment in the segment does not exactly | |||
match the security/compartment in the TCB, then send a | ||||
reset; any outstanding RECEIVEs and SEND should receive | ||||
"reset" responses. All segment queues should be flushed. | ||||
Users should also receive an unsolicited general "connection | ||||
reset" signal. Enter the CLOSED state, delete the TCB, and | ||||
return. | ||||
- Note this check is placed following the sequence check to | ||||
prevent a segment from an old connection between these port | prevent a segment from an old connection between these port | |||
numbers with a different security from causing an abort of | numbers with a different security from causing an abort of the | |||
the current connection. | current connection. | |||
- fourth, check the SYN bit, | Fourth, check the SYN bit: | |||
o SYN-RECEIVED | - SYN-RECEIVED STATE | |||
+ If the connection was initiated with a passive OPEN, then | o If the connection was initiated with a passive OPEN, then | |||
return this connection to the LISTEN state and return. | return this connection to the LISTEN state and return. | |||
Otherwise, handle per the directions for synchronized | Otherwise, handle per the directions for synchronized states | |||
states below. | below. | |||
ESTABLISHED STATE | - ESTABLISHED STATE | |||
FIN-WAIT STATE-1 | - FIN-WAIT-1 STATE | |||
FIN-WAIT STATE-2 | - FIN-WAIT-2 STATE | |||
CLOSE-WAIT STATE | - CLOSE-WAIT STATE | |||
CLOSING STATE | - CLOSING STATE | |||
LAST-ACK STATE | - LAST-ACK STATE | |||
TIME-WAIT STATE | - TIME-WAIT STATE | |||
+ If the SYN bit is set in these synchronized states, it | o If the SYN bit is set in these synchronized states, it may | |||
may be either a legitimate new connection attempt (e.g. | be either a legitimate new connection attempt (e.g., in the | |||
in the case of TIME-WAIT), an error where the connection | case of TIME-WAIT), an error where the connection should be | |||
should be reset, or the result of an attack attempt, as | reset, or the result of an attack attempt, as described in | |||
described in RFC 5961 [9]. For the TIME-WAIT state, new | RFC 5961 [9]. For the TIME-WAIT state, new connections can | |||
connections can be accepted if the timestamp option is | be accepted if the Timestamp Option is used and meets | |||
used and meets expectations (per [41]). For all other | expectations (per [40]). For all other cases, RFC 5961 | |||
cases, RFC 5961 provides a mitigation with applicability | provides a mitigation with applicability to some situations, | |||
to some situations, though there are also alternatives | though there are also alternatives that offer cryptographic | |||
that offer cryptographic protection (see Section 7). RFC | protection (see Section 7). RFC 5961 recommends that in | |||
5961 recommends that in these synchronized states, if the | these synchronized states, if the SYN bit is set, | |||
SYN bit is set, irrespective of the sequence number, TCP | irrespective of the sequence number, TCP endpoints MUST send | |||
endpoints MUST send a "challenge ACK" to the remote peer: | a "challenge ACK" to the remote peer: | |||
+ <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | |||
+ After sending the acknowledgement, TCP implementations | o After sending the acknowledgment, TCP implementations MUST | |||
MUST drop the unacceptable segment and stop processing | drop the unacceptable segment and stop processing further. | |||
further. Note that RFC 5961 and Errata ID 4772 contain | Note that RFC 5961 and Errata ID 4772 [99] contain | |||
additional ACK throttling notes for an implementation. | additional ACK throttling notes for an implementation. | |||
+ For implementations that do not follow RFC 5961, the | o For implementations that do not follow RFC 5961, the | |||
original RFC 793 behavior follows in this paragraph. If | original behavior described in RFC 793 follows in this | |||
the SYN is in the window it is an error, send a reset, | paragraph. If the SYN is in the window it is an error: send | |||
any outstanding RECEIVEs and SEND should receive "reset" | a reset, any outstanding RECEIVEs and SEND should receive | |||
responses, all segment queues should be flushed, the user | "reset" responses, all segment queues should be flushed, the | |||
should also receive an unsolicited general "connection | user should also receive an unsolicited general "connection | |||
reset" signal, enter the CLOSED state, delete the TCB, | reset" signal, enter the CLOSED state, delete the TCB, and | |||
and return. | return. | |||
+ If the SYN is not in the window this step would not be | o If the SYN is not in the window, this step would not be | |||
reached and an ACK would have been sent in the first step | reached and an ACK would have been sent in the first step | |||
(sequence number check). | (sequence number check). | |||
- fifth check the ACK field, | Fifth, check the ACK field: | |||
o if the ACK bit is off drop the segment and return | - if the ACK bit is off, drop the segment and return | |||
o if the ACK bit is on | - if the ACK bit is on, | |||
+ RFC 5961 [9] section 5 describes a potential blind data | o RFC 5961 [9], Section 5 describes a potential blind data | |||
injection attack, and mitigation that implementations MAY | injection attack, and mitigation that implementations MAY | |||
choose to include (MAY-12). TCP stacks that implement | choose to include (MAY-12). TCP stacks that implement RFC | |||
RFC 5961 MUST add an input check that the ACK value is | 5961 MUST add an input check that the ACK value is | |||
acceptable only if it is in the range of ((SND.UNA - | acceptable only if it is in the range of ((SND.UNA - | |||
MAX.SND.WND) =< SEG.ACK =< SND.NXT). All incoming | MAX.SND.WND) =< SEG.ACK =< SND.NXT). All incoming segments | |||
segments whose ACK value doesn't satisfy the above | whose ACK value doesn't satisfy the above condition MUST be | |||
condition MUST be discarded and an ACK sent back. The | discarded and an ACK sent back. The new state variable | |||
new state variable MAX.SND.WND is defined as the largest | MAX.SND.WND is defined as the largest window that the local | |||
window that the local sender has ever received from its | sender has ever received from its peer (subject to window | |||
peer (subject to window scaling) or may be hard-coded to | scaling) or may be hard-coded to a maximum permissible | |||
a maximum permissible window value. When the ACK value | window value. When the ACK value is acceptable, the per- | |||
is acceptable, the processing per-state below applies: | state processing below applies: | |||
+ SYN-RECEIVED STATE | o SYN-RECEIVED STATE | |||
* If SND.UNA < SEG.ACK =< SND.NXT then enter ESTABLISHED | + If SND.UNA < SEG.ACK =< SND.NXT, then enter ESTABLISHED | |||
state and continue processing with variables below set | state and continue processing with the variables below | |||
to: | set to: | |||
- SND.WND <- SEG.WND | SND.WND <- SEG.WND | |||
SND.WL1 <- SEG.SEQ | SND.WL1 <- SEG.SEQ | |||
SND.WL2 <- SEG.ACK | SND.WL2 <- SEG.ACK | |||
* If the segment acknowledgment is not acceptable, form | + If the segment acknowledgment is not acceptable, form a | |||
a reset segment, | reset segment | |||
- <SEQ=SEG.ACK><CTL=RST> | ||||
* and send it. | <SEQ=SEG.ACK><CTL=RST> | |||
+ ESTABLISHED STATE | + and send it. | |||
* If SND.UNA < SEG.ACK =< SND.NXT then, set SND.UNA <- | o ESTABLISHED STATE | |||
SEG.ACK. Any segments on the retransmission queue | ||||
that are thereby entirely acknowledged are removed. | ||||
Users should receive positive acknowledgments for | ||||
buffers that have been SENT and fully acknowledged | ||||
(i.e., SEND buffer should be returned with "ok" | ||||
response). If the ACK is a duplicate (SEG.ACK =< | ||||
SND.UNA), it can be ignored. If the ACK acks | ||||
something not yet sent (SEG.ACK > SND.NXT) then send | ||||
an ACK, drop the segment, and return. | ||||
* If SND.UNA =< SEG.ACK =< SND.NXT, the send window | + If SND.UNA < SEG.ACK =< SND.NXT, then set SND.UNA <- | |||
should be updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 | SEG.ACK. Any segments on the retransmission queue that | |||
= SEG.SEQ and SND.WL2 =< SEG.ACK)), set SND.WND <- | are thereby entirely acknowledged are removed. Users | |||
SEG.WND, set SND.WL1 <- SEG.SEQ, and set SND.WL2 <- | should receive positive acknowledgments for buffers that | |||
SEG.ACK. | have been SENT and fully acknowledged (i.e., SEND buffer | |||
should be returned with "ok" response). If the ACK is a | ||||
duplicate (SEG.ACK =< SND.UNA), it can be ignored. If | ||||
the ACK acks something not yet sent (SEG.ACK > SND.NXT), | ||||
then send an ACK, drop the segment, and return. | ||||
* Note that SND.WND is an offset from SND.UNA, that | + If SND.UNA =< SEG.ACK =< SND.NXT, the send window should | |||
SND.WL1 records the sequence number of the last | be updated. If (SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ | |||
segment used to update SND.WND, and that SND.WL2 | and SND.WL2 =< SEG.ACK)), set SND.WND <- SEG.WND, set | |||
records the acknowledgment number of the last segment | SND.WL1 <- SEG.SEQ, and set SND.WL2 <- SEG.ACK. | |||
used to update SND.WND. The check here prevents using | ||||
old segments to update the window. | ||||
+ FIN-WAIT-1 STATE | + Note that SND.WND is an offset from SND.UNA, that SND.WL1 | |||
records the sequence number of the last segment used to | ||||
update SND.WND, and that SND.WL2 records the | ||||
acknowledgment number of the last segment used to update | ||||
SND.WND. The check here prevents using old segments to | ||||
update the window. | ||||
* In addition to the processing for the ESTABLISHED | o FIN-WAIT-1 STATE | |||
state, if the FIN segment is now acknowledged then | ||||
enter FIN-WAIT-2 and continue processing in that | ||||
state. | ||||
+ FIN-WAIT-2 STATE | + In addition to the processing for the ESTABLISHED state, | |||
if the FIN segment is now acknowledged, then enter FIN- | ||||
WAIT-2 and continue processing in that state. | ||||
* In addition to the processing for the ESTABLISHED | o FIN-WAIT-2 STATE | |||
state, if the retransmission queue is empty, the | ||||
user's CLOSE can be acknowledged ("ok") but do not | ||||
delete the TCB. | ||||
+ CLOSE-WAIT STATE | + In addition to the processing for the ESTABLISHED state, | |||
if the retransmission queue is empty, the user's CLOSE | ||||
can be acknowledged ("ok") but do not delete the TCB. | ||||
* Do the same processing as for the ESTABLISHED state. | o CLOSE-WAIT STATE | |||
+ CLOSING STATE | + Do the same processing as for the ESTABLISHED state. | |||
* In addition to the processing for the ESTABLISHED | o CLOSING STATE | |||
state, if the ACK acknowledges our FIN then enter the | ||||
TIME-WAIT state, otherwise ignore the segment. | ||||
+ LAST-ACK STATE | + In addition to the processing for the ESTABLISHED state, | |||
if the ACK acknowledges our FIN, then enter the TIME-WAIT | ||||
state; otherwise, ignore the segment. | ||||
* The only thing that can arrive in this state is an | o LAST-ACK STATE | |||
+ The only thing that can arrive in this state is an | ||||
acknowledgment of our FIN. If our FIN is now | acknowledgment of our FIN. If our FIN is now | |||
acknowledged, delete the TCB, enter the CLOSED state, | acknowledged, delete the TCB, enter the CLOSED state, and | |||
and return. | return. | |||
+ TIME-WAIT STATE | o TIME-WAIT STATE | |||
* The only thing that can arrive in this state is a | + The only thing that can arrive in this state is a | |||
retransmission of the remote FIN. Acknowledge it, and | retransmission of the remote FIN. Acknowledge it, and | |||
restart the 2 MSL timeout. | restart the 2 MSL timeout. | |||
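
The ACK checks in this step can be sketched as follows, covering the RFC 5961 acceptability range and the ESTABLISHED-state update of SND.UNA and the send window. This is a minimal illustration under stated assumptions: `SendState`, `seq_lt`, `seq_leq`, `ack_acceptable_5961`, `process_ack_established`, and the returned strings are hypothetical names. Retransmission-queue cleanup and user notifications are left out.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # sequence space size


def seq_lt(a, b):
    """a < b in sequence space (valid when a and b are less than 2**31 apart)."""
    return 0 < (b - a) % MOD < 2 ** 31


def seq_leq(a, b):
    """a =< b in sequence space."""
    return (b - a) % MOD < 2 ** 31


def ack_acceptable_5961(seg_ack, snd_una, snd_nxt, max_snd_wnd):
    """(SND.UNA - MAX.SND.WND) =< SEG.ACK =< SND.NXT, as a modular range test."""
    lower = (snd_una - max_snd_wnd) % MOD
    return (seg_ack - lower) % MOD <= (snd_nxt - lower) % MOD


@dataclass
class SendState:  # hypothetical container for the send variables
    snd_una: int
    snd_nxt: int
    snd_wnd: int
    snd_wl1: int
    snd_wl2: int


def process_ack_established(s, seg_seq, seg_ack, seg_wnd):
    """Apply the ESTABLISHED-state ACK rules to s; returns a status string."""
    if seq_lt(s.snd_nxt, seg_ack):
        return "ack_and_drop"       # ACK for something not yet sent
    if seq_lt(s.snd_una, seg_ack):
        s.snd_una = seg_ack         # new data acknowledged
    # Update the window only when SND.UNA =< SEG.ACK =< SND.NXT and the
    # segment is not older than the one last used for a window update.
    if seq_leq(s.snd_una, seg_ack) and seq_leq(seg_ack, s.snd_nxt):
        if seq_lt(s.snd_wl1, seg_seq) or (
            s.snd_wl1 == seg_seq and seq_leq(s.snd_wl2, seg_ack)
        ):
            s.snd_wnd, s.snd_wl1, s.snd_wl2 = seg_wnd, seg_seq, seg_ack
    return "ok"
```

The SND.WL1/SND.WL2 comparison is what keeps a reordered, older segment from overwriting a newer window advertisement.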
- sixth, check the URG bit, | Sixth, check the URG bit: | |||
o ESTABLISHED STATE | - ESTABLISHED STATE | |||
FIN-WAIT-1 STATE | - FIN-WAIT-1 STATE | |||
FIN-WAIT-2 STATE | - FIN-WAIT-2 STATE | |||
+ If the URG bit is set, RCV.UP <- max(RCV.UP,SEG.UP), and | o If the URG bit is set, RCV.UP <- max(RCV.UP,SEG.UP), and | |||
signal the user that the remote side has urgent data if | signal the user that the remote side has urgent data if the | |||
the urgent pointer (RCV.UP) is in advance of the data | urgent pointer (RCV.UP) is in advance of the data consumed. | |||
consumed. If the user has already been signaled (or is | If the user has already been signaled (or is still in the | |||
still in the "urgent mode") for this continuous sequence | "urgent mode") for this continuous sequence of urgent data, | |||
of urgent data, do not signal the user again. | do not signal the user again. | |||
o CLOSE-WAIT STATE | - CLOSE-WAIT STATE | |||
CLOSING STATE | - CLOSING STATE | |||
LAST-ACK STATE | - LAST-ACK STATE | |||
TIME-WAIT | - TIME-WAIT STATE | |||
+ This should not occur, since a FIN has been received from | o This should not occur since a FIN has been received from the | |||
the remote side. Ignore the URG. | remote side. Ignore the URG. | |||
- seventh, process the segment text, | Seventh, process the segment text: | |||
o ESTABLISHED STATE | - ESTABLISHED STATE | |||
FIN-WAIT-1 STATE | ||||
FIN-WAIT-2 STATE | - FIN-WAIT-1 STATE | |||
+ Once in the ESTABLISHED state, it is possible to deliver | - FIN-WAIT-2 STATE | |||
o Once in the ESTABLISHED state, it is possible to deliver | ||||
segment data to user RECEIVE buffers. Data from segments | segment data to user RECEIVE buffers. Data from segments | |||
can be moved into buffers until either the buffer is full | can be moved into buffers until either the buffer is full or | |||
or the segment is empty. If the segment empties and | the segment is empty. If the segment empties and carries a | |||
carries a PUSH flag, then the user is informed, when the | PUSH flag, then the user is informed, when the buffer is | |||
buffer is returned, that a PUSH has been received. | returned, that a PUSH has been received. | |||
+ When the TCP endpoint takes responsibility for delivering | o When the TCP endpoint takes responsibility for delivering | |||
the data to the user it must also acknowledge the receipt | the data to the user, it must also acknowledge the receipt | |||
of the data. | of the data. | |||
+ Once the TCP endpoint takes responsibility for the data | o Once the TCP endpoint takes responsibility for the data, it | |||
it advances RCV.NXT over the data accepted, and adjusts | advances RCV.NXT over the data accepted, and adjusts RCV.WND | |||
RCV.WND as appropriate to the current buffer | as appropriate to the current buffer availability. The | |||
availability. The total of RCV.NXT and RCV.WND should | total of RCV.NXT and RCV.WND should not be reduced. | |||
not be reduced. | ||||
+ A TCP implementation MAY send an ACK segment | o A TCP implementation MAY send an ACK segment acknowledging | |||
acknowledging RCV.NXT when a valid segment arrives that | RCV.NXT when a valid segment arrives that is in the window | |||
is in the window but not at the left window edge (MAY- | but not at the left window edge (MAY-13). | |||
13). | ||||
+ Please note the window management suggestions in | o Please note the window management suggestions in | |||
Section 3.8. | Section 3.8. | |||
+ Send an acknowledgment of the form: | o Send an acknowledgment of the form: | |||
* <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> | |||
+ This acknowledgment should be piggybacked on a segment | o This acknowledgment should be piggybacked on a segment being | |||
being transmitted if possible without incurring undue | transmitted if possible without incurring undue delay. | |||
delay. | ||||
o CLOSE-WAIT STATE | - CLOSE-WAIT STATE | |||
CLOSING STATE | - CLOSING STATE | |||
LAST-ACK STATE | - LAST-ACK STATE | |||
TIME-WAIT STATE | - TIME-WAIT STATE | |||
+ This should not occur, since a FIN has been received from | o This should not occur since a FIN has been received from the | |||
the remote side. Ignore the segment text. | remote side. Ignore the segment text. | |||
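
The rule that the total of RCV.NXT and RCV.WND should not be reduced can be sketched as follows. The function name `advance_receive_window` and the `free_buffer` parameter (octets of buffer currently available) are hypothetical, and the sketch assumes the accepted data already fits within the offered window.

```python
MOD = 2 ** 32  # sequence space size


def advance_receive_window(rcv_nxt, rcv_wnd, data_len, free_buffer):
    """Return the new (RCV.NXT, RCV.WND) after accepting data_len octets."""
    old_right_edge = (rcv_nxt + rcv_wnd) % MOD
    new_nxt = (rcv_nxt + data_len) % MOD
    # Smallest window that keeps the right edge from moving backwards.
    min_wnd = (old_right_edge - new_nxt) % MOD
    new_wnd = max(free_buffer, min_wnd)
    return new_nxt, new_wnd
```

When the buffer has less room than the window already advertised, the sketch holds the old right edge in place rather than shrinking the window.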
- eighth, check the FIN bit, | Eighth, check the FIN bit: | |||
o Do not process the FIN if the state is CLOSED, LISTEN or | - Do not process the FIN if the state is CLOSED, LISTEN, or SYN- | |||
SYN-SENT since the SEG.SEQ cannot be validated; drop the | SENT since the SEG.SEQ cannot be validated; drop the segment | |||
segment and return. | and return. | |||
o If the FIN bit is set, signal the user "connection closing" | - If the FIN bit is set, signal the user "connection closing" and | |||
and return any pending RECEIVEs with same message, advance | return any pending RECEIVEs with same message, advance RCV.NXT | |||
RCV.NXT over the FIN, and send an acknowledgment for the | over the FIN, and send an acknowledgment for the FIN. Note | |||
FIN. Note that FIN implies PUSH for any segment text not | that FIN implies PUSH for any segment text not yet delivered to | |||
yet delivered to the user. | the user. | |||
+ SYN-RECEIVED STATE | o SYN-RECEIVED STATE | |||
ESTABLISHED STATE | o ESTABLISHED STATE | |||
* Enter the CLOSE-WAIT state. | + Enter the CLOSE-WAIT state. | |||
+ FIN-WAIT-1 STATE | o FIN-WAIT-1 STATE | |||
* If our FIN has been ACKed (perhaps in this segment), | + If our FIN has been ACKed (perhaps in this segment), then | |||
then enter TIME-WAIT, start the time-wait timer, turn | enter TIME-WAIT, start the time-wait timer, turn off the | |||
off the other timers; otherwise enter the CLOSING | other timers; otherwise, enter the CLOSING state. | |||
state. | ||||
+ FIN-WAIT-2 STATE | o FIN-WAIT-2 STATE | |||
* Enter the TIME-WAIT state. Start the time-wait timer, | + Enter the TIME-WAIT state. Start the time-wait timer, | |||
turn off the other timers. | turn off the other timers. | |||
+ CLOSE-WAIT STATE | o CLOSE-WAIT STATE | |||
* Remain in the CLOSE-WAIT state. | + Remain in the CLOSE-WAIT state. | |||
+ CLOSING STATE | o CLOSING STATE | |||
* Remain in the CLOSING state. | + Remain in the CLOSING state. | |||
+ LAST-ACK STATE | o LAST-ACK STATE | |||
* Remain in the LAST-ACK state. | + Remain in the LAST-ACK state. | |||
+ TIME-WAIT STATE | o TIME-WAIT STATE | |||
* Remain in the TIME-WAIT state. Restart the 2 MSL | + Remain in the TIME-WAIT state. Restart the 2 MSL time- | |||
time-wait timeout. | wait timeout. | |||
- and return. | and return. | |||
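
The per-state FIN handling above reduces to a small transition table. The sketch below is illustrative only: `next_state_on_fin`, `FIN_TRANSITIONS`, and `UNVALIDATED` are hypothetical names. It models only the state change, not the user signal, the ACK for the FIN, or the timers.

```python
# States in which SEG.SEQ cannot be validated; the segment is dropped.
UNVALIDATED = {"CLOSED", "LISTEN", "SYN-SENT"}

# State entered after processing a FIN (FIN-WAIT-1 is handled separately).
FIN_TRANSITIONS = {
    "SYN-RECEIVED": "CLOSE-WAIT",
    "ESTABLISHED": "CLOSE-WAIT",
    "FIN-WAIT-2": "TIME-WAIT",
    "CLOSE-WAIT": "CLOSE-WAIT",
    "CLOSING": "CLOSING",
    "LAST-ACK": "LAST-ACK",
    "TIME-WAIT": "TIME-WAIT",
}


def next_state_on_fin(state, our_fin_acked=False):
    """Return the next state, or None if the FIN must not be processed."""
    if state in UNVALIDATED:
        return None
    if state == "FIN-WAIT-1":
        # TIME-WAIT if our FIN has been ACKed, otherwise a simultaneous close.
        return "TIME-WAIT" if our_fin_acked else "CLOSING"
    return FIN_TRANSITIONS[state]
```

For example, `next_state_on_fin("FIN-WAIT-1")` yields `"CLOSING"`, while passing `our_fin_acked=True` yields `"TIME-WAIT"`.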
3.10.8. Timeouts | 3.10.8. Timeouts | |||
USER TIMEOUT | USER TIMEOUT | |||
- For any state if the user timeout expires, flush all queues, | * For any state if the user timeout expires, flush all queues, | |||
signal the user "error: connection aborted due to user timeout" | signal the user "error: connection aborted due to user timeout" in | |||
in general and for any outstanding calls, delete the TCB, enter | general and for any outstanding calls, delete the TCB, enter the | |||
the CLOSED state and return. | CLOSED state, and return. | |||
RETRANSMISSION TIMEOUT | RETRANSMISSION TIMEOUT | |||
- For any state if the retransmission timeout expires on a | * For any state if the retransmission timeout expires on a segment | |||
segment in the retransmission queue, send the segment at the | in the retransmission queue, send the segment at the front of the | |||
front of the retransmission queue again, reinitialize the | retransmission queue again, reinitialize the retransmission timer, | |||
retransmission timer, and return. | and return. | |||
TIME-WAIT TIMEOUT | TIME-WAIT TIMEOUT | |||
- If the time-wait timeout expires on a connection delete the | * If the time-wait timeout expires on a connection, delete the TCB, | |||
TCB, enter the CLOSED state and return. | enter the CLOSED state, and return. | |||
4. Glossary | 4. Glossary | |||
ACK | ACK | |||
A control bit (acknowledge) occupying no sequence space, | A control bit (acknowledge) occupying no sequence space, | |||
which indicates that the acknowledgment field of this segment | which indicates that the acknowledgment field of this segment | |||
specifies the next sequence number the sender of this segment | specifies the next sequence number the sender of this segment | |||
is expecting to receive, hence acknowledging receipt of all | is expecting to receive, hence acknowledging receipt of all | |||
previous sequence numbers. | previous sequence numbers. | |||
connection | connection | |||
A logical communication path identified by a pair of sockets. | A logical communication path identified by a pair of sockets. | |||
datagram | datagram | |||
A message sent in a packet switched computer communications | A message sent in a packet-switched computer communications | |||
network. | network. | |||
Destination Address | Destination Address | |||
The network layer address of the endpoint intended to receive | The network-layer address of the endpoint intended to receive | |||
a segment. | a segment. | |||
FIN | FIN | |||
A control bit (finis) occupying one sequence number, which | A control bit (finis) occupying one sequence number, which | |||
indicates that the sender will send no more data or control | indicates that the sender will send no more data or control | |||
occupying sequence space. | occupying sequence space. | |||
flush | flush | |||
To remove all of the contents (data or segments) from a store | To remove all of the contents (data or segments) from a store | |||
(buffer or queue). | (buffer or queue). | |||
fragment | fragment | |||
A portion of a logical unit of data, in particular an | A portion of a logical unit of data. In particular, an | |||
internet fragment is a portion of an internet datagram. | internet fragment is a portion of an internet datagram. | |||
header | header | |||
Control information at the beginning of a message, segment, | Control information at the beginning of a message, segment, | |||
fragment, packet or block of data. | fragment, packet, or block of data. | |||
host | host | |||
A computer. In particular a source or destination of | A computer. In particular, a source or destination of | |||
messages from the point of view of the communication network. | messages from the point of view of the communication network. | |||
Identification | Identification | |||
An Internet Protocol field. This identifying value assigned | An Internet Protocol field. This identifying value assigned | |||
by the sender aids in assembling the fragments of a datagram. | by the sender aids in assembling the fragments of a datagram. | |||
internet address | internet address | |||
A network layer address. | A network-layer address. | |||
internet datagram | internet datagram | |||
A unit of data exchanged between internet hosts, together | A unit of data exchanged between internet hosts, together | |||
with the internet header that allows the datagram to be | with the internet header that allows the datagram to be | |||
routed from source to destination. | routed from source to destination. | |||
internet fragment | internet fragment | |||
A portion of the data of an internet datagram with an | A portion of the data of an internet datagram with an | |||
internet header. | internet header. | |||
IP | IP | |||
Internet Protocol. See [1] and [13]. | Internet Protocol. See [1] and [13]. | |||
IRS | IRS | |||
The Initial Receive Sequence number. The first sequence | The Initial Receive Sequence number. The first sequence | |||
number used by the sender on a connection. | number used by the sender on a connection. | |||
ISN | ISN | |||
The Initial Sequence Number. The first sequence number used | The Initial Sequence Number. The first sequence number used | |||
on a connection, (either ISS or IRS). Selected in a way that | on a connection (either ISS or IRS). Selected in a way that | |||
is unique within a given period of time and is unpredictable | is unique within a given period of time and is unpredictable | |||
to attackers. | to attackers. | |||
ISS | ISS | |||
The Initial Send Sequence number. The first sequence number | The Initial Send Sequence number. The first sequence number | |||
used by the sender on a connection. | used by the sender on a connection. | |||
left sequence | left sequence | |||
This is the next sequence number to be acknowledged by the | This is the next sequence number to be acknowledged by the | |||
data receiving TCP endpoint (or the lowest currently | data-receiving TCP endpoint (or the lowest currently | |||
unacknowledged sequence number) and is sometimes referred to | unacknowledged sequence number) and is sometimes referred to | |||
as the left edge of the send window. | as the left edge of the send window. | |||
module | module | |||
An implementation, usually in software, of a protocol or | An implementation, usually in software, of a protocol or | |||
other procedure. | other procedure. | |||
MSL | MSL | |||
Maximum Segment Lifetime, the time a TCP segment can exist in | Maximum Segment Lifetime, the time a TCP segment can exist in | |||
the internetwork system. Arbitrarily defined to be 2 | the internetwork system. Arbitrarily defined to be 2 | |||
minutes. | minutes. | |||
octet | octet | |||
An eight bit byte. | An eight-bit byte. | |||
Options | Options | |||
An Option field may contain several options, and each option | An Option field may contain several options, and each option | |||
may be several octets in length. | may be several octets in length. | |||
packet | packet | |||
A package of data with a header that may or may not be | A package of data with a header that may or may not be | |||
logically complete. More often a physical packaging than a | logically complete. More often a physical packaging than a | |||
logical packaging of data. | logical packaging of data. | |||
skipping to change at page 87, line 46 | skipping to change at line 4108 | |||
SEG.SEQ | SEG.SEQ | |||
segment sequence | segment sequence | |||
SEG.UP | SEG.UP | |||
segment urgent pointer field | segment urgent pointer field | |||
SEG.WND | SEG.WND | |||
segment window field | segment window field | |||
segment | segment | |||
A logical unit of data, in particular a TCP segment is the | A logical unit of data. In particular, a TCP segment is the | |||
unit of data transferred between a pair of TCP modules. | unit of data transferred between a pair of TCP modules. | |||
segment acknowledgment | segment acknowledgment | |||
The sequence number in the acknowledgment field of the | The sequence number in the acknowledgment field of the | |||
arriving segment. | arriving segment. | |||
segment length | segment length | |||
The amount of sequence number space occupied by a segment, | The amount of sequence number space occupied by a segment, | |||
including any controls that occupy sequence space. | including any controls that occupy sequence space. | |||
skipping to change at page 88, line 23 | skipping to change at line 4133 | |||
This is the next sequence number the local (sending) TCP | This is the next sequence number the local (sending) TCP | |||
endpoint will use on the connection. It is initially | endpoint will use on the connection. It is initially | |||
selected from an initial sequence number curve (ISN) and is | selected from an initial sequence number curve (ISN) and is | |||
incremented for each octet of data or sequenced control | incremented for each octet of data or sequenced control | |||
transmitted. | transmitted. | |||
send window | send window | |||
This represents the sequence numbers that the remote | This represents the sequence numbers that the remote | |||
(receiving) TCP endpoint is willing to receive. It is the | (receiving) TCP endpoint is willing to receive. It is the | |||
value of the window field specified in segments from the | value of the window field specified in segments from the | |||
remote (data receiving) TCP endpoint. The range of new | remote (data-receiving) TCP endpoint. The range of new | |||
sequence numbers that may be emitted by a TCP implementation | sequence numbers that may be emitted by a TCP implementation | |||
lies between SND.NXT and SND.UNA + SND.WND - 1. | lies between SND.NXT and SND.UNA + SND.WND - 1. | |||
(Retransmissions of sequence numbers between SND.UNA and | (Retransmissions of sequence numbers between SND.UNA and | |||
SND.NXT are expected, of course.) | SND.NXT are expected, of course.) | |||
SND.NXT | SND.NXT | |||
send sequence | send sequence | |||
SND.UNA | SND.UNA | |||
left sequence | left sequence | |||
skipping to change at page 88, line 52 ¶ | skipping to change at line 4162 ¶ | |||
segment acknowledgment number at last window update | segment acknowledgment number at last window update | |||
SND.WND | SND.WND | |||
send window | send window | |||
socket (or socket number, or socket address, or socket identifier) | socket (or socket number, or socket address, or socket identifier) | |||
An address that specifically includes a port identifier, that | An address that specifically includes a port identifier, that | |||
is, the concatenation of an Internet Address with a TCP port. | is, the concatenation of an Internet Address with a TCP port. | |||
Source Address | Source Address | |||
The network layer address of the sending endpoint. | The network-layer address of the sending endpoint. | |||
SYN | SYN | |||
A control bit in the incoming segment, occupying one sequence | A control bit in the incoming segment, occupying one sequence | |||
number, used at the initiation of a connection, to indicate | number, used at the initiation of a connection to indicate | |||
where the sequence numbering will start. | where the sequence numbering will start. | |||
TCB | TCB | |||
Transmission control block, the data structure that records | Transmission control block, the data structure that records | |||
the state of a connection. | the state of a connection. | |||
TCP | TCP | |||
Transmission Control Protocol: A host-to-host protocol for | Transmission Control Protocol: a host-to-host protocol for | |||
reliable communication in internetwork environments. | reliable communication in internetwork environments. | |||
TOS | TOS | |||
Type of Service, an obsoleted IPv4 field. The same header | Type of Service, an obsoleted IPv4 field. The same header | |||
bits currently are used for the Differentiated Services field | bits currently are used for the Differentiated Services field | |||
[4] containing the Differentiated Services Code Point (DSCP) | [4] containing the Differentiated Services Codepoint (DSCP) | |||
value and the 2-bit ECN codepoint [6]. | value and the 2-bit ECN codepoint [6]. | |||
Type of Service | Type of Service | |||
See "TOS". | See "TOS". | |||
URG | URG | |||
A control bit (urgent), occupying no sequence space, used to | A control bit (urgent), occupying no sequence space, used to | |||
indicate that the receiving user should be notified to do | indicate that the receiving user should be notified to do | |||
urgent processing as long as there is data to be consumed | urgent processing as long as there is data to be consumed | |||
with sequence numbers less than the value indicated by the | with sequence numbers less than the value indicated by the | |||
urgent pointer. | urgent pointer. | |||
urgent pointer | urgent pointer | |||
A control field meaningful only when the URG bit is on. This | A control field meaningful only when the URG bit is on. This | |||
field communicates the value of the urgent pointer that | field communicates the value of the urgent pointer that | |||
indicates the data octet associated with the sending user's | indicates the data octet associated with the sending user's | |||
urgent call. | urgent call. | |||
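
The "segment length" and "send window" glossary entries above can be illustrated with a small sketch. It is not text from either document version. The function names are hypothetical, and it assumes the usual 32-bit sequence space with arithmetic modulo 2^32.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def segment_length(data_octets, syn=False, fin=False):
    """SEG.LEN: data octets plus one for each control (SYN, FIN) that
    occupies sequence space. Hypothetical helper for illustration."""
    return data_octets + int(syn) + int(fin)


def sendable_range(snd_una, snd_nxt, snd_wnd):
    """Return (first, last, count) for the new sequence numbers that may be
    emitted: SND.NXT through SND.UNA + SND.WND - 1, modulo 2**32.
    count is 0 when the window is already fully used, in which case
    first and last do not describe a usable range."""
    in_flight = (snd_nxt - snd_una) % SEQ_MOD  # sent but not yet acknowledged
    count = max(snd_wnd - in_flight, 0)
    last = (snd_una + snd_wnd - 1) % SEQ_MOD
    return snd_nxt, last, count


if __name__ == "__main__":
    print(segment_length(0, syn=True))
    print(sendable_range(1000, 1200, 500))
```

Retransmissions between SND.UNA and SND.NXT are outside what this sketch computes; it only covers new sequence numbers.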
5. Changes from RFC 793 | 5. Changes from RFC 793 | |||
This document obsoletes RFC 793 as well as RFC 6093 and 6528, which | This document obsoletes RFC 793 as well as RFCs 6093 and 6528, which | |||
updated 793. In all cases, only the normative protocol specification | updated 793. In all cases, only the normative protocol specification | |||
and requirements have been incorporated into this document, and some | and requirements have been incorporated into this document, and some | |||
informational text with background and rationale may not have been | informational text with background and rationale may not have been | |||
carried in. The informational content of those documents is still | carried in. The informational content of those documents is still | |||
valuable in learning about and understanding TCP, and they are valid | valuable in learning about and understanding TCP, and they are valid | |||
Informational references, even though their normative content has | Informational references, even though their normative content has | |||
been incorporated into this document. | been incorporated into this document. | |||
The main body of this document was adapted from RFC 793's Section 3, | The main body of this document was adapted from RFC 793's Section 3, | |||
titled "FUNCTIONAL SPECIFICATION", with an attempt to keep formatting | titled "FUNCTIONAL SPECIFICATION", with an attempt to keep formatting | |||
and layout as close as possible. | and layout as close as possible. | |||
The collection of applicable RFC Errata that have been reported and | The collection of applicable RFC errata that have been reported and | |||
either accepted or held for an update to RFC 793 were incorporated | either accepted or held for an update to RFC 793 were incorporated | |||
(Errata IDs: 573, 574, 700, 701, 1283, 1561, 1562, 1564, 1571, 1572, | (Errata IDs: 573 [73], 574 [74], 700 [75], 701 [76], 1283 [77], 1561 | |||
2297, 2298, 2748, 2749, 2934, 3213, 3300, 3301, 6222). Some errata | [78], 1562 [79], 1564 [80], 1571 [81], 1572 [82], 2297 [83], 2298 | |||
were not applicable due to other changes (Errata IDs: 572, 575, 1565, | [84], 2748 [85], 2749 [86], 2934 [87], 3213 [88], 3300 [89], 3301 | |||
1569, 2296, 3305, 3602). | [90], 6222 [91]). Some errata were not applicable due to other | |||
changes (Errata IDs: 572 [92], 575 [93], 1565 [94], 1569 [95], 2296 | ||||
[96], 3305 [97], 3602 [98]). | ||||
Changes to the specification of the Urgent Pointer described in RFCs | Changes to the specification of the urgent pointer described in RFCs | |||
1011, 1122, and 6093 were incorporated. See RFC 6093 for detailed | 1011, 1122, and 6093 were incorporated. See RFC 6093 for detailed | |||
discussion of why these changes were necessary. | discussion of why these changes were necessary. | |||
The discussion of the RTO from RFC 793 was updated to refer to RFC | The discussion of the RTO from RFC 793 was updated to refer to RFC | |||
6298. The RFC 1122 text on the RTO originally replaced the 793 text, | 6298. The text on the RTO in RFC 1122 originally replaced the text | |||
however, RFC 2988 should have updated 1122, and has subsequently been | in RFC 793; however, RFC 2988 should have updated RFC 1122 and has | |||
obsoleted by 6298. | subsequently been obsoleted by RFC 6298. | |||
RFC 1011 [19] contains a number of comments about RFC 793, including | RFC 1011 [18] contains a number of comments about RFC 793, including | |||
some needed changes to the TCP specification. These are expanded in | some needed changes to the TCP specification. These are expanded in | |||
RFC 1122, which contains a collection of other changes and | RFC 1122, which contains a collection of other changes and | |||
clarifications to RFC 793. The normative items impacting the | clarifications to RFC 793. The normative items impacting the | |||
protocol have been incorporated here, though some historically useful | protocol have been incorporated here, though some historically useful | |||
implementation advice and informative discussion from RFC 1122 is not | implementation advice and informative discussion from RFC 1122 is not | |||
included here. The present document updates RFC 1011, since this is | included here. The present document, which is now the TCP | |||
now the TCP specification rather than RFC 793, and the comments noted | specification rather than RFC 793, updates RFC 1011, and the comments | |||
in 1011 have been incorporated. | noted in RFC 1011 have been incorporated. | |||
RFC 1122 contains more than just TCP requirements, so this document | RFC 1122 contains more than just TCP requirements, so this document | |||
can't obsolete RFC 1122 entirely. It is only marked as "updating" | can't obsolete RFC 1122 entirely. It is only marked as "updating" | |||
1122, however, it should be understood to effectively obsolete all of | RFC 1122; however, it should be understood to effectively obsolete | |||
the RFC 1122 material on TCP. | all of the material on TCP found in RFC 1122. | |||
The more secure Initial Sequence Number generation algorithm from RFC | The more secure initial sequence number generation algorithm from RFC | |||
6528 was incorporated. See RFC 6528 for discussion of the attacks | 6528 was incorporated. See RFC 6528 for discussion of the attacks | |||
that this mitigates, as well as advice on selecting PRF algorithms | that this mitigates, as well as advice on selecting PRF algorithms | |||
and managing secret key data. | and managing secret key data. | |||
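
As a rough illustration of the RFC 6528 style of generation, ISN = M + F(localip, localport, remoteip, remoteport, secretkey), the following sketch may help. It is an assumption-laden example and not the specification's algorithm: HMAC-SHA256 truncated to 32 bits is chosen here only as a stand-in PRF, the function name and the `now_us` parameter are hypothetical, and M is modeled as a counter that ticks once every 4 microseconds.

```python
import hashlib
import hmac
import ipaddress
import struct
import time

SEQ_MOD = 1 << 32  # ISNs live in the 32-bit sequence space


def generate_isn(local_ip, local_port, remote_ip, remote_port,
                 secret_key, now_us=None):
    """Sketch of ISN = M + F(4-tuple, secret key), modulo 2**32.

    now_us is a hypothetical injection point for the clock (microseconds),
    so that the result can be reproduced in tests.
    """
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = (now_us // 4) % SEQ_MOD  # timer M: one tick per 4 microseconds

    # Serialize the connection identifiers in a fixed, unambiguous order.
    msg = (ipaddress.ip_address(local_ip).packed
           + struct.pack("!H", local_port)
           + ipaddress.ip_address(remote_ip).packed
           + struct.pack("!H", remote_port))

    # Assumed PRF: HMAC-SHA256 keyed with the secret, truncated to 32 bits.
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    f = int.from_bytes(digest[:4], "big")
    return (m + f) % SEQ_MOD


if __name__ == "__main__":
    key = b"example-only-secret"  # a real key would be random and kept secret
    print(generate_isn("192.0.2.1", 12345, "198.51.100.7", 443, key))
```

For a fixed connection 4-tuple and key, the offset F stays constant, so successive ISNs advance with the timer, while an observer without the key cannot predict the offset used for other connections.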
A note based on RFC 6429 was added to explicitly clarify that system | A note based on RFC 6429 was added to explicitly clarify that system | |||
resource management concerns allow connection resources to be | resource management concerns allow connection resources to be | |||
reclaimed. RFC 6429 is obsoleted in the sense that this | reclaimed. RFC 6429 is obsoleted in the sense that the clarification | |||
clarification has been reflected in this update to the base TCP | it describes has been reflected within this base TCP specification. | |||
specification now. | ||||
The description of congestion control implementation was added, based | The description of congestion control implementation was added based | |||
on the set of documents that are IETF BCP or Standards Track on the | on the set of documents that are IETF BCP or Standards Track on the | |||
topic, and the current state of common implementations. | topic and the current state of common implementations. | |||
RFC EDITOR'S NOTE: the content below is for detailed change tracking | ||||
and planning, and not to be included with the final revision of the | ||||
document. | ||||
This document started as draft-eddy-rfc793bis-00, that was merely a | ||||
proposal and rough plan for updating RFC 793. | ||||
The -01 revision of this draft-eddy-rfc793bis incorporates the | ||||
content of RFC 793 Section 3 titled "FUNCTIONAL SPECIFICATION". | ||||
Other content from RFC 793 has not been incorporated. The -01 | ||||
revision of this document makes some minor formatting changes to the | ||||
RFC 793 content in order to convert the content into XML2RFC format | ||||
and account for left-out parts of RFC 793. For instance, figure | ||||
numbering differs and some indentation is not exactly the same. | ||||
The -02 revision of draft-eddy-rfc793bis incorporates errata that | ||||
have been verified: | ||||
Errata ID 573: Reported by Bob Braden (note: This errata report | ||||
basically is just a reminder that RFC 1122 updates 793. Some of | ||||
the associated changes are left pending to a separate revision | ||||
that incorporates 1122. Bob's mention of PUSH in 793 section 2.8 | ||||
was not applicable here because that section was not part of the | ||||
"functional specification". Also, the 1122 text on the | ||||
retransmission timeout also has been updated by subsequent RFCs, | ||||
so the change here deviates from Bob's suggestion to apply the | ||||
1122 text.) | ||||
Errata ID 574: Reported by Yin Shuming | ||||
Errata ID 700: Reported by Yin Shuming | ||||
Errata ID 701: Reported by Yin Shuming | ||||
Errata ID 1283: Reported by Pei-chun Cheng | ||||
Errata ID 1561: Reported by Constantin Hagemeier | ||||
Errata ID 1562: Reported by Constantin Hagemeier | ||||
Errata ID 1564: Reported by Constantin Hagemeier | ||||
Errata ID 1565: Reported by Constantin Hagemeier | ||||
Errata ID 1571: Reported by Constantin Hagemeier | ||||
Errata ID 1572: Reported by Constantin Hagemeier | ||||
Errata ID 2296: Reported by Vishwas Manral | ||||
Errata ID 2297: Reported by Vishwas Manral | ||||
Errata ID 2298: Reported by Vishwas Manral | ||||
Errata ID 2748: Reported by Mykyta Yevstifeyev | ||||
Errata ID 2749: Reported by Mykyta Yevstifeyev | ||||
Errata ID 2934: Reported by Constantin Hagemeier | ||||
Errata ID 3213: Reported by EugnJun Yi | ||||
Errata ID 3300: Reported by Botong Huang | ||||
Errata ID 3301: Reported by Botong Huang | ||||
Errata ID 3305: Reported by Botong Huang | ||||
Note: Some verified errata were not used in this update, as they | ||||
relate to sections of RFC 793 elided from this document. These | ||||
include Errata ID 572, 575, and 1569. | ||||
Note: Errata ID 3602 was not applied in this revision as it is | ||||
duplicative of the 1122 corrections. | ||||
Not related to RFC 793 content, this revision also makes small tweaks | ||||
to the introductory text, fixes indentation of the pseudo header | ||||
diagram, and notes that the Security Considerations should also | ||||
include privacy, when this section is written. | ||||
The -03 revision of draft-eddy-rfc793bis revises all discussion of | ||||
the urgent pointer in order to comply with RFC 6093, 1122, and 1011. | ||||
Since 1122 held requirements on the urgent pointer, the full list of | ||||
requirements was brought into an appendix of this document, so that | ||||
it can be updated as-needed. | ||||
The -04 revision of draft-eddy-rfc793bis includes the ISN generation | ||||
changes from RFC 6528. | ||||
The -05 revision of draft-eddy-rfc793bis incorporates MSS | ||||
requirements and definitions from RFC 879 [17], 1122, and 6691, as | ||||
well as option-handling requirements from RFC 1122. | ||||
The -00 revision of draft-ietf-tcpm-rfc793bis incorporates several | ||||
additional clarifications and updates to the section on segmentation, | ||||
many of which are based on feedback from Joe Touch improving from the | ||||
initial text on this in the previous revision. | ||||
The -01 revision incorporates the change to Reserved bits due to ECN, | ||||
as well as many other changes that come from RFC 1122. | ||||
The -02 revision has small formatting modifications in order to | ||||
address xml2rfc warnings about long lines. It was a quick update to | ||||
avoid document expiration. TCPM working group discussion in 2015 | ||||
also indicated that we should not try to add sections on | ||||
implementation advice or similar non-normative information. | ||||
The -03 revision incorporates more content from RFC 1122: Passive | ||||
OPEN Calls, Time-To-Live, Multihoming, IP Options, ICMP messages, | ||||
Data Communications, When to Send Data, When to Send a Window Update, | ||||
Managing the Window, Probing Zero Windows, When to Send an ACK | ||||
Segment. The section on data communications was re-organized into | ||||
clearer subsections (previously headings were embedded in the 793 | ||||
text), and windows management advice from 793 was removed (as | ||||
reviewed by TCPM working group) in favor of the 1122 additions on | ||||
SWS, ZWP, and related topics. | ||||
The -04 revision includes reference to RFC 6429 on the ZWP condition, | ||||
RFC 1122 material on TCP Connection Failures, TCP Keep-Alives, | ||||
Acknowledging Queued Segments, and Remote Address Validation. RTO | ||||
computation is referenced from RFC 6298 rather than RFC 1122. | ||||
The -05 revision includes the requirement to implement TCP congestion | ||||
control with recommendation to implement ECN, the RFC 6633 update to | ||||
1122, which changed the requirement on responding to source quench | ||||
ICMP messages, and discussion of ICMP (and ICMPv6) soft and hard | ||||
errors per RFC 5461 (ICMPv6 handling for TCP doesn't seem to be | ||||
mentioned elsewhere in standards track). | ||||
The -06 revision includes an appendix on "Other Implementation Notes" | ||||
to capture widely-deployed fundamental features that are not | ||||
contained in the RFC series yet. It also added mention of RFC 6994 | ||||
and the IANA TCP parameters registry as a reference. It includes | ||||
references to RFC 5961 in appropriate places. The references to TOS | ||||
were changed to DiffServ field, based on reflecting RFC 2474 as well | ||||
as the IPv6 presence of traffic class (carrying DiffServ field) | ||||
rather than TOS. | ||||
The -07 revision includes reference to RFC 6191, updated security | ||||
considerations, discussion of additional implementation | ||||
considerations, and clarification of data on the SYN. | ||||
The -08 revision includes changes based on: | ||||
describing treatment of reserved bits (following TCPM mailing list | ||||
thread from July 2014 on "793bis item - reserved bit behavior") | ||||
addition of a brief TCP key concepts section to make up for not | ||||
including the outdated section 2 of RFC 793 | ||||
changed "TCP" to "host" to resolve conflict between 1122 wording | ||||
on whether TCP or the network layer chooses an address when | ||||
multihomed | ||||
fixed/updated definition of options in glossary | ||||
moved note on aggregating ACKs from 1122 to a more appropriate | ||||
location | ||||
resolved notes on IP precedence and security/compartment | ||||
added implementation note on sequence number validation | ||||
added note that PUSH does not apply when Nagle is active | ||||
added 1122 content on asynchronous reports to replace 793 section | ||||
on TCP to user messages | ||||
The -09 revision fixes section numbering problems. | ||||
The -10 revision includes additions to the security considerations | ||||
based on comments from Joe Touch, and suggested edits on RST/FIN | ||||
notification, RFC 2525 reference, and other edits suggested by | ||||
Yuchung Cheng, as well as modifications to DiffServ text from Yuchung | ||||
Cheng and Gorry Fairhurst. | ||||
The -11 revision includes a start at identifying all of the | ||||
requirements text and referencing each instance in the common table | ||||
at the end of the document. | ||||
The -12 revision completes the requirement language indexing started | ||||
in -11 and adds necessary description of the PUSH functionality that | ||||
was missing. | ||||
The -13 revision contains only changes in the inline editor notes. | ||||
The -14 revision includes updates with regard to several comments | ||||
from the mailing list, including editorial fixes, adding IANA | ||||
considerations for the header flags, improving figure title | ||||
placement, and breaking up the "Terminology" section into more | ||||
appropriately titled subsections. | ||||
The -15 revision has many technical and editorial corrections from | ||||
Gorry Fairhurst's review, and subsequent discussion on the TCPM list, | ||||
as well as some other collected clarifications and improvements from | ||||
mailing list discussion. | ||||
The -16 revision addresses several discussions that arose from | ||||
additional reviews and follow-up on some of Gorry Fairhurst's | ||||
comments from revision 14. | ||||
The -17 revision includes errata 6222 from Charles Deng, update to | ||||
the key words boilerplate, updated description of the header flags | ||||
registry changes, and clarification about connections rather than | ||||
users in the discussion of OPEN calls. | ||||
The -18 revision includes editorial changes to the IANA | ||||
considerations, based on comments from Richard Scheffenegger at the | ||||
IETF 108 TCPM virtual meeting. | ||||
The -19 revision includes editorial changes from Errata 6281 and 6282 | ||||
reported by Merlin Buge. It also includes WGLC changes noted by | ||||
Mohamed Boucadair, Rahul Jadhav, Praveen Balasubramanian, Matt Olson, | ||||
Yi Huang, Joe Touch, and Juhamatti Kuusisaari. | ||||
The -20 revision includes text on congestion control based on mailing | ||||
list and meeting discussion, put together in its final form by Markku | ||||
Kojo. It also clarifies that SACK, WS, and TS options are | ||||
recommended for high performance, but not needed for basic | ||||
interoperability. It also clarifies that the length field is | ||||
required for new TCP options. | ||||
The -21 revision includes slight changes to the header diagram for | ||||
compatibility with tooling, from Stephen McQuistin, clarification on | ||||
the meaning of idle connections from Yuchung Cheng, Neal Cardwell, | ||||
Michael Scharf, and Richard Scheffenegger, editorial improvements | ||||
from Markku Kojo, notes that some stacks suppress extra | ||||
acknowledgments of the SYN when SYN-ACK carries data from Richard | ||||
Scheffenegger, and adds MAY-18 numbering based on note from Jonathan | ||||
Morton. | ||||
The -22 revision includes small clarifications on terminology (might | ||||
versus may) and IPv6 extension headers versus IPv4 options, based on | ||||
comments from Gorry Fairhurst. | ||||
The -23 revision has a fix to indentation from Michael Tuexen and | ||||
idnits issues addressed from Michael Scharf. | ||||
The -24 revision incorporates changes after Martin Duke's AD review, | ||||
including further feedback on those comments from Yuchung Cheng and | ||||
Joe Touch. Important changes for review include (1) removal of the | ||||
need to check for the PUSH flag when evaluating the SWS override | ||||
timer expiration, (2) clarification about receding urgent pointer, | ||||
and (3) de-duplicating handling of the RST checking between step 4 | ||||
and step 1. | ||||
The -25 revision incorporates changes based on the GENART review from | ||||
Francis Dupont, SECDIR review from Kyle Rose, and OPSDIR review from | ||||
Sarah Banks. | ||||
The -26 revision incorporates changes stemming from the IESG reviews, | ||||
and INTDIR review from Bernie Volz. | ||||
The -27 revision fixes a few small editorial incompatibilities that | ||||
Stephen McQuistin found related to automated code generation. | ||||
The -28 revision addresses some COMMENTs from Ben Kaduk's IESG | ||||
review. | ||||
Some other suggested changes that will not be incorporated in this | ||||
793 update unless TCPM consensus changes with regard to scope are: | ||||
1. Tony Sabatini's suggestion for describing DO field | ||||
2. Per discussion with Joe Touch (TAPS list, 6/20/2015), the | ||||
description of the API could be revisited | ||||
3. Reducing the R2 value for SYNs has been suggested as a possible | ||||
topic for future consideration. | ||||
Early in the process of updating RFC 793, Scott Brim mentioned that | ||||
this should include a PERPASS/privacy review. This may be something | ||||
for the chairs or AD to request during WGLC or IETF LC. | ||||
6. IANA Considerations | 6. IANA Considerations | |||
In the "Transmission Control Protocol (TCP) Header Flags" registry, | In the "Transmission Control Protocol (TCP) Header Flags" registry, | |||
IANA is asked to make several changes described in this section. | IANA has made several changes as described in this section. | |||
RFC 3168 originally created this registry, but only populated it with | RFC 3168 originally created this registry but only populated it with | |||
the new bits defined in RFC 3168, neglecting the other bits that had | the new bits defined in RFC 3168, neglecting the other bits that had | |||
previously been described in RFC 793 and other documents. Bit 7 has | previously been described in RFC 793 and other documents. Bit 7 has | |||
since also been updated by RFC 8311. | since also been updated by RFC 8311 [54]. | |||
The "Bit" column is renamed below as the "Bit Offset" column, since | ||||
it references each header flag's offset within the 16-bit aligned | ||||
view of the TCP header in Figure 1. The bits in offsets 0 through 4 | ||||
are the TCP segment Data Offset field, and not header flags. | ||||
IANA should add a column for "Assignment Notes". | The "Bit" column has been renamed below as the "Bit Offset" column | |||
because it references each header flag's offset within the 16-bit | ||||
aligned view of the TCP header in Figure 1. The bits in offsets 0 | ||||
through 3 are the TCP segment Data Offset field, and not header | ||||
flags. | ||||
IANA should assign values indicated below. | IANA has added a column for "Assignment Notes". | |||
TCP Header Flags | IANA has assigned values as indicated below. | |||
Bit Name Reference Assignment Notes | +========+===================+===========+====================+ | |||
Offset | | Bit | Name | Reference | Assignment Notes | | |||
--- ---- --------- ---------------- | | Offset | | | | | |||
4 Reserved for future use (this document) | +========+===================+===========+====================+ | |||
5 Reserved for future use (this document) | | 4 | Reserved for | RFC 9293 | | | |||
6 Reserved for future use (this document) | | | future use | | | | |||
7 Reserved for future use [RFC8311] [1] | +--------+-------------------+-----------+--------------------+ | |||
8 CWR (Congestion Window Reduced) [RFC3168] | | 5 | Reserved for | RFC 9293 | | | |||
9 ECE (ECN-Echo) [RFC3168] | | | future use | | | | |||
10 Urgent Pointer field is significant (URG) (this document) | +--------+-------------------+-----------+--------------------+ | |||
11 Acknowledgment field is significant (ACK) (this document) | | 6 | Reserved for | RFC 9293 | | | |||
12 Push Function (PSH) (this document) | | | future use | | | | |||
13 Reset the connection (RST) (this document) | +--------+-------------------+-----------+--------------------+ | |||
14 Synchronize sequence numbers (SYN) (this document) | | 7 | Reserved for | RFC 8311 | Previously used by | | |||
15 No more data from sender (FIN) (this document) | | | future use | | Historic RFC 3540 | | |||
| | | | as NS (Nonce Sum). | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 8 | CWR (Congestion | RFC 3168 | | | ||||
| | Window Reduced) | | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 9 | ECE (ECN-Echo) | RFC 3168 | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 10 | Urgent pointer | RFC 9293 | | | ||||
| | field is | | | | ||||
| | significant (URG) | | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 11 | Acknowledgment | RFC 9293 | | | ||||
| | field is | | | | ||||
| | significant (ACK) | | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 12 | Push function | RFC 9293 | | | ||||
| | (PSH) | | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 13 | Reset the | RFC 9293 | | | ||||
| | connection (RST) | | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 14 | Synchronize | RFC 9293 | | | ||||
| | sequence numbers | | | | ||||
| | (SYN) | | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
| 15 | No more data from | RFC 9293 | | | ||||
| | sender (FIN) | | | | ||||
+--------+-------------------+-----------+--------------------+ | ||||
FOOTNOTES: | Table 7: TCP Header Flags | |||
[1] Previously used by Historic [RFC3540] as NS (Nonce Sum). | ||||
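
To make the "Bit Offset" column concrete, here is a minimal sketch that is not part of either document version. It assumes the 16-bit word is the one that starts with the Data Offset field in Figure 1, and that offsets count from the most significant bit (offset 0) to the least significant bit (offset 15). The names `FLAG_OFFSETS`, `flag_mask`, and `decode_offset_and_flags` are hypothetical.

```python
# Bit offsets of the assigned flags, copied from the registry table above.
FLAG_OFFSETS = {
    "CWR": 8, "ECE": 9, "URG": 10, "ACK": 11,
    "PSH": 12, "RST": 13, "SYN": 14, "FIN": 15,
}


def flag_mask(bit_offset):
    """Convert a bit offset (0 = most significant bit of the 16-bit word)
    into the corresponding bit mask."""
    return 1 << (15 - bit_offset)


def decode_offset_and_flags(word):
    """Split the 16-bit word into the Data Offset (offsets 0 through 3)
    and the set of flag names whose bits are set."""
    data_offset = (word >> 12) & 0xF
    flags = {name for name, off in FLAG_OFFSETS.items()
             if word & flag_mask(off)}
    return data_offset, flags


if __name__ == "__main__":
    # 0x5012: Data Offset of 5 (in 32-bit words) with SYN and ACK set.
    print(decode_offset_and_flags(0x5012))
```

Under these assumptions, FIN at offset 15 corresponds to mask 0x0001 and CWR at offset 8 to mask 0x0080; the reserved bits at offsets 4 through 7 are ignored by the decoder.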
This TCP Header Flags registry should also be moved to a sub-registry | The "TCP Header Flags" registry has also been moved to a subregistry | |||
under the global "Transmission Control Protocol (TCP) Parameters | under the global "Transmission Control Protocol (TCP) Parameters" | |||
registry (https://www.iana.org/assignments/tcp-parameters/tcp- | registry <https://www.iana.org/assignments/tcp-parameters/>. | |||
parameters.xhtml). | ||||
The registry's Registration Procedure should remain Standards Action, | The registry's Registration Procedure remains Standards Action, but | |||
but the Reference can be updated to this document, and the Note | the Reference has been updated to this document, and the Note has | |||
removed. | been removed. | |||
7. Security and Privacy Considerations | 7. Security and Privacy Considerations | |||
The TCP design includes only rudimentary security features that | The TCP design includes only rudimentary security features that | |||
improve the robustness and reliability of connections and application | improve the robustness and reliability of connections and application | |||
data transfer, but there are no built-in cryptographic capabilities | data transfer, but there are no built-in cryptographic capabilities | |||
to support any form of confidentiality, authentication, or other | to support any form of confidentiality, authentication, or other | |||
typical security functions. Non-cryptographic enhancements (e.g. | typical security functions. Non-cryptographic enhancements (e.g., | |||
[9]) have been developed to improve robustness of TCP connections to | [9]) have been developed to improve robustness of TCP connections to | |||
particular types of attacks, but the applicability and protections of | particular types of attacks, but the applicability and protections of | |||
non-cryptographic enhancements are limited (e.g. see section 1.1 of | non-cryptographic enhancements are limited (e.g., see Section 1.1 of | |||
[9]). Applications typically utilize lower-layer (e.g. IPsec) and | [9]). Applications typically utilize lower-layer (e.g., IPsec) and | |||
upper-layer (e.g. TLS) protocols to provide security and privacy for | upper-layer (e.g., TLS) protocols to provide security and privacy for | |||
TCP connections and application data carried in TCP. Methods based | TCP connections and application data carried in TCP. Methods based | |||
on TCP options have been developed as well, to support some security | on TCP Options have been developed as well, to support some security | |||
capabilities. | capabilities. | |||
In order to fully provide confidentiality, integrity protection, and | In order to fully provide confidentiality, integrity protection, and | |||
authentication for TCP connections (including their control flags) | authentication for TCP connections (including their control flags), | |||
IPsec is the only current effective method. For integrity protection | IPsec is the only current effective method. For integrity protection | |||
and authentication, the TCP Authentication Option (TCP-AO) [39] is | and authentication, the TCP Authentication Option (TCP-AO) [38] is | |||
available, with a proposed extension to also provide confidentiality | available, with a proposed extension to also provide confidentiality | |||
for the segment payload. Other methods discussed in this section may | for the segment payload. Other methods discussed in this section may | |||
provide confidentiality or integrity protection for the payload, but | provide confidentiality or integrity protection for the payload, but | |||
for the TCP header only cover either a subset of the fields (e.g. | for the TCP header only cover either a subset of the fields (e.g., | |||
tcpcrypt [57]) or none at all (e.g. TLS). Other security features | tcpcrypt [57]) or none at all (e.g., TLS). Other security features | |||
that have been added to TCP (e.g. ISN generation, sequence number | that have been added to TCP (e.g., ISN generation, sequence number | |||
checks, and others) are only capable of partially hindering attacks. | checks, and others) are only capable of partially hindering attacks. | |||
Applications using long-lived TCP flows have been vulnerable to | Applications using long-lived TCP flows have been vulnerable to | |||
attacks that exploit the processing of control flags described in | attacks that exploit the processing of control flags described in | |||
earlier TCP specifications [34]. TCP-MD5 was a commonly implemented | earlier TCP specifications [33]. TCP-MD5 was a commonly implemented | |||
TCP option to support authentication for some of these connections, | TCP Option to support authentication for some of these connections, | |||
but had flaws and is now deprecated. TCP-AO provides a capability to | but had flaws and is now deprecated. TCP-AO provides a capability to | |||
protect long-lived TCP connections from attacks, and has superior | protect long-lived TCP connections from attacks and has superior | |||
properties to TCP-MD5. It does not provide any privacy for | properties to TCP-MD5. It does not provide any privacy for | |||
application data, nor for the TCP headers. | application data or for the TCP headers. | |||
The "tcpcrypt" [57] Experimental extension to TCP provides the | The "tcpcrypt" [57] experimental extension to TCP provides the | |||
ability to cryptographically protect connection data. Metadata | ability to cryptographically protect connection data. Metadata | |||
aspects of the TCP flow are still visible, but the application stream | aspects of the TCP flow are still visible, but the application stream | |||
is well-protected. Within the TCP header, only the urgent pointer | is well protected. Within the TCP header, only the urgent pointer | |||
and FIN flag are protected through tcpcrypt. | and FIN flag are protected through tcpcrypt. | |||
The TCP Roadmap [50] includes notes about several RFCs related to TCP | The TCP Roadmap [49] includes notes about several RFCs related to TCP | |||
security. Many of the enhancements provided by these RFCs have been | security. Many of the enhancements provided by these RFCs have been | |||
integrated into the present document, including ISN generation, | integrated into the present document, including ISN generation, | |||
mitigating blind in-window attacks, and improving handling of soft | mitigating blind in-window attacks, and improving handling of soft | |||
errors and ICMP packets. These are all discussed in greater detail | errors and ICMP packets. These are all discussed in greater detail | |||
in the referenced RFCs that originally described the changes needed | in the referenced RFCs that originally described the changes needed | |||
to earlier TCP specifications. Additionally, see RFC 6093 [40] for | to earlier TCP specifications. Additionally, see RFC 6093 [39] for | |||
discussion of security considerations related to the urgent pointer | discussion of security considerations related to the urgent pointer | |||
field, that has been deprecated. | field, which also discourages new applications from using the urgent | |||
pointer. | ||||
Since TCP is often used for bulk transfer flows, some attacks are | Since TCP is often used for bulk transfer flows, some attacks are | |||
possible that abuse the TCP congestion control logic. An example is | possible that abuse the TCP congestion control logic. An example is | |||
"ACK-division" attacks. Updates that have been made to the TCP | "ACK-division" attacks. Updates that have been made to the TCP | |||
congestion control specifications include mechanisms like Appropriate | congestion control specifications include mechanisms like Appropriate | |||
Byte Counting (ABC) [30] that act as mitigations to these attacks. | Byte Counting (ABC) [29] that act as mitigations to these attacks. | |||
Other attacks are focused on exhausting the resources of a TCP | Other attacks are focused on exhausting the resources of a TCP | |||
server. Examples include SYN flooding [33] or wasting resources on | server. Examples include SYN flooding [32] or wasting resources on | |||
non-progressing connections [42]. Operating systems commonly | non-progressing connections [41]. Operating systems commonly | |||
implement mitigations for these attacks. Some common defenses also | implement mitigations for these attacks. Some common defenses also | |||
utilize proxies, stateful firewalls, and other technologies outside | utilize proxies, stateful firewalls, and other technologies outside | |||
the end-host TCP implementation. | the end-host TCP implementation. | |||
The concept of a protocol's "wire image" is described in RFC 8546 | The concept of a protocol's "wire image" is described in RFC 8546 | |||
[56], which describes how TCP's cleartext headers expose more | [56], which describes how TCP's cleartext headers expose more | |||
metadata to nodes on the path than is strictly required to route the | metadata to nodes on the path than is strictly required to route the | |||
packets to their destination. On-path adversaries may be able to | packets to their destination. On-path adversaries may be able to | |||
leverage this metadata. Lessons learned in this respect from TCP | leverage this metadata. Lessons learned in this respect from TCP | |||
have been applied in the design of newer transports like QUIC [60]. | have been applied in the design of newer transports like QUIC [60]. | |||
Additionally, based partly on experiences with TCP and its | Additionally, based partly on experiences with TCP and its | |||
extensions, there are considerations that might be applicable for | extensions, there are considerations that might be applicable for | |||
future TCP extensions and other transports that the IETF has | future TCP extensions and other transports that the IETF has | |||
documented in RFC 9065 [61], along with IAB recommendations in RFC | documented in RFC 9065 [61], along with IAB recommendations in RFC | |||
8558 [58] and [68]. | 8558 [58] and [67]. | |||
There are also methods of "fingerprinting" that can be used to infer | There are also methods of "fingerprinting" that can be used to infer | |||
the host TCP implementation (operating system) version or platform | the host TCP implementation (operating system) version or platform | |||
information. These collect observations of several aspects such as | information. These collect observations of several aspects, such as | |||
the options present in segments, the ordering of options, the | the options present in segments, the ordering of options, the | |||
specific behaviors in the case of various conditions, packet timing, | specific behaviors in the case of various conditions, packet timing, | |||
packet sizing, and other aspects of the protocol that are left to be | packet sizing, and other aspects of the protocol that are left to be | |||
determined by an implementer, and can use those observations to | determined by an implementer, and can use those observations to | |||
identify information about the host and implementation. | identify information about the host and implementation. | |||
8. Acknowledgements | Since ICMP message processing also can interact with TCP connections, | |||
there is potential for ICMP-based attacks against TCP connections. | ||||
This document is largely a revision of RFC 793, which Jon Postel was | These are discussed in RFC 5927 [100], along with mitigations that | |||
the editor of. Due to his excellent work, it was able to last for | have been implemented. | |||
three decades before we felt the need to revise it. | ||||
Andre Oppermann was a contributor and helped to edit the first | ||||
revision of this document. | ||||
We are thankful for the assistance of the IETF TCPM working group | ||||
chairs, over the course of work on this document: | ||||
Michael Scharf | ||||
Yoshifumi Nishida | ||||
Pasi Sarolahti | ||||
Michael Tuexen | ||||
During the discussions of this work on the TCPM mailing list, in | ||||
working group meetings, and via area reviews, helpful comments, | ||||
critiques, and reviews were received from (listed alphabetically by | ||||
last name): Praveen Balasubramanian, David Borman, Mohamed Boucadair, | ||||
Bob Briscoe, Neal Cardwell, Yuchung Cheng, Martin Duke, Francis | ||||
Dupont, Ted Faber, Gorry Fairhurst, Fernando Gont, Rodney Grimes, Yi | ||||
Huang, Rahul Jadhav, Markku Kojo, Mike Kosek, Juhamatti Kuusisaari, | ||||
Kevin Lahey, Kevin Mason, Matt Mathis, Stephen McQuistin, Jonathan | ||||
Morton, Matt Olson, Tommy Pauly, Tom Petch, Hagen Paul Pfeifer, Kyle | ||||
Rose, Anthony Sabatini, Michael Scharf, Greg Skinner, Joe Touch, | ||||
Michael Tuexen, Reji Varghese, Bernie Volz, Tim Wicinski, Lloyd Wood, | ||||
and Alex Zimmermann. | ||||
Joe Touch provided additional help in clarifying the description of | ||||
segment size parameters and PMTUD/PLPMTUD recommendations. Markku | ||||
Kojo helped put together the text in the section on TCP Congestion | ||||
Control. | ||||
This document includes content from errata that were reported by | ||||
(listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan, | ||||
Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta | ||||
Yevstifeyev, EungJun Yi, Botong Huang, Charles Deng, Merlin Buge. | ||||
9. References | 8. References | |||
9.1. Normative References | 8.1. Normative References | |||
[1] Postel, J., "Internet Protocol", STD 5, RFC 791, | [1] Postel, J., "Internet Protocol", STD 5, RFC 791, | |||
DOI 10.17487/RFC0791, September 1981, | DOI 10.17487/RFC0791, September 1981, | |||
<https://www.rfc-editor.org/info/rfc791>. | <https://www.rfc-editor.org/info/rfc791>. | |||
[2] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | [2] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | |||
DOI 10.17487/RFC1191, November 1990, | DOI 10.17487/RFC1191, November 1990, | |||
<https://www.rfc-editor.org/info/rfc1191>. | <https://www.rfc-editor.org/info/rfc1191>. | |||
[3] Bradner, S., "Key words for use in RFCs to Indicate | [3] Bradner, S., "Key words for use in RFCs to Indicate | |||
skipping to change at page 102, line 9 | skipping to change at line 4501 | |||
[14] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., | [14] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., | |||
"Path MTU Discovery for IP version 6", STD 87, RFC 8201, | "Path MTU Discovery for IP version 6", STD 87, RFC 8201, | |||
DOI 10.17487/RFC8201, July 2017, | DOI 10.17487/RFC8201, July 2017, | |||
<https://www.rfc-editor.org/info/rfc8201>. | <https://www.rfc-editor.org/info/rfc8201>. | |||
[15] Allman, M., "Requirements for Time-Based Loss Detection", | [15] Allman, M., "Requirements for Time-Based Loss Detection", | |||
BCP 233, RFC 8961, DOI 10.17487/RFC8961, November 2020, | BCP 233, RFC 8961, DOI 10.17487/RFC8961, November 2020, | |||
<https://www.rfc-editor.org/info/rfc8961>. | <https://www.rfc-editor.org/info/rfc8961>. | |||
9.2. Informative References | 8.2. Informative References | |||
[16] Postel, J., "Transmission Control Protocol", STD 7, | [16] Postel, J., "Transmission Control Protocol", STD 7, | |||
RFC 793, DOI 10.17487/RFC0793, September 1981, | RFC 793, DOI 10.17487/RFC0793, September 1981, | |||
<https://www.rfc-editor.org/info/rfc793>. | <https://www.rfc-editor.org/info/rfc793>. | |||
[17] Postel, J., "The TCP Maximum Segment Size and Related | [17] Nagle, J., "Congestion Control in IP/TCP Internetworks", | |||
Topics", RFC 879, DOI 10.17487/RFC0879, November 1983, | ||||
<https://www.rfc-editor.org/info/rfc879>. | ||||
[18] Nagle, J., "Congestion Control in IP/TCP Internetworks", | ||||
RFC 896, DOI 10.17487/RFC0896, January 1984, | RFC 896, DOI 10.17487/RFC0896, January 1984, | |||
<https://www.rfc-editor.org/info/rfc896>. | <https://www.rfc-editor.org/info/rfc896>. | |||
[19] Reynolds, J. and J. Postel, "Official Internet protocols", | [18] Reynolds, J. and J. Postel, "Official Internet protocols", | |||
RFC 1011, DOI 10.17487/RFC1011, May 1987, | RFC 1011, DOI 10.17487/RFC1011, May 1987, | |||
<https://www.rfc-editor.org/info/rfc1011>. | <https://www.rfc-editor.org/info/rfc1011>. | |||
[20] Braden, R., Ed., "Requirements for Internet Hosts - | [19] Braden, R., Ed., "Requirements for Internet Hosts - | |||
Communication Layers", STD 3, RFC 1122, | Communication Layers", STD 3, RFC 1122, | |||
DOI 10.17487/RFC1122, October 1989, | DOI 10.17487/RFC1122, October 1989, | |||
<https://www.rfc-editor.org/info/rfc1122>. | <https://www.rfc-editor.org/info/rfc1122>. | |||
[21] Almquist, P., "Type of Service in the Internet Protocol | [20] Almquist, P., "Type of Service in the Internet Protocol | |||
Suite", RFC 1349, DOI 10.17487/RFC1349, July 1992, | Suite", RFC 1349, DOI 10.17487/RFC1349, July 1992, | |||
<https://www.rfc-editor.org/info/rfc1349>. | <https://www.rfc-editor.org/info/rfc1349>. | |||
[22] Braden, R., "T/TCP -- TCP Extensions for Transactions | [21] Braden, R., "T/TCP -- TCP Extensions for Transactions | |||
Functional Specification", RFC 1644, DOI 10.17487/RFC1644, | Functional Specification", RFC 1644, DOI 10.17487/RFC1644, | |||
July 1994, <https://www.rfc-editor.org/info/rfc1644>. | July 1994, <https://www.rfc-editor.org/info/rfc1644>. | |||
[23] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [22] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
Selective Acknowledgment Options", RFC 2018, | Selective Acknowledgment Options", RFC 2018, | |||
DOI 10.17487/RFC2018, October 1996, | DOI 10.17487/RFC2018, October 1996, | |||
<https://www.rfc-editor.org/info/rfc2018>. | <https://www.rfc-editor.org/info/rfc2018>. | |||
[24] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, | [23] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, | |||
J., Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known | J., Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known | |||
TCP Implementation Problems", RFC 2525, | TCP Implementation Problems", RFC 2525, | |||
DOI 10.17487/RFC2525, March 1999, | DOI 10.17487/RFC2525, March 1999, | |||
<https://www.rfc-editor.org/info/rfc2525>. | <https://www.rfc-editor.org/info/rfc2525>. | |||
[25] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", | [24] Borman, D., Deering, S., and R. Hinden, "IPv6 Jumbograms", | |||
RFC 2675, DOI 10.17487/RFC2675, August 1999, | RFC 2675, DOI 10.17487/RFC2675, August 1999, | |||
<https://www.rfc-editor.org/info/rfc2675>. | <https://www.rfc-editor.org/info/rfc2675>. | |||
[26] Xiao, X., Hannan, A., Paxson, V., and E. Crabbe, "TCP | [25] Xiao, X., Hannan, A., Paxson, V., and E. Crabbe, "TCP | |||
Processing of the IPv4 Precedence Field", RFC 2873, | Processing of the IPv4 Precedence Field", RFC 2873, | |||
DOI 10.17487/RFC2873, June 2000, | DOI 10.17487/RFC2873, June 2000, | |||
<https://www.rfc-editor.org/info/rfc2873>. | <https://www.rfc-editor.org/info/rfc2873>. | |||
[27] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An | [26] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An | |||
Extension to the Selective Acknowledgement (SACK) Option | Extension to the Selective Acknowledgement (SACK) Option | |||
for TCP", RFC 2883, DOI 10.17487/RFC2883, July 2000, | for TCP", RFC 2883, DOI 10.17487/RFC2883, July 2000, | |||
<https://www.rfc-editor.org/info/rfc2883>. | <https://www.rfc-editor.org/info/rfc2883>. | |||
[28] Lahey, K., "TCP Problems with Path MTU Discovery", | [27] Lahey, K., "TCP Problems with Path MTU Discovery", | |||
RFC 2923, DOI 10.17487/RFC2923, September 2000, | RFC 2923, DOI 10.17487/RFC2923, September 2000, | |||
<https://www.rfc-editor.org/info/rfc2923>. | <https://www.rfc-editor.org/info/rfc2923>. | |||
[29] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | [28] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | |||
Sooriyabandara, "TCP Performance Implications of Network | Sooriyabandara, "TCP Performance Implications of Network | |||
Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | |||
December 2002, <https://www.rfc-editor.org/info/rfc3449>. | December 2002, <https://www.rfc-editor.org/info/rfc3449>. | |||
[30] Allman, M., "TCP Congestion Control with Appropriate Byte | [29] Allman, M., "TCP Congestion Control with Appropriate Byte | |||
Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | |||
2003, <https://www.rfc-editor.org/info/rfc3465>. | 2003, <https://www.rfc-editor.org/info/rfc3465>. | |||
[31] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | [30] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | |||
ICMPv6, UDP, and TCP Headers", RFC 4727, | ICMPv6, UDP, and TCP Headers", RFC 4727, | |||
DOI 10.17487/RFC4727, November 2006, | DOI 10.17487/RFC4727, November 2006, | |||
<https://www.rfc-editor.org/info/rfc4727>. | <https://www.rfc-editor.org/info/rfc4727>. | |||
[32] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | [31] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | |||
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | |||
<https://www.rfc-editor.org/info/rfc4821>. | <https://www.rfc-editor.org/info/rfc4821>. | |||
[33] Eddy, W., "TCP SYN Flooding Attacks and Common | [32] Eddy, W., "TCP SYN Flooding Attacks and Common | |||
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | |||
<https://www.rfc-editor.org/info/rfc4987>. | <https://www.rfc-editor.org/info/rfc4987>. | |||
[34] Touch, J., "Defending TCP Against Spoofing Attacks", | [33] Touch, J., "Defending TCP Against Spoofing Attacks", | |||
RFC 4953, DOI 10.17487/RFC4953, July 2007, | RFC 4953, DOI 10.17487/RFC4953, July 2007, | |||
<https://www.rfc-editor.org/info/rfc4953>. | <https://www.rfc-editor.org/info/rfc4953>. | |||
[35] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. | [34] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. | |||
Carrier, "Marker PDU Aligned Framing for TCP | Carrier, "Marker PDU Aligned Framing for TCP | |||
Specification", RFC 5044, DOI 10.17487/RFC5044, October | Specification", RFC 5044, DOI 10.17487/RFC5044, October | |||
2007, <https://www.rfc-editor.org/info/rfc5044>. | 2007, <https://www.rfc-editor.org/info/rfc5044>. | |||
[36] Gont, F., "TCP's Reaction to Soft Errors", RFC 5461, | [35] Gont, F., "TCP's Reaction to Soft Errors", RFC 5461, | |||
DOI 10.17487/RFC5461, February 2009, | DOI 10.17487/RFC5461, February 2009, | |||
<https://www.rfc-editor.org/info/rfc5461>. | <https://www.rfc-editor.org/info/rfc5461>. | |||
[37] StJohns, M., Atkinson, R., and G. Thomas, "Common | [36] StJohns, M., Atkinson, R., and G. Thomas, "Common | |||
Architecture Label IPv6 Security Option (CALIPSO)", | Architecture Label IPv6 Security Option (CALIPSO)", | |||
RFC 5570, DOI 10.17487/RFC5570, July 2009, | RFC 5570, DOI 10.17487/RFC5570, July 2009, | |||
<https://www.rfc-editor.org/info/rfc5570>. | <https://www.rfc-editor.org/info/rfc5570>. | |||
[38] Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust | [37] Sandlund, K., Pelletier, G., and L-E. Jonsson, "The RObust | |||
Header Compression (ROHC) Framework", RFC 5795, | Header Compression (ROHC) Framework", RFC 5795, | |||
DOI 10.17487/RFC5795, March 2010, | DOI 10.17487/RFC5795, March 2010, | |||
<https://www.rfc-editor.org/info/rfc5795>. | <https://www.rfc-editor.org/info/rfc5795>. | |||
[39] Touch, J., Mankin, A., and R. Bonica, "The TCP | [38] Touch, J., Mankin, A., and R. Bonica, "The TCP | |||
Authentication Option", RFC 5925, DOI 10.17487/RFC5925, | Authentication Option", RFC 5925, DOI 10.17487/RFC5925, | |||
June 2010, <https://www.rfc-editor.org/info/rfc5925>. | June 2010, <https://www.rfc-editor.org/info/rfc5925>. | |||
[40] Gont, F. and A. Yourtchenko, "On the Implementation of the | [39] Gont, F. and A. Yourtchenko, "On the Implementation of the | |||
TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093, | TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093, | |||
January 2011, <https://www.rfc-editor.org/info/rfc6093>. | January 2011, <https://www.rfc-editor.org/info/rfc6093>. | |||
[41] Gont, F., "Reducing the TIME-WAIT State Using TCP | [40] Gont, F., "Reducing the TIME-WAIT State Using TCP | |||
Timestamps", BCP 159, RFC 6191, DOI 10.17487/RFC6191, | Timestamps", BCP 159, RFC 6191, DOI 10.17487/RFC6191, | |||
April 2011, <https://www.rfc-editor.org/info/rfc6191>. | April 2011, <https://www.rfc-editor.org/info/rfc6191>. | |||
[42] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender | [41] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender | |||
Clarification for Persist Condition", RFC 6429, | Clarification for Persist Condition", RFC 6429, | |||
DOI 10.17487/RFC6429, December 2011, | DOI 10.17487/RFC6429, December 2011, | |||
<https://www.rfc-editor.org/info/rfc6429>. | <https://www.rfc-editor.org/info/rfc6429>. | |||
[43] Gont, F. and S. Bellovin, "Defending against Sequence | [42] Gont, F. and S. Bellovin, "Defending against Sequence | |||
Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February | Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February | |||
2012, <https://www.rfc-editor.org/info/rfc6528>. | 2012, <https://www.rfc-editor.org/info/rfc6528>. | |||
[44] Borman, D., "TCP Options and Maximum Segment Size (MSS)", | [43] Borman, D., "TCP Options and Maximum Segment Size (MSS)", | |||
RFC 6691, DOI 10.17487/RFC6691, July 2012, | RFC 6691, DOI 10.17487/RFC6691, July 2012, | |||
<https://www.rfc-editor.org/info/rfc6691>. | <https://www.rfc-editor.org/info/rfc6691>. | |||
[45] Touch, J., "Updated Specification of the IPv4 ID Field", | [44] Touch, J., "Updated Specification of the IPv4 ID Field", | |||
RFC 6864, DOI 10.17487/RFC6864, February 2013, | RFC 6864, DOI 10.17487/RFC6864, February 2013, | |||
<https://www.rfc-editor.org/info/rfc6864>. | <https://www.rfc-editor.org/info/rfc6864>. | |||
[46] Touch, J., "Shared Use of Experimental TCP Options", | [45] Touch, J., "Shared Use of Experimental TCP Options", | |||
RFC 6994, DOI 10.17487/RFC6994, August 2013, | RFC 6994, DOI 10.17487/RFC6994, August 2013, | |||
<https://www.rfc-editor.org/info/rfc6994>. | <https://www.rfc-editor.org/info/rfc6994>. | |||
[47] McPherson, D., Oran, D., Thaler, D., and E. Osterweil, | [46] McPherson, D., Oran, D., Thaler, D., and E. Osterweil, | |||
"Architectural Considerations of IP Anycast", RFC 7094, | "Architectural Considerations of IP Anycast", RFC 7094, | |||
DOI 10.17487/RFC7094, January 2014, | DOI 10.17487/RFC7094, January 2014, | |||
<https://www.rfc-editor.org/info/rfc7094>. | <https://www.rfc-editor.org/info/rfc7094>. | |||
[48] Borman, D., Braden, B., Jacobson, V., and R. | [47] Borman, D., Braden, B., Jacobson, V., and R. | |||
Scheffenegger, Ed., "TCP Extensions for High Performance", | Scheffenegger, Ed., "TCP Extensions for High Performance", | |||
RFC 7323, DOI 10.17487/RFC7323, September 2014, | RFC 7323, DOI 10.17487/RFC7323, September 2014, | |||
<https://www.rfc-editor.org/info/rfc7323>. | <https://www.rfc-editor.org/info/rfc7323>. | |||
[49] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | [48] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | |||
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | |||
<https://www.rfc-editor.org/info/rfc7413>. | <https://www.rfc-editor.org/info/rfc7413>. | |||
[50] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. | [49] Duke, M., Braden, R., Eddy, W., Blanton, E., and A. | |||
Zimmermann, "A Roadmap for Transmission Control Protocol | Zimmermann, "A Roadmap for Transmission Control Protocol | |||
(TCP) Specification Documents", RFC 7414, | (TCP) Specification Documents", RFC 7414, | |||
DOI 10.17487/RFC7414, February 2015, | DOI 10.17487/RFC7414, February 2015, | |||
<https://www.rfc-editor.org/info/rfc7414>. | <https://www.rfc-editor.org/info/rfc7414>. | |||
[51] Black, D., Ed. and P. Jones, "Differentiated Services | [50] Black, D., Ed. and P. Jones, "Differentiated Services | |||
(Diffserv) and Real-Time Communication", RFC 7657, | (Diffserv) and Real-Time Communication", RFC 7657, | |||
DOI 10.17487/RFC7657, November 2015, | DOI 10.17487/RFC7657, November 2015, | |||
<https://www.rfc-editor.org/info/rfc7657>. | <https://www.rfc-editor.org/info/rfc7657>. | |||
[52] Fairhurst, G. and M. Welzl, "The Benefits of Using | [51] Fairhurst, G. and M. Welzl, "The Benefits of Using | |||
Explicit Congestion Notification (ECN)", RFC 8087, | Explicit Congestion Notification (ECN)", RFC 8087, | |||
DOI 10.17487/RFC8087, March 2017, | DOI 10.17487/RFC8087, March 2017, | |||
<https://www.rfc-editor.org/info/rfc8087>. | <https://www.rfc-editor.org/info/rfc8087>. | |||
[53] Fairhurst, G., Ed., Trammell, B., Ed., and M. Kuehlewind, | [52] Fairhurst, G., Ed., Trammell, B., Ed., and M. Kuehlewind, | |||
Ed., "Services Provided by IETF Transport Protocols and | Ed., "Services Provided by IETF Transport Protocols and | |||
Congestion Control Mechanisms", RFC 8095, | Congestion Control Mechanisms", RFC 8095, | |||
DOI 10.17487/RFC8095, March 2017, | DOI 10.17487/RFC8095, March 2017, | |||
<https://www.rfc-editor.org/info/rfc8095>. | <https://www.rfc-editor.org/info/rfc8095>. | |||
[54] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of | [53] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of | |||
Transport Features Provided by IETF Transport Protocols", | Transport Features Provided by IETF Transport Protocols", | |||
RFC 8303, DOI 10.17487/RFC8303, February 2018, | RFC 8303, DOI 10.17487/RFC8303, February 2018, | |||
<https://www.rfc-editor.org/info/rfc8303>. | <https://www.rfc-editor.org/info/rfc8303>. | |||
[54] Black, D., "Relaxing Restrictions on Explicit Congestion | ||||
Notification (ECN) Experimentation", RFC 8311, | ||||
DOI 10.17487/RFC8311, January 2018, | ||||
<https://www.rfc-editor.org/info/rfc8311>. | ||||
[55] Chown, T., Loughney, J., and T. Winters, "IPv6 Node | [55] Chown, T., Loughney, J., and T. Winters, "IPv6 Node | |||
Requirements", BCP 220, RFC 8504, DOI 10.17487/RFC8504, | Requirements", BCP 220, RFC 8504, DOI 10.17487/RFC8504, | |||
January 2019, <https://www.rfc-editor.org/info/rfc8504>. | January 2019, <https://www.rfc-editor.org/info/rfc8504>. | |||
[56] Trammell, B. and M. Kuehlewind, "The Wire Image of a | [56] Trammell, B. and M. Kuehlewind, "The Wire Image of a | |||
Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April | Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April | |||
2019, <https://www.rfc-editor.org/info/rfc8546>. | 2019, <https://www.rfc-editor.org/info/rfc8546>. | |||
[57] Bittau, A., Giffin, D., Handley, M., Mazieres, D., Slack, | [57] Bittau, A., Giffin, D., Handley, M., Mazieres, D., Slack, | |||
Q., and E. Smith, "Cryptographic Protection of TCP Streams | Q., and E. Smith, "Cryptographic Protection of TCP Streams | |||
skipping to change at page 106, line 30 | skipping to change at line 4714 | |||
Multiplexed and Secure Transport", RFC 9000, | Multiplexed and Secure Transport", RFC 9000, | |||
DOI 10.17487/RFC9000, May 2021, | DOI 10.17487/RFC9000, May 2021, | |||
<https://www.rfc-editor.org/info/rfc9000>. | <https://www.rfc-editor.org/info/rfc9000>. | |||
[61] Fairhurst, G. and C. Perkins, "Considerations around | [61] Fairhurst, G. and C. Perkins, "Considerations around | |||
Transport Header Confidentiality, Network Operations, and | Transport Header Confidentiality, Network Operations, and | |||
the Evolution of Internet Transport Protocols", RFC 9065, | the Evolution of Internet Transport Protocols", RFC 9065, | |||
DOI 10.17487/RFC9065, July 2021, | DOI 10.17487/RFC9065, July 2021, | |||
<https://www.rfc-editor.org/info/rfc9065>. | <https://www.rfc-editor.org/info/rfc9065>. | |||
[62] IANA, "Transmission Control Protocol (TCP) Parameters, | [62] IANA, "Transmission Control Protocol (TCP) Parameters", | |||
https://www.iana.org/assignments/tcp-parameters/tcp- | <https://www.iana.org/assignments/tcp-parameters/>. | |||
parameters.xhtml", 2019. | ||||
[63] IANA, "Transmission Control Protocol (TCP) Header Flags, | ||||
https://www.iana.org/assignments/tcp-header-flags/tcp- | ||||
header-flags.xhtml", 2019. | ||||
[64] Gont, F., "Processing of IP Security/Compartment and | [63] Gont, F., "Processing of IP Security/Compartment and | |||
Precedence Information by TCP", Work in Progress, | Precedence Information by TCP", Work in Progress, | |||
Internet-Draft, draft-gont-tcpm-tcp-seccomp-prec-00, 29 | Internet-Draft, draft-gont-tcpm-tcp-seccomp-prec-00, 29 | |||
March 2012, <http://www.ietf.org/internet-drafts/draft- | March 2012, <https://datatracker.ietf.org/doc/html/draft- | |||
gont-tcpm-tcp-seccomp-prec-00.txt>. | gont-tcpm-tcp-seccomp-prec-00>. | |||
[65] Gont, F. and D. Borman, "On the Validation of TCP Sequence | [64] Gont, F. and D. Borman, "On the Validation of TCP Sequence | |||
Numbers", Work in Progress, Internet-Draft, draft-gont- | Numbers", Work in Progress, Internet-Draft, draft-gont- | |||
tcpm-tcp-seq-validation-04, 11 March 2019, | tcpm-tcp-seq-validation-04, 11 March 2019, | |||
<http://www.ietf.org/internet-drafts/draft-gont-tcpm-tcp- | <https://datatracker.ietf.org/doc/html/draft-gont-tcpm- | |||
seq-validation-04.txt>. | tcp-seq-validation-04>. | |||
[66] Touch, J. and W. Eddy, "TCP Extended Data Offset Option", | [65] Touch, J. and W. M. Eddy, "TCP Extended Data Offset | |||
Work in Progress, Internet-Draft, draft-ietf-tcpm-tcp-edo- | Option", Work in Progress, Internet-Draft, draft-ietf- | |||
10, 19 July 2018, <http://www.ietf.org/internet-drafts/ | tcpm-tcp-edo-12, 15 April 2022, | |||
draft-ietf-tcpm-tcp-edo-10.txt>. | <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | |||
tcp-edo-12>. | ||||
[67] McQuistin, S., Band, V., Jacob, D., and C. Perkins, | [66] McQuistin, S., Band, V., Jacob, D., and C. Perkins, | |||
"Describing Protocol Data Units with Augmented Packet | "Describing Protocol Data Units with Augmented Packet | |||
Header Diagrams", Work in Progress, Internet-Draft, draft- | Header Diagrams", Work in Progress, Internet-Draft, draft- | |||
mcquistin-augmented-ascii-diagrams-08, 5 May 2021, | mcquistin-augmented-ascii-diagrams-10, 7 March 2022, | |||
<https://www.ietf.org/archive/id/draft-mcquistin- | <https://datatracker.ietf.org/doc/html/draft-mcquistin- | |||
augmented-ascii-diagrams-08.txt>. | augmented-ascii-diagrams-10>. | |||
[68] Thomson, M. and T. Pauly, "Long-term Viability of Protocol | [67] Thomson, M. and T. Pauly, "Long-Term Viability of Protocol | |||
Extension Mechanisms", Work in Progress, Internet-Draft, | Extension Mechanisms", RFC 9170, DOI 10.17487/RFC9170, | |||
draft-iab-use-it-or-lose-it-02, 23 August 2021, | December 2021, <https://www.rfc-editor.org/info/rfc9170>. | |||
<https://www.ietf.org/archive/id/draft-iab-use-it-or-lose- | ||||
it-02.txt>. | ||||
[69] Minshall, G., "A Proposed Modification to Nagle's | [68] Minshall, G., "A Suggested Modification to Nagle's | |||
Algorithm", Work in Progress, Internet-Draft, draft- | Algorithm", Work in Progress, Internet-Draft, draft- | |||
minshall-nagle-01, June 1999, | minshall-nagle-01, 18 June 1999, | |||
<https://datatracker.ietf.org/doc/html/draft-minshall- | <https://datatracker.ietf.org/doc/html/draft-minshall- | |||
nagle-01>. | nagle-01>. | |||
[70] Dalal, Y. and C. Sunshine, "Connection Management in | [69] Dalal, Y. and C. Sunshine, "Connection Management in | |||
Transport Protocols", Computer Networks Vol. 2, No. 6, pp. | Transport Protocols", Computer Networks, Vol. 2, No. 6, | |||
454-473, December 1978. | pp. 454-473, DOI 10.1016/0376-5075(78)90053-3, December | |||
1978, <https://doi.org/10.1016/0376-5075(78)90053-3>. | ||||
[71] Faber, T., Touch, J., and W. Yui, "The TIME-WAIT state in | [70] Faber, T., Touch, J., and W. Yui, "The TIME-WAIT state in | |||
TCP and Its Effect on Busy Servers", Proceedings of IEEE | TCP and Its Effect on Busy Servers", Proceedings of IEEE | |||
INFOCOM pp. 1573-1583, March 1999. | INFOCOM, pp. 1573-1583, DOI 10.1109/INFCOM.1999.752180, | |||
March 1999, <https://doi.org/10.1109/INFCOM.1999.752180>. | ||||
[72] Postel, J., "Comments on Action Items from the January | [71] Postel, J., "Comments on Action Items from the January | |||
Meeting", IEN 177, March 1981, | Meeting", IEN 177, March 1981, | |||
<https://www.rfc-editor.org/ien/ien177.txt>. | <https://www.rfc-editor.org/ien/ien177.txt>. | |||
[73] "Segmentation Offloads", Linux Networking Documentation , | [72] "Segmentation Offloads", The Linux Kernel Documentation, | |||
<https://www.kernel.org/doc/html/latest/networking/ | <https://www.kernel.org/doc/html/latest/networking/ | |||
segmentation-offloads.html>. | segmentation-offloads.html>. | |||
[73] RFC Errata, Erratum ID 573, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid573>. | ||||
[74] RFC Errata, Erratum ID 574, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid574>. | ||||
[75] RFC Errata, Erratum ID 700, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid700>. | ||||
[76] RFC Errata, Erratum ID 701, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid701>. | ||||
[77] RFC Errata, Erratum ID 1283, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1283>. | ||||
[78] RFC Errata, Erratum ID 1561, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1561>. | ||||
[79] RFC Errata, Erratum ID 1562, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1562>. | ||||
[80] RFC Errata, Erratum ID 1564, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1564>. | ||||
[81] RFC Errata, Erratum ID 1571, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1571>. | ||||
[82] RFC Errata, Erratum ID 1572, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1572>. | ||||
[83] RFC Errata, Erratum ID 2297, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid2297>. | ||||
[84] RFC Errata, Erratum ID 2298, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid2298>. | ||||
[85] RFC Errata, Erratum ID 2748, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid2748>. | ||||
[86] RFC Errata, Erratum ID 2749, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid2749>. | ||||
[87] RFC Errata, Erratum ID 2934, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid2934>. | ||||
[88] RFC Errata, Erratum ID 3213, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid3213>. | ||||
[89] RFC Errata, Erratum ID 3300, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid3300>. | ||||
[90] RFC Errata, Erratum ID 3301, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid3301>. | ||||
[91] RFC Errata, Erratum ID 6222, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid6222>. | ||||
[92] RFC Errata, Erratum ID 572, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid572>. | ||||
[93] RFC Errata, Erratum ID 575, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid575>. | ||||
[94] RFC Errata, Erratum ID 1565, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1565>. | ||||
[95] RFC Errata, Erratum ID 1569, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid1569>. | ||||
[96] RFC Errata, Erratum ID 2296, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid2296>. | ||||
[97] RFC Errata, Erratum ID 3305, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid3305>. | ||||
[98] RFC Errata, Erratum ID 3602, RFC 793, | ||||
<https://www.rfc-editor.org/errata/eid3602>. | ||||
[99] RFC Errata, Erratum ID 4772, RFC 5961, | ||||
<https://www.rfc-editor.org/errata/eid4772>. | ||||
[100] Gont, F., "ICMP Attacks against TCP", RFC 5927, | ||||
DOI 10.17487/RFC5927, July 2010, | ||||
<https://www.rfc-editor.org/info/rfc5927>. | ||||
Appendix A. Other Implementation Notes | Appendix A. Other Implementation Notes | |||
This section includes additional notes and references on TCP | This section includes additional notes and references on TCP | |||
implementation decisions that are currently not a part of the RFC | implementation decisions that are currently not a part of the RFC | |||
series or included within the TCP standard. These items can be | series or included within the TCP standard. These items can be | |||
considered by implementers, but there was not yet a consensus to | considered by implementers, but there was not yet a consensus to | |||
include them in the standard. | include them in the standard. | |||
A.1. IP Security Compartment and Precedence | A.1. IP Security Compartment and Precedence | |||
The IPv4 specification [1] includes a precedence value in the (now | The IPv4 specification [1] includes a precedence value in the (now | |||
obsoleted) Type of Service field (TOS) field. It was modified in | obsoleted) Type of Service (TOS) field. It was modified in [20] and | |||
[21], and then obsoleted by the definition of Differentiated Services | then obsoleted by the definition of Differentiated Services | |||
(DiffServ) [4]. Setting and conveying TOS between the network layer, | (Diffserv) [4]. Setting and conveying TOS between the network layer, | |||
TCP implementation, and applications is obsolete, and replaced by | TCP implementation, and applications is obsolete and is replaced by | |||
DiffServ in the current TCP specification. | Diffserv in the current TCP specification. | |||
RFC 793 required checking the IP security compartment and precedence | RFC 793 required checking the IP security compartment and precedence | |||
on incoming TCP segments for consistency within a connection, and | on incoming TCP segments for consistency within a connection and with | |||
with application requests. Each of these aspects of IP have become | application requests. Each of these aspects of IP have become | |||
outdated, without specific updates to RFC 793. The issues with | outdated, without specific updates to RFC 793. The issues with | |||
precedence were fixed by [26], which is Standards Track, and so this | precedence were fixed by [25], which is Standards Track, and so this | |||
present TCP specification includes those changes. However, the state | present TCP specification includes those changes. However, the state | |||
of IP security options that may be used by MLS systems is not as | of IP security options that may be used by Multi-Level Secure (MLS) | |||
apparent in the IETF currently. | systems is not as apparent in the IETF currently. | |||
Resetting connections when incoming packets do not meet expected | Resetting connections when incoming packets do not meet expected | |||
security compartment or precedence expectations has been recognized | security compartment or precedence expectations has been recognized | |||
as a possible attack vector [64], and there has been discussion about | as a possible attack vector [63], and there has been discussion about | |||
amending the TCP specification to prevent connections from being | amending the TCP specification to prevent connections from being | |||
aborted due to non-matching IP security compartment and DiffServ | aborted due to nonmatching IP security compartment and Diffserv | |||
codepoint values. | codepoint values. | |||
A.1.1. Precedence | A.1.1. Precedence | |||
In DiffServ the former precedence values are treated as Class | In Diffserv, the former precedence values are treated as Class | |||
Selector codepoints, and methods for compatible treatment are | Selector codepoints, and methods for compatible treatment are | |||
described in the DiffServ architecture. The RFC 793/1122 TCP | described in the Diffserv architecture. The RFC TCP specification | |||
specification includes logic intending to have connections use the | defined by RFCs 793 and 1122 included logic intending to have | |||
highest precedence requested by either endpoint application, and to | connections use the highest precedence requested by either endpoint | |||
keep the precedence consistent throughout a connection. This logic | application, and to keep the precedence consistent throughout a | |||
from the obsolete TOS is not applicable for DiffServ, and should not | connection. This logic from the obsolete TOS is not applicable to | |||
be included in TCP implementations, though changes to DiffServ values | Diffserv and should not be included in TCP implementations, though | |||
within a connection are discouraged. For discussion of this, see RFC | changes to Diffserv values within a connection are discouraged. For | |||
7657 (sec 5.1, 5.3, and 6) [51]. | discussion of this, see RFC 7657 (Sections 5.1, 5.3, and 6) [50]. | |||
The obsoleted TOS processing rules in TCP assumed bidirectional (or | The obsoleted TOS processing rules in TCP assumed bidirectional (or | |||
symmetric) precedence values used on a connection, but the DiffServ | symmetric) precedence values used on a connection, but the Diffserv | |||
architecture is asymmetric. Problems with the old TCP logic in this | architecture is asymmetric. Problems with the old TCP logic in this | |||
regard were described in [26] and the solution described is to ignore | regard were described in [25], and the solution described is to | |||
IP precedence in TCP. Since RFC 2873 is a Standards Track document | ignore IP precedence in TCP. Since RFC 2873 is a Standards Track | |||
(although not marked as updating RFC 793), current implementations | document (although not marked as updating RFC 793), current | |||
are expected to be robust to these conditions. Note that the | implementations are expected to be robust in these conditions. Note | |||
DiffServ field value used in each direction is a part of the | that the Diffserv field value used in each direction is a part of the | |||
interface between TCP and the network layer, and values in use can be | interface between TCP and the network layer, and values in use can be | |||
indicated both ways between TCP and the application. | indicated both ways between TCP and the application. | |||
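Illustrative note (not part of either document version): the mapping from the former precedence values to Class Selector codepoints, and the way an application might indicate a Diffserv value to the network layer, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the IPv4 IP_TOS socket option as exposed by Python's socket module, and the helper names class_selector_dscp, dscp_to_tos, and set_send_dscp are hypothetical. It affects only the sending direction, in keeping with the asymmetric Diffserv architecture described above.

```python
import socket


def class_selector_dscp(precedence):
    """Map a former 3-bit IP precedence value to its Class Selector DSCP.

    Class Selector codepoints have the bit pattern xxx000, so the
    precedence occupies the top three bits of the 6-bit DSCP.
    """
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must be in 0..7")
    return precedence << 3


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP in the upper bits of the former TOS octet.

    The low two bits of that octet are used for ECN and are left as zero.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp << 2


def set_send_dscp(sock, dscp):
    """Hypothetical helper: request a DSCP for segments sent on an IPv4 socket.

    Assumes the platform exposes IP_TOS; the value applies to the local
    sending direction only.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_send_dscp(s, class_selector_dscp(5))
        print(s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
```

Consistent with the text above, an application using such a helper would normally pick one value when the connection is set up, since changes to Diffserv values within a connection are discouraged.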
A.1.2. MLS Systems | A.1.2. MLS Systems | |||
The IP security option (IPSO) and compartment defined in [1] was | The IP Security Option (IPSO) and compartment defined in [1] was | |||
refined in RFC 1038 that was later obsoleted by RFC 1108. The | refined in RFC 1038, which was later obsoleted by RFC 1108. The | |||
Commercial IP Security Option (CIPSO) is defined in FIPS-188 | Commercial IP Security Option (CIPSO) is defined in FIPS-188 | |||
(withdrawn by NIST in 2015), and is supported by some vendors and | (withdrawn by NIST in 2015) and is supported by some vendors and | |||
operating systems. RFC 1108 is now Historic, though RFC 791 itself | operating systems. RFC 1108 is now Historic, though RFC 791 itself | |||
has not been updated to remove the IP security option. For IPv6, a | has not been updated to remove the IP Security Option. For IPv6, a | |||
similar option (CALIPSO) has been defined [37]. RFC 793 includes | similar option (Common Architecture Label IPv6 Security Option | |||
logic that includes the IP security/compartment information in | (CALIPSO)) has been defined [36]. RFC 793 includes logic that | |||
treatment of TCP segments. References to the IP "security/ | includes the IP security/compartment information in treatment of TCP | |||
compartment" in this document may be relevant for Multi-Level Secure | segments. References to the IP "security/compartment" in this | |||
(MLS) system implementers, but can be ignored for non-MLS | document may be relevant for Multi-Level Secure (MLS) system | |||
implementations, consistent with running code on the Internet. See | implementers but can be ignored for non-MLS implementations, | |||
Appendix A.1 for further discussion. Note that RFC 5570 describes | consistent with running code on the Internet. See Appendix A.1 for | |||
some MLS networking scenarios where IPSO, CIPSO, or CALIPSO may be | further discussion. Note that RFC 5570 describes some MLS networking | |||
used. In these special cases, TCP implementers should see section | scenarios where IPSO, CIPSO, or CALIPSO may be used. In these | |||
7.3.1 of RFC 5570, and follow the guidance in that document. | special cases, TCP implementers should see Section 7.3.1 of RFC 5570 | |||
and follow the guidance in that document. | ||||
A.2. Sequence Number Validation | A.2. Sequence Number Validation | |||
There are cases where the TCP sequence number validation rules can | There are cases where the TCP sequence number validation rules can | |||
prevent ACK fields from being processed. This can result in | prevent ACK fields from being processed. This can result in | |||
connection issues, as described in [65], which includes descriptions | connection issues, as described in [64], which includes descriptions | |||
of potential problems in conditions of simultaneous open, self- | of potential problems in conditions of simultaneous open, self- | |||
connects, simultaneous close, and simultaneous window probes. The | connects, simultaneous close, and simultaneous window probes. The | |||
document also describes potential changes to the TCP specification to | document also describes potential changes to the TCP specification to | |||
mitigate the issue by expanding the acceptable sequence numbers. | mitigate the issue by expanding the acceptable sequence numbers. | |||
In Internet usage of TCP, these conditions are rarely occurring. | In Internet usage of TCP, these conditions rarely occur. Common | |||
Common operating systems include different alternative mitigations, | operating systems include different alternative mitigations, and the | |||
and the standard has not been updated yet to codify one of them, but | standard has not been updated yet to codify one of them, but | |||
implementers should consider the problems described in [65]. | implementers should consider the problems described in [64]. | |||
A.3. Nagle Modification | A.3. Nagle Modification | |||
In common operating systems, both the Nagle algorithm and delayed | In common operating systems, both the Nagle algorithm and delayed | |||
acknowledgements are implemented and enabled by default. TCP is used | acknowledgments are implemented and enabled by default. TCP is used | |||
by many applications that have a request-response style of | by many applications that have a request-response style of | |||
communication, where the combination of the Nagle algorithm and | communication, where the combination of the Nagle algorithm and | |||
delayed acknowledgements can result in poor application performance. | delayed acknowledgments can result in poor application performance. | |||
A modification to the Nagle algorithm is described in [69] that | A modification to the Nagle algorithm is described in [68] that | |||
improves the situation for these applications. | improves the situation for these applications. | |||
This modification is implemented in some common operating systems, | This modification is implemented in some common operating systems and | |||
and does not impact TCP interoperability. Additionally, many | does not impact TCP interoperability. Additionally, many | |||
applications simply disable Nagle, since this is generally supported | applications simply disable Nagle since this is generally supported | |||
by a socket option. The TCP standard has not been updated to include | by a socket option. The TCP standard has not been updated to include | |||
this Nagle modification, but implementers may find it beneficial to | this Nagle modification, but implementers may find it beneficial to | |||
consider. | consider. | |||
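Illustrative note (not part of either document version): the socket option that applications commonly use to disable Nagle is TCP_NODELAY. The following is a minimal sketch, assuming Python's socket module on a platform that supports this option; the helper name disable_nagle is hypothetical. It does not implement the modification described in the cited draft; it only shows the application-level alternative mentioned above.

```python
import socket


def disable_nagle(sock):
    """Disable the Nagle algorithm on a TCP socket using TCP_NODELAY.

    Returns True if the option reads back as enabled afterwards.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("Nagle disabled:", disable_nagle(s))
```

With Nagle disabled, small writes may go out as separate small segments, so an application taking this route would typically assemble each request or response into a single write.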
A.4. Low Watermark Settings | A.4. Low Watermark Settings | |||
Some operating system kernel TCP implementations include socket | Some operating system kernel TCP implementations include socket | |||
options that allow specifying the number of bytes in the buffer until | options that allow specifying the number of bytes in the buffer until | |||
the socket layer will pass sent data to TCP (SO_SNDLOWAT) or to the | the socket layer will pass sent data to TCP (SO_SNDLOWAT) or to the | |||
application on receiving (SO_RCVLOWAT). | application on receiving (SO_RCVLOWAT). | |||
In addition, another socket option (TCP_NOTSENT_LOWAT) can be used to | In addition, another socket option (TCP_NOTSENT_LOWAT) can be used to | |||
control the amount of unsent bytes in the write queue. This can help | control the amount of unsent bytes in the write queue. This can help | |||
a sending TCP application to avoid creating large amounts of buffered | a sending TCP application to avoid creating large amounts of buffered | |||
data (and corresponding latency). As an example, this may be useful | data (and corresponding latency). As an example, this may be useful | |||
for applications that are multiplexing data from multiple upper level | for applications that are multiplexing data from multiple upper-level | |||
streams onto a connection, especially when streams may be a mix of | streams onto a connection, especially when streams may be a mix of | |||
interactive / real-time and bulk data transfer. | interactive/real-time and bulk data transfer. | |||
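Illustrative note (not part of either document version): support for these options varies between operating systems, so an application may prefer to attempt each one and tolerate failure. The following is a minimal sketch under stated assumptions: it uses Python's socket module, looks the option constants up with getattr because they are not defined on every platform or Python version, and treats any OSError as "not supported here". The helper names try_set_option and configure_low_watermarks and the byte values are hypothetical.

```python
import socket


def try_set_option(sock, level, name, value):
    """Attempt one setsockopt call.

    Returns True on success, and False if the constant is unavailable
    (name is None) or the operating system rejects the option.
    """
    if name is None:
        return False
    try:
        sock.setsockopt(level, name, value)
        return True
    except OSError:
        return False


def configure_low_watermarks(sock, rcv_lowat, notsent_lowat):
    """Try to set the receive low watermark and the unsent-data low watermark.

    Returns a dict mapping each option name to whether it was applied.
    """
    return {
        "SO_RCVLOWAT": try_set_option(
            sock, socket.SOL_SOCKET,
            getattr(socket, "SO_RCVLOWAT", None), rcv_lowat),
        "TCP_NOTSENT_LOWAT": try_set_option(
            sock, socket.IPPROTO_TCP,
            getattr(socket, "TCP_NOTSENT_LOWAT", None), notsent_lowat),
    }


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Example values only: 4 KiB receive watermark, 16 KiB unsent limit.
        print(configure_low_watermarks(s, 4096, 16384))
```

An application multiplexing interactive and bulk streams would typically pair a small TCP_NOTSENT_LOWAT value with writability polling, so that it decides which stream to send from only when the unsent queue has drained below the threshold.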
Appendix B. TCP Requirement Summary | Appendix B. TCP Requirement Summary | |||
This section is adapted from RFC 1122. | This section is adapted from RFC 1122. | |||
Note that there is no requirement related to PLPMTUD in this list, | Note that there is no requirement related to PLPMTUD in this list, | |||
but that PLPMTUD is recommended. | but that PLPMTUD is recommended. | |||
| | | | |S| | | +=================+=========+======+========+=====+========+======+ | |||
| | | | |H| |F | | Feature | ReqID | MUST | SHOULD | MAY | SHOULD | MUST | | |||
| | | | |O|M|o | | | | | | | NOT | NOT | | |||
| | |S| |U|U|o | +=================+=========+======+========+=====+========+======+ | |||
| | |H| |L|S|t | | PUSH flag | | |||
| |M|O| |D|T|n | +=================+=========+======+========+=====+========+======+ | |||
| |U|U|M| | |o | | Aggregate or | MAY-16 | | | X | | | | |||
| |S|L|A|N|N|t | | queue un-pushed | | | | | | | | |||
| |T|D|Y|O|O|t | | data | | | | | | | | |||
FEATURE | ReqID | | | |T|T|e | +-----------------+---------+------+--------+-----+--------+------+ | |||
-------------------------------------------------|--------|-|-|-|-|-|-- | | Sender collapse | SHLD-27 | | X | | | | | |||
| | | | | | | | | successive PSH | | | | | | | | |||
Push flag | | | | | | | | | bits | | | | | | | | |||
Aggregate or queue un-pushed data | MAY-16 | | |x| | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Sender collapse successive PSH flags | SHLD-27| |x| | | | | | SEND call can | MAY-15 | | | X | | | | |||
SEND call can specify PUSH | MAY-15 | | |x| | | | | specify PUSH | | | | | | | | |||
If cannot: sender buffer indefinitely | MUST-60| | | | |x| | +-----------------+---------+------+--------+-----+--------+------+ | |||
If cannot: PSH last segment | MUST-61|x| | | | | | | * If cannot: | MUST-60 | | | | | X | | |||
Notify receiving ALP of PSH | MAY-17 | | |x| | |1 | | sender | | | | | | | | |||
Send max size segment when possible | SHLD-28| |x| | | | | | buffer | | | | | | | | |||
| | | | | | | | | indefinitely | | | | | | | | |||
Window | | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Treat as unsigned number | MUST-1 |x| | | | | | | * If cannot: | MUST-61 | X | | | | | | |||
Handle as 32-bit number | REC-1 | |x| | | | | | PSH last | | | | | | | | |||
Shrink window from right | SHLD-14| | | |x| | | | segment | | | | | | | | |||
- Send new data when window shrinks | SHLD-15| | | |x| | | +-----------------+---------+------+--------+-----+--------+------+ | |||
- Retransmit old unacked data within window | SHLD-16| |x| | | | | | Notify | MAY-17 | | | X | | | | |||
- Time out conn for data past right edge | SHLD-17| | | |x| | | | receiving ALP^1 | | | | | | | | |||
Robust against shrinking window | MUST-34|x| | | | | | | of PSH | | | | | | | | |||
Receiver's window closed indefinitely | MAY-8 | | |x| | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Use standard probing logic | MUST-35|x| | | | | | | Send max size | SHLD-28 | | X | | | | | |||
Sender probe zero window | MUST-36|x| | | | | | | segment when | | | | | | | | |||
First probe after RTO | SHLD-29| |x| | | | | | possible | | | | | | | | |||
Exponential backoff | SHLD-30| |x| | | | | +=================+=========+======+========+=====+========+======+ | |||
Allow window stay zero indefinitely | MUST-37|x| | | | | | | Window | | |||
Retransmit old data beyond SND.UNA+SND.WND | MAY-7 | | |x| | | | +=================+=========+======+========+=====+========+======+ | |||
Process RST and URG even with zero window | MUST-66|x| | | | | | | Treat as | MUST-1 | X | | | | | | |||
| | | | | | | | | unsigned number | | | | | | | | |||
Urgent Data | | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Include support for urgent pointer | MUST-30|x| | | | | | | Handle as | REC-1 | | X | | | | | |||
Pointer indicates first non-urgent octet | MUST-62|x| | | | | | | 32-bit number | | | | | | | | |||
Arbitrary length urgent data sequence | MUST-31|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Inform ALP asynchronously of urgent data | MUST-32|x| | | | |1 | | Shrink window | SHLD-14 | | | | X | | | |||
ALP can learn if/how much urgent data Q'd | MUST-33|x| | | | |1 | | from right | | | | | | | | |||
ALP employ the urgent mechanism | SHLD-13| | | |x| | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| | | | | | | | | * Send new | SHLD-15 | | | | X | | | |||
TCP Options | | | | | | | | | data when | | | | | | | | |||
Support the mandatory option set | MUST-4 |x| | | | | | | window | | | | | | | | |||
Receive TCP option in any segment | MUST-5 |x| | | | | | | shrinks | | | | | | | | |||
Ignore unsupported options | MUST-6 |x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Include length for all options except EOL+NOP | MUST-68|x| | | | | | | * Retransmit | SHLD-16 | | X | | | | | |||
Cope with illegal option length | MUST-7 |x| | | | | | | old unacked | | | | | | | | |||
Process options regardless of word alignment | MUST-64|x| | | | | | | data within | | | | | | | | |||
Implement sending & receiving MSS option | MUST-14|x| | | | | | | window | | | | | | | | |||
IPv4 Send MSS option unless 536 | SHLD-5 | |x| | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
IPv6 Send MSS option unless 1220 | SHLD-5 | |x| | | | | | * Time out | SHLD-17 | | | | X | | | |||
Send MSS option always | MAY-3 | | |x| | | | | conn for | | | | | | | | |||
IPv4 Send-MSS default is 536 | MUST-15|x| | | | | | | data past | | | | | | | | |||
IPv6 Send-MSS default is 1220 | MUST-15|x| | | | | | | right edge | | | | | | | | |||
Calculate effective send seg size | MUST-16|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
MSS accounts for varying MTU | SHLD-6 | |x| | | | | | Robust against | MUST-34 | X | | | | | | |||
MSS not sent on non-SYN segments | MUST-65| | | | |x| | | shrinking | | | | | | | | |||
MSS value based on MMS_R | MUST-67|x| | | | | | | window | | | | | | | | |||
Pad with zero | MUST-69|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| | | | | | | | | Receiver's | MAY-8 | | | X | | | | |||
TCP Checksums | | | | | | | | | window closed | | | | | | | | |||
Sender compute checksum | MUST-2 |x| | | | | | | indefinitely | | | | | | | | |||
Receiver check checksum | MUST-3 |x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| | | | | | | | | Use standard | MUST-35 | X | | | | | | |||
ISN Selection | | | | | | | | | probing logic | | | | | | | | |||
Include a clock-driven ISN generator component | MUST-8 |x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Secure ISN generator with a PRF component | SHLD-1 | |x| | | | | | Sender probe | MUST-36 | X | | | | | | |||
PRF computable from outside the host | MUST-9 | | | | |x| | | zero window | | | | | | | | |||
| | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Opening Connections | | | | | | | | | * First probe | SHLD-29 | | X | | | | | |||
Support simultaneous open attempts | MUST-10|x| | | | | | | after RTO | | | | | | | | |||
SYN-RECEIVED remembers last state | MUST-11|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Passive Open call interfere with others | MUST-41| | | | |x| | | * Exponential | SHLD-30 | | X | | | | | |||
Function: simultan. LISTENs for same port | MUST-42|x| | | | | | | backoff | | | | | | | | |||
Ask IP for src address for SYN if necc. | MUST-44|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Otherwise, use local addr of conn. | MUST-45|x| | | | | | | Allow window | MUST-37 | X | | | | | | |||
OPEN to broadcast/multicast IP Address | MUST-46| | | | |x| | | stay zero | | | | | | | | |||
Silently discard seg to bcast/mcast addr | MUST-57|x| | | | | | | indefinitely | | | | | | | | |||
| | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Closing Connections | | | | | | | | | Retransmit old | MAY-7 | | | X | | | | |||
RST can contain data | SHLD-2 | |x| | | | | | data beyond | | | | | | | | |||
Inform application of aborted conn | MUST-12|x| | | | | | | SND.UNA+SND.WND | | | | | | | | |||
Half-duplex close connections | MAY-1 | | |x| | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Send RST to indicate data lost | SHLD-3 | |x| | | | | | Process RST and | MUST-66 | X | | | | | | |||
In TIME-WAIT state for 2MSL seconds | MUST-13|x| | | | | | | URG even with | | | | | | | | |||
Accept SYN from TIME-WAIT state | MAY-2 | | |x| | | | | zero window | | | | | | | | |||
Use Timestamps to reduce TIME-WAIT | SHLD-4 | |x| | | | | +=================+=========+======+========+=====+========+======+ | |||
| | | | | | | | | Urgent Data | | |||
Retransmissions | | | | | | | | +=================+=========+======+========+=====+========+======+ | |||
Implement exponential backoff, slow start, and | MUST-19|x| | | | | | | Include support | MUST-30 | X | | | | | | |||
congestion avoidance | | | | | | | | | for urgent | | | | | | | | |||
Retransmit with same IP ident | MAY-4 | | |x| | | | | pointer | | | | | | | | |||
Karn's algorithm | MUST-18|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| | | | | | | | | Pointer | MUST-62 | X | | | | | | |||
Generating ACKs: | | | | | | | | | indicates first | | | | | | | | |||
Aggregate whenever possible | MUST-58|x| | | | | | | non-urgent | | | | | | | | |||
Queue out-of-order segments | SHLD-31| |x| | | | | | octet | | | | | | | | |||
Process all Q'd before send ACK | MUST-59|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Send ACK for out-of-order segment | MAY-13 | | |x| | | | | Arbitrary | MUST-31 | X | | | | | | |||
Delayed ACKs | SHLD-18| |x| | | | | | length urgent | | | | | | | | |||
Delay < 0.5 seconds | MUST-40|x| | | | | | | data sequence | | | | | | | | |||
Every 2nd full-sized segment or 2*RMSS ACK'd | SHLD-19| |x| | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Receiver SWS-Avoidance Algorithm | MUST-39|x| | | | | | | Inform ALP^1 | MUST-32 | X | | | | | | |||
| | | | | | | | | asynchronously | | | | | | | | |||
Sending data | | | | | | | | | of urgent data | | | | | | | | |||
Configurable TTL | MUST-49|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Sender SWS-Avoidance Algorithm | MUST-38|x| | | | | | | ALP^1 can learn | MUST-33 | X | | | | | | |||
Nagle algorithm | SHLD-7 | |x| | | | | | if/how much | | | | | | | | |||
Application can disable Nagle algorithm | MUST-17|x| | | | | | | urgent data Q'd | | | | | | | | |||
| | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| ALP employ the | SHLD-13 | | | | X | | | ||||
Connection Failures: | | | | | | | | | urgent | | | | | | | | |||
Negative advice to IP on R1 retxs | MUST-20|x| | | | | | | mechanism | | | | | | | | |||
Close connection on R2 retxs | MUST-20|x| | | | | | +=================+=========+======+========+=====+========+======+ | |||
ALP can set R2 | MUST-21|x| | | | |1 | | TCP Options | | |||
Inform ALP of R1<=retxs<R2 | SHLD-9 | |x| | | |1 | +=================+=========+======+========+=====+========+======+ | |||
Recommended value for R1 | SHLD-10| |x| | | | | | Support the | MUST-4 | X | | | | | | |||
Recommended value for R2 | SHLD-11| |x| | | | | | mandatory | | | | | | | | |||
Same mechanism for SYNs | MUST-22|x| | | | | | | option set | | | | | | | | |||
R2 at least 3 minutes for SYN | MUST-23|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| | | | | | | | | Receive TCP | MUST-5 | X | | | | | | |||
Send Keep-alive Packets: | MAY-5 | | |x| | | | | Option in any | | | | | | | | |||
- Application can request | MUST-24|x| | | | | | | segment | | | | | | | | |||
- Default is "off" | MUST-25|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
- Only send if idle for interval | MUST-26|x| | | | | | | Ignore | MUST-6 | X | | | | | | |||
- Interval configurable | MUST-27|x| | | | | | | unsupported | | | | | | | | |||
- Default at least 2 hrs. | MUST-28|x| | | | | | | options | | | | | | | | |||
- Tolerant of lost ACKs | MUST-29|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
- Send with no data | SHLD-12| |x| | | | | | Include length | MUST-68 | X | | | | | | |||
- Configurable to send garbage octet | MAY-6 | | |x| | | | | for all options | | | | | | | | |||
| | | | | | | | | except EOL+NOP | | | | | | | | |||
IP Options | | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Ignore options TCP doesn't understand | MUST-50|x| | | | | | | Cope with | MUST-7 | X | | | | | | |||
Time Stamp support | MAY-10 | | |x| | | | | illegal option | | | | | | | | |||
Record Route support | MAY-11 | | |x| | | | | length | | | | | | | | |||
Source Route: | | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
ALP can specify | MUST-51|x| | | | |1 | | Process options | MUST-64 | X | | | | | | |||
Overrides src rt in datagram | MUST-52|x| | | | | | | regardless of | | | | | | | | |||
Build return route from src rt | MUST-53|x| | | | | | | word alignment | | | | | | | | |||
Later src route overrides | SHLD-24| |x| | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| | | | | | | | | Implement | MUST-14 | X | | | | | | |||
Receiving ICMP Messages from IP | MUST-54|x| | | | | | | sending & | | | | | | | | |||
Dest. Unreach (0,1,5) => inform ALP | SHLD-25| |x| | | | | | receiving MSS | | | | | | | | |||
Abort on Dest. Unreach (0,1,5) =>nn | MUST-56| | | | |x| | | Option | | | | | | | | |||
Dest. Unreach (2-4) => abort conn | SHLD-26| |x| | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Source Quench => silent discard | MUST-55|x| | | | | | | IPv4 Send MSS | SHLD-5 | | X | | | | | |||
Abort on Time Exceeded => | MUST-56| | | | |x| | | Option unless | | | | | | | | |||
Abort on Param Problem => | MUST-56| | | | |x| | | 536 | | | | | | | | |||
| | | | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
Address Validation | | | | | | | | | IPv6 Send MSS | SHLD-5 | | X | | | | | |||
Reject OPEN call to invalid IP address | MUST-46|x| | | | | | | Option unless | | | | | | | | |||
Reject SYN from invalid IP address | MUST-63|x| | | | | | | 1220 | | | | | | | | |||
Silently discard SYN to bcast/mcast addr | MUST-57|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| | | | | | | | | Send MSS Option | MAY-3 | | | X | | | | |||
TCP/ALP Interface Services | | | | | | | | | always | | | | | | | | |||
Error Report mechanism | MUST-47|x| | | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
ALP can disable Error Report Routine | SHLD-20| |x| | | | | | IPv4 Send-MSS | MUST-15 | X | | | | | | |||
ALP can specify DiffServ field for sending | MUST-48|x| | | | | | | default is 536 | | | | | | | | |||
Passed unchanged to IP | SHLD-22| |x| | | | | +-----------------+---------+------+--------+-----+--------+------+ | |||
| IPv6 Send-MSS | MUST-15 | X | | | | | | ||||
| default is 1220 | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Calculate | MUST-16 | X | | | | | | ||||
| effective send | | | | | | | | ||||
| seg size | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| MSS accounts | SHLD-6 | | X | | | | | ||||
| for varying MTU | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| MSS not sent on | MUST-65 | | | | | X | | ||||
| non-SYN | | | | | | | | ||||
| segments | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| MSS value based | MUST-67 | X | | | | | | ||||
| on MMS_R | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Pad with zero | MUST-69 | X | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| TCP Checksums | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Sender compute | MUST-2 | X | | | | | | ||||
| checksum | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Receiver check | MUST-3 | X | | | | | | ||||
| checksum | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| ISN Selection | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Include a | MUST-8 | X | | | | | | ||||
| clock-driven | | | | | | | | ||||
| ISN generator | | | | | | | | ||||
| component | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Secure ISN | SHLD-1 | | X | | | | | ||||
| generator with | | | | | | | | ||||
| a PRF component | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| PRF computable | MUST-9 | | | | | X | | ||||
| from outside | | | | | | | | ||||
| the host | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Opening Connections | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Support | MUST-10 | X | | | | | | ||||
| simultaneous | | | | | | | | ||||
| open attempts | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| SYN-RECEIVED | MUST-11 | X | | | | | | ||||
| remembers last | | | | | | | | ||||
| state | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Passive OPEN | MUST-41 | | | | | X | | ||||
| call interfere | | | | | | | | ||||
| with others | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Function: | MUST-42 | X | | | | | | ||||
| simultaneously | | | | | | | | ||||
| LISTENs for | | | | | | | | ||||
| same port | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Ask IP for src | MUST-44 | X | | | | | | ||||
| address for SYN | | | | | | | | ||||
| if necessary | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Otherwise, | MUST-45 | X | | | | | | ||||
| use local | | | | | | | | ||||
| addr of | | | | | | | | ||||
| connection | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| OPEN to | MUST-46 | | | | | X | | ||||
| broadcast/ | | | | | | | | ||||
| multicast IP | | | | | | | | ||||
| address | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Silently | MUST-57 | X | | | | | | ||||
| discard seg to | | | | | | | | ||||
| bcast/mcast | | | | | | | | ||||
| addr | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Closing Connections | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| RST can contain | SHLD-2 | | X | | | | | ||||
| data | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Inform | MUST-12 | X | | | | | | ||||
| application of | | | | | | | | ||||
| aborted conn | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Half-duplex | MAY-1 | | | X | | | | ||||
| close | | | | | | | | ||||
| connections | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Send RST to | SHLD-3 | | X | | | | | ||||
| indicate | | | | | | | | ||||
| data lost | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| In TIME-WAIT | MUST-13 | X | | | | | | ||||
| state for 2MSL | | | | | | | | ||||
| seconds | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Accept SYN | MAY-2 | | | X | | | | ||||
| from TIME- | | | | | | | | ||||
| WAIT state | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Use | SHLD-4 | | X | | | | | ||||
| Timestamps | | | | | | | | ||||
| to reduce | | | | | | | | ||||
| TIME-WAIT | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Retransmissions | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Implement | MUST-19 | X | | | | | | ||||
| exponential | | | | | | | | ||||
| backoff, slow | | | | | | | | ||||
| start, and | | | | | | | | ||||
| congestion | | | | | | | | ||||
| avoidance | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Retransmit with | MAY-4 | | | X | | | | ||||
| same IP | | | | | | | | ||||
| identity | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Karn's | MUST-18 | X | | | | | | ||||
| algorithm | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Generating ACKs | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Aggregate | MUST-58 | X | | | | | | ||||
| whenever | | | | | | | | ||||
| possible | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Queue out-of- | SHLD-31 | | X | | | | | ||||
| order segments | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Process all Q'd | MUST-59 | X | | | | | | ||||
| before send ACK | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Send ACK for | MAY-13 | | | X | | | | ||||
| out-of-order | | | | | | | | ||||
| segment | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Delayed ACKs | SHLD-18 | | X | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Delay < 0.5 | MUST-40 | X | | | | | | ||||
| seconds | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Every 2nd | SHLD-19 | | X | | | | | ||||
| full-sized | | | | | | | | ||||
| segment or | | | | | | | | ||||
| 2*RMSS ACK'd | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Receiver SWS- | MUST-39 | X | | | | | | ||||
| Avoidance | | | | | | | | ||||
| Algorithm | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Sending Data | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Configurable | MUST-49 | X | | | | | | ||||
| TTL | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Sender SWS- | MUST-38 | X | | | | | | ||||
| Avoidance | | | | | | | | ||||
| Algorithm | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Nagle algorithm | SHLD-7 | | X | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Application | MUST-17 | X | | | | | | ||||
| can disable | | | | | | | | ||||
| Nagle | | | | | | | | ||||
| algorithm | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Connection Failures | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Negative advice | MUST-20 | X | | | | | | ||||
| to IP on R1 | | | | | | | | ||||
| retransmissions | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Close | MUST-20 | X | | | | | | ||||
| connection on | | | | | | | | ||||
| R2 | | | | | | | | ||||
| retransmissions | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| ALP^1 can set | MUST-21 | X | | | | | | ||||
| R2 | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Inform ALP of | SHLD-9 | | X | | | | | ||||
| R1<=retxs<R2 | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Recommended | SHLD-10 | | X | | | | | ||||
| value for R1 | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Recommended | SHLD-11 | | X | | | | | ||||
| value for R2 | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Same mechanism | MUST-22 | X | | | | | | ||||
| for SYNs | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * R2 at least | MUST-23 | X | | | | | | ||||
| 3 minutes | | | | | | | | ||||
| for SYN | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Send Keep-alive Packets | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Send Keep-alive | MAY-5 | | | X | | | | ||||
| Packets: | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Application | MUST-24 | X | | | | | | ||||
| can request | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Default is | MUST-25 | X | | | | | | ||||
| "off" | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Only send if | MUST-26 | X | | | | | | ||||
| idle for | | | | | | | | ||||
| interval | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Interval | MUST-27 | X | | | | | | ||||
| configurable | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Default at | MUST-28 | X | | | | | | ||||
| least 2 hrs. | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Tolerant of | MUST-29 | X | | | | | | ||||
| lost ACKs | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Send with no | SHLD-12 | | X | | | | | ||||
| data | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Configurable | MAY-6 | | | X | | | | ||||
| to send | | | | | | | | ||||
| garbage | | | | | | | | ||||
| octet | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| IP Options | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Ignore options | MUST-50 | X | | | | | | ||||
| TCP doesn't | | | | | | | | ||||
| understand | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Timestamp | MAY-10 | | | X | | | | ||||
| support | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Record Route | MAY-11 | | | X | | | | ||||
| support | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Source Route: | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * ALP^1 can | MUST-51 | X | | | | | | ||||
| specify | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Overrides | MUST-52 | X | | | | | | ||||
| src route | | | | | | | | ||||
| in | | | | | | | | ||||
| datagram | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Build return | MUST-53 | X | | | | | | ||||
| route from | | | | | | | | ||||
| src route | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Later src | SHLD-24 | | X | | | | | ||||
| route | | | | | | | | ||||
| overrides | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Receiving ICMP Messages from IP | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Receiving ICMP | MUST-54 | X | | | | | | ||||
| messages from | | | | | | | | ||||
| IP | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Dest Unreach | SHLD-25 | | X | | | | | ||||
| (0,1,5) => | | | | | | | | ||||
| inform ALP | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Abort on | MUST-56 | | | | | X | | ||||
| Dest Unreach | | | | | | | | ||||
| (0,1,5) | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Dest Unreach | SHLD-26 | | X | | | | | ||||
| (2-4) => | | | | | | | | ||||
| abort conn | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Source | MUST-55 | X | | | | | | ||||
| Quench => | | | | | | | | ||||
| silent | | | | | | | | ||||
| discard | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Abort on | MUST-56 | | | | | X | | ||||
| Time | | | | | | | | ||||
| Exceeded | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Abort on | MUST-56 | | | | | X | | ||||
| Param | | | | | | | | ||||
| Problem | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Address Validation | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Reject OPEN | MUST-46 | X | | | | | | ||||
| call to invalid | | | | | | | | ||||
| IP address | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Reject SYN from | MUST-63 | X | | | | | | ||||
| invalid IP | | | | | | | | ||||
| address | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Silently | MUST-57 | X | | | | | | ||||
| discard SYN to | | | | | | | | ||||
| bcast/mcast | | | | | | | | ||||
| addr | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| TCP/ALP Interface Services | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Error Report | MUST-47 | X | | | | | | ||||
| mechanism | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| ALP can disable | SHLD-20 | | X | | | | | ||||
| Error Report | | | | | | | | ||||
| Routine | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| ALP can specify | MUST-48 | X | | | | | | ||||
| Diffserv field | | | | | | | | ||||
| for sending | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| * Passed | SHLD-22 | | X | | | | | ||||
| unchanged to | | | | | | | | ||||
| IP | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| ALP can change | SHLD-21 | | X | | | | | ||||
| Diffserv field | | | | | | | | ||||
| during | | | | | | | | ||||
| connection | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| ALP generally | SHLD-23 | | | | X | | | ||||
| changing | | | | | | | | ||||
| Diffserv during | | | | | | | | ||||
| conn. | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Pass received | MAY-9 | | | X | | | | ||||
| Diffserv field | | | | | | | | ||||
| up to ALP | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| FLUSH call | MAY-14 | | | X | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
| Optional local | MUST-43 | X | | | | | | ||||
| IP addr param | | | | | | | | ||||
| in OPEN | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| RFC 5961 Support | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Implement data | MAY-12 | | | X | | | | ||||
| injection | | | | | | | | ||||
| protection | | | | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Explicit Congestion Notification | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Support ECN | SHLD-8 | | X | | | | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Alternative Congestion Control | | ||||
+=================+=========+======+========+=====+========+======+ | ||||
| Implement | MAY-18 | | | X | | | | ||||
| alternative | | | | | | | | ||||
| conformant | | | | | | | | ||||
| algorithm(s) | | | | | | | | ||||
+-----------------+---------+------+--------+-----+--------+------+ | ||||
ALP can change DiffServ field during connection| SHLD-21| |x| | | | | Table 8: TCP Requirements Summary | |||
ALP generally changing DiffServ during conn. | SHLD-23| | | |x| | | ||||
Pass received DiffServ field up to ALP | MAY-9 | | |x| | | | ||||
FLUSH call | MAY-14 | | |x| | | | ||||
Optional local IP addr parm. in OPEN | MUST-43|x| | | | | | ||||
| | | | | | | | ||||
RFC 5961 Support: | | | | | | | | ||||
Implement data injection protection | MAY-12 | | |x| | | | ||||
| | | | | | | | ||||
Explicit Congestion Notification: | | | | | | | | ||||
Support ECN | SHLD-8 | |x| | | | | ||||
| | | | | | | | ||||
Alternative Congestion Control: | | | | | | | | ||||
Implement alternative conformant algorithm(s) | MAY-18 | | |x| | | | ||||
-------------------------------------------------|--------|-|-|-|-|-|- | ||||
FOOTNOTES: (1) "ALP" means Application-Layer Program. | FOOTNOTES: (1) "ALP" means Application-Layer Program. | |||
Acknowledgments | ||||
This document is largely a revision of RFC 793, of which Jon Postel | ||||
was the editor. Due to his excellent work, it was able to last for | ||||
three decades before we felt the need to revise it. | ||||
Andre Oppermann was a contributor and helped to edit the first | ||||
revision of this document. | ||||
We are thankful for the assistance of the IETF TCPM working group | ||||
chairs over the course of work on this document: | ||||
Michael Scharf | ||||
Yoshifumi Nishida | ||||
Pasi Sarolahti | ||||
Michael Tüxen | ||||
During the discussions of this work on the TCPM mailing list, in | ||||
working group meetings, and via area reviews, helpful comments, | ||||
critiques, and reviews were received from (listed alphabetically by | ||||
last name): Praveen Balasubramanian, David Borman, Mohamed Boucadair, | ||||
Bob Briscoe, Neal Cardwell, Yuchung Cheng, Martin Duke, Francis | ||||
Dupont, Ted Faber, Gorry Fairhurst, Fernando Gont, Rodney Grimes, Yi | ||||
Huang, Rahul Jadhav, Markku Kojo, Mike Kosek, Juhamatti Kuusisaari, | ||||
Kevin Lahey, Kevin Mason, Matt Mathis, Stephen McQuistin, Jonathan | ||||
Morton, Matt Olson, Tommy Pauly, Tom Petch, Hagen Paul Pfeifer, Kyle | ||||
Rose, Anthony Sabatini, Michael Scharf, Greg Skinner, Joe Touch, | ||||
Michael Tüxen, Reji Varghese, Bernie Volz, Tim Wicinski, Lloyd Wood, | ||||
and Alex Zimmermann. | ||||
Joe Touch provided additional help in clarifying the description of | ||||
segment size parameters and PMTUD/PLPMTUD recommendations. Markku | ||||
Kojo helped put together the text in the section on TCP Congestion | ||||
Control. | ||||
This document includes content from errata that were reported by | ||||
(listed chronologically): Yin Shuming, Bob Braden, Morris M. Keesan, | ||||
Pei-chun Cheng, Constantin Hagemeier, Vishwas Manral, Mykyta | ||||
Yevstifeyev, EungJun Yi, Botong Huang, Charles Deng, Merlin Buge. | ||||
Author's Address | Author's Address | |||
Wesley M. Eddy (editor) | Wesley M. Eddy (editor) | |||
MTI Systems | MTI Systems | |||
United States of America | United States of America | |||
Email: wes@mti-systems.com | Email: wes@mti-systems.com | |||
End of changes. 800 change blocks. | ||||
2238 lines changed or deleted | 2520 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |