rfc9406.original | rfc9406.txt | |||
---|---|---|---|---|
Network Working Group P. Balasubramanian | Internet Engineering Task Force (IETF) P. Balasubramanian | |||
Internet-Draft Confluent | Request for Comments: 9406 Confluent | |||
Intended status: Standards Track Y. Huang | Category: Standards Track Y. Huang | |||
Expires: 31 August 2023 M. Olson | ISSN: 2070-1721 M. Olson | |||
Microsoft | Microsoft | |||
27 February 2023 | May 2023 | |||
HyStart++: Modified Slow Start for TCP | HyStart++: Modified Slow Start for TCP | |||
draft-ietf-tcpm-hystartplusplus-14 | ||||
Abstract | Abstract | |||
This document describes HyStart++, a simple modification to the slow | This document describes HyStart++, a simple modification to the slow | |||
start phase of congestion control algorithms. Slow start can | start phase of congestion control algorithms. Slow start can | |||
overshoot the ideal send rate in many cases, causing high packet loss | overshoot the ideal send rate in many cases, causing high packet loss | |||
and poor performance. HyStart++ uses increase in round-trip delay as | and poor performance. HyStart++ uses increase in round-trip delay as | |||
a heuristic to find an exit point before possible overshoot. It also | a heuristic to find an exit point before possible overshoot. It also | |||
adds a mitigation to prevent jitter from causing premature slow start | adds a mitigation to prevent jitter from causing premature slow start | |||
exit. | exit. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 31 August 2023. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9406. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2023 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
2. Terminology | 2. Terminology | |||
3. Definitions | 3. Definitions | |||
4. HyStart++ Algorithm | 4. HyStart++ Algorithm | |||
4.1. Summary | 4.1. Summary | |||
4.2. Algorithm Details | 4.2. Algorithm Details | |||
4.3. Tuning constants and other considerations | 4.3. Tuning Constants and Other Considerations | |||
5. Deployments and Performance Evaluations | 5. Deployments and Performance Evaluations | |||
6. Security Considerations | 6. Security Considerations | |||
7. IANA Considerations | 7. IANA Considerations | |||
8. Acknowledgements | 8. References | |||
9. References | 8.1. Normative References | |||
9.1. Normative References | 8.2. Informative References | |||
9.2. Informative References | Acknowledgments | |||
Authors' Addresses | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
[RFC5681] describes the slow start congestion control algorithm for | [RFC5681] describes the slow start congestion control algorithm for | |||
TCP. The slow start algorithm is used when the congestion window | TCP. The slow start algorithm is used when the congestion window | |||
(cwnd) is less than the slow start threshold (ssthresh). During slow | (cwnd) is less than the slow start threshold (ssthresh). During slow | |||
start, in absence of packet loss signals, TCP increases cwnd | start, in the absence of packet loss signals, TCP increases the cwnd | |||
exponentially to probe the network capacity. This fast growth can | exponentially to probe the network capacity. This fast growth can | |||
overshoot the ideal sending rate and cause significant packet loss | overshoot the ideal sending rate and cause significant packet loss | |||
which cannot always be recovered efficiently. | that cannot always be recovered efficiently. | |||
HyStart++ uses increase in round-trip delay as a signal to exit slow | HyStart++ builds upon Hybrid Start (HyStart), originally described in | |||
start before potential packet loss occurs as a result of overshoot. | [HyStart]. HyStart++ uses increase in round-trip delay as a signal | |||
This is one of two algorithms specified in [HyStart]. After the slow | to exit slow start before potential packet loss occurs as a result of | |||
start exit, a new Conservative Slow Start (CSS) phase is used to | overshoot. This is one of two algorithms specified in [HyStart] for | |||
determine whether the slow start exit was premature and to resume | finding a safe exit point for slow start. After the slow start exit, | |||
slow start. This mitigation improves performance in presence of | a new Conservative Slow Start (CSS) phase is used to determine | |||
jitter. HyStart++ reduces packet loss and retransmissions, and | whether the slow start exit was premature and to resume slow start. | |||
improves goodput in lab measurements and real world deployments. | This mitigation improves performance in the presence of jitter. | |||
HyStart++ reduces packet loss and retransmissions, and improves | ||||
goodput in lab measurements and real-world deployments. | ||||
While this document describes Hystart++ for TCP, it can also be used | While this document describes HyStart++ for TCP, it can also be used | |||
for other transport protocols which use slow start such as QUIC | for other transport protocols that use slow start, such as QUIC | |||
[RFC9002] or SCTP [RFC9260]. | [RFC9002] or the Stream Control Transmission Protocol (SCTP) | |||
[RFC9260]. | ||||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
3. Definitions | 3. Definitions | |||
We repeat here some definition from [RFC5681] to aid the reader. | To aid the reader, we repeat some definitions from [RFC5681]: | |||
SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the | SENDER MAXIMUM SEGMENT SIZE (SMSS): The size of the largest segment | |||
largest segment that the sender can transmit. This value can be | that the sender can transmit. This value can be based on the | |||
based on the maximum transmission unit of the network, the path MTU | maximum transmission unit of the network, the Path MTU Discovery | |||
discovery [RFC1191], [RFC4821] algorithm, RMSS (see next item), or | algorithm [RFC1191] [RFC4821], RMSS (see next item), or other | |||
other factors. The size does not include the TCP/IP headers and | factors. The size does not include the TCP/IP headers and | |||
options. | options. | |||
RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the | RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The size of the largest | |||
largest segment the receiver is willing to accept. This is the value | segment that the receiver is willing to accept. This is the value | |||
specified in the MSS option sent by the receiver during connection | specified in the MSS option sent by the receiver during connection | |||
startup. Or, if the MSS option is not used, it is 536 bytes | startup. Or, if the MSS option is not used, it is 536 bytes | |||
[RFC1122]. The size does not include the TCP/IP headers and options. | [RFC1122]. The size does not include the TCP/IP headers and | |||
options. | ||||
RECEIVER WINDOW (rwnd): The most recently advertised receiver window. | RECEIVER WINDOW (rwnd): The most recently advertised receiver | |||
window. | ||||
CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount | CONGESTION WINDOW (cwnd): A TCP state variable that limits the | |||
of data a TCP can send. At any given time, a TCP MUST NOT send data | amount of data a TCP can send. At any given time, a TCP MUST NOT | |||
with a sequence number higher than the sum of the highest | send data with a sequence number higher than the sum of the | |||
acknowledged sequence number and the minimum of cwnd and rwnd. | highest acknowledged sequence number and the minimum of the cwnd | |||
and rwnd. | ||||
4. HyStart++ Algorithm | 4. HyStart++ Algorithm | |||
4.1. Summary | 4.1. Summary | |||
[HyStart] specifies two algorithms (a "Delay Increase" algorithm and | [HyStart] specifies two algorithms (a "Delay Increase" algorithm and | |||
an "Inter-Packet Arrival" algorithm) to be run in parallel to detect | an "Inter-Packet Arrival" algorithm) to be run in parallel to detect | |||
that the sending rate has reached capacity. In practice, the Inter- | that the sending rate has reached capacity. In practice, the Inter- | |||
Packet Arrival algorithm does not perform well and is not able to | Packet Arrival algorithm does not perform well and is not able to | |||
detect congestion early, primarily due to ACK compression. The idea | detect congestion early, primarily due to ACK compression. The idea | |||
of the Delay Increase algorithm is to look for spikes in RTT (round- | of the Delay Increase algorithm is to look for spikes in RTT (round- | |||
trip time), which suggest that the bottleneck buffer is filling up. | trip time), which suggest that the bottleneck buffer is filling up. | |||
In HyStart++, a TCP sender uses traditional slow start and then uses | In HyStart++, a TCP sender uses standard slow start and then uses the | |||
the "Delay Increase" algorithm to trigger an exit from slow start. | Delay Increase algorithm to trigger an exit from slow start. But | |||
But instead of going straight from slow start to congestion | instead of going straight from slow start to congestion avoidance, | |||
avoidance, the sender spends a number of RTTs in a Conservative Slow | the sender spends a number of RTTs in a Conservative Slow Start (CSS) | |||
Start (CSS) phase to determine whether the exit from slow start was | phase to determine whether the exit from slow start was premature. | |||
premature. During CSS, the congestion window is grown exponentially | During CSS, the congestion window is grown exponentially in a fashion | |||
like in regular slow start, but with a smaller exponential base, | similar to regular slow start, but with a smaller exponential base, | |||
resulting in less aggressive growth. If the RTT reduces during CSS, | resulting in less aggressive growth. If the RTT reduces during CSS, | |||
it's concluded that the RTT spike was not related to congestion | it's concluded that the RTT spike was not related to congestion | |||
caused by the connection sending at a rate greater than the ideal | caused by the connection sending at a rate greater than the ideal | |||
send rate, and the connection resumes slow start. If the RTT | send rate, and the connection resumes slow start. If the RTT | |||
inflation persists throughout CSS, the connection enters congestion | inflation persists throughout CSS, the connection enters congestion | |||
avoidance. | avoidance. | |||
4.2. Algorithm Details | 4.2. Algorithm Details | |||
The following pseudocode uses a limit, L, to control the | The following pseudocode uses a limit, L, to control the | |||
aggressiveness of the cwnd increase during both standard slow start | aggressiveness of the cwnd increase during both standard slow start | |||
and CSS. While an arriving ACK may newly acknowledge an arbitrary | and CSS. While an arriving ACK may newly acknowledge an arbitrary | |||
number of bytes, the Hystart++ algorithm limits the number of those | number of bytes, the HyStart++ algorithm limits the number of those | |||
bytes applied to increase the cwnd to L*SMSS bytes. | bytes applied to increase the cwnd to L*SMSS bytes. | |||
lastRoundMinRTT and currentRoundMinRTT are initialized to infinity at | lastRoundMinRTT and currentRoundMinRTT are initialized to infinity at | |||
the initialization time. currRTT is the RTT sampled from the latest | the initialization time. currRTT is the RTT sampled from the latest | |||
incoming ACK and initialized to infinity. | incoming ACK and initialized to infinity. | |||
lastRoundMinRTT = infinity | lastRoundMinRTT = infinity | |||
currentRoundMinRTT = infinity | currentRoundMinRTT = infinity | |||
currRTT = infinity | currRTT = infinity | |||
Hystart++ measures rounds using sequence numbers, as follows: Define | HyStart++ measures rounds using sequence numbers, as follows: | |||
windowEnd as a sequence number initialized to SND.NXT. When | ||||
windowEnd is ACKed, the current round ends and windowEnd is set to | ||||
SND.NXT. | ||||
At the start of each round during standard slow start ([RFC5681]) and | * Define windowEnd as a sequence number initialized to SND.NXT. | |||
CSS, initialize the variables used to compute last round and current | ||||
round's minimum RTT: | * When windowEnd is ACKed, the current round ends and windowEnd is | |||
set to SND.NXT. | ||||
At the start of each round during standard slow start [RFC5681] and | ||||
CSS, initialize the variables used to compute the last round's and | ||||
current round's minimum RTT: | ||||
lastRoundMinRTT = currentRoundMinRTT | lastRoundMinRTT = currentRoundMinRTT | |||
currentRoundMinRTT = infinity | currentRoundMinRTT = infinity | |||
rttSampleCount = 0 | rttSampleCount = 0 | |||
For each arriving ACK in slow start, where N is the number of | For each arriving ACK in slow start, where N is the number of | |||
previously unacknowledged bytes acknowledged in the arriving ACK: | previously unacknowledged bytes acknowledged in the arriving ACK: | |||
Update the cwnd: | Update the cwnd: | |||
cwnd = cwnd + min(N, L * SMSS) | cwnd = cwnd + min(N, L * SMSS) | |||
Keep track of minimum observed RTT: | Keep track of the minimum observed RTT: | |||
currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | |||
rttSampleCount += 1 | rttSampleCount += 1 | |||
For rounds where at least N_RTT_SAMPLE RTT samples have been obtained | For rounds where at least N_RTT_SAMPLE RTT samples have been obtained | |||
and currentRoundMinRTT and lastRoundMinRTT are valid, check if delay | and currentRoundMinRTT and lastRoundMinRTT are valid, check to see if | |||
increase triggers slow start exit: | delay increase triggers slow start exit: | |||
if ((rttSampleCount >= N_RTT_SAMPLE) AND | if ((rttSampleCount >= N_RTT_SAMPLE) AND | |||
(currentRoundMinRTT != infinity) AND | (currentRoundMinRTT != infinity) AND | |||
(lastRoundMinRTT != infinity)) | (lastRoundMinRTT != infinity)) | |||
Compute a RTT Threshold clamped between MIN_RTT_THRESH and MAX_RTT_THRESH | RttThresh = max(MIN_RTT_THRESH, | |||
RttThresh = max(MIN_RTT_THRESH, min(lastRoundMinRTT / MIN_RTT_DIVISOR, MAX_RTT_THRESH)) | min(lastRoundMinRTT / MIN_RTT_DIVISOR, MAX_RTT_THRESH)) | |||
if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) | if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) | |||
cssBaselineMinRtt = currentRoundMinRTT | cssBaselineMinRtt = currentRoundMinRTT | |||
exit slow start and enter CSS | exit slow start and enter CSS | |||
For each arriving ACK in CSS, where N is the number of previously | For each arriving ACK in CSS, where N is the number of previously | |||
unacknowledged bytes acknowledged in the arriving ACK: | unacknowledged bytes acknowledged in the arriving ACK: | |||
Update the cwnd: | Update the cwnd: | |||
cwnd = cwnd + (min(N, L * SMSS) / CSS_GROWTH_DIVISOR) | cwnd = cwnd + (min(N, L * SMSS) / CSS_GROWTH_DIVISOR) | |||
Keep track of minimum observed RTT: | Keep track of the minimum observed RTT: | |||
currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | |||
rttSampleCount += 1 | rttSampleCount += 1 | |||
For CSS rounds where at least N_RTT_SAMPLE RTT samples have been | For CSS rounds where at least N_RTT_SAMPLE RTT samples have been | |||
obtained, check if current round's minRTT drops below baseline | obtained, check to see if the current round's minRTT drops below | |||
indicating that HyStart exit was spurious: | baseline (cssBaselineMinRtt) indicating that slow start exit was | |||
spurious: | ||||
if (currentRoundMinRTT < cssBaselineMinRtt) | if (currentRoundMinRTT < cssBaselineMinRtt) | |||
cssBaselineMinRtt = infinity | cssBaselineMinRtt = infinity | |||
resume slow start including HyStart++ | resume slow start including HyStart++ | |||
CSS lasts at most CSS_ROUNDS rounds. If the transition into CSS | CSS lasts at most CSS_ROUNDS rounds. If the transition into CSS | |||
happens in the middle of a round, that partial round counts towards | happens in the middle of a round, that partial round counts towards | |||
the limit. | the limit. | |||
If CSS_ROUNDS rounds are complete, enter congestion avoidance by | If CSS_ROUNDS rounds are complete, enter congestion avoidance by | |||
setting ssthresh to current cwnd. | setting the ssthresh to the current cwnd. | |||
ssthresh = cwnd | ssthresh = cwnd | |||
If loss or ECN-marking is observed anytime during standard slow start | If loss or Explicit Congestion Notification (ECN) marking is observed | |||
or CSS, enter congestion avoidance by setting ssthresh to current | at any time during standard slow start or CSS, enter congestion | |||
cwnd. | avoidance by setting the ssthresh to the current cwnd. | |||
ssthresh = cwnd | ssthresh = cwnd | |||
4.3. Tuning constants and other considerations | 4.3. Tuning Constants and Other Considerations | |||
It is RECOMMENDED that a HyStart++ implementation use the following | It is RECOMMENDED that a HyStart++ implementation use the following | |||
constants: | constants: | |||
MIN_RTT_THRESH = 4 msec | MIN_RTT_THRESH = 4 msec | |||
MAX_RTT_THRESH = 16 msec | MAX_RTT_THRESH = 16 msec | |||
MIN_RTT_DIVISOR = 8 | MIN_RTT_DIVISOR = 8 | |||
N_RTT_SAMPLE = 8 | N_RTT_SAMPLE = 8 | |||
CSS_GROWTH_DIVISOR = 4 | CSS_GROWTH_DIVISOR = 4 | |||
CSS_ROUNDS = 5 | CSS_ROUNDS = 5 | |||
L = infinity if paced, L = 8 if non-paced | L = infinity if paced, L = 8 if non-paced | |||
These constants have been determined with lab measurements and real | These constants have been determined with lab measurements and real- | |||
world deployments. An implementation MAY tune them for different | world deployments. An implementation MAY tune them for different | |||
network characteristics. | network characteristics. | |||
The delay increase sensitivity is determined by MIN_RTT_THRESH and | The delay increase sensitivity is determined by MIN_RTT_THRESH and | |||
MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious | MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious | |||
exits from slow start. Larger values of MAX_RTT_THRESH may result in | exits from slow start. Larger values of MAX_RTT_THRESH may result in | |||
slow start not exiting until loss is encountered for connections on | slow start not exiting until loss is encountered for connections on | |||
large RTT paths. | large RTT paths. | |||
MIN_RTT_DIVISOR is a fraction of RTT to compute delay threshold. A | MIN_RTT_DIVISOR is a fraction of RTT to compute the delay threshold. | |||
smaller value would mean a bigger threshold and thus less sensitive | A smaller value would mean a larger threshold and thus less | |||
to delay increase, and vice versa. | sensitivity to delay increase, and vice versa. | |||
While all TCP implementations are REQUIRED to take at least one RTT | While all TCP implementations are REQUIRED to take at least one RTT | |||
sample each round, implementations of HyStart++ are RECOMMENDED to | sample each round, implementations of HyStart++ are RECOMMENDED to | |||
take at least N_RTT_SAMPLE RTT samples. Using lower values of | take at least N_RTT_SAMPLE RTT samples. Using lower values of | |||
N_RTT_SAMPLE will lower the accuracy of the measured RTT for the | N_RTT_SAMPLE will lower the accuracy of the measured RTT for the | |||
round; higher values will improve accuracy at the cost of more | round; higher values will improve accuracy at the cost of more | |||
processing. | processing. | |||
The minimum value of CSS_GROWTH_DIVISOR MUST be at least 2. A value | The minimum value of CSS_GROWTH_DIVISOR MUST be at least 2. A value | |||
of 1 results in the same aggressive behavior as regular slow start. | of 1 results in the same aggressive behavior as regular slow start. | |||
Values larger than 4 will cause the algorithm to be less aggressive | Values larger than 4 will cause the algorithm to be less aggressive | |||
and maybe less performant. | and maybe less performant. | |||
Smaller values of CSS_ROUNDS may miss detecting jitter and larger | Smaller values of CSS_ROUNDS may miss detecting jitter, and larger | |||
values may limit performance. | values may limit performance. | |||
Packet pacing [ASA00] is a possible mechanism to avoid large bursts | Packet pacing [ASA00] is a possible mechanism to avoid large bursts | |||
and their associated harm. A paced TCP implementation SHOULD use L = | and their associated harm. A paced TCP implementation SHOULD use L = | |||
infinity. Burst concerns are mitigated by pacing and this setting | infinity. Burst concerns are mitigated by pacing, and this setting | |||
allows for optimal cwnd growth on modern networks. | allows for optimal cwnd growth on modern networks. | |||
For TCP implementations that pace to mitigate burst concerns, L | For TCP implementations that pace to mitigate burst concerns, L | |||
values smaller than INFINITY may suffer performance problems due to | values smaller than infinity may suffer performance problems due to | |||
slow cwnd growth in high speed networks. For non-paced TCP | slow cwnd growth in high-speed networks. For non-paced TCP | |||
implementations, L values smaller than 8 may suffer performance | implementations, L values smaller than 8 may suffer performance | |||
problems due to slow cwnd growth in high speed networks; L values | problems due to slow cwnd growth in high-speed networks; L values | |||
larger than 8 may cause an increase in burstiness and thereby loss | larger than 8 may cause an increase in burstiness and thereby loss | |||
rates, and result in poor performance. | rates, and result in poor performance. | |||
An implementation SHOULD use HyStart++ only for the initial slow | An implementation SHOULD use HyStart++ only for the initial slow | |||
start (when ssthresh is at its initial value of arbitrarily high per | start (when the ssthresh is at its initial value of arbitrarily high | |||
[RFC5681]) and fall back to using traditional slow start for the | per [RFC5681]) and fall back to using standard slow start for the | |||
remainder of the connection lifetime. This is acceptable because | remainder of the connection lifetime. This is acceptable because | |||
subsequent slow starts will use the discovered ssthresh value to exit | subsequent slow starts will use the discovered ssthresh value to exit | |||
slow start and avoid the overshoot problem. An implementation MAY | slow start and avoid the overshoot problem. An implementation MAY | |||
use HyStart++ to grow the restart window ([RFC5681]) after a long | use HyStart++ to grow the restart window [RFC5681] after a long idle | |||
idle period. | period. | |||
In application limited scenarios, the amount of data in flight could | In application-limited scenarios, the amount of data in flight could | |||
fall below the bandwidth-delay product (BDP) and result in smaller | fall below the bandwidth-delay product (BDP) and result in smaller | |||
RTT samples which can trigger an exit back to slow start. It is | RTT samples, which can trigger an exit back to slow start. It is | |||
expected that a connection might oscillate between CSS and slow start | expected that a connection might oscillate between CSS and slow start | |||
in such scenarios. But this behavior will neither result in a | in such scenarios. But this behavior will neither result in a | |||
connection prematurely entering congestion avoidance nor cause | connection prematurely entering congestion avoidance nor cause | |||
overshooting compared to slow start. | overshooting compared to slow start. | |||
5. Deployments and Performance Evaluations | 5. Deployments and Performance Evaluations | |||
As of February 2023, HyStart++ as described in this document has been | At the time of this writing, HyStart++ as described in this document | |||
default enabled for all TCP connections in the Windows operating | has been default enabled for all TCP connections in the Windows | |||
system for over two years with pacing disabled and an actual L = 8. | operating system for over two years with pacing disabled and an | |||
actual L = 8. | ||||
In lab measurements with Windows TCP, HyStart++ shows both goodput | In lab measurements with Windows TCP, HyStart++ shows goodput | |||
improvements as well as reductions in packet loss and retransmissions | improvements as well as reductions in packet loss and retransmissions | |||
compared to traditional slow start. For example, across a variety of | compared to standard slow start. For example, across a variety of | |||
tests on a 100 Mbps link with a bottleneck buffer size of bandwidth- | tests on a 100 Mbps link with a bottleneck buffer size of bandwidth- | |||
delay product, HyStart++ reduces bytes retransmitted by 50% and | delay product, HyStart++ reduces bytes retransmitted by 50% and | |||
retransmission timeouts (RTOs) by 36%. | retransmission timeouts (RTOs) by 36%. | |||
In an A/B test where we compare HyStart++ draft 01 to traditional | In an A/B test where we compared an implementation of HyStart++ | |||
slow start across a large Windows device population, out of 52 | (based on an earlier draft version of this document) to standard slow | |||
billion TCP connections, 0.7% of connections move from 1 RTO to 0 | start across a large Windows device population, out of 52 billion TCP | |||
RTOs and another 0.7% connections move from 2 RTOs to 1 RTO with | connections, 0.7% of connections move from 1 RTO to 0 RTOs and | |||
HyStart++. This test did not focus on send-heavy connections and the | another 0.7% of connections move from 2 RTOs to 1 RTO with HyStart++. | |||
impact on send-heavy connections is likely much higher. We plan to | This test did not focus on send-heavy connections, and the impact on | |||
conduct more such production experiments to gather more data in the | send-heavy connections is likely much higher. We plan to conduct | |||
future. | more such production experiments to gather more data in the future. | |||
6. Security Considerations | 6. Security Considerations | |||
HyStart++ enhances slow start and inherits the general security | HyStart++ enhances slow start and inherits the general security | |||
considerations discussed in [RFC5681]. | considerations discussed in [RFC5681]. | |||
An attacker can cause Hystart++ to exit slow start prematurely and | An attacker can cause HyStart++ to exit slow start prematurely and | |||
impair the performance of a TCP connection by, for example, dropping | impair the performance of a TCP connection by, for example, dropping | |||
data packets or their acknowledgements. | data packets or their acknowledgments. | |||
The ACK division attack outlined in [SCWA99] does not affect | The ACK division attack outlined in [SCWA99] does not affect | |||
Hystart++ because the congestion window increase in Hystart++ is | HyStart++ because the congestion window increase in HyStart++ is | |||
based on the number of bytes newly acknowledged in each arriving ACK | based on the number of bytes newly acknowledged in each arriving ACK | |||
rather than by a particular constant on each arriving ACK. | rather than by a particular constant on each arriving ACK. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document has no actions for IANA. | This document has no IANA actions. | |||
8. Acknowledgements | ||||
During the discussions of this work on the TCPM mailing list, in | ||||
working group meetings, helpful comments, critiques, and reviews were | ||||
received from (listed alphabetically by last name): Mark Allman, Bob | ||||
Briscoe, Neal Cardwell, Yuchung Cheng, Junho Choi, Martin Duke, Reese | ||||
Enghardt, Christian Huitema, Ilpo Järvinen, Yoshifumi Nishida, | ||||
Randall Stewart, and Michael Tuexen. | ||||
9. References | 8. References | |||
9.1. Normative References | 8.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | |||
<https://www.rfc-editor.org/info/rfc5681>. | <https://www.rfc-editor.org/info/rfc5681>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
9.2. Informative References | 8.2. Informative References | |||
[ASA00] Aggarwal, A., Savage, S., and T. Anderson, "Understanding | [ASA00] Aggarwal, A., Savage, S., and T. Anderson, "Understanding | |||
the Performance of TCP Pacing", Proceedings IEEE INFOCOM | the performance of TCP pacing", Proceedings IEEE INFOCOM | |||
2000, DOI 10.1109/INFCOM.2000.832483, 2000, | 2000, DOI 10.1109/INFCOM.2000.832483, March 2000, | |||
<https://doi.org/10.1109/INFCOM.2000.832483>. | <https://doi.org/10.1109/INFCOM.2000.832483>. | |||
[HyStart] Ha, S. and I. Ree, "Taming the elephants: New TCP slow | [HyStart] Ha, S. and I. Rhee, "Taming the elephants: New TCP slow | |||
start", Computer Networks vol. 55, no. 9, pp. 2092-2110, | start", Computer Networks vol. 55, no. 9, pp. 2092-2110, | |||
DOI 10.1016/j.comnet.2011.01.014, 2011, | DOI 10.1016/j.comnet.2011.01.014, June 2011, | |||
<https://doi.org/10.1016/j.comnet.2011.01.014>. | <https://doi.org/10.1016/j.comnet.2011.01.014>. | |||
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - | [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - | |||
Communication Layers", STD 3, RFC 1122, | Communication Layers", STD 3, RFC 1122, | |||
DOI 10.17487/RFC1122, October 1989, | DOI 10.17487/RFC1122, October 1989, | |||
<https://www.rfc-editor.org/info/rfc1122>. | <https://www.rfc-editor.org/info/rfc1122>. | |||
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | |||
DOI 10.17487/RFC1191, November 1990, | DOI 10.17487/RFC1191, November 1990, | |||
<https://www.rfc-editor.org/info/rfc1191>. | <https://www.rfc-editor.org/info/rfc1191>. | |||
skipping to change at page 9, line 37 | skipping to change at line 411 | |||
[RFC9002] Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection | [RFC9002] Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection | |||
and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, | and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, | |||
May 2021, <https://www.rfc-editor.org/info/rfc9002>. | May 2021, <https://www.rfc-editor.org/info/rfc9002>. | |||
[RFC9260] Stewart, R., Tüxen, M., and K. Nielsen, "Stream Control | [RFC9260] Stewart, R., Tüxen, M., and K. Nielsen, "Stream Control | |||
Transmission Protocol", RFC 9260, DOI 10.17487/RFC9260, | Transmission Protocol", RFC 9260, DOI 10.17487/RFC9260, | |||
June 2022, <https://www.rfc-editor.org/info/rfc9260>. | June 2022, <https://www.rfc-editor.org/info/rfc9260>. | |||
[SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | [SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | |||
"TCP congestion control with a misbehaving receiver", ACM | "TCP congestion control with a misbehaving receiver", ACM | |||
Computer Communication Review, 29(5), | SIGCOMM Computer Communication Review, vol. 29, issue 5, | |||
DOI 10.1145/505696.505704, 1999, | pp. 71-78, DOI 10.1145/505696.505704, October 1999, | |||
<https://doi.org/10.1145/505696.505704>. | <https://doi.org/10.1145/505696.505704>. | |||
Acknowledgments | ||||
During the discussions of this work on the TCPM mailing list and in | ||||
working group meetings, helpful comments, critiques, and reviews were | ||||
received from (listed alphabetically by last name) Mark Allman, Bob | ||||
Briscoe, Neal Cardwell, Yuchung Cheng, Junho Choi, Martin Duke, Reese | ||||
Enghardt, Christian Huitema, Ilpo Järvinen, Yoshifumi Nishida, | ||||
Randall Stewart, and Michael Tüxen. | ||||
Authors' Addresses | Authors' Addresses | |||
Praveen Balasubramanian | Praveen Balasubramanian | |||
Confluent | Confluent | |||
899 West Evelyn Ave | 899 West Evelyn Ave | |||
Mountain View, CA 94041 | Mountain View, CA 94041 | |||
United States of America | United States of America | |||
Email: pravb.ietf@gmail.com | Email: pravb.ietf@gmail.com | |||
Yi Huang | Yi Huang | |||
Microsoft | Microsoft | |||
One Microsoft Way | One Microsoft Way | |||
Redmond, WA 94052 | Redmond, WA 98052 | |||
United States of America | United States of America | |||
Phone: +1 425 703 0447 | Phone: +1 425 703 0447 | |||
Email: huanyi@microsoft.com | Email: huanyi@microsoft.com | |||
Matt Olson | Matt Olson | |||
Microsoft | Microsoft | |||
One Microsoft Way | ||||
Redmond, WA 98052 | ||||
United States of America | ||||
Phone: +1 425 538 8598 | Phone: +1 425 538 8598 | |||
Email: maolson@microsoft.com | Email: maolson@microsoft.com | |||
End of changes. 59 change blocks. | ||||
154 lines changed or deleted | 165 lines changed or added | |||