rfc9599.original | rfc9599.txt | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
Internet-Draft Independent | Request for Comments: 9599 Independent | |||
Updates: 3819 (if approved) J. Kaippallimalil | BCP: 89 J. Kaippallimalil | |||
Intended status: Best Current Practice Futurewei | Updates: 3819 Futurewei | |||
Expires: 7 June 2024 5 December 2023 | Category: Best Current Practice August 2024 | |||
ISSN: 2070-1721 | ||||
Guidelines for Adding Congestion Notification to Protocols that | Guidelines for Adding Congestion Notification to Protocols That | |||
Encapsulate IP | Encapsulate IP | |||
draft-ietf-tsvwg-ecn-encap-guidelines-22 | ||||
Abstract | Abstract | |||
The purpose of this document is to guide the design of congestion | The purpose of this document is to guide the design of congestion | |||
notification in any lower layer or tunnelling protocol that | notification in any lower-layer or tunnelling protocol that | |||
encapsulates IP. The aim is for explicit congestion signals to | encapsulates IP. The aim is for explicit congestion signals to | |||
propagate consistently from lower layer protocols into IP. Then the | propagate consistently from lower-layer protocols into IP. Then, the | |||
IP internetwork layer can act as a portability layer to carry | IP internetwork layer can act as a portability layer to carry | |||
congestion notification from non-IP-aware congested nodes up to the | congestion notification from non-IP-aware congested nodes up to the | |||
transport layer (L4). Following these guidelines should assure | transport layer (L4). Specifications that follow these guidelines, | |||
interworking among IP layer and lower layer congestion notification | whether produced by the IETF or other standards bodies, should assure | |||
mechanisms, whether specified by the IETF or other standards bodies. | interworking among IP-layer and lower-layer congestion notification | |||
This document is included in BCP 89 and updates the single paragraph | mechanisms. This document is included in BCP 89 and updates the | |||
of advice to subnetwork designers about ECN in Section 13 of RFC | single paragraph of advice to subnetwork designers about Explicit | |||
3819, by replacing it with a reference to the whole of this document. | Congestion Notification (ECN) in Section 13 of RFC 3819 by replacing | |||
it with a reference to this document. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This memo documents an Internet Best Current Practice. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
BCPs is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 7 June 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9599. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
This document may contain material from IETF Documents or IETF | ||||
Contributions published or made publicly available before November | ||||
10, 2008. The person(s) controlling the copyright in some of this | ||||
material may not have granted the IETF Trust the right to allow | ||||
modifications of such material outside the IETF Standards Process. | ||||
Without obtaining an adequate license from the person(s) controlling | ||||
the copyright in such materials, this document may not be modified | ||||
outside the IETF Standards Process, and derivative works of it may | ||||
not be created outside the IETF Standards Process, except to format | ||||
it for publication as an RFC or to translate it into languages other | ||||
than English. | ||||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
1.1. Update to RFC 3819 | 1.1. Update to RFC 3819 | |||
1.2. Scope | 1.2. Scope | |||
2. Terminology | 2. Terminology | |||
3. Modes of Operation | 3. Modes of Operation | |||
3.1. Feed-Forward-and-Up Mode | 3.1. Feed-Forward-and-Up Mode | |||
3.2. Feed-Up-and-Forward Mode | 3.2. Feed-Up-and-Forward Mode | |||
3.3. Feed-Backward Mode | 3.3. Feed-Backward Mode | |||
3.4. Null Mode | 3.4. Null Mode | |||
4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | 4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | |||
Notification | Notification | |||
4.1. IP-in-IP Tunnels with Shim Headers | 4.1. IP-in-IP Tunnels with Shim Headers | |||
4.2. Wire Protocol Design: Indication of ECN Support | 4.2. Wire Protocol Design: Indication of ECN Support | |||
4.3. Encapsulation Guidelines | 4.3. Encapsulation Guidelines | |||
4.4. Decapsulation Guidelines | 4.4. Decapsulation Guidelines | |||
4.5. Sequences of Similar Tunnels or Subnets | 4.5. Sequences of Similar Tunnels or Subnets | |||
4.6. Reframing and Congestion Markings | 4.6. Reframing and Congestion Markings | |||
5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | |||
Notification | Notification | |||
6. Feed-Backward Mode: Guidelines for Adding Congestion | 6. Feed-Backward Mode: Guidelines for Adding Congestion | |||
Notification | Notification | |||
7. IANA Considerations | 7. IANA Considerations | |||
8. Security Considerations | 8. Security Considerations | |||
9. Conclusions | 9. Conclusions | |||
10. References | 10. References | |||
10.1. Normative References | 10.1. Normative References | |||
10.2. Informative References | 10.2. Informative References | |||
Comments Solicited | Acknowledgements | |||
Acknowledgements | Contributors | |||
Contributors | Authors' Addresses | |||
Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
The benefits of Explicit Congestion Notification (ECN) described in | In certain networks, it might be possible for traffic to congest non- | |||
[RFC8087] and summarized below can only be fully realized if support | IP-aware nodes. In such networks, the benefits of Explicit | |||
for ECN is added to the relevant subnetwork technology, as well as to | Congestion Notification (ECN) described in [RFC8087] and summarized | |||
IP. When a lower layer buffer drops a packet obviously it does not | below can only be fully realized if support for congestion | |||
just drop at that layer; the packet disappears from all layers. In | notification is added to the relevant subnetwork technology, as well | |||
contrast, when active queue management (AQM) at a lower layer marks a | as to IP. When a lower-layer buffer implicitly notifies congestion | |||
packet with ECN, the marking needs to be explicitly propagated up the | by dropping a packet, it obviously does not just drop at that layer; | |||
layers. The same is true if AQM marks the outer header of a packet | the packet disappears from all layers. In contrast, when active | |||
that encapsulates inner tunnelled headers. Forwarding ECN is not as | queue management (AQM) at a lower-layer buffer explicitly notifies | |||
straightforward as other headers because it has to be assumed ECN may | congestion by marking a frame header, the marking needs to be | |||
be only partially deployed. If a lower layer header that contains | explicitly propagated up the layers. The same is true if AQM marks | |||
ECN congestion indications is stripped off by a subnet egress that is | the outer header of a packet that encapsulates inner tunnelled | |||
not ECN-aware, or if the ultimate receiver or sender is not ECN- | headers. Forwarding ECN is not as straightforward as other headers | |||
aware, congestion needs to be indicated by dropping the packet, not | because it has to be assumed ECN may be only partially deployed. If | |||
marking it. | a lower-layer header that contains congestion indications is stripped | |||
off by a subnet egress that is not ECN-aware, or if the ultimate | ||||
receiver or sender is not ECN-aware, congestion needs to be indicated | ||||
by dropping the packet, not marking it. | ||||
The purpose of this document is to guide the addition of congestion | The purpose of this document is to guide the addition of congestion | |||
notification to any subnet technology or tunnelling protocol, so that | notification to any subnet technology or tunnelling protocol so that | |||
lower layer AQM algorithms can signal congestion explicitly and it | lower-layer AQM algorithms can signal congestion explicitly and that | |||
will propagate consistently into encapsulated (higher layer) headers, | signal will propagate consistently into encapsulated (higher-layer) | |||
otherwise the signals will not reach their ultimate destination. | headers. Otherwise, the signals will not reach their ultimate | |||
destination. | ||||
ECN is defined in the IP header (IPv4 and IPv6) [RFC3168] to allow a | ECN is defined in the IP header (IPv4 and IPv6) [RFC3168] to allow a | |||
resource to notify the onset of queue build-up without having to drop | resource to notify the onset of queue buildup without having to drop | |||
packets, by explicitly marking a proportion of packets with the | packets by explicitly marking a proportion of packets with the | |||
congestion experienced (CE) codepoint. | congestion experienced (CE) codepoint. | |||
Given a suitable marking scheme, ECN removes nearly all congestion | Given a suitable marking scheme, ECN removes nearly all congestion | |||
loss and it cuts delays for two main reasons: | loss and it cuts delays for two main reasons: | |||
* It avoids the delay when recovering from congestion losses, which | * It avoids the delay when recovering from congestion losses, which | |||
particularly benefits small flows or real-time flows, making their | particularly benefits small flows or real-time flows, making their | |||
delivery time predictably short [RFC2884]; | delivery time predictably short [RFC2884]. | |||
* As ECN is used more widely by end-systems, it will gradually | * As ECN is used more widely by end systems, it will gradually | |||
remove the need to configure a degree of delay into buffers before | remove the need to configure a degree of delay into buffers before | |||
they start to notify congestion (the cause of bufferbloat). This | they start to notify congestion (the cause of bufferbloat). This | |||
is because drop involves a trade-off between sending a timely | is because drop involves a trade-off between sending a timely | |||
signal and trying to avoid impairment, whereas ECN is solely a | signal and trying to avoid impairment, whereas ECN is solely a | |||
signal not an impairment, so there is no harm triggering it | signal and not an impairment, so there is no harm triggering it | |||
earlier. | earlier. | |||
Some lower layer technologies (e.g. MPLS, Ethernet) are used to form | Some lower-layer technologies (e.g., MPLS, Ethernet) are used to form | |||
subnetworks with IP-aware nodes only at the edges. These networks | subnetworks with IP-aware nodes only at the edges. These networks | |||
are often sized so that it is rare for interior queues to overflow. | are often sized so that it is rare for interior queues to overflow. | |||
However, until recently this was more due to the inability of TCP to | However, until recently, this was more due to the inability of TCP to | |||
saturate the links. For many years, fixes such as window scaling | saturate the links. For many years, fixes such as window scaling | |||
[RFC7323] proved hard to deploy. And the Reno variant of TCP has | [RFC7323] proved hard to deploy and the Reno variant of TCP remained | |||
remained in widespread use despite its inability to scale to high | in widespread use despite its inability to scale to high flow rates. | |||
flow rates. However, now that modern operating systems are finally | However, now that modern operating systems are finally capable of | |||
capable of saturating interior links, even the buffers of well- | saturating interior links, even the buffers of well-provisioned | |||
provisioned interior switches will need to signal episodes of | interior switches will need to signal episodes of queuing. | |||
queuing. | ||||
Propagation of ECN is defined for MPLS [RFC5129], and has been | Propagation of ECN is defined for MPLS [RFC5129] and TRILL [RFC7780] | |||
defined for TRILL [RFC7780], [I-D.ietf-trill-ecn-support], but it | [RFC9600], but it has yet to be defined for a number of other | |||
remains to be defined for a number of other subnetwork technologies. | subnetwork technologies. | |||
Similarly, ECN propagation is yet to be defined for many tunnelling | Similarly, ECN propagation is yet to be defined for many tunnelling | |||
protocols. [RFC6040] defines how ECN should be propagated for IP-in- | protocols. [RFC6040] defines how ECN should be propagated for IP-in- | |||
IPv4 [RFC2003], IP-in-IPv6 [RFC2473] and IPsec [RFC4301] tunnels, but | IPv4 [RFC2003], IP-in-IPv6 [RFC2473], and IPsec [RFC4301] tunnels, | |||
there are numerous other tunnelling protocols with a shim and/or a | but there are numerous other tunnelling protocols with a shim and/or | |||
layer 2 header between two IP headers (IPv4 or IPv6). Some address | a Layer 2 (L2) header between two IP headers (IPv4 or IPv6). Some | |||
ECN propagation between the IP headers, but many do not. This | address ECN propagation between the IP headers, but many do not. | |||
document gives guidance on how to address ECN propagation for future | This document gives guidance on how to address ECN propagation for | |||
tunnelling protocols, and a companion standards track specification | future tunnelling protocols, and a companion Standards Track | |||
[I-D.ietf-tsvwg-rfc6040update-shim] updates those existing IP-shim- | specification [RFC9601] updates existing tunnelling protocols with a | |||
(L2)-IP protocols that are under IETF change control and still widely | shim between IP headers that are under IETF change control and still | |||
used. | widely used. | |||
Incremental deployment is the most delicate aspect when adding | Incremental deployment is the most delicate aspect when adding | |||
support for ECN. The original ECN protocol in IP [RFC3168] was | support for ECN. The original ECN protocol in IP [RFC3168] was | |||
carefully designed so that a congested buffer would not mark a packet | carefully designed so that a congested buffer would not mark a packet | |||
(rather than drop it) unless both source and destination hosts were | (rather than drop it) unless both source and destination hosts were | |||
ECN-capable. Otherwise, its congestion markings would never be | ECN-capable. Otherwise, its congestion markings would never be | |||
detected and congestion would just build up further. However, to | detected and congestion would just build up further. However, to | |||
support congestion marking below the IP layer or within tunnels, it | support congestion marking below the IP layer or within tunnels, it | |||
is not sufficient to only check that the two layer 4 transport end- | is not sufficient to only check that the two layer 4 transport | |||
points support ECN; correct operation also depends on the | endpoints support ECN; correct operation also depends on the | |||
decapsulator at each subnet or tunnel egress faithfully propagating | decapsulator at each subnet or tunnel egress faithfully propagating | |||
congestion notifications to the higher layer. Otherwise, a legacy | congestion notification to the higher layer. Otherwise, a legacy | |||
decapsulator might silently fail to propagate any ECN signals from | decapsulator might silently fail to propagate any congestion signals | |||
the outer to the forwarded header. Then the lost signals would never | from the outer header to the forwarded header. Then, the lost | |||
be detected and again congestion would build up further. The | signals would never be detected and congestion would build up | |||
guidelines given later require protocol designers to carefully | further. The guidelines given later require protocol designers to | |||
consider incremental deployment, and suggest various safe approaches | carefully consider incremental deployment and suggest various safe | |||
for different circumstances. | approaches for different circumstances. | |||
Of course, the IETF does not have standards authority over every link | Of course, the IETF does not have standards authority over every | |||
layer protocol. So this document gives guidelines for designing | link-layer protocol; thus, this document gives guidelines for | |||
propagation of congestion notification across the interface between | designing propagation of congestion notification across the interface | |||
IP and protocols that may encapsulate IP (i.e. that can be layered | between IP and protocols that may encapsulate IP (i.e., that can be | |||
beneath IP). Each lower layer technology will exhibit different | layered beneath IP). Each lower-layer technology will exhibit | |||
issues and compromises, so the IETF or the relevant standards body | different issues and compromises, so the IETF or the relevant | |||
must be free to define the specifics of each lower layer congestion | standards body must be free to define the specifics of each lower- | |||
notification scheme. Nonetheless, if the guidelines are followed, | layer congestion notification scheme. Nonetheless, if the guidelines | |||
congestion notification should interwork between different | are followed, congestion notification should interwork between | |||
technologies, using IP in its role as a 'portability layer'. | different technologies using IP in its role as a 'portability layer'. | |||
Therefore, the capitalized terms 'SHOULD' or 'SHOULD NOT' are often | Therefore, the capitalized terms 'SHOULD' or 'SHOULD NOT' are often | |||
used in preference to 'MUST' or 'MUST NOT', because it is difficult | used in preference to 'MUST' or 'MUST NOT' because it is difficult to | |||
to know the compromises that will be necessary in each protocol | know the compromises that will be necessary in each protocol design. | |||
design. If a particular protocol design chooses not to follow a | If a particular protocol design chooses not to follow a 'SHOULD' or | |||
'SHOULD (NOT)' given in the advice below, it MUST include a sound | 'SHOULD NOT' given in the advice below, it MUST include a sound | |||
justification. | justification. | |||
It has not been possible to give common guidelines for all lower | It has not been possible to give common guidelines for all lower- | |||
layer technologies, because they do not all fit a common pattern. | layer technologies because they do not all fit a common pattern. | |||
Instead, they have been divided into a few distinct modes of | Instead, they have been divided into a few distinct modes of | |||
operation: feed-forward-and-upward; feed-upward-and-forward; feed- | operation: feed-forward-and-up, feed-up-and-forward, feed-backward, | |||
backward; and null mode. These modes are described in Section 3, | and null mode. These modes are described in Section 3, and separate | |||
then in the subsequent sections separate guidelines are given for | guidelines are given for each mode in subsequent sections. | |||
each mode. | ||||
1.1. Update to RFC 3819 | 1.1. Update to RFC 3819 | |||
This document updates the brief advice to subnetwork designers about | This document updates the brief advice to subnetwork designers about | |||
ECN in Section 13 of [RFC3819], by replacing the last two paragraphs | ECN in Section 13 of [RFC3819] by adding this document (RFC 9599) as | |||
with the following sentence: | an informative reference and replacing the last two paragraphs with | |||
the following sentence: | ||||
By following the guidelines in [this document], subnetwork | ||||
designers can enable a layer-2 protocol to participate in | ||||
congestion control without dropping packets via propagation of | ||||
explicit congestion notification (ECN [RFC3168]) to receivers. | ||||
and adding [this document] as an informative reference. {RFC Editor: | | By following the guidelines in [RFC9599], subnetwork designers can | |||
Please replace both instances of [this document] above with the | | enable a layer-2 protocol to participate in congestion control | |||
number of the present RFC when published.} | | without dropping packets via propagation of Explicit Congestion | |||
| Notification (ECN) [RFC3168] to receivers. | ||||
1.2. Scope | 1.2. Scope | |||
This document only concerns wire protocol processing of explicit | This document only concerns wire protocol processing of explicit | |||
notification of congestion. It makes no changes or recommendations | notification of congestion. It makes no changes or recommendations | |||
concerning algorithms for congestion marking or for congestion | concerning algorithms for congestion marking or congestion response | |||
response, because algorithm issues should be independent of the layer | because algorithm issues should be independent of the layer that the | |||
the algorithm operates in. | algorithm operates in. | |||
The default ECN semantics are described in [RFC3168] and updated by | The default ECN semantics are described in [RFC3168] and updated by | |||
[RFC8311]. Also, the guidelines for AQM designers [RFC7567] clarify | [RFC8311]. Also, the guidelines for AQM designers [RFC7567] clarify | |||
the semantics of both drop and ECN signals from AQM algorithms. | the semantics of both drop and ECN signals from AQM algorithms. | |||
[RFC4774] is the appropriate best current practice specification of | [RFC4774] is the appropriate best current practice specification of | |||
how algorithms with alternative semantics for the ECN field can be | how algorithms with alternative semantics for the ECN field can be | |||
partitioned from Internet traffic that uses the default ECN | partitioned from Internet traffic that uses the default ECN | |||
semantics. There are two main examples for how alternative ECN | semantics. There are two main examples for how alternative ECN | |||
semantics have been defined in practice: | semantics have been defined in practice: | |||
* RFC 4774 suggests using the ECN field in combination with a | * [RFC4774] suggests using the ECN field in combination with a | |||
Diffserv codepoint such as in PCN [RFC6660], Voice over 3G [UTRAN] | Diffserv codepoint, such as in Pre-Congestion Notification (PCN) | |||
or Voice over LTE (VoLTE) [LTE-RA]; | [RFC6660], Voice over 3G [UTRAN], or Voice over LTE (VoLTE) | |||
[LTE-RA]. | ||||
* RFC 8311 suggests using the ECT(1) codepoint of the ECN field to | * [RFC8311] suggests using the ECT(1) codepoint of the ECN field to | |||
indicate alternative semantics such as for the experimental Low | indicate alternative semantics, such as for the experimental Low | |||
Latency Low Loss Scalable throughput (L4S) service [RFC9331]). | Latency, Low Loss, and Scalable throughput (L4S) service | |||
[RFC9331]. | ||||
The aim is that the default rules for encapsulating and decapsulating | The aim is that the default rules for encapsulating and decapsulating | |||
the ECN field are sufficiently generic that tunnels and subnets will | the ECN field are sufficiently generic that tunnels and subnets will | |||
encapsulate and decapsulate packets without regard to how algorithms | encapsulate and decapsulate packets without regard to how algorithms | |||
elsewhere are setting or interpreting the semantics of the ECN field. | elsewhere are setting or interpreting the semantics of the ECN field. | |||
[RFC6040] updates RFC 4774 to allow alternative encapsulation and | [RFC6040] updates [RFC4774] to allow alternative encapsulation and | |||
decapsulation behaviours to be defined for alternative ECN semantics. | decapsulation behaviours to be defined for alternative ECN semantics. | |||
However, it reinforces the same point - that it is far preferable to | However, it reinforces the same point -- it is far preferable to try | |||
try to fit within the common ECN encapsulation and decapsulation | to fit within the common ECN encapsulation and decapsulation | |||
behaviours, because expecting all lower layer technologies and | behaviours because expecting all lower-layer technologies and tunnels | |||
tunnels to be updated is likely to be completely impractical. | to be updated is likely to be completely impractical. | |||
Alternative semantics for the ECN field can be defined to depend on | Alternative semantics for the ECN field can be defined to depend on | |||
the traffic class indicated by the DSCP. Therefore, correct | the traffic class indicated by the Differentiated Services Code Point | |||
propagation of congestion signals could depend on correct propagation | (DSCP). Therefore, correct propagation of congestion signals could | |||
of the DSCP between the layers and along the path. For instance, if | depend on correct propagation of the DSCP between the layers and | |||
the meaning of the ECN field depends on the DSCP (as in PCN or VoLTE) | along the path. For instance, if the meaning of the ECN field | |||
and if the outer DSCP is stripped on decapsulation, as in the pipe | depends on the DSCP (as in PCN or VoLTE) and the outer DSCP is | |||
model of [RFC2983], the special semantics of the ECN field would be | stripped on decapsulation, as in the pipe model of [RFC2983], the | |||
lost. Similarly, if the DSCP is changed at the boundary between | special semantics of the ECN field would be lost. Similarly, if the | |||
Diffserv domains, the special ECN semantics would also be lost. This | DSCP is changed at the boundary between Diffserv domains, the special | |||
is an important implication of the localized scope of most Diffserv | ECN semantics would also be lost. This is an important implication | |||
arrangements. In this document, correct propagation of traffic class | of the localized scope of most Diffserv arrangements. In this | |||
information is assumed, while what 'correct' means and how it is | document, correct propagation of traffic class information is assumed | |||
achieved is covered elsewhere (e.g. RFC 2983) and is outside the | while the meaning of 'correct' and how it is achieved is covered | |||
scope of the present document. | elsewhere (e.g., [RFC2983]) and is outside the scope of this | |||
document. | ||||
The guidelines in this document do ensure that common encapsulation | The guidelines in this document do ensure that common encapsulation | |||
and decapsulation rules are sufficiently generic to cover cases where | and decapsulation rules are sufficiently generic to cover cases where | |||
ECT(1) is used instead of ECT(0) to identify alternative ECN | ECT(1) is used instead of ECT(0) to identify alternative ECN | |||
semantics (as in L4S [RFC9331]) and where ECN marking algorithms use | semantics (as in L4S [RFC9331]) and where ECN-marking algorithms use | |||
ECT(1) to encode 3 severity levels into the ECN field (e.g. PCN | ECT(1) to encode three severity levels into the ECN field (e.g., PCN | |||
[RFC6660]) rather than the default of 2. All these different | [RFC6660]) rather than the default of two. All these different | |||
semantics for the ECN field work because it has been possible to | semantics for the ECN field work because it has been possible to | |||
define common default decapsulation rules that allow for all cases. | define common default decapsulation rules that allow for all cases | |||
[RFC6040]. | ||||
Note that the guidelines in this document do not necessarily require | Note that the guidelines in this document do not necessarily require | |||
the subnet wire protocol to be changed to add support for congestion | the subnet wire protocol to be changed to add support for congestion | |||
notification. For instance, the Feed-Up-and-Forward Mode | notification. For instance, the feed-up-and-forward mode | |||
(Section 3.2) and the Null Mode (Section 3.4) do not. Another way to | (Section 3.2) and the null mode (Section 3.4) do not. Another way to | |||
add congestion notification without consuming header space in the | add congestion notification without consuming header space in the | |||
subnet protocol might be to use a parallel control plane protocol. | subnet protocol might be to use a parallel control plane protocol. | |||
This document focuses on the congestion notification interface | This document focuses on the congestion notification interface | |||
between IP and lower layer or tunnel protocols that can encapsulate | between IP and lower-layer or tunnel protocols that can encapsulate | |||
IP, where the term 'IP' includes IPv4 or IPv6, unicast, multicast or | IP, where the term 'IP' includes IPv4 or IPv6, unicast, multicast, or | |||
anycast. However, it is likely that the guidelines will also be | anycast. However, it is likely that the guidelines will also be | |||
useful when a lower layer protocol or tunnel encapsulates itself, | useful when a lower-layer protocol or tunnel encapsulates itself, | |||
e.g. Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah) or when | e.g., Ethernet Media Access Control (MAC) in MAC ([IEEE802.1Q]; | |||
it encapsulates other protocols. In the feed-backward mode, | previously 802.1ah), or when it encapsulates other protocols. In the | |||
propagation of congestion signals for multicast and anycast packets | feed-backward mode, propagation of congestion signals for multicast | |||
is out-of-scope (because the complexity would make it unlikely to be | and anycast packets is out of scope (because the complexity would | |||
attempted). | make it unlikely to be attempted). | |||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
Further terminology used within this document: | Further terminology used within this document: | |||
Protocol data unit (PDU): Information that is delivered as a unit | Protocol data unit (PDU): Information that is delivered as a unit | |||
among peer entities of a layered network consisting of protocol | among peer entities of a layered network consisting of protocol | |||
control information (typically a header) and possibly user data | control information (typically a header) and possibly user data | |||
(payload) of that layer. The scope of this document includes | (payload) of that layer. The scope of this document includes | |||
layer 2 and layer 3 networks, where the PDU is respectively termed | Layer 2 and Layer 3 networks, where the PDU is respectively termed | |||
a frame or a packet (or a cell in ATM). PDU is a general term for | a frame or a packet (or a cell in ATM). PDU is a general term for | |||
any of these. This definition also includes a payload with a shim | any of these. This definition also includes a payload with a shim | |||
header lying somewhere between layer 2 and 3. | header lying somewhere between layer 2 and 3. | |||
Transport: The end-to-end transmission control function, | Transport: The end-to-end transmission control function, | |||
conventionally considered at layer-4 in the OSI reference model. | conventionally considered at layer 4 in the OSI reference model. | |||
Given the audience for this document will often use the word | Given the audience for this document will often use the word | |||
transport to mean low level bit carriage, whenever the term is | transport to mean low-level bit carriage, the term will be | |||
used it will be qualified, e.g. 'L4 transport'. | qualified whenever it is used, e.g., 'L4 transport'. | |||
Encapsulator: The link or tunnel endpoint function that adds an | Encapsulator: The link or tunnel endpoint function that adds an | |||
outer header to a PDU (also termed the 'link ingress', the 'subnet | outer header to a PDU (also termed the 'link ingress', the 'subnet | |||
ingress', the 'ingress tunnel endpoint' or just the 'ingress' | ingress', the 'ingress tunnel endpoint', or just the 'ingress' | |||
where the context is clear). | where the context is clear). | |||
Decapsulator: The link or tunnel endpoint function that removes an | Decapsulator: The link or tunnel endpoint function that removes an | |||
outer header from a PDU (also termed the 'link egress', the | outer header from a PDU (also termed the 'link egress', the | |||
'subnet egress', the 'egress tunnel endpoint' or just the 'egress' | 'subnet egress', the 'egress tunnel endpoint', or just the | |||
where the context is clear). | 'egress' where the context is clear). | |||
Incoming header: The header of an arriving PDU before encapsulation. | Incoming header: The header of an arriving PDU before encapsulation. | |||
Outer header: The header added to encapsulate a PDU. | Outer header: The header added to encapsulate a PDU. | |||
Inner header: The header encapsulated by the outer header. | Inner header: The header encapsulated by the outer header. | |||
Outgoing header: The header forwarded by the decapsulator. | Outgoing header: The header forwarded by the decapsulator. | |||
CE: Congestion Experienced [RFC3168] | CE: Congestion Experienced [RFC3168] | |||
ECT: ECN-Capable (L4) Transport [RFC3168] | ECT: ECN-Capable (L4) Transport [RFC3168] | |||
Not-ECT: Not ECN-Capable (L4) Transport [RFC3168] | Not-ECT: Not ECN-Capable (L4) Transport [RFC3168] | |||
Load Regulator: For each flow of PDUs, the transport function that | Load Regulator: For each flow of PDUs, the transport function that | |||
is capable of controlling the data rate. Typically located at the | is capable of controlling the data rate. Typically located at the | |||
data source, but in-path nodes can regulate load in some | data source, but in-path nodes can regulate load in some | |||
congestion control arrangements (e.g. admission control, policing | congestion control arrangements (e.g., admission control, policing | |||
nodes or transport circuit-breakers [RFC8084]). Note the term "a | nodes, or transport circuit-breakers [RFC8084]). Note that "a | |||
function capable of controlling the load" deliberately includes a | function capable of controlling the load" deliberately includes a | |||
transport that does not actually control the load responsively but | transport that does not actually control the load responsively, | |||
ideally it ought to (e.g. a sending application without congestion | but ideally it ought to (e.g., a sending application without | |||
control that uses UDP). | congestion control that uses UDP). | |||
ECN-PDU: A PDU at the IP layer or below with a capacity to signal | ECN-PDU: A PDU at the IP layer or below with a capacity to signal | |||
congestion that is part of a congestion control feedback loop | congestion that is part of a congestion control feedback loop | |||
within which all the nodes necessary to propagate the signal back | within which all the nodes necessary to propagate the signal back | |||
to the Load Regulator are capable of doing that propagation. An | to the Load Regulator are capable of doing that propagation. An | |||
IP packet with a non-zero ECN field implies that the endpoints are | IP packet with a non-zero ECN field implies that the endpoints are | |||
ECN-capable, so this would be an ECN-PDU. However, ECN-PDU is | ECN-capable, so this would be an ECN-PDU. However, ECN-PDU is | |||
intended to be a general term for a PDU at lower layers, as well | intended to be a general term for a PDU at lower layers, as well | |||
as at the IP layer. | as at the IP layer. | |||
Not-ECN-PDU: A PDU at the IP layer or below that is part of a | Not-ECN-PDU: A PDU at the IP layer or below that is part of a | |||
congestion control feedback loop that is not capable of | congestion control feedback loop that is not capable of | |||
propagating explicit congestion notification signals back to the | propagating ECN signals back to the Load Regulator because at | |||
Load Regulator, because at least one of the nodes necessary to | least one of the nodes necessary to propagate the signals is | |||
propagate the signals is incapable of doing that propagation. | incapable of doing that propagation. Note that this definition is | |||
Note that this definition is a property of the feedback-loop, not | a property of the feedback loop, not necessarily of the PDU | |||
necessarily of the PDU itself, because in some protocols the PDU | itself; certainly the PDU will self-describe the property in some | |||
will self-describe the property, but in others the property might | protocols, but in others, the property might be carried in a | |||
be carried in a separate control-plane context that is somehow | separate control plane context (which is somehow bound to the | |||
bound to the PDU. | PDU). | |||
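At the IP layer, the ECN-PDU test described above reduces to checking for a non-zero ECN field. The following is a minimal sketch for IPv4 and IPv6 headers. The function names are hypothetical, and the sketch applies only to IP; in a lower-layer protocol, the property might instead be held in control plane context, as noted in the Not-ECN-PDU definition.

```python
# Illustrative sketch; ecn_field and is_ecn_pdu are hypothetical names.
NOT_ECT = 0b00  # ECN codepoint meaning Not ECN-Capable Transport [RFC3168]


def ecn_field(ip_header: bytes) -> int:
    """Return the 2-bit ECN field of an IPv4 or IPv6 header."""
    version = ip_header[0] >> 4
    if version == 4:
        # IPv4: ECN is the two least significant bits of the second octet.
        return ip_header[1] & 0x03
    if version == 6:
        # IPv6: Traffic Class straddles octets 0 and 1; its two least
        # significant bits (ECN) sit in the upper nibble of octet 1.
        return (ip_header[1] >> 4) & 0x03
    raise ValueError("not an IPv4 or IPv6 header")


def is_ecn_pdu(ip_header: bytes) -> bool:
    """True if the IP packet carries a non-zero ECN field."""
    return ecn_field(ip_header) != NOT_ECT
```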
3. Modes of Operation | 3. Modes of Operation | |||
This section sets down the different modes by which congestion | This section sets down the different modes by which congestion | |||
information is passed between the lower layer and the higher one. It | information is passed between the lower layer and the higher one. It | |||
acts as a reference framework for the following sections, which give | acts as a reference framework for the subsequent sections that give | |||
normative guidelines for designers of explicit congestion | normative guidelines for designers of congestion notification | |||
notification protocols, taking each mode in turn: | protocols, taking each mode in turn: | |||
Feed-Forward-and-Up: Nodes feed forward congestion notification | Feed-Forward-and-Up: Nodes feed forward congestion notification | |||
towards the egress within the lower layer then up and along the | towards the egress within the lower layer, then up and along the | |||
layers towards the end-to-end destination at the transport layer. | layers towards the end-to-end destination at the transport layer. | |||
The following local optimisation is possible: | The following local optimization is possible: | |||
Feed-Up-and-Forward: A lower layer switch feeds-up congestion | Feed-Up-and-Forward: A lower-layer switch feeds up congestion | |||
notification directly into the higher layer (e.g. into the ECN | notification directly into the higher layer (e.g., into the ECN | |||
field in the IP header), irrespective of whether the node is at | field in the IP header), irrespective of whether the node is at | |||
the egress of a subnet. | the egress of a subnet. | |||
Feed-Backward: Nodes feed back congestion signals towards the | Feed-Backward: Nodes feed back congestion signals towards the | |||
ingress of the lower layer and (optionally) attempt to control | ingress of the lower layer and (optionally) attempt to control | |||
congestion within their own layer. | congestion within their own layer. | |||
Null: Nodes cannot experience congestion at the lower layer except | Null: Nodes cannot experience congestion at the lower layer except | |||
at ingress nodes (which are IP-aware or equivalently higher-layer- | at the ingress nodes of the subnet (which are IP-aware or | |||
aware). | equivalently higher-layer-aware). | |||
3.1. Feed-Forward-and-Up Mode | 3.1. Feed-Forward-and-Up Mode | |||
Like IP and MPLS, many subnet technologies are based on self- | Like IP and MPLS, many subnet technologies are based on self- | |||
contained protocol data units (PDUs) or frames sent unreliably. They | contained PDUs or frames sent unreliably. They provide no feedback | |||
provide no feedback channel at the subnetwork layer, instead relying | channel at the subnetwork layer, instead relying on higher layers | |||
on higher layers (e.g. TCP) to feed back loss signals. | (e.g., TCP) to feed back loss signals. | |||
In these cases, ECN may best be supported by standardising explicit | In these cases, ECN may best be supported by standardising explicit | |||
notification of congestion into the lower layer protocol that carries | notification of congestion into the lower-layer protocol that carries | |||
the data forwards. Then a specification is needed for how the egress | the data forwards. Then, a specification is needed for how the | |||
of the lower layer subnet propagates this explicit signal into the | egress of the lower-layer subnet propagates this explicit signal into | |||
forwarded upper layer (IP) header. This signal continues forwards | the forwarded upper-layer (IP) header. This signal continues | |||
until it finally reaches the destination transport (at L4). Then | forwards until it finally reaches the destination transport (at L4). | |||
typically the destination will feed this congestion notification back | Typically, the destination will feed this congestion notification | |||
to the source transport using an end-to-end protocol (e.g. TCP). | back to the source transport using an end-to-end protocol (e.g., | |||
This is the arrangement that has already been used to add ECN to IP- | TCP). This is the arrangement that has already been used to add ECN | |||
in-IP tunnels [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129]. | to IP-in-IP tunnels [RFC6040], IP-in-MPLS, and MPLS-in-MPLS | |||
[RFC5129]. | ||||
This mode is illustrated in Figure 1. Along the middle of the | This mode is illustrated in Figure 1. Along the middle of the | |||
figure, layers 2, 3 and 4 of the protocol stack are shown, and one | figure, layers 2, 3, and 4 of the protocol stack are shown. One | |||
packet is shown along the bottom as it progresses across the network | packet is shown along the bottom as it progresses across the network | |||
from source to destination, crossing two subnets connected by a | from source to destination, crossing two subnets connected by a | |||
router, and crossing two switches on the path across each subnet. | router and crossing two switches on the path across each subnet. | |||
Congestion at the output of the first switch (shown as *) leads to a | Congestion at the output of the first switch (shown as *) leads to a | |||
congestion marking in the L2 header (shown as C in the illustration | congestion marking in the L2 header (shown as C in the illustration | |||
of the packet). The chevrons show the progress of the resulting | of the packet). The chevrons show the progress of the resulting | |||
congestion indication. It is propagated from link to link across the | congestion indication. It is propagated from link to link across the | |||
subnet in the L2 header, then when the router removes the marked L2 | subnet in the L2 header. Then, when the router removes the marked L2 | |||
header, it propagates the marking up into the L3 (IP) header. The | header, it propagates the marking up into the L3 (IP) header. The | |||
router forwards the marked L3 header into subnet B. The L2 protocol | router forwards the marked L3 header into subnet B. The L2 protocol | |||
used in subnet B does not support ECN, but the signal proceeds across | used in subnet B does not support congestion notification, but the | |||
it in the L3 header. | signal proceeds across it in the L3 header. | |||
Note that there is no implication that each 'C' marking is encoded | Note that there is no implication that each 'C' marking is encoded | |||
the same; a different encoding might be used for the 'C' marking in | the same; a different encoding might be used for the 'C' marking in | |||
each protocol. | each protocol. | |||
Finally, for completeness, we show the L3 marking arriving at the | Finally, for completeness, we show the L3 marking arriving at the | |||
destination, where the host transport protocol (e.g. TCP) feeds it | destination, where the host transport protocol (e.g., TCP) feeds it | |||
back to the source in the L4 acknowledgement (the 'C' at L4 in the | back to the source in the L4 acknowledgement (the 'C' at L4 in the | |||
packet at the top of the diagram). | packet at the top of the diagram). | |||
_ _ _ | _ _ _ | |||
/_______ | | |C| ACK Packet (V) | /_______ | | |C| ACK Packet (V) | |||
\ |_|_|_| | \ |_|_|_| | |||
+---+ layer: 2 3 4 header +---+ | +---+ layer: 2 3 4 header +---+ | |||
| <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | |||
| | +---+ | ^ | | | | +---+ | ^ | | |||
| | . . . . . . Packet U. . | >>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | | | . . . . . . Packet U. . | >>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | |||
| | +---+ +---+ | ^ | +---+ +---+ | | | | | +---+ +---+ | ^ | +---+ +---+ | | | |||
| | | *|>>>>>|>>>|>>>>>|>^ | | | | | | |L2 | | | | *|>>>>>|>>>|>>>>>|>^ | | | | | | |L2 | |||
|___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |||
source subnet A router subnet B dest | source subnet A router subnet B dest | |||
__ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | __ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | |||
| | | | | | | | |C| | | |C| | | |C| | Data________\ | | | | | | | | | |C| | | |C| | | |C| | Data________\ | |||
|__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |||
layer:4 3 2A 4 3 2A 4 3 4 3 2B | layer:4 3 2A 4 3 2A 4 3 4 3 2B | |||
header | header | |||
Figure 1: Feed-Forward-and-Up Mode | Figure 1: Feed-Forward-and-Up Mode | |||
Of course, modern networks are rarely as simple as this text-book | Of course, modern networks are rarely as simple as this textbook | |||
example, often involving multiple nested layers. For example, a 3GPP | example, often involving multiple nested layers. For example, a | |||
mobile network may have two IP-in-IP (GTP [GTPv1]) tunnels in series | Third Generation Partnership Project (3GPP) mobile network may have | |||
and an MPLS backhaul between the base station and the first router. | two IP-in-IP GTP [GTPv1] tunnels in series and an MPLS backhaul | |||
Nonetheless, the example illustrates the general idea of feeding | between the base station and the first router. Nonetheless, the | |||
congestion notification forward then upward whenever a header is | example illustrates the general idea of feeding congestion | |||
removed at the egress of a subnet. | notification forward then upward whenever a header is removed at the | |||
egress of a subnet. | ||||
Note that the FECN (forward ECN ) bit in Frame Relay [Buck00] and the | Note that the Forward Explicit Congestion Notification (FECN) bit in | |||
explicit forward congestion indication (EFCI [ITU-T.I.371]) bit in | Frame Relay [Buck00] and the Explicit Forward Congestion Indication | |||
ATM user data cells follow a feed-forward pattern. However, in ATM, | (EFCI) [ITU-T.I.371] bit in ATM user data cells follow a feed-forward | |||
this arrangement is only part of a feed-forward-and-backward pattern | pattern. However, in ATM, this arrangement is only part of a feed- | |||
at the lower layer, not feed-forward-and-up out of the lower layer — | forward-and-backward pattern at the lower layer, not feed-forward- | |||
the intention was never to interface to IP ECN at the subnet egress. | and-up out of the lower layer -- the intention was never to interface | |||
To our knowledge, Frame Relay FECN is solely used to detect where | with IP-ECN at the subnet egress. To our knowledge, Frame Relay FECN | |||
more capacity should be provisioned. | is solely used by network operators to detect where they should | |||
provision more capacity. | ||||
3.2. Feed-Up-and-Forward Mode | 3.2. Feed-Up-and-Forward Mode | |||
Ethernet is particularly difficult to extend incrementally to support | Ethernet is particularly difficult to extend incrementally to support | |||
explicit congestion notification. One way to support ECN in such | congestion notification. One way is to use so-called 'Layer 3 | |||
cases has been to use so called 'layer-3 switches'. These are | switches'. These are Ethernet switches that dig into the Ethernet | |||
Ethernet switches that dig into the Ethernet payload to find an IP | payload to find an IP header and manipulate or act on certain IP | |||
header and manipulate or act on certain IP fields (specifically | fields (specifically Diffserv and ECN). For instance, in Data Center | |||
Diffserv & ECN). For instance, in Data Center TCP [RFC8257], layer-3 | TCP [RFC8257], Layer 3 switches are configured to mark the ECN field | |||
switches are configured to mark the ECN field of the IP header within | of the IP header within the Ethernet payload when their output buffer | |||
the Ethernet payload when their output buffer becomes congested. | becomes congested. With respect to switching, a Layer 3 switch acts | |||
With respect to switching, a layer-3 switch acts solely on the | solely on the addresses in the Ethernet header; it does not use IP | |||
addresses in the Ethernet header; it does not use IP addresses, and | addresses and it does not decrement the TTL field in the IP header. | |||
it does not decrement the TTL field in the IP header. | ||||
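The marking step performed by such a switch might look like the following sketch. It is an illustration under stated assumptions, not a description of any product or of the Data Center TCP specification: it handles only untagged Ethernet frames carrying IPv4, uses a hypothetical instantaneous queue threshold, and leaves Not-ECT packets untouched (a real queue would typically drop them instead). Consistent with the text above, it changes neither the Ethernet addresses nor the TTL; only the ECN field and the IPv4 header checksum are rewritten.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
NOT_ECT, CE = 0b00, 0b11  # ECN codepoints [RFC3168]


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum of an IPv4 header (checksum field set to zero)."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_if_congested(frame: bytes, queue_len: int, threshold: int) -> bytes:
    """Set CE in the IPv4 header of an untagged Ethernet frame.

    queue_len and threshold are hypothetical inputs standing in for the
    state of the switch's output buffer.
    """
    if queue_len <= threshold:
        return frame  # no congestion: forward unchanged
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return frame  # this sketch only digs into untagged IPv4
    ip = 14  # the IPv4 header starts after the 14-octet Ethernet header
    ihl = (frame[ip] & 0x0F) * 4
    if frame[ip + 1] & 0x03 == NOT_ECT:
        return frame  # never CE-mark a Not-ECT packet
    out = bytearray(frame)
    out[ip + 1] |= CE
    out[ip + 10:ip + 12] = b"\x00\x00"  # zero the checksum, then recompute
    struct.pack_into("!H", out, ip + 10, ipv4_checksum(bytes(out[ip:ip + ihl])))
    return bytes(out)
```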
_ _ _ | _ _ _ | |||
/_______ | | |C| ACK packet (V) | /_______ | | |C| ACK packet (V) | |||
\ |_|_|_| | \ |_|_|_| | |||
+---+ layer: 2 3 4 header +---+ | +---+ layer: 2 3 4 header +---+ | |||
| <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | |||
| | +---+ | ^ | | | | +---+ | ^ | | |||
| | . . . >>>> Packet U >>>|>>>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | | | . . . >>>> Packet U >>>|>>>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | |||
| | +--^+ +---+ | v| +---+ +---+ | ^ | | | | +--^+ +---+ | v| +---+ +---+ | ^ | | |||
| | | *| | | | >|>>>>>|>>>|>>>>>|>>>|>>>>>|>^ |L2 | | | | *| | | | >|>>>>>|>>>|>>>>>|>>>|>>>>>|>^ |L2 | |||
|___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |||
source subnet E router subnet F dest | source subnet E router subnet F dest | |||
__ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | __ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | |||
| | | | | | | |C| | | | |C| | | |C|C| Data________\ | | | | | | | | |C| | | | |C| | | |C|C| Data________\ | |||
|__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |||
layer:4 3 2 4 3 2 4 3 4 3 2 | layer:4 3 2 4 3 2 4 3 4 3 2 | |||
header | header | |||
Figure 2: Feed-Up-and-Forward Mode | Figure 2: Feed-Up-and-Forward Mode | |||
By comparing Figure 2 with Figure 1, it can be seen that subnet E | By comparing Figure 2 with Figure 1, it can be seen that subnet E | |||
(perhaps a subnet of layer-3 Ethernet switches) works in feed-up-and- | (perhaps a subnet of Layer 3 Ethernet switches) works in feed-up-and- | |||
forward mode by notifying congestion directly into L3 at the point of | forward mode by notifying congestion directly into L3 at the point of | |||
congestion, even though the congested switch does not otherwise act | congestion, even though the congested switch does not otherwise act | |||
at L3. In this example, the technology in subnet F (e.g. MPLS) does | at L3. In this example, the technology in subnet F (e.g., MPLS) does | |||
support ECN natively, so when the router adds the layer-2 header it | support ECN. So, when the router adds the Layer 2 header, it copies | |||
copies the ECN marking from L3 to L2 as well, as shown by the 'C's in | the ECN marking from L3 to L2 as well, as shown by the 'C's in both | |||
both layers. | layers. | |||
3.3. Feed-Backward Mode | 3.3. Feed-Backward Mode | |||
In some layer 2 technologies, explicit congestion notification has | In some Layer 2 technologies, congestion notification has been | |||
been defined for use internally within the subnet with its own | defined for use internally within the subnet with its own feedback | |||
feedback and load regulation, but typically the interface with IP for | and load regulation but the interface with IP for ECN has not been | |||
ECN has not been defined. | defined. | |||
For instance, for the available bit-rate (ABR) service in ATM, the | For instance, the relative rate mechanism was one of the more popular | |||
relative rate mechanism was one of the more popular mechanisms for | ways to manage traffic for the Available Bit Rate (ABR) service in | |||
managing traffic, tending to supersede earlier designs. In this | ATM, and it tended to supersede earlier designs. In this approach, | |||
approach ATM switches send special resource management (RM) cells in | ATM switches send special resource management (RM) cells in both the | |||
both the forward and backward directions to control the ingress rate | forward and backward directions to control the ingress rate of user | |||
of user data into a virtual circuit. If a switch buffer is | data into a virtual circuit. If a switch buffer is approaching | |||
approaching congestion or is congested it sends an RM cell back | congestion or is congested, it sends an RM cell back towards the | |||
towards the ingress with respectively the No Increase (NI) or | ingress with respectively the No Increase (NI) or Congestion | |||
Congestion Indication (CI) bit set in its message type field | Indication (CI) bit set in its message type field [ATM-TM-ABR]. The | |||
[ATM-TM-ABR]. The ingress then holds or decreases its sending bit- | ingress then holds or decreases its sending bit rate accordingly. | |||
rate accordingly. | ||||
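The ingress reaction to a backward RM cell can be sketched as below. This is only an illustration of "hold or decrease": the multiplicative decrease factor, the additive increase step, and the behaviour when neither bit is set are assumptions made for the example and are not taken from [ATM-TM-ABR].

```python
from dataclasses import dataclass


@dataclass
class RMCell:
    """Backward resource management cell, reduced to the two bits used here."""
    ni: bool = False  # No Increase: a switch buffer is approaching congestion
    ci: bool = False  # Congestion Indication: a switch buffer is congested


def next_rate(rate: float, cell: RMCell,
              decrease: float = 0.5, increase: float = 1.0) -> float:
    """Return the ingress sending rate after processing one RM cell.

    decrease and increase are hypothetical parameters, not ATM constants.
    """
    if cell.ci:
        return rate * decrease  # congested: decrease the sending rate
    if cell.ni:
        return rate  # approaching congestion: hold the current rate
    return rate + increase  # assumption: otherwise the rate may grow
```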
_ _ _ | _ _ _ | |||
/_______ | | |C| ACK packet (X) | /_______ | | |C| ACK packet (X) | |||
\ |_|_|_| | \ |_|_|_| | |||
+---+ layer: 2 3 4 header +---+ | +---+ layer: 2 3 4 header +---+ | |||
| <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4 | | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4 | |||
| | +---+ | ^ | | | | +---+ | ^ | | |||
| | | *|>>> Packet W >>>>>>>>>>>>|>^ |L3 | | | | *|>>> Packet W >>>>>>>>>>>>|>^ |L3 | |||
| | +---+ +---+ | | +---+ +---+ | | | | | +---+ +---+ | | +---+ +---+ | | | |||
| | | | | | | <|<<<<<|<<<|<(V)<|<<<| | |L2 | | | | | | | | <|<<<<<|<<<|<(V)<|<<<| | |L2 | |||
skipping to change at page 13, line 46 ¶ | skipping to change at line 576 ¶ | |||
2 | 2 | |||
__ _ _ _ __ _ _ _ __ _ _ __ _ _ _ earlier | __ _ _ _ __ _ _ _ __ _ _ __ _ _ _ earlier | |||
| | | | | | | | | | | | | | | | | | | data________\ | | | | | | | | | | | | | | | | | | | | data________\ | |||
|__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| packet (U) / | |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| packet (U) / | |||
layer: 4 3 2 4 3 2 4 3 4 3 2 | layer: 4 3 2 4 3 2 4 3 4 3 2 | |||
header | header | |||
Figure 3: Feed-Backward Mode | Figure 3: Feed-Backward Mode | |||
ATM's feed-backward approach does not fit well when layered beneath | ATM's feed-backward approach does not fit well when layered beneath | |||
IP's feed-forward approach — unless the initial data source is the | IP's feed-forward approach unless the initial data source is the same | |||
same node as the ATM ingress. Figure 3 shows the feed-backward | node as the ATM ingress. Figure 3 shows the feed-backward approach | |||
approach being used in subnet H. If the final switch on the path is | being used in subnet H. If the final switch on the path is congested | |||
congested (*), it does not feed-forward any congestion indications on | (*), it does not feed forward any congestion indications on the | |||
packet (U). Instead it sends a control cell (V) back to the router | packet (U). Instead, it sends a control cell (V) back to the router | |||
at the ATM ingress. | at the ATM ingress. | |||
However, the backward feedback does not reach the original data | However, the backward feedback does not reach the original data | |||
source directly because IP does not support backward feedback (and | source directly because IP does not support backward feedback (and | |||
subnet G is independent of subnet H). Instead, the router in the | subnet G is independent of subnet H). Instead, the router in the | |||
middle throttles down its sending rate but the original data sources | middle throttles down its sending rate, but the original data sources | |||
don't reduce their rates. The resulting rate mismatch causes the | don't reduce their rates. The resulting rate mismatch causes the | |||
middle router's buffer at layer 3 to back up until it becomes | middle router's buffer at layer 3 to back up until it becomes | |||
congested, which it signals forwards on later data packets at layer 3 | congested, which it signals forwards on later data packets at layer 3 | |||
(e.g. packet W). Note that the forward signal from the middle router | (e.g., packet W). Note that the forward signal from the middle | |||
is not triggered directly by the backward signal. Rather, it is | router is not triggered directly by the backward signal. Rather, it | |||
triggered by congestion resulting from the middle router's mismatched | is triggered by congestion resulting from the middle router's | |||
rate response to the backward signal. | mismatched rate response to the backward signal. | |||
In response to this later forward signalling, end-to-end feedback at | In response to this later forward signalling, end-to-end feedback at | |||
layer-4 finally completes the tortuous path of congestion indications | layer 4 finally completes the tortuous path of congestion indications | |||
back to the origin data source, as before. | back to the origin data source as before. | |||
Quantized congestion notification (QCN [IEEE802.1Q]) would suffer | Quantized Congestion Notification (QCN) [IEEE802.1Q] would suffer | |||
from similar problems if extended to multiple subnets. However, from | from similar problems if extended to multiple subnets. However, QCN | |||
the start QCN was clearly characterized as solely applicable to a | was clearly characterized as solely applicable to a single subnet | |||
single subnet (see Section 6). | from the start (see Section 6). | |||
3.4. Null Mode | 3.4. Null Mode | |||
Often link and physical layer resources are 'non-blocking' by design. | Link- and physical-layer resources are often 'non-blocking' by | |||
In these cases congestion notification may be implemented but it does | design. Congestion notification may be implemented in these cases, | |||
not need to be deployed at the lower layer; ECN in IP would be | but it does not need to be deployed at the lower layer; ECN in IP | |||
sufficient. | would be sufficient. | |||
A degenerate example is a point-to-point Ethernet link. Excess | A degenerate example is a point-to-point Ethernet link. Excess | |||
loading of the link merely causes the queue from the higher layer to | loading of the link merely causes the queue from the higher layer to | |||
back up, while the lower layer remains immune to congestion. Even a | back up, while the lower layer remains immune to congestion. Even a | |||
whole meshed subnetwork can be made immune to interior congestion by | whole meshed subnetwork can be made immune to interior congestion by | |||
limiting ingress capacity and sufficient sizing of interior links, | limiting ingress capacity and sufficient sizing of interior links, | |||
e.g. a non-blocking fat-tree network [Leiserson85]. An alternative | e.g., a non-blocking fat-tree network [Leiserson85]. An alternative | |||
to fat links near the root is numerous thin links with multi-path | to fat links near the root is numerous thin links with multi-path | |||
routing to ensure even worst-case patterns of load cannot congest any | routing to ensure even worst-case patterns of load cannot congest any | |||
link, e.g. a Clos network [Clos53]. | link, e.g., a Clos network [Clos53]. | |||
4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | 4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | |||
Notification | Notification | |||
Feed-forward-and-up is the mode already used for signalling ECN up | Feed-forward-and-up is the mode already used for signalling ECN up | |||
the layers through MPLS into IP [RFC5129] and through IP-in-IP | the layers through MPLS into IP [RFC5129] and through IP-in-IP | |||
tunnels [RFC6040], whether encapsulating with IPv4 [RFC2003], IPv6 | tunnels [RFC6040], whether encapsulating with IPv4 [RFC2003], IPv6 | |||
[RFC2473] or IPsec [RFC4301]. These RFCs take a consistent approach | [RFC2473], or IPsec [RFC4301]. These RFCs take a consistent approach | |||
and the following guidelines are designed to ensure this consistency | and the following guidelines are designed to ensure this consistency | |||
continues as ECN support is added to other protocols that encapsulate | continues as ECN support is added to other protocols that encapsulate | |||
IP. The guidelines are also designed to ensure compliance with the | IP. The guidelines are also designed to ensure compliance with the | |||
more general best current practice for the design of alternate ECN | more general best current practice for the design of alternate ECN | |||
schemes given in [RFC4774] and extended by [RFC8311]. | schemes given in [RFC4774] and extended by [RFC8311]. | |||
The rest of this section is structured as follows: | The rest of this section is structured as follows: | |||
* Section 4.1 addresses the most straightforward cases, where | * Section 4.1 addresses the most straightforward cases, where | |||
[RFC6040] can be applied directly to add ECN to tunnels that are | [RFC6040] can be applied directly to add ECN to tunnels that are | |||
effectively IP-in-IP tunnels, but with shim header(s) between the | effectively IP-in-IP tunnels, but with a shim header(s) between | |||
IP headers. | the IP headers. | |||
* The subsequent sections give guidelines for adding ECN to a subnet | * The subsequent sections give guidelines for adding congestion | |||
technology that uses feed-forward-and-up mode like IP, but it is | notification to a subnet technology that uses feed-forward-and-up | |||
not so similar to IP that [RFC6040] rules can be applied directly. | mode like IP, but it is not so similar to IP that [RFC6040] rules | |||
Specifically: | can be applied directly. Specifically: | |||
- Sections 4.2, 4.3 and 4.4 respectively address how to add ECN | - Sections 4.2, 4.3, and 4.4 address how to add ECN support to | |||
support to the wire protocol and to the encapsulators and | the wire protocol and to the encapsulators and decapsulators at | |||
decapsulators at the ingress and egress of the subnet. | the ingress and egress of the subnet, respectively. | |||
- Section 4.5 deals with the special, but common, case of | - Section 4.5 deals with the special but common case of sequences | |||
sequences of tunnels or subnets that all use the same | of tunnels or subnets that all use the same technology. | |||
technology | ||||
- Section 4.6 deals with the question of reframing when IP | - Section 4.6 deals with the question of reframing when IP | |||
packets do not map 1:1 into lower layer frames. | packets do not map 1:1 into lower-layer frames. | |||
4.1. IP-in-IP Tunnels with Shim Headers | 4.1. IP-in-IP Tunnels with Shim Headers | |||
A common pattern for many tunnelling protocols is to encapsulate an | A common pattern for many tunnelling protocols is to encapsulate an | |||
inner IP header with shim header(s) then an outer IP header. A shim | inner IP header with a shim header(s) then an outer IP header. A | |||
header is defined as one that is not sufficient alone to forward the | shim header is defined as one that is not sufficient alone to forward | |||
packet as an outer header. Another common pattern is for a shim to | the packet as an outer header. Another common pattern is for a shim | |||
encapsulate a layer 2 (L2) header, which in turn encapsulates (or | to encapsulate an L2 header, which in turn encapsulates (or might | |||
might encapsulate) an IP header. [I-D.ietf-tsvwg-rfc6040update-shim] | encapsulate) an IP header. [RFC9601] clarifies that [RFC6040] is | |||
clarifies that RFC 6040 is just as applicable when there are shim(s) | just as applicable when there are shims and even an L2 header between | |||
and possibly a L2 header between two IP headers. | two IP headers. | |||
However, it is not always feasible or necessary to propagate ECN | However, it is not always feasible or necessary to propagate ECN | |||
between IP headers when separated by a shim. For instance, it might | between IP headers when separated by a shim. For instance, it might | |||
be too costly to dig to arbitrary depths to find an inner IP header, | be too costly to dig to arbitrary depths to find an inner IP header, | |||
there may be little or no congestion within the tunnel by design (see | there may be little or no congestion within the tunnel by design (see | |||
null mode in Section 3.4 above), or a legacy implementation might not | null mode in Section 3.4 above), or a legacy implementation might not | |||
support ECN. In cases where a tunnel does not support ECN, it is | support ECN. In cases where a tunnel does not support ECN, it is | |||
important that the ingress does not copy the ECN field from an inner | important that the ingress does not copy the ECN field from an inner | |||
IP header to an outer. Therefore Section 4 of | IP header to an outer. Therefore Section 4 of [RFC9601] requires | |||
[I-D.ietf-tsvwg-rfc6040update-shim] requires network operators to | network operators to configure the ingress of a tunnel that does not | |||
configure the ingress of a tunnel that does not support ECN so that | support ECN so that it zeros the ECN field in the outer IP header. | |||
it zeros the ECN field in the outer IP header. | ||||
Nonetheless, in many cases it is feasible to propagate the ECN field | Nonetheless, in many cases it is feasible to propagate the ECN field | |||
between IP headers separated by shim header(s) and/or a L2 header, | between IP headers separated by shim headers and/or an L2 header, | |||
particularly in the typical case when the outer IP header and the | particularly in the typical case when the outer IP header and the | |||
shim(s) are added (or removed) as part of the same procedure. Even | shim(s) are added (or removed) as part of the same procedure. Even | |||
if the shim(s) encapsulate a L2 header, it is often possible to find | if a shim encapsulates an L2 header, it is often possible to find an | |||
an inner IP header within the L2 PDU and propagate ECN between that | inner IP header within the L2 PDU and propagate ECN between that and | |||
and the outer IP header. This can be thought of as a special case of | the outer IP header. This can be thought of as a special case of the | |||
the feed-up-and-forward mode (Section 3.2), so the guidelines for | feed-up-and-forward mode (Section 3.2), so the guidelines for this | |||
this mode apply (Section 5). | mode apply (Section 5). | |||
Numerous shim protocols have been defined for IP tunnelling. More | Numerous shim protocols have been defined for IP tunnelling. More | |||
recent ones e.g. Geneve [RFC8926] and Generic UDP Encapsulation | recent ones, e.g., Geneve [RFC8926] and Generic UDP Encapsulation | |||
(GUE) [I-D.ietf-intarea-gue] cite and follow RFC 6040. And some | (GUE) [INTAREA-GUE] cite and follow [RFC6040]. Some earlier ones, | |||
earlier ones, e.g. CAPWAP [RFC5415] and LISP [RFC9300], cite RFC | e.g., CAPWAP [RFC5415] and LISP [RFC9300], cite [RFC3168], which is | |||
3168, which is compatible with RFC 6040. | compatible with [RFC6040]. | |||
However, as Section 9.3 of [RFC3168] pointed out, ECN support needs | However, as Section 9.3 of [RFC3168] pointed out, ECN support needs | |||
to be defined for many earlier shim-based tunnelling protocols, e.g. | to be defined for many earlier shim-based tunnelling protocols, e.g., | |||
L2TPv2 [RFC2661], L2TPv3 [RFC3931], GRE [RFC2784], PPTP [RFC2637], | L2TPv2 [RFC2661], L2TPv3 [RFC3931], GRE [RFC2784], PPTP [RFC2637], | |||
GTP [GTPv1], [GTPv1-U], [GTPv2-C] and Teredo [RFC4380] as well as | GTP [GTPv1] [GTPv1-U] [GTPv2-C], and Teredo [RFC4380], as well as | |||
some recent ones, e.g. VXLAN [RFC7348], NVGRE [RFC7637] and NSH | some recent ones, e.g., VXLAN [RFC7348], NVGRE [RFC7637], and NSH | |||
[RFC8300]. | [RFC8300]. | |||
All these IP-based encapsulations can be updated in one shot by | All these IP-based encapsulations can be updated in one shot by | |||
simple reference to RFC 6040. However, it would not be appropriate | simple reference to [RFC6040]. However, it would not be appropriate | |||
to update all these protocols from within the present guidance | to update all these protocols from within the present guidance | |||
document. Instead a companion specification | document. Instead, a companion specification [RFC9601] has the | |||
[I-D.ietf-tsvwg-rfc6040update-shim] has been prepared that has the | appropriate Standards Track status to update Standards Track | |||
appropriate standards track status to update standards track | ||||
protocols. For those that are not under IETF change control, | protocols. For those that are not under IETF change control, | |||
[I-D.ietf-tsvwg-rfc6040update-shim] can only recommend that the | [RFC9601] can only recommend that the relevant body updates them. | |||
relevant body updates them. | ||||
4.2. Wire Protocol Design: Indication of ECN Support | 4.2. Wire Protocol Design: Indication of ECN Support | |||
This section is intended to guide the redesign of any lower layer | This section is intended to guide the redesign of any lower-layer | |||
protocol that encapsulate IP to add native ECN support at the lower | protocol that encapsulates IP to add built-in congestion notification | |||
layer. It reflects the approaches used in [RFC6040] and in | support at the lower layer using feed-forward-and-up mode. It | |||
[RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS | reflects the approaches used in [RFC6040] and in [RFC5129]. | |||
Therefore, IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS | ||||
encapsulations that already comply with [RFC6040] or [RFC5129] will | encapsulations that already comply with [RFC6040] or [RFC5129] will | |||
already satisfy this guidance. | already satisfy this guidance. | |||
A lower layer (or subnet) congestion notification system: | A lower-layer (or subnet) congestion notification system: | |||
1. SHOULD NOT apply explicit congestion notifications to PDUs that | 1. SHOULD NOT apply explicit congestion notifications to PDUs that | |||
are destined for legacy layer-4 transport implementations that | are destined for legacy layer-4 transport implementations that | |||
will not understand ECN, and | will not understand ECN; and | |||
2. SHOULD NOT apply explicit congestion notifications to PDUs if the | 2. SHOULD NOT apply explicit congestion notifications to PDUs if the | |||
egress of the subnet might not propagate congestion notifications | egress of the subnet might not propagate congestion notification | |||
onward into the higher layer. | onward into the higher layer. | |||
We use the term ECN-PDUs for a PDU on a feedback loop that will | We use the term ECN-PDU for a PDU on a feedback loop that will | |||
propagate congestion notification properly because it meets both | propagate congestion notification properly because it meets both | |||
the above criteria. And a Not-ECN-PDU is a PDU on a feedback | the above criteria. Additionally, a Not-ECN-PDU is a PDU on a | |||
loop that does not meet at least one of the criteria, and will | feedback loop that does not meet at least one of the criteria, | |||
therefore not propagate congestion notification properly. A | and therefore will not propagate congestion notification | |||
corollary of the above is that a lower layer congestion | properly. A corollary of the above is that a lower-layer | |||
notification protocol: | congestion notification protocol: | |||
3. SHOULD be able to distinguish ECN-PDUs from Not-ECN-PDUs. | 3. SHOULD be able to distinguish ECN-PDUs from Not-ECN-PDUs. | |||
Note that there is no need for all interior nodes within a subnet to | Note that there is no need for all interior nodes within a subnet to | |||
be able to mark congestion explicitly. A mix of ECN and drop signals | be able to mark congestion explicitly. A mix of drop and explicit | |||
from different nodes is fine. However, if _any_ interior nodes might | congestion signals from different nodes is fine. However, if _any_ | |||
generate ECN markings, guideline 2 above says that all relevant | interior nodes might generate congestion markings, Guideline 2 above | |||
egress node(s) SHOULD be able to propagate those markings up to the | says that all relevant egress nodes SHOULD be able to propagate those | |||
higher layer. | markings up to the higher layer. | |||
In IP, if the ECN field in each PDU is cleared to the Not-ECT (not | In IP, if the ECN field in each PDU is cleared to the Not ECN-Capable | |||
ECN-capable transport) codepoint, it indicates that the L4 transport | Transport (Not-ECT) codepoint, it indicates that the L4 transport | |||
will not understand congestion markings. A congested buffer must not | will not understand congestion markings. A congested buffer must not | |||
mark these Not-ECT PDUs, and therefore has to signal congestion by | mark these Not-ECT PDUs; therefore, it has to signal congestion by | |||
increasingly applying drop instead. | increasingly applying drop instead. | |||
The mechanism a lower layer uses to distinguish the ECN-capability of | The mechanism a lower layer uses to distinguish the ECN capability of | |||
PDUs need not mimic that of IP. The above guidelines merely say that | PDUs need not mimic that of IP. The above guidelines merely say that | |||
the lower layer system, as a whole, should achieve the same outcome. | the lower-layer system as a whole should achieve the same outcome. | |||
For instance, ECN-capable feedback loops might use PDUs that are | For instance, ECN-capable feedback loops might use PDUs that are | |||
identified by a particular set of labels or tags. Alternatively, | identified by a particular set of labels or tags. Alternatively, | |||
logical link protocols that use flow state might determine whether a | logical-link protocols that use flow state might determine whether a | |||
PDU can be congestion marked by checking for ECN-support in the flow | PDU can be congestion marked by checking for ECN support in the flow | |||
state. Other protocols might depend on out-of-band control signals. | state. Other protocols might depend on out-of-band control signals. | |||
The per-domain checking of ECN support in MPLS [RFC5129] is a good | The per-domain checking of ECN support in MPLS [RFC5129] is a good | |||
example of a way to avoid sending congestion markings to L4 | example of a way to avoid sending congestion markings to L4 | |||
transports that will not understand them, without using any header | transports that will not understand them without using any header | |||
space in the subnet protocol. | space in the subnet protocol. | |||
In MPLS, header space is extremely limited, therefore RFC5129 does | In MPLS, header space is extremely limited; therefore, [RFC5129] does | |||
not provide a field in the MPLS header to indicate whether the PDU is | not provide a field in the MPLS header to indicate whether the PDU is | |||
an ECN-PDU or a Not-ECN-PDU. Instead, interior nodes in a domain are | an ECN-PDU or a Not-ECN-PDU. Instead, interior nodes in a domain are | |||
allowed to set explicit congestion indications without checking | allowed to set explicit congestion indications without checking | |||
whether the PDU is destined for an L4 transport that will understand | whether the PDU is destined for an L4 transport that will understand | |||
them. Nonetheless, this is made safe by requiring that the network | them. Nonetheless, this is made safe by requiring that the network | |||
operator upgrades all decapsulating edges of a whole domain at once, | operator upgrades all decapsulating edges of a whole domain at once | |||
as soon as even one switch within the domain is configured to mark | as soon as even one switch within the domain is configured to mark | |||
rather than drop some PDUs during congestion. Therefore, any edge | rather than drop some PDUs during congestion. Therefore, any edge | |||
node that might decapsulate a packet will be capable of checking | node that might decapsulate a packet will be capable of checking | |||
whether the higher layer transport is ECN-capable. When | whether the higher-layer transport is ECN-capable. When | |||
decapsulating a CE-marked packet, if the decapsulator discovers that | decapsulating a CE-marked packet, if the decapsulator discovers that | |||
the higher layer (inner header) indicates the transport is not ECN- | the higher layer (inner header) indicates the transport is not ECN- | |||
capable, it drops the packet — effectively on behalf of the earlier | capable, it drops the packet -- effectively on behalf of the earlier | |||
congested node (see Decapsulation Guideline 1 in Section 4.4). | congested node (see Decapsulation Guideline 1 in Section 4.4). | |||
It was only appropriate to define such an incremental deployment | It was only appropriate to define such an incremental deployment | |||
strategy because MPLS is targeted solely at professional operators, | strategy because MPLS is targeted solely at professional operators | |||
who can be expected to ensure that a whole subnetwork is consistently | who can be expected to ensure that a whole subnetwork is consistently | |||
configured. This strategy might not be appropriate for other link | configured. This strategy might not be appropriate for other link | |||
technologies targeted at zero-configuration deployment or deployment | technologies targeted at zero-configuration deployment or deployment | |||
by the general public (e.g. Ethernet). For such 'plug-and-play' | by the general public (e.g., Ethernet). For such 'plug-and-play' | |||
environments it will be necessary to invent a failsafe approach that | environments, it will be necessary to invent a fail-safe approach | |||
ensures congestion markings will never fall into black holes, no | that ensures congestion markings will never fall into black holes, no | |||
matter how inconsistently a system is put together. Alternatively, | matter how inconsistently a system is put together. Alternatively, | |||
congestion notification relying on correct system configuration could | congestion notification relying on correct system configuration could | |||
be confined to flavours of Ethernet intended only for professional | be confined to flavours of Ethernet intended only for professional | |||
network operators, such as Provider Backbone Bridges (PBB | network operators, such as Provider Backbone Bridges (PBB) | |||
[IEEE802.1Q]; previously 802.1ah). | ([IEEE802.1Q]; previously 802.1ah). | |||
ECN support in TRILL [I-D.ietf-trill-ecn-support] provides a good | ECN support in TRansparent Interconnection of Lots of Links (TRILL) | |||
example of how to add ECN to a lower layer protocol without relying | [RFC9600] provides a good example of how to add congestion | |||
on careful and consistent operator configuration. TRILL provides an | notification to a lower-layer protocol without relying on careful and | |||
extension header word with space for flags of different categories | consistent operator configuration. TRILL provides an extension | |||
depending on whether logic to understand the extension is critical. | header word with space for flags of different categories depending on | |||
The congestion experienced marking has been defined as a 'critical | whether logic to understand the extension is critical. The | |||
ingress-to-egress' flag. So if a transit RBridge sets this flag on a | congestion-experienced marking has been defined as a 'critical | |||
frame and an egress RBridge does not have any logic to process it, it | ingress-to-egress' flag. So, if a transit RBridge sets this flag on | |||
will drop it; which is the desired default action anyway. Therefore | a frame and an egress RBridge does not have any logic to process it, | |||
TRILL RBridges can be updated with support for ECN in no particular | the egress RBridge will drop the frame, which is the desired default | |||
order and, at the egress of the TRILL campus, congestion notification | action anyway. Therefore, TRILL RBridges can be updated with support | |||
will be propagated to IP as ECN whenever ECN logic has been | for congestion notification in no particular order and, at the egress | |||
implemented at the egress, or as drop otherwise. | of the TRILL campus, congestion notification will be propagated to IP | |||
as ECN whenever ECN logic has been implemented at the egress, or as | ||||
drop otherwise. | ||||
QCN [IEEE802.1Q] is not intended to extend beyond a single subnet, or | QCN [IEEE802.1Q] is not intended to extend beyond a single subnet or | |||
to interoperate with ECN. Nonetheless, the way QCN indicates to | interoperate with IP-ECN. Nonetheless, the way QCN indicates to | |||
lower layer devices that the end-points will not understand QCN | lower-layer devices that the endpoints will not understand QCN | |||
provides another example that a lower layer protocol designer might | provides another example that a lower-layer protocol designer might | |||
be able to mimic for their scenario. An operator can define certain | be able to mimic for their scenario. An operator can define certain | |||
Priority Code Points (PCPs [IEEE802.1Q]; previously 802.1p) to | Priority Code Points (PCPs [IEEE802.1Q]; previously 802.1p) to | |||
indicate non-QCN frames and an ingress bridge is required to map | indicate non-QCN frames. Then an ingress bridge has to map each | |||
arriving not-QCN-capable IP packets to one of these non-QCN PCPs. | arriving not-QCN-capable IP packet to one of these non-QCN PCPs. | |||
When drop for non-ECN traffic is deferred to the egress of a subnet, | When drop for non-ECN traffic is deferred to the egress of a subnet, | |||
it cannot necessarily be assumed that one ECN mark is equivalent to | it cannot necessarily be assumed that one congestion mark is | |||
one drop, as was originally required by [RFC3168]. [RFC8311] updated | equivalent to one drop, as was originally required by [RFC3168]. | |||
RFC 3168, to allow experimentation with congestion markings that are | [RFC8311] updated [RFC3168] to allow experimentation with congestion | |||
not equivalent to drop, in particular for L4S [RFC9331]. ECN support | markings that are not equivalent to drop, particularly for L4S | |||
in TRILL [I-D.ietf-trill-ecn-support] is a good example of a way to | [RFC9331]. ECN support in TRILL [RFC9600] is a good example of a way | |||
defer drop to the egress of a subnet both when marks are equivalent | to defer drop to the egress of a subnet both when marks are | |||
to drops (as in RFC 3168) and when they are not (as in L4S). The ECN | equivalent to drops (as in [RFC3168]) and when they are not (as in | |||
scheme for MPLS [RFC5129] was defined before L4S, so it only | L4S). The ECN scheme for MPLS [RFC5129] was defined before L4S, so | |||
currently supports deferred drop that is equivalent to ECN-marking. | it only currently supports deferred drop that is equivalent to ECN | |||
Nonetheless, in principle, MPLS (and potentially future L2 protocols) | marking. Nonetheless, in principle, MPLS (and potentially future L2 | |||
could support L4S marking and copy TRILL's approach for determining | protocols) could support L4S marking by copying TRILL's approach for | |||
the drop level of any non-ECN traffic at the subnet egress. | determining the drop level of any non-ECN traffic at the subnet | |||
egress. | ||||
4.3. Encapsulation Guidelines | 4.3. Encapsulation Guidelines | |||
This section is intended to guide the redesign of any node that | This section is intended to guide the redesign of any node that | |||
encapsulates IP with a lower layer header when adding native ECN | encapsulates IP with a lower-layer header when adding built-in | |||
support to the lower layer protocol. It reflects the approaches used | congestion notification support to the lower-layer protocol using | |||
in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or IP-in- | feed-forward-and-up mode. It reflects the approaches used in | |||
MPLS or MPLS-in-MPLS encapsulations that already comply with | [RFC6040] and [RFC5129]. Therefore, IP-in-IP tunnels or IP-in-MPLS | |||
[RFC6040] or [RFC5129] will already satisfy this guidance. | or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or | |||
[RFC5129] will already satisfy this guidance. | ||||
1. Egress Capability Check: A subnet ingress needs to be sure that | 1. Egress Capability Check: A subnet ingress needs to be sure that | |||
the corresponding egress of a subnet will propagate any | the corresponding egress of a subnet will propagate any | |||
congestion notification added to the outer header across the | congestion notification added to the outer header across the | |||
subnet. This is necessary in addition to checking that an | subnet. This is necessary in addition to checking that an | |||
incoming PDU indicates an ECN-capable (L4) transport. Examples | incoming PDU indicates an ECN-capable (L4) transport. Examples | |||
of how this guarantee might be provided include: | of how this guarantee might be provided include: | |||
* by configuration (e.g. if any label switches in a domain | * by configuration (e.g., if any label switch in a domain | |||
support ECN marking, [RFC5129] requires all egress nodes to | supports congestion marking, [RFC5129] requires all egress | |||
have been configured to propagate ECN) | nodes to have been configured to propagate ECN). | |||
* by the ingress explicitly checking that the egress propagates | * by the ingress explicitly checking that the egress propagates | |||
ECN (e.g. an early attempt to add ECN support to TRILL used | ECN (e.g., an early attempt to add ECN support to TRILL used | |||
IS-IS to check path capabilities before adding ECN extension | IS-IS to check path capabilities before adding ECN extension | |||
flags to each frame [RFC7780]). | flags to each frame [RFC7780]). | |||
* by inherent design of the protocol (e.g. by encoding ECN | * by inherent design of the protocol (e.g., by encoding | |||
marking on the outer header in such a way that a legacy egress | congestion marking on the outer header in such a way that a | |||
that does not understand ECN will consider the PDU corrupt or | legacy egress that does not understand ECN will consider the | |||
invalid and discard it, thus at least propagating a form of | PDU corrupt or invalid and discard it; thus, at least | |||
congestion signal). | propagating a form of congestion signal). | |||
2. Egress Fails Capability Check: If the ingress cannot guarantee | 2. Egress Fails Capability Check: If the ingress cannot guarantee | |||
that the egress will propagate congestion notification, the | that the egress will propagate congestion notification, the | |||
ingress SHOULD disable ECN at the lower layer when it forwards | ingress SHOULD disable congestion notification at the lower layer | |||
the PDU. An example of how the ingress might disable ECN at the | when it forwards the PDU. An example of how the ingress might | |||
lower layer would be by setting the outer header of the PDU to | disable congestion notification at the lower layer would be by | |||
identify it as a Not-ECN-PDU, assuming the subnet technology | setting the outer header of the PDU to identify it as a Not-ECN- | |||
supports such a concept. | PDU, assuming the subnet technology supports such a concept. | |||
3. Standard Congestion Monitoring Baseline: Once the ingress to a | 3. Standard Congestion Monitoring Baseline: Once the ingress to a | |||
subnet has established that the egress will correctly propagate | subnet has established that the egress will correctly propagate | |||
ECN, on encapsulation it SHOULD encode the same level of | ECN, on encapsulation, it SHOULD encode the same level of | |||
congestion in outer headers as is arriving in incoming headers. | congestion in outer headers as is arriving in incoming headers. | |||
For example, it might copy any incoming congestion notification | For example, it might copy any incoming congestion notifications | |||
into the outer header of the lower layer protocol. | into the outer header of the lower-layer protocol. | |||
This ensures that bulk congestion monitoring of outer headers | This ensures that bulk congestion monitoring of outer headers | |||
(e.g. by a network management node monitoring ECN in passing | (e.g., by a network management node monitoring congestion | |||
frames) will measure congestion accumulated along the whole | markings in passing frames) will measure congestion accumulated | |||
upstream path — starting from the Load Regulator not just | along the whole upstream path, starting from the Load Regulator | |||
starting from the ingress of the subnet. A node that is not the | and not just starting from the ingress of the subnet. A node | |||
Load Regulator SHOULD NOT re-initialize the level of CE markings | that is not the Load Regulator SHOULD NOT re-initialize the level | |||
in the outer to zero. | of CE markings in the outer header to zero. | |||
It would still also be possible to measure congestion introduced | It would still also be possible to measure congestion introduced | |||
across one subnet (or tunnel) by subtracting the level of CE | across one subnet (or tunnel) by subtracting the level of CE | |||
markings on inner headers from that on outer headers (see | markings on inner headers from that on outer headers (see | |||
Appendix C of [RFC6040]). For example: | Appendix C of [RFC6040]). For example: | |||
* If this guideline has been followed and if the level of CE | * If this guideline has been followed and if the level of CE | |||
markings is 0.4% on the outer and 0.1% on the inner, 0.4% | markings is 0.4% on the outer header and 0.1% on the inner | |||
congestion has been introduced across all the networks since | header, 0.4% congestion has been introduced across all the | |||
the load regulator, and 0.3% (= 0.4% - 0.1%) has been | networks since the Load Regulator, and 0.3% (= 0.4% - 0.1%) | |||
introduced since the ingress to the current subnet (or | has been introduced since the ingress to the current subnet | |||
tunnel); | (or tunnel). | |||
* Without this guideline, if the subnet ingress had re- | * Without this guideline, if the subnet ingress had re- | |||
initialized the outer congestion level to zero, the outer and | initialized the outer congestion level to zero, the outer and | |||
inner would measure 0.1% and 0.3%. It would still be possible | inner headers would measure 0.1% and 0.3%. It would still be | |||
to infer that the congestion introduced since the Load | possible to infer that the congestion introduced since the | |||
Regulator was 0.4% (= 0.1% + 0.3%). But only if the | Load Regulator was 0.4% (= 0.1% + 0.3%), but only if the | |||
monitoring system somehow knows whether the subnet ingress re- | monitoring system somehow knows whether the subnet ingress re- | |||
initialized the congestion level. | initialized the congestion level. | |||
As long as subnet and tunnel technologies use the standard | As long as subnet and tunnel technologies use the standard | |||
congestion monitoring baseline in this guideline, monitoring | congestion monitoring baseline in this guideline, monitoring | |||
systems will know to use the former approach, rather than having | systems will know to use the former approach rather than having | |||
to "somehow know" which approach to use. | to 'somehow know' which approach to use. | |||
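The arithmetic in the first bullet of the example above can be sketched as follows. This is an illustration only and not part of either version of the text: the function name `congestion_breakdown` and its parameters are hypothetical, and it uses the same simple subtraction as the example, which is a reasonable approximation for small marking fractions.

```python
def congestion_breakdown(outer_pct, inner_pct):
    """Interpret CE marking levels (in percent) seen at a monitoring point.

    Assumes the subnet ingress copied the inner congestion level to the
    outer header rather than re-initializing it, as the guideline advises.
    """
    # The outer header accumulates everything since the Load Regulator.
    since_load_regulator = outer_pct
    # The inner header froze at the level seen at the subnet (or tunnel) ingress.
    since_subnet_ingress = outer_pct - inner_pct
    return since_load_regulator, since_subnet_ingress


total, within_subnet = congestion_breakdown(outer_pct=0.4, inner_pct=0.1)
print(f"since Load Regulator: {total:.1f}%, since ingress: {within_subnet:.1f}%")
```

With the figures from the example (0.4% outer, 0.1% inner), this reports 0.4% since the Load Regulator and 0.3% since the ingress.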
4.4. Decapsulation Guidelines | 4.4. Decapsulation Guidelines | |||
This section is intended to guide the redesign of any node that | This section is intended to guide the redesign of any node that | |||
decapsulates IP from within a lower layer header when adding native | decapsulates IP from within a lower-layer header when adding built-in | |||
ECN support to the lower layer protocol. It reflects the approaches | congestion notification support to the lower-layer protocol using | |||
used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or | feed-forward-and-up mode. It reflects the approaches used in | |||
IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with | [RFC6040] and in [RFC5129]. Therefore, IP-in-IP tunnels or IP-in- | |||
| MPLS or MPLS-in-MPLS encapsulations that already comply with | |||
[RFC6040] or [RFC5129] will already satisfy this guidance. | [RFC6040] or [RFC5129] will already satisfy this guidance. | |||
A subnet egress SHOULD NOT simply copy congestion notification from | A subnet egress SHOULD NOT simply copy congestion notifications from | |||
outer headers to the forwarded header. It SHOULD calculate the | outer headers to the forwarded header. It SHOULD calculate the | |||
outgoing congestion notification field from the inner and outer | outgoing congestion notification field from the inner and outer | |||
headers using the following guidelines. If there is any conflict, | headers using the following guidelines. If there is any conflict, | |||
rules earlier in the list take precedence over rules later in the | rules earlier in the list take precedence over rules later in the | |||
list: | list. | |||
1. If the arriving inner header is a Not-ECN-PDU it implies the L4 | 1. If the arriving inner header is a Not-ECN-PDU, it implies the L4 | |||
transport will not understand explicit congestion markings. | transport will not understand explicit congestion markings. | |||
Then: | Then: | |||
* If the outer header carries an explicit congestion marking, it | * If the outer header carries an explicit congestion marking, it | |||
is likely that a protocol error has occurred, so drop is the | is likely that a protocol error has occurred, so drop is the | |||
only indication of congestion that the L4 transport will | only indication of congestion that the L4 transport will | |||
understand. If the congestion marking is the most severe | understand. If the outer congestion marking is the most | |||
possible, the packet MUST be dropped. However, if congestion | severe possible, the packet MUST be dropped. However, if | |||
can be marked with multiple levels of severity and the | congestion can be marked with multiple levels of severity and | |||
packet's marking is not the most severe, this requirement can | the packet's outer marking is not the most severe, this | |||
be relaxed to: the packet SHOULD be dropped. | requirement can be relaxed to: the packet SHOULD be dropped. | |||
* If the outer is an ECN-PDU that carries no indication of | * If the outer is an ECN-PDU that carries no indication of | |||
congestion or a Not-ECN-PDU the PDU SHOULD be forwarded, but | congestion or a Not-ECN-PDU the PDU SHOULD be forwarded, but | |||
still as a Not-ECN-PDU. | still as a Not-ECN-PDU. | |||
2. If the outer header does not support explicit congestion | 2. If the outer header does not support congestion notification (a | |||
notification (a Not-ECN-PDU), but the inner header does (an ECN- | Not-ECN-PDU), but the inner header does (an ECN-PDU), the inner | |||
PDU), the inner header SHOULD be forwarded unchanged. | header SHOULD be forwarded unchanged. | |||
3. In some lower layer protocols congestion may be signalled as a | 3. In some lower-layer protocols, congestion may be signalled as a | |||
numerical level, such as in the control frames of quantized | numerical level, such as in the control frames of QCN | |||
congestion notification (QCN [IEEE802.1Q]). If such a multi-bit | [IEEE802.1Q]. If such a multi-bit encoding encapsulates an ECN- | |||
encoding encapsulates an ECN-capable IP data packet, a function | capable IP data packet, a function will be needed to convert the | |||
will be needed to convert the quantized congestion level into the | quantized congestion level into the frequency of congestion | |||
frequency of congestion markings in outgoing IP packets. | markings in outgoing IP packets. | |||
4. Congestion indications might be encoded by a severity level. For | 4. Congestion indications might be encoded by a severity level. For | |||
instance increasing levels of congestion might be encoded by | instance, increasing levels of congestion might be encoded by | |||
numerically increasing indications, e.g. pre-congestion | numerically increasing indications, e.g., PCN can be encoded in | |||
notification (PCN) can be encoded in each PDU at three severity | each PDU at three severity levels in IP or MPLS [RFC6660] and the | |||
levels in IP or MPLS [RFC6660] and the default encapsulation and | default encapsulation and decapsulation rules [RFC6040] are | |||
decapsulation rules [RFC6040] are compatible with this | compatible with this interpretation of the ECN field. | |||
interpretation of the ECN field. | ||||
If the arriving inner header is an ECN-PDU, where the inner and | If the arriving inner header is an ECN-PDU, where the inner and | |||
outer headers carry indications of congestion of different | outer headers carry indications of congestion of different | |||
severity, the more severe indication SHOULD be forwarded in | severity, the more severe indication SHOULD be forwarded in | |||
preference to the less severe. | preference to the less severe. | |||
5. The inner and outer headers might carry a combination of | 5. The inner and outer headers might carry a combination of | |||
congestion notification fields that should not be possible given | congestion notification fields that should not be possible given | |||
any currently used protocol transitions. For instance, if | any currently used protocol transitions. For instance, if | |||
Encapsulation Guideline 3 in Section 4.3 had been followed, it | Encapsulation Guideline 3 in Section 4.3 had been followed, it | |||
should not be possible to have a less severe indication of | should not be possible to have a less severe indication of | |||
congestion in the outer than in the inner. It MAY be appropriate | congestion in the outer header than in the inner header. It MAY | |||
to log unexpected combinations of headers and possibly raise an | be appropriate to log unexpected combinations of headers and | |||
alarm. | possibly raise an alarm. | |||
If a safe outgoing codepoint can be defined for such a PDU, the | If a safe outgoing codepoint can be defined for such a PDU, the | |||
PDU SHOULD be forwarded rather than dropped. Some implementers | PDU SHOULD be forwarded rather than dropped. Some implementers | |||
discard PDUs with currently unused combinations of headers just | discard PDUs with currently unused combinations of headers just | |||
in case they represent an attack. However, an approach using | in case they represent an attack. However, an approach using | |||
alarms and policy-mediated drop is preferable to hard-coded drop, | alarms and policy-mediated drop is preferable to hard-coded drop | |||
so that operators can keep track of possible attacks but | so that operators can keep track of possible attacks, but | |||
currently unused combinations are not precluded from future use | currently unused combinations are not precluded from future use | |||
through new standards actions. | through new standards actions. | |||
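As a rough illustration of how the precedence among Decapsulation Guidelines 1, 2, 4, and 5 might be coded, the sketch below models each header's congestion field abstractly. The model is an assumption made for this example, not something either version of the text defines: `None` stands for a Not-ECN-PDU and a non-negative integer stands for a severity level, with 0 meaning ECN-capable but unmarked. The function name `decapsulate` is hypothetical, and Guideline 3 (converting a numerical congestion level into a marking frequency) is not covered.

```python
import logging

NOT_ECN = None  # models a Not-ECN-PDU; integers model severity, 0 = unmarked


def decapsulate(inner, outer):
    """Return ("forward", outgoing_field) or ("drop", None).

    `inner` and `outer` are NOT_ECN or an integer severity level.
    Checks are made in list order, so earlier guidelines take precedence.
    """
    # Guideline 1: the L4 transport will only understand drop.
    if inner is NOT_ECN:
        if outer is not NOT_ECN and outer > 0:
            # MUST drop at the most severe level, SHOULD drop otherwise;
            # this sketch drops in both cases.
            return ("drop", None)
        return ("forward", NOT_ECN)
    # Guideline 2: the outer header carries no congestion notification.
    if outer is NOT_ECN:
        return ("forward", inner)
    # Guideline 5: outer less severe than inner is unexpected. Log it,
    # but still forward because a safe outgoing value exists.
    if outer < inner:
        logging.warning("unexpected combination: inner=%s outer=%s", inner, outer)
    # Guideline 4: forward the more severe indication.
    return ("forward", max(inner, outer))
```

For instance, under this model `decapsulate(None, 1)` yields a drop, while `decapsulate(0, 2)` forwards severity 2. A real implementation would map these abstract levels onto the codepoints of the protocols involved and would make the drop in Guideline 5 cases a matter of operator policy.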
4.5. Sequences of Similar Tunnels or Subnets | 4.5. Sequences of Similar Tunnels or Subnets | |||
In some deployments, particularly in 3GPP networks, an IP packet may | In some deployments, particularly in 3GPP networks, an IP packet may | |||
traverse two or more IP-in-IP tunnels in sequence that all use | traverse two or more IP-in-IP tunnels in sequence that all use | |||
identical technology (e.g. GTP). | identical technology (e.g., GTP). | |||
In such cases, it would be sufficient for every encapsulation and | In such cases, it would be sufficient for every encapsulation and | |||
decapsulation in the chain to comply with RFC 6040. Alternatively, | decapsulation in the chain to comply with [RFC6040]. Alternatively, | |||
as an optimisation, a node that decapsulates a packet and immediately | as an optimization, a node that decapsulates a packet and immediately | |||
re-encapsulates it for the next tunnel MAY copy the incoming outer | re-encapsulates it for the next tunnel MAY copy the incoming outer | |||
ECN field directly to the outgoing outer and the incoming inner ECN | ECN field directly to the outgoing outer header and the incoming | |||
field directly to the outgoing inner. Then the overall behavior | inner ECN field directly to the outgoing inner header. Then, the | |||
across the sequence of tunnel segments would still be consistent with | overall behaviour across the sequence of tunnel segments would still | |||
RFC 6040. | be consistent with [RFC6040]. | |||
Appendix C of RFC6040 describes how a tunnel egress can monitor how | Appendix C of [RFC6040] describes how a tunnel egress can monitor how | |||
much congestion has been introduced within a tunnel. A network | much congestion has been introduced within a tunnel. A network | |||
operator might want to monitor how much congestion had been | operator might want to monitor how much congestion had been | |||
introduced within a whole sequence of tunnels. Using the technique | introduced within a whole sequence of tunnels. Using the technique | |||
in Appendix C of RFC6040 at the final egress, the operator could | in Appendix C of [RFC6040] at the final egress, the operator could | |||
monitor the whole sequence of tunnels, but only if the above | monitor the whole sequence of tunnels, but only if the above | |||
optimisation were used consistently along the sequence of tunnels, in | optimization were used consistently along the sequence of tunnels, in | |||
order to make it appear as a single tunnel. Therefore, tunnel | order to make it appear as a single tunnel. Therefore, tunnel | |||
endpoint implementations SHOULD allow the operator to configure | endpoint implementations SHOULD allow the operator to configure | |||
whether this optimisation is enabled. | whether this optimization is enabled. | |||
When ECN support is added to a subnet technology, consideration | When congestion notification support is added to a subnet technology, | |||
SHOULD be given to a similar optimisation between subnets in sequence | consideration SHOULD be given to a similar optimization between | |||
if they all use the same technology. | subnets in sequence if they all use the same technology. | |||
4.6. Reframing and Congestion Markings | 4.6. Reframing and Congestion Markings | |||
The guidance in this section is worded in terms of framing | The guidance in this section is worded in terms of framing | |||
boundaries, but it applies equally whether the protocol data units | boundaries, but it applies equally whether the PDUs are frames, | |||
are frames, cells or packets. | cells, or packets. | |||
Where an AQM marks the ECN field of IP packets as they queue into a | Where an AQM marks the ECN field of IP packets as they queue into a | |||
layer-2 link, there will be no problem with framing boundaries, | Layer 2 link, there will be no problem with framing boundaries | |||
because the ECN markings would be applied directly to IP packets. | because the ECN markings would be applied directly to IP packets. | |||
The guidance in this section is only applicable where an ECN | The guidance in this section is only applicable where a congestion | |||
capability is being added to a layer-2 protocol so that layer-2 | notification capability is being added to a Layer 2 protocol so that | |||
frames can be ECN-marked by an AQM at layer-2. This would only be | Layer 2 frames can be marked by an AQM at layer 2. This would only | |||
necessary where AQM will be applied at pure layer-2 nodes (without | be necessary where AQM will be applied at pure Layer 2 nodes (without | |||
IP-awareness). | IP awareness). | |||
Where ECN marking has had to be applied at non-IP-aware nodes and | Where congestion marking has had to be applied at non-IP-aware nodes | |||
framing boundaries do not necessarily align with packet boundaries, | and framing boundaries do not necessarily align with packet | |||
the decapsulating IP forwarding node SHOULD propagate ECN markings | boundaries, the decapsulating IP forwarding node SHOULD propagate | |||
from layer-2 frame headers to IP packets that may have different | congestion markings from Layer 2 frame headers to IP packets that may | |||
boundaries as a consequence of reframing. | have different boundaries as a consequence of reframing. | |||
Two possible design goals for propagating congestion indications, | Two possible design goals for propagating congestion indications, | |||
described in Section 5.3 of [RFC3168] and Section 2.4 of [RFC7141], | described in Section 5.3 of [RFC3168] and Section 2.4 of [RFC7141], | |||
are: | are: | |||
1. approximate preservation of the presence (and therefore timing) | 1. approximate preservation of the presence (and therefore timing) | |||
of congestion marks on the L2 frames used to construct an IP | of congestion marks on the L2 frames used to construct an IP | |||
packet; | packet; | |||
a. at high frequency of congestion marking, approximate | 2. a. at high frequency of congestion marking, approximate | |||
preservation of the proportion of congestion marks arriving | preservation of the proportion of congestion marks arriving | |||
and departing; | and departing; | |||
b. at low frequency of congestion marking, approximate | b. at low frequency of congestion marking, approximate | |||
preservation of the timing of congestion marks arriving and | preservation of the timing of congestion marks arriving and | |||
departing. | departing. | |||
In either case, an implementation SHOULD ensure that any new incoming | In either case, an implementation SHOULD ensure that any new incoming | |||
congestion indication is propagated immediately, not held awaiting | congestion indication is propagated immediately; not held awaiting | |||
the possibility of further congestion indications to be sufficient to | the possibility of further congestion indications to be sufficient to | |||
indicate congestion on an outgoing PDU [RFC7141]. Nonetheless, to | indicate congestion on an outgoing PDU [RFC7141]. Nonetheless, to | |||
facilitate pipelined implementation, it would be acceptable for | facilitate pipelined implementation, it would be acceptable for | |||
congestion marks to propagate to a slightly later IP packet. | congestion marks to propagate to a slightly later IP packet. | |||
At decapsulation in either case: | At decapsulation in either case: | |||
* ECN marking propagation logically occurs before application of | * ECN-marking propagation logically occurs before application of | |||
Decapsulation Guideline 1 in Section 4.4. For instance, if ECN | Decapsulation Guideline 1 in Section 4.4. For instance, if ECN- | |||
marking propagation would cause an ECN congestion indication to be | marking propagation would cause an ECN congestion indication to be | |||
applied to an IP packet that is a Not-ECN-PDU, then that IP packet | applied to an IP packet that is a Not-ECN-PDU, then that IP packet | |||
is dropped in accordance with Guideline 1; | is dropped in accordance with Guideline 1. | |||
* where a mix of ECN-PDUs and non-ECN-PDUs arrives to construct the | * Where a mix of ECN-PDUs and non-ECN-PDUs arrives to construct the | |||
same IP packet, the decapsulation spec SHOULD require that packet | same IP packet, the decapsulation specification SHOULD require | |||
to be discarded. | that packet to be discarded. | |||
* where a mix of different types of ECN-PDUs arrives to construct | * Where a mix of different types of ECN-PDUs arrives to construct | |||
the same IP packet, e.g. a mix of frames that map to ECT(0) and | the same IP packet, e.g., a mix of frames that map to ECT(0) and | |||
ECT(1) IP packets, the decapsulation spec might consider this a | ECT(1) IP packets, the decapsulation specification might consider | |||
protocol error. But, if the lower layer protocol has defined such | this a protocol error. But, if the lower-layer protocol has | |||
a mix of types of ECN-PDU as valid, it SHOULD require the | defined such a mix of types of ECN-PDU as valid, it SHOULD require | |||
resulting IP packet to be set to either ECT(0) or ECT(1). In this | the resulting IP packet to be set to either ECT(0) or ECT(1). In | |||
case, it SHOULD take into account that the RFC series has so far | this case, it SHOULD take into account that the RFC Series has so | |||
allowed ECT(0) and ECT(1) to be considered equivalent [RFC3168], | far allowed ECT(0) and ECT(1) to be considered equivalent | |||
or ECT(1) can provide a less severe congestion marking than CE | [RFC3168]; or ECT(1) can provide a less severe congestion marking | |||
[RFC6040], or ECT(1) can indicate an unmarked but ECN-capable | than CE [RFC6040]; or ECT(1) can indicate an unmarked but ECN- | |||
packet that is subject to a different marking algorithm to ECT(0) | capable packet that is subject to a different marking algorithm to | |||
packets, for example L4S [RFC8311] [RFC9331]. | ECT(0) packets, e.g., L4S [RFC8311] [RFC9331]. | |||
The following are two ways that goal 1 might be achieved, but they | The following are two ways that goal 1 might be achieved, but they | |||
are not intended to be the only ways: | are not intended to be the only ways: | |||
* Every IP PDU that is constructed, in whole or in part, from an L2 | * Every IP PDU that is constructed, in whole or in part, from an L2 | |||
frame that is marked with a congestion signal, has that signal | frame that is marked with a congestion signal has that signal | |||
propagated to it; | propagated to it. | |||
* Every L2 frame that is marked with a congestion signal, propagates | * Every L2 frame that is marked with a congestion signal propagates | |||
that signal to one IP PDU which is constructed, in whole or in | that signal to one IP PDU that is constructed from it in whole or | |||
part, from it. If multiple IP PDUs meet this description, the | in part. If multiple IP PDUs meet this description, the choice | |||
choice can be made arbitrarily but ought to be consistent. | can be made arbitrarily but ought to be consistent. | |||
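The two approaches to goal 1 listed above can be sketched as follows. The data model is an assumption made for illustration: each IP packet is represented by the list of L2 frame identifiers it was constructed from, and `marked_frames` is the set of frame identifiers that carried a congestion signal. Both function names are hypothetical.

```python
def mark_every_packet(packets, marked_frames):
    """First approach: mark a packet if any frame it was built from is marked."""
    return [any(f in marked_frames for f in frames) for frames in packets]


def mark_one_packet_per_frame(packets, marked_frames):
    """Second approach: each marked frame marks exactly one packet built from it.

    The arbitrary but consistent choice made here is the first such packet.
    """
    marked = [False] * len(packets)
    for frame in sorted(marked_frames):
        for i, frames in enumerate(packets):
            if frame in frames:
                marked[i] = True
                break
    return marked


# Packet A spans frames 0 and 1; packet B spans frames 1 and 2; frame 1 is marked.
packets = [[0, 1], [1, 2]]
print(mark_every_packet(packets, {1}))          # [True, True]
print(mark_one_packet_per_frame(packets, {1}))  # [True, False]
```

The first approach can mark more packets than there were marked frames, whereas the second keeps the number of marked packets at or below the number of marked frames.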
The following gives one way that goal 2 might be achieved, but it is | The following gives one way that goal 2 might be achieved, but it is | |||
not intended to be the only way: | not intended to be the only way: | |||
* For each of the streams of frames that encapsulate the IP packets | * For each of the streams of frames that encapsulate the IP packets | |||
of each IP-ECN codepoint and follow the same path through the | of each IP-ECN codepoint and follow the same path through the | |||
subnet, a counter ('in') tracks octets arriving within the payload | subnet, a counter ('in') tracks octets arriving within the payload | |||
of marked L2 frames and another ('out') tracks octets departing in | of marked L2 frames and another ('out') tracks octets departing in | |||
marked IP packets. While 'in' exceeds 'out', forwarded IP packets | marked IP packets. While 'in' exceeds 'out', forwarded IP packets | |||
are ECN-marked. If 'out' exceeds 'in' for longer than a timeout, | are ECN-marked. If 'out' exceeds 'in' for longer than a timeout, | |||
both counters are zeroed, to ensure that the start of the next | both counters are zeroed to ensure that the start of the next | |||
congestion episode propagates immediately. The 'out' counter | congestion episode propagates immediately. The 'out' counter | |||
includes octets in reconstructed IP packets that would have been | includes octets in reconstructed IP packets that would have been | |||
marked, but had to be dropped because they were Not-ECN-PDUs (by | marked, but had to be dropped because they were Not-ECN-PDUs (by | |||
Decapsulation Guideline 1 in Section 4.4). | Decapsulation Guideline 1 in Section 4.4). | |||
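The counter scheme described above might be sketched as below for a single stream of frames. This is a minimal sketch under stated assumptions, not a definitive implementation: the class name `MarkPropagator`, its method names, and the use of caller-supplied timestamps in seconds are all hypothetical, and a real node would keep one such pair of counters per IP-ECN codepoint and path.

```python
class MarkPropagator:
    """Tracks one stream of frames (one IP-ECN codepoint, one path)."""

    def __init__(self, timeout):
        self.timeout = timeout       # how long 'out' may stay ahead of 'in'
        self.in_octets = 0           # 'in': payload octets of marked L2 frames
        self.out_octets = 0          # 'out': octets of marked (or dropped) packets
        self.out_ahead_since = None  # time at which 'out' first exceeded 'in'

    def _check(self, now):
        if self.out_octets > self.in_octets:
            if self.out_ahead_since is None:
                self.out_ahead_since = now
            elif now - self.out_ahead_since > self.timeout:
                # Zero both so the next congestion episode propagates at once.
                self.in_octets = self.out_octets = 0
                self.out_ahead_since = None
        else:
            self.out_ahead_since = None

    def frame_arrived(self, payload_octets, marked, now):
        self._check(now)
        if marked:
            self.in_octets += payload_octets
        self._check(now)

    def packet_departing(self, octets, ecn_capable, now):
        """Return "mark", "drop" or "forward" for a reconstructed IP packet."""
        self._check(now)
        if self.in_octets <= self.out_octets:
            return "forward"
        # Count the octets even when a Not-ECN-PDU has to be dropped instead.
        self.out_octets += octets
        self._check(now)
        return "mark" if ecn_capable else "drop"
```

In use, `frame_arrived` would be called for each L2 frame as it is received and `packet_departing` for each IP packet as it is reconstructed; the returned action tells the forwarding path whether to set the congestion marking, drop the packet under Decapsulation Guideline 1, or forward it unchanged.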
Generally, the number of L2 frames may be higher (e.g. ATM), similar | Generally, relative to the number of IP PDUs, the number of L2 frames | |||
to, or lower (e.g. 802.11 aggregation at a L2-only station) than the | may be higher (e.g., ATM), roughly the same, or lower (e.g., 802.11 | |||
number of IP PDUs, and this distinction may influence the choice of | aggregation at an L2-only station). This distinction may influence | |||
mechanism. | the choice of mechanism. | |||
5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | |||
Notification | Notification | |||
The guidance in this section is applicable, for example, when IP | The guidance in this section is applicable, for example, when IP | |||
packets: | packets: | |||
* are encapsulated in Ethernet headers, which have no support for | * are encapsulated in Ethernet headers, which have no support for | |||
ECN; | congestion notification; | |||
* are forwarded by the eNode-B (base station) of a 3GPP radio access | * are forwarded by the eNode-B (base station) of a 3GPP radio access | |||
network, which is required to apply ECN marking during congestion, | network, which is required to apply ECN marking during congestion | |||
[LTE-RA], [UTRAN], but the Packet Data Convergence Protocol (PDCP) | [LTE-RA] [UTRAN], but the Packet Data Convergence Protocol (PDCP) | |||
that encapsulates the IP header over the radio access has no | that encapsulates the IP header over the radio access has no | |||
support for ECN. | support for ECN. | |||
This guidance also generalizes to encapsulation by other subnet | This guidance also generalizes to encapsulation by other subnet | |||
technologies with no native support for explicit congestion | technologies with no built-in support for congestion notification at | |||
notification at the lower layer, but with support for finding and | the lower layer, but with support for finding and processing an IP | |||
processing an IP header. It is unlikely to be applicable or | header. It is unlikely to be applicable or necessary for IP-in-IP | |||
necessary for IP-in-IP encapsulation, where feed-forward-and-up mode | encapsulation, where feed-forward-and-up mode based on [RFC6040] | |||
based on [RFC6040] would be more appropriate. | would be more appropriate. | |||
Marking the IP header while switching at layer-2 (by using a layer-3 | Marking the IP header while switching at layer 2 (by using a Layer 3 | |||
switch) or while forwarding in a radio access network seems to | switch) or while forwarding in a radio access network seems to | |||
represent a layering violation. However, it can be considered as a | represent a layering violation. However, it can be considered as a | |||
benign optimisation if the guidelines below are followed. Feed-up- | benign optimization if the guidelines below are followed. Feed-up- | |||
and-forward is certainly not a general alternative to implementing | and-forward is certainly not a general alternative to implementing | |||
feed-forward congestion notification in the lower layer, because: | feed-forward congestion notification in the lower layer, because: | |||
* IPv4 and IPv6 are not the only layer-3 protocols that might be | * IPv4 and IPv6 are not the only Layer 3 protocols that might be | |||
encapsulated by lower layer protocols | encapsulated by lower-layer protocols. | |||
* Link-layer encryption might be in use, making the layer-2 payload | * Link-layer encryption might be in use, making the Layer 2 payload | |||
inaccessible | inaccessible. | |||
* Many Ethernet switches do not have 'layer-3 switch' capabilities | * Many Ethernet switches do not have 'Layer 3 switch' capabilities, | |||
so they cannot read or modify an IP payload | so the ability to read or modify an IP payload cannot be assumed. | |||
* It might be costly to find an IP header (IPv4 or IPv6) when it may | * It might be costly to find an IP header (IPv4 or IPv6) when it may | |||
be encapsulated by more than one lower layer header, e.g. | be encapsulated by more than one lower-layer header, e.g., | |||
Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah). | Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah). | |||
Nonetheless, configuring lower layer equipment to look for an ECN | Nonetheless, configuring lower-layer equipment to look for an ECN | |||
field in an encapsulated IP header is a useful optimisation. If the | field in an encapsulated IP header is a useful optimization. If the | |||
implementation follows the guidelines below, this optimisation does | implementation follows the guidelines below, this optimization does | |||
not have to be confined to a controlled environment such as within a | not have to be confined to a controlled environment, e.g., within a | |||
data centre; it could usefully be applied on any network — even if | data centre; it could usefully be applied in any network -- even if | |||
the operator is not sure whether the above issues will never apply: | the operator is not sure whether the above issues will never apply: | |||
1. If a native lower-layer congestion notification mechanism exists | 1. If a built-in lower-layer congestion notification mechanism | |||
for a subnet technology, it is safe to mix feed-up-and-forward | exists for a subnet technology, it is safe to mix feed-up-and- | |||
with feed-forward-and-up on other switches in the same subnet. | forward with feed-forward-and-up on other switches in the same | |||
However, it will generally be more efficient to use the native | subnet. However, it will generally be more efficient to use the | |||
mechanism. | built-in mechanism. | |||
2. The depth of the search for an IP header SHOULD be limited. If | 2. The depth of the search for an IP header SHOULD be limited. If | |||
an IP header is not found soon enough, or an unrecognized or | an IP header is not found soon enough, or an unrecognized or | |||
unreadable header is encountered, the switch SHOULD resort to an | unreadable header is encountered, the switch SHOULD resort to an | |||
alternative means of signalling congestion (e.g. drop, or the | alternative means of signalling congestion (e.g., drop or the | |||
native lower layer mechanism if available). | built-in lower-layer mechanism if available). | |||
3. It is sufficient to use the first IP header found in the stack; | 3. It is sufficient to use the first IP header found in the stack; | |||
the egress of the relevant tunnel can propagate congestion | the egress of the relevant tunnel can propagate congestion | |||
notification upwards to any more deeply encapsulated IP headers | notification upwards to any more deeply encapsulated IP headers | |||
later. | later. | |||
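Guidelines 2 and 3 above could be sketched as a bounded search through the header stack. Everything in this example is an assumption made for illustration: headers are represented by simple string labels, the set of recognized non-IP labels is invented, and the default depth limit of 3 is arbitrary.

```python
IP_HEADERS = {"ipv4", "ipv6"}
KNOWN_NON_IP = {"ethernet", "vlan", "mac-in-mac"}  # hypothetical labels


def find_ip_header(header_stack, max_depth=3):
    """Return the index of the first IP header, or None to fall back.

    `header_stack` lists header types from outermost to innermost. None
    means the switch should use another congestion signal (e.g., drop).
    """
    for depth, header in enumerate(header_stack):
        if depth >= max_depth:
            return None       # Guideline 2: not found soon enough
        if header in IP_HEADERS:
            return depth      # Guideline 3: the first IP header suffices
        if header not in KNOWN_NON_IP:
            return None       # Guideline 2: unrecognized or unreadable
    return None


print(find_ip_header(["ethernet", "vlan", "ipv4", "ipv6"]))  # 2
```

The search stops at the outer IPv4 header in this example; any more deeply encapsulated IP header is left for the egress of the relevant tunnel to handle.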
6. Feed-Backward Mode: Guidelines for Adding Congestion Notification | 6. Feed-Backward Mode: Guidelines for Adding Congestion Notification | |||
It can be seen from Section 3.3 that congestion notification in a | It can be seen from Section 3.3 that congestion notification in a | |||
subnet using feed-backward mode has generally not been designed to be | subnet using feed-backward mode has generally not been designed to be | |||
directly coupled with IP layer congestion notification. The subnet | directly coupled with IP-layer congestion notification. The subnet | |||
attempts to minimize congestion internally, and if the incoming load | attempts to minimize congestion internally, and if the incoming load | |||
at the ingress exceeds the capacity somewhere through the subnet, the | at the ingress exceeds the capacity somewhere through the subnet, the | |||
layer 3 buffer into the ingress backs up. Thus, a feed-backward mode | Layer 3 buffer into the ingress backs up. Thus, a feed-backward mode | |||
subnet is in some sense similar to a null mode subnet, in that there | subnet is in some sense similar to a null mode subnet, in that there | |||
is no need for any direct interaction between the subnet and higher | is no need for any direct interaction between the subnet and higher- | |||
layer congestion notification. Therefore no detailed protocol design | layer congestion notification. Therefore, no detailed protocol | |||
guidelines are appropriate. Nonetheless, a more general guideline is | design guidelines are appropriate. Nonetheless, a more general | |||
appropriate: | guideline is appropriate: | |||
A subnetwork technology intended to eventually interface to IP | | A subnetwork technology intended to eventually interface to IP | |||
SHOULD NOT be designed using only the feed-backward mode, which is | | SHOULD NOT be designed using only the feed-backward mode, which is | |||
certainly best for a stand-alone subnet, but would need to be | | certainly best for a stand-alone subnet, but would need to be | |||
modified to work efficiently as part of the wider Internet, | | modified to work efficiently as part of the wider Internet because | |||
because IP uses feed-forward-and-up mode. | | IP uses feed-forward-and-up mode. | |||
The feed-backward approach at least works beneath IP, where the term | The feed-backward approach at least works beneath IP, where the term | |||
'works' is used only in a narrow functional sense because feed- | 'works' is used only in a narrow functional sense because feed- | |||
backward can result in very inefficient and sluggish congestion | backward can result in very inefficient and sluggish congestion | |||
control — except if it is confined to the subnet directly connected | control -- except if it is confined to the subnet directly connected | |||
to the original data source, when it is faster than feed-forward. It | to the original data source when it is faster than feed-forward. It | |||
would be valid to design a protocol that could work in feed-backward | would be valid to design a protocol that could work in feed-backward | |||
mode for paths that only cross one subnet, and in feed-forward-and-up | mode for paths that only cross one subnet, and in feed-forward-and-up | |||
mode for paths that cross subnets. | mode for paths that cross subnets. | |||
In the early days of TCP/IP, a similar feed-backward approach was | In the early days of TCP/IP, a similar feed-backward approach was | |||
tried for explicit congestion signalling, using source-quench (SQ) | tried for explicit congestion signalling using source-quench (SQ) | |||
ICMP control packets. However, SQ fell out of favour and is now | ICMP control packets. However, SQ fell out of favour and is now | |||
formally deprecated [RFC6633]. The main problem was that it is hard | formally deprecated [RFC6633]. The main problem was that it is hard | |||
for a data source to tell the difference between a spoofed SQ message | for a data source to tell the difference between a spoofed SQ message | |||
and a quench request from a genuine buffer on the path. It is also | and a quench request from a genuine buffer on the path. It is also | |||
hard for a lower layer buffer to address an SQ message to the | hard for a lower-layer buffer to address an SQ message to the | |||
original source port number, which may be buried within many layers | original source port number, which may be buried within many layers | |||
of headers, and possibly encrypted. | of headers and possibly encrypted. | |||
QCN (also known as backward congestion notification, BCN; see | QCN (also known as Backward Congestion Notification (BCN); see | |||
Sections 30–33 of [IEEE802.1Q]; previously known as 802.1Qau) uses a | Sections 30-33 of [IEEE802.1Q], previously known as 802.1Qau) uses a | |||
feed-backward mode structurally similar to ATM's relative rate | feed-backward mode that is structurally similar to ATM's relative | |||
mechanism. However, QCN confines its applicability to scenarios such | rate mechanism. However, QCN confines its applicability to scenarios | |||
as some data centres where all endpoints are directly attached by the | such as some data centres where all endpoints are directly attached | |||
same Ethernet technology. If a QCN subnet were later connected into | by the same Ethernet technology. If a QCN subnet were later | |||
a wider IP-based internetwork (e.g. when attempting to interconnect | connected into a wider IP-based internetwork (e.g., when attempting | |||
multiple data centres) it would suffer the inefficiency shown in | to interconnect multiple data centres) it would suffer the | |||
Figure 3. | inefficiency shown in Figure 3. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This section is to be removed before publishing as an RFC. | This document has no IANA actions. | |||
This memo includes no request to IANA. | ||||
8. Security Considerations | 8. Security Considerations | |||
If a lower layer wire protocol is redesigned to include explicit | If a lower-layer wire protocol is redesigned to include explicit | |||
congestion signalling in-band in the protocol header, care SHOULD be | congestion signalling in-band in the protocol header, care SHOULD be | |||
taken to ensure that the field used is specified as mutable during | taken to ensure that the field used is specified as mutable during | |||
transit. Otherwise interior nodes signalling congestion would | transit. Otherwise, interior nodes signalling congestion would | |||
invalidate any authentication protocol applied to the lower layer | invalidate any authentication protocol applied to the lower-layer | |||
header — by altering a header field that had been assumed as | header -- by altering a header field that had been assumed as | |||
immutable. | immutable. | |||
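As an illustration of the mutable-field point above, the following is a minimal sketch, not taken from either version of the document. It assumes a hypothetical 4-byte lower-layer header whose first byte carries a 2-bit congestion field in its low-order bits; the names CONGESTION_MASK, header_icv and mark_congestion are invented for this example. The integrity check zeroes the congestion field before computing the check value, so a congestion mark applied in transit does not invalidate authentication, while a change to any other header bit still does.

```python
import hashlib
import hmac

# Assumption: a hypothetical 4-byte header; the low 2 bits of byte 0 are the
# congestion field and are declared mutable, all other bits are immutable.
CONGESTION_MASK = 0x03


def header_icv(key: bytes, header: bytes) -> bytes:
    """Compute an integrity check value with the mutable field zeroed."""
    masked_first = header[0] & ~CONGESTION_MASK & 0xFF
    masked = bytes([masked_first]) + header[1:]
    return hmac.new(key, masked, hashlib.sha256).digest()


def mark_congestion(header: bytes) -> bytes:
    """Model an interior node setting the congestion field in transit."""
    return bytes([header[0] | CONGESTION_MASK]) + header[1:]


key = b"example-key"
sent = bytes([0x40, 0x12, 0x34, 0x56])
icv = header_icv(key, sent)

# The receiver recomputes the check value over the marked header.
received = mark_congestion(sent)
still_valid = hmac.compare_digest(icv, header_icv(key, received))

# Tampering with an immutable byte is still detected.
tampered = received[:1] + bytes([0x99]) + received[2:]
tamper_detected = not hmac.compare_digest(icv, header_icv(key, tampered))
```

Zeroing the field before authentication is only one way to treat it as mutable; a real protocol would define this in its own specification.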
The redesign of protocols that encapsulate IP in order to propagate | The redesign of protocols that encapsulate IP in order to propagate | |||
congestion signals between layers raises potential signal integrity | congestion signals between layers raises potential signal integrity | |||
concerns. Experimental or proposed approaches exist for assuring the | concerns. Experimental or proposed approaches exist for assuring the | |||
end-to-end integrity of in-band congestion signals, e.g.: | end-to-end integrity of in-band congestion signals, such as: | |||
* Congestion exposure (ConEx) for networks to audit that their | * Congestion Exposure (ConEx) for networks: | |||
congestion signals are not being suppressed by other networks or | ||||
by receivers, and for networks to police that senders are | - to audit that their congestion signals are not being suppressed | |||
responding sufficiently to the signals, irrespective of the L4 | by other networks or by receivers; and | |||
transport protocol used [RFC7713]. | ||||
- to police that senders are responding sufficiently to the | ||||
signals, irrespective of the L4 transport protocol used | ||||
[RFC7713]. | ||||
* A test for a sender to detect whether a network or the receiver is | * A test for a sender to detect whether a network or the receiver is | |||
suppressing congestion signals (for example see 2nd para of | suppressing congestion signals (for example, see the second | |||
Section 20.2 of [RFC3168]). | paragraph of Section 20.2 of [RFC3168]). | |||
Given these end-to-end approaches are already being specified, it | Given these end-to-end approaches are already being specified, it | |||
would make little sense to attempt to design hop-by-hop congestion | would make little sense to attempt to design hop-by-hop congestion | |||
signal integrity into a new lower layer protocol, because end-to-end | signal integrity into a new lower-layer protocol because end-to-end | |||
integrity inherently achieves hop-by-hop integrity. | integrity inherently achieves hop-by-hop integrity. | |||
Section 6 gives vulnerability to spoofing as one of the reasons for | Section 6 gives vulnerability to spoofing as one of the reasons for | |||
deprecating feed-backward mode. | deprecating feed-backward mode. | |||
9. Conclusions | 9. Conclusions | |||
Following the guidance in this document enables ECN support to be | Following the guidance in this document enables ECN support to be | |||
extended consistently to numerous protocols that encapsulate IP (IPv4 | extended consistently to numerous protocols that encapsulate IP (IPv4 | |||
and IPv6), so that IP continues to fulfil its role as an end-to-end | and IPv6) so that IP continues to fulfil its role as an end-to-end | |||
interoperability layer. This includes: | interoperability layer. This includes: | |||
* A wide range of tunnelling protocols including those with various | * A wide range of tunnelling protocols, including those with various | |||
forms of shim header between two IP headers, possibly also | forms of shim header between two IP headers, possibly also | |||
separated by a L2 header; | separated by an L2 header; | |||
* A wide range of subnet technologies, particularly those that work | * A wide range of subnet technologies, particularly those that work | |||
in the same 'feed-forward-and-up' mode that is used to support ECN | in the same 'feed-forward-and-up' mode that is used to support ECN | |||
in IP and MPLS. | in IP and MPLS. | |||
Guidelines have been defined for supporting propagation of ECN | Guidelines have been defined for supporting propagation of ECN | |||
between Ethernet and IP on so-called Layer-3 Ethernet switches, using | between Ethernet and IP on so-called Layer 3 Ethernet switches using | |||
a 'feed-up-and-forward' mode. This approach could enable other | a 'feed-up-and-forward' mode. This approach could enable other | |||
subnet technologies to pass ECN signals into the IP layer, even if | subnet technologies to pass ECN signals into the IP layer, even if | |||
they do not support ECN natively. | the lower-layer protocol does not support ECN. | |||
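The 'feed-up-and-forward' behaviour summarised above can be sketched as follows. This is a minimal illustration under stated assumptions, not the document's own algorithm: it assumes the switch can locate the IP header and writes the congestion mark directly into the 2-bit IP ECN field, using the codepoints defined in [RFC3168]. The function name feed_up and its boolean congestion input are hypothetical. It operates only on the IPv4 TOS (or IPv6 Traffic Class) byte; a real IPv4 implementation would also have to update the header checksum.

```python
# ECN codepoints from RFC 3168 (low-order 2 bits of the TOS / Traffic Class byte).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11
ECN_MASK = 0b11


def feed_up(tos: int, congested: bool):
    """Return the TOS byte to forward, or None if the packet is dropped.

    Hypothetical helper: 'congested' stands for whatever signal the
    lower-layer queue uses to decide that this packet should carry a
    congestion indication.
    """
    if not congested:
        return tos  # no congestion: forward unchanged
    if tos & ECN_MASK == NOT_ECT:
        return None  # transport is not ECN-capable: drop instead of marking
    return tos | CE  # ECT(0), ECT(1) or CE: forward as CE, DSCP bits untouched


marked = feed_up(0xB8 | ECT_0, congested=True)
dropped = feed_up(0xB8 | NOT_ECT, congested=True)
```

Dropping Not-ECT packets rather than marking them follows the rule in [RFC3168] that CE is only set on packets whose transport has declared itself ECN-capable.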
Finally, attempting to add ECN to a subnet technology in feed- | Finally, attempting to add congestion notification to a subnet | |||
backward mode is deprecated except in special cases, due to its | technology in feed-backward mode is deprecated except in special | |||
likely sluggish response to congestion. | cases due to its likely sluggish response to congestion. | |||
10. References | 10. References | |||
10.1. Normative References | 10.1. Normative References | |||
[I-D.ietf-trill-ecn-support] | ||||
Eastlake, D. E. and B. Briscoe, "TRILL (TRansparent | ||||
Interconnection of Lots of Links): ECN (Explicit | ||||
Congestion Notification) Support", Work in Progress, | ||||
Internet-Draft, draft-ietf-trill-ecn-support-07, 25 | ||||
February 2018, <https://datatracker.ietf.org/doc/html/ | ||||
draft-ietf-trill-ecn-support-07>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
<https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
skipping to change at page 30, line 13 ¶ | skipping to change at line 1339 ¶ | |||
2008, <https://www.rfc-editor.org/info/rfc5129>. | 2008, <https://www.rfc-editor.org/info/rfc5129>. | |||
[RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion | [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion | |||
Notification", RFC 6040, DOI 10.17487/RFC6040, November | Notification", RFC 6040, DOI 10.17487/RFC6040, November | |||
2010, <https://www.rfc-editor.org/info/rfc6040>. | 2010, <https://www.rfc-editor.org/info/rfc6040>. | |||
[RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion | [RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion | |||
Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141, | Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141, | |||
February 2014, <https://www.rfc-editor.org/info/rfc7141>. | February 2014, <https://www.rfc-editor.org/info/rfc7141>. | |||
[RFC9600] Eastlake 3rd, D. and B. Briscoe, "TRILL (TRansparent | ||||
Interconnection of Lots of Links): ECN (Explicit | ||||
Congestion Notification) Support", RFC 9600, | ||||
DOI 10.17487/RFC9600, August 2024, | ||||
<https://www.rfc-editor.org/info/rfc9600>. | ||||
10.2. Informative References | 10.2. Informative References | |||
[ATM-TM-ABR] | [ATM-TM-ABR] | |||
Cisco, "Understanding the Available Bit Rate (ABR) Service | Cisco, "Understanding the Available Bit Rate (ABR) Service | |||
Category for ATM VCs", Design Technote 10415, 5 June 2005, | Category for ATM VCs", Design Technote 10415, June 2005, | |||
<https://www.cisco.com/c/en/us/support/docs/asynchronous- | <https://www.cisco.com/c/en/us/support/docs/asynchronous- | |||
transfer-mode-atm/atm-traffic- | transfer-mode-atm/atm-traffic- | |||
management/10415-atmabr.html>. | management/10415-atmabr.html>. | |||
[Buck00] Buckwalter, J.T., "Frame Relay: Technology and Practice", | [Buck00] Buckwalter, J.T., "Frame Relay: Technology and Practice", | |||
Pub. Addison Wesley ISBN-13: 978-0201485240, 2000. | Addison-Wesley Professional, ISBN-13 978-0201485240, 2000. | |||
[Clos53] Clos, C., "A Study of Non-Blocking Switching Networks", | [Clos53] Clos, C., "A Study of Non-Blocking Switching Networks", | |||
Bell Systems Technical Journal 32(2):406–424, March 1953. | The Bell System Technical Journal, Vol. 32, Issue 2, | |||
DOI 10.1002/j.1538-7305.1953.tb01433.x, March 1953, | ||||
<https://doi.org/10.1002/j.1538-7305.1953.tb01433.x>. | ||||
[GTPv1] 3GPP, "GPRS Tunnelling Protocol (GTP) across the Gn and Gp | [GTPv1] 3GPP, "General Packet Radio Service (GPRS); GPRS | |||
interface", Technical Specification TS 29.060. | Tunnelling Protocol (GTP) across the Gn and Gp interface", | |||
Technical Specification 29.060. | ||||
[GTPv1-U] 3GPP, "General Packet Radio System (GPRS) Tunnelling | [GTPv1-U] 3GPP, "General Packet Radio System (GPRS) Tunnelling | |||
Protocol User Plane (GTPv1-U)", Technical Specification TS | Protocol User Plane (GTPv1-U)", Technical | |||
29.281. | Specification 29.281. | |||
[GTPv2-C] 3GPP, "Evolved General Packet Radio Service (GPRS) | [GTPv2-C] 3GPP, "3GPP Evolved Packet System (EPS); Evolved General | |||
Tunnelling Protocol for Control plane (GTPv2-C)", | Packet Radio Service (GPRS) Tunnelling Protocol for | |||
Technical Specification TS 29.274. | Control plane (GTPv2-C); Stage 3", Technical | |||
Specification 29.274. | ||||
[I-D.ietf-intarea-gue] | [IEEE802.1Q] | |||
IEEE, "IEEE Standard for Local and Metropolitan Area | ||||
Network--Bridges and Bridged Networks", IEEE Std 802.1Q- | ||||
2022, DOI 10.1109/IEEESTD.2022.10004498, December 2022, | ||||
<https://doi.org/10.1109/IEEESTD.2022.10004498>. | ||||
[INTAREA-GUE] | ||||
Herbert, T., Yong, L., and O. Zia, "Generic UDP | Herbert, T., Yong, L., and O. Zia, "Generic UDP | |||
Encapsulation", Work in Progress, Internet-Draft, draft- | Encapsulation", Work in Progress, Internet-Draft, draft- | |||
ietf-intarea-gue-09, 26 October 2019, | ietf-intarea-gue-09, 26 October 2019, | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-intarea- | <https://datatracker.ietf.org/doc/html/draft-ietf-intarea- | |||
gue-09>. | gue-09>. | |||
[I-D.ietf-tsvwg-rfc6040update-shim] | ||||
Briscoe, B., "Propagating Explicit Congestion Notification | ||||
Across IP Tunnel Headers Separated by a Shim", Work in | ||||
Progress, Internet-Draft, draft-ietf-tsvwg-rfc6040update- | ||||
shim-22, 29 October 2023, | ||||
<https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | ||||
rfc6040update-shim-22>. | ||||
[IEEE802.1Q] | ||||
IEEE, "IEEE Standard for Local and Metropolitan Area | ||||
Networks—Virtual Bridged Local Area Networks—Amendment 6: | ||||
Provider Backbone Bridges", IEEE Std 802.1Q-2018, July | ||||
2018, <https://ieeexplore.ieee.org/document/8403927>. | ||||
[ITU-T.I.371] | [ITU-T.I.371] | |||
ITU-T, "Traffic Control and Congestion Control in B-ISDN", | ITU-T, "Traffic control and congestion control in B-ISDN", | |||
ITU-T Rec. I.371 (03/04), March 2004, | ITU-T Recommendation I.371, March 2004, | |||
<https://www.itu.int/rec/T-REC-I.371>. | <https://www.itu.int/rec/T-REC-I.371-200403-I/en>. | |||
[Leiserson85] | [Leiserson85] | |||
Leiserson, C.E., "Fat-trees: universal networks for | Leiserson, C.E., "Fat-trees: Universal networks for | |||
hardware-efficient supercomputing", IEEE Transactions on | hardware-efficient supercomputing", IEEE Transactions on | |||
Computers 34(10):892–901, October 1985. | Computers, Vol. C-34, Issue 10, | |||
DOI 10.1109/TC.1985.6312192, October 1985, | ||||
<https://doi.org/10.1109/TC.1985.6312192>. | ||||
[LTE-RA] 3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA) | [LTE-RA] 3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA) | |||
and Evolved Universal Terrestrial Radio Access Network | and Evolved Universal Terrestrial Radio Access Network | |||
(E-UTRAN); Overall description; Stage 2", Technical | (E-UTRAN); Overall description; Stage 2", Technical | |||
Specification TS 36.300. | Specification 36.300. | |||
[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, | [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, | |||
DOI 10.17487/RFC2003, October 1996, | DOI 10.17487/RFC2003, October 1996, | |||
<https://www.rfc-editor.org/info/rfc2003>. | <https://www.rfc-editor.org/info/rfc2003>. | |||
[RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in | [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in | |||
IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, | IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, | |||
December 1998, <https://www.rfc-editor.org/info/rfc2473>. | December 1998, <https://www.rfc-editor.org/info/rfc2473>. | |||
[RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, | [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, | |||
skipping to change at page 34, line 21 ¶ | skipping to change at line 1544 ¶ | |||
Cabellos, Ed., "The Locator/ID Separation Protocol | Cabellos, Ed., "The Locator/ID Separation Protocol | |||
(LISP)", RFC 9300, DOI 10.17487/RFC9300, October 2022, | (LISP)", RFC 9300, DOI 10.17487/RFC9300, October 2022, | |||
<https://www.rfc-editor.org/info/rfc9300>. | <https://www.rfc-editor.org/info/rfc9300>. | |||
[RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit | [RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit | |||
Congestion Notification (ECN) Protocol for Low Latency, | Congestion Notification (ECN) Protocol for Low Latency, | |||
Low Loss, and Scalable Throughput (L4S)", RFC 9331, | Low Loss, and Scalable Throughput (L4S)", RFC 9331, | |||
DOI 10.17487/RFC9331, January 2023, | DOI 10.17487/RFC9331, January 2023, | |||
<https://www.rfc-editor.org/info/rfc9331>. | <https://www.rfc-editor.org/info/rfc9331>. | |||
[UTRAN] 3GPP, "UTRAN Overall Description", Technical | [RFC9601] Briscoe, B., "Propagating Explicit Congestion Notification | |||
Specification TS 25.401. | Across IP Tunnel Headers Separated by a Shim", RFC 9601, | |||
DOI 10.17487/RFC9601, August 2024, | ||||
Comments Solicited | <https://www.rfc-editor.org/info/rfc9601>. | |||
This section is to be removed before publishing as an RFC. | ||||
Comments and questions are encouraged and very welcome. They can be | [UTRAN] 3GPP, "UTRAN overall description", Technical | |||
addressed to the IETF Transport Area working group mailing list | Specification 25.401. | |||
<tsvwg@ietf.org>, and/or to the authors. | ||||
Acknowledgements | Acknowledgements | |||
Thanks to Gorry Fairhurst and David Black for extensive reviews. | Thanks to Gorry Fairhurst and David Black for extensive reviews. | |||
Thanks also to the following reviewers: Joe Touch, Andrew McGregor, | Thanks also to the following reviewers: Joe Touch, Andrew McGregor, | |||
Richard Scheffenegger, Ingemar Johansson, Piers O'Hanlon, Donald | Richard Scheffenegger, Ingemar Johansson, Piers O'Hanlon, Donald | |||
Eastlake, Jonathan Morton, Markku Kojo, Sebastian Möller, Martin Duke | Eastlake 3rd, Jonathan Morton, Markku Kojo, Sebastian Möller, Martin | |||
and Michael Welzl, who pointed out that lower layer congestion | Duke, and Michael Welzl, who pointed out that lower-layer congestion | |||
notification signals may have different semantics to those in IP. | notification signals may have different semantics to those in IP. | |||
Thanks are also due to the tsvwg chairs, TSV ADs and IETF liaison | Thanks are also due to the Transport and Services Working Group | |||
people such as Eric Gray, Dan Romascanu and Gonzalo Camarillo for | (tsvwg) chairs, TSV ADs and IETF liaison people such as Eric Gray, | |||
helping with the liaisons with the IEEE and 3GPP. And thanks to | Dan Romascanu and Gonzalo Camarillo for helping with the liaisons | |||
Georg Mayer and particularly to Erik Guttman for the extensive search | with the IEEE and 3GPP. And thanks to Georg Mayer and particularly | |||
and categorisation of any 3GPP specifications that cite ECN | to Erik Guttman for the extensive search and categorization of any | |||
specifications. Thanks also to the Area Reviewers Dan Harkins, Paul | 3GPP specifications that cite ECN specifications. Thanks also to the | |||
Kyzivat, Sue Hares and Dale Worley. | Area Reviewers Dan Harkins, Paul Kyzivat, Sue Hares, and Dale Worley. | |||
Bob Briscoe was part-funded by the European Community under its | Bob Briscoe was part-funded by the European Community under its | |||
Seventh Framework Programme through the Trilogy project (ICT-216372) | Seventh Framework Programme through the Trilogy project (ICT-216372) | |||
for initial drafts then through the Reducing Internet Transport | for initial drafts then through the Reducing Internet Transport | |||
Latency (RITE) project (ICT-317700), and for final drafts (from -18) | Latency (RITE) project (ICT-317700), and for final drafts (from -18) | |||
he was funded by Apple Inc. The views expressed here are solely those | he was funded by Apple Inc. The views expressed here are solely those | |||
of the authors. | of the authors. | |||
Contributors | Contributors | |||
Pat Thaler | Pat Thaler | |||
Broadcom Corporation (retired) | Broadcom Corporation (retired) | |||
CA | CA | |||
USA | United States of America | |||
Pat was a co-author of this draft, but retired before its | Pat was a coauthor of this document, but retired before its | |||
publication. | publication. | |||
Authors' Addresses | Authors' Addresses | |||
Bob Briscoe | Bob Briscoe | |||
Independent | Independent | |||
United Kingdom | United Kingdom | |||
Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
URI: https://bobbriscoe.net/ | URI: https://bobbriscoe.net/ | |||
End of changes. 220 change blocks. | ||||
724 lines changed or deleted | 710 lines changed or added | |||