TCP Maintenance and Minor Extensions (tcpm)Internet Engineering Task Force (IETF) M. Kuehlewind, Ed.Internet-DraftRequest for Comments: 7560 ETH ZurichIntended status:Category: Informational R. ScheffeneggerExpires: September 10, 2015ISSN: 2070-1721 NetApp, Inc. B. Briscoe BTMarch 9,August 2015 Problem Statement and Requirements fora More Accurate ECNIncreased Accuracy in Explicit Congestion Notification (ECN) Feedbackdraft-ietf-tcpm-accecn-reqs-08Abstract Explicit Congestion Notification (ECN) is a mechanism where network nodes can mark IPpacketspackets, instead of droppingthemthem, to indicate congestion to theend-points.endpoints. An ECN-capable receiver will feed this information back to the sender. ECN is specified for TCP in such a way that it can only feed back one congestion signal per Round-Trip Time (RTT). In contrast, ECN for other transport protocols, such as RTP/UDP and SCTP, is specified with more accurate ECN feedback. Recent new TCP mechanisms (likeConExCongestion Exposure (ConEx) orDCTCP)Data Center TCP (DCTCP)) need more accurate ECN feedback in the case where more than one marking is received in one RTT. This document specifies requirements for an update to the TCP protocol to provide more accurate ECN feedback. Status of This Memo ThisInternet-Draftdocument issubmitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documentsnot an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Engineering Task Force (IETF).Note that other groups may also distribute working documents as Internet-Drafts. The listIt represents the consensus ofcurrent Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents validthe IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are amaximumcandidate for any level of Internet Standard; see Section 2 of RFC 5741. Information about the current status ofsix monthsthis document, any errata, and how to provide feedback on it may beupdated, replaced, or obsoleted by other documentsobtained atany time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 10, 2015.http://www.rfc-editor.org/info/rfc7560. Copyright Notice Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . .23 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 2. Recap of Classic ECN and ECN Nonce in IP/TCP . . . . . . . .45 3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . .56 4. Requirements . . . . . . . . . . . . . . . . . . . . . . . .78 5. Design Approaches . . . . . . . . . . . . . . . . . . . . . .1011 5.1.Re-DefinitionRedefinition of ECN/NS Header Bits . . . . . . . . . . . 11 5.2. Using Other Header Bits . . . . . . . . . . . . . . . . .1213 5.3. Using a TCP Option . . . . . . . . . . . . . . . . . . .1213 6.Acknowledgements . . .Security Considerations . . . . . . . . . . . . . . . . . . .1314 7.IANA Considerations . . . . . . . . . . . . . . .References . . . . . .13 8. Security Considerations. . . . . . . . . . . . . . . . . . .13 9.14 7.1. Normative References . . . . . . . . . . . . . . . . . . 14 7.2. Informative References . . . . . . .14 9.1. Normative References .. . . . . . . . . . 14 Appendix A. Ambiguity of the More Accurate ECN Feedback in DCTCP 16 Acknowledgements . . . . . . .14 9.2. Informative References. . . . . . . . . . . . . . . . .14 Appendix A. Ambiguity of the More Accurate ECN Feedback in DCTCP 1517 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . .1617 1. Introduction Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where network nodes can mark IP packets instead of dropping them to indicate congestion to theend-points.endpoints. An ECN-capable receiver will feed this information back to the sender. ECN is specified for TCP in such a way that only one feedback signal can be transmitted per Round-Trip Time (RTT). This is sufficient forpre-existingpreexisting TCP congestion control mechanisms that perform only one reduction in sending rate per RTT, independent of the number of ECN congestion marks. But recently proposed or deployed mechanisms like Congestion Exposure (ConEx) [RFC6789] or Data Center TCP (DCTCP)[I-D.bensley-tcpm-dctcp][DCTCP] need more accurate ECN feedback than'classic' ECN'classic ECN' [RFC3168] to work correctly in the case where more than one marking is received in any one RTT. For an in-depth discussion of the application benefits of using ECN (including with sufficiently granularfeedback)feedback), see[I-D.welzl-ecn-benefits].[ECN-BENEFITS]. ECN is also defined for transport protocols beside TCP. ECN feedback as defined for RTP/UDP [RFC6679] provides a very detailed level of information, delivering individual counters for all four ECN codepoints as well as lost and duplicate segments, but at the cost of highsignalingsignalling overhead. ECN feedback for SCTP has been proposed in[I-D.stewart-tsvwg-sctpecn].[SCTP-ECN]. This delivers a counter for the number ofCEECN-capable packets that were markedsegments between CWR chunks,due to congestion (since the last sender- side window reduction), butalsoit comes at the cost of increased overhead. Today, implementations of DCTCP already exist that alter TCP's ECN feedback protocol in proprietary ways (DCTCP was released in Microsoft Windows 8, and implementations exist for Linux and FreeBSD).TheHowever, the changes DCTCP makes to TCPare not currently the subject of any IETF standardization activity, and theyomit capability negotiation, relying instead on uniform configuration across all hosts and network devices with ECN capability. A primary motivation for this document is to intervene before each proprietary implementation invents its own non-interoperable handshake, which could lead to _de facto_ consumption of the few flags or codepoints that remain available for standardizing capability negotiation. This document lists requirements for a robust and interoperable TCP/ ECN feedback protocol that is more accurate than classic ECN [RFC3168] and that all implementations of new TCP extensions, like ConEx and/or DCTCP, can use. While a new feedback scheme should still deliver as much information as classic ECN, this document also clarifies what has to be taken into consideration in addition.ThusThus, the listed requirements should be addressed in the specification of a more accurate ECN feedback scheme. A few solutions have already been proposed. Section 5 demonstrates how to use the requirements to compare them, by briefly sketching theirhigh levelhigh-level design choices and discussing the benefits and drawbacks of each. The scope of these requirements is not limited to any specific environment and is intended for general deployment over public and private IP networks. Candidate solutions should try to adhere to all these requirements,butbut, where this is notpossiblepossible, they should justify the deviation. The ordering of the requirements listed in this document is not to be taken as an order of importance, because each requirement might have different weight in different deployment scenarios. These requirements are only concerned with the type and quality of the ECN feedback signal. The requirements do not stipulate how a TCP sender might react to the improved ECN signal. The requirements also do not imply that any modifications to TCP senders or receivers are obligatory. 1.1. Terminology We use the following terminology from [RFC3168] and [RFC3540]: The ECN field in the IP header: Not-ECT: the not ECN-Capable Transport codepoint, CE: the Congestion Experienced codepoint, ECT(0): the first ECN-Capable Transport codepoint, and ECT(1): the second ECN-Capable Transport codepoint. The ECN flags in the TCP header: CWR: the Congestion Window Reduced flag, ECE: the ECN-Echo flag, and NS: ECN Nonce Sum. In this document, the ECN feedback scheme as specified in [RFC3168] is called 'classic ECN' and any new proposal is called a 'more accurate ECN feedback' scheme. A 'congestion mark' is defined as an IP packet where the CE codepoint is set. A 'congestion episode' refers to one or more congestion marks that belong to the same overload situation in the network (usually during one RTT). A TCP segment with theacknowledgmentacknowledgement flag set is simply called an ACK. 2. Recap of Classic ECN and ECN Nonce in IP/TCP ECN requires two bits in the IP header. The ECN capability of a packet is indicated when either one of the two bits is set. A network node can set both bits simultaneously when it experiences congestion. This leads to the four codepoints(not-ECT,(Not-ECT, ECT(0), ECT(1), and CE) as listed above. In the TCPheaderheader, the first two bits in byte 14 are defined as ECN feedback for each half-connection. A TCP receiver signals the reception of a congestion mark using the ECN-Echo (ECE) flag in the TCP header. For reliability, the receiver continues to set the ECE flag on every ACK. To enable the TCP receiver to determine when to stop setting theECN-EchoECE flag, the sender sets the CWR flag upon reception of an ECE feedback signal. This always leads to a full RTT of ACKs with ECE set.ThusThus, the receiver cannot signal back any additional CE markings arriving within the same RTT. The ECN Nonce [RFC3540] is an experimental addition to ECN that the TCP sender can use to protect itself against accidental or malicious concealment of CE-marked or dropped packets. This addition defines the last bit of byte 13 in the TCP header as the Nonce Sum (NS) flag. The receiver maintains a nonce sum that counts the occurrence of ECT(1)packets,packets and signals the least significant bit of this sum on the NS flag. There are no known deployments of a TCP stack that makes use of the ECN Nonce extension. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | | N | C | E | U | A | P | R | S | F | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | | | R | E | G | K | H | T | N | N | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ Figure 1: The(post-ECN(Post-ECN Nonce)definitionDefinition of the TCPheader flagsHeader Flags An alternative for a sender to assure feedback integrity has been proposed where the sender itself occasionally inserts a CE mark orreordering itself,reorders packets, and checks that the receiver feedsitthese back faithfully[I-D.moncaster-tcpm-rcv-cheat].[TEST-RCV]. This alternative consumes no header bits or codepoints,as well as releasingand it releases the ECT(1) codepoint in the IP header and the NS flag in the TCP header for other uses. 3. Use Cases The following two examples serve to show where existing mechanisms would already benefit from more accurate ECN feedback information. However, as it is hard to predict the future, once a more accurate ECN feedback mechanism that adheres to the requirements stated in this document is widely deployed, it's very likely that additional usesarewill be found. The examples listed below are in no particular order. ConEx is an experimental approach that allows a sender to relay congestion feedback provided by the receiver into the network along the forward data path. ConEx information can be used for traffic management to limit traffic proportionate to the actual congestion being caused, rather than limiting traffic based on rate or volume [RFC6789]. A ConEx sender uses selective acknowledgements (SACK) [RFC2018] for accurate feedback of loss signals, butcurrentlyuntil now TCPoffershas offered no equivalent accurate feedback for ECN. DCTCP offers very low and predictable queuing delay. DCTCP changes the reaction to congestion of a TCP sender and additionally requires switches/routers to have ECN enabled and configured with a low step threshold and no signal smoothing, so it is currently only used in private networks,e.g.e.g., internal to data centers. DCTCP was released in Microsoft Windows 8, and implementations exist for Linux and FreeBSD. To retrieve sufficient congestion information, the different DCTCP implementations use a proprietary ECN feedback protocol, but they omit capability negotiation. Moreover, the feedback protocol proposed in[I-D.bensley-tcpm-dctcp][DCTCP] only works if there are no losses at all, and otherwise it gets very confused (see Appendix A). Therefore, if agenericgeneric, more accurate ECN feedback scheme were available, it would solve two problems for DCTCP: i) the need for a consistent variant of DCTCP to be deployed network-wide and ii) the inability to cope with ACK loss. Classic ECN-TCP would not benefit from more accurate ECN feedback, but it would not suffer either. The same signal that is currently conveyed with ECN following the specification given in [RFC3168] would be available. The following scenarios should briefly show where accurate ECN feedback is needed or adds value: A sender withstandardisedstandardized TCP congestion control that supports ConEx: In thiscasecase, the ConEx mechanism uses the extra information per RTT to re-echo the precise congestion information, but the congestion control algorithm still ignores multiple marks per RTT [RFC5681]. A sender using DCTCP congestion control without ConEx: The congestion control algorithm uses the extra info per RTT to perform its decrease depending on the number of congestion marks. A sender using DCTCP congestion control and supporting ConEx: Both the congestion control algorithm and ConEx use the more accurate ECN feedback mechanism. As-yet-unspecified sender mechanisms: The above are two examples of more general interest in sender mechanisms that respond to the extent of congestion feedback, not just its existence. It will greatly simplify incremental deployment if the sender can unilaterally deploy newbehaviours,behaviours and rely on the presence of generic receivers that have already implemented more accurate feedback.An RFC5681A TCP sender using congestion control as specified in RFC 5681 without ConEx: No accurate feedback is necessary here. The congestion control algorithm still reacts to only one signal per RTT.ButBut, it is best to feed back all the information the receiver gets, whether or not the sender uses itor not-- at least as long as overhead is low or zero. Using CE for checking integrity: If a more accurate ECN feedback scheme feeds all occurrences of CE marks back, a sender could perform integrity checking by occasionally injecting CE marks itself. Specifically, a sender can send packetswhichthat it randomly marks with CE (at low frequency), then check if feedback is received for these packets. The congestion notification feedback for these self-injectedmarkings,markings would not require a congestion control reaction[I-D.moncaster-tcpm-rcv-cheat].[TEST-RCV]. 4. Requirements The requirements of the accurate ECN feedback protocol are to have fairly accurate (not necessarily perfect),timelytimely, and protectedsignaling.signalling. This leads to the following requirements, which should be discussed for any proposed more accurate ECN feedback scheme: Resilience The ECN feedback signal is carried within the ACK. Pure TCP ACKs can get lost without recovery (not just due tocongestion,congestion but also due to deliberate ACK thinning). Moreover, delayed ACKs are commonly used with TCP. Typically, an ACK is triggered after two data segments (ormoremore, e.g., due to receive segment coalescing, ACK compression, ACK congestion control[RFC5690][RFC5690], or otherphenomena,phenomena; see [RFC3449]). In ahigh congestionhigh-congestion situation where most of the packets are marked with CE, an accurate feedback mechanism should still be able to signal sufficient congestion information.ThusThus, the accurate ECN feedback extension has to take delayed ACKs and ACK loss into account. Also, a more accurate feedback protocol should still provide more accurate feedback than classic ECN when delayed ACKs cover more than two segments, or when a thin stream disables Nagle's algorithm [RFC0896]. Finally, the feedback mechanism should not be impacted by reordering of ACKs, even when theACK'edACKed sequence number does not increase. Timeliness A CE mark can be induced by the sending host, or more commonly a network node on the transmission path, and is then echoed by the receiver in the TCP ACK.ThusThus, when this information arrives at the sender, it is naturally already about one RTT old. With a sufficient ACKraterate, a further delay of a small number of packets can be tolerated. However, this information will become stale with large delays, given the dynamic nature of networks. TCP congestion control (which itself partly introduces these dynamics) operates on a time scale of one RTT. Thus, to be timely, congestion feedback information should be delivered within about one RTT. Integrity The integrity of the feedback in a more accurate ECN feedback scheme should be assured, at least as well as the ECN Nonce. Alternatively, it should at least be possible to give strong incentives for the receiver and network nodes to cooperate honestly. Given there are known problems with ECN Nonce deployment, this document only requires that the integrity of the more accurate ECN feedback can be assured; it does not require that the ECN Nonce mechanism is employed to achieve this. Indeed, if integrity could be providedelse-wise,in another manner, a more accurate ECN feedback protocol mightre-purposerepurpose the nonce sum (NS) flag in the TCP header. If the more accurate ECN feedback scheme provides sufficient information, the integrity check coulde.g.be performedbyby, e.g., deterministically setting the CE in the sender and monitoring the respective feedback (similar to ECT(1) and the ECN Nonce sum). Whether a sender should enforce when it detects wrong feedback information, and what kind of enforcement it should apply, are policy issues that need not be specified as part of the more accurate ECN feedback signal scheme itself, but rather when specifying an update to core TCP mechanisms like congestion control thatmakesmake use of the more accurate ECN signal. Accuracy Classic ECN feeds back one congestion notification perRTT, whichRTT; this is sufficient for classic TCP congestioncontrolcontrol, which reduces the sending rate at most once per RTT.ThusThus, the more accurate ECN feedback scheme should ensure that, if a congestion episode occurs, at least one congestion notification is echoed and received per RTT as classic ECN would do. Of course, the goal of a more accurate ECN extension is to reconstruct the number of CE markings more accurately. In the bestcasecase, the new scheme should even allow reconstruction of the exact number of payload bytes that aCE markedCE-marked packet was carrying. However, it is accepted that it may be too complex for a sender to get the exact number of congestion markings or marked bytes in all situations. Ideally, the feedback scheme should preserve the order in which any (of the four) ECN signals were received. And, ideally, it would even be possible for the sender to determine which of the packets covered by one delayed ACK were congestion marked,e.g.e.g., if the flow consists of packets of different sizes, or to allow for future protocols where the order of the markings may be important. In the best case, a sender that sees more accurate ECN feedback information would be able to reconstruct the occurrence of any of the fourcode points (non-ECT,codepoints (Not-ECT, CE, ECT(0), ECT(1)). However, assuming the sender marks all data packets as ECN-capable and uses a default setting of ECT(0) (as with[RFC3168],[RFC3168]), solely feeding back the occurrence of CE and ECT(1) might be sufficient. Because the sender can keep account of the transmitted segments with any of the three ECN codepoints, conveying any two of these back to the sender is sufficient for it to reconstruct the third as observed by the receiver.ThusThus, a more accurate ECN feedback scheme should at least provide information on two of these signals,e.g.e.g., CE and ECT(1). If a more accurate ECN scheme can reliably deliver feedback in most but not all circumstances, ideally the scheme should at least not introduce bias. In other words, undetected loss of some ACKs should be as likely to increase as decrease the sender's estimate of the probability of ECN marking. Complexity Implementation should be as simple aspossiblepossible, and only a minimum of additional state information should be needed. This will enable more accurate ECN feedback to be used as the default feedback mechanism, even if only one ECN feedback signal per RTT is needed. Overhead A more accurate ECN feedback signal should limit the additional network load, because ECN feedback is ultimately not critical information (in the worst case, loss will still be available as a congestion signal of last resort). As feedback information has to be provided frequently and in a timely fashion, potentially all or a large fraction of TCPacknowledgmentsacknowledgements might carry this information. Ideally, no additional segments should be exchanged compared toan RFC3168a TCPsession,session as specified in RFC 3168, and the overhead in each segment should be minimized. Backward and forward compatibility Given more accurate ECN feedback will involve a change to the TCP protocol, it should be negotiated between the two TCP endpoints. If either end does not support the more accurate feedback, they should both be able tofall-backfall back to classic ECN feedback. A more accurate ECN feedback extension should aim to traverse most middleboxes, including firewalls andnetwork address translators (NAT).Network Address Translators (NATs). Further, a feedback mechanism should provide a method to fall back to classic ECNsignalingsignalling if the new signal is suppressed by certain middleboxes. In order to avoid a fork in the TCP protocol specifications, if experiments with the new ECN feedback protocol are successful,itthe intention isintendedto eventually updateRFC3168RFC 3168 for any TCP/ECN sender, not just for ConEx or DCTCP senders.ThenThen, future senders will be able to unilaterally deploy new behaviours that exploit the existence of more accurate ECN feedback in receivers (forward compatibility). Conversely, even if another sender only needs one ECN feedback signal per RTT, it should be able to use more accurate ECNfeedback,feedback and simply ignore the excess information. Furthermore, the receiver should not make assumptions about the mechanism that was used to set the markings nor about any interpretation or reaction to the congestion signal. The receiver only needs to faithfully reflect congestion information back to the sender. 5. Design Approaches This section introduces some possible design approaches for TCP ECNfeedback design approaches.feedback. The purpose of this section is to give examples of how trade-offs might be needed between the requirements, as input to future IETF work to specify a protocol. The order is notsignificantsignificant, and there is no intention to endorse any particular approach. All approaches presented below (and proposed so far) are able to provide accurate ECN feedback information as long as no ACK loss occurs and the congestion rate is reasonable. In the case of a high ACK loss rate or very high congestion(CE marking)(CE-marking) rate, the proposed schemes have different resilience characteristics depending on the number of bits used for the encoding. While classic ECN provides reliable (but inaccurate) feedback of a maximum of one congestion signal per RTT, the proposed schemes do not implement an explicit acknowledgement mechanism for the feedback(as e.g.(as, e.g., theECE / CWRECE/CWR exchange of [RFC3168]). 5.1.Re-DefinitionRedefinition of ECN/NS Header Bits Schemes in this category can additionally use the NS bit for capability negotiation during the TCP handshake exchange. Thus a more accurate ECN could be negotiated without changing the classic ECN negotiation and thus being backwards compatible. Schemes in this category can simplyre-defineredefine the ECN header flags, ECE and CWR, to encode the occurrence of a CE marking at the receiver. This approach provides very limited resilience against loss of ACK, particularly pure ACKs (no payload and therefore delivered unreliably). A couple of schemes have been proposed so far: o A naiveone-bit1-bit scheme that sends one ECE for each CE received could use CWR to increase robustness against ACK loss by introducing redundant information on the next ACK, but this is still vulnerable to ACK loss. o The scheme defined for DCTCP[I-D.bensley-tcpm-dctcp],[DCTCP], which toggles the ECE feedback on an immediate ACK whenever the CE marking changes, and otherwise feeds back delayed ACKs with the ECE value unchanged. Appendix A demonstrates that this scheme is still ambiguous to the sender if the ACKs are pure ACKs, and if some may have been lost. Alternatively, the receiver uses the three ECN/NS header flags, ECE,CWRCWR, andNSNS, to represent a counter that signals the accumulated number of CE markings it has received. Resilience against loss is better than the flag-basedschemes,schemes but may not suffice in the presence of extended ACK loss that otherwise would not affect the TCP sender's performance. A number of coding schemes have been proposed so far in this category: o A 3-bit counter scheme continuously feeds back the three least significant bits of a CE counter; o A scheme that defines astandardisedstandardized lookup table to map the8eight codepoints onto either a CE counter or an ECT(1) counter. These proposed schemes provide accumulated information onECN-CECE marking feedback, similar to the number of acknowledged bytes in the TCP header. Due to the limited number ofbitsbits, the ECN feedback information will wrap much more often than the acknowledgement field.ThusThus, feedback information could be lost due to a relatively small sequence of pure-ACK losses. Resilience could be increased by introducing redundancy,e.g.e.g., send each counter increase two or more times. Ofcoursecourse, any of these additional mechanisms will increase the complexity. If the congestion rate is greater than the ACK rate (multiplied by the number of congestion marks that can be signaled per ACK), the congestion information cannot correctly be fed back. Covering the worst casewhere(where every packet is CEmarkedmarked) can potentially be realized by dynamically adapting the ACK rate and redundancy. This again increases complexity and perhaps thesignalingsignalling overhead as well. Schemes that do notre-purposerepurpose the ECN NSbit,bit could still support the ECN Nonce. 5.2. Using Other Header Bits As seen in Figure 1, there are currently three unused flags in the TCP header. The proposed 3-bit counter or codepoint schemes could be extended by one or more bits to add higher resilience against ACK loss. The relative gain would be exponentially higher resilience against ACK loss, while the respective drawbacks would remain identical. Alternatively, a new method couldstandardisestandardize the use of the bits in the Urgent Pointer field (see [RFC6093]) to signal more bits of its congestion signal counter, but only wheneverit does not setthe UrgentFlag.Flag is not set. As this is often the case, resilience could be increased without additional header overhead. Any proposal to use such bits would need to check the likelihood that some middleboxes might discard or 'normalize' the currently unused flag bits or a non-zero Urgent Pointer when the Urgent Flag is cleared. If during experimentation certain bits have been proven to be usable, the assignment of any of these bits would then require an IETF standards action. 5.3. Using a TCP Option Alternatively, a new TCP option could be introduced, to help maintain the accuracy and integrity of ECN feedback between receiver and sender. Such an option could provide higher resilience and even more information,perhapse.g., as much as is provided by a proposal for SCTP that counts the number of CE marked packet [SCTP-ECN] since the last CWR was observed, or by ECN for RTP/UDP[RFC6679], which[RFC6679]. The latter explicitly provides the total number of packets during a connection where the IP ECN field is set to ECT(0), ECT(1), CE,non-ECT marked and lost packets,or Not-ECT, asmuchwell asa proposal for SCTP that countsthe number ofECN marks [I-D.stewart-tsvwg-sctpecn] between CWR chunks.lost packets. However, deploying new TCP options has its own challenges. Moreover, to actually achieve high resilience, this option would need to be carried by most or all ACKs as the receiver cannot know if and when ACKs may be dropped.ThusThus, this approach would introduce considerablesignalingsignalling overhead even though ECN feedback is not extremely critical information (in the worst case, loss will still be available to provide a strong congestion feedback signal).Whatever,Nevertheless, such a TCP option could be used in addition to a more accurate ECN feedback scheme in the TCP header or in addition to classic ECN, only when needed and when space is available. 6.Acknowledgements Thanks to Gorry Fairhurst for his review and for ideas on CE-based integrity checking and to Mohammad Alizadeh for suggesting the need to avoid bias. Bob Briscoe was part-funded by the European Community under its Seventh Framework Programme through the Reducing Internet Transport Latency (RITE) project (ICT-317700) and through the Trilogy 2 project (ICT-317756). he views expressed here are solely those of the authors, in the context of the mentioned funding projects 7. IANA Considerations This memo includes no request to IANA. 8.Security Considerations ECN feedback information must only be used if the other information contained in a received TCP segment indicates that the congestion was genuinely part of the flow and notspoofed - i.e.spoofed. That is, the normal TCP acceptance techniques have to be used to verify that the segment is part of the flow before returning any contained ECN information,and similarlyand, similarly, ECN feedback is only accepted on valid ACKs. Given ECN feedback is used as input for congestion control, the respective algorithm would not react appropriately if ECN feedback were lost and the resilience mechanism to recover it was inadequate. This resilience requirement is articulated in Section 4. However, it should be noted that ECN feedback is not the last resort against congestion collapse, because if there is insufficient response to ECN, loss will ensue, and TCP will still react appropriately to loss. A receiver could suppress ECN feedback information leading to its connections consuming excess sender or network resources. This problem is similar to that seen with the classic ECN feedback scheme and should be addressed by integrity checking as required in Section 4.9.7. References9.1.7.1. Normative References [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September2001.2001, <http://www.rfc-editor.org/info/rfc3168>. [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit Congestion Notification (ECN) Signaling with Nonces", RFC 3540, DOI 10.17487/RFC3540, June2003. 9.2.2003, <http://www.rfc-editor.org/info/rfc3540>. 7.2. Informative References[I-D.bensley-tcpm-dctcp] sbens@microsoft.com, s.,[DCTCP] Bensley, S., Eggert, L., and D. Thaler, "Microsoft's Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters",draft-bensley-tcpm-dctcp-02 (work in progress), January 2015. [I-D.moncaster-tcpm-rcv-cheat] Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to Allow Senders to Identify Receiver Non-Compliance", draft- moncaster-tcpm-rcv-cheat-03 (workWork inprogress),Progress, draft-bensley-tcpm-dctcp-05, July2014. [I-D.stewart-tsvwg-sctpecn] Stewart, R., Tuexen, M.,2015. [ECN-BENEFITS] Fairhurst, G. andX. Dong, "ECN for Stream Control Transmission Protocol (SCTP)", draft-stewart- tsvwg-sctpecn-05 (work in progress), January 2014. [I-D.welzl-ecn-benefits] Welzl,M.and G. Fairhurst,Welzl, "The Benefitsto Applicationsof using Explicit Congestion Notification (ECN)",draft- welzl-ecn-benefits-01 (workWork inprogress),Progress draft-ietf-aqm-ecn-benefits-06, July2014.2015. [RFC0896] Nagle, J., "CongestioncontrolControl in IP/TCPinternetworks",Internetworks", RFC 896, DOI 10.17487/RFC0896, January1984.1984, <http://www.rfc-editor.org/info/rfc896>. [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, DOI 10.17487/RFC2018, October1996.1996, <http://www.rfc-editor.org/info/rfc2018>. [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. Sooriyabandara, "TCP Performance Implications of Network Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, December2002.2002, <http://www.rfc-editor.org/info/rfc3449>. [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September2009.2009, <http://www.rfc-editor.org/info/rfc5681>. [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding Acknowledgement Congestion Control to TCP", RFC 5690, DOI 10.17487/RFC5690, February2010.2010, <http://www.rfc-editor.org/info/rfc5690>. [RFC6093] Gont, F. and A. Yourtchenko, "On the Implementation of the TCP Urgent Mechanism", RFC 6093, DOI 10.17487/RFC6093, January2011.2011, <http://www.rfc-editor.org/info/rfc6093>. [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August2012.2012, <http://www.rfc-editor.org/info/rfc6679>. [RFC6789] Briscoe, B., Ed., Woundy, R., Ed., and A. Cooper, Ed., "Congestion Exposure (ConEx) Concepts and Use Cases", RFC 6789, DOI 10.17487/RFC6789, December2012.2012, <http://www.rfc-editor.org/info/rfc6789>. [SCTP-ECN] Stewart, R., Tuexen, M., and X. Dong, "ECN for Stream Control Transmission Protocol (SCTP)", Work in Progress, draft-stewart-tsvwg-sctpecn-05, January 2014. [TEST-RCV] Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to Allow Senders to Identify Receiver Non-Compliance", Work in Progress, draft-moncaster-tcpm-rcv-cheat-03, July 2014. Appendix A. Ambiguity of the More Accurate ECN Feedback in DCTCP As defined in[I-D.bensley-tcpm-dctcp],[DCTCP], a DCTCP receiver feeds back ECE=0 on delayed ACKs as long as CE remains 0, and also immediately sends an ACK with ECE=0 when CE transitions to 1. Similarly, it continually feeds back ECE=1 on delayed ACKs while CE remains 1 and immediately feeds back ECE=1 when CE transitions to 0. A sender can unambiguously decode this scheme if there is never any ACK loss, and the sender assumes there will never be any ACK loss. The following two examples show that the feedback sequence becomes highly ambiguous to thesender,sender if either of these conditions is broken. Below, '0'will representrepresents ECE=0, '1'will represent ECE=1represents ECE=1, and '.'will representrepresents a gap of one segment between delayed ACKs. Now imagine that the sender receives the following sequence of feedback on3three pure ACKs: 0.0.0 When the receiver sent thissequencesequence, it could have been any of the following four sequences: a. 0.0.0 (0 x CE) b. 010.0 (1 x CE) c. 0.010 (1 x CE) d. 01010 (2 x CE) where any of the 1s represent a possible pure ACK carrying ECE feedback that could have been lost. If the sender guesses (a), it might be correct, or it might miss 1 or 2 congestion marks over 5 packets. Therefore, when confronted with this simple sequence (that is not contrived), a sender can guess that congestion might have been 0%,20%20%, or 40%, but it doesn't know which. Sequences with a longer gap(e.g.(e.g., 0...0.0) become far more ambiguous. It helps a little if the sender knows the distance the receiver uses between delayed ACKs, and it helps a lot if the distance is 1,i.e.i.e., no delayedACKs, butACKs. However, eventhenwithout delayed ACKs there will still be ambiguity whenever there are pure ACK losses. Acknowledgements Thanks to Gorry Fairhurst for his review and for ideas on CE-based integrity checking and to Mohammad Alizadeh for suggesting the need to avoid bias. Bob Briscoe was partly funded by the European Community under its Seventh Framework Programme through the Reducing Internet Transport Latency (RITE) project (ICT-317700) and through the Trilogy 2 project (ICT-317756). The views expressed here are solely those of the authors, in the context of the mentioned funding projects. Authors' Addresses Mirja Kuehlewind (editor) ETH Zurich Gloriastrasse 35 Zurich 8092 Switzerland Email: mirja.kuehlewind@tik.ee.ethz.ch Richard Scheffenegger NetApp, Inc. Am Euro Platz 2 Vienna 1120 Austria Phone: +43 1 3676811 3146 Email: rs@netapp.com Bob Briscoe BT B54/77, Adastral Park Martlesham Heath Ipswich IP5 3REUKUnited Kingdom Phone: +44 1473 645196 Email:bob.briscoe@bt.comietf@bobbriscoe.net URI: http://bobbriscoe.net/