Internet Engineering Task Force (IETF)                             Y. Li
Request for Comments: 7379                                        W. Hao
Category: Informational                              Huawei Technologies
ISSN: 2070-1721                                               R. Perlman
                                                                     EMC
                                                               J. Hudson
                                                                 Brocade
                                                                 H. Zhai
                                                                     JIT
                                                            October 2014

        Problem Statement and Goals for Active-Active Connection
  at the Transparent Interconnection of Lots of Links (TRILL) Edge

Abstract

The IETF TRILL (Transparent Interconnection of Lots of Links) protocol
provides support for flow-level multipathing with rapid failover for
both unicast and multi-destination traffic in networks with arbitrary
topology. Active-active connection at the TRILL edge is the extension
of these characteristics to end stations that are multiply connected to
a TRILL campus. This informational document discusses the high-level
problems and goals when providing active-active connection at the TRILL
edge.

Status of This Memo

This document is not an Internet Standards Track specification; it is
published for informational purposes.

This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents approved
by the IESG are a candidate for any level of Internet Standard; see
Section 2 of RFC 5741.

Information about the current status of this document, any errata, and
how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc7379.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Terminology
   2. Target Scenario
      2.1. LAALP and Edge Group Characteristics
   3. Problems in Active-Active Connection at the TRILL Edge
      3.1. Frame Duplications
      3.2. Loopback
      3.3. Address Flip-Flop
      3.4. Unsynchronized Information among Member RBridges
   4. High-Level Requirements and Goals for Solutions
   5. Security Considerations
   6. References
      6.1. Normative References
      6.2. Informative References
   Acknowledgments
   Authors' Addresses

1. Introduction

The IETF TRILL (Transparent Interconnection of Lots of Links) [RFC6325]
protocol provides loop-free and per-hop-based multipath data forwarding
with minimum configuration. TRILL uses IS-IS [IS-IS] [RFC6165]
[RFC7176] as its control-plane routing protocol and defines a
TRILL-specific header for user data. In a TRILL campus, communications
between TRILL switches can:

   1) use multiple parallel links and/or paths,

   2) spread load over different links and/or paths at a fine-grained
      flow level through equal-cost multipathing of unicast traffic and
      multiple distribution trees for multi-destination traffic, and

   3) rapidly reconfigure to accommodate link or node failures or
      additions.

To the degree practical, "active-active" is the extension of similar
load spreading and robustness to the connections between end stations
and the TRILL campus.
Such end stations may have multiple ports and will be connected,
directly or via bridges, to multiple edge TRILL switches. It must be
possible, except in some failure conditions, to spread end-station
traffic load at the granularity of flows across links to such multiple
edge TRILL switches and rapidly reconfigure to accommodate topology
changes.

1.1. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

The acronyms and terminology in the RBridges base protocol [RFC6325]
are used herein with the following additions:

   CE:           Customer Equipment (end station or bridge).

   Data Label:   VLAN or FGL (Fine-Grained Label [RFC7172]).

   LAALP:        Local Active-Active Link Protocol. Any protocol
                 similar to MC-LAG that runs in a distributed fashion
                 on a CE, on the links from that CE to a set of edge
                 group RBridges, and on those RBridges.

   MC-LAG:       Multi-Chassis Link Aggregation. Proprietary extensions
                 to IEEE Std 802.1AX-2011 [802.1AX] so that the
                 aggregated links can, at one end of the aggregation,
                 attach to different switches.

   Edge group:   a group of edge RBridges to which at least one CE is
                 multiply attached using an LAALP. When multiple CEs
                 attach to the exact same set of edge RBridges, those
                 edge RBridges can be considered as a single edge
                 group. An RBridge can be in more than one edge group.

   RBridge:      Routing Bridge. An alternative name for a TRILL
                 switch.

   TRILL:        Transparent Interconnection of Lots of Links.

   TRILL switch: a device that implements the TRILL protocol; an
                 alternative term for an RBridge.

2. Target Scenario

This section presents a typical scenario of active-active connection to
a TRILL campus via multiple edge RBridges where the current TRILL
Appointed Forwarder mechanism does not work as expected.

The TRILL Appointed Forwarder mechanism [RFC6439] can handle failover
(active-standby), provides loop avoidance, and, with administrative
configuration, provides load spreading based on VLAN. One and only one
appointed RBridge can ingress/egress native frames into/from the TRILL
campus for a given VLAN among all edge RBridges connecting a legacy
network to the TRILL campus. This is true whether the legacy network is
a simple point-to-point link or a complex bridged LAN or anything in
between. By carefully selecting different RBridges as Appointed
Forwarder for different sets of VLANs, load spreading over different
edge RBridges across different Data Labels can be achieved.

The Appointed Forwarder mechanism [RFC6439] requires all of the edge
group RBridges to exchange TRILL IS-IS Hello packets through their
access ports. As Figure 1 shows, when multiple access links of multiple
edge RBridges are connected to a CE by an LAALP, Hello messages sent by
RB1 via access port to CE1 will not be forwarded to RB2 by CE1. RB2
(and other members of LAALP1) will not see that Hello from RB1 via the
LAALP1. Every member RBridge of LAALP1 thinks of itself as Appointed
Forwarder on an LAALP1 link for all VLANs and will ingress/egress
frames. Hence, the Appointed Forwarder mechanism cannot provide
active-active or even active-standby service across the edge group in
such a scenario.
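
The effect of Hellos not crossing the CE can be illustrated with a
small model. This is only a sketch under stated assumptions: the
function name forwarder_views and the "smallest name wins" tie-break
are hypothetical stand-ins and do not reproduce the actual appointment
procedure of [RFC6439]. The model captures one point only, namely that
an RBridge can defer only to peers whose Hellos it actually receives.

```python
def forwarder_views(edge_group, hellos_reach_peers):
    """Map each RBridge to the RBridge it believes is the forwarder.

    hellos_reach_peers models whether a Hello sent on the access link
    is delivered to the other edge RBridges (True on a bridged LAN,
    False when the CE runs an LAALP and never relays between uplinks).
    """
    views = {}
    for rb in edge_group:
        # An RBridge always knows about itself.
        visible = {rb}
        if hellos_reach_peers:
            # Peers are discovered only through received Hellos.
            visible.update(edge_group)
        # Hypothetical tie-break, for illustration only.
        views[rb] = min(visible)
    return views


edge_group = ["RB1", "RB2", "RB3"]
bridged_lan = forwarder_views(edge_group, hellos_reach_peers=True)
laalp = forwarder_views(edge_group, hellos_reach_peers=False)
```

With Hellos relayed, every RBridge agrees on one forwarder. Behind an
LAALP, each RBridge sees only itself and so selects itself, which is
the situation described above.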

                 ----------------------
                |                      |
                |     TRILL Campus     |
                |                      |
                 ----------------------
                     |       |    |
                -----        |     --------
               |             |             |
           +------+      +------+      +------+
           |      |      |      |      |      |
           |(RB1) |      |(RB2) |      | (RBk)|
           +------+      +------+      +------+
             |..|          |..|          |..|
             |  +----+     |  |          |  |
             |   +---|-----|--|----------+  |
             | +-|---|-----+  +-----------+ |
             | | |   +------------------+ | |
  LAALP1--->(| | |)                    (| | |) <---LAALPn
           +-------+     .  .  .      +-------+
           |  CE1  |                  |  CEn  |
           |       |                  |       |
           +-------+                  +-------+

     Figure 1: Active-Active Connection to TRILL Edge RBridges

Active-active connection is useful when we want to achieve the
following two goals:

   -  Flow-based rather than VLAN-based load balancing is desired.

   -  More rapid failure recovery is desired.

The current Appointed Forwarder mechanism relies on the TRILL Hello
timer expiration to detect the unreachability of another edge RBridge
connecting to the same local link. Then, reappointing the forwarder for
specific VLANs may be required. Such procedures take time on the scale
of seconds although this can be improved with TRILL use of
Bidirectional Forwarding Detection (BFD) [RFC7175]. Active-active
connection usually has a faster built-in mechanism for member node
and/or link failure detection. Faster detection of failures minimizes
the frame loss and recovery time.

Today, LAALP is usually a proprietary facility whose implementation
varies by vendor. So, to be sure the LAALP operates successfully across
a group of edge RBridges, those edge RBridges will almost always have
to be from the same vendor. In the case where the LAALP is an MC-LAG,
the CE normally implements the logic described in IEEE Std 802.1AX-2011
[802.1AX], so proprietary elements would only be at the end of the edge
group.
There is also a revision of IEEE Std 802.1AX-2011 [802.1AX] underway
(802.1AX-REV) to remove the restriction in IEEE Std 802.1AX-2011
[802.1AX] that there be one box at each end of the aggregation. So it
is possible that, in the future, an LAALP could be implemented through
such a revised IEEE Std 802.1AX-2011 [802.1AX] with
standards-conformant logic at the ends of both the CE and edge group.
In order to have a common understanding of active-active connection
scenarios, the assumptions in Section 2.1 are made about the
characteristics of the LAALP and edge group of RBridges.

2.1. LAALP and Edge Group Characteristics

For a CE connecting to multiple edge RBridges via an LAALP
(active-active connection), the following characteristics apply:

   a) The LAALP will deliver a frame from an end node to TRILL at
      exactly one edge group RBridge.

   b) The LAALP will never forward frames it receives from one uplink
      to another.

   c) The LAALP will attempt to send all frames for a given flow on the
      same uplink. To do this, it has some unknown rule for which
      frames get sent to which uplinks (typically based on a simple
      hash function of Layer 2 through 4 header fields).

   d) Frames are accepted from any of the uplinks and passed down to
      end nodes (if any exist).

   e) The LAALP cannot be assumed to send useful control information to
      the uplink such as "this is the set of other RBridges to which
      this CE is attached" or "these are all the MAC addresses
      attached".

For an edge group of RBridges to which a CE is multiply attached with
an LAALP:

   a) Any two RBridges in the edge group are reachable from each other
      via the TRILL campus.

   b) Each RBridge in the edge group knows an ID for each LAALP
      instance multiply attached to that group. The ID will be
      consistent across the edge group and globally unique across the
      TRILL campus. For example, if CE1 attaches to RB1, RB2, ... RBn
      using an LAALP, then each of the RBs will know, for the port to
      CE1, that it has a label such as "LAALP1".

   c) Each RB in the edge group can be configured with the set of
      acceptable VLANs for the ports to any CE. The acceptable VLANs
      configured for those ports should include all the VLANs the CE
      has joined and be consistent for all the member RBridges of the
      edge group.

   d) When an RBridge fails, all the other RBridges that have formed an
      LAALP instance with it learn of the failure in a timely fashion.

   e) When a downlink of an edge group RBridge to an LAALP instance
      fails, that RBridge and all the other RBridges participating in
      the LAALP instance that includes that downlink learn of the
      failure in a timely fashion.

   f) The RBridges in the edge group have a mechanism to exchange
      information with each other, such as the set of CEs they are
      connecting to or the IDs of the LAALP instances their downlinks
      are part of.

Other than the applicable characteristics above, the internals of an
LAALP are out of the scope of TRILL.

3. Problems in Active-Active Connection at the TRILL Edge

This section presents the problems that need to be addressed in
active-active connection scenarios. The topology in Figure 1 is used in
the following sub-sections as the example scenario for illustration
purposes.

3.1. Frame Duplications

When a remote RBridge ingresses a multi-destination TRILL data packet
in VLAN x, all edge group RBridges of LAALP1 will receive the frame if
any local CE1 joins VLAN x. As each of them thinks it is the Appointed
Forwarder for VLAN x, without changes made for active-active connection
support, they would all forward the frame to CE1. The bad consequence
is that CE1 receives multiple copies of that multi-destination frame
from the remote end-host source.
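
The duplication can be counted directly in a toy model. The sketch
below is illustrative only; copies_delivered_to_ce and the dictionaries
are hypothetical names, and the model assumes every edge group RBridge
has received the same multi-destination frame from the campus.

```python
def copies_delivered_to_ce(edge_group, believes_forwarder):
    """Count the copies of one multi-destination frame that reach the CE.

    Each edge RBridge that believes it is the Appointed Forwarder for
    the frame's VLAN decapsulates the frame and egresses one copy.
    """
    return sum(1 for rb in edge_group if believes_forwarder[rb])


edge_group = ["RB1", "RB2", "RB3"]

# Behind an LAALP, every member believes it is the forwarder.
unmodified = {rb: True for rb in edge_group}
# Desired behavior: exactly one member egresses the frame.
single_egress = {"RB1": True, "RB2": False, "RB3": False}

duplicated = copies_delivered_to_ce(edge_group, unmodified)
expected = copies_delivered_to_ce(edge_group, single_egress)
```

In this model the unmodified case delivers one copy per edge group
member, while a solution meeting the goals of this document would
deliver a single copy.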

Frame duplication may also occur when an ingress RBridge is non-remote,
for example, when ingress and egress are two RBridges belonging to the
same edge group. Assume LAALP m connects to an edge group g, and the
edge group g consists of RB1, RB2, and RB3. The multi-destination
frames ingressed from a port not connected to LAALP m by RB1 can be
locally replicated to other ports on RB1 and also TRILL encapsulated
and forwarded to RB2 and RB3. CE1 will receive duplicate copies from
RB1, RB2, and RB3.

Note that frame duplication is only a problem in multi-destination
frame forwarding. Unicast forwarding does not have this issue as there
is only ever one copy of the packet.

3.2. Loopback

As shown in Figure 1, CE1 may send a native multi-destination frame to
the TRILL campus via a member of the LAALP1 edge group (say RB1). This
frame will be TRILL encapsulated and then forwarded through the campus
to the multi-destination receivers. Other members (say RB2) of the same
LAALP edge group will receive this multicast packet as well. In this
case, without changes made for active-active connection support, RB2
will decapsulate the frame and egress it. The frame loops back to CE1.

3.3. Address Flip-Flop

Consider RB1 and RB2 using their own nickname as ingress nickname for
data into a TRILL campus. As shown in Figure 1, CE1 may send a data
frame with the same VLAN and source Media Access Control (MAC) address
to any member of the edge group LAALP1. If an egress RBridge receives
TRILL data packets from different ingress RBridges but with the same
source Data Label and MAC address, it learns different correspondences
between a {Data Label and MAC address} and nickname when decapsulating
the data frames. Address correspondence may keep flip-flopping among
nicknames of the member RBridges of the LAALP for the same Data Label
and MAC address.
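
The flip-flop can be reproduced with a learning table that keeps one
nickname per {Data Label, MAC address} pair. This is a minimal sketch
and not the behavior of any particular implementation; the function
learn, the Data Label value 10, and the arrival order are assumptions
chosen for illustration, and the MAC address is taken from the range
reserved for documentation.

```python
def learn(table, data_label, mac, ingress_nickname):
    """Record the ingress nickname for (data_label, mac).

    Returns True when an existing entry is overwritten with a
    different nickname, that is, when the address "flips".
    """
    key = (data_label, mac)
    flipped = key in table and table[key] != ingress_nickname
    # Single-entry learning: the newest nickname replaces the old one.
    table[key] = ingress_nickname
    return flipped


# Frames from CE1 with one source MAC, spread by the LAALP across
# RB1 and RB2 and therefore arriving with alternating ingress
# nicknames at a remote egress RBridge.
arrivals = ["RB1", "RB2", "RB1", "RB2"]
table = {}
flips = sum(learn(table, 10, "00:00:5e:00:53:01", nick) for nick in arrivals)
```

Every arrival after the first overwrites the previous entry, so the
table never settles while both uplinks carry traffic from the same
source address.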

Existing hardware does not support data-plane learning of multiple
nicknames for the same MAC address and Data Label: when data-plane
learning indicates attachment of the MAC address to a new nickname, it
overwrites the old attachment nickname. Implementers have stated that
most current TRILL switch hardware, when doing data-plane learning,
behaves badly under these circumstances and, for example, interprets
address flip-flopping as a severe network problem. It may also cause
the returning traffic to go through different paths to reach the
destination, resulting in persistent reordering of the frames.

3.4. Unsynchronized Information among Member RBridges

A local RBridge, say RB1 connected to LAALP1, may have learned a
correspondence between a {Data Label and MAC address} and nickname for
a remote host h1 when h1 sends a packet to CE1. The returning traffic
from CE1 may go to any other member RBridge of LAALP1, for example,
RB2. RB2 may not have h1's correspondence between a {Data Label and MAC
address} and nickname stored. Therefore, it has to do the flooding for
unknown unicast addresses [RFC6325]. Such flooding is unnecessary since
the returning traffic is almost always expected and RB1 had learned the
address correspondence. It is desirable to avoid flooding; it imposes a
greater burden on the network than known destination unicast traffic
because the flooded traffic is sent over more links.

Synchronization of the correspondence between a {Data Label and MAC
address} and nickname information among member RBridges will reduce
such unnecessary flooding.

4. High-Level Requirements and Goals for Solutions

The problems identified in Section 3 should be solved in any solution
for active-active connection to edge RBridges. The following high-level
requirements and goals should be met.

Data plane:

   1) All uplinks of a CE MUST be active: the LAALP is free to choose
      any uplink on which to send packets, and the CE is able to
      receive packets from any uplink of an edge group.

   2) Loopback and frame duplication MUST be prevented.

   3) Learning of correspondence between a {Data Label and MAC address}
      and nickname by a remote RBridge MUST NOT flip-flop between the
      local multiply attached edge RBridges.

   4) Packets for a flow SHOULD stay in order.

   5) The Reverse Path Forwarding Check MUST work properly as per the
      RBridges base protocol [RFC6325].

   6) Single uplink failure on a CE to an edge group MUST NOT cause
      persistent packet delivery failure between a TRILL campus and CE.

Control plane:

   1) There is no requirement for new information to be passed between
      edge RBridges and CEs or between edge RBridges and end nodes.

   2) If there is any TRILL-specific information required to be
      exchanged between RBridges in an edge group, for example, Data
      Labels and MAC addresses binding to nicknames, a solution MUST
      specify the mechanism to perform such exchange unless this is
      handled internal to the LAALP.

   3) RBridges SHOULD be able to discover other members in the same
      edge group by exchanging their LAALP attachment information.

Configuration, incremental deployment, and others:

   1) Solution SHOULD require minimal configuration.

   2) Solution SHOULD automatically detect misconfiguration of edge
      RBridge group.

   3) Solution SHOULD support incremental deployment, that is, not
      require campus-wide upgrading for all RBridges, only changes to
      the edge group RBridges.

   4) Solution SHOULD be able to support from two up to at least four
      active-active uplinks on a multiply attached CE.

   5) Solution SHOULD NOT assume there is a dedicated physical link
      between any two edge RBridges in an edge group.

5. Security Considerations

As an informational overview, this document does not introduce any
extra security risks. Security risks introduced by a particular LAALP
or other elements of solutions to the problems presented here will be
discussed in the separate document(s) describing such LAALP or
solutions.

End-station links in TRILL are Ethernet links, and consideration should
be given to securing them with link security as described in IEEE Std
802.1AE-2006 [802.1AE] for the protection of end-station data and
link-level control messages, including any LAALP control messages.

For general TRILL Security Considerations, see the RBridges base
protocol [RFC6325].

6. References

6.1. Normative References

[IS-IS]    ISO/IEC, "Information technology -- Telecommunications and
           information exchange between systems -- Intermediate System
           to Intermediate System intra-domain routeing information
           exchange protocol for use in conjunction with the protocol
           for providing the connectionless-mode network service (ISO
           8473)", ISO/IEC 10589:2002, Second Edition, 2002.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997,
           <http://www.rfc-editor.org/info/rfc2119>.

[RFC6165]  Banerjee, A. and D. Ward, "Extensions to IS-IS for Layer-2
           Systems", RFC 6165, April 2011,
           <http://www.rfc-editor.org/info/rfc6165>.

[RFC6325]  Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
           Ghanwani, "Routing Bridges (RBridges): Base Protocol
           Specification", RFC 6325, July 2011,
           <http://www.rfc-editor.org/info/rfc6325>.

[RFC6439]  Perlman, R., Eastlake, D., Li, Y., Banerjee, A., and F. Hu,
           "Routing Bridges (RBridges): Appointed Forwarders", RFC
           6439, November 2011,
           <http://www.rfc-editor.org/info/rfc6439>.

[RFC7172]  Eastlake 3rd, D., Zhang, M., Agarwal, P., Perlman, R., and
           D. Dutt, "Transparent Interconnection of Lots of Links
           (TRILL): Fine-Grained Labeling", RFC 7172, May 2014,
           <http://www.rfc-editor.org/info/rfc7172>.

[RFC7176]  Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, D.,
           and A. Banerjee, "Transparent Interconnection of Lots of
           Links (TRILL) Use of IS-IS", RFC 7176, May 2014,
           <http://www.rfc-editor.org/info/rfc7176>.

6.2. Informative References

[802.1AE]  IEEE, "IEEE Standard for Local and metropolitan area
           networks -- Media Access Control (MAC) Security", IEEE Std
           802.1AE-2006, 18 August 2006.

[802.1AX]  IEEE, "IEEE Standard for Local and metropolitan area
           networks -- Link Aggregation", IEEE Std 802.1AX-2008, 3
           November 2008.

[RFC7175]  Manral, V., Eastlake 3rd, D., Ward, D., and A. Banerjee,
           "Transparent Interconnection of Lots of Links (TRILL):
           Bidirectional Forwarding Detection (BFD) Support", RFC 7175,
           May 2014, <http://www.rfc-editor.org/info/rfc7175>.

Acknowledgments

Special acknowledgments to Donald Eastlake, Adrian Farrel, and Mingui
Zhang for their valuable comments.

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China
   Phone: +86-25-56625409
   EMail: liyizhou@huawei.com

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China
   Phone: +86-25-56623144
   EMail: haoweiguo@huawei.com

   Radia Perlman
   EMC
   2010 256th Avenue NE, #200
   Bellevue, WA 98007
   United States
   EMail: Radia@alum.mit.edu

   Jon Hudson
   Brocade
   130 Holger Way
   San Jose, CA 95134
   United States
   Phone: +1-408-333-4062
   EMail: jon.hudson@gmail.com

   Hongjun Zhai
   JIT
   EMail: honjun.zhai@tom.com