BESS WorkgroupInternet Engineering Task Force (IETF) J. Rabadan, Ed.Internet DraftRequest for Comments: 8584 Nokia Updates: 7432 S. Mohanty, Ed.Intended status:Category: Standards Track A. Sajassi ISSN: 2070-1721 Cisco J. Drake Juniper K. Nagaraj S. Sathappan NokiaExpires: July 28, 2019 January 24,April 2019 Framework forEVPNEthernet VPN Designated Forwarder Election Extensibilitydraft-ietf-bess-evpn-df-election-framework-09Abstract An alternative to theDefaultdefault Designated Forwarder (DF) selection algorithm in EthernetVPN (EVPN) networksVPNs (EVPNs) is defined. The DF is the Provider Edge (PE) router responsible for sendingbroadcast, unknown unicastBroadcast, Unknown Unicast, andmulticastMulticast (BUM) traffic tomulti-homeda multihomed CustomerEquipmentEdge (CE) device on a given VLAN on a particular Ethernet Segment(ES) within a VLAN.(ES). In addition, thecapabilityability to influence the DF election result for a VLAN based on the state of the associated Attachment Circuit (AC) is specified. This document clarifies the DFElectionelection Finite State Machine inEVPN, thereforeEVPN services. Therefore, it updates the EVPNspecification.specification (RFC 7432). Status ofthisThis Memo ThisInternet-Draftissubmitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documentsan Internet Standards Track document. This document is a product of the Internet Engineering Task Force(IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum(IETF). It represents the consensus ofsix monthsthe IETF community. It has received public review andmay be updated, replaced, or obsoletedhas been approved for publication byother documents at any time. Itthe Internet Engineering Steering Group (IESG). Further information on Internet Standards isinappropriate to use Internet-Drafts as reference material or to cite them other than as "workavailable inprogress." The listSection 2 of RFC 7841. Information about the currentInternet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The liststatus ofInternet-Draft Shadow Directories canthis document, any errata, and how to provide feedback on it may beaccessedobtained athttp://www.ietf.org/shadow.html This Internet-Draft will expire on July 28, 2019.https://www.rfc-editor.org/info/rfc8584. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents(http://trustee.ietf.org/license-info)(https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . 3....................................................3 1.1. Conventions and Terminology ................................3 1.2. Default Designated Forwarder (DF) Election in EVPN. . . . 3 1.2.Services ...................................................5 1.3. Problem Statement. . . . . . . . . . . . . . . . . . . . . 6 1.2.1...........................................8 1.3.1. UnfairLoad-BalancingLoad Balancing and Service Disruption. . . . . 6 1.2.2.........8 1.3.2. Traffic Black-Holing on Individual AC Failures. . . . 7 1.3......10 1.4. The Need for Extending the Default DF Election in EVPN. . 10Services .............................................12 2.Conventions and Terminology . . . . . . . . . . . . . . . . . . 11 3.Designated Forwarder Election Protocol and BGP Extensions. . . 12 3.1.......13 2.1. The DF Election Finite State Machine (FSM). . . . . . . . 12 3.2.................13 2.2. The DF Election Extended Community. . . . . . . . . . . . 15 3.2.1.........................16 2.2.1. Backward Compatibility. . . . . . . . . . . . . . . . 18 3.3. Auto-Derivation of ES-Import Route Target . . . . . . . . . 18 4..............................19 3. The Highest Random Weight DF Election Algorithm. . . . . . . . 18 4.1.................19 3.1. HRW and Consistent Hashing. . . . . . . . . . . . . . . . 19 4.2.................................20 3.2. HRW Algorithm for EVPN DF Election. . . . . . . . . . . . 19 5.........................20 4. TheAttachment Circuit InfluencedAC-Influenced DF Election Capability. . . 21 5.1........................22 4.1. AC-Influenced DF Election CapabilityForfor VLAN-Aware Bundle Services. . . . . . . . . . . . . . . . . . . . . . 23 6.................................24 5. Solution Benefits. . . . . . . . . . . . . . . . . . . . . . . 24 7...............................................25 6. Security Considerations. . . . . . . . . . . . . . . . . . . . 25 8.........................................26 7. IANA Considerations. . . . . . . . . . . . . . . . . . . . . . 25 9.............................................27 8. References. . . . . . . . . . . . . . . . . . . . . . . . . . 26 9.1......................................................28 8.1. Normative References. . . . . . . . . . . . . . . . . . . 26 9.2.......................................28 8.2. Informative References. . . . . . . . . . . . . . . . . . 27 10.....................................29 Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . 27 11....................................................30 Contributors. . . . . . . . . . . . . . . . . . . . . . . . . 28......................................................30 Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . . . 28................................................31 1. Introduction The Designated Forwarder (DF) inEVPN networksEthernet VPNs (EVPNs) is the Provider Edge (PE) router responsible for sendingbroadcast, unknown unicastBroadcast, Unknown Unicast, andmulticastMulticast (BUM) traffic to amulti-homedmultihomed CustomerEquipmentEdge (CE)device,device on a given VLAN on a particular Ethernet Segment (ES). The DF isselected outelected from the set of multihomed PEs attached to alistgiven ES, each ofcandidate PEs that advertisewhich advertises an ES route for thesameES as identified by its Ethernet Segment Identifier(ESI) to the EVPN network.(ESI). By default, the EVPN uses a DFElectionelection algorithm referred to as"Service Carving" and it"service carving". The DF election algorithm is based on a modulus function (V mod N) that takes the number of PEs in the ES (N) and the VLAN value (V) as input. ThisDefault DF Election algorithm has some inefficiencies that thisdocument addressesby defining ainefficiencies in the default DF election algorithm by defining a new DFElectionelection algorithm anda capabilityan ability to influence the DFElectionelection result for a VLAN, depending on the state of the associated Attachment Circuit (AC). In order to avoid any ambiguity with the identifier used in the DFElection Algorithm,election algorithm, this document uses the termEthernet Tag"Ethernet Tag" instead ofVLAN."VLAN". This document also creates a registry withIANA,IANA for future DFElection Algorithmselection algorithms andCapabilities.capabilities (see Section 7). It also presents a formal definition and clarification of the DFElectionelection Finite State Machine(FSM), therefore the(FSM). Therefore, this document updates[RFC7432][RFC7432], and EVPN implementations MUST conform to the prescribed FSM. The procedures described in this document apply to DF election in all EVPNsolutionssolutions, including those described in [RFC7432] and [RFC8214]. Apart from theFSMformaldescription,description of the FSM, this document does not intend to update other[RFC7432] procedures. Itprocedures described in [RFC7432]; it only aims to improve the behavior of the DFElectionelection on PEs that are upgraded to follow the procedures describedprocedures.in this document. 1.1.Default Designated Forwarder (DF) ElectionConventions and Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" inEVPN [RFC7432] defines the Designated Forwarder (DF)this document are to be interpreted asthe EVPN PE responsible for:described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. oFloodingAC: Attachment Circuit. An AC has an Ethernet Tag associated with it. o ACS: Attachment Circuit Status. o BUM: Broadcast,Unknown unicastunknown unicast, andMulticast traffic (BUM), on a givenmulticast. o DF: Designated Forwarder. o NDF: Non-Designated Forwarder. o BDF: Backup Designated Forwarder. o EthernetTag on a particularA-D per ES route: Refers to Route Type 1 as defined in [RFC7432] or to Auto-discovery per Ethernet Segment(ES),route. o Ethernet A-D per EVI route: Refers tothe CE. This is valid for single-active and all-activeRoute Type 1 as defined in [RFC7432] or to Auto-discovery per EVPNmulti-homing.Instance route. oSending unicast trafficES: Ethernet Segment. o ESI: Ethernet Segment Identifier. o EVI: EVPN Instance. o MAC-VRF: A Virtual Routing and Forwarding table for Media Access Control (MAC) addresses on agiven Ethernet TagPE. o BD: Broadcast Domain. An EVI may be comprised of one BD (VLAN-based or VLAN Bundle services) or multiple BDs (VLAN-aware Bundle services). o Bridge table: An instantiation of a BD on aparticular ES to the CE. This is valid for single-active multi-homing. Figure 1 illustrates an example that we will use to explain the Designated Forwarder function. +---------------+ | IP/MPLS | | CORE | +----+ ES1 +----+ +----+ | CE1|-----| | | |____ES2 +----+ | PE1| | PE2| \ | | +----+ \+----+ +----+ | | CE2| | +----+ /+----+ | | |____/ | | | PE3| ES2 / | +----+ / | | / +-------------+----+ / | PE4|____/ES2 | | +----+ Figure 1 Multi-homing Network of EVPN Figure 1 illustrates a case where there are two Ethernet Segments, ES1 and ES2. PE1 is attached to CE1 viaMAC-VRF. o HRW: Highest Random Weight. o VID: VLAN Identifier. o CE-VID: Customer Edge VLAN Identifier. o EthernetSegment ES1 whereas PE2, PE3 and PE4 are attachedTag: Used toCE2 via ES2 i.e. PE2, PE3 and PE4 formrepresent aredundancy group. Since CE2 is multi-homed to different PEs on the same Ethernet Segment, itBD that isnecessary for PE2, PE3 and PE4 to agreeconfigured on aDF to satisfy the above mentioned requirements. The effect of forwarding loops in a Layer-2 network is particularly severe because of the broadcast nature of Ethernet traffic andgiven ES for thelackpurpose ofa Time-To-Live (TTL). Therefore it is very importantDF election. Note thatin the case of a multi-homed CE only oneany of thePEsfollowing may be used tosend BUM traffic to it. Onerepresent a BD: VIDs (including Q-in-Q tags), configured IDs, VNIs (Virtual Extensible Local Area Network (VXLAN) Network Identifiers), normalized VIDs, I-SIDs (Service Instance Identifiers), etc., as long as the representation of thepre-requisites for this supportBDs isthat participatingconfigured consistently across the multihomed PEsmust agree amongst themselves as to who would act as the Designated Forwarder (DF). This needsattached to that ES. The Ethernet Tag value MUST beachieved through a distributed algorithmdifferent from zero. o Ethernet Tag ID: Refers to the identifier used inwhich each participating PE independently and unambiguously selects one oftheparticipating PEsEVPN routes defined in [RFC7432]. Its value may be the same as theDF, andEthernet Tag value (see theresult should be consistent and unanimous. The default algorithmdefinition forDF election defined by [RFC7432] atEthernet Tag) when advertising routes for VLAN-aware Bundle services. Note that in thegranularitycase of(ESI,EVI)VLAN-based or VLAN Bundle services, the Ethernet Tag ID isreferred to as "service carving". In this document, service carving and Defaultzero. o DFElection algorithm are used interchangeably. With service carving, it is possibleelection procedure: Also called "DF election". Refers toelect multiple DFs per Ethernet Segment (one per EVI)the process inorder to perform load-balancingits entirety, including the discovery oftraffic destined to a given Segment. The objective is thattheload-balancing procedures should carve upPEs in theBD space amongES, the creation and maintenance of theredundantPEnodes evenly, in suchcandidate list, and the selection of away that every PE isPE. o DF algorithm: A component of the DF election procedure. Strictly refers to the selection of a PE for adistinct setgiven <ES, Ethernet Tag>. o RR: Route Reflector. A network routing component for BGP [RFC4456]. It offers an alternative to the logical full-mesh requirement ofEVIs.the Internal Border Gateway Protocol (IBGP). TheDF Election algorithm as described in [RFC7432] (Section 8.5)purpose of the RR isbased onconcentration. Multiple BGP routers can peer with amodulus operation. The PEs to whichcentral point, theES (for which DF election is to be carried out per EVI) is multi-homed formRR -- acting as a route reflector server -- rather than peer with every other router in a full mesh. This results in anordered (ordinal) listO(N) peering as opposed to O(N^2). o TTL: Time To Live. This document also assumes that the reader is familiar with the terminology provided inascending order of[RFC7432]. 1.2. Default Designated Forwarder (DF) Election in EVPN Services [RFC7432] defines thePE IP address values. For example, there are N PEs: PE0, PE1,... PEN-1 rankedDF asper increasing IP addresses intheordinal list; then for each VLAN withEVPN PE responsible for: o Flooding BUM traffic on a given Ethernet TagV, configuredon a particular ES to theEthernet Segment ES1, PExCE. This isthe DFvalid forVLAN V on ES1 when x equals (V mod N). In the case of VLAN Bundle only the lowest VLAN is used. In the case when the planned density is high (meaning there are significant number of VLANsSingle-Active andtheAll-Active EVPN multihoming. o Sending unicast traffic on a given EthernetTags are uniformly distributed),Tag on a particular ES to thethinkingCE. This is valid for Single-Active multihoming. Figure 1 illustrates an example thatthe DF Electionwe willbe spread across the PEs hosting that Ethernet Segment and good load- balancing can be achieved. However, the described Default DF Election algorithm has some undesirable properties and in some cases can be somewhat disruptive and unfair. This document describes some of those issues and defines a mechanism for dealing with them. These mechanisms do involve changesuse to explain theDefaultDFElection algorithm, but they do not require any changes to the EVPN Route exchange and have minimal changes in thefunction. +---------------+ | IP/MPLS | | Core | +----+ ES1 +----+ +----+ | CE1|-----| | | |____ES2 +----+ | PE1| | PE2| \ | | +----+ \+----+ +----+ | | CE2| | +----+ /+----+ | | |____/ | | | PE3| ES2 / | +----+ / | | / +-------------+----+ / | PE4|____/ES2 | | +----+ Figure 1: EVPNroutes. In addition,Multihoming Figure 1 illustrates a case where there are two ESes: ES1 and ES2. PE1 is attached to CE1 via ES1, whereas PE2, PE3, and PE4 are attached to CE2 via ES2, i.e., PE2, PE3, and PE4 form aneedredundancy group. Since CE2 is multihomed toextenddifferent PEs on theDF Election procedures so that new algorithmssame ES, it is necessary for PE2, PE3, andcapabilities are possible. A single algorithm (the DefaultPE4 to agree on a DFElection algorithm) may not meetto satisfy therequirementsabove-mentioned requirements. The effect of forwarding loops inall the use-cases. Note that while [RFC7432] elects a DF per <ES, EVI>, this document electsaDF per <ES, BD>. This means that unlike [RFC7432], where forLayer 2 network is particularly severe because of the broadcast nature of Ethernet traffic and the lack of aVLAN-Aware Bundle service EVI thereTTL. Therefore, it is very important that, in the case of a multihomed CE, only oneDF forof theEVI, this document specifies that there willPEs bemultiple DFs, oneused to send BUM traffic to it. One of the prerequisites foreach BD configured inthis support is thatEVI. 1.2. Problem Statementparticipating PEs must agree amongst themselves as to who would act as the DF. Thissection describes some potential issues with the Default DF Election algorithm. 1.2.1. Unfair Load-Balancing and Service Disruption There are three fundamental problems with the current Default DF Election algorithm. 1- First, the algorithm will not perform well when the Ethernet Tag follows a non-uniform distribution, for instance when the Ethernet Tags are all even or all odd. In such a case let us assume that the ES is multi-homedneeds totwo PEs;be achieved through a distributed algorithm in which each participating PE independently and unambiguously selects one of the participating PEswill be electedasDF for all oftheVLANs. This is very sub-optimal. It defeatsDF, and thepurposeresult should be consistent and unanimous. The default algorithm for DF election defined by [RFC7432] at the granularity ofservice carving(ESI, EVI) is referred to asthe DFs are not really evenly spread across."service carving". Infact, inthisparticular case, one ofdocument, service carving and thePEs does not get elected asdefault DFat all, soelection algorithm are used interchangeably. With service carving, itdoes not participate in the DF responsibilities at all. Consider another example where, referringis possible toFigure 1, lets assume that PE2, PE3, PE4 areelect multiple DFs per ES (one per EVI) inascendingorder to perform load balancing ofthe IP address; and each VLAN configured on ES2 is associated with an Ethernet Tag of the form (3x+1), where x is an integer. This will result in PE3 always be selected as the DF. 2-traffic destined to a given ES. TheEthernet tag that identifies the BD can be as large as 2^24; however, itobjective isnot guaranteedthat thetenant BD on the ES will conform to a uniform distribution. In fact, it isload-balancing procedures should carve uptothecustomer what BDs they will configure onBD space among theES. Quoting [Knuth], "In general, we want to avoid values of M that divide r^k+a or r^k-a, where k andredundant PE nodes evenly, in such aare small numbers and rway that every PE is theradix of the alphabetic character set (usually r=64, 256 or 100), since a remainder modulo suchDF for avaluedistinct set ofM tends to be largelyEVIs. The DF election algorithm (as described in [RFC7432], Section 8.5) is based on asimple superposition of key digits. Such considerations suggest that we choose Mmodulus operation. The PEs to which the ES (for which DF election is to bea prime number such that r^k!=a(modulo)M or r^k!=?a(modulo)Mcarried out per EVI) is multihomed form an ordered (ordinal) list in ascending order by PE IP address value. For example, there are N PEs: PE0, PE1,... PE(N-1) ranked as per increasing IP addresses in the ordinal list; then, forsmall k & a."each VLAN with Ethernet Tag V, configured on ES1, PEx is the DF for VLAN V on ES1 when x equals (V mod N). Inour case, Nthe case of a VLAN Bundle, only the lowest VLAN is used. In the case when the planned density is high (meaning there are a significant number ofPEs in [RFC7432] which corresponds to M above. Since N, N-1 or N+1 need not satisfyVLANs and theprimality properties ofEthernet Tags are uniformly distributed), theM above; as perthinking is that the[RFC7432] modulo basedDFassignment, whenever a PE goes down or a new PE boots up (hosting the same Ethernet Segment), the modulo schemeelection willnot necessarily map BDs tobe spread across the PEsuniformly. 3- The third problem is onehosting that ES and good load balancing can be achieved. However, the described default DF election algorithm has some undesirable properties and, in some cases, can be somewhat disruptive and unfair. This document describes some ofdisruption. Considerthose issues and defines acase whenmechanism for dealing with them. These mechanisms do involve changes to thesame Ethernet Segment is multi-homeddefault DF election algorithm, but they do not require any changes toa set of PEs. WhentheES is downEVPN route exchange, and changes inone of the PEs, say PE1, or PE1 itself reboots, ortheBGP process goes down orEVPN routes will be minimal. In addition, there is a need to extend theconnectivity between PE1DF election procedures so that new algorithms andan RR goes down,capabilities are possible. A single algorithm (the default DF election algorithm) may not meet theeffective number of PEsrequirements inthe system now becomes N-1, and DFs are computed forall theVLANs that are configured onuse cases. Note thatEthernet Segment. In general, if thewhile [RFC7432] elects a DF per <ES, EVI>, this document elects a DF per <ES, BD>. This means that unlike [RFC7432], where for aVLAN v happens not to be PE1, but some other PE, say PE2, itVLAN-aware Bundle service EVI there islikelyonly one DF for the EVI, this document specifies thatsome other PE (different from PE1 and PE2)there willbecome the new DF.be multiple DFs, one for each BD configured in that EVI. 1.3. Problem Statement Thisissection describes some potential issues with the default DF election algorithm. 1.3.1. Unfair Load Balancing and Service Disruption There are three fundamental problems with the current default DF election algorithm. 1. The algorithm will notdesirable. Similarlyperform well when the Ethernet Tag follows anew PE hostsnon-uniform distribution -- for instance, when thesameEthernetSegment,Tags are all even or all odd. In such a case, let us assume that themapping again changes becauseES is multihomed to two PEs; one of themodulus operation.PEs will be elected as the DF for all of the VLANs. Thisresultsis very suboptimal. It defeats the purpose of service carving, as the DFs are not really evenly spread across the PEs hosting the ES. In fact, inneedless churn. Againthis particular case, one of the PEs does not get elected as the DF at all, so it does not participate in DF responsibilities at all. Consider another example where, referring to Figure 1,say v1, v2let's assume that (1) PE2, PE3, andv3PE4 areVLANslisted in ascending order by IP address and (2) each VLAN configured on ES2withis associated with an EthernetTagsTag ofvalue 999, 1000 and 1001 respectively. So PE1, PE2 and PE3 aretheDFs for v1, v2 and v3 respectively. Now when PE3 goes down, PE2form (3x+1), where x is an integer. This willbecomeresult in PE3 always being selected as theDF for v1 and PE1 will becomeDF. 2. The Ethernet Tag that identifies theDF for v2. One point to noteBD can be as large as 2^24; however, it is not guaranteed that theDefault DF election algorithm assumes that alltenant BD on thePEs who are multi-homedES will conform tothe same Ethernet Segment (and interested in the DF Election by exchanging EVPN routes) use an Originating Router's IP Address of the same family. This does not needa uniform distribution. In fact, it is up tobethecase ascustomer what BDs they will configure on theEVPN address-family can be carried over an IPv4ES. Quoting [Knuth]: In general, we want to avoid values of M that divide r^k+a orIPv6 peering,r^k-a, where k and a are small numbers and r is thePEs attached to the same ES may use an addressradix ofeither family. Mathematically,the alphabetic character set (usually r=64, 256 or 100), since aconventional hash function mapsremainder modulo such akey kvalue of M tends to be largely anumber i representing onesimple superposition ofm hash buckets throughkey digits. Such considerations suggest that we choose M to be afunction h(k) i.e. i=h(k).prime number such that r^k!=a(modulo)M or r^k!=?a(modulo)M for small k & a. Inthe EVPNour case,h is simply a modulo-m hash function viz. h(v) = v mod N, whereN is the number of PEsthat are multi-homed(Section 8.5 of [RFC7432]). N corresponds to M above. Since N, N-1, or N+1 need not satisfy theEthernet Segment in discussion. It is well-known that for good hash distribution using the modulus operation,primality properties of M, as per themodulus N should bemodulo-based DF assignment [RFC7432], whenever aprime-number not too close toPE goes down or apower of 2 [CLRS2009]. When the effective number of PEs changes from Nnew PE boots up (attached toN-1 (or vice versa); alltheobjects (VLAN V)same ES), the modulo scheme willbe remapped except those for which V mod N and V mod (N-1) refernot necessarily map BDs to PEs uniformly. 3. Disruption is another problem. Consider a case when the samePE in the previous and subsequent ordinal rankings respectively. From a forwarding perspective, thisES is multihomed to achurn, as it resultsset of PEs. When the ES is DOWN inre-programmingone of thePE ports as either blockingPEs, say PE1, or PE1 itself reboots, ornon-blocking atthePEs whereBGP process goes down or theDF state changes. This document addresses this problemconnectivity between PE1 andfurnishes a solution to this undesirable behavior. 1.2.2. Traffic Black-Holing on Individual AC Failures As discussed in section 2.1an RR goes down, theDefault DF Election algorithm defined by [RFC7432] takes into account only two variableseffective number of PEs in themodulus functionsystem now becomes N-1, and DFs are computed fora given ES: the existence ofall thePE's IP addressVLANs that are configured onthe candidate list and the locally provisioned Ethernet Tags. Ifthat ES. In general, if the DF foran <ESI, EVI> fails (duea VLAN V happens not tophysical link/node failures) an ES route withdrawal will make the Non-DF (NDF) PEs re- elect the DF forbe PE1, but some other PE, say PE2, it is likely that<ESI, EVI>some other PE (different from PE1 andthe servicePE2) willbe recovered. However,become theDefault DF election procedure doesnew DF. This is notprovidedesirable. Similarly, when aprotection against "logical" failures or human errors that may occur at service level onnew PE hosts theDF, whilesame ES, thelist of active PEs for a given ES does not change. These failures may have an impact not only on the local PE where the issue happens, but also on the rest of the PEsmapping again changes because of theES. Some examples of such logical failures are listed below: a) A given individual Attachment Circuit (AC) definedmodulus operation. This results inan ES is accidentally shutdown or even not provisioned yet (hence the Attachment Circuit Status - ACS - is DOWN), while the ES is operationally active (since the ES route is active). b) A given MAC-VRF -needless churn. Again referring to Figure 1, say V1, V2, and V3 are VLANs configured on ES2 witha defined ES - is shutdown or not provisioned yet, while the ES is operationally active (since the ES route is active). In this case, the ACSassociated Ethernet Tags ofallvalues 999, 1000, and 1001, respectively. So, PE1, PE2, and PE3 are theACs defined in that MAC-VRF is considered to be DOWN. Neither (a) nor (b)DFs for V1, V2, and V3, respectively. Now when PE3 goes down, PE2 willtriggerbecome the DFre-election on the remote multi-homed PEsfora given ES sinceV1 and PE1 will become theACSDF for V2. One point to note isnot taken into account inthat the default DF electionprocedures. Whilealgorithm assumes that all the PEs who are multihomed to the same ES (and interested in theACS is used as aDF electiontie-breaker and trigger in VPLS multi-homing procedures [VPLS-MH], there is no procedure defined inby exchanging EVPN routes) use an Originating Router's IP address [RFC7432] of the same family. This does not need totriggerbe theDF re-election based oncase, as theACS change onEVPN address family can be carried over an IPv4 or IPv6 peering, and theDF. Figure 2 illustratesPEs attached to thedescribed issue withsame ES may use anexample. +---+ |CE4| +---+ | PE4 | +-----+-----+ +---------------| +-----+ |---------------+ | | | BD-1| | | | +-----------+ | | | |address of either family. Mathematically, a conventional hash function maps a key k to a number i representing one of m hash buckets through a function h(k), i.e., i = h(k). In the EVPN| | | | PE1 PE2 PE3 | | (NDF) (DF) (NDF)| +-----------+ +-----------+ +-----------+ | | BD-1| | | | BD-1| | | | BD-1| | | +-----+ |-------| +-----+ |-------| +-----+ | +-----------+ +-----------+ +-----------+ AC1\ ES12 /AC2 AC3\ ES23 /AC4 \ / \ / \ / \ / +----+ +----+ |CE12| |CE23| +----+ +----+ Figure 2 Default DF Election and Traffic Black-Holing BD-1 is defined in PE1, PE2, PE3 and PE4. CE12case, h is simply amulti-homed CE connectedmodulo-m hash function viz. h(V) = V mod N, where N is the number of PEs that are multihomed toES12the ES inPE1 and PE2. Similarly CE23question. It ismulti-homed to PE2 and PE3well known that for good hash distribution usingES23. Both, CE12 and CE23, are connectedthe modulus operation, the modulus N should be a prime number not too close toBD-1 through VLAN-based service interfaces: CE12-VID 1 (VLAN ID 1 on CE12) is associateda power of 2 [CLRS2009]. When the effective number of PEs changes from N toAC1N-1 (or vice versa), all the objects (VLAN V) will be remapped except those for which V mod N andAC2 in BD-1, whereas CE23-VID 1 is associatedV mod (N-1) refer toAC3the same PE in the previous andAC4subsequent ordinal rankings, respectively. From a forwarding perspective, this is a churn, as it results inBD-1. Assume that, although not represented, there are other ACs defined on these ES mapped to different BDs. After executingreprogramming the PE ports as either blocking or non-blocking at the PEs where the[RFC7432] DefaultDFelection algorithm, PE2 turns outstate changes. This document addresses this problem and furnishes a solution tobethis undesirable behavior. 1.3.2. Traffic Black-Holing on Individual AC Failures The default DF election algorithm defined by [RFC7432] takes into account only two variables in theDFmodulus function forES12 and ES23a given ES: the existence of the PE's IP address inBD-1. The following issues may arise: a)the candidate list and the locally provisioned Ethernet Tags. IfAC2 is accidentally shutdown or even not configured, CE12 traffic will be impacted. In case of all-active multi-homing,theBUM trafficDF for an <ESI, EVI> fails (due toCE12physical link/node failures), an ES route withdrawal willbe "black-holed", whereasmake the NDF PEs re-elect the DF forsingle- active multi-homing, allthat <ESI, EVI> and thetraffic to/from CE12service will bediscarded. This is due torecovered. However, thefactdefault DF election procedure does not provide protection against "logical" failures or human errors that may occur at the service level on the DF, while the list of active PEs for alogical failure in PE2's AC2given ES does not change. These failures may have an impact nottriggeronly on the local PE where the issue happens but also on the rest of the PEs of the ES. Some examples of such logical failures are listed below: (a) A given individual AC defined in an ESroute withdrawn for ES12 (since there are still other ACs active on ES12) and therefore PE1 willis accidentally shut down or is notre- runprovisioned yet (hence, theDF election procedures. b) IfACS is DOWN), while theBridge Table for BD-1ES isadministratively shutdownoperationally active (since the ES route is active). (b) A given MAC-VRF with a defined ES is either shut down orevennotconfigured yet on PE2, CE12 and CE23 will both be impacted: BUM traffic to both CEs will be discarded in caseprovisioned yet, while the ES is operationally active (since the ES route is active). In this case, the ACS ofall-active multi-homing andalltraffic will be discarded to/fromtheCEsACs defined incase of single-active multi-homing. Thisthat MAC-VRF isdueconsidered tothe fact that PE1 and PE3be DOWN. Neither (a) nor (b) willnot re-runtrigger the DFelection procedures and will keep assuming PE2 is the DF. Quoting [RFC7432], "when an Ethernet Tag is decommissionedre-election onan Ethernet Segment, then the PE MUST withdraw the Ethernet A-D per EVI route(s) announced for the <ESI, Ethernet Tags> that are impacted by the decommissioning", however, while this A-D per EVI route withdrawal is used atthe remote multihomed PEsperforming aliasing or backup procedures, it is not used to influence the DF electionforthe affected EVIs. This document adds an optional modification of the DF Election procedure so thata given ES, since the ACSmay beis not taken into accountas a variable in the DF election, and therefore EVPN can provide protection against logical failures. 1.3. The Need for Extending the Default DF Election in EVPN Section 1.2 describes some of the issues that existin theDefaultDFElectionelection procedures.In order to address those issues, this document introduces a new DF Election framework. This framework allowsWhile thePEs to agree on a common DF election algorithm, as wellACS is used asthe capabilities to enable during the DF Election procedure. Generally, 'DF election algorithm' refers to the algorithm by whichanumber of input parameters are used to determine theDFPE, while 'DFelectioncapability' refers to an additional feature that can be used prior totiebreaker and trigger in Virtual Private LAN Service (VPLS) multihoming procedures [VPLS-MH], there is no procedure defined in theinvocation ofEVPN specification [RFC7432] to trigger the DFelection algorithm, such as modifyingre-election based on theinputs (or listACS change on the DF. Figure 2 shows an example ofcandidate PEs). Within this framework, this document defines a newlogical AC failure. +---+ |CE4| +---+ | PE4 | +-----+-----+ +---------------| +-----+ |---------------+ | | | BD-1| | | | +-----------+ | | | | EVPN | | | | PE1 PE2 PE3 | | (NDF) (DF) (NDF)| +-----------+ +-----------+ +-----------+ | | BD-1| | | | BD-1| | | | BD-1| | | +-----+ |-------| +-----+ |-------| +-----+ | +-----------+ +-----------+ +-----------+ AC1\ ES12 /AC2 AC3\ ES23 /AC4 \ / \ / \ / \ / +----+ +----+ |CE12| |CE23| +----+ +----+ Figure 2: Default DF Electionalgorithmanda new capability that can influence the DF Election result: o The new DF Election algorithmTraffic Black-Holing BD-1 isreferred to as "Highest Random Weight" (HRW). The HRW procedures are describeddefined insection 4. o The new DF Election capabilityPE1, PE2, PE3, and PE4. CE12 isreferreda multihomed CE connected toas "AC-Influenced DF Election" (AC-DF). The AC-DF procedures are describedES12 insection 5. o HRWPE1 andAC-DF mechanismsPE2. Similarly, CE23 is multihomed to PE2 and PE3 using ES23. Both CE12 and CE23 areindependent of each other. Therefore, a PE may support either HRW or AC-DF independently or may support both of them together. A PE may also support AC-DF capability alongconnected to BD-1 through VLAN-based service interfaces: CE12-VID 1 (VID 1 on CE12) is associated with AC1 and AC2 in BD-1, whereas CE23-VID 1 is associated with AC3 and AC4 in BD-1. Assume that, although not represented, there are other ACs defined on these ESes mapped to different BDs. After executing theDefaultdefault DF election algorithmper [RFC7432]. In addition, this document defines a wayas described in [RFC7432], PE2 turns out toindicatebe thesupportDF for ES12 and ES23 in BD-1. The following issues may arise: (a) If AC2 is accidentally shut down or is not configured yet, CE12 traffic will be impacted. In the case ofHRW and/or AC-DF along withAll-Active multihoming, theEVPN ES routes advertised for a given ES. ReferBUM traffic tosection 3.2CE12 will be "black-holed", whereas formore details. 2. Conventions and Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL"Single-Active multihoming, all the traffic to/from CE12 will be discarded. This is because a logical failure inthis documentPE2's AC2 may not trigger an ES route withdrawal for ES12 (since there are still other ACs active on ES12); therefore, PE1 will not rerun the DF election procedures. (b) If the bridge table for BD-1 is administratively shut down or is not configured yet on PE2, CE12 and CE23 will both be impacted: BUM traffic to both CEs will beinterpreted as describeddiscarded inBCP 14 [RFC2119] [RFC8174] when,the case of All-Active multihoming, andonly when, they appear inallcapitals, as shown here. o ACtraffic will be discarded to/from the CEs in the case of Single-Active multihoming. This is because PE1 andACS - Attachment CircuitPE3 will not rerun the DF election procedures andAttachment Circuit Status. An AC haswill keep assuming that PE2 is the DF. Quoting [RFC7432], "When an Ethernet tag is decommissioned on an EthernetTag associated to it. o BUM - refers tosegment, then the PE MUST withdraw theBroadcast, Unknown unicast and Multicast traffic. o DF, NDF and BDF - Designated Forwarder, Non-Designated Forwarder and Backup Designated Forwarder oEthernet A-D perES route - refers to [RFC7432] route type 1 or Auto-Discovery per Ethernet Segment route. oEVI route(s) announced for the <ESI, Ethernet tags> that are impacted by the decommissioning." However, while this A-D per EVI route- refers to [RFC7432] route type 1withdrawal is used at the remote PEs performing aliasing orAuto-Discovery per EVPN Instance route. o ES and ESI - Ethernet Segment and Ethernet Segment Identifier. o EVI - EVPN Instance. o MAC-VRF - A Virtual Routing and Forwarding tablebackup procedures, it is not used to influence the DF election forMedia Access Control (MAC) addresses on a PE. o BD - Broadcast Domain. An EVIthe affected EVIs. This document adds an optional modification of the DF election procedure so that the ACS may becomprised of one (VLAN-Based or VLAN Bundle services) or multiple (VLAN-Aware Bundle services) Broadcast Domains. o Bridge Table - An instantiation of a broadcast domain on a MAC-VRF. o HRW - Highest Random Weight o VID and CE-VID - VLAN Identifier and Customer Equipment VLAN Identifier. o Ethernet Tag - used to representtaken into account as aBroadcast Domainvariable in the DF election; therefore, EVPN can provide protection against logical failures. 1.4. The Need for Extending the Default DF Election in EVPN Services Section 1.3 describes some of the issues thatis configured onexist in the default DF election procedures. In order to address those issues, this document introduces agiven ES for the purpose ofnew DFelection. Note that any ofelection framework. This framework allows thefollowing may be usedPEs torepresentagree on aBroadcast Domain: VIDs (including Q-in-Q tags), configured IDs, VNI (VXLAN Network Identifiers), normalized VID, I-SIDs (Service Instance Identifiers), etc.,common DF election algorithm, aslongwell as therepresentation of the broadcast domains is configured consistently across the multi-homed PEs attachedcapabilities tothat ES. The Ethernet Tag value MUST be different from zero. o Ethernet Tag ID -enable during the DF election procedure. Generally, "DF election algorithm" refers to theidentifieralgorithm by which a number of input parameters are usedinto determine theEVPN routes defined in [RFC7432]. Its value mayDF PE, while "DF election capability" refers to an additional feature that can be used prior to thesameinvocation of the DF election algorithm, such as modifying theEthernet Tag value (see Ethernet Tag definition) when advertising routes for VLAN-aware Bundle services. Note that in caseinputs (or list ofVLAN-based or VLAN Bundle services, the Ethernet Tag ID is zero. ocandidate PEs). Within this framework, this document defines a new DFElection Procedureelection algorithm and a new capability that can influence the DFAlgorithm -election result: o TheDesignated Forwarder Election Procedure or simplynew DFElection, referselection algorithm is referred tothe processas "Highest Random Weight" (HRW). The HRW procedures are described inits entirety, including the discovery of the PEsSection 3. o The new DF election capability is referred to as "AC-Influenced DF election" (AC-DF). The AC-DF procedures are described inthe ES, the creation and maintenance of the PE candidate listSection 4. o HRW andthe selectionAC-DF mechanisms are independent of each other. Therefore, aPE. The Designated Forwarder Algorithm is just a componentPE may support either HRW or AC-DF independently or may support both of them together. A PE may also support the AC-DF capability along with the default DFElection Procedure and strictly referselection algorithm per [RFC7432]. In addition, this document defines a way to indicate theselectionsupport ofa PEHRW and/or AC-DF along with the EVPN ES routes advertised for a given<ES,Ethernet Tag>. o TTL - Time To Live This document also assumes familiarity with the terminology of [RFC7432]. 3.ES. Refer to Section 2.2 for more details. 2. Designated Forwarder Election Protocol and BGP Extensions This section describes the BGP extensions required to support the new DFElectionelection procedures. In addition, since the EVPN specification [RFC7432]does leaveleaves several questions open as to the precisefinal state machineFSM behavior of the DF election,section 3.1 describesSection 2.1 precisely describes the intended behavior.3.1.2.1. The DF Election Finite State Machine (FSM) Per [RFC7432], the FSMdescribedshown in Figure 3 is executed per<ESI,VLAN><ES, VLAN> in the case of VLAN-based service or<ESI,[VLANs<ES, [VLANs in VLAN Bundle]> in the case of a VLAN Bundle on each participating PE.ObserveNote thatcurrently the VLANs are derived from local configuration and the FSM does not provide any protection against misconfiguration where the same (EVI,ESI) combination has different set of VLANs on different participating PEs or one ofthePEs elects to consider VLANs as VLAN Bundle and another as separate VLANs for election purposes (service type mismatch). TheFSM isconceptual and anyconceptual. Any design or implementation MUST comply withabehavior that is equivalent to theonebehavior outlined in this FSM. VLAN_CHANGE VLAN_CHANGE RCVD_ES RCVD_ES LOST_ES LOST_ES +----++----++-------+ |v| |++----++v | +-+----+ ES_UP| DF |++-------++ +->+ INIT+---------------> WAIT+-------------->+ DF_WAIT | ++-----++----+-++-------+-+ ^ | +-----------+ | |DF_TIMER |ANY STATEANY_STATE +-------+ VLAN_CHANGE | +-----------+ ES_DOWN +-----------------+ | | RCVD_ES v v+-----+++--------++ LOST_ES++---+-+ | DF |++------+-+ |DF | | DONEDF_DONE +<--------------+CALCDF_CALC +<-++------++---------+ CALCULATED+----+-++-------+-+ | | | +----+ VLAN_CHANGE RCVD_ES LOST_ES Figure33: DF Election Finite State Machine Observe that each EVI is locally configured on each of the multihomed PEs attached to a given ES and that the FSM does not provide any protection against inconsistent configuration between these PEs. That is, for a given EVI, one or more of the PEs are inadvertently configured with a different set of VLANs for a VLAN-aware Bundle service or with different VLANs for a VLAN-based service. The states and events shown in Figure 3 are defined as follows. States: 1. INIT: InitialStatestate. 2. DF_WAIT: State in which the participant waits for enough information to perform the DF election for the EVI/ESI/VLAN combination. 3. DF_CALC: State in which the new DF is recomputed. 4. DF_DONE: State in which theaccordingcorresponding DF for the EVI/ESI/VLAN combination has been elected. 5. ANY_STATE: Refers to any of the above states. Events: 1. ES_UP: TheESIES has been locally configured as'up'."UP". 2. ES_DOWN: TheESIES has been locally configured as'down'."DOWN". 3. VLAN_CHANGE: The VLANs configured in a bundle (that uses theESI)ES) changed. This event is necessary for VLAN Bundles only. 4. DF_TIMER: DFWaittimer [RFC7432] (referred to as "Wait timer" in this document) has expired. 5. RCVD_ES: A new or changedEthernet SegmentES route is received ina BGP REACH UPDATE.an Update message with an MP_REACH_NLRI. Receiving an unchangedUPDATEUpdate MUST NOT trigger this event. 6. LOST_ES:A BGP UNREACH UPDATEAn Update message with an MP_UNREACH_NLRI for a previously receivedEthernet SegmentES route has been received. Ifan UNREACHsuch a message is seen for a route that has not been advertised previously, the event MUST NOT be triggered. 7. CALCULATED: DF has been successfully calculated.AccordingCorresponding actions when transitions are performed or states are entered/exited: 1. ANY_STATE on ES_DOWN: (i)stopStop the DFwait timerWait timer. (ii)assumeAssume an NDF for the local PE. 2. INIT on ES_UP:transitionTransition to DF_WAIT. 3. INIT on VLAN_CHANGE,RCVD_ESRCVD_ES, or LOST_ES:doDo nothing. 4. DF_WAIT on entering the state: (i)startStart the DFwaitWait timer if not started already orexpiredexpired. (ii)assumeAssume an NDF for the local PE. 5. DF_WAIT on VLAN_CHANGE,RCVD_ESRCVD_ES, or LOST_ES:doDo nothing. 6. DF_WAIT on DF_TIMER:transitionTransition to DF_CALC. 7. DF_CALC on entering or re-entering the state: (i)rebuildRebuild the candidate list,hashperform a hash, and performelectionthe election. (ii)AfterwardsAfterwards, the FSM generates a CALCULATED event against itself. 8. DF_CALC on VLAN_CHANGE,RCVD_ESRCVD_ES, or LOST_ES:doDo as prescribed intransitionTransition 7. 9. DF_CALC on CALCULATED:markMark the election result for the VLAN or bundle, and transition to DF_DONE.11.10. DF_DONE on exiting the state:if there isIf a new DF election is triggered and the current DF is lost, then assume an NDF for the local PE for the VLAN or VLAN Bundle.12.11. DF_DONE on VLAN_CHANGE,RCVD_ESRCVD_ES, or LOST_ES:transitionTransition to DF_CALC. The above events and transitions are defined for theDefaultdefault DFElection Algorithm.election algorithm. As described in Section5,4, the use of the AC-DF capability introduces additional events and transitions.3.2.2.2. The DF Election Extended Community For the DF election procedures to be consistent and unanimous, it is necessary that all the participating PEs agree on the DFElectionelection algorithm and capabilities to be used. For instance, it is not possiblethatfor some PEs to continue to use theDefaultdefault DFElectionelection algorithmandwhile some PEs use HRW. Forbrown-fieldbrownfield deployments and for interoperability with legacy PEs, it is important that all PEsneed tohave thecapabilityability to fall back on theDefaultdefault DFElection.election. A PE can indicate its willingness to support HRW and/or AC-DF by signaling a DF Election Extended Community along with theEthernet SegmentES route(Type-4).(Route Type 4). The DF Election Extended Community is a new BGP transitiveextended communityExtended Community attribute [RFC4360] that is defined to identify the DF election procedure to be used for theEthernet Segment.ES. Figure 4 shows the encoding of the DF Election Extended Community. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Type=0x06Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ Bitmap | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure44: DF Election Extended Community Where: oType is 0x06Type: 0x06, as registered with IANA (Section 7) for EVPN Extended Communities. oSub-Type is 0x06 -Sub-Type: 0x06. "DF Election ExtendedCommunity"Community", asrequested by this document toregistered with IANA. oRSV / Reserved -RSV/Reserved: Reserved bits forDF Alginformation that is specificinformation.to DF Alg. o DF Alg (5bits) -bits): Encodes the DFElectionelection algorithm values (between 0 and 31) that the advertising PE desires to use for the ES. This documentrequestscreates an IANAto set up aregistry called "DFAlg Registry" and solicitsAlg" (Section 7), which contains the following values: - Type 0: Default DFElectionelection algorithm, or modulus-based algorithm as defined in [RFC7432]. - Type 1: HRWalgorithm (explained in this document).Algorithm (Section 3). - Types 2-30: Unassigned. - Type 31: Reserved for Experimental Use. o Bitmap (2octets) -octets): Encodes "capabilities" to use with the DFElectionelection algorithm in thefield "DF Alg".DF Alg field. This documentrequestscreates an IANAto create aregistry (Section 7) for the Bitmap field, with values0-15,0-15. This registry is called "DF Election Capabilities" andsolicitsincludes thefollowing values:bit values listed below. 1 1 1 1 1 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |A| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure55: BitmapfieldField in the DF Election Extended Community - Bit 0 (corresponds to Bit 24 of the DF Election Extended Community): Unassigned. - Bit 1: AC-DF Capability (AC-Influenced DFElection, explained in this document).election; see Section 4). When set to 1, it indicates the desire to useAC- Influenced DF ElectionAC-DF with the rest of the PEs in the ES. - Bits 2-15: Unassigned. The DF Election Extended Community is used as follows: o A PE SHOULD attach the DF Election Extended Community to any advertised ESrouteroute, and the Extended Community MUST be sent if the ES is locally configured with a DF election algorithm other than theDefault Electiondefault DF election algorithm or if a capability is required to be used. In the Extended Community, the PE indicates the desired "DF Alg" algorithm and "Bitmap" capabilities to be used for the ES. - Only one DF Election Extended Community can be sent along with an ES route. Note that the intent is not for the advertising PE to indicate all the supported DF election algorithms andcapabilities,capabilities but to signal the preferred one. - DFAlgsAlg values 0 and 1 canbeboth be used withbit AC-DFBit 1 (AC-DF) set to 0 or 1. - In general, a specific DF Alg SHOULD determine the use of the reserved bits in the Extended Community, which may be used in a different way for a different DF Alg. In particular, for DFAlgsAlg values 0 and 1, the reserved bits are not set by the advertising PE and SHOULD be ignored by the receiving PE. o When a PE receives the ESRoutesroutes from all the other PEs for the ES in question, it checks to see if all the advertisements have theextended communityExtended Community with the same DF Alg and Bitmap: -In the case thatIf they do, this particular PE MUST follow the procedures for the advertised DF Alg and capabilities. For instance, if all ES routes for a given ES indicate DF Alg HRW and AC-DF set to 1, then thereceiving PE and by induction all the otherPEsinattached to the ES willproceed to doperform the DFElectionelection as per the HRWAlgorithmalgorithm and following the AC-DF procedures. -OtherwiseOtherwise, if even a single advertisementfor the type-4 routefor Route Type 4 is received without the locally configured DF Alg and capability, theDefaultdefault DFElection algorithm (modulus)election algorithm MUST be used as prescribed in [RFC7432]. This procedure handles the case where participating PEs in the ES disagree about the DF algorithm and capability toapply.be applied. - The absence of the DF Election Extended Community or the presence of multiple DF Election Extended Communities (in the same route) MUST be interpreted by a receiving PE as an indication of theDefaultdefault DFElectionelection algorithm on the sendingPE,PE -- that is, DF Alg 0 and no DFElectionelection capabilities. o When all the PEs in an ES advertise DF Type 31, they will rely on the local policy to decide how to proceed with the DFElection.election. o For any new capability defined in the future, theapplicability/compatibilityapplicability/ compatibility of this new capabilitytoto/with the existing DFAlgsAlg values must be assessed on acase by casecase-by-case basis. o Likewise, for any new DF Alg defined in the future, its applicability/compatibilitytoto/with the existing capabilities must be assessed on acase by casecase-by-case basis.3.2.1.2.2.1. Backward Compatibility Implementations that comply with [RFC7432]implementationsonly (i.e.,thoseimplementations that predate this specification) will not advertise the DF Election Extended Community. That means that all other participating PEs in the ES will not receive DF preferences and will revert to theDefaultdefault DFElectionelection algorithm withoutAC-Influenced DF Election.AC-DF. Similarly,a [RFC7432]an implementationreceivingthat complies with [RFC7432] only and that receives a DF Election Extended Community will ignore it and will continue to use theDefaultdefault DFElectionelection algorithm.3.3. Auto-Derivation of ES-Import Route Target Section 7.6 of [RFC7432] describes how the value of the ES-Import Route Target for ESI types 1, 2, and 3 can be auto-derived by using the high-order six bytes of the nine byte ESI value. The same auto- derivation procedure can be extended to ESI types 0, 4, and 5 as long as it is ensured that the auto-derived values for ES-Import RT among different ES types don't overlap. As in [RFC7432], the mechanism to guarantee that the auto-derived ESI or ES-import RT values for different ESIs do not match is out of scope of this document. 4.3. The Highest Random Weight DF Election Algorithm The procedure discussed in this section is applicable to the DFElectionelection in EVPNServicesservices [RFC7432] and the EVPN Virtual Private WireServicesService (VPWS) [RFC8214].Highest Random Weight (HRW)HRW as defined in [HRW1999] is originally proposed in the context of InternetCachingcaching and proxyServerserver load balancing. Given an object name and a set of servers, HRW maps a request to a server using the object-name (object-id) and server-name (server-id) rather than the server states. HRW forms a hash out of the server-id and the object-id and forms an ordered list of the servers for the particular object-id. The server for which the hash value ishighest,highest serves as the primary server responsible for that particular object, and the server with thenext highestnext-highest value in that hash serves as the backup server. HRW always maps a given object name to the same server within a given cluster;consequentlyconsequently, it can be used at client sites to achieve global consensus onobject-serverobject-to-server mappings. When that server goes down, the backup server becomes the responsible designate. Choosing an appropriate hash function that is statistically oblivious to the key distribution and imparts a good uniform distribution of the hash output is an important aspect of the algorithm.FortunatelyFortunately, many such hash functions exist. [HRW1999] providespseudo-randompseudorandom functions based on the Unix utilities rand and srand and easily constructed XOR functions that satisfy the desired hashing properties. HRW already finds use in multicast and ECMP[RFC2991],[RFC2992]. 4.1.[RFC2991] [RFC2992]. 3.1. HRW and Consistent Hashing HRW is not the only algorithm that addresses theobject to serverobject-to-server mapping problem with goals of fair load distribution,redundancyredundancy, and fast access. There is another family of algorithms that also addresses this problem; these fall under the umbrella of the Consistent Hashing Algorithms [CHASH]. These will not be considered here.4.2.3.2. HRW Algorithm for EVPN DF Election This section describes the application of HRW to DF election. LetDF(v)DF(V) denote theDesignated ForwarderDF andBDF(v)BDF(V) denote theBackup Designated forwarderBDF for the Ethernet Tagv, where v is the VLAN,V; Si is the IP address of PEi,i; Esdenotesis theEthernet Segment IdentifierESI; andweightWeight is a function ofv,V, Si, and Es. Note that while the DF election algorithm provided in [RFC7432] uses a PE address andvlanVLAN as inputs, this document uses an Ethernet Tag, PEaddressaddress, and ESI as inputs. This is because if the same set of PEs aremulti-homedmultihomed to the same set of ESes, then the DF election algorithm used in [RFC7432] would result in the same PE being elected DF for the same set ofbroadcast domainsBDs on eachES, which canES; this could have adverseside-effectsside effects on both load balancing and redundancy. Including an ESI in the DF election algorithm introduces additionalentropyentropy, which significantly reduces the probability of the same PE being elected DF for the same set ofbroadcast domainsBDs on each ES. Therefore, when using the HRWAlgorithmalgorithm for EVPN DFElection,election, the ESI value in the Weight function below SHOULD be set to that of the corresponding ES. In the case of a VLAN Bundle service,vV denotes the lowestVLANVLAN, similar to the'lowest"lowest VLAN inbundle'bundle" logic of [RFC7432]. 1.DF(v)DF(V) = Si|Weight(v,Weight(V, Es, Si) >=Weight(v,Weight(V, Es, Sj), for all j. In the case of a tie, choose the PE whose IP address is numerically the least. Note that 0 <= i,j <Numbernumber of PEs in the redundancy group. 2.BDF(v)BDF(V) = Sk|Weight(v,Weight(V, Es, Si) >=Weight(v,Weight(V, Es,Sk)Sk), andWeight(v,Weight(V, Es, Sk) >=Weight(v,Weight(V, Es, Sj). In the case oftiea tie, choose the PE whose IP address is numerically the least. Where:DF(v):o DF(V) is defined to be the address Si (index i) for whichweight(v,Weight(V, Es, Si) is thehighest,highest; 0 <= i <N-1 BDF(v)N-1. o BDF(V) is defined as that PE with address Sk for which the computedweightWeight is the next highest after theweightWeight of the DF. j is the running index from 0 toN-1, i,N-1; i and k are selected values. Since the Weight is apseudo-randompseudorandom function with the domain as the three-tuple(v,(V, Es, S), it is an efficient and deterministic algorithm that is independent of the Ethernet TagvV sample space distribution. Choosing a good hash function for thepseudo-randompseudorandom function is an important consideration for this algorithm to perform better than theDefaultdefault algorithm. As mentioned previously, such functions are described inthe HRW paper.[HRW1999]. We take as a candidate hash function the first one out of the two that are listed as preferred in [HRW1999]:Wrand(v,Wrand(V, Es, Si) = (1103515245((1103515245.Si+12345) XORD(v,Es))+12345)(modD(V, Es))+12345)(mod 2^31)Here D(v,Es)Here, D(V, Es) is the 31-bit digest (CRC-32 and discarding theMSBmost significant bit (MSB), as noted in [HRW1999]) of the14-byte stream, the14-octet stream (the 4-octet Ethernet Tagv (4 bytes)V followed by theEthernet Segment Identifier (10 bytes).10-octet ESI). It is mandated that the14-byte14-octet streamisbe formed by the concatenation of the EthernettagTag and theEthernet Segment identifierESI in network byte order. The CRC should proceed as if the stream is in network byte order (big-endian). Si is the address of the ith server. The server's IP address length does notmattermatter, as only the low-order 31 bits are modulo significant. A point to note is that the Weight function takes into consideration the combination of the Ethernet Tag,Ethernet Segmentthe ES, and the PEIP-IP address, and the actual length of the server IP address (whether IPv4 or IPv6) is not really relevant. TheDefaultdefault algorithm defined in [RFC7432] cannot employ both IPv4 and IPv6 PE addresses, since [RFC7432] does not specify how to decide on the ordering (the ordinal list) when both IPv4 and IPv6 PEs are present. HRW solves the disadvantages pointed out in Section1.2.11.3.1 of this document andensures:ensures that: owithWith very highprobability thatprobability, the task of DF election for the VLANs configured on an ES is more or less equally distributed among thePEsPEs, evenforin the2 PE case.case of two PEs (see the first fundamental problem listed in Section 1.3.1). o If a PE that is not the DF or the BDF for thatVLAN,VLAN goes down or its connection to the ES goes down, it does not result in a DF or BDF reassignment. This saves computation, especially in the case when the connection flaps. o Moreimportantlyimportantly, it avoids theneedless disruption case ofthird fundamental problem listed in Section1.2.1 (3),1.3.1 (needless disruption) that is inherent in the existingDefaultdefault DFElection.election. o In addition to the DF, the algorithm also furnishes the BDF, which would be the DF if the current DF fails.5.4. TheAttachment Circuit InfluencedAC-Influenced DF Election Capability The procedure discussed in this section is applicable to the DFElectionelection in EVPNServicesservices [RFC7432] and EVPNVirtual Private Wire ServicesVPWS [RFC8214]. The AC-DF capability is expected to beof general applicability withgenerally applicable to any future DFAlgorithm.algorithm. It modifies the DFElectionelection procedures by removing from consideration any candidate PE in the ES that cannot forward traffic on the AC that belongs to the BD. This section is applicable toVLAN-BasedVLAN-based and VLAN Bundle service interfaces. Section5.14.1 describes the procedures forVLAN-AwareVLAN-aware Bundle service interfaces. In particular, when used with theDefaultdefault DFAlg,algorithm, the AC-DF capability modifiestheStep 3 in the DFElectionelection procedure described in[RFC7432][RFC7432], Section 8.5, as follows: 3. When the timer expires, each PE builds an ordered"candidate"candidate list of the IP addresses of all the PE nodes attached to theEthernet SegmentES (including itself), in increasing numeric value. The candidate list is based on theOriginatorOriginating Router's IP addresses of the ESroutes,routes but excludes any PE from whom no Ethernet A-D per ES route has beenreceived,received or from whom the route has been withdrawn. Afterwards, the DFElectionelection algorithm is applied on a per <ES, EthernetTag>,Tag>; however, the IP address for a PE will not be considered to be a candidate for a given <ES, Ethernet Tag> until the corresponding Ethernet A-D per EVI route has been received from that PE. In other words, the ACS on the ES for a given PE must be UP so that the PE is consideredasto be a candidate for a given BD. If theDefaultdefault DFAlgalgorithm is used, every PE in the resulting candidate list is then given an ordinal indicating its position in the ordered list, starting with 0 as the ordinal for the PE with the numerically lowest IP address. The ordinals are used to determine which PE node will be the DF for a given Ethernet Tag on theEthernet Segment,ES, using the following rule: Assuming a redundancy group of N PE nodes, for VLAN-based service, the PE with ordinal i is the DF for an <ES, Ethernet Tag V> when (V modN)=N) = i. In the case ofVLAN-(aware) bundlea VLAN (-aware) Bundle service, then the numerically lowest VLAN value in that bundle on that ES MUST be used in the modulo function as the Ethernet Tag. It should be noted that using the"OriginatingOriginating Router's IPaddress"Address field [RFC7432] in theEthernet SegmentES route to get the PE IP address needed for the ordered list allows for a CE to be multihomed across differentASesAutonomous Systems (ASes) if such a need ever arises. Theabove three paragraphs differmodified Step 3, above, differs from[RFC7432][RFC7432], Section 8.5, Step3,3 in twoaspects:ways: o Any DF Algalgorithmcan beused, andused -- not only the described modulus-based DF Alg (referred to as theDefaultdefault DFElection,election orDF"DF Alg00" in this document). o The candidate list is pruned based upon non-receipt of Ethernet A-D routes: a PE's IP address MUST be removed from the ES candidate list if its Ethernet A-D per ES route is withdrawn. A PE's IP address MUST NOT be consideredasto be a candidate DF foraan <ES, EthernetTag>,Tag> if its Ethernet A-D per EVI route for the <ES, Ethernet Tag> is withdrawn. The following example illustrates the AC-DF behavior applied to theDefaultdefault DF election algorithm, assuming the network in Figure 2:a)(a) When PE1 and PE2 discover ES12, they advertise an ES route for ES12 with the associatedES-import extended communityES-Import Extended Community and the DF Election Extended Community indicatingAC-DF=1;AC-DF = 1; they start a DF Wait timer (independently). Likewise, PE2 and PE3 advertise an ES route for ES23 withAC-DF=1AC-DF = 1 and start a DF Wait timer.b) PE1/PE2(b) PE1 and PE2 advertise an Ethernet A-D per ES route forES12,ES12. PE2 andPE2/PE3PE3 advertise an Ethernet A-D per ES route for ES23.c)(c) In addition,PE1/PE2/PE3PE1, PE2, and PE3 advertise an Ethernet A-D per EVI route for AC1, AC2,AC3AC3, and AC4 as soon as the ACs are enabled. Note that the AC can be associatedtowith a single customer VID(e.g. VLAN- based(e.g., VLAN-based service interfaces) or a bundle of customer VIDs(e.g.(e.g., VLAN Bundle service interfaces).d)(d) When the timer expires, each PE builds an ordered"candidate"candidate list of the IP addresses of all the PE nodesconnectedattached to theEthernet SegmentES (including itself) as explainedabovein[RFC7432]the modified Step3.3 above. Any PE from which an Ethernet A-D per ES route has not been received is pruned from the list.e)(e) When electing the DF for a given BD, a PE will not be considered to be a candidate until an Ethernet A-D per EVI route has been received from that PE. In other words, the ACS on the ES for a given PE must be UP so that the PE is consideredasto be a candidate for a given BD. For example, PE1 will not consider PE2 as a candidate for DF election for<ES12,VLAN-1><ES12, VLAN-1> until an Ethernet A-D per EVI route is received from PE2 for<ES12,VLAN-1>. f)<ES12, VLAN-1>. (f) Once the PEs with ACS = DOWN for a given BD have been removed from the candidate list, the DFElectionelection can be applied for the remaining N candidates. Note that this procedure only modifies the existing EVPN control plane by adding and processing the DF Election ExtendedCommunity,Community and by pruning the candidate list of PEs that take part in the DF election. In addition to the events defined in the FSM in Section3.1,2.1, the following events SHALL modify the candidate PE list and trigger the DF re-election in a PE for a given <ES, Ethernet Tag>. In the FSMofshown in Figure 3, the events below MUST trigger a transition from DF_DONE to DF_CALC:i.1. Local AC going DOWN/UP.ii.2. Reception of a new Ethernet A-D per EVIupdate/withdrawroute update/withdrawal for the <ES, Ethernet Tag>.iii.3. Reception of a new Ethernet A-D per ESupdate/withdrawroute update/withdrawal for the ES.5.1.4.1. AC-Influenced DF Election CapabilityForfor VLAN-Aware Bundle Services The procedure described insection 5Section 4 works for VLAN-based and VLAN Bundle service interfacessince,because, for those service types, a PE advertises only one Ethernet A-D per EVI route per<ES,VLAN><ES, VLAN> or<ES,VLAN<ES, VLAN Bundle>. In Section5,4, an Ethernet Tag represents a given VLAN or VLAN Bundle for the purpose of DFElection.election. The withdrawal of such a route means that the PE cannot forward traffic on that particular<ES,VLAN><ES, VLAN> or<ES,VLAN Bundle>, therefore<ES, VLAN Bundle>; therefore, the PE can be removed from consideration forDF.DF election. According to [RFC7432], in VLAN-aware Bundle services, the PE advertises multiple Ethernet A-D per EVI routes per<ES,VLAN<ES, VLAN Bundle> (one route per Ethernet Tag), while the DFElectionelection is still performed per<ES,VLAN<ES, VLAN Bundle>. The withdrawal of an individual route only indicates the unavailability of a specific ACbutand not necessarily all the ACs in the<ES,VLAN<ES, VLAN Bundle>. This document modifies the DFElectionelection forVLAN-AwareVLAN-aware Bundle services in the followingway:ways: o After confirming that all the PEs in the ES advertise the AC-DF capability, a PE will perform a DFElectionelection per<ES,VLAN>,<ES, VLAN>, as opposed to per<ES,VLAN<ES, VLAN Bundle> as described in [RFC7432]. Now, the withdrawal of an Ethernet A-D per EVI route for a VLAN will indicate that the advertising PE's ACS is DOWN and the rest of the PEs in the ES can remove the PE from consideration for DF election in the<ES,VLAN>.<ES, VLAN>. o The PEs will now follow the procedures insection 5.Section 4. For example, assuming threeBridge Tablesbridge tables in PE1 for the same MAC-VRF (each one associatedtowith a different Ethernet Tag,e.g.e.g., VLAN-1,VLAN-2VLAN-2, and VLAN-3), PE1 will advertise three Ethernet A-D per EVI routes for ES12. Each of the three routes will indicate the status of each of the three ACs in ES12. PE1 will be consideredasto be a valid candidate PE for DFElectionelection in<ES12,VLAN-1>, <ES12,VLAN-2>, <ES12,VLAN-3><ES12, VLAN-1>, <ES12, VLAN-2>, and <ES12, VLAN-3> as long as its three routes are active. For instance, if PE1 withdraws the Ethernet A-D per EVI routes for<ES12,VLAN-1>,<ES12, VLAN-1>, the PEs in ES12 will not consider PE1 as a suitable DF candidate for<ES12,VLAN-1>.<ES12, VLAN-1>. PE1 will still be considered for<ES12,VLAN-2><ES12, VLAN-2> and<ES12,VLAN-3><ES12, VLAN-3>, since its routes are active.6.5. Solution Benefits The solution described in this document provides the following benefits:a) Extends(a) It extends the DFElectionelection as defined in [RFC7432] to address the unfairload-load balancing and potential black-holing issuesofwith theDefaultdefault DFElectionelection algorithm. The solution is applicable to the DFElectionelection in EVPNServicesservices [RFC7432] and EVPNVirtual Private Wire ServicesVPWS [RFC8214].b)(b) It defines a way to signal the DFElectionelection algorithm and capabilities intended by the advertising PE. This is done by defining the DF Election Extended Community, whichallow signaling ofallows the advertising PE to indicate its support for the capabilitiessupported bydefined in this document as well as anyother futuresubsequently defined DFElectionelection algorithmsandor capabilities.c) The solution(c) It is backwards compatible with the procedures defined in [RFC7432]. If one or more PEs in the ES do not support the new procedures, they will all followthe [RFC7432]DFElection. 7.election as defined in [RFC7432]. 6. Security Considerations This document addresses some identified issues in the DFElectionelection procedures described in [RFC7432] by defining a new DFElectionelection framework. In general, this framework allows the PEs that are part of the sameEthernet SegmentES to exchange additional information and agree on the DFElection Typeelection type andCapabilitiescapabilities to be used.FollowingBy following the procedures in this document, the operator will minimizeundesired situationssuch undesirable situations as unfairload-balancing,load balancing, servicedisruptiondisruption, and traffic black-holing.Since thoseBecause such situationsmay have beencould be purposely created by a malicious user with access to the configuration of one PE, this documentenhancesalso enhances the security of the network. Note that the network will not benefitoffrom the new procedures if the DFElection Algelection algorithm is not consistently configured on all the PEs in the ES (if there is no unanimity among all the PEs, the DFElection Algelection algorithm falls back to theDefault [RFC7432]default DFElection).election as provided in [RFC7432]). This behavior could be exploited by an attacker that manages to modify the configuration of one PE in theEthernet SegmentES so that the DFElection Algelection algorithm and capabilities in all the PEs in theEthernet SegmentES fall back to theDefaultdefault DFElection.election. If that is the case, the PEs will be exposed to the unfairload-balancing,load balancing, servicedisruptiondisruption, and black-holingthat werementioned earlier. In addition, the new framework is extensible and allows forfuturenew security enhancements in the future. Note that such enhancements are out ofthescopeoffor this document. Finally, since this document extends the procedures in [RFC7432], the sameSecurity Considerationssecurity considerations as those described in [RFC7432] are valid for this document.8.7. IANA Considerations IANAis requested to:has: oAllocateAllocated Sub-Type value 0x06 in the "EVPN Extended CommunitySub- Types"Sub-Types" registry defined in [RFC7153] as follows:SUB-TYPE VALUE NAMESub-Type Value Name Reference --------------------------------------------------------------------- ------------- 0x06 DF Election Extended Community This document o Set up a registry called "DF Alg" for the DF Alg field in the Extended Community. New registrations will be made through the "RFC Required" procedure defined in [RFC8126]. Value 31 is forExperimentalexperimental use and does not require any other RFC than this document. The following initial values in that registryare requested:exist: Alg Name Reference ----------------------------------------------- ------------- 0 Default DF Election This document 1 HRWalgorithmAlgorithm This document 2-30 Unassigned 31 Reserved for ExperimentaluseUse This document o Set up a registry called "DF Election Capabilities" for thetwo- octet2-octet Bitmap field in the Extended Community. New registrations will be made through the "RFC Required" procedure defined in [RFC8126]. The following initial value in that registryis requested:exists: Bit Name Reference ---------------------------------- ------------- 0 Unassigned 1 AC-DFcapabilityCapability This document 2-15 Unassigned9.8. References9.1.8.1. Normative References [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015, <https://www.rfc-editor.org/info/rfc7432>. [RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J. Rabadan, "Virtual Private Wire Service Support in Ethernet VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017,<https://www.rfc- editor.org/info/rfc8214>.<https://www.rfc-editor.org/info/rfc8214>. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>. [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>. [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February 2006,<http://www.rfc-editor.org/info/rfc4360>.<https://www.rfc-editor.org/info/rfc4360>. [RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP Extended Communities", RFC 7153, DOI 10.17487/RFC7153, March 2014, <https://www.rfc-editor.org/info/rfc7153>. [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017,<https://www.rfc- editor.org/info/rfc8126>. 9.2.<https://www.rfc-editor.org/info/rfc8126>. 8.2. Informative References [VPLS-MH] Kothari,Henderickx et al.,B., Kompella, K., Henderickx, W., Balus, F., and J. Uttaro, "BGP based Multi-homing in Virtual Private LAN Service",draft-ietf-bess-vpls-multihoming- 02.txt, work in progress, September, 2018.Work in Progress, draft-ietf-bess-vpls- multihoming-03, March 2019. [CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., and D. Lewin, "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web", ACM Symposium on Theory ofComputingComputing, ACMPressPress, New York, DOI 10.1145/258533.258660, May 1997. [CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein, "Introduction to Algorithms (3rded.)",Edition)", MITPress and McGraw-HillPress, ISBN0-262-03384-4., February0-262-03384-8, 2009. [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/RFC2991, November 2000,<http://www.rfc-editor.org/info/rfc2991>.<https://www.rfc-editor.org/info/rfc2991>. [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,<http://www.rfc-editor.org/info/rfc2992>.<https://www.rfc-editor.org/info/rfc2992>. [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006, <https://www.rfc-editor.org/info/rfc4456>. [HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings to Increase Hit Rates", IEEE/ACM Transactionsin networkingon Networking, Volume6 Issue6, No. 1, February 1998,<https://www.microsoft.com/en-us/research/wp- content/uploads/2017/02/HRW98.pdf>.<https://www.microsoft.com/en-us/research/wp-content/ uploads/2017/02/HRW98.pdf>. [Knuth] Knuth, D., "The Art of ComputerProgramming -Programming: Volume 3: Sorting andSearching,Vol 3 Pg.Searching", 2nd Edition, Addison-Wesley, Page 516,Addison Wesley 10.1998. Acknowledgments The authors want to thankSriram Venkateswaran, Laxmi Padakanti,Ranganathan Boovaraghavan,Tamas Mondal,Sami Boutros,Jakob Heitz,Luc Andre Burdet, Anoop Ghanwani, Mrinmoy Ghosh, Jakob Heitz, Leo Mermelstein, Mankamana Mishra,Anoop Ghanwani andTamas Mondal, Laxmi Padakanti, SamirThoriaThoria, and Sriram Venkateswaran for their review and contributions. Special thanks to Stephane Litkowski for his thorough review and detailed contributions.11. Contributors In additionThey would also like to thank their working group chairs, Matthew Bocci and Stephane Litkowski, and their AD, Martin Vigoureux, for their guidance and support. Finally, they would like to thank theauthors listed onDirectorate reviewers and the ADs for their thorough reviews and probing questions, the answers to which have substantially improved thefront page,quality of the document. Contributors The followingcoauthorspeople havealsocontributed substantially to thisdocument:document and should be considered coauthors: Antoni Przygienda Juniper Networks, Inc. 1194 N. MathildaDriveAve. Sunnyvale, CA95134 USA94089 United States of America Email: prz@juniper.net Vinod Prabhu Nokia Email: vinod.prabhu@nokia.com Wim Henderickx Nokia Email: wim.henderickx@nokia.com Wen Lin Juniper Networks, Inc. Email: wlin@juniper.net Patrice Brissette Cisco Systems Email: pbrisset@cisco.com Keyur Patel Arrcus,IncInc. Email: keyur@arrcus.com Autumn Liu Ciena Email: hliu@ciena.com Authors' Addresses Jorge Rabadan (editor) Nokia 777 E. Middlefield Road Mountain View, CA 94043USAUnited States of America Email: jorge.rabadan@nokia.com Satya Mohanty (editor) Cisco Systems, Inc. 225 West Tasman Drive San Jose, CA 95134USAUnited States of America Email: satyamoh@cisco.com Ali Sajassi Cisco Systems, Inc. 225 West Tasman Drive San Jose, CA 95134USAUnited States of America Email: sajassi@cisco.com John Drake Juniper Networks, Inc. 1194 N. MathildaDriveAve. Sunnyvale, CA95134 USA94089 United States of America Email: jdrake@juniper.net Kiran Nagaraj Nokia 701 E. Middlefield Road Mountain View, CA 94043USAUnited States of America Email: kiran.nagaraj@nokia.com Senthil Sathappan Nokia 701 E. Middlefield Road Mountain View, CA 94043USAUnited States of America Email: senthil.sathappan@nokia.com