Internet Engineering Task Force (IETF)                   A. Sajassi, Ed.
Request for Comments: 8365                                         Cisco
Category: Standards Track                                  J. Drake, Ed.
ISSN: 2070-1721                                                  Juniper
                                                                N. Bitar
                                                                   Nokia
                                                              R. Shekhar
                                                                 Juniper
                                                               J. Uttaro
                                                                    AT&T
                                                           W. Henderickx
                                                                   Nokia
                                                              March 2018

A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)

Abstract

This document specifies how Ethernet VPN (EVPN) can be used as a Network Virtualization Overlay (NVO) solution and explores the various tunnel encapsulation options over IP and their impact on the EVPN control-plane and procedures. In particular, the following encapsulation options are analyzed: Virtual Extensible LAN (VXLAN), Network Virtualization using Generic Routing Encapsulation (NVGRE), and MPLS over GRE. This specification is also applicable to Generic Network Virtualization Encapsulation (GENEVE); however, some incremental work is required, which will be covered in a separate document. This document also specifies new multihoming procedures for split-horizon filtering and mass withdrawal. It also specifies EVPN route constructions for VXLAN/NVGRE encapsulations and Autonomous System Border Router (ASBR) procedures for multihoming of Network Virtualization Edge (NVE) devices.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community.
It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8365.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Requirements Notation and Conventions
3. Terminology
4. EVPN Features
5. Encapsulation Options for EVPN Overlays
   5.1. VXLAN/NVGRE Encapsulation
      5.1.1. Virtual Identifiers Scope
      5.1.2. Virtual Identifiers to EVI Mapping
      5.1.3. Constructing EVPN BGP Routes
   5.2. MPLS over GRE
6. EVPN with Multiple Data-Plane Encapsulations
7. Single-Homing NVEs - NVE Residing in Hypervisor
   7.1. Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulations
   7.2. Impact on EVPN Procedures for VXLAN/NVGRE Encapsulations
8. Multihoming NVEs - NVE Residing in ToR Switch
   8.1. EVPN Multihoming Features
      8.1.1. Multihomed ES Auto-Discovery
      8.1.2. Fast Convergence and Mass Withdrawal
      8.1.3. Split-Horizon
      8.1.4. Aliasing and Backup Path
      8.1.5. DF Election
   8.2. Impact on EVPN BGP Routes and Attributes
   8.3. Impact on EVPN Procedures
      8.3.1. Split Horizon
      8.3.2. Aliasing and Backup Path
      8.3.3. Unknown Unicast Traffic Designation
9. Support for Multicast
10. Data-Center Interconnections (DCIs)
   10.1. DCI Using GWs
   10.2. DCI Using ASBRs
      10.2.1. ASBR Functionality with Single-Homing NVEs
      10.2.2. ASBR Functionality with Multihoming NVEs
11. Security Considerations
12. IANA Considerations
13. References
   13.1. Normative References
   13.2. Informative References
Acknowledgements
Contributors
Authors' Addresses

1. Introduction

This document specifies how Ethernet VPN (EVPN) [RFC7432] can be used as a Network Virtualization Overlay (NVO) solution and explores the various tunnel encapsulation options over IP and their impact on the EVPN control-plane and procedures. In particular, the following encapsulation options are analyzed: Virtual Extensible LAN (VXLAN) [RFC7348], Network Virtualization using Generic Routing Encapsulation (NVGRE) [RFC7637], and MPLS over Generic Routing Encapsulation (GRE) [RFC4023]. This specification is also applicable to Generic Network Virtualization Encapsulation (GENEVE) [GENEVE]; however, some incremental work is required, which will be covered in a separate document [EVPN-GENEVE]. This document also specifies new multihoming procedures for split-horizon filtering and mass withdrawal. It also specifies EVPN route constructions for VXLAN/NVGRE encapsulations and Autonomous System Border Router (ASBR) procedures for multihoming of Network Virtualization Edge (NVE) devices.

In the context of this document, an NVO is a solution to address the requirements of a multi-tenant data center, especially one with virtualized hosts, e.g., Virtual Machines (VMs) or virtual workloads. The key requirements of such a solution, as described in [RFC7364], are the following:

- Isolation of network traffic per tenant

- Support for a large number of tenants (tens or hundreds of thousands)

- Extension of Layer 2 (L2) connectivity among different VMs belonging to a given tenant segment (subnet) across different Points of Delivery (PoDs) within a data center or between different data centers

- Allowing a given VM to move between different physical points of attachment within a given L2 segment

The underlay network for NVO solutions is assumed to provide IP connectivity between NVO endpoints.
This document describes how EVPN can be used as an NVO solution and explores applicability of EVPN functions and procedures. In particular, it describes the various tunnel encapsulation options for EVPN over IP and their impact on the EVPN control-plane as well as procedures for two main scenarios:

(a) single-homing NVEs - when an NVE resides in the hypervisor, and

(b) multihoming NVEs - when an NVE resides in a Top-of-Rack (ToR) device.

The possible encapsulation options for EVPN overlays that are analyzed in this document are:

- VXLAN and NVGRE

- MPLS over GRE

Before getting into the description of the different encapsulation options for EVPN over IP, it is important to highlight the EVPN solution's main features, how those features are currently supported, and any impact that the encapsulation has on those features.

2. Requirements Notation and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Terminology

Most of the terminology used in this document comes from [RFC7432] and [RFC7365].

VXLAN: Virtual Extensible LAN

GRE: Generic Routing Encapsulation

NVGRE: Network Virtualization using Generic Routing Encapsulation

GENEVE: Generic Network Virtualization Encapsulation

PoD: Point of Delivery

NV: Network Virtualization

NVO: Network Virtualization Overlay

NVE: Network Virtualization Edge

VNI: VXLAN Network Identifier

VSID: Virtual Subnet Identifier (for NVGRE)

I-SID: Service Instance Identifier

EVPN: Ethernet VPN

EVI: EVPN Instance. An EVPN instance spanning the Provider Edge (PE) devices participating in that EVPN

MAC-VRF: A Virtual Routing and Forwarding table for Media Access Control (MAC) addresses on a PE

IP-VRF: A Virtual Routing and Forwarding table for Internet Protocol (IP) addresses on a PE

ES: Ethernet Segment. When a customer site (device or network) is connected to one or more PEs via a set of Ethernet links, then that set of links is referred to as an 'Ethernet segment'.

Ethernet Segment Identifier (ESI): A unique non-zero identifier that identifies an Ethernet segment is called an 'Ethernet Segment Identifier'.

Ethernet Tag: An Ethernet tag identifies a particular broadcast domain, e.g., a VLAN. An EVPN instance consists of one or more broadcast domains.

PE: Provider Edge

Single-Active Redundancy Mode: When only a single PE, among all the PEs attached to an ES, is allowed to forward traffic to/from that ES for a given VLAN, then the ES is defined to be operating in Single-Active redundancy mode.

All-Active Redundancy Mode: When all PEs attached to an ES are allowed to forward known unicast traffic to/from that ES for a given VLAN, then the ES is defined to be operating in All-Active redundancy mode.

PIM-SM: Protocol Independent Multicast - Sparse-Mode

PIM-SSM: Protocol Independent Multicast - Source-Specific Multicast

BIDIR-PIM: Bidirectional PIM

4. EVPN Features

EVPN [RFC7432] was originally designed to support the requirements detailed in [RFC7209] and therefore has the following attributes, which directly address control-plane scaling and ease-of-deployment issues.

1. Control-plane information is distributed with BGP, and broadcast and multicast traffic is sent using a shared multicast tree or with ingress replication.

2. Control-plane learning is used for MAC (and IP) addresses instead of data-plane learning. The latter requires the flooding of unknown unicast and Address Resolution Protocol (ARP) frames, whereas the former does not require any flooding.

3. Route Reflector (RR) is used to reduce a full mesh of BGP sessions among PE devices to a single BGP session between a PE and the RR. Furthermore, RR hierarchy can be leveraged to scale the number of BGP routes on the RR.

4. Auto-discovery via BGP is used to discover PE devices participating in a given VPN, PE devices participating in a given redundancy group, tunnel encapsulation types, multicast tunnel types, multicast members, etc.

5. All-Active multihoming is used. This allows a given Customer Edge (CE) device to have multiple links to multiple PEs, and traffic to/from that CE fully utilizes all of these links.

6. When a link between a CE and a PE fails, the PEs for that EVI are notified of the failure via the withdrawal of a single EVPN route. This allows those PEs to remove the withdrawing PE as a next hop for every MAC address associated with the failed link. This is termed "mass withdrawal".

7. BGP route filtering and constrained route distribution are leveraged to ensure that the control-plane traffic for a given EVI is only distributed to the PEs in that EVI.

8. When an IEEE 802.1Q [IEEE.802.1Q] interface is used between a CE and a PE, each of the VLAN IDs (VIDs) on that interface can be mapped onto a bridge table (for up to 4094 such bridge tables). All these bridge tables may be mapped onto a single MAC-VRF (in case of VLAN-aware bundle service).

9. VM Mobility mechanisms ensure that all PEs in a given EVI know the ES with which a given VM, as identified by its MAC and IP addresses, is currently associated.

10. Route Targets (RTs) are used to allow the operator (or customer) to define a spectrum of logical network topologies including mesh, hub and spoke, and extranets (e.g., a VPN whose sites are owned by different enterprises), without the need for proprietary software or the aid of other virtual or physical devices.

Because the design goal for NVO is millions of instances per common physical infrastructure, the scaling properties of the control plane for NVO are extremely important. EVPN and the extensions described herein are designed with this level of scalability in mind.

5. Encapsulation Options for EVPN Overlays

5.1. VXLAN/NVGRE Encapsulation

Both VXLAN and NVGRE are examples of technologies that provide a data-plane encapsulation that is used to transport a packet over the common physical IP infrastructure between Network Virtualization Edges (NVEs), e.g., VXLAN Tunnel End Points (VTEPs) in a VXLAN network. Both of these technologies include the identifier of the specific NVO instance, VNI in VXLAN and VSID in NVGRE, in each packet. In the remainder of this document, we use VNI as the representation for the NVO instance, with the understanding that VSID can equally be used if the encapsulation is NVGRE, unless it is stated otherwise.

Note that a PE is equivalent to an NVE/VTEP.

VXLAN encapsulation is based on UDP, with an 8-byte header following the UDP header. VXLAN provides a 24-bit VNI, which typically provides a one-to-one mapping to the tenant VID, as described in [RFC7348]. In this scenario, the ingress VTEP does not include an inner VLAN tag on the encapsulated frame, and the egress VTEP discards the frames with an inner VLAN tag. This mode of operation in [RFC7348] maps to VLAN-Based Service in [RFC7432], where a tenant VID gets mapped to an EVI.
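The 8-byte VXLAN header mentioned above can be sketched as follows. This is a minimal illustration, assuming the header layout of [RFC7348] (8 flag bits with the I flag set, 24 reserved bits, the 24-bit VNI, and 8 reserved bits). The function names pack_vxlan_header and unpack_vni are hypothetical and are not defined by this specification.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag in [RFC7348]: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that follows the UDP header."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: 8 flag bits followed by 24 reserved bits.
    # Second 32-bit word: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Recover the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8
```

In the VLAN-Based Service mode described above, the VNI recovered by the egress VTEP would be the only key needed to select the bridge table, since no inner VLAN tag is carried.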
VXLAN also provides an option of including an inner VLAN tag in the encapsulated frame, if explicitly configured at the VTEP. This mode of operation can map to VLAN Bundle Service in [RFC7432] because all the tenant's tagged frames map to a single bridge table / MAC-VRF, and the inner VLAN tag is not used for lookup by the disposition PE when performing VXLAN decapsulation as described in Section 6 of [RFC7348].

The encapsulation in [RFC7637] is based on GRE encapsulation, and it mandates the inclusion of the optional GRE Key field, which carries the VSID. There is a one-to-one mapping between the VSID and the tenant VID, as described in [RFC7637]. The inclusion of an inner VLAN tag is prohibited. This mode of operation in [RFC7637] maps to VLAN-Based Service in [RFC7432].

As described in the next section, there is no change to the encoding of EVPN routes to support VXLAN or NVGRE encapsulation, except for the use of the BGP Encapsulation Extended Community to indicate the encapsulation type (e.g., VXLAN or NVGRE). However, there is potential impact to the EVPN procedures depending on where the NVE is located (i.e., in hypervisor or ToR) and whether multihoming capabilities are required.

5.1.1. Virtual Identifiers Scope

Although VNIs are defined as 24-bit globally unique values, there are scenarios in which it is desirable to use a locally significant value for the VNI, especially in the context of a data-center interconnect.

5.1.1.1. Data-Center Interconnect with Gateway

In the case where NVEs in different data centers need to be interconnected, and the NVEs need to use VNIs as globally unique identifiers within a data center, a Gateway (GW) needs to be employed at the edge of the data-center network (DCN).
This is because the Gateway will provide the functionality of translating the VNI when crossing network boundaries, which may align with operator span-of-control boundaries. As an example, consider the network of Figure 1. Assume there are three network operators: one for each of the DC1, DC2, and WAN networks. The Gateways at the edge of the data centers are responsible for translating the VNIs between the values used in each of the DCNs and the values used in the WAN.

                          +--------------+
                          |              |
        +---------+       |     WAN      |       +---------+
+----+  |       +---+  +----+          +----+  +---+       |  +----+
|NVE1|--|       |   |  |WAN |          |WAN |  |   |       |--|NVE3|
+----+  |IP     |GW |--|Edge|          |Edge|--|GW | IP    |  +----+
+----+  |Fabric +---+  +----+          +----+  +---+ Fabric|  +----+
|NVE2|--|         |       |              |       |         |--|NVE4|
+----+  +---------+       +--------------+       +---------+  +----+

|<------ DC 1 ------>                           <------ DC2 ------>|

          Figure 1: Data-Center Interconnect with Gateway

5.1.1.2. Data-Center Interconnect without Gateway

In the case where NVEs in different data centers need to be interconnected, and the NVEs need to use locally assigned VNIs (e.g., similar to MPLS labels), there may be no need to employ Gateways at the edge of the DCN. More specifically, the VNI value that is used by the transmitting NVE is allocated by the NVE that is receiving the traffic (in other words, this is similar to a "downstream-assigned" MPLS label). This allows the VNI space to be decoupled between different DCNs without the need for a dedicated Gateway at the edge of the data centers. This topic is covered in Section 10.2.
                         +--------------+
                         |              |
        +---------+      |     WAN      |     +---------+
+----+  |         |   +----+          +----+  |         |  +----+
|NVE1|--|         |   |ASBR|          |ASBR|  |         |--|NVE3|
+----+  |IP Fabric|---|    |          |    |--|IP Fabric|  +----+
+----+  |         |   +----+          +----+  |         |  +----+
|NVE2|--|         |      |              |     |         |--|NVE4|
+----+  +---------+      +--------------+     +---------+  +----+

|<------ DC 1 ----->                           <---- DC2 ------>|

           Figure 2: Data-Center Interconnect with ASBR

5.1.2. Virtual Identifiers to EVI Mapping

Just like in [RFC7432], where two options existed for mapping broadcast domains (represented by VLAN IDs) to an EVI, when the EVPN control plane is used in conjunction with VXLAN (or NVGRE encapsulation), there are also two options for mapping broadcast domains represented by VXLAN VNIs (or NVGRE VSIDs) to an EVI:

Option 1: A Single Broadcast Domain per EVI

In this option, a single Ethernet broadcast domain (e.g., subnet) represented by a VNI is mapped to a unique EVI. This corresponds to the VLAN-Based Service in [RFC7432], where a tenant-facing interface, logical interface (e.g., represented by a VID), or physical interface gets mapped to an EVI. As such, a BGP Route Distinguisher (RD) and Route Target (RT) are needed per VNI on every NVE. The advantage of this model is that it allows the BGP RT constraint mechanisms to be used in order to limit the propagation and import of routes to only the NVEs that are interested in a given VNI. The disadvantage of this model may be the provisioning overhead if the RD and RT are not derived automatically from the VNI.

In this option, the MAC-VRF table is identified by the RT in the control plane and by the VNI in the data-plane.
In this option, the specific MAC-VRF table corresponds to only a single bridge table.

Option 2: Multiple Broadcast Domains per EVI

In this option, multiple subnets, each represented by a unique VNI, are mapped to a single EVI. For example, if a tenant has multiple segments/subnets each represented by a VNI, then all the VNIs for that tenant are mapped to a single EVI; for example, the EVI in this case represents the tenant and not a subnet. This corresponds to the VLAN-aware bundle service in [RFC7432]. The advantage of this model is that it doesn't require the provisioning of an RD/RT per VNI. However, this is a moot point when compared to Option 1 where auto-derivation is used. The disadvantage of this model is that routes would be imported by NVEs that may not be interested in a given VNI.

In this option, the MAC-VRF table is identified by the RT in the control plane; a specific bridge table for that MAC-VRF is identified by the <RT, Ethernet Tag ID> in the control plane. In this option, the VNI in the data-plane is sufficient to identify a specific bridge table.

5.1.2.1. Auto-Derivation of RT

In order to simplify configuration, when the option of a single VNI per EVI is used, the RT used for EVPN can be auto-derived. RD can be auto-generated as described in [RFC7432], and RT can be auto-derived as described next.

Since a Gateway PE as depicted in Figure 1 participates in both the DCN and WAN BGP sessions, it is important that, when RT values are auto-derived from VNIs, there be no conflict in RT spaces between DCNs and WANs, assuming that both are operating within the same Autonomous System (AS).
Also, there can be scenarios where both VXLAN and NVGRE encapsulations may be needed within the same DCN, and their corresponding VNIs are administered independently, which means VNI spaces can overlap. In order to avoid conflict in RT spaces, the 6-byte RT values with 2-octet AS number for DCNs can be auto-derived as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Global Administrator         |    Local Administrator        |
   +-----------------------------------------------+---------------+
   |  Local Administrator (Cont.)  |
   +-------------------------------+

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Global Administrator         |A| TYPE| D-ID  | Service ID    |
   +-----------------------------------------------+---------------+
   |  Service ID (Cont.)           |
   +-------------------------------+

The 6-octet RT field consists of two sub-fields:

- Global Administrator sub-field: 2 octets. This sub-field contains an AS number assigned by IANA <https://www.iana.org/assignments/as-numbers/>.

- Local Administrator sub-field: 4 octets

  * A: A single-bit field indicating if this RT is auto-derived

       0: auto-derived
       1: manually derived

  * Type: A 3-bit field that identifies the space in which the other 3 bytes are defined. The following spaces are defined:

       0 : VID (802.1Q VLAN ID)
       1 : VXLAN
       2 : NVGRE
       3 : I-SID
       4 : EVI
       5 : dual-VID (QinQ VLAN ID)

  * D-ID: A 4-bit field that identifies domain-id. The default value of domain-id is zero, indicating that only a single numbering space exists for a given technology. However, if more than one number space exists for a given technology (e.g., overlapping VXLAN spaces), then each of the number spaces needs to be identified by its corresponding domain-id starting from 1.
  * Service ID: This 3-octet field is set to VNI, VSID, I-SID, or VID.

It should be noted that RT auto-derivation is applicable for 2-octet AS numbers. For 4-octet AS numbers, the RT needs to be manually configured because 3-octet VNI fields cannot be fit within the 2-octet local administrator field.

5.1.3. Constructing EVPN BGP Routes

In EVPN, an MPLS label, for instance, identifying the forwarding table is distributed by the egress PE via the EVPN control plane and is placed in the MPLS header of a given packet by the ingress PE. This label is used upon receipt of that packet by the egress PE for disposition of that packet. This is very similar to the use of the VNI by the egress NVE, with the difference being that an MPLS label has local significance while a VNI typically has global significance. Accordingly, and specifically to support the option of locally assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement route, the MPLS label field in the Ethernet A-D per EVI route, and the MPLS label field in the P-Multicast Service Interface (PMSI) Tunnel attribute of the Inclusive Multicast Ethernet Tag (IMET) route are used to carry the VNI. For the balance of this memo, the above MPLS label fields will be referred to as the VNI field. The VNI field is used for both local and global VNIs; for either case, the entire 24-bit field is used to encode the VNI value.

For the VLAN-Based Service (a single VNI per MAC-VRF), the Ethernet Tag field in the MAC/IP Advertisement, Ethernet A-D per EVI, and IMET route MUST be set to zero just as in the VLAN-Based Service in [RFC7432].
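As an illustration of the RT auto-derivation layout in Section 5.1.2.1, the 6-octet value can be assembled as sketched below. This is a non-normative sketch: the function name derive_rt and the string keys of RT_TYPE are assumptions made for the example, and only the 6-octet value shown in the diagrams is produced (the type and sub-type octets of the BGP extended community are not included).

```python
import struct

# Values of the 3-bit Type field, as listed in Section 5.1.2.1.
RT_TYPE = {"VID": 0, "VXLAN": 1, "NVGRE": 2, "I-SID": 3, "EVI": 4, "dual-VID": 5}


def derive_rt(asn: int, space: str, service_id: int, domain_id: int = 0) -> bytes:
    """Return the 6-octet auto-derived RT value (hypothetical helper)."""
    if not 0 <= asn < 2**16:
        raise ValueError("auto-derivation applies only to 2-octet AS numbers")
    if not 0 <= service_id < 2**24:
        raise ValueError("Service ID must fit in 3 octets")
    if not 0 <= domain_id < 2**4:
        raise ValueError("D-ID must fit in 4 bits")
    a_bit = 0  # 0 means auto-derived
    # Local Administrator: A (1 bit) | Type (3 bits) | D-ID (4 bits) | Service ID (24 bits)
    local_admin = (a_bit << 31) | (RT_TYPE[space] << 28) | (domain_id << 24) | service_id
    # Global Administrator (2 octets) followed by Local Administrator (4 octets)
    return struct.pack("!HI", asn, local_admin)
```

Under these assumptions, the same Service ID yields different RT values for VXLAN and NVGRE because the Type field differs, which is how overlapping VNI and VSID spaces are kept apart.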
For the VLAN-Aware Bundle Service (multiple VNIs per MAC-VRF with each VNI associated with its own bridge table), the Ethernet Tag field in the MAC Advertisement, Ethernet A-D per EVI, and IMET route MUST identify a bridge table within a MAC-VRF; the set of Ethernet Tags for that EVI needs to be configured consistently on all PEs within that EVI. For locally assigned VNIs, the value advertised in the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware bundle service in [RFC7432]. Such setting must be done consistently on all PE devices participating in that EVI within a given domain. For global VNIs, the value advertised in the Ethernet Tag field SHOULD be set to a VNI as long as it matches the existing semantics of the Ethernet Tag, i.e., it identifies a bridge table within a MAC-VRF and the set of VNIs are configured consistently on each PE in that EVI.

In order to indicate which type of data-plane encapsulation (i.e., VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP Encapsulation Extended Community defined in [RFC5512] is included with all EVPN routes (i.e., MAC Advertisement, Ethernet A-D per EVI, Ethernet A-D per ESI, IMET, and Ethernet Segment) advertised by an egress PE. Five new values have been assigned by IANA to extend the list of encapsulation types defined in [RFC5512]; they are listed in Section 11.

The MPLS encapsulation tunnel type, listed in Section 11, is needed in order to distinguish between an advertising node that only supports non-MPLS encapsulations and one that supports MPLS and non-MPLS encapsulations.
An advertising node that only supports MPLS encapsulation does not need to advertise any encapsulation tunnel types; i.e., if the BGP Encapsulation Extended Community is not present, then either MPLS encapsulation or a statically configured encapsulation is assumed.

The Next Hop field of the MP_REACH_NLRI attribute of the route MUST be set to the IPv4 or IPv6 address of the NVE. The remaining fields in each route are set as per [RFC7432].

Note that the procedure defined here -- to use the MPLS Label field to carry the VNI in the presence of a Tunnel Encapsulation Extended Community specifying the use of a VNI -- is aligned with the procedures described in Section 8.2.2.2 of [TUNNEL-ENCAP] ("When a Valid VNI has not been Signaled").

5.2. MPLS over GRE

The EVPN data-plane is modeled as an EVPN MPLS client layer sitting over an MPLS PSN tunnel server layer. Some of the EVPN functions (split-horizon, Aliasing, and Backup Path) are tied to the MPLS client layer. If MPLS over GRE encapsulation is used, then the EVPN MPLS client layer can be carried over an IP PSN tunnel transparently. Therefore, there is no impact to the EVPN procedures and associated data-plane operation.

[RFC4023] defines the standard for using MPLS over GRE encapsulation, which can be used for this purpose. However, when MPLS over GRE is used in conjunction with EVPN, it is recommended that the GRE key field be present and be used to provide a 32-bit entropy value only if the P nodes can perform Equal-Cost Multipath (ECMP) hashing based on the GRE key; otherwise, the GRE header SHOULD NOT include the GRE key field. The Checksum and Sequence Number fields MUST NOT be included, and the corresponding C and S bits in the GRE header MUST be set to zero.
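The GRE header constraints above can be sketched as follows. This is an illustrative, non-normative example: it assumes the usual GRE bit positions (C = 0x8000, K = 0x2000, S = 0x1000 in the first 16 bits) and the MPLS unicast protocol type 0x8847, and the function name pack_gre_header is hypothetical.

```python
import struct
from typing import Optional

GRE_FLAG_CHECKSUM = 0x8000  # C bit: MUST be zero for EVPN MPLS over GRE
GRE_FLAG_KEY = 0x2000       # K bit: set only when the key carries entropy
GRE_FLAG_SEQUENCE = 0x1000  # S bit: MUST be zero for EVPN MPLS over GRE
PROTOCOL_MPLS_UNICAST = 0x8847


def pack_gre_header(entropy: Optional[int] = None) -> bytes:
    """Build a GRE header for MPLS over GRE (hypothetical helper).

    Pass an entropy value only if the P nodes can hash on the GRE key;
    otherwise the key field is omitted.
    """
    flags = 0  # C and S bits stay zero; GRE version is 0
    if entropy is None:
        return struct.pack("!HH", flags, PROTOCOL_MPLS_UNICAST)
    if not 0 <= entropy < 2**32:
        raise ValueError("entropy must fit in the 32-bit key field")
    return struct.pack("!HHI", flags | GRE_FLAG_KEY, PROTOCOL_MPLS_UNICAST, entropy)
```

The header is 4 bytes without the key and 8 bytes with it; how the entropy value is computed (for example, from a hash of the inner frame) is left open here.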
A PE capable of supporting this encapsulation SHOULD advertise its EVPN routes along with the Tunnel Encapsulation Extended Community indicating MPLS over GRE encapsulation as described in the previous section.

6. EVPN with Multiple Data-Plane Encapsulations

The use of the BGP Encapsulation Extended Community per [RFC5512] allows each NVE in a given EVI to know each of the encapsulations supported by each of the other NVEs in that EVI. That is, each of the NVEs in a given EVI may support multiple data-plane encapsulations. An ingress NVE can send a frame to an egress NVE only if the set of encapsulations advertised by the egress NVE forms a non-empty intersection with the set of encapsulations supported by the ingress NVE; it is at the discretion of the ingress NVE which encapsulation to choose from this intersection. (As noted in Section 5.1.3, if the BGP Encapsulation Extended Community is not present, then the default MPLS encapsulation or a locally configured encapsulation is assumed.)

When a PE advertises multiple supported encapsulations, it MUST advertise encapsulations that use the same EVPN procedures including procedures associated with split-horizon filtering described in Section 8.3.1. For example, VXLAN and NVGRE (or MPLS and MPLS over GRE) encapsulations use the same EVPN procedures; thus, a PE can advertise both of them and can support either of them or both of them simultaneously. However, a PE MUST NOT advertise VXLAN and MPLS encapsulations together because (a) the MPLS field of EVPN routes is set to either an MPLS label or a VNI, but not both and (b) some EVPN procedures (such as split-horizon filtering) are different for VXLAN/NVGRE and MPLS encapsulations.

An ingress node that uses shared multicast trees for sending broadcast or multicast frames MAY maintain distinct trees for each different encapsulation type.
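The selection rule above can be sketched as follows. This is a hedged illustration, not a prescribed algorithm: the encapsulation names, the PROCEDURE_FAMILY grouping, the function names, and the choice to model the ingress NVE's discretion as an ordered preference list are all assumptions made for the example. An empty advertised set is treated as the default MPLS encapsulation; a locally configured default would work the same way.

```python
from typing import Iterable, Optional

# Encapsulations grouped by the EVPN procedures they share (see text above).
PROCEDURE_FAMILY = {
    "VXLAN": "vni-based",
    "NVGRE": "vni-based",
    "MPLS": "mpls-based",
    "MPLS-over-GRE": "mpls-based",
}


def check_advertisement(encaps: Iterable[str]) -> None:
    """Reject a set mixing encapsulations that use different EVPN procedures."""
    families = {PROCEDURE_FAMILY[e] for e in encaps}
    if len(families) > 1:
        raise ValueError("encapsulations with different EVPN procedures advertised together")


def choose_encapsulation(ingress_preference: Iterable[str],
                         egress_advertised: Iterable[str]) -> Optional[str]:
    """Pick the first ingress-preferred encapsulation the egress NVE advertised."""
    # No Encapsulation Extended Community present: assume default MPLS.
    advertised = set(egress_advertised) or {"MPLS"}
    check_advertisement(advertised)
    for encap in ingress_preference:
        if encap in advertised:
            return encap
    return None  # empty intersection: the frame cannot be sent
```

A return value of None corresponds to the empty-intersection case, in which the ingress NVE has no encapsulation it can use toward that egress NVE.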
It is the responsibility of the operator of a given EVI to ensure that all of the NVEs in that EVI support at least one common encapsulation. If this condition is violated, it could result in service disruption or failure. The use of the BGP Encapsulation Extended Community provides a method to detect when this condition is violated, but the actions to be taken are at the discretion of the operator and are outside the scope of this document.

7. Single-Homing NVEs - NVE Residing in Hypervisor

When an NVE and its hosts/VMs are co-located in the same physical device, e.g., when they reside in a server, the links between them are virtual and they typically share fate. That is, the subject hosts/VMs are typically not multihomed or, if they are multihomed, the multihoming is a purely local matter to the server hosting the VM and the NVEs, and it need not be "visible" to any other NVEs residing on other servers. Thus, it does not require any specific protocol mechanisms. The most common case of this is when the NVE resides on the hypervisor.

In the subsections that follow, we will discuss the impact on EVPN procedures for the case when the NVE resides on the hypervisor and the VXLAN (or NVGRE) encapsulation is used.

7.1. Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulations

In scenarios where different groups of data centers are under different administrative domains, and these data centers are connected via one or more backbone core providers as described in [RFC7365], the RD must be a unique value per EVI or per NVE as described in [RFC7432]. In other words, whenever there is more than one administrative domain for global VNI, a unique RD must be used; or, whenever the VNI value has local significance, a unique RD must be used. Therefore, it is recommended to use a unique RD as described in [RFC7432] at all times.
When the NVEs reside on the hypervisor, the EVPN BGP routes and attributes associated with multihoming are no longer required. This reduces the required routes and attributes to the following subset of four out of the total of eight listed in Section 7 of [RFC7432]:

- MAC/IP Advertisement Route
- Inclusive Multicast Ethernet Tag Route
- MAC Mobility Extended Community
- Default Gateway Extended Community

However, as noted in Section 8.6 of [RFC7432], in order to enable a single-homing ingress NVE to take advantage of fast convergence, Aliasing, and Backup Path when interacting with multihomed egress NVEs attached to a given ES, the single-homing ingress NVE should be able to receive and process routes that are Ethernet A-D per ES and Ethernet A-D per EVI.

7.2. Impact on EVPN Procedures for VXLAN/NVGRE Encapsulations

When the NVEs reside on the hypervisors, the EVPN procedures associated with multihoming are no longer required. This limits the procedures on the NVE to the following subset.

1. Local learning of MAC addresses received from the VMs per Section 9.1 of [RFC7432].

2. Advertising locally learned MAC addresses in BGP using the MAC/IP Advertisement routes.

3. Performing remote learning using BGP per Section 9.2 of [RFC7432].

4. Discovering other NVEs and constructing the multicast tunnels using the IMET routes.

5. Handling MAC address mobility events per the procedures of Section 15 in [RFC7432].
However, as noted in Section 8.6 of [RFC7432], in order to enable a single-homing ingress NVE to take advantage of fast convergence, Aliasing, and Backup Path when interacting with multihomed egress NVEs attached to a given ES, a single-homing ingress NVE should implement the ingress node processing of routes that are Ethernet A-D per ES and Ethernet A-D per EVI as defined in Sections 8.2 ("Fast Convergence") and 8.4 ("Aliasing and Backup Path") of [RFC7432].

8. Multihoming NVEs - NVE Residing in ToR Switch

In this section, we discuss the scenario where the NVEs reside in the ToR switches and the servers (where VMs are residing) are multihomed to these ToR switches. The multihoming NVE operates in All-Active or Single-Active redundancy mode. If the servers are single-homed to the ToR switches, then the scenario becomes similar to that where the NVE resides on the hypervisor, as discussed in Section 7, as far as the required EVPN functionality is concerned.

[RFC7432] defines a set of BGP routes, attributes, and procedures to support multihoming. We first describe these functions and procedures, then discuss which of these are impacted by the VXLAN (or NVGRE) encapsulation and what modifications are required. As will be seen later in this section, the only EVPN procedure that is impacted by a non-MPLS overlay encapsulation (e.g., VXLAN or NVGRE), which provides space for one ID rather than a stack of labels, is that of split-horizon filtering for multihomed ESs described in Section 8.3.1.

8.1. EVPN Multihoming Features

In this section, we will recap the multihoming features of EVPN to highlight the encapsulation dependencies. The section only describes the features and functions at a high level.
For more details, the reader is referred to [RFC7432].

8.1.1. Multihomed ES Auto-Discovery

EVPN NVEs (or PEs) connected to the same ES (e.g., the same server via Link Aggregation Group (LAG)) can automatically discover each other with minimal to no configuration through the exchange of BGP routes.

8.1.2. Fast Convergence and Mass Withdrawal

EVPN defines a mechanism to efficiently and quickly signal, to remote NVEs, the need to update their forwarding tables upon the occurrence of a failure in connectivity to an ES (e.g., a link or a port failure). This is done by having each NVE advertise an Ethernet A-D route per ES for each locally attached segment. Upon a failure in connectivity to the attached segment, the NVE withdraws the corresponding Ethernet A-D route. This triggers all NVEs that receive the withdrawal to update their next-hop adjacencies for all MAC addresses associated with the ES in question. If no other NVE had advertised an Ethernet A-D route for the same segment, then the NVE that received the withdrawal simply invalidates the MAC entries for that segment. Otherwise, the NVE updates the next-hop adjacency list accordingly.

8.1.3. Split-Horizon

If a server that is multihomed to two or more NVEs (represented by an Ethernet segment ES1) and operating in All-Active redundancy mode sends a BUM (i.e., Broadcast, Unknown unicast, or Multicast) packet to one of these NVEs, then it is important to ensure the packet is not looped back to the server via another NVE connected to this server. The filtering mechanism on the NVE to prevent such loops and packet duplication is called "split-horizon filtering".

8.1.4. Aliasing and Backup Path

In the case where a station is multihomed to multiple NVEs, it is possible that only a single NVE learns a set of the MAC addresses associated with traffic transmitted by the station. This leads to a situation where remote NVEs receive MAC Advertisement routes, for these addresses, from a single NVE even though multiple NVEs are connected to the multihomed station. As a result, the remote NVEs are not able to effectively load-balance traffic among the NVEs connected to the multihomed ES. For example, this could be the case when the NVEs perform data-path learning on the access and the load-balancing function on the station hashes traffic from a given source MAC address to a single NVE. Another scenario where this occurs is when the NVEs rely on control-plane learning on the access (e.g., using ARP), since ARP traffic will be hashed to a single link in the LAG.

To alleviate this issue, EVPN introduces the concept of "Aliasing". This refers to the ability of an NVE to signal that it has reachability to a given locally attached ES, even when it has learned no MAC addresses from that segment. The Ethernet A-D route per EVI is used to that end. Remote NVEs that receive MAC Advertisement routes with non-zero ESIs should consider the MAC address as reachable via all NVEs that advertise reachability to the relevant segment using Ethernet A-D routes with the same ESI and with the Single-Active flag reset.

Backup Path is a closely related function, albeit one that applies to the case where the redundancy mode is Single-Active. In this case, the NVE signals that it has reachability to a given locally attached ES using the Ethernet A-D route as well.
Remote NVEs that receive the MAC Advertisement routes, with non-zero ESI, should consider the MAC address as reachable via the advertising NVE. Furthermore, the remote NVEs should install a Backup Path, for said MAC, to the NVE that had advertised reachability to the relevant segment using an Ethernet A-D route with the same ESI and with the Single-Active flag set.

8.1.5. DF Election

If a host is multihomed to two or more NVEs on an ES operating in All-Active redundancy mode, then, for a given EVI, only one of these NVEs, termed the "Designated Forwarder" (DF), is responsible for sending it broadcast, multicast, and, if configured for that EVI, unknown unicast frames.

This is required in order to prevent duplicate delivery of multi-destination frames to a multihomed host or VM, in case of All-Active redundancy.

In NVEs where frames tagged as IEEE 802.1Q [IEEE.802.1Q] are received from hosts, the DF election should be performed based on host VIDs per Section 8.5 of [RFC7432]. Furthermore, multihoming PEs of a given ES MAY perform DF election using configured IDs such as VNI, EVI, normalized VIDs, etc., as long as the IDs are configured consistently across the multihoming PEs.

In GWs where VXLAN-encapsulated frames are received, the DF election is performed on VNIs. Again, it is assumed that, for a given Ethernet segment, VNIs are unique and consistent (e.g., no duplicate VNIs exist).

8.2. Impact on EVPN BGP Routes and Attributes

Since multihoming is supported in this scenario, the entire set of BGP routes and attributes defined in [RFC7432] is used. The setting of the Ethernet Tag field in the MAC Advertisement, Ethernet A-D per EVI, and IMET routes follows that of Section 5.1.3.
Furthermore, the setting of the VNI field in the MAC Advertisement and Ethernet A-D per EVI routes follows that of Section 5.1.3.

8.3. Impact on EVPN Procedures

Two cases need to be examined here, depending on whether the NVEs are operating in Single-Active or in All-Active redundancy mode.

First, let's consider the case of Single-Active redundancy mode, where the hosts are multihomed to a set of NVEs; however, only a single NVE is active at a given point of time for a given VNI. In this case, Aliasing is not required, and split-horizon filtering may not be required, but other functions such as multihomed ES auto-discovery, fast convergence and mass withdrawal, Backup Path, and DF election are required.

Second, let's consider the case of All-Active redundancy mode. In this case, out of all the EVPN multihoming features listed in Section 8.1, the use of the VXLAN or NVGRE encapsulation impacts the split-horizon and Aliasing features, since those two rely on the MPLS client layer. Given that this MPLS client layer is absent with these types of encapsulations, alternative procedures and mechanisms are needed to provide the required functions. Those are discussed in detail next.

8.3.1. Split Horizon

In EVPN, an MPLS label is used for split-horizon filtering to support All-Active multihoming where an ingress NVE adds a label corresponding to the site of origin (aka an ESI label) when encapsulating the packet. The egress NVE checks the ESI label when attempting to forward a multi-destination frame out an interface, and if the label corresponds to the same site identifier (ESI) associated with that interface, the packet gets dropped. This prevents the occurrence of forwarding loops.
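The egress-side ESI label check described above can be sketched in Python. This is a simplified illustration, not a normative procedure: the function name, the interface names, and the label values are hypothetical, labels are compared as opaque integers, and DF filtering is deliberately left out.

```python
from typing import Dict, List, Optional


def egress_interfaces(
    frame_esi_label: Optional[int],
    local_esi_labels: Dict[str, Optional[int]],
) -> List[str]:
    """Return the local interfaces a multi-destination frame may be sent on.

    frame_esi_label is the ESI label carried by the frame, or None when
    the frame originated from a site without one. local_esi_labels maps
    each local interface to the ESI label of its segment, or None for a
    single-homed interface. A frame is dropped on every interface whose
    segment matches the frame's site of origin.
    """
    return [
        ifname
        for ifname, label in local_esi_labels.items()
        if frame_esi_label is None or label != frame_esi_label
    ]
```

For example, with interfaces {"eth1": 100, "eth2": 200, "eth3": None}, a frame carrying ESI label 100 would be forwarded on eth2 and eth3 only.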
Since VXLAN and NVGRE encapsulations do not include the ESI label, other means of performing the split-horizon filtering function must be devised for these encapsulations. The following approach is recommended for split-horizon filtering when VXLAN (or NVGRE) encapsulation is used.

Every NVE tracks the IP address(es) associated with the other NVE(s) with which it has shared multihomed ESs. When the NVE receives a multi-destination frame from the overlay network, it examines the source IP address in the tunnel header (which corresponds to the ingress NVE) and filters out the frame on all local interfaces connected to ESs that are shared with the ingress NVE. With this approach, it is required that the ingress NVE perform replication locally to all directly attached Ethernet segments (regardless of the DF election state) for all flooded traffic arriving from the access interfaces (i.e., from the hosts). This approach is referred to as "Local Bias" and has the advantage that only a single IP address need be used per NVE for split-horizon filtering, as opposed to requiring an IP address per Ethernet segment per NVE.

In order to allow proper operation of split-horizon filtering among the same group of multihoming PE devices, a mix of PE devices with MPLS over GRE encapsulations running the procedures from [RFC7432] for split-horizon filtering on the one hand and VXLAN/NVGRE encapsulation running local-bias procedures on the other on a given Ethernet segment MUST NOT be configured.

8.3.2. Aliasing and Backup Path

The Aliasing and the Backup Path procedures for VXLAN/NVGRE encapsulation are very similar to the ones for MPLS.
In the case of MPLS, the Ethernet A-D route per EVI is used for Aliasing when the corresponding ES operates in All-Active multihoming, and the same route is used for Backup Path when the corresponding ES operates in Single-Active multihoming. In the case of VXLAN/NVGRE, the same route is used for the Aliasing and the Backup Path with the difference that the Ethernet Tag and VNI fields in the Ethernet A-D per EVI route are set as described in Section 5.1.3.

8.3.3. Unknown Unicast Traffic Designation

In EVPN, when an ingress PE uses ingress replication to flood unknown unicast traffic to egress PEs, the ingress PE uses a different EVPN MPLS label (from the one used for known unicast traffic) to identify such BUM traffic. The egress PEs use this label to identify such BUM traffic and, thus, apply DF filtering for All-Active multihomed sites. In the absence of an unknown unicast traffic designation and with unknown unicast flooding enabled, there can be transient duplicate traffic to All-Active multihomed sites under the following condition: the host MAC address is learned by the egress PE(s) and advertised to the ingress PE; however, the MAC Advertisement has not been received or processed by the ingress PE, resulting in the host MAC address being unknown on the ingress PE but known on the egress PE(s). Therefore, when a packet destined to that host MAC address arrives on the ingress PE, the ingress PE floods it via ingress replication to all the egress PE(s), and since the address is known to the egress PE(s), multiple copies are sent to the All-Active multihomed site.
It should be noted that such transient packet duplication only happens when (a) the destination host is multihomed via All-Active redundancy mode, (b) flooding of unknown unicast is enabled in the network, (c) ingress replication is used, and (d) traffic for the destination host arrives on the ingress PE before it learns the host MAC address via BGP EVPN advertisement. If it is desired to avoid the occurrence of such transient packet duplication (however low the probability may be), then VXLAN-GPE encapsulation needs to be used between these PEs, and the ingress PE needs to set the BUM Traffic Bit (B bit) [VXLAN-GPE] to indicate that this is ingress-replicated BUM traffic.

9. Support for Multicast

The EVPN IMET route is used to discover the multicast tunnels among the endpoints associated with a given EVI (e.g., given VNI) for VLAN-Based Service and a given <EVI, VLAN> for VLAN-Aware Bundle Service. All fields of this route are set as described in Section 5.1.3. The originating router's IP address field is set to the NVE's IP address. This route is tagged with the PMSI Tunnel attribute, which is used to encode the type of multicast tunnel to be used as well as the multicast tunnel identifier. The tunnel encapsulation is encoded by adding the BGP Encapsulation Extended Community as per Section 5.1.1. For example, the PMSI Tunnel attribute may indicate the multicast tunnel is of type Protocol Independent Multicast - Sparse-Mode (PIM-SM); whereas, the BGP Encapsulation Extended Community may indicate the encapsulation for that tunnel is of type VXLAN.
The following tunnel types as defined in [RFC6514] can be used in the PMSI Tunnel attribute for VXLAN/NVGRE:

+ 3 - PIM-SSM Tree
+ 4 - PIM-SM Tree
+ 5 - BIDIR-PIM Tree
+ 6 - Ingress Replication

In case of VXLAN and NVGRE encapsulations with locally assigned VNIs, just as in [RFC7432], each PE MUST advertise an IMET route to other PEs in an EVPN instance for the multicast tunnel type that it uses (i.e., ingress replication, PIM-SM, PIM-SSM, or BIDIR-PIM tunnel). However, for globally assigned VNIs, each PE MUST advertise an IMET route to other PEs in an EVPN instance for ingress replication or a PIM-SSM tunnel, and they MAY advertise an IMET route for a PIM-SM or BIDIR-PIM tunnel. In case of a PIM-SM or BIDIR-PIM tunnel, no information in the IMET route is needed by the PE to set up these tunnels.

In the scenario where the multicast tunnel is a tree, both the Inclusive as well as the Aggregate Inclusive variants may be used. In the former case, a multicast tree is dedicated to a VNI, whereas in the latter, a multicast tree is shared among multiple VNIs. For VNI-Based Service, the Aggregate Inclusive mode is accomplished by having the NVEs advertise multiple IMET routes with different RTs (one per VNI) but with the same tunnel identifier encoded in the PMSI Tunnel attribute. For VNI-Aware Bundle Service, the Aggregate Inclusive mode is accomplished by having the NVEs advertise multiple IMET routes with different VNIs encoded in the Ethernet Tag field, but with the same tunnel identifier encoded in the PMSI Tunnel attribute.

10. Data-Center Interconnections (DCIs)

For DCIs, the following two main scenarios are considered when connecting data centers running the EVPN overlay (as described here) over an MPLS/IP core network:

- Scenario 1: DCI using GWs
- Scenario 2: DCI using ASBRs

The following two subsections describe the operations for each of these scenarios.

10.1. DCI Using GWs

This is the typical scenario for interconnecting data centers over WAN. In this scenario, EVPN routes are terminated and processed in each GW, and MAC/IP routes are always re-advertised from DC to WAN; from WAN to DC, however, they are not re-advertised if unknown MAC addresses (and default IP address) are utilized in the NVEs. In this scenario, each GW maintains a MAC-VRF (and/or IP-VRF) for each EVI. The main advantage of this approach is that NVEs do not need to maintain MAC and IP addresses from any remote data centers when default IP routes and unknown MAC routes are used; that is, they only need to maintain routes that are local to their own DC. When default IP routes and unknown MAC routes are used, any unknown IP and MAC packets from NVEs are forwarded to the GWs where all the VPN MAC and IP routes are maintained. This approach reduces the size of MAC-VRF and IP-VRF significantly at NVEs. Furthermore, it results in a faster convergence time upon a link or NVE failure in a multihomed network or device redundancy scenario, because the failure-related BGP routes (such as mass withdrawal messages) do not need to get propagated all the way to the remote NVEs in the remote DCs. This approach is described in detail in Section 3.4 of [DCI-EVPN-OVERLAY].

10.2. DCI Using ASBRs

This approach can be considered as the opposite of the first approach.
It favors simplification at DCI devices over NVEs such that larger MAC-VRF (and IP-VRF) tables need to be maintained on NVEs, whereas DCI devices don't need to maintain any MAC (and IP) forwarding tables. Furthermore, DCI devices do not need to terminate and process routes related to multihoming but rather relay these messages for the establishment of an end-to-end Label Switched Path (LSP). In other words, DCI devices in this approach operate similarly to ASBRs for inter-AS Option B (see Section 10 of [RFC4364]). This requires locally assigned VNIs to be used just like downstream-assigned MPLS VPN labels where, for all practical purposes, the VNIs function like 24-bit VPN labels. This approach is equally applicable to data centers (or Carrier Ethernet networks) with MPLS encapsulation.

In inter-AS Option B, when an ASBR receives an EVPN route from its DC over internal BGP (iBGP) and re-advertises it to other ASBRs, it re-advertises the EVPN route by rewriting the BGP next hops to itself, thus losing the identity of the PE that originated the advertisement. This rewrite of the BGP next hop impacts the EVPN mass withdrawal route (Ethernet A-D per ES) and its procedure adversely. However, it does not impact the EVPN Aliasing mechanism/procedure because when the Aliasing routes (Ethernet A-D per EVI) are advertised, the receiving PE first resolves a MAC address for a given EVI into its corresponding <ES, EVI>, and, subsequently, it resolves the <ES, EVI> into multiple paths (and their associated next hops) via which the <ES, EVI> is reachable. Since Aliasing and MAC routes are both advertised on a per-EVI basis and they use the same RD and RT (per EVI), the receiving PE can associate them together on a per-BGP-path basis (e.g., per originating PE).
Thus, it can perform recursive route resolution, e.g., a MAC is reachable via an <ES, EVI>, which, in turn, is reachable via a set of BGP paths; thus, the MAC is reachable via the set of BGP paths. Because the association between MAC routes and the corresponding Aliasing route is made on a per-EVI basis and is determined by the same RD and RT, there is no ambiguity when the BGP next hop for these routes is rewritten as these routes pass through ASBRs. That is, the receiving PE may receive multiple Aliasing routes for the same EVI from a single next hop (a single ASBR), and it can still create multiple paths toward that <ES, EVI>.

However, when the BGP next-hop address corresponding to the originating PE is rewritten, the association between the mass withdrawal route (Ethernet A-D per ES) and its corresponding MAC routes cannot be made based on their RDs and RTs because the RD for the mass withdrawal route is different from the one for the MAC routes. Therefore, the functionality needed at the ASBRs and the receiving PEs depends on whether the mass withdrawal route is originated and whether there is a need to handle route resolution ambiguity for this route.

The following two subsections describe the functionality needed by the ASBRs and the receiving PEs depending on whether the NVEs reside in hypervisors or in ToR switches.

10.2.1. ASBR Functionality with Single-Homing NVEs

When NVEs reside in hypervisors as described in Section 7.1, there is no multihoming; thus, there is no need for the originating NVE to send Ethernet A-D per ES or Ethernet A-D per EVI routes.
However, as noted in Section 7, in order to enable a single-homing ingress NVE to take advantage of fast convergence, Aliasing, and Backup Path when interacting with multihoming egress NVEs attached to a given ES, the single-homing NVE should be able to receive and process Ethernet A-D per ES and Ethernet A-D per EVI routes. The handling of these routes is described in the next section.

10.2.2. ASBR Functionality with Multihoming NVEs

When NVEs reside in ToR switches and operate in multihoming redundancy mode, there is a need, as described in Section 8, for the originating multihoming NVE to send Ethernet A-D per ES route(s) (used for mass withdrawal) and Ethernet A-D per EVI routes (used for Aliasing). As described above, the rewrite of the BGP next hop by ASBRs creates ambiguities when Ethernet A-D per ES routes are received by the remote NVE in a different AS because the receiving NVE cannot associate that route with the MAC/IP routes of that ES advertised by the same originating NVE. This ambiguity inhibits the function of mass withdrawal per ES by the receiving NVE in a different AS.

As an example, consider a scenario where a CE is multihomed to PE1 and PE2, where these PEs are connected via ASBR1 and then ASBR2 to the remote PE3. Furthermore, consider that PE1, but not PE2, receives M1 from the CE. Therefore, PE1 advertises Ethernet A-D per ES1, Ethernet A-D per EVI1, and M1; whereas, PE2 only advertises Ethernet A-D per ES1 and Ethernet A-D per EVI1. ASBR1 receives all five of these advertisements and passes them to ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them to the remote PE3, with itself as the BGP next hop. PE3 receives these five routes where all of them have the same BGP next hop (i.e., ASBR2).
Furthermore, the two Ethernet A-D per ES routes received by PE3 have the same information, i.e., the same ESI and the same BGP next hop. Although both of these routes are maintained by the BGP process in PE3 (because they have different RDs and, thus, are treated as different BGP routes), information from only one of them is used in the L2 routing table (L2 RIB).

         PE1
        /   \
      CE     ASBR1---ASBR2---PE3
        \   /
         PE2

            Figure 3: Inter-AS Option B

Now, when the AC between PE2 and the CE fails and PE2 sends a Network Layer Reachability Information (NLRI) withdrawal for the Ethernet A-D per ES route, and this withdrawal gets propagated and received by PE3, the BGP process in PE3 removes the corresponding BGP route; however, it doesn't remove the associated information (namely ESI and BGP next hop) from the L2 routing table (L2 RIB) because it still has the other Ethernet A-D per ES route (originated from PE1) with the same information. That is why the mass withdrawal mechanism does not work when doing DCI with inter-AS Option B. However, as described previously, the Aliasing function works and so does "mass withdrawal per EVI" (which is associated with withdrawing the EVPN route associated with Aliasing, i.e., the Ethernet A-D per EVI route).

In the above example, PE3 receives two Aliasing routes with the same BGP next hop (ASBR2) but different RDs. One of the Aliasing routes has the same RD as the advertised MAC route (M1). PE3 follows the route resolution procedure specified in [RFC7432] upon receiving the two Aliasing routes; that is, it resolves M1 to <ES, EVI1>, and, subsequently, it resolves <ES, EVI1> to a BGP path list with two paths along with the corresponding VNIs/MPLS labels (one associated with PE1 and the other associated with PE2).
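The route resolution in this example can be sketched in Python. This is a toy model rather than a BGP implementation: the class L2Rib, its method names, and the RD and label values are all hypothetical. It only illustrates that paths are kept per RD, so two Aliasing routes that share the next hop ASBR2 still yield two paths, and that withdrawing one Aliasing route shrinks the path list for every MAC behind that <ES, EVI>.

```python
from collections import defaultdict


class L2Rib:
    """Toy L2 RIB for PE3 (hypothetical model for illustration only)."""

    def __init__(self):
        self.mac_to_es_evi = {}               # MAC -> (ESI, EVI)
        self.alias_paths = defaultdict(dict)  # (ESI, EVI) -> {RD: (next hop, label)}

    def add_mac(self, mac, esi, evi):
        self.mac_to_es_evi[mac] = (esi, evi)

    def add_aliasing(self, rd, esi, evi, next_hop, label):
        # Keyed by RD, so routes with the same BGP next hop stay distinct.
        self.alias_paths[(esi, evi)][rd] = (next_hop, label)

    def withdraw_aliasing(self, rd, esi, evi):
        self.alias_paths[(esi, evi)].pop(rd, None)

    def resolve(self, mac):
        """Resolve MAC -> <ES, EVI> -> list of (next hop, label) paths."""
        es_evi = self.mac_to_es_evi[mac]
        return sorted(self.alias_paths[es_evi].values())


rib = L2Rib()
rib.add_mac("M1", "ES1", "EVI1")
rib.add_aliasing("RD-PE1", "ES1", "EVI1", "ASBR2", 1001)
rib.add_aliasing("RD-PE2", "ES1", "EVI1", "ASBR2", 2001)
paths_before = rib.resolve("M1")   # two paths, both via ASBR2

rib.withdraw_aliasing("RD-PE2", "ES1", "EVI1")
paths_after = rib.resolve("M1")    # only the path associated with PE1 remains
```

Keying the path list by RD is the design choice that makes the rewritten next hop harmless here; a table keyed by next hop alone would collapse the two paths into one.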
It should be noted that even though both paths are advertised by the same BGP next hop (ASBR2), the receiving PE3 can handle them properly. Therefore, M1 is reachable via two paths. This creates two end-to-end LSPs, from PE3 to PE1 and from PE3 to PE2, for M1 such that when PE3 wants to forward traffic destined to M1, it can load-balance between the two LSPs. Although route resolution for Aliasing routes with the same BGP next hop is not explicitly mentioned in [RFC7432], this is the expected operation; thus, it is elaborated here.

When the AC between PE2 and the CE fails and PE2 sends NLRI withdrawals for Ethernet A-D per EVI routes, and these withdrawals get propagated and received by PE3, PE3 removes the Aliasing route and updates the path list; that is, it removes the path corresponding to PE2. Therefore, all the corresponding MAC routes for that <ES, EVI> that point to that path list will now have the updated path list with a single path associated with PE1. This action can be considered to be the mass withdrawal at the per-EVI level. The mass withdrawal at the per-EVI level has a longer convergence time than the mass withdrawal at the per-ES level; however, it is much faster than the convergence time when the withdrawal is done on a per-MAC basis.

If a PE becomes detached from a given ES, then, in addition to withdrawing its previously advertised Ethernet A-D per ES routes, it MUST also withdraw its previously advertised Ethernet A-D per EVI routes for that ES. For a remote PE that is separated from the withdrawing PE by one or more EVPN inter-AS Option B ASBRs, the withdrawal of the Ethernet A-D per ES routes is not actionable. However, a remote PE is able to correlate a previously advertised Ethernet A-D per EVI route with any MAC/IP Advertisement routes also advertised by the withdrawing PE for that <ES, EVI, BD>.
Hence, when it receives the withdrawal of an Ethernet A-D per EVI route, it SHOULD remove the withdrawing PE as a next hop for all MAC addresses associated with that <ES, EVI, BD>.

In the previous example, when the AC between PE2 and the CE fails, PE2 will withdraw its Ethernet A-D per ES and per EVI routes. When PE3 receives the withdrawal of an Ethernet A-D per EVI route, it removes PE2 as a valid next hop for all MAC addresses associated with the corresponding <ES, EVI, BD>. Therefore, all the MAC next hops for that <ES, EVI, BD> will now have a single next hop, viz. the LSP to PE1.

In summary, it can be seen that the Aliasing (and Backup Path) functionality should work as is for inter-AS Option B without requiring any additional functionality in ASBRs or PEs. However, the mass withdrawal functionality falls back from per-ES mode to per-EVI mode for inter-AS Option B. That is, PEs receiving a mass withdrawal route from the same AS take action on the Ethernet A-D per ES route; whereas, PEs receiving mass withdrawal routes from different ASes take action on the Ethernet A-D per EVI route.

Acknowledgements

The authors would like to thank Aldrin Isaac, David Smith, John Mullooly, Thomas Nadeau, Samir Thoria, and Jorge Rabadan for their valuable comments and feedback. The authors would also like to thank Jakob Heitz for his contribution on Section 10.2.

11. Security Considerations

This document uses IP-based tunnel technologies to support data-plane transport. Consequently, the security considerations of those tunnel technologies apply. This document defines support for VXLAN [RFC7348] and NVGRE encapsulations [RFC7637]. The security considerations from those RFCs apply to the data-plane aspects of this document.
As with [RFC5512], any modification of the information that is used to form encapsulation headers, to choose a tunnel type, or to choose a particular tunnel for a particular payload type may lead to user data packets getting misrouted, misdelivered, and/or dropped.

More broadly, the security considerations for the transport of IP reachability information using BGP are discussed in [RFC4271] and [RFC4272] and are equally applicable for the extensions described in this document.

12. IANA Considerations

This document registers the following in the "BGP Tunnel Encapsulation Attribute Tunnel Types" registry.

Value   Name
-----   -------------------------
8       VXLAN Encapsulation
9       NVGRE Encapsulation
10      MPLS Encapsulation
11      MPLS in GRE Encapsulation
12      VXLAN GPE Encapsulation

13. References

13.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015, <https://www.rfc-editor.org/info/rfc7432>.

[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C.
Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, <https://www.rfc-editor.org/info/rfc7348>.

[RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation Subsequent Address Family Identifier (SAFI) and the BGP Tunnel Encapsulation Attribute", RFC 5512, DOI 10.17487/RFC5512, April 2009, <https://www.rfc-editor.org/info/rfc5512>.

[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., "Encapsulating MPLS in IP or Generic Routing Encapsulation (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, <https://www.rfc-editor.org/info/rfc4023>.

[RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network Virtualization Using Generic Routing Encapsulation", RFC 7637, DOI 10.17487/RFC7637, September 2015, <https://www.rfc-editor.org/info/rfc7637>.

13.2. Informative References

[RFC7209] Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N., Henderickx, W., and A. Isaac, "Requirements for Ethernet VPN (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014, <https://www.rfc-editor.org/info/rfc7209>.

[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", RFC 4272, DOI 10.17487/RFC4272, January 2006, <https://www.rfc-editor.org/info/rfc4272>.

[RFC7364] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger, L., and M. Napierala, "Problem Statement: Overlays for Network Virtualization", RFC 7364, DOI 10.17487/RFC7364, October 2014, <https://www.rfc-editor.org/info/rfc7364>.

[RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for Data Center (DC) Network Virtualization", RFC 7365, DOI 10.17487/RFC7365, October 2014, <https://www.rfc-editor.org/info/rfc7365>.

[RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, <https://www.rfc-editor.org/info/rfc6514>.

[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006, <https://www.rfc-editor.org/info/rfc4271>.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006, <https://www.rfc-editor.org/info/rfc4364>.

[TUNNEL-ENCAP] Rosen, E., Ed., Patel, K., and G. Velde, "The BGP Tunnel Encapsulation Attribute", Work in Progress, draft-ietf-idr-tunnel-encaps-09, February 2018.

[DCI-EVPN-OVERLAY] Rabadan, J., Ed., Sathappan, S., Henderickx, W., Sajassi, A., and J. Drake, "Interconnect Solution for EVPN Overlay networks", Work in Progress, draft-ietf-bess-dci-evpn-overlay-10, March 2018.

[EVPN-GENEVE] Boutros, S., Sajassi, A., Drake, J., and J. Rabadan, "EVPN control plane for Geneve", Work in Progress, draft-boutros-bess-evpn-geneve-02, March 2018.

[VXLAN-GPE] Maino, F., Kreeger, L., Ed., and U. Elzur, Ed., "Generic Protocol Extension for VXLAN", Work in Progress, draft-ietf-nvo3-vxlan-gpe-05, October 2017.

[GENEVE] Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed., "Geneve: Generic Network Virtualization Encapsulation", Work in Progress, draft-ietf-nvo3-geneve-06, March 2018.
[IEEE.802.1Q] IEEE, "IEEE Standard for Local and metropolitan area networks - Bridges and Bridged Networks - Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks", IEEE Std 802.1Q.

Acknowledgements

The authors would like to thank Aldrin Isaac, David Smith, John Mullooly, Thomas Nadeau, Samir Thoria, and Jorge Rabadan for their valuable comments and feedback. The authors would also like to thank Jakob Heitz for his contribution on Section 10.2.

Contributors

S. Salam
K. Patel
D. Rao
S. Thoria
D. Cai
Cisco

Y. Rekhter
A. Issac
W. Lin
N. Sheth
Juniper

L. Yong
Huawei

Authors' Addresses

Ali Sajassi (editor)
Cisco
United States of America
Email: sajassi@cisco.com

John Drake (editor)
Juniper Networks
United States of America
Email: jdrake@juniper.net

Nabil Bitar
Nokia
United States of America
Email: nabil.bitar@nokia.com

R. Shekhar
Juniper
United States of America
Email: rshekhar@juniper.net

James Uttaro
AT&T
United States of America
Email: uttaro@att.com

Wim Henderickx
Nokia
Copernicuslaan 50
2018 Antwerp
Belgium
Email: wim.henderickx@nokia.com