rfc9136.original | rfc9136.txt | |||
---|---|---|---|---|
BESS Workgroup J. Rabadan, Ed. | Internet Engineering Task Force (IETF) J. Rabadan, Ed. | |||
Internet Draft W. Henderickx | Request for Comments: 9136 W. Henderickx | |||
Intended status: Standards Track Nokia | Category: Standards Track Nokia | |||
| | ISSN: 2070-1721 J. Drake | |||
J. Drake | ||||
W. Lin | W. Lin | |||
Juniper | Juniper | |||
A. Sajassi | A. Sajassi | |||
Cisco | Cisco | |||
| | October 2021 | |||
Expires: November 19, 2018 May 18, 2018 | IP Prefix Advertisement in Ethernet VPN (EVPN) | |||
IP Prefix Advertisement in EVPN | ||||
draft-ietf-bess-evpn-prefix-advertisement-11 | ||||
Abstract | Abstract | |||
The BGP MPLS-based Ethernet VPN (EVPN) [RFC7432] mechanism provides a | The BGP MPLS-based Ethernet VPN (EVPN) (RFC 7432) mechanism provides | |||
flexible control plane that allows intra-subnet connectivity in an | a flexible control plane that allows intra-subnet connectivity in an | |||
MPLS and/or NVO (Network Virtualization Overlay) [RFC7365] network. | MPLS and/or Network Virtualization Overlay (NVO) (RFC 7365) network. | |||
In some networks, there is also a need for a dynamic and efficient | In some networks, there is also a need for dynamic and efficient | |||
inter-subnet connectivity across Tenant Systems and End Devices that | inter-subnet connectivity across Tenant Systems and end devices that | |||
can be physical or virtual and do not necessarily participate in | can be physical or virtual and do not necessarily participate in | |||
dynamic routing protocols. This document defines a new EVPN route | dynamic routing protocols. This document defines a new EVPN route | |||
type for the advertisement of IP Prefixes and explains some use-case | type for the advertisement of IP prefixes and explains some use-case | |||
examples where this new route-type is used. | examples where this new route type is used. | |||
Status of this Memo | ||||
This Internet-Draft is submitted in full conformance with the | ||||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF), its areas, and its working groups. Note that | ||||
other groups may also distribute working documents as Internet- | ||||
Drafts. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | Status of This Memo | |||
and may be updated, replaced, or obsoleted by other documents at any | ||||
time. It is inappropriate to use Internet-Drafts as reference | ||||
material or to cite them other than as "work in progress." | ||||
The list of current Internet-Drafts can be accessed at | This is an Internet Standards Track document. | |||
http://www.ietf.org/ietf/1id-abstracts.txt | ||||
The list of Internet-Draft Shadow Directories can be accessed at | This document is a product of the Internet Engineering Task Force | |||
http://www.ietf.org/shadow.html | (IETF). It represents the consensus of the IETF community. It has | |||
| | received public review and has been approved for publication by the | |||
| | Internet Engineering Steering Group (IESG). Further information on | |||
| | Internet Standards is available in Section 2 of RFC 7841. | |||
This Internet-Draft will expire on November 19, 2018. | Information about the current status of this document, any errata, | |||
| | and how to provide feedback on it may be obtained at | |||
| | https://www.rfc-editor.org/info/rfc9136. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Terminology | |||
2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . . 5 | 2. Problem Statement | |||
2.1 Inter-Subnet Connectivity Requirements in Data Centers . . . 5 | 2.1. Inter-Subnet Connectivity Requirements in Data Centers | |||
2.2 The Need for the EVPN IP Prefix Route . . . . . . . . . . . 8 | 2.2. The Need for the EVPN IP Prefix Route | |||
3. The BGP EVPN IP Prefix Route . . . . . . . . . . . . . . . . . 10 | 3. The BGP EVPN IP Prefix Route | |||
3.1 IP Prefix Route Encoding . . . . . . . . . . . . . . . . . . 11 | 3.1. IP Prefix Route Encoding | |||
3.2 Overlay Indexes and Recursive Lookup Resolution . . . . . . 13 | 3.2. Overlay Indexes and Recursive Lookup Resolution | |||
4. Overlay Index Use-Cases . . . . . . . . . . . . . . . . . . . . 15 | 4. Overlay Index Use Cases | |||
4.1 TS IP Address Overlay Index Use-Case . . . . . . . . . . . . 16 | 4.1. TS IP Address Overlay Index Use Case | |||
4.2 Floating IP Overlay Index Use-Case . . . . . . . . . . . . . 18 | 4.2. Floating IP Overlay Index Use Case | |||
4.3 Bump-in-the-Wire Use-Case . . . . . . . . . . . . . . . . . 20 | 4.3. Bump-in-the-Wire Use Case | |||
4.4 IP-VRF-to-IP-VRF Model . . . . . . . . . . . . . . . . . . . 23 | 4.4. IP-VRF-to-IP-VRF Model | |||
4.4.1 Interface-less IP-VRF-to-IP-VRF Model . . . . . . . . . 24 | 4.4.1. Interface-less IP-VRF-to-IP-VRF Model | |||
4.4.2 Interface-ful IP-VRF-to-IP-VRF with SBD IRB . . . . . . 27 | 4.4.2. Interface-ful IP-VRF-to-IP-VRF with SBD IRB | |||
4.4.3 Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB . 30 | 4.4.3. Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . . 33 | 5. Security Considerations | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 33 | 6. IANA Considerations | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | 7. References | |||
7.1 Normative References . . . . . . . . . . . . . . . . . . . . 34 | 7.1. Normative References | |||
7.2 Informative References . . . . . . . . . . . . . . . . . . . 34 | 7.2. Informative References | |||
8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . 35 | Acknowledgments | |||
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 35 | Contributors | |||
10. Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 36 | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
[RFC7365] provides a framework for Data Center (DC) Network | [RFC7365] provides a framework for Data Center (DC) Network | |||
Virtualization over Layer 3 and specifies that the Network | Virtualization over Layer 3 and specifies that the Network | |||
Virtualization Edge devices (NVEs) must provide layer 2 and layer 3 | Virtualization Edge (NVE) devices must provide Layer 2 and Layer 3 | |||
virtualized network services in multi-tenant DCs. [RFC8365] discusses | virtualized network services in multi-tenant DCs. [RFC8365] | |||
the use of EVPN as the technology of choice to provide layer 2 or | discusses the use of EVPN as the technology of choice to provide | |||
intra-subnet services in these DCs. This document, along with [EVPN- | Layer 2 or intra-subnet services in these DCs. This document, along | |||
INTERSUBNET], specifies the use of EVPN for layer 3 or inter-subnet | with [RFC9135], specifies the use of EVPN for Layer 3 or inter-subnet | |||
connectivity services. | connectivity services. | |||
[EVPN-INTERSUBNET] defines some fairly common inter-subnet forwarding | [RFC9135] defines some fairly common inter-subnet forwarding | |||
scenarios where TSes can exchange packets with TSes located in remote | scenarios where Tenant Systems (TSs) can exchange packets with TSs | |||
subnets. In order to achieve this, [EVPN-INTERSUBNET] describes how | located in remote subnets. In order to achieve this, [RFC9135] | |||
MAC/IPs encoded in TS RT-2 routes are not only used to populate MAC- | describes how Media Access Control (MAC) and IPs encoded in TS RT-2 | |||
VRF and overlay ARP tables, but also IP-VRF tables with the encoded | routes are not only used to populate MAC Virtual Routing and | |||
TS host routes (/32 or /128). In some cases, EVPN may advertise IP | Forwarding (MAC-VRF) and overlay Address Resolution Protocol (ARP) | |||
Prefixes and therefore provide aggregation in the IP-VRF tables, as | tables but also IP-VRF tables with the encoded TS host routes (/32 or | |||
opposed to propagate individual host routes. This document | /128). In some cases, EVPN may advertise IP prefixes and therefore | |||
complements the scenarios described in [EVPN-INTERSUBNET] and defines | provide aggregation in the IP-VRF tables, as opposed to propagating | |||
how EVPN may be used to advertise IP Prefixes. Interoperability | individual host routes. This document complements the scenarios | |||
between EVPN and L3VPN [RFC4364] IP Prefix routes is out of the scope | described in [RFC9135] and defines how EVPN may be used to advertise | |||
| | IP prefixes. Interoperability between EVPN and Layer 3 Virtual | |||
| | Private Network (VPN) [RFC4364] IP Prefix routes is out of the scope | |||
of this document. | of this document. | |||
Section 2.1 describes the inter-subnet connectivity requirements in | Section 2.1 describes the inter-subnet connectivity requirements in | |||
Data Centers. Section 2.2 explains why a new EVPN route type is | DCs. Section 2.2 explains why a new EVPN route type is required for | |||
required for IP Prefix advertisements. Sections 3, 4 and 5 will | IP prefix advertisements. Sections 3, 4, and 5 will describe this | |||
describe this route type and how it is used in some specific use | route type and how it is used in some specific use cases. | |||
cases. | ||||
1.1 Terminology | 1.1. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
AC: Attachment Circuit. | AC: Attachment Circuit | |||
ARP: Address Resolution Protocol. | ARP: Address Resolution Protocol | |||
BD: Broadcast Domain. As per [RFC7432], an EVI consists of a single | BD: Broadcast Domain. As per [RFC7432], an EVI consists of a | |||
or multiple BDs. In case of VLAN-bundle and VLAN-based service | single BD or multiple BDs. In case of VLAN-bundle and | |||
models (see [RFC7432]), a BD is equivalent to an EVI. In case of | VLAN-based service models (see [RFC7432]), a BD is | |||
VLAN-aware bundle service model, an EVI contains multiple BDs. | equivalent to an EVI. In case of a VLAN-aware bundle | |||
Also, in this document, BD and subnet are equivalent terms. | service model, an EVI contains multiple BDs. Also, in this | |||
| | document, "BD" and "subnet" are equivalent terms. | |||
BD Route Target: refers to the Broadcast Domain assigned Route Target | BD Route Target: Refers to the broadcast-domain-assigned Route | |||
[RFC4364]. In case of VLAN-aware bundle service model, all the BD | Target [RFC4364]. In case of a VLAN-aware bundle service | |||
instances in the MAC-VRF share the same Route Target. | model, all the BD instances in the MAC-VRF share the same | |||
| | Route Target. | |||
BT: Bridge Table. The instantiation of a BD in a MAC-VRF, as per | BT: Bridge Table. The instantiation of a BD in a MAC-VRF, as | |||
[RFC7432]. | per [RFC7432]. | |||
DGW: Data Center Gateway. | CE: Customer Edge | |||
Ethernet A-D route: Ethernet Auto-Discovery (A-D) route, as per | DA: Destination Address | |||
[RFC7432]. | ||||
Ethernet NVO tunnel: refers to Network Virtualization Overlay tunnels | DGW: Data Center Gateway | |||
with Ethernet payload. Examples of this type of tunnels are VXLAN | ||||
or GENEVE. | ||||
EVI: EVPN Instance spanning the NVE/PE devices that are participating | Ethernet A-D Route: Ethernet Auto-Discovery (A-D) route, as per | |||
on that EVPN, as per [RFC7432]. | [RFC7432]. | |||
EVPN: Ethernet Virtual Private Networks, as per [RFC7432]. | Ethernet NVO Tunnel: Refers to Network Virtualization Overlay | |||
| | tunnels with Ethernet payload. Examples of this type of | |||
| | tunnel are VXLAN or GENEVE. | |||
GRE: Generic Routing Encapsulation. | EVI: EVPN Instance spanning the NVE/PE devices that are | |||
| | participating on that EVPN, as per [RFC7432]. | |||
GW IP: Gateway IP Address. | EVPN: Ethernet VPN, as per [RFC7432]. | |||
IPL: IP Prefix Length. | GENEVE: Generic Network Virtualization Encapsulation, as per | |||
| | [RFC8926]. | |||
IP NVO tunnel: it refers to Network Virtualization Overlay tunnels | GRE: Generic Routing Encapsulation | |||
with IP payload (no MAC header in the payload). | ||||
IP-VRF: A VPN Routing and Forwarding table for IP routes on an | GW IP: Gateway IP address | |||
NVE/PE. The IP routes could be populated by EVPN and IP-VPN | ||||
address families. An IP-VRF is also an instantiation of a layer 3 | ||||
VPN in an NVE/PE. | ||||
IRB: Integrated Routing and Bridging interface. It connects an IP-VRF | IPL: IP Prefix Length | |||
to a BD (or subnet). | ||||
MAC-VRF: A Virtual Routing and Forwarding table for Media Access | IP NVO Tunnel: Refers to Network Virtualization Overlay tunnels with | |||
Control (MAC) addresses on an NVE/PE, as per [RFC7432]. A MAC-VRF | IP payload (no MAC header in the payload). | |||
is also an instantiation of an EVI in an NVE/PE. | ||||
ML: MAC address length. | IP-VRF: A Virtual Routing and Forwarding table for IP routes on an | |||
| | NVE/PE. The IP routes could be populated by EVPN and IP- | |||
| | VPN address families. An IP-VRF is also an instantiation | |||
| | of a Layer 3 VPN in an NVE/PE. | |||
ND: Neighbor Discovery Protocol. | IRB: Integrated Routing and Bridging interface. It connects an | |||
| | IP-VRF to a BD (or subnet). | |||
NVE: Network Virtualization Edge. | MAC: Media Access Control | |||
GENEVE: Generic Network Virtualization Encapsulation, [GENEVE]. | MAC-VRF: A Virtual Routing and Forwarding table for MAC addresses on | |||
| | an NVE/PE, as per [RFC7432]. A MAC-VRF is also an | |||
| | instantiation of an EVI in an NVE/PE. | |||
NVO: Network Virtualization Overlays. | ML: MAC Address Length | |||
RT-2: EVPN route type 2, i.e., MAC/IP advertisement route, as defined | ND: Neighbor Discovery | |||
in [RFC7432]. | ||||
RT-5: EVPN route type 5, i.e., IP Prefix route. As defined in Section | NVE: Network Virtualization Edge | |||
3. | ||||
SBD: Supplementary Broadcast Domain. A BD that does not have any ACs, | NVO: Network Virtualization Overlay | |||
only IRB interfaces, and it is used to provide connectivity among | ||||
all the IP-VRFs of the tenant. The SBD is only required in IP-VRF- | ||||
to-IP-VRF use-cases (see Section 4.4.). | ||||
SN: Subnet. | PE: Provider Edge | |||
TS: Tenant System. | RT-2: EVPN Route Type 2, i.e., MAC/IP Advertisement route, as | |||
| | defined in [RFC7432]. | |||
VA: Virtual Appliance. | RT-5: EVPN Route Type 5, i.e., IP Prefix route, as defined in | |||
| | Section 3. | |||
VNI: Virtual Network Identifier. As in [RFC8365], the term is used as | SBD: Supplementary Broadcast Domain. A BD that does not have | |||
a representation of a 24-bit NVO instance identifier, with the | any ACs, only IRB interfaces, and is used to provide | |||
understanding that VNI will refer to a VXLAN Network Identifier in | connectivity among all the IP-VRFs of the tenant. The SBD | |||
VXLAN, or Virtual Network Identifier in GENEVE, etc. unless it is | is only required in IP-VRF-to-IP-VRF use cases (see | |||
stated otherwise. | Section 4.4). | |||
VTEP: VXLAN Termination End Point, as in [RFC7348]. | SN: Subnet | |||
VXLAN: Virtual Extensible LAN, as in [RFC7348]. | TS: Tenant System | |||
| | VA: Virtual Appliance | |||
| | VM: Virtual Machine | |||
| | VNI: Virtual Network Identifier. As in [RFC8365], the term is | |||
| | used as a representation of a 24-bit NVO instance | |||
| | identifier, with the understanding that "VNI" will refer to | |||
| | a VXLAN Network Identifier in VXLAN, or a Virtual Network | |||
| | Identifier in GENEVE, etc., unless it is stated otherwise. | |||
| | VSID: Virtual Subnet Identifier | |||
| | VTEP: VXLAN Termination End Point, as per [RFC7348]. | |||
| | VXLAN: Virtual eXtensible Local Area Network, as per [RFC7348]. | |||
This document also assumes familiarity with the terminology of | This document also assumes familiarity with the terminology of | |||
[RFC7432], [RFC8365] and [RFC7365]. | [RFC7365], [RFC7432], and [RFC8365]. | |||
2. Problem Statement | 2. Problem Statement | |||
This Section describes the inter-subnet connectivity requirements in | This section describes the inter-subnet connectivity requirements in | |||
Data Centers and why a specific route type to advertise IP Prefixes | DCs and why a specific route type to advertise IP prefixes is needed. | |||
is needed. | ||||
2.1 Inter-Subnet Connectivity Requirements in Data Centers | 2.1. Inter-Subnet Connectivity Requirements in Data Centers | |||
[RFC7432] is used as the control plane for a Network Virtualization | [RFC7432] is used as the control plane for an NVO solution in DCs, | |||
Overlay (NVO) solution in Data Centers (DC), where Network | where NVE devices can be located in hypervisors or Top-of-Rack (ToR) | |||
Virtualization Edge (NVE) devices can be located in Hypervisors or | switches, as described in [RFC8365]. | |||
Top of Rack switches (ToRs), as described in [RFC8365]. | ||||
The following considerations apply to Tenant Systems (TS) that are | The following considerations apply to TSs that are physical or | |||
physical or virtual systems identified by MAC and maybe IP addresses | virtual systems identified by MAC (and possibly IP addresses) and are | |||
and connected to BDs by Attachment Circuits: | connected to BDs by Attachment Circuits: | |||
o The Tenant Systems may be Virtual Machines (VMs) that generate | * The Tenant Systems may be VMs that generate traffic from their own | |||
traffic from their own MAC and IP. | MAC and IP. | |||
o The Tenant Systems may be Virtual Appliance entities (VAs) that | * The Tenant Systems may be VA entities that forward traffic to/from | |||
forward traffic to/from IP addresses of different End Devices | IP addresses of different end devices sitting behind them. | |||
sitting behind them. | ||||
o These VAs can be firewalls, load balancers, NAT devices, other | - These VAs can be firewalls, load balancers, NAT devices, other | |||
appliances or virtual gateways with virtual routing instances. | appliances, or virtual gateways with virtual routing instances. | |||
o These VAs do not necessarily participate in dynamic routing | - These VAs do not necessarily participate in dynamic routing | |||
protocols and hence rely on the EVPN NVEs to advertise the | protocols and hence rely on the EVPN NVEs to advertise the | |||
routes on their behalf. | routes on their behalf. | |||
o In all these cases, the VA will forward traffic to other TSes | - In all these cases, the VA will forward traffic to other TSs | |||
using its own source MAC but the source IP will be the one | using its own source MAC, but the source IP will be the one | |||
associated to the End Device sitting behind or a translated IP | associated with the end device sitting behind the VA or a | |||
address (part of a public NAT pool) if the VA is performing | translated IP address (part of a public NAT pool) if the VA is | |||
NAT. | performing NAT. | |||
o Note that the same IP address and endpoint could exist behind | - Note that the same IP address and endpoint could exist behind | |||
two of these TSes. One example of this would be certain | two of these TSs. One example of this would be certain | |||
appliance resiliency mechanisms, where a virtual IP or | appliance resiliency mechanisms, where a virtual IP or floating | |||
floating IP can be owned by one of the two VAs running the | IP can be owned by one of the two VAs running the resiliency | |||
resiliency protocol (the master VA). Virtual Router Redundancy | protocol (the Master VA). The Virtual Router Redundancy | |||
Protocol (VRRP), RFC5798, is one particular example of this. | Protocol (VRRP) [RFC5798] is one particular example of this. | |||
Another example is multi-homed subnets, i.e., the same subnet | Another example is multihomed subnets, i.e., the same subnet is | |||
is connected to two VAs. | connected to two VAs. | |||
o Although these VAs provide IP connectivity to VMs and subnets | - Although these VAs provide IP connectivity to VMs and the | |||
behind them, they do not always have their own IP interface | subnets behind them, they do not always have their own IP | |||
connected to the EVPN NVE, e.g., layer 2 firewalls are | interface connected to the EVPN NVE; Layer 2 firewalls are | |||
examples of VAs not supporting IP interfaces. | examples of VAs not supporting IP interfaces. | |||
Figure 1 illustrates some of the examples described above. | Figure 1 illustrates some of the examples described above. | |||
NVE1 | NVE1 | |||
+-----------+ | +-----------+ | |||
TS1(VM)--| (BD-10) |-----+ | TS1(VM)--| (BD-10) |-----+ | |||
IP1/M1 +-----------+ | DGW1 | M1/IP1 +-----------+ | DGW1 | |||
+---------+ +-------------+ | +---------+ +-------------+ | |||
| |----| (BD-10) | | | |----| (BD-10) | | |||
SN1---+ NVE2 | | | IRB1\ | | SN1---+ NVE2 | | | IRB1\ | | |||
| +-----------+ | | | (IP-VRF)|---+ | | +-----------+ | | | (IP-VRF)|---+ | |||
SN2---TS2(VA)--| (BD-10) |-| | +-------------+ _|_ | SN2---TS2(VA)--| (BD-10) |-| | +-------------+ _|_ | |||
| IP2/M2 +-----------+ | VXLAN/ | ( ) | | M2/IP2 +-----------+ | VXLAN/ | ( ) | |||
IP4---+ <-+ | GENEVE | DGW2 ( WAN ) | IP4---+ <-+ | GENEVE | DGW2 ( WAN ) | |||
| | | +-------------+ (___) | | | | +-------------+ (___) | |||
vIP23 (floating) | |----| (BD-10) | | | vIP23 (floating) | |----| (BD-10) | | | |||
| +---------+ | IRB2\ | | | | +---------+ | IRB2\ | | | |||
SN1---+ <-+ NVE3 | | | | (IP-VRF)|---+ | SN1---+ <-+ NVE3 | | | | (IP-VRF)|---+ | |||
| IP3/M3 +-----------+ | | | +-------------+ | | M3/IP3 +-----------+ | | | +-------------+ | |||
SN3---TS3(VA)--| (BD-10) |---+ | | | SN3---TS3(VA)--| (BD-10) |---+ | | | |||
| +-----------+ | | | | +-----------+ | | | |||
IP5---+ | | | IP5---+ | | | |||
| | | | | | |||
NVE4 | | NVE5 +--SN5 | NVE4 | | NVE5 +--SN5 | |||
+---------------------+ | | +-----------+ | | +---------------------+ | | +-----------+ | | |||
IP6------| (BD-1) | | +-| (BD-10) |--TS4(VA)--SN6 | IP6------| (BD-1) | | +-| (BD-10) |--TS4(VA)--SN6 | |||
| \ | | +-----------+ | | | \ | | +-----------+ | | |||
| (IP-VRF) |--+ ESI4 +--SN7 | | (IP-VRF) |--+ ESI4 +--SN7 | |||
| / \IRB3 | | | / \IRB3 | | |||
|---| (BD-2) (BD-10) | | |---| (BD-2) (BD-10) | | |||
SN4| +---------------------+ | SN4| +---------------------+ | |||
Figure 1 DC inter-subnet use-cases | Note: | |||
| | ESI4 = Ethernet Segment Identifier 4 | |||
| | Figure 1: DC Inter-subnet Use Cases | |||
Where: | Where: | |||
NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same BD for a | NVE1, NVE2, NVE3, NVE4, NVE5, DGW1, and DGW2 share the same BD for a | |||
particular tenant. BD-10 is comprised of the collection of BD | particular tenant. BD-10 is comprised of the collection of BD | |||
instances defined in all the NVEs. All the hosts connected to BD-10 | instances defined in all the NVEs. All the hosts connected to BD-10 | |||
belong to the same IP subnet. The hosts connected to BD-10 are listed | belong to the same IP subnet. The hosts connected to BD-10 are | |||
below: | listed below: | |||
o TS1 is a VM that generates/receives traffic from/to IP1, where IP1 | * TS1 is a VM that generates/receives traffic to/from IP1, where IP1 | |||
belongs to the BD-10 subnet. | belongs to the BD-10 subnet. | |||
o TS2 and TS3 are Virtual Appliances (VA) that send/receive traffic | * TS2 and TS3 are VAs that send/receive traffic to/from the subnets | |||
from/to the subnets and hosts sitting behind them (SN1, SN2, SN3, | and hosts sitting behind them (SN1, SN2, SN3, IP4, and IP5). | |||
IP4 and IP5). Their IP addresses (IP2 and IP3) belong to the BD-10 | Their IP addresses (IP2 and IP3) belong to the BD-10 subnet, and | |||
subnet and they can also generate/receive traffic. When these VAs | they can also generate/receive traffic. When these VAs receive | |||
receive packets destined to their own MAC addresses (M2 and M3) | packets destined to their own MAC addresses (M2 and M3), they will | |||
they will route the packets to the proper subnet or host. These VAs | route the packets to the proper subnet or host. These VAs do not | |||
do not support routing protocols to advertise the subnets connected | support routing protocols to advertise the subnets connected to | |||
to them and can move to a different server and NVE when the Cloud | them and can move to a different server and NVE when the cloud | |||
Management System decides to do so. These VAs may also support | management system decides to do so. These VAs may also support | |||
redundancy mechanisms for some subnets, similar to VRRP, where a | redundancy mechanisms for some subnets, similar to VRRP, where a | |||
floating IP is owned by the master VA and only the master VA | floating IP is owned by the Master VA and only the Master VA | |||
forwards traffic to a given subnet. E.g.,: vIP23 in Figure 1 is a | forwards traffic to a given subnet. For example, vIP23 in | |||
floating IP that can be owned by TS2 or TS3 depending on which | Figure 1 is a floating IP that can be owned by TS2 or TS3 | |||
system is the master. Only the master will forward traffic to SN1. | depending on which system is the Master. Only the Master will | |||
| | forward traffic to SN1. | |||
o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3 have | * Integrated Routing and Bridging interfaces IRB1, IRB2, and IRB3 | |||
their own IP addresses that belong to the BD-10 subnet too. These | have their own IP addresses that belong to the BD-10 subnet too. | |||
IRB interfaces connect the BD-10 subnet to Virtual Routing and | These IRB interfaces connect the BD-10 subnet to Virtual Routing | |||
Forwarding (IP-VRF) instances that can route the traffic to other | and Forwarding (IP-VRF) instances that can route the traffic to | |||
subnets for the same tenant (within the DC or at the other end of | other subnets for the same tenant (within the DC or at the other | |||
the WAN). | end of the WAN). | |||
o TS4 is a layer 2 VA that provides connectivity to subnets SN5, SN6 | * TS4 is a Layer 2 VA that provides connectivity to subnets SN5, | |||
and SN7, but does not have an IP address itself in the BD-10. TS4 | SN6, and SN7 but does not have an IP address itself in the BD-10. | |||
is connected to a port on NVE5 assigned to Ethernet Segment | TS4 is connected to a port on NVE5 that is assigned to Ethernet | |||
Identifier 4. | Segment Identifier 4 (ESI4). | |||
For a BD that an ingress NVE is attached to, "Overlay Index" is | For a BD to which an ingress NVE is attached, "Overlay Index" is | |||
defined as an identifier that the ingress EVPN NVE requires in order | defined as an identifier that the ingress EVPN NVE requires in order | |||
to forward packets to a subnet or host in a remote subnet. As an | to forward packets to a subnet or host in a remote subnet. As an | |||
example, vIP23 (Figure 1) is an Overlay Index that any NVE attached | example, vIP23 (Figure 1) is an Overlay Index that any NVE attached | |||
to BD-10 needs to know in order to forward packets to SN1. IRB3 IP | to BD-10 needs to know in order to forward packets to SN1. The IRB3 | |||
address is an Overlay Index required to get to SN4, and ESI4 | IP address is an Overlay Index required to get to SN4, and ESI4 is an | |||
(Ethernet Segment Identifier 4) is an Overlay Index needed to forward | Overlay Index needed to forward traffic to SN5. In other words, the | |||
traffic to SN5. In other words, the Overlay Index is a next-hop in | Overlay Index is a next hop in the overlay address space that can be | |||
the overlay address space that can be an IP address, a MAC address or | an IP address, a MAC address, or an ESI. When advertised along with | |||
an ESI. When advertised along with an IP Prefix, the Overlay Index | an IP prefix, the Overlay Index requires a recursive resolution to | |||
requires a recursive resolution to find out to what egress NVE the | find out the egress NVE to which the EVPN packets need to be sent. | |||
EVPN packets need to be sent. | ||||
All the DC use cases in Figure 1 require inter-subnet forwarding and | All the DC use cases in Figure 1 require inter-subnet forwarding; | |||
therefore, the individual host routes and subnets: | therefore, the individual host routes and subnets: | |||
a) must be advertised from the NVEs (since VAs and VMs do not | a) must be advertised from the NVEs (since VAs and VMs do not | |||
participate in dynamic routing protocols) and | participate in dynamic routing protocols) and | |||
b) may be associated to an Overlay Index that can be a VA IP address, | ||||
a floating IP address, a MAC address or an ESI. The Overlay Index | ||||
is further discussed in Section 3.2. | ||||
2.2 The Need for the EVPN IP Prefix Route | b) may be associated with an Overlay Index that can be a VA IP | |||
| | address, a floating IP address, a MAC address, or an ESI. The | |||
| | Overlay Index is further discussed in Section 3.2. | |||
[RFC7432] defines a MAC/IP route (also referred as RT-2) where a MAC | 2.2. The Need for the EVPN IP Prefix Route | |||
address can be advertised together with an IP address length and IP | ||||
address (IP). While a variable IP address length might have been used | [RFC7432] defines a MAC/IP Advertisement route (also referred to as | |||
to indicate the presence of an IP prefix in a route type 2, there are | "RT-2") where a MAC address can be advertised together with an IP | |||
several specific use cases in which using this route type to deliver | address length and IP address (IP). While a variable IP address | |||
IP Prefixes is not suitable. | length might have been used to indicate the presence of an IP prefix | |||
| | in a route type 2, there are several specific use cases in which | |||
| | using this route type to deliver IP prefixes is not suitable. | |||
One example of such use cases is the "floating IP" example described | One example of such use cases is the "floating IP" example described | |||
in Section 2.1. In this example it is needed to decouple the | in Section 2.1. In this example, it is necessary to decouple the | |||
advertisement of the prefixes from the advertisement of MAC address | advertisement of the prefixes from the advertisement of a MAC address | |||
of either M2 or M3, otherwise the solution gets highly inefficient | of either M2 or M3; otherwise, the solution gets highly inefficient | |||
and does not scale. | and does not scale. | |||
For example, if 1,000 prefixes are advertised from M2 (using RT-2) | For example, if 1,000 prefixes are advertised from M2 (using RT-2) | |||
and the floating IP owner changes from M2 to M3, 1,000 routes would | and the floating IP owner changes from M2 to M3, 1,000 routes would | |||
be withdrawn from M2 and readvertise 1k routes from M3. However if a | be withdrawn by M2 and readvertised by M3. However, if a separate | |||
separate route type is used, 1,000 routes can be advertised as | route type is used, 1,000 routes can be advertised as associated with | |||
associated to the floating IP address (vIP23) and only one RT-2 for | the floating IP address (vIP23), and only one RT-2 can be used for | |||
advertising the ownership of the floating IP, i.e., vIP23 and M2 in | advertising the ownership of the floating IP, i.e., vIP23 and M2 in | |||
the route type 2. When the floating IP owner changes from M2 to M3, a | the route type 2. When the floating IP owner changes from M2 to M3, | |||
single RT-2 withdraw/update is required to indicate the change. The | a single RT-2 withdrawal/update is required to indicate the change. | |||
remote DGW will not change any of the 1,000 prefixes associated to | The remote DGW will not change any of the 1,000 prefixes associated | |||
vIP23, but will only update the ARP resolution entry for vIP23 (now | with vIP23 but will only update the ARP resolution entry for vIP23 | |||
pointing at M3). | (now pointing at M3). | |||
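The scaling argument above can be made concrete with a small counting sketch. It assumes, as the example does, that each prefix is carried in its own route, and it counts the RT-2 withdrawal/update for vIP23 as a single change. The function name is hypothetical.

```python
from typing import Tuple


def floating_ip_move_cost(num_prefixes: int, prefixes_in_rt5: bool) -> Tuple[int, int]:
    """Return (route changes, prefixes the remote DGW must reprogram) when the
    floating IP owner moves, e.g., from M2 to M3 (hypothetical model)."""
    if prefixes_in_rt5:
        # Prefixes stay associated with vIP23 in their RT-5s; only the RT-2
        # binding vIP23 to its owner changes, and the DGW just updates the
        # ARP resolution entry for vIP23.
        return 1, 0
    # Prefixes carried in RT-2s tied to M2: every route is withdrawn by M2
    # and readvertised by M3, and every prefix is reprogrammed.
    return 2 * num_prefixes, num_prefixes
```

With 1,000 prefixes, the RT-2-only design costs 2,000 route changes and 1,000 reprogrammed prefixes, while the RT-5 design costs one RT-2 change and no prefix changes.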
An EVPN route (type 5) for the advertisement of IP Prefixes is | An EVPN route (type 5) for the advertisement of IP prefixes is | |||
described in this document. This new route type has a differentiated | described in this document. This new route type has a differentiated | |||
role from the RT-2 route and addresses the Data Center (or NVO-based | role from the RT-2 route and addresses the inter-subnet connectivity | |||
networks in general) inter-subnet connectivity scenarios described in | scenarios for DCs (or NVO-based networks in general) described in | |||
this document. Using this new RT-5, an IP Prefix may be advertised | this document. Using this new RT-5, an IP prefix may be advertised | |||
along with an Overlay Index that can be a GW IP address, a MAC or an | along with an Overlay Index, which can be a GW IP address, a MAC, or | |||
ESI, or without an Overlay Index, in which case the BGP next-hop will | an ESI. The IP prefix may also be advertised without an Overlay | |||
point at the egress NVE/ASBR/ABR and the MAC in the Router's MAC | Index, in which case the BGP next hop will point at the egress NVE, | |||
Extended Community will provide the inner MAC destination address to | Area Border Router (ABR), or ASBR, and the MAC in the EVPN Router's | |||
be used. As discussed throughout the document, the EVPN RT-2 does not | MAC Extended Community will provide the inner MAC destination address | |||
meet the requirements for all the DC use cases, therefore this EVPN | to be used. As discussed throughout the document, the EVPN RT-2 does | |||
route type 5 is required. | not meet the requirements for all the DC use cases; therefore, this | |||
EVPN route type 5 is required. | ||||
The EVPN route type 5 decouples the IP Prefix advertisements from the | The EVPN route type 5 decouples the IP prefix advertisements from the | |||
MAC/IP route advertisements in EVPN, hence: | MAC/IP Advertisement routes in EVPN. Hence: | |||
a) Allows the clean and clear advertisements of IPv4 or IPv6 prefixes | a) The clean and clear advertisements of IPv4 or IPv6 prefixes in a | |||
in an NLRI (Network Layer Reachability Information message) with | Network Layer Reachability Information (NLRI) message without MAC | |||
no MAC addresses. | addresses are allowed. | |||
b) Since the route type is different from the MAC/IP Advertisement | b) Since the route type is different from the MAC/IP Advertisement | |||
route, the current [RFC7432] procedures do not need to be | route, the current procedures described in [RFC7432] do not need | |||
modified. | to be modified. | |||
c) Allows a flexible implementation where the prefix can be linked to | c) A flexible implementation is allowed where the prefix can be | |||
different types of Overlay/Underlay Indexes: overlay IP address, | linked to different types of Overlay/Underlay Indexes: overlay IP | |||
overlay MAC addresses, overlay ESI, underlay BGP next-hops, etc. | addresses, overlay MAC addresses, overlay ESIs, underlay BGP next | |||
hops, etc. | ||||
d) An EVPN implementation not requiring IP Prefixes can simply | d) An EVPN implementation not requiring IP prefixes can simply | |||
discard them by looking at the route type value. | discard them by looking at the route type value. | |||
The following Sections describe how EVPN is extended with a route | The following sections describe how EVPN is extended with a route | |||
type for the advertisement of IP prefixes and how this route is used | type for the advertisement of IP prefixes and how this route is used | |||
to address the inter-subnet connectivity requirements existing in the | to address the inter-subnet connectivity requirements existing in the | |||
Data Center. | DC. | |||
3. The BGP EVPN IP Prefix Route | 3. The BGP EVPN IP Prefix Route | |||
The BGP EVPN NLRI as defined in [RFC7432] is shown below: | The BGP EVPN NLRI as defined in [RFC7432] is shown below: | |||
+-----------------------------------+ | +-----------------------------------+ | |||
| Route Type (1 octet) | | | Route Type (1 octet) | | |||
+-----------------------------------+ | +-----------------------------------+ | |||
| Length (1 octet) | | | Length (1 octet) | | |||
+-----------------------------------+ | +-----------------------------------+ | |||
| Route Type specific (variable) | | | Route Type specific (variable) | | |||
+-----------------------------------+ | +-----------------------------------+ | |||
Figure 2 BGP EVPN NLRI | Figure 2: BGP EVPN NLRI | |||
This document defines an additional route type (RT-5) in the IANA | This document defines an additional route type (RT-5) in the IANA | |||
EVPN Route Types registry [EVPNRouteTypes], to be used for the | "EVPN Route Types" registry [EVPNRouteTypes] to be used for the | |||
advertisement of EVPN routes using IP Prefixes: | advertisement of EVPN routes using IP prefixes: | |||
Value: 5 | Value: 5 | |||
Description: IP Prefix Route | Description: IP Prefix | |||
According to Section 5.4 in [RFC7606], a node that doesn't recognize | According to Section 5.4 of [RFC7606], a node that doesn't recognize | |||
the Route Type 5 (RT-5) will ignore it. Therefore an NVE following | the route type 5 (RT-5) will ignore it. Therefore, an NVE following | |||
this document can still be attached to a BD where an NVE ignoring RT- | this document can still be attached to a BD where an NVE ignoring RT- | |||
5s is attached to. Regular [RFC7432] procedures would apply in that | 5s is attached. Regular procedures described in [RFC7432] would | |||
case for both NVEs. In case two or more NVEs are attached to | apply in that case for both NVEs. In case two or more NVEs are | |||
different BDs of the same tenant, they MUST support RT-5 for the | attached to different BDs of the same tenant, they MUST support the | |||
proper Inter-Subnet Forwarding operation of the tenant. | RT-5 for the proper inter-subnet forwarding operation of the tenant. | |||
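Because every EVPN NLRI starts with the Route Type and Length octets shown in Figure 2, a receiver can step over a route type it does not recognize and keep parsing the remaining NLRIs. The following Python sketch illustrates that behavior. `iter_evpn_nlri` and its `supported` parameter are hypothetical, and the sketch simply raises ValueError on a truncated NLRI instead of implementing the full error handling of RFC 7606.

```python
from typing import FrozenSet, Iterator, Tuple

# Route types from RFC 7432 (1-4) plus the IP Prefix route (5).
KNOWN_ROUTE_TYPES = frozenset({1, 2, 3, 4, 5})


def iter_evpn_nlri(data: bytes,
                   supported: FrozenSet[int] = KNOWN_ROUTE_TYPES
                   ) -> Iterator[Tuple[int, bytes]]:
    """Yield (route type, route-type-specific octets) for supported routes."""
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated EVPN NLRI header")
        route_type, length = data[offset], data[offset + 1]
        body = data[offset + 2:offset + 2 + length]
        if len(body) != length:
            raise ValueError("truncated EVPN NLRI body")
        offset += 2 + length
        if route_type in supported:
            yield route_type, body
        # Otherwise the route is ignored; the Length octet lets parsing
        # continue with the next NLRI.
```

Passing `supported=frozenset({1, 2, 3, 4})` emulates an NVE that predates this document: it skips every RT-5 but still processes the RFC 7432 route types around them.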
The detailed encoding of this route and associated procedures are | The detailed encoding of this route and associated procedures are | |||
described in the following Sections. | described in the following sections. | |||
3.1 IP Prefix Route Encoding | 3.1. IP Prefix Route Encoding | |||
An IP Prefix Route Type for IPv4 has the Length field set to 34 and | An IP Prefix route type for IPv4 has the Length field set to 34 and | |||
consists of the following fields: | consists of the following fields: | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| RD (8 octets) | | | RD (8 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
|Ethernet Segment Identifier (10 octets)| | |Ethernet Segment Identifier (10 octets)| | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| Ethernet Tag ID (4 octets) | | | Ethernet Tag ID (4 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| IP Prefix Length (1 octet, 0 to 32) | | | IP Prefix Length (1 octet, 0 to 32) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| IP Prefix (4 octets) | | | IP Prefix (4 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| GW IP Address (4 octets) | | | GW IP Address (4 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| MPLS Label (3 octets) | | | MPLS Label (3 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
Figure 3 EVPN IP Prefix route NLRI for IPv4 | Figure 3: EVPN IP Prefix Route NLRI for IPv4 | |||
An IP Prefix Route Type for IPv6 has the Length field set to 58 and | An IP Prefix route type for IPv6 has the Length field set to 58 and | |||
consists of the following fields: | consists of the following fields: | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| RD (8 octets) | | | RD (8 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
|Ethernet Segment Identifier (10 octets)| | |Ethernet Segment Identifier (10 octets)| | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| Ethernet Tag ID (4 octets) | | | Ethernet Tag ID (4 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| IP Prefix Length (1 octet, 0 to 128) | | | IP Prefix Length (1 octet, 0 to 128) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| IP Prefix (16 octets) | | | IP Prefix (16 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| GW IP Address (16 octets) | | | GW IP Address (16 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
| MPLS Label (3 octets) | | | MPLS Label (3 octets) | | |||
+---------------------------------------+ | +---------------------------------------+ | |||
Figure 4 EVPN IP Prefix route NLRI for IPv6 | Figure 4: EVPN IP Prefix Route NLRI for IPv6 | |||
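The field layouts in Figures 3 and 4 can be checked with a minimal Python encoder. This is a sketch under stated assumptions: the RD and ESI are passed in as raw octets, only an MPLS label (not a VNI) is encoded, the low-order 4 bits of the 3-octet label field are left as zero, and `encode_rt5` is a hypothetical name.

```python
import ipaddress
import struct
from typing import Optional

RT_IP_PREFIX = 5  # EVPN route type assigned to the IP Prefix route


def encode_rt5(rd: bytes, prefix: str, gw_ip: Optional[str] = None,
               esi: bytes = bytes(10), eth_tag: int = 0, label: int = 0) -> bytes:
    """Build a complete EVPN NLRI (Type, Length, body) for an IP Prefix route."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    net = ipaddress.ip_network(prefix)  # e.g. "10.0.0.0/24"; host bits must be zero
    if gw_ip is None:
        # All-zero GW IP: the field is not used as an Overlay Index.
        gw = bytes(len(net.network_address.packed))
    else:
        gw_addr = ipaddress.ip_address(gw_ip)
        if gw_addr.version != net.version:
            raise ValueError("IP prefix and GW IP must be the same address family")
        gw = gw_addr.packed
    if not 0 <= label < (1 << 20):
        raise ValueError("an MPLS label is a 20-bit value")
    label_field = (label << 4).to_bytes(3, "big")  # label in the high-order 20 bits
    body = (rd + esi
            + struct.pack("!IB", eth_tag, net.prefixlen)  # Ethernet Tag ID, prefix length
            + net.network_address.packed                   # 4 or 16 octets
            + gw                                           # 4 or 16 octets
            + label_field)
    return bytes([RT_IP_PREFIX, len(body)]) + body          # Length is 34 or 58
```

For example, `encode_rt5(bytes(8), "10.0.0.0/24", gw_ip="10.0.0.2")` yields 36 octets: the 2-octet Type/Length header plus a 34-octet body, matching Figure 3.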
Where: | Where: | |||
o The Length field of the BGP EVPN NLRI for an EVPN IP Prefix route | * The Length field of the BGP EVPN NLRI for an EVPN IP Prefix route | |||
MUST be either 34 (if IPv4 addresses are carried) or 58 (if IPv6 | MUST be either 34 (if IPv4 addresses are carried) or 58 (if IPv6 | |||
addresses are carried). The IP Prefix and Gateway IP Address MUST | addresses are carried). The IP prefix and gateway IP address MUST | |||
be from the same IP address family. | be from the same IP address family. | |||
o Route Distinguisher (RD) and Ethernet Tag ID MUST be used as | * The Route Distinguisher (RD) and Ethernet Tag ID MUST be used as | |||
defined in [RFC7432] and [RFC8365]. In particular, the RD is unique | defined in [RFC7432] and [RFC8365]. In particular, the RD is | |||
per MAC-VRF (or IP-VRF). The MPLS Label field is set to either an | unique per MAC-VRF (or IP-VRF). The MPLS Label field is set to | |||
MPLS label or a VNI, as described in [RFC8365] for other EVPN route | either an MPLS label or a VNI, as described in [RFC8365] for other | |||
types. | EVPN route types. | |||
o The Ethernet Segment Identifier MUST be a non-zero 10-octet | * The Ethernet Segment Identifier MUST be a non-zero 10-octet | |||
identifier if the ESI is used as an Overlay Index (see the | identifier if the ESI is used as an Overlay Index (see the | |||
definition of Overlay Index in Section 3.2). It MUST be all bytes | definition of "Overlay Index" in Section 3.2). It MUST be all | |||
zero otherwise. The ESI format is described in [RFC7432]. | bytes zero otherwise. The ESI format is described in [RFC7432]. | |||
o The IP Prefix Length can be set to a value between 0 and 32 (bits) | * The IP prefix length can be set to a value between 0 and 32 (bits) | |||
for IPv4 and between 0 and 128 for IPv6, and specifies the number | for IPv4 and between 0 and 128 for IPv6, and it specifies the | |||
of bits in the Prefix. The value MUST NOT be greater than 128. | number of bits in the prefix. The value MUST NOT be greater than | |||
128. | ||||
o The IP Prefix is a 4 or 16-octet field (IPv4 or IPv6). | * The IP prefix is a 4- or 16-octet field (IPv4 or IPv6). | |||
o The GW (Gateway) IP Address field is a 4 or 16-octet field (IPv4 or | * The GW IP Address field is a 4- or 16-octet field (IPv4 or IPv6) | |||
IPv6), and will encode a valid IP address as an Overlay Index for | and will encode a valid IP address as an Overlay Index for the IP | |||
the IP Prefixes. The GW IP field MUST be all bytes zero if it is | prefixes. The GW IP field MUST be all bytes zero if it is not | |||
not used as an Overlay Index. Refer to Section 3.2 for the | used as an Overlay Index. Refer to Section 3.2 for the definition | |||
definition and use of the Overlay Index. | and use of the Overlay Index. | |||
o The MPLS Label field is encoded as 3 octets, where the high-order | * The MPLS Label field is encoded as 3 octets, where the high-order | |||
20 bits contain the label value, as per [RFC7432]. When sending, | 20 bits contain the label value, as per [RFC7432]. When sending, | |||
the label value SHOULD be zero if recursive resolution based on | the label value SHOULD be zero if a recursive resolution based on | |||
overlay index is used. If the received MPLS Label value is zero, | an Overlay Index is used. If the received MPLS label value is | |||
the route MUST contain an Overlay Index and the ingress NVE/PE MUST | zero, the route MUST contain an Overlay Index, and the ingress | |||
do recursive resolution to find the egress NVE/PE. If the received | NVE/PE MUST perform a recursive resolution to find the egress NVE/ | |||
Label is zero and the route does not contain an Overlay Index, it | PE. If the received label is zero and the route does not contain | |||
MUST be treat-as-withdraw [RFC7606]. | an Overlay Index, it MUST be "treat as withdraw" [RFC7606]. | |||
The RD, Ethernet Tag ID, IP Prefix Length and IP Prefix are part of | The RD, Ethernet Tag ID, IP prefix length, and IP prefix are part of | |||
the route key used by BGP to compare routes. The rest of the fields | the route key used by BGP to compare routes. The rest of the fields | |||
are not part of the route key. | are not part of the route key. | |||
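The receiving side can be sketched the same way: the Length value selects the address family, the IP prefix length is checked against that family's maximum, and only the RD, Ethernet Tag ID, IP prefix length, and IP prefix form the route key. The names below are hypothetical, and the Overlay Index rules (including the zero-label case) are not modeled in this sketch.

```python
import ipaddress
from typing import NamedTuple, Tuple, Union

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class Rt5(NamedTuple):
    rd: bytes
    esi: bytes
    eth_tag: int
    prefix_len: int
    prefix: IPAddr
    gw_ip: IPAddr
    label: int  # high-order 20 bits of the 3-octet MPLS Label field


def decode_rt5(body: bytes) -> Rt5:
    """Parse the route-type-specific part of an RT-5 (after Type and Length)."""
    if len(body) == 34:
        alen, max_plen, make = 4, 32, ipaddress.IPv4Address
    elif len(body) == 58:
        alen, max_plen, make = 16, 128, ipaddress.IPv6Address
    else:
        raise ValueError("RT-5 Length must be 34 (IPv4) or 58 (IPv6)")
    prefix_len = body[22]
    if prefix_len > max_plen:
        raise ValueError("IP prefix length exceeds the address-family maximum")
    prefix_end = 23 + alen
    gw_end = prefix_end + alen
    return Rt5(
        rd=body[0:8],
        esi=body[8:18],
        eth_tag=int.from_bytes(body[18:22], "big"),
        prefix_len=prefix_len,
        prefix=make(body[23:prefix_end]),
        gw_ip=make(body[prefix_end:gw_end]),
        label=int.from_bytes(body[gw_end:gw_end + 3], "big") >> 4,
    )


def route_key(route: Rt5) -> Tuple[bytes, int, int, IPAddr]:
    # RD, Ethernet Tag ID, IP prefix length, and IP prefix identify the route;
    # ESI, GW IP, and label are not part of the key.
    return (route.rd, route.eth_tag, route.prefix_len, route.prefix)
```

Two RT-5s that differ only in their GW IP therefore map to the same key, so BGP compares them as the same route.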
An IP Prefix Route MAY be sent along with a Router's MAC Extended | An IP Prefix route MAY be sent along with an EVPN Router's MAC | |||
Community (defined in [EVPN-INTERSUBNET]) to carry the MAC address | Extended Community (defined in [RFC9135]) to carry the MAC address | |||
that is used as the overlay index. Note that the MAC address may be | that is used as the Overlay Index. Note that the MAC address may be | |||
that of an TS. | that of a TS. | |||
As described in Section 3.2, certain data combinations in a received | As described in Section 3.2, certain data combinations in a received | |||
routes would imply a "treat-as-withdraw" handling of the route | route would imply a treat-as-withdraw handling of the route | |||
[RFC7606]. | [RFC7606]. | |||
3.2 Overlay Indexes and Recursive Lookup Resolution | 3.2. Overlay Indexes and Recursive Lookup Resolution | |||
RT-5 routes support recursive lookup resolution through the use of | RT-5 routes support recursive lookup resolution through the use of | |||
Overlay Indexes as follows: | Overlay Indexes as follows: | |||
o An Overlay Index can be an ESI, IP address in the address space of | * An Overlay Index can be an ESI, an IP address in the address | |||
the tenant or MAC address and it is used by an NVE as the next-hop | space of the tenant, or a MAC address, and it is used by an NVE as the next | |||
for a given IP Prefix. An Overlay Index always needs a recursive | hop for a given IP prefix. An Overlay Index always needs a | |||
route resolution on the NVE/PE that installs the RT-5 into one of | recursive route resolution on the NVE/PE that installs the RT-5 | |||
its IP-VRFs, so that the NVE knows to which egress NVE/PE it needs | into one of its IP-VRFs so that the NVE knows to which egress NVE/ | |||
to forward the packets. It is important to note that recursive | PE it needs to forward the packets. It is important to note that | |||
resolution of the Overlay Index applies upon installation into an | recursive resolution of the Overlay Index applies upon | |||
IP-VRF, and not upon BGP propagation (for instance, on an ASBR). | installation into an IP-VRF and not upon BGP propagation (for | |||
Also, as a result of the recursive resolution, the egress NVE/PE is | instance, on an ASBR). Also, as a result of the recursive | |||
not necessarily the same NVE that originated the RT-5. | resolution, the egress NVE/PE is not necessarily the same NVE that | |||
originated the RT-5. | ||||
o The Overlay Index is indicated along with the RT-5 in the ESI | * The Overlay Index is indicated along with the RT-5 in the ESI | |||
field, GW IP field or Router's MAC Extended Community, depending on | field, GW IP field, or EVPN Router's MAC Extended Community, | |||
whether the IP Prefix next-hop is an ESI, IP address or MAC address | depending on whether the IP prefix next hop is an ESI, an IP | |||
in the tenant space. The Overlay Index for a given IP Prefix is set | address, or a MAC address in the tenant space. The Overlay Index | |||
by local policy at the NVE that originates an RT-5 for that IP | for a given IP prefix is set by local policy at the NVE that | |||
Prefix (typically managed by the Cloud Management System). | originates an RT-5 for that IP prefix (typically managed by the | |||
cloud management system). | ||||
o In order to enable the recursive lookup resolution at the ingress | * In order to enable the recursive lookup resolution at the ingress | |||
NVE, an NVE that is a possible egress NVE for a given Overlay Index | NVE, an NVE that is a possible egress NVE for a given Overlay | |||
must originate a route advertising itself as the BGP next hop on | Index must originate a route advertising itself as the BGP next | |||
the path to the system denoted by the Overlay Index. For instance: | hop on the path to the system denoted by the Overlay Index. For | |||
instance: | ||||
. If an NVE receives an RT-5 that specifies an Overlay Index, the | - If an NVE receives an RT-5 that specifies an Overlay Index, the | |||
NVE cannot use the RT-5 in its IP-VRF unless (or until) it can | NVE cannot use the RT-5 in its IP-VRF unless (or until) it can | |||
recursively resolve the Overlay Index. | recursively resolve the Overlay Index. | |||
. If the RT-5 specifies an ESI as the Overlay Index, recursive | ||||
resolution can only be done if the NVE has received and installed | ||||
an RT-1 (Auto-Discovery per-EVI) route specifying that ESI. | ||||
. If the RT-5 specifies a GW IP address as the Overlay Index, | ||||
recursive resolution can only be done if the NVE has received and | ||||
installed an RT-2 (MAC/IP route) specifying that IP address in | ||||
the IP address field of its NLRI. | ||||
. If the RT-5 specifies a MAC address as the Overlay Index, | ||||
recursive resolution can only be done if the NVE has received and | ||||
installed an RT-2 (MAC/IP route) specifying that MAC address in | ||||
the MAC address field of its NLRI. | ||||
Note that the RT-1 or RT-2 routes needed for the recursive | - If the RT-5 specifies an ESI as the Overlay Index, a recursive | |||
resolution may arrive before or after the given RT-5 route. | resolution can only be done if the NVE has received and | |||
installed an RT-1 (auto-discovery per EVI) route specifying | ||||
that ESI. | ||||
o Irrespective of the recursive resolution, if there is no IGP or BGP | - If the RT-5 specifies a GW IP address as the Overlay Index, a | |||
route to the BGP next-hop of an RT-5, BGP MUST NOT install the RT-5 | recursive resolution can only be done if the NVE has received | |||
even if the Overlay Index can be resolved. | and installed an RT-2 (MAC/IP Advertisement route) specifying | |||
that IP address in the IP Address field of its NLRI. | ||||
o The ESI and GW IP fields may both be zero at the same time. | - If the RT-5 specifies a MAC address as the Overlay Index, a | |||
However, they MUST NOT both be non-zero at the same time. A route | recursive resolution can only be done if the NVE has received | |||
containing a non-zero GW IP and a non-zero ESI (at the same time) | and installed an RT-2 (MAC/IP Advertisement route) specifying | |||
SHOULD be treat-as-withdraw [RFC7606]. | that MAC address in the MAC Address field of its NLRI. | |||
o If either the ESI or GW IP are non-zero, then the non-zero one is | Note that the RT-1 or RT-2 routes needed for the recursive | |||
the Overlay Index, regardless of whether the Router's MAC Extended | resolution may arrive before or after the given RT-5 route. | |||
Community is present or the value of the Label. In case the GW IP | ||||
is the Overlay Index (hence ESI is zero), the Router's MAC Extended | ||||
Community is ignored if present. | ||||
o A route where ESI, GW IP, MAC and Label are all zero at the same | * Irrespective of the recursive resolution, if there is no IGP or | |||
time SHOULD be treat-as-withdraw. | BGP route to the BGP next hop of an RT-5, BGP MUST NOT install the | |||
RT-5 even if the Overlay Index can be resolved. | ||||
* The ESI and GW IP fields may both be zero at the same time. | ||||
However, they MUST NOT both be non-zero at the same time. A route | ||||
containing a non-zero GW IP and a non-zero ESI (at the same time) | ||||
SHOULD be treated as withdraw [RFC7606]. | ||||
* If either the ESI or the GW IP is non-zero, then the non-zero one | ||||
is the Overlay Index, regardless of whether the EVPN Router's MAC | ||||
Extended Community is present or the value of the label. In case | ||||
the GW IP is the Overlay Index (hence, ESI is zero), the EVPN | ||||
Router's MAC Extended Community is ignored if present. | ||||
* A route where ESI, GW IP, MAC, and Label are all zero at the same | ||||
time SHOULD be treated as withdraw. | ||||
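The installation rules above amount to a simple gate that an NVE can re-evaluate whenever an RT-1 or RT-2 arrives, since those routes may come before or after the RT-5. This Python sketch is a simplification under stated assumptions: received routes are kept in hypothetical sets, the Overlay Index kind is passed in already decided, and BGP next-hop reachability is modeled as set membership rather than a longest-prefix match against the IGP/BGP routing table.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Union


@dataclass
class ReceivedRoutes:
    """Hypothetical per-NVE view of installed routes and reachable next hops."""
    rt1_esis: Set[bytes] = field(default_factory=set)      # ESIs from RT-1 (A-D per EVI)
    rt2_ips: Set[str] = field(default_factory=set)         # IP Address field of RT-2s
    rt2_macs: Set[str] = field(default_factory=set)        # MAC Address field of RT-2s
    reachable_next_hops: Set[str] = field(default_factory=set)  # via IGP or BGP


def can_install_rt5(bgp_next_hop: str, overlay_kind: Optional[str],
                    overlay_value: Union[bytes, str, None],
                    routes: ReceivedRoutes) -> bool:
    # No IGP/BGP route to the BGP next hop: never install, even if the
    # Overlay Index itself could be resolved.
    if bgp_next_hop not in routes.reachable_next_hops:
        return False
    if overlay_kind is None:          # no Overlay Index: BGP next hop is used as is
        return True
    if overlay_kind == "esi":
        return overlay_value in routes.rt1_esis
    if overlay_kind == "gw_ip":
        return overlay_value in routes.rt2_ips
    if overlay_kind == "mac":
        return overlay_value in routes.rt2_macs
    raise ValueError(f"unknown Overlay Index kind: {overlay_kind!r}")
```

An RT-5 with GW IP 10.0.0.2 stays uninstalled until an RT-2 carrying 10.0.0.2 in its IP Address field is received; calling the function again at that point flips the result.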
The indirection provided by the Overlay Index and its recursive | The indirection provided by the Overlay Index and its recursive | |||
lookup resolution is required to achieve fast convergence in case of | lookup resolution is required to achieve fast convergence in case of | |||
a failure of the object represented by the Overlay Index (see the | a failure of the object represented by the Overlay Index (see the | |||
example described in Section 2.2). | example described in Section 2.2). | |||
Table 1 shows the different RT-5 field combinations allowed by this | Table 1 shows the different RT-5 field combinations allowed by this | |||
specification and what Overlay Index must be used by the receiving | specification and what Overlay Index must be used by the receiving | |||
NVE/PE in each case. Those cases where there is no Overlay Index, are | NVE/PE in each case. Cases where there is no Overlay Index are | |||
indicated as "None" in Table 1. If there is no Overlay Index the | indicated as "None" in Table 1. If there is no Overlay Index, the | |||
receiving NVE/PE will not perform any recursive resolution, and the | receiving NVE/PE will not perform any recursive resolution, and the | |||
actual next-hop is given by the RT-5's BGP next-hop. | actual next hop is given by the RT-5's BGP next hop. | |||
+----------+----------+----------+------------+----------------+ | +==========+==========+==========+============+===============+ | |||
| ESI | GW IP | MAC* | Label | Overlay Index | | | ESI | GW IP | MAC* | Label | Overlay Index | | |||
|--------------------------------------------------------------| | +==========+==========+==========+============+===============+ | |||
| Non-Zero | Zero | Zero | Don't Care | ESI | | | Non-Zero | Zero | Zero | Don't Care | ESI | | |||
| Non-Zero | Zero | Non-Zero | Don't Care | ESI | | +----------+----------+----------+------------+---------------+ | |||
| Zero | Non-Zero | Zero | Don't Care | GW IP | | | Non-Zero | Zero | Non-Zero | Don't Care | ESI | | |||
| Zero | Zero | Non-Zero | Zero | MAC | | +----------+----------+----------+------------+---------------+ | |||
| Zero | Zero | Non-Zero | Non-Zero | MAC or None** | | | Zero | Non-Zero | Zero | Don't Care | GW IP | | |||
| Zero | Zero | Zero | Non-Zero | None*** | | +----------+----------+----------+------------+---------------+ | |||
+----------+----------+----------+------------+----------------+ | | Zero | Zero | Non-Zero | Zero | MAC | | |||
+----------+----------+----------+------------+---------------+ | ||||
| Zero | Zero | Non-Zero | Non-Zero | MAC or None** | | ||||
+----------+----------+----------+------------+---------------+ | ||||
| Zero | Zero | Zero | Non-Zero | None*** | | ||||
+----------+----------+----------+------------+---------------+ | ||||
Table 1 - RT-5 fields and Indicated Overlay Index | Table 1: RT-5 Fields and Indicated Overlay Index | |||
Table NOTES: | Table Notes: | |||
* MAC with Zero value means no Router's MAC extended community is | * MAC with "Zero" value means no EVPN Router's MAC Extended | |||
present along with the RT-5. Non-Zero indicates that the extended | Community is present along with the RT-5. "Non-Zero" indicates | |||
community is present and carries a valid MAC address. The | that the extended community is present and carries a valid MAC | |||
encoding of a MAC address MUST be the 6-octet MAC address | address. The encoding of a MAC address MUST be the 6-octet MAC | |||
specified by [802.1Q] and [802.1D-REV]. Examples of invalid MAC | address specified by [IEEE-802.1Q]. Examples of invalid MAC | |||
addresses are broadcast or multicast MAC addresses. The route | addresses are broadcast or multicast MAC addresses. The route | |||
MUST be treat-as-withdraw in case of an invalid MAC address. The | MUST be treated as withdraw in case of an invalid MAC address. | |||
presence of the Router's MAC extended community alone is not | The presence of the EVPN Router's MAC Extended Community alone | |||
enough to indicate the use of the MAC address as the Overlay | is not enough to indicate the use of the MAC address as the | |||
Index, since the extended community can be used for other | Overlay Index since the extended community can be used for | |||
purposes. | other purposes. | |||
** In this case, the Overlay Index may be the RT-5's MAC address or | ** In this case, the Overlay Index may be the RT-5's MAC address | |||
None, depending on the local policy of the receiving NVE/PE. Note | or "None", depending on the local policy of the receiving NVE/ | |||
that the advertising NVE/PE that sets the Overlay Index SHOULD | PE. Note that the advertising NVE/PE that sets the Overlay | |||
advertise an RT-2 for the MAC Overlay Index if there are | Index SHOULD advertise an RT-2 for the MAC Overlay Index if | |||
receiving NVE/PEs configured to use the MAC as the Overlay Index. | there are receiving NVE/PEs configured to use the MAC as the | |||
This case in Table 1 is used in the IP-VRF-to-IP-VRF | Overlay Index. This case in Table 1 is used in the IP-VRF-to- | |||
implementations described in 4.4.1 and 4.4.3. The support of a | IP-VRF implementations described in Sections 4.4.1 and 4.4.3. | |||
MAC Overlay Index in this model is OPTIONAL. | The support of a MAC Overlay Index in this model is OPTIONAL. | |||
*** The Overlay Index is None. This is a special case used for IP- | *** The Overlay Index is "None". This is a special case used for | |||
VRF-to-IP-VRF where the NVE/PEs are connected by IP NVO tunnels | IP-VRF-to-IP-VRF where the NVE/PEs are connected by IP NVO | |||
as opposed to Ethernet NVO tunnels. | tunnels as opposed to Ethernet NVO tunnels. | |||
If the combination of ESI, GW IP, MAC and Label in the receiving RT-5 | If the combination of ESI, GW IP, MAC, and Label in the received | |||
is different than the combinations shown in Table 1, the router will | RT-5 is different from the combinations shown in Table 1, the router | |||
process the route as per the rules described at the beginning of this | will process the route as per the rules described at the beginning of | |||
Section (3.2). | this section (Section 3.2). | |||
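Table 1 and the rules at the beginning of Section 3.2 can be expressed as a single selection function. In this hypothetical Python sketch, `router_mac` is None when no EVPN Router's MAC Extended Community is present, treat-as-withdraw outcomes are signaled with ValueError, the `mac_overlay_policy` flag stands in for the local policy of the "MAC or None" row, and, as an assumption, the invalid-MAC check is applied only when the GW IP is zero, because the extended community is ignored otherwise.

```python
from typing import Optional


def overlay_index(esi: bytes, gw_ip: bytes, router_mac: Optional[bytes],
                  label: int, mac_overlay_policy: bool = False) -> Optional[str]:
    """Return "esi", "gw_ip", "mac", or None (no Overlay Index) per Table 1.

    Raises ValueError where the route is to be handled as treat-as-withdraw.
    """
    esi_set, gw_set = any(esi), any(gw_ip)
    if esi_set and gw_set:
        raise ValueError("treat-as-withdraw: ESI and GW IP both non-zero")
    if gw_set:
        return "gw_ip"  # the EVPN Router's MAC Extended Community is ignored
    # Assumption: MAC validity is checked only when the community is not ignored.
    # The I/G bit (lowest bit of the first octet) marks multicast/broadcast MACs.
    if router_mac is not None and (len(router_mac) != 6 or router_mac[0] & 0x01):
        raise ValueError("treat-as-withdraw: broadcast/multicast or malformed MAC")
    if esi_set:
        return "esi"
    if router_mac is not None:
        if label == 0:
            return "mac"
        # "MAC or None" row: decided by local policy on the receiving NVE/PE.
        return "mac" if mac_overlay_policy else None
    if label != 0:
        return None  # IP NVO tunnels: the BGP next hop is the actual next hop
    raise ValueError("treat-as-withdraw: ESI, GW IP, MAC, and label all zero")
```

The default `mac_overlay_policy=False` reflects that support of a MAC Overlay Index in the IP-VRF-to-IP-VRF model is OPTIONAL; a receiver that supports it would pass True.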
Table 2 shows the different inter-subnet use-cases described in this | Table 2 shows the different inter-subnet use cases described in this | |||
document and the corresponding coding of the Overlay Index in the | document and the corresponding coding of the Overlay Index in the | |||
route type 5 (RT-5). | route type 5 (RT-5). | |||
+---------+---------------------+----------------------------+ | +=========+=====================+===========================+ | |||
| Section | Use-case | Overlay Index in the RT-5 | | | Section | Use Case | Overlay Index in the RT-5 | | |||
+-------------------------------+----------------------------+ | +=========+=====================+===========================+ | |||
| 4.1 | TS IP address | GW IP | | | 4.1 | TS IP address | GW IP | | |||
| 4.2 | Floating IP address | GW IP | | +---------+---------------------+---------------------------+ | |||
| 4.3 | "Bump in the wire" | ESI or MAC | | | 4.2 | Floating IP address | GW IP | | |||
| 4.4 | IP-VRF-to-IP-VRF | GW IP, MAC or None | | +---------+---------------------+---------------------------+ | |||
+---------+---------------------+----------------------------+ | | 4.3 | "Bump-in-the-wire" | ESI or MAC | | |||
+---------+---------------------+---------------------------+ | ||||
| 4.4 | IP-VRF-to-IP-VRF | GW IP, MAC, or None | | ||||
+---------+---------------------+---------------------------+ | ||||
Table 2 - Use-cases and Overlay Indexes for Recursive Resolution | Table 2: Use Cases and Overlay Indexes for Recursive | |||
Resolution | ||||
The above use-cases are representative of the different Overlay | The above use cases are representative of the different Overlay | |||
Indexes supported by RT-5 (GW IP, ESI, MAC or None). | Indexes supported by the RT-5 (GW IP, ESI, MAC, or None). | |||
4. Overlay Index Use-Cases | 4. Overlay Index Use Cases | |||
This Section describes some use-cases for the Overlay Index types | ||||
used with the IP Prefix route. Although the examples use IPv4 | ||||
Prefixes and subnets, the descriptions of the RT-5 are valid for the | ||||
same cases with IPv6, only replacing the IP Prefixes, IPL and GW IP | ||||
by the corresponding IPv6 values. | ||||
4.1 TS IP Address Overlay Index Use-Case | This section describes some use cases for the Overlay Index types | |||
used with the IP Prefix route. Although the examples use IPv4 | ||||
prefixes and subnets, the descriptions of the RT-5 are valid for the | ||||
same cases with IPv6, except that IP Prefixes, IPL, and GW IP are | ||||
replaced by the corresponding IPv6 values. | ||||
4.1. TS IP Address Overlay Index Use Case | ||||
Figure 5 illustrates an example of inter-subnet forwarding for | Figure 5 illustrates an example of inter-subnet forwarding for | |||
subnets sitting behind Virtual Appliances (on TS2 and TS3). | subnets sitting behind VAs (on TS2 and TS3). | |||
IP4---+ NVE2 DGW1 | IP4---+ NVE2 DGW1 | |||
| +-----------+ +---------+ +-------------+ | | +-----------+ +---------+ +-------------+ | |||
SN2---TS2(VA)--| (BD-10) |-| |----| (BD-10) | | SN2---TS2(VA)--| (BD-10) |-| |----| (BD-10) | | |||
| IP2/M2 +-----------+ | | | IRB1\ | | | M2/IP2 +-----------+ | | | IRB1\ | | |||
-+---+ | | | (IP-VRF)|---+ | -+---+ | | | (IP-VRF)|---+ | |||
| | | +-------------+ _|_ | | | | +-------------+ _|_ | |||
SN1 | VXLAN/ | ( ) | SN1 | VXLAN/ | ( ) | |||
| | GENEVE | DGW2 ( WAN ) | | | GENEVE | DGW2 ( WAN ) | |||
-+---+ NVE3 | | +-------------+ (___) | -+---+ NVE3 | | +-------------+ (___) | |||
| IP3/M3 +-----------+ | |----| (BD-10) | | | | M3/IP3 +-----------+ | |----| (BD-10) | | | |||
SN3---TS3(VA)--| (BD-10) |-| | | IRB2\ | | | SN3---TS3(VA)--| (BD-10) |-| | | IRB2\ | | | |||
| +-----------+ +---------+ | (IP-VRF)|---+ | | +-----------+ +---------+ | (IP-VRF)|---+ | |||
IP5---+ +-------------+ | IP5---+ +-------------+ | |||
Figure 5 TS IP address use-case | Figure 5: TS IP Address Use Case | |||
An example of inter-subnet forwarding between subnet SN1, which uses | An example of inter-subnet forwarding between subnet SN1, which uses | |||
a 24 bit IP prefix (written as SN1/24 in future), and a subnet | a 24-bit IP prefix (hereafter written as SN1/24), and a subnet | |||
sitting in the WAN is described below. NVE2, NVE3, DGW1 and DGW2 are | sitting in the WAN is described below. NVE2, NVE3, DGW1, and DGW2 | |||
running BGP EVPN. TS2 and TS3 do not participate in dynamic routing | are running BGP EVPN. TS2 and TS3 do not participate in dynamic | |||
protocols, and they only have a static route to forward the traffic | routing protocols, and they only have a static route to forward the | |||
to the WAN. SN1/24 is dual-homed to NVE2 and NVE3. | traffic to the WAN. SN1/24 is dual-homed to NVE2 and NVE3. | |||
In this case, a GW IP is used as an Overlay Index. Although a | In this case, a GW IP is used as an Overlay Index. Although a | |||
different Overlay Index type could have been used, this use-case | different Overlay Index type could have been used, this use case | |||
assumes that the operator knows the VA's IP addresses beforehand, | assumes that the operator knows the VA's IP addresses beforehand, | |||
whereas the VA's MAC address is unknown and the VA's ESI is zero. | whereas the VA's MAC address is unknown and the VA's ESI is zero. | |||
Because of this, the GW IP is the suitable Overlay Index to be used | Because of this, the GW IP is the suitable Overlay Index to be used | |||
with the RT-5s. The NVEs know the GW IP to be used for a given Prefix | with the RT-5s. The NVEs know the GW IP to be used for a given | |||
by policy. | prefix by policy. | |||
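The route construction that NVE2 performs in step (1) below can be sketched as follows. The sketch assumes a hypothetical table of ARP-snooped host addresses and a hypothetical per-prefix GW IP policy. Only the fields listed in step (1) are modeled; the BGP Encapsulation Extended Community, Route Distinguishers, and Route Targets are omitted. The addresses come from documentation ranges and stand in for IP2, M2, and SN1.

```python
from ipaddress import ip_address, ip_network


def build_advertisements(arp_snooped: dict[str, str],
                         prefix_policy: dict[str, str]) -> list[dict]:
    """Build simplified RT-2 and RT-5 field sets on behalf of one TS.

    arp_snooped:   host IP -> MAC, as learned via ARP snooping (hypothetical input)
    prefix_policy: prefix  -> GW IP, as configured by policy (hypothetical input)
    """
    routes = []
    for ip, mac in arp_snooped.items():
        host = ip_address(ip)
        # MAC/IP Advertisement route: IPL is the full host length (32 or 128).
        routes.append({"type": 2, "ML": 48, "M": mac,
                       "IPL": host.max_prefixlen, "IP": host})
    for prefix, gw in prefix_policy.items():
        net = ip_network(prefix)
        # IP Prefix route with a zero ESI and a GW IP Overlay Index.
        routes.append({"type": 5, "IPL": net.prefixlen, "IP": net.network_address,
                       "ESI": 0, "GW IP": ip_address(gw)})
    return routes
```

Because `ip_address` and `ip_network` also accept IPv6, the same function produces IPL = 128 host routes and IPv6 prefixes without changes. This matches the observation at the start of this section that only the prefixes, IPL, and GW IP differ for IPv6.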
(1) NVE2 advertises the following BGP routes on behalf of TS2: | (1) NVE2 advertises the following BGP routes on behalf of TS2: | |||
o Route type 2 (MAC/IP route) containing: ML=48 (MAC Address | * Route type 2 (MAC/IP Advertisement route) containing: ML = 48 | |||
Length), M=M2 (MAC Address), IPL=32 (IP Prefix Length), IP=IP2 | (MAC address length), M = M2 (MAC address), IPL = 32 (IP | |||
and [RFC5512] BGP Encapsulation Extended Community with the | prefix length), IP = IP2, and BGP Encapsulation Extended | |||
corresponding Tunnel type. The MAC and IP addresses may be | Community [RFC9012] with the corresponding tunnel type. The | |||
learned via ARP snooping. | MAC and IP addresses may be learned via ARP snooping. | |||
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, | * Route type 5 (IP Prefix route) containing: IPL = 24, IP = | |||
ESI=0, GW IP address=IP2. The prefix and GW IP are learned by | SN1, ESI = 0, and GW IP address = IP2. The prefix and GW IP | |||
policy. | are learned by policy. | |||
(2) Similarly, NVE3 advertises the following BGP routes on behalf of | (2) Similarly, NVE3 advertises the following BGP routes on behalf of | |||
TS3: | TS3: | |||
o Route type 2 (MAC/IP route) containing: ML=48, M=M3, IPL=32, | * Route type 2 (MAC/IP Advertisement route) containing: ML = | |||
IP=IP3 (and BGP Encapsulation Extended Community). | 48, M = M3, IPL = 32, IP = IP3 (and BGP Encapsulation | |||
Extended Community). | ||||
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, | * Route type 5 (IP Prefix route) containing: IPL = 24, IP = | |||
ESI=0, GW IP address=IP3. | SN1, ESI = 0, and GW IP address = IP3. | |||
(3) DGW1 and DGW2 import both received routes based on the Route | (3) DGW1 and DGW2 import both received routes based on the Route | |||
Targets: | Targets: | |||
o Based on the BD-10 Route Target in DGW1 and DGW2, the MAC/IP | * Based on the BD-10 Route Target in DGW1 and DGW2, the MAC/IP | |||
route is imported and M2 is added to the BD-10 along with its | Advertisement route is imported, and M2 is added to the BD-10 | |||
corresponding tunnel information. For instance, if VXLAN is | along with its corresponding tunnel information. For | |||
used, the VTEP will be derived from the MAC/IP route BGP next- | instance, if VXLAN is used, the VTEP will be derived from the | |||
hop and VNI from the MPLS Label1 field. IP2 - M2 is added to | MAC/IP Advertisement route BGP next hop and VNI from the MPLS | |||
the ARP table. Similarly, M3 is added to BD-10 and IP3 - M3 to | Label1 field. M2/IP2 is added to the ARP table. Similarly, | |||
the ARP table. | M3 is added to BD-10, and M3/IP3 is added to the ARP table. | |||
o Based on the BD-10 Route Target in DGW1 and DGW2, the IP | * Based on the BD-10 Route Target in DGW1 and DGW2, the IP | |||
Prefix route is also imported and SN1/24 is added to the IP- | Prefix route is also imported, and SN1/24 is added to the IP- | |||
VRF with Overlay Index IP2 pointing at the local BD-10. In | VRF with Overlay Index IP2 pointing at the local BD-10. In | |||
this example, it is assumed that the RT-5 from NVE2 is | this example, it is assumed that the RT-5 from NVE2 is | |||
preferred over the RT-5 from NVE3. If both routes were equally | preferred over the RT-5 from NVE3. If both routes were | |||
preferable and ECMP enabled, SN1/24 would also be added to the | equally preferable and ECMP were enabled, SN1/24 would also be | |||
routing table with Overlay Index IP3. | added to the routing table with Overlay Index IP3. | |||
(4) When DGW1 receives a packet from the WAN with destination IPx, | (4) When DGW1 receives a packet from the WAN with destination IPx, | |||
where IPx belongs to SN1/24: | where IPx belongs to SN1/24: | |||
o A destination IP lookup is performed on the DGW1 IP-VRF | * A destination IP lookup is performed on the DGW1 IP-VRF | |||
routing table and Overlay Index=IP2 is found. Since IP2 is an | table, and Overlay Index = IP2 is found. Since IP2 is an | |||
Overlay Index a recursive route resolution is required for | Overlay Index, a recursive route resolution is required for | |||
IP2. | IP2. | |||
o IP2 is resolved to M2 in the ARP table, and M2 is resolved to | * IP2 is resolved to M2 in the ARP table, and M2 is resolved to | |||
the tunnel information given by the BD FIB (e.g., remote VTEP | the tunnel information given by the BD FIB (e.g., remote VTEP | |||
and VNI for the VXLAN case). | and VNI for the VXLAN case). | |||
o The IP packet destined to IPx is encapsulated with: | * The IP packet destined to IPx is encapsulated with: | |||
. Source inner MAC = IRB1 MAC. | - Inner source MAC = IRB1 MAC. | |||
. Destination inner MAC = M2. | - Inner destination MAC = M2. | |||
. Tunnel information provided by the BD (VNI, VTEP IPs and | - Tunnel information provided by the BD (VNI, VTEP IPs, and | |||
MACs for the VXLAN case). | MACs for the VXLAN case). | |||
(5) When the packet arrives at NVE2: | (5) When the packet arrives at NVE2: | |||
o Based on the tunnel information (VNI for the VXLAN case), the | * Based on the tunnel information (VNI for the VXLAN case), the | |||
BD-10 context is identified for a MAC lookup. | BD-10 context is identified for a MAC lookup. | |||
o Encapsulation is stripped off and based on a MAC lookup | * Encapsulation is stripped off and, based on a MAC lookup | |||
(assuming MAC forwarding on the egress NVE), the packet is | (assuming MAC forwarding on the egress NVE), the packet is | |||
forwarded to TS2, where it will be properly routed. | forwarded to TS2, where it will be properly routed. | |||
(6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will | (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will | |||
be applied to the MAC route IP2/M2, as defined in [RFC7432]. | be applied to the MAC route M2/IP2, as defined in [RFC7432]. | |||
Route type 5 prefixes are not subject to MAC mobility procedures, | Route type 5 prefixes are not subject to MAC Mobility | |||
hence no changes in the DGW IP-VRF routing table will occur for | procedures; hence, no changes in the DGW IP-VRF table will occur | |||
TS2 mobility, i.e., all the prefixes will still be pointing at | for TS2 mobility -- i.e., all the prefixes will still be | |||
IP2 as Overlay Index. There is an indirection for e.g., SN1/24, | pointing at IP2 as the Overlay Index. There is an indirection | |||
which still points at Overlay Index IP2 in the routing table, but | for, e.g., SN1/24, which still points at Overlay Index IP2 in | |||
IP2 will be simply resolved to a different tunnel, based on the | the routing table, but IP2 will simply be resolved to a | |||
outcome of the MAC mobility procedures for the MAC/IP route | different tunnel based on the outcome of the MAC Mobility | |||
IP2/M2. | procedures for the MAC/IP Advertisement route M2/IP2. | |||
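Steps (4) through (6) can be modeled as a two-stage recursive resolution at DGW1. A longest-prefix match in the IP-VRF yields the GW IP Overlay Index, the ARP table maps that GW IP to a MAC, and the BD FIB maps the MAC to a VXLAN tunnel. The sketch below is illustrative only. `Dgw`, `Tunnel`, and the method names are hypothetical, the addresses come from documentation ranges, and the MAC Mobility sequence-number comparison of [RFC7432] is not modeled; a newer RT-2 simply overwrites the older one.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Tunnel:
    vtep: str  # remote VTEP, derived from the RT-2 BGP next hop
    vni: int   # VNI, taken from the RT-2 MPLS Label1 field


class Dgw:
    """Hypothetical, minimal DGW state: one IP-VRF attached to BD-10 via an IRB."""

    def __init__(self, irb_mac: str):
        self.irb_mac = irb_mac
        self.ip_vrf = {}  # prefix -> GW IP Overlay Index (from RT-5s)
        self.arp = {}     # IP -> MAC (from RT-2s)
        self.bd_fib = {}  # MAC -> Tunnel (from RT-2s)

    def import_rt2(self, mac: str, ip: str, tunnel: Tunnel) -> None:
        # A later RT-2 for the same MAC (e.g., after a move) overwrites the tunnel.
        self.bd_fib[mac] = tunnel
        self.arp[ip_address(ip)] = mac

    def import_rt5(self, prefix: str, gw_ip: str) -> None:
        self.ip_vrf[ip_network(prefix)] = ip_address(gw_ip)

    def forward(self, dst_ip: str):
        dst = ip_address(dst_ip)
        matches = [p for p in self.ip_vrf if dst in p]
        if not matches:
            return None
        gw_ip = self.ip_vrf[max(matches, key=lambda p: p.prefixlen)]  # longest match
        mac = self.arp.get(gw_ip)       # recursive step 1: GW IP -> MAC
        tunnel = self.bd_fib.get(mac)   # recursive step 2: MAC -> tunnel
        if tunnel is None:
            return None                 # Overlay Index not resolvable yet
        return {"inner_src_mac": self.irb_mac, "inner_dst_mac": mac, "tunnel": tunnel}
```

The indirection is what makes step (6) inexpensive. Moving TS2 changes only the `bd_fib` entry for M2, while every prefix in `ip_vrf` keeps pointing at IP2.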
Note that in the opposite direction, TS2 will send traffic based on | Note that in the opposite direction, TS2 will send traffic based on | |||
its static-route next-hop information (IRB1 and/or IRB2), and regular | its static-route next-hop information (IRB1 and/or IRB2), and regular | |||
EVPN procedures will be applied. | EVPN procedures will be applied. | |||
4.2 Floating IP Overlay Index Use-Case | 4.2. Floating IP Overlay Index Use Case | |||
Sometimes Tenant Systems (TS) work in active/standby mode where an | Sometimes TSs work in active/standby mode where an upstream floating | |||
upstream floating IP - owned by the active TS - is used as the | IP owned by the active TS is used as the Overlay Index to get to some | |||
Overlay Index to get to some subnets behind. This redundancy mode, | subnets behind the TS. This redundancy mode, already introduced in | |||
already introduced in Section 2.1 and 2.2, is illustrated in Figure | Sections 2.1 and 2.2, is illustrated in Figure 6. | |||
6. | ||||
NVE2 DGW1 | NVE2 DGW1 | |||
+-----------+ +---------+ +-------------+ | +-----------+ +---------+ +-------------+ | |||
+---TS2(VA)--| (BD-10) |-| |----| (BD-10) | | +---TS2(VA)--| (BD-10) |-| |----| (BD-10) | | |||
| IP2/M2 +-----------+ | | | IRB1\ | | | M2/IP2 +-----------+ | | | IRB1\ | | |||
| <-+ | | | (IP-VRF)|---+ | | <-+ | | | (IP-VRF)|---+ | |||
| | | | +-------------+ _|_ | | | | | +-------------+ _|_ | |||
SN1 vIP23 (floating) | VXLAN/ | ( ) | SN1 vIP23 (floating) | VXLAN/ | ( ) | |||
| | | GENEVE | DGW2 ( WAN ) | | | | GENEVE | DGW2 ( WAN ) | |||
| <-+ NVE3 | | +-------------+ (___) | | <-+ NVE3 | | +-------------+ (___) | |||
| IP3/M3 +-----------+ | |----| (BD-10) | | | | M3/IP3 +-----------+ | |----| (BD-10) | | | |||
+---TS3(VA)--| (BD-10) |-| | | IRB2\ | | | +---TS3(VA)--| (BD-10) |-| | | IRB2\ | | | |||
+-----------+ +---------+ | (IP-VRF)|---+ | +-----------+ +---------+ | (IP-VRF)|---+ | |||
+-------------+ | +-------------+ | |||
Figure 6 Floating IP Overlay Index for redundant TS | Figure 6: Floating IP Overlay Index for Redundant TS | |||
In this use-case, a GW IP is used as an Overlay Index for the same | In this use case, a GW IP is used as an Overlay Index for the same | |||
reasons as in 4.1. However, this GW IP is a floating IP that belongs | reasons as in Section 4.1. However, this GW IP is a floating IP that | |||
to the active TS. Assuming TS2 is the active TS and owns vIP23: | belongs to the active TS. Assuming TS2 is the active TS and owns | |||
vIP23: | ||||
(1) NVE2 advertises the following BGP routes for TS2: | (1) NVE2 advertises the following BGP routes for TS2: | |||
o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32, | * Route type 2 (MAC/IP Advertisement route) containing: ML = | |||
IP=vIP23 (and BGP Encapsulation Extended Community). The MAC | 48, M = M2, IPL = 32, and IP = vIP23 (as well as the BGP | |||
and IP addresses may be learned via ARP snooping. | Encapsulation Extended Community). The MAC and IP addresses | |||
may be learned via ARP snooping. | ||||
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, | * Route type 5 (IP Prefix route) containing: IPL = 24, IP = | |||
ESI=0, GW IP address=vIP23. The prefix and GW IP are learned | SN1, ESI = 0, and GW IP address = vIP23. The prefix and GW | |||
by policy. | IP are learned by policy. | |||
(2) NVE3 advertises the following BGP route for TS3 (it does not | (2) NVE3 advertises the following BGP route for TS3 (it does not | |||
advertise an RT-2 for vIP23/M3): | advertise an RT-2 for M3/vIP23): | |||
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, | * Route type 5 (IP Prefix route) containing: IPL = 24, IP = | |||
ESI=0, GW IP address=vIP23. The prefix and GW IP are learned | SN1, ESI = 0, and GW IP address = vIP23. The prefix and GW | |||
by policy. | IP are learned by policy. | |||
(3) DGW1 and DGW2 import both received routes based on the Route | (3) DGW1 and DGW2 import both received routes based on the Route | |||
Target: | Target: | |||
o M2 is added to the BD-10 FIB along with its corresponding | * M2 is added to the BD-10 FIB along with its corresponding | |||
tunnel information. For the VXLAN use case, the VTEP will be | tunnel information. For the VXLAN use case, the VTEP will be | |||
derived from the MAC/IP route BGP next-hop and VNI from the | derived from the MAC/IP Advertisement route BGP next hop and | |||
VNI field. vIP23 - M2 is added to the ARP table. | VNI from the VNI field. M2/vIP23 is added to the ARP table. | |||
o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay | * SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay | |||
index vIP23 pointing at M2 in the local BD-10. | Index vIP23 pointing at M2 in the local BD-10. | |||
(4) When DGW1 receives a packet from the WAN with destination IPx, | (4) When DGW1 receives a packet from the WAN with destination IPx, | |||
where IPx belongs to SN1/24: | where IPx belongs to SN1/24: | |||
o A destination IP lookup is performed on the DGW1 IP-VRF | * A destination IP lookup is performed on the DGW1 IP-VRF | |||
routing table and Overlay Index=vIP23 is found. Since vIP23 is | table, and Overlay Index = vIP23 is found. Since vIP23 is an | |||
an Overlay Index, a recursive route resolution for vIP23 is | Overlay Index, a recursive route resolution for vIP23 is | |||
required. | required. | |||
o vIP23 is resolved to M2 in the ARP table, and M2 is resolved | * vIP23 is resolved to M2 in the ARP table, and M2 is resolved | |||
to the tunnel information given by the BD (remote VTEP and VNI | to the tunnel information given by the BD (remote VTEP and | |||
for the VXLAN case). | VNI for the VXLAN case). | |||
o The IP packet destined to IPx is encapsulated with: | * The IP packet destined to IPx is encapsulated with: | |||
. Source inner MAC = IRB1 MAC. | - Inner source MAC = IRB1 MAC. | |||
. Destination inner MAC = M2. | - Inner destination MAC = M2. | |||
. Tunnel information provided by the BD FIB (VNI, VTEP IPs | - Tunnel information provided by the BD FIB (VNI, VTEP IPs, | |||
and MACs for the VXLAN case). | and MACs for the VXLAN case). | |||
(5) When the packet arrives at NVE2: | (5) When the packet arrives at NVE2: | |||
o Based on the tunnel information (VNI for the VXLAN case), the | * Based on the tunnel information (VNI for the VXLAN case), the | |||
BD-10 context is identified for a MAC lookup. | BD-10 context is identified for a MAC lookup. | |||
o Encapsulation is stripped off and based on a MAC lookup | * Encapsulation is stripped off and, based on a MAC lookup | |||
(assuming MAC forwarding on the egress NVE), the packet is | (assuming MAC forwarding on the egress NVE), the packet is | |||
forwarded to TS2, where it will be properly routed. | forwarded to TS2, where it will be properly routed. | |||
(6) When the redundancy protocol running between TS2 and TS3 appoints | (6) When the redundancy protocol running between TS2 and TS3 | |||
TS3 as the new active TS for SN1, TS3 will now own the floating | appoints TS3 as the new active TS for SN1, TS3 will now own the | |||
vIP23 and will signal this new ownership, using a gratuitous ARP | floating vIP23 and will signal this new ownership using a | |||
REPLY message (explained in [RFC5227]) or similar. Upon receiving | gratuitous ARP REPLY message (explained in [RFC5227]) or | |||
the new owner's notification, NVE3 will issue a route type 2 for | similar. Upon receiving the new owner's notification, NVE3 will | |||
M3-vIP23 and NVE2 will withdraw the RT-2 for M2-vIP23. DGW1 and | issue a route type 2 for M3/vIP23, and NVE2 will withdraw the | |||
DGW2 will update their ARP tables with the new MAC resolving the | RT-2 for M2/vIP23. DGW1 and DGW2 will update their ARP tables | |||
floating IP. No changes are made in the IP-VRF routing table. | with the new MAC resolving the floating IP. No changes are made | |||
in the IP-VRF table. | ||||
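The ARP-table handling in step (6) can be sketched as below. For illustration only, the sketch assumes that the most recently received RT-2 for a given IP address wins. It also assumes that a withdrawal removes the ARP entry only if that entry still points at the withdrawn MAC. With these two assumptions, the result is the same whether NVE3's advertisement or NVE2's withdrawal arrives first. `FloatingIpResolver` and its methods are hypothetical names.

```python
from ipaddress import ip_address, ip_network


class FloatingIpResolver:
    """Hypothetical sketch of a DGW's IP-VRF and ARP table for a floating GW IP."""

    def __init__(self):
        self.ip_vrf = {}  # prefix -> GW IP Overlay Index (from RT-5s)
        self.arp = {}     # GW IP -> MAC (from RT-2s)

    def import_rt5(self, prefix, gw_ip):
        # NVE2 and NVE3 both advertise SN1/24 with GW IP = vIP23, so this
        # entry is the same whichever RT-5 is selected.
        self.ip_vrf[ip_network(prefix)] = ip_address(gw_ip)

    def rt2_advertised(self, mac, ip):
        self.arp[ip_address(ip)] = mac  # assumption: most recent RT-2 wins

    def rt2_withdrawn(self, mac, ip):
        addr = ip_address(ip)
        if self.arp.get(addr) == mac:   # ignore a stale withdrawal
            del self.arp[addr]

    def resolve(self, dst_ip):
        """Return the inner destination MAC for dst_ip, or None if unresolved."""
        dst = ip_address(dst_ip)
        matches = [p for p in self.ip_vrf if dst in p]
        if not matches:
            return None
        gw_ip = self.ip_vrf[max(matches, key=lambda p: p.prefixlen)]
        return self.arp.get(gw_ip)
```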
4.3 Bump-in-the-Wire Use-Case | 4.3. Bump-in-the-Wire Use Case | |||
Figure 7 illustrates an example of inter-subnet forwarding for an IP | Figure 7 illustrates an example of inter-subnet forwarding for an IP | |||
Prefix route that carries a subnet SN1. In this use-case, TS2 and TS3 | Prefix route that carries subnet SN1. In this use case, TS2 and TS3 | |||
are layer 2 VA devices without any IP address that can be included as | are Layer 2 VA devices without any IP addresses that can be included | |||
an Overlay Index in the GW IP field of the IP Prefix route. Their MAC | as an Overlay Index in the GW IP field of the IP Prefix route. Their | |||
addresses are M2 and M3 respectively and are connected to BD-10. Note | MAC addresses are M2 and M3, respectively, and both VAs are | |||
that IRB1 and IRB2 (in DGW1 and DGW2 respectively) have IP addresses | connected to BD-10. Note that IRB1 and IRB2 (in DGW1 and DGW2, | |||
in a subnet different than SN1. | respectively) have IP addresses in a subnet different from SN1. | |||
NVE2 DGW1 | NVE2 DGW1 | |||
M2 +-----------+ +---------+ +-------------+ | M2 +-----------+ +---------+ +-------------+ | |||
+---TS2(VA)--| (BD-10) |-| |----| (BD-10) | | +---TS2(VA)--| (BD-10) |-| |----| (BD-10) | | |||
| ESI23 +-----------+ | | | IRB1\ | | | ESI23 +-----------+ | | | IRB1\ | | |||
| + | | | (IP-VRF)|---+ | | + | | | (IP-VRF)|---+ | |||
| | | | +-------------+ _|_ | | | | | +-------------+ _|_ | |||
SN1 | | VXLAN/ | ( ) | SN1 | | VXLAN/ | ( ) | |||
| | | GENEVE | DGW2 ( WAN ) | | | | GENEVE | DGW2 ( WAN ) | |||
| + NVE3 | | +-------------+ (___) | | + NVE3 | | +-------------+ (___) | |||
| ESI23 +-----------+ | |----| (BD-10) | | | | ESI23 +-----------+ | |----| (BD-10) | | | |||
+---TS3(VA)--| (BD-10) |-| | | IRB2\ | | | +---TS3(VA)--| (BD-10) |-| | | IRB2\ | | | |||
M3 +-----------+ +---------+ | (IP-VRF)|---+ | M3 +-----------+ +---------+ | (IP-VRF)|---+ | |||
+-------------+ | +-------------+ | |||
Figure 7 Bump-in-the-wire use-case | Figure 7: Bump-in-the-Wire Use Case | |||
Since neither TS2 nor TS3 can participate in any dynamic routing | Since TS2 and TS3 cannot participate in any dynamic routing protocol | |||
protocol and have no IP address assigned, there are two potential | and neither has an IP address assigned, there are two potential | |||
Overlay Index types that can be used when advertising SN1: | Overlay Index types that can be used when advertising SN1: | |||
a) an ESI, i.e., ESI23, that can be provisioned on the attachment | a) an ESI, i.e., ESI23, that can be provisioned on the attachment | |||
ports of NVE2 and NVE3, as shown in Figure 7. | ports of NVE2 and NVE3, as shown in Figure 7, or | |||
b) or the VA's MAC address, that can be added to NVE2 and NVE3 by | ||||
policy. | ||||
The advantage of using an ESI as Overlay Index as opposed to the VA's | b) the VA's MAC address, which can be added to NVE2 and NVE3 by | |||
MAC address, is that the forwarding to the egress NVE can be done | policy. | |||
purely based on the state of the AC in the ES (notified by the | ||||
Ethernet A-D per-EVI route) and all the EVPN multi-homing redundancy | The advantage of using an ESI as the Overlay Index as opposed to the | |||
mechanisms can be reused. For instance, the [RFC7432] mass-withdrawal | VA's MAC address is that the forwarding to the egress NVE can be done | |||
mechanism for fast failure detection and propagation can be used. | purely based on the state of the AC in the Ethernet segment (notified | |||
This Section assumes that an ESI Overlay Index is used in this use- | by the Ethernet A-D per EVI route), and all the EVPN multihoming | |||
case but it does not prevent the use of the VA's MAC address as an | redundancy mechanisms can be reused. For instance, the mass | |||
Overlay Index. If a MAC is used as Overlay Index, the control plane | withdrawal mechanism described in [RFC7432] for fast failure | |||
must follow the procedures described in Section 4.4.3. | detection and propagation can be used. It is assumed per this | |||
section that an ESI Overlay Index is used in this use case, but this | ||||
use case does not preclude the use of the VA's MAC address as an | ||||
Overlay Index. If a MAC is used as the Overlay Index, the control | ||||
plane must follow the procedures described in Section 4.4.3. | ||||
The model supports VA redundancy in a similar way to the one | The model supports VA redundancy in a similar way to the one | |||
described in Section 4.2 for the floating IP Overlay Index use-case, | described in Section 4.2 for the floating IP Overlay Index use case, | |||
except that it uses the EVPN Ethernet A-D per-EVI route instead of | except that it uses the EVPN Ethernet A-D per EVI route instead of | |||
the MAC advertisement route to advertise the location of the Overlay | the MAC advertisement route to advertise the location of the Overlay | |||
Index. The procedure is explained below: | Index. The procedure is explained below: | |||
(1) Assuming TS2 is the active TS in ESI23, NVE2 advertises the | (1) Assuming TS2 is the active TS in ESI23, NVE2 advertises the | |||
following BGP routes: | following BGP routes: | |||
o Route type 1 (Ethernet A-D route for BD-10) containing: | * Route type 1 (Ethernet A-D route for BD-10) containing: ESI = | |||
ESI=ESI23 and the corresponding tunnel information (VNI | ESI23 and the corresponding tunnel information (VNI field), | |||
field), as well as the BGP Encapsulation Extended Community as | as well as the BGP Encapsulation Extended Community as per | |||
per [RFC8365]. | [RFC8365]. | |||
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, | * Route type 5 (IP Prefix route) containing: IPL = 24, IP = | |||
ESI=ESI23, GW IP address=0. The Router's MAC Extended | SN1, ESI = ESI23, and GW IP address = 0. The EVPN Router's | |||
Community defined in [EVPN-INTERSUBNET] is added and carries | MAC Extended Community defined in [RFC9135] is added and | |||
the MAC address (M2) associated to the TS behind which SN1 | carries the MAC address (M2) associated with the TS behind | |||
sits. M2 may be learned by policy, however the MAC in the | which SN1 sits. M2 may be learned by policy; however, the | |||
Extended Community is preferred if sent with the route. | MAC in the Extended Community is preferred if sent with the | |||
route. | ||||
(2) NVE3 advertises the following BGP route for TS3 (no AD per-EVI | (2) NVE3 advertises the following BGP route for TS3 (no Ethernet | |||
route is advertised): | A-D per EVI route is advertised): | |||
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1, | * Route type 5 (IP Prefix route) containing: IPL = 24, IP = | |||
ESI=23, GW IP address=0. The Router's MAC Extended Community | SN1, ESI = ESI23, and GW IP address = 0. The EVPN Router's MAC | |||
is added and carries the MAC address (M3) associated to the TS | Extended Community is added and carries the MAC address (M3) | |||
behind which SN1 sits. M3 may be learned by policy, however | associated with the TS behind which SN1 sits. M3 may be | |||
the MAC in the Extended Community is preferred if sent with | learned by policy; however, the MAC in the Extended Community | |||
the route. | is preferred if sent with the route. | |||
(3) DGW1 and DGW2 import the received routes based on the Route | (3) DGW1 and DGW2 import the received routes based on the Route | |||
Target: | Target: | |||
o The tunnel information to get to ESI23 is installed in DGW1 | * The tunnel information to get to ESI23 is installed in DGW1 | |||
and DGW2. For the VXLAN use case, the VTEP will be derived | and DGW2. For the VXLAN use case, the VTEP will be derived | |||
from the Ethernet A-D route BGP next-hop and VNI from the | from the Ethernet A-D route BGP next hop and VNI from the | |||
VNI/VSID field (see [RFC8365]). | VNI/VSID field (see [RFC8365]). | |||
o The RT-5 coming from the NVE that advertised the RT-1 is | * The RT-5 coming from the NVE that advertised the RT-1 is | |||
selected and SN1/24 is added to the IP-VRF in DGW1 and DGW2 | selected, and SN1/24 is added to the IP-VRF in DGW1 and DGW2 | |||
with Overlay Index ESI23 and MAC = M2. | with Overlay Index ESI23 and MAC = M2. | |||
(4) When DGW1 receives a packet from the WAN with destination IPx, | (4) When DGW1 receives a packet from the WAN with destination IPx, | |||
where IPx belongs to SN1/24: | where IPx belongs to SN1/24: | |||
o A destination IP lookup is performed on the DGW1 IP-VRF | * A destination IP lookup is performed on the DGW1 IP-VRF | |||
routing table and Overlay Index=ESI23 is found. Since ESI23 is | table, and Overlay Index = ESI23 is found. Since ESI23 is an | |||
an Overlay Index, a recursive route resolution is required to | Overlay Index, a recursive route resolution is required to | |||
find the egress NVE where ESI23 resides. | find the egress NVE where ESI23 resides. | |||
o The IP packet destined to IPx is encapsulated with: | * The IP packet destined to IPx is encapsulated with: | |||
. Source inner MAC = IRB1 MAC. | - Inner source MAC = IRB1 MAC. | |||
. Destination inner MAC = M2 (this MAC will be obtained | - Inner destination MAC = M2 (this MAC will be obtained from | |||
from the Router's MAC Extended Community received along | the EVPN Router's MAC Extended Community received along | |||
with the RT-5 for SN1). Note that the Router's MAC | with the RT-5 for SN1). Note that the EVPN Router's MAC | |||
Extended Community is used in this case to carry the TS' | Extended Community is used in this case to carry the TS's | |||
MAC address, as opposed to the NVE/PE's MAC address. | MAC address, as opposed to the MAC address of the NVE/PE. | |||
. Tunnel information for the NVO tunnel is provided by the | - Tunnel information for the NVO tunnel is provided by the | |||
Ethernet A-D route per-EVI for ESI23 (VNI and VTEP IP for | Ethernet A-D route per EVI for ESI23 (VNI and VTEP IP for | |||
the VXLAN case). | the VXLAN case). | |||
(5) When the packet arrives at NVE2: | (5) When the packet arrives at NVE2: | |||
o Based on the tunnel demultiplexer information (VNI for the | * Based on the tunnel demultiplexer information (VNI for the | |||
VXLAN case), the BD-10 context is identified for a MAC lookup | VXLAN case), the BD-10 context is identified for a MAC lookup | |||
(assuming MAC-based disposition model [RFC7432]) or the VNI | (assuming a MAC-based disposition model [RFC7432]), or the | |||
may directly identify the egress interface (for a MPLS-based | VNI may directly identify the egress interface (for an MPLS- | |||
disposition model, which in this context is a VNI-based | based disposition model, which in this context is a VNI-based | |||
disposition model). | disposition model). | |||
o Encapsulation is stripped off and based on a MAC lookup | * Encapsulation is stripped off and, based on a MAC lookup | |||
(assuming MAC forwarding on the egress NVE) or a VNI lookup | (assuming MAC forwarding on the egress NVE) or a VNI lookup | |||
(in case of VNI forwarding), the packet is forwarded to TS2, | (in the case of VNI forwarding), the packet is forwarded to TS2, | |||
where it will be forwarded to SN1. | where it will be forwarded to SN1. | |||
(6) If the redundancy protocol running between TS2 and TS3 follows an | (6) If the redundancy protocol running between TS2 and TS3 follows | |||
active/standby model and there is a failure, appointing TS3 as | an active/standby model and there is a failure, TS3 is appointed | |||
the new active TS for SN1, TS3 will now own the connectivity to | as the new active TS for SN1. TS3 will now own the connectivity | |||
SN1 and will signal this new ownership. Upon receiving the new | to SN1 and will signal this new ownership. Upon receiving the | |||
owner's notification, NVE3's AC will become active and issue a | new owner's notification, NVE3's AC will become active and issue | |||
route type 1 for ESI23, whereas NVE2 will withdraw its Ethernet | a route type 1 for ESI23, whereas NVE2 will withdraw its | |||
A-D route for ESI23. DGW1 and DGW2 will update their tunnel | Ethernet A-D route for ESI23. DGW1 and DGW2 will update their | |||
information to resolve ESI23. The destination inner MAC will be | tunnel information to resolve ESI23. The inner destination MAC | |||
changed to M3. | will be changed to M3. | |||
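The bump-in-the-wire procedure, including the failover in step (6), can be sketched as follows. In this illustrative model, an RT-5 is usable only if the NVE that advertised it has also advertised an Ethernet A-D per EVI route for the RT-5's ESI. The tunnel then comes from that Ethernet A-D route, and the inner destination MAC comes from the EVPN Router's MAC Extended Community of the selected RT-5. `EsiResolver`, its methods, and all addresses and identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class EsiResolver:
    """Hypothetical DGW state for an ESI Overlay Index (bump-in-the-wire)."""
    irb_mac: str
    ad_per_evi: dict = field(default_factory=dict)  # (ESI, NVE) -> tunnel, from RT-1s
    rt5s: dict = field(default_factory=dict)        # (prefix, NVE) -> (ESI, Router's MAC)

    def rt1(self, esi, nve, tunnel, withdraw=False):
        if withdraw:
            self.ad_per_evi.pop((esi, nve), None)
        else:
            self.ad_per_evi[(esi, nve)] = tunnel

    def rt5(self, prefix, nve, esi, router_mac):
        self.rt5s[(ip_network(prefix), nve)] = (esi, router_mac)

    def forward(self, dst_ip):
        dst = ip_address(dst_ip)
        # Keep only RT-5s whose advertising NVE also advertised an RT-1 for that ESI.
        usable = [(p, nve, esi, mac) for (p, nve), (esi, mac) in self.rt5s.items()
                  if dst in p and (esi, nve) in self.ad_per_evi]
        if not usable:
            return None
        p, nve, esi, mac = max(usable, key=lambda u: u[0].prefixlen)
        return {"inner_src_mac": self.irb_mac,
                "inner_dst_mac": mac,                   # TS MAC from Router's MAC EC
                "tunnel": self.ad_per_evi[(esi, nve)]}  # from the RT-1 for the ESI
```

Because the tunnel is looked up through the ESI rather than stored with the prefix, withdrawing a single Ethernet A-D route moves every prefix that uses ESI23 at once.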
4.4 IP-VRF-to-IP-VRF Model | 4.4. IP-VRF-to-IP-VRF Model | |||
This use-case is similar to the scenario described in "IRB forwarding | This use case is similar to the scenario described in Section 9.1 of | |||
on NVEs for Tenant Systems" in [EVPN-INTERSUBNET], however the new | [RFC9135]; however, the new requirement here is the advertisement of | |||
requirement here is the advertisement of IP Prefixes as opposed to | IP prefixes as opposed to only host routes. | |||
only host routes. | ||||
In the examples described in Sections 4.1, 4.2 and 4.3, the BD | In the examples described in Sections 4.1, 4.2, and 4.3, the BD | |||
instance can connect IRB interfaces and any other Tenant Systems | instance can connect IRB interfaces and any other Tenant Systems | |||
connected to it. EVPN provides connectivity for: | connected to it. EVPN provides connectivity for: | |||
1. Traffic destined to the IRB or TS IP interfaces as well as | 1. Traffic destined to the IRB or TS IP interfaces, as well as | |||
2. Traffic destined to IP subnets sitting behind the TS, e.g., SN1 or | 2. Traffic destined to IP subnets sitting behind the TS, e.g., SN1 | |||
SN2. | or SN2. | |||
In order to provide connectivity for (1), MAC/IP routes (RT-2) are | In order to provide connectivity for (1), MAC/IP Advertisement routes | |||
needed so that IRB or TS MACs and IPs can be distributed. | (RT-2) are needed so that IRB or TS MACs and IPs can be distributed. | |||
Connectivity type (2) is accomplished by the exchange of IP Prefix | Connectivity type (2) is accomplished by the exchange of IP Prefix | |||
routes (RT-5) for IPs and subnets sitting behind certain Overlay | routes (RT-5) for IPs and subnets sitting behind certain Overlay | |||
Indexes, e.g., GW IP or ESI or TS MAC. | Indexes, e.g., GW IP, ESI, or TS MAC. | |||
In some cases, IP Prefix routes may be advertised for subnets and IPs | In some cases, IP Prefix routes may be advertised for subnets and IPs | |||
sitting behind an IRB. This use-case is referred to as the "IP-VRF- | sitting behind an IRB. This use case is referred to as the "IP-VRF- | |||
to-IP-VRF" model. | to-IP-VRF" model. | |||
[EVPN-INTERSUBNET] defines an asymmetric IRB model and a symmetric | [RFC9135] defines an asymmetric IRB model and a symmetric IRB model | |||
IRB model, based on the required lookups at the ingress and egress | based on the required lookups at the ingress and egress NVE. The | |||
NVE: the asymmetric model requires an IP lookup and a MAC lookup at | asymmetric model requires an IP lookup and a MAC lookup at the | |||
the ingress NVE, whereas only a MAC lookup is needed at the egress | ingress NVE, whereas only a MAC lookup is needed at the egress NVE; | |||
NVE; the symmetric model requires IP and MAC lookups at both, ingress | the symmetric model requires IP and MAC lookups at both the ingress | |||
and egress NVE. From that perspective, the IP-VRF-to-IP-VRF use-case | and egress NVE. From that perspective, the IP-VRF-to-IP-VRF use case | |||
described in this Section is a symmetric IRB model. | described in this section is a symmetric IRB model. | |||
Note that, in an IP-VRF-to-IP-VRF scenario, out of the many subnets | Note that in an IP-VRF-to-IP-VRF scenario, out of the many subnets | |||
that a tenant may have, it may be the case that only a few are | that a tenant may have, it may be the case that only a few are | |||
attached to a given NVE/PE's IP-VRF. In order to provide inter-subnet | attached to a given IP-VRF of the NVE/PE. In order to provide inter- | |||
connectivity among the set of NVE/PEs where the tenant is connected, | subnet connectivity among the set of NVE/PEs where the tenant is | |||
a new SBD is created on all of them if recursive resolution is | connected, a new SBD is created on all of them if a recursive | |||
needed. This SBD is instantiated as a regular BD (with no ACs) in | resolution is needed. This SBD is instantiated as a regular BD (with | |||
each NVE/PE and has an IRB interface that connects the SBD to the IP- | no ACs) in each NVE/PE and has an IRB interface that connects the SBD | |||
VRF. The IRB interface's IP or MAC address is used as the overlay | to the IP-VRF. The IRB interface's IP or MAC address is used as the | |||
index for recursive resolution. | Overlay Index for a recursive resolution. | |||
Depending on the existence and characteristics of the SBD and IRB | Depending on the existence and characteristics of the SBD and IRB | |||
interfaces for the IP-VRFs, there are three different IP-VRF-to-IP- | interfaces for the IP-VRFs, there are three different IP-VRF-to-IP- | |||
VRF scenarios identified and described in this document: | VRF scenarios identified and described in this document: | |||
1) Interface-less model: no SBD and no overlay indexes required. | 1. Interface-less model: no SBD and no Overlay Indexes required. | |||
2) Interface-ful with SBD IRB model: it requires SBD, as well as GW | ||||
IP addresses as overlay indexes. | 2. Interface-ful with an SBD IRB model: requires SBD as well as GW | |||
3) Interface-ful with unnumbered SBD IRB model: it requires SBD, as | IP addresses as Overlay Indexes. | |||
well as MAC addresses as overlay indexes. | ||||
3. Interface-ful with an unnumbered SBD IRB model: requires SBD as | ||||
well as MAC addresses as Overlay Indexes. | ||||
Inter-subnet IP multicast is outside the scope of this document. | Inter-subnet IP multicast is outside the scope of this document. | |||
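The three models differ in which RT-5 field a receiving NVE/DGW relies on. The Python sketch below is a simplified classifier, not a normative procedure. The class and function names are hypothetical, and the decision order is an assumption. Using a non-zero label to select the interface-less model stands in for the local policy mentioned in Section 4.4.1. An RT-5 with a non-zero ESI is reported separately because it falls outside these three models.

```python
from dataclasses import dataclass
from typing import Optional

ZERO_IP = "0.0.0.0"


@dataclass(frozen=True)
class ReceivedRt5:
    """Hypothetical view of the RT-5 fields relevant to model selection."""
    gw_ip: str = ZERO_IP
    esi: int = 0
    label: int = 0
    router_mac: Optional[str] = None  # EVPN Router's MAC Extended Community, if attached


def ip_vrf_to_ip_vrf_model(route: ReceivedRt5) -> str:
    """Return which IP-VRF-to-IP-VRF model an RT-5 appears to use (simplified)."""
    if route.esi != 0:
        return "esi-overlay-index"                 # not one of the three models
    if route.gw_ip != ZERO_IP:
        return "interface-ful-sbd-irb"             # Overlay Index = GW IP
    if route.label != 0:
        return "interface-less"                    # label + next hop, no recursion
    if route.router_mac is not None:
        return "interface-ful-unnumbered-sbd-irb"  # Overlay Index = Router's MAC
    raise ValueError("RT-5 carries neither a usable label nor an Overlay Index")


print(ip_vrf_to_ip_vrf_model(ReceivedRt5(label=10, router_mac="M1")))  # interface-less
```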
4.4.1 Interface-less IP-VRF-to-IP-VRF Model | 4.4.1. Interface-less IP-VRF-to-IP-VRF Model | |||
Figure 8 will be used for the description of this model. | Figure 8 depicts the Interface-less IP-VRF-to-IP-VRF model. | |||
NVE1(M1) | NVE1(M1) | |||
+------------+ | +------------+ | |||
IP1+----| (BD-1) | DGW1(M3) | IP1+----| (BD-1) | DGW1(M3) | |||
| \ | +---------+ +--------+ | | \ | +---------+ +--------+ | |||
| (IP-VRF)|----| |-|(IP-VRF)|----+ | | (IP-VRF)|----| |-|(IP-VRF)|----+ | |||
| / | | | +--------+ | | | / | | | +--------+ | | |||
+---| (BD-2) | | | _+_ | +---| (BD-2) | | | _+_ | |||
| +------------+ | | ( ) | | +------------+ | | ( ) | |||
SN1| | VXLAN/ | ( WAN )--H1 | SN1| | VXLAN/ | ( WAN )--H1 | |||
| NVE2(M2) | GENEVE/| (___) | | NVE2(M2) | GENEVE/| (___) | |||
| +------------+ | MPLS | + | | +------------+ | MPLS | + | |||
+---| (BD-2) | | | DGW2(M4) | | +---| (BD-2) | | | DGW2(M4) | | |||
| \ | | | +--------+ | | | \ | | | +--------+ | | |||
| (IP-VRF)|----| |-|(IP-VRF)|----+ | | (IP-VRF)|----| |-|(IP-VRF)|----+ | |||
| / | +---------+ +--------+ | | / | +---------+ +--------+ | |||
SN2+----| (BD-3) | | SN2+----| (BD-3) | | |||
+------------+ | +------------+ | |||
Figure 8 Interface-less IP-VRF-to-IP-VRF model | Figure 8: Interface-less IP-VRF-to-IP-VRF Model | |||
In this case: | In this case: | |||
a) The NVEs and DGWs must provide connectivity between hosts in SN1, | a) The NVEs and DGWs must provide connectivity between hosts in SN1, | |||
SN2, IP1 and hosts sitting at the other end of the WAN, for | SN2, and IP1 and hosts sitting at the other end of the WAN -- for | |||
example, H1. It is assumed that the DGWs import/export IP and/or | example, H1. It is assumed that the DGWs import/export IP and/or | |||
VPN-IP routes from/to the WAN. | VPN-IP routes to/from the WAN. | |||
b) The IP-VRF instances in the NVE/DGWs are directly connected | b) The IP-VRF instances in the NVE/DGWs are directly connected | |||
through NVO tunnels, and no IRBs and/or BD instances are | through NVO tunnels, and no IRBs and/or BD instances are | |||
instantiated to connect the IP-VRFs. | instantiated to connect the IP-VRFs. | |||
c) The solution must provide layer 3 connectivity among the IP-VRFs | c) The solution must provide Layer 3 connectivity among the IP-VRFs | |||
for Ethernet NVO tunnels, for instance, VXLAN or GENEVE. | for Ethernet NVO tunnels -- for instance, VXLAN or GENEVE. | |||
d) The solution may provide layer 3 connectivity among the IP-VRFs | d) The solution may provide Layer 3 connectivity among the IP-VRFs | |||
for IP NVO tunnels, for example, GENEVE (with IP payload). | for IP NVO tunnels -- for example, GENEVE (with IP payload). | |||
In order to meet the above requirements, the EVPN route type 5 will | In order to meet the above requirements, the EVPN route type 5 will | |||
be used to advertise the IP Prefixes, along with the Router's MAC | be used to advertise the IP prefixes, along with the EVPN Router's | |||
Extended Community as defined in [EVPN-INTERSUBNET] if the | MAC Extended Community as defined in [RFC9135] if the advertising | |||
advertising NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will | NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will advertise an | |||
advertise an RT-5 for each of its prefixes with the following fields: | RT-5 for each of its prefixes with the following fields: | |||
o RD as per [RFC7432]. | * RD as per [RFC7432]. | |||
o Ethernet Tag ID=0. | * Ethernet Tag ID = 0. | |||
o IP Prefix Length and IP address, as explained in the previous | * IP prefix length and IP address, as explained in the previous | |||
Sections. | sections. | |||
o GW IP address=0. | * GW IP address = 0. | |||
o ESI=0 | * ESI = 0. | |||
o MPLS label or VNI corresponding to the IP-VRF. | * MPLS label or VNI corresponding to the IP-VRF. | |||
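A minimal Python sketch of the RT-5 fields listed above, held as an in-memory structure rather than a wire encoding. The class and function names are hypothetical, the RD and label values in the usage line are made up, and only IPv4 prefixes are handled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IpPrefixRoute:
    """Hypothetical in-memory form of the RT-5 fields listed above."""
    rd: str
    ethernet_tag_id: int
    prefix: ipaddress.IPv4Network  # carries both the IP prefix length and the IP address
    gw_ip: ipaddress.IPv4Address
    esi: int
    label: int  # MPLS label or VNI of the IP-VRF


def interface_less_rt5(rd: str, prefix: str, ip_vrf_label: int) -> IpPrefixRoute:
    """Build an interface-less RT-5: Ethernet Tag ID, GW IP, and ESI are all zero."""
    if ip_vrf_label == 0:
        # The receiver relies on a non-zero label to forward without recursion.
        raise ValueError("an interface-less RT-5 needs the IP-VRF label or VNI")
    return IpPrefixRoute(
        rd=rd,
        ethernet_tag_id=0,
        prefix=ipaddress.IPv4Network(prefix),
        gw_ip=ipaddress.IPv4Address("0.0.0.0"),
        esi=0,
        label=ip_vrf_label,
    )


route = interface_less_rt5("192.0.2.1:100", "10.1.1.0/24", 10)
print(route.prefix.prefixlen, route.gw_ip, route.label)  # 24 0.0.0.0 10
```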
Each RT-5 will be sent with a Route Target identifying the tenant | Each RT-5 will be sent with a Route Target identifying the tenant | |||
(IP-VRF) and may be sent with two BGP extended communities: | (IP-VRF) and may be sent with two BGP extended communities: | |||
o The first one is the BGP Encapsulation Extended Community, as | * The first one is the BGP Encapsulation Extended Community, as per | |||
per [RFC5512], identifying the tunnel type. | [RFC9012], identifying the tunnel type. | |||
o The second one is the Router's MAC Extended Community as per | * The second one is the EVPN Router's MAC Extended Community, as per | |||
[EVPN-INTERSUBNET] containing the MAC address associated to | [RFC9135], containing the MAC address associated with the NVE | |||
the NVE advertising the route. This MAC address identifies the | advertising the route. This MAC address identifies the NVE/DGW | |||
NVE/DGW and MAY be reused for all the IP-VRFs in the NVE. The | and MAY be reused for all the IP-VRFs in the NVE. The EVPN | |||
Router's MAC Extended Community must be sent if the route is | Router's MAC Extended Community must be sent if the route is | |||
associated to an Ethernet NVO tunnel, for instance, VXLAN. If | associated with an Ethernet NVO tunnel -- for instance, VXLAN. If | |||
the route is associated to an IP NVO tunnel, for instance | the route is associated with an IP NVO tunnel -- for instance, | |||
GENEVE with IP payload, the Router's MAC Extended Community | GENEVE with an IP payload -- the EVPN Router's MAC Extended | |||
should not be sent. | Community should not be sent. | |||
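The community rules above reduce to a small decision on the tunnel type. In this Python sketch the tunnel-type strings and the (type, value) pairs are assumptions for illustration, not a BGP encoding. The BGP Encapsulation Extended Community is always attached here, although the text only says it may be sent.

```python
ETHERNET_NVO = {"vxlan", "geneve-ethernet"}  # Ethernet payload: Router's MAC must be sent
IP_NVO = {"geneve-ip"}                       # IP payload: Router's MAC should not be sent


def rt5_communities(tenant_rt: str, tunnel_type: str, router_mac: str) -> list:
    """Return (community type, value) pairs for an interface-less RT-5."""
    communities = [
        ("route-target", tenant_rt),     # identifies the tenant IP-VRF
        ("encapsulation", tunnel_type),  # BGP Encapsulation Extended Community
    ]
    if tunnel_type in ETHERNET_NVO:
        # The receiver uses this MAC as the inner destination MAC.
        communities.append(("evpn-router-mac", router_mac))
    elif tunnel_type not in IP_NVO:
        raise ValueError(f"unknown tunnel type: {tunnel_type}")
    return communities
```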
The following example illustrates the procedure to advertise and | The following example illustrates the procedure to advertise and | |||
forward packets to SN1/24 (IPv4 prefix advertised from NVE1): | forward packets to SN1/24 (IPv4 prefix advertised from NVE1): | |||
(1) NVE1 advertises the following BGP route: | (1) NVE1 advertises the following BGP route: | |||
o Route type 5 (IP Prefix route) containing: | * Route type 5 (IP Prefix route) containing: | |||
. IPL=24, IP=SN1, Label=10. | - IPL = 24, IP = SN1, Label = 10. | |||
. GW IP= set to 0. | - GW IP = set to 0. | |||
. [RFC5512] BGP Encapsulation Extended Community. | - BGP Encapsulation Extended Community [RFC9012]. | |||
. Router's MAC Extended Community that contains M1. | - EVPN Router's MAC Extended Community that contains M1. | |||
. Route Target identifying the tenant (IP-VRF). | - Route Target identifying the tenant (IP-VRF). | |||
(2) DGW1 imports the received routes from NVE1: | (2) DGW1 imports the received routes from NVE1: | |||
o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 | * DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 | |||
Route Target. | Route Target. | |||
o Since GW IP=ESI=0, the Label is a non-zero value and the local | * Since GW IP = ESI = 0, the label is a non-zero value, and the | |||
policy indicates this interface-less model, DGW1 will use the | local policy indicates this interface-less model, DGW1 will | |||
Label and next-hop of the RT-5, as well as the MAC address | use the label and next hop of the RT-5, as well as the MAC | |||
conveyed in the Router's MAC Extended Community (as inner | address conveyed in the EVPN Router's MAC Extended Community | |||
destination MAC address) to set up the forwarding state and | (as the inner destination MAC address) to set up the | |||
later encapsulate the routed IP packets. | forwarding state and later encapsulate the routed IP packets. | |||
(3) When DGW1 receives a packet from the WAN with destination IPx, | (3) When DGW1 receives a packet from the WAN with destination IPx, | |||
where IPx belongs to SN1/24: | where IPx belongs to SN1/24: | |||
o A destination IP lookup is performed on the DGW1 IP-VRF | * A destination IP lookup is performed on the DGW1 IP-VRF | |||
routing table. The lookup yields SN1/24. | table. The lookup yields SN1/24. | |||
o Since the RT-5 for SN1/24 had a GW IP=ESI=0, a non-zero Label | * Since the RT-5 for SN1/24 had a GW IP = ESI = 0, a non-zero | |||
and next-hop and the model is interface-less, DGW1 will not | label, and a next hop, and since the model is interface-less, | |||
need a recursive lookup to resolve the route. | DGW1 will not need a recursive lookup to resolve the route. | |||
o The IP packet destined to IPx is encapsulated with: Source | * The IP packet destined to IPx is encapsulated with: inner | |||
inner MAC = DGW1 MAC, Destination inner MAC = M1, Source outer | source MAC = DGW1 MAC, inner destination MAC = M1, outer | |||
IP (tunnel source IP) = DGW1 IP, Destination outer IP (tunnel | source IP (tunnel source IP) = DGW1 IP, and outer destination | |||
destination IP) = NVE1 IP. The Source and Destination inner | IP (tunnel destination IP) = NVE1 IP. The inner source and | |||
MAC addresses are not needed if IP NVO tunnels are used. | destination MAC addresses are not needed if IP NVO tunnels | |||
are used. | ||||
(4) When the packet arrives at NVE1: | (4) When the packet arrives at NVE1: | |||
o NVE1 will identify the IP-VRF for an IP lookup based on the | * NVE1 will identify the IP-VRF for an IP lookup based on the | |||
Label (the Destination inner MAC is not needed to identify the | label (the inner destination MAC is not needed to identify | |||
IP-VRF). | the IP-VRF). | |||
o An IP lookup is performed in the routing context, where SN1 | * An IP lookup is performed in the routing context, where SN1 | |||
turns out to be a local subnet associated to BD-2. A | turns out to be a local subnet associated with BD-2. A | |||
subsequent lookup in the ARP table and the BD FIB will provide | subsequent lookup in the ARP table and the BD FIB will | |||
the forwarding information for the packet in BD-2. | provide the forwarding information for the packet in BD-2. | |||
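Steps (2) to (4) can be traced with a compact Python model of the forwarding state. This is a sketch under simplifying assumptions. Tables are dictionaries, and longest-prefix match is a linear scan. The addresses, the ARP entry, and the AC name "ac-sn1" are made up, while M1 and M3 are the placeholders from Figure 8. No packet is actually encoded.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteEntry:
    next_hop: str    # BGP next hop of the RT-5 (tunnel destination)
    label: int       # MPLS label or VNI of the remote IP-VRF
    inner_dmac: str  # from the EVPN Router's MAC Extended Community


def longest_match(table: dict, dst: str):
    """Return the value for the longest prefix in `table` that contains `dst`."""
    addr = ipaddress.ip_address(dst)
    hits = [p for p in table if addr in ipaddress.ip_network(p)]
    if not hits:
        return None
    return table[max(hits, key=lambda p: ipaddress.ip_network(p).prefixlen)]


# Step (2): DGW1 installs SN1/24 from NVE1's RT-5; no recursive lookup is needed.
dgw1_ip_vrf = {"10.1.1.0/24": RemoteEntry("192.0.2.1", 10, "M1")}


def dgw1_encapsulate(dst: str, ethernet_nvo: bool) -> Optional[dict]:
    """Step (3): route a WAN packet and build its tunnel header fields."""
    entry = longest_match(dgw1_ip_vrf, dst)
    if entry is None:
        return None
    headers = {"outer_src_ip": "192.0.2.254",   # DGW1 tunnel address (made up)
               "outer_dst_ip": entry.next_hop,  # NVE1
               "label": entry.label}
    if ethernet_nvo:  # inner MACs are omitted for IP NVO tunnels
        headers.update(inner_smac="M3", inner_dmac=entry.inner_dmac)
    return headers


# Step (4): NVE1 picks the IP-VRF from the label, then resolves the local subnet.
nve1_vrf_by_label = {10: "tenant-a"}
nve1_local_subnets = {"tenant-a": {"10.1.1.0/24": "BD-2"}}
nve1_arp = {("BD-2", "10.1.1.7"): "00:00:5e:00:53:07"}
nve1_bd_fib = {("BD-2", "00:00:5e:00:53:07"): "ac-sn1"}


def nve1_receive(headers: dict, dst: str) -> tuple:
    vrf = nve1_vrf_by_label[headers["label"]]  # the inner DMAC is not needed here
    bd = longest_match(nve1_local_subnets[vrf], dst)
    mac = nve1_arp[(bd, dst)]                  # ARP table lookup in BD-2
    return bd, mac, nve1_bd_fib[(bd, mac)]     # BD FIB lookup


hdrs = dgw1_encapsulate("10.1.1.7", ethernet_nvo=True)
print(nve1_receive(hdrs, "10.1.1.7"))  # ('BD-2', '00:00:5e:00:53:07', 'ac-sn1')
```

Passing ethernet_nvo=False drops the inner MAC addresses, mirroring the IP NVO case in step (3).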
The model described above is called Interface-less model since the | The model described above is called an "interface-less" model since | |||
IP-VRFs are connected directly through tunnels and they don't require | the IP-VRFs are connected directly through tunnels and do not | |||
those tunnels to be terminated in SBDs instead, as in Sections 4.4.2 | require those tunnels to be terminated in SBDs, as is the case in | |||
or 4.4.3. | Sections 4.4.2 and 4.4.3. | |||
4.4.2 Interface-ful IP-VRF-to-IP-VRF with SBD IRB | 4.4.2. Interface-ful IP-VRF-to-IP-VRF with SBD IRB | |||
Figure 9 will be used for the description of this model. | Figure 9 depicts the Interface-ful IP-VRF-to-IP-VRF with SBD IRB | |||
model. | ||||
NVE1 | NVE1 | |||
+------------+ DGW1 | +------------+ DGW1 | |||
IP10+---+(BD-1) | +---------------+ +------------+ | IP10+---+(BD-1) | +---------------+ +------------+ | |||
| \ | | | | | | | \ | | | | | | |||
|(IP-VRF)-(SBD)| |(SBD)-(IP-VRF)|-----+ | |(IP-VRF)-(SBD)| |(SBD)-(IP-VRF)|-----+ | |||
| / IRB(IP1/M1) IRB(IP3/M3) | | | | / IRB(M1/IP1) IRB(M3/IP3) | | | |||
+---+(BD-2) | | | +------------+ _+_ | +---+(BD-2) | | | +------------+ _+_ | |||
| +------------+ | | ( ) | | +------------+ | | ( ) | |||
SN1| | VXLAN/ | ( WAN )--H1 | SN1| | VXLAN/ | ( WAN )--H1 | |||
| NVE2 | GENEVE/ | (___) | | NVE2 | GENEVE/ | (___) | |||
| +------------+ | MPLS | DGW2 + | | +------------+ | MPLS | DGW2 + | |||
+---+(BD-2) | | | +------------+ | | +---+(BD-2) | | | +------------+ | | |||
| \ | | | | | | | | \ | | | | | | | |||
|(IP-VRF)-(SBD)| |(SBD)-(IP-VRF)|-----+ | |(IP-VRF)-(SBD)| |(SBD)-(IP-VRF)|-----+ | |||
| / IRB(IP2/M2) IRB(IP4/M4) | | | / IRB(M2/IP2) IRB(M4/IP4) | | |||
SN2+----+(BD-3) | +---------------+ +------------+ | SN2+----+(BD-3) | +---------------+ +------------+ | |||
+------------+ | +------------+ | |||
Figure 9 Interface-ful with SBD IRB model | Figure 9: Interface-ful with SBD IRB Model | |||
In this model: | In this model: | |||
a) As in Section 4.4.1, the NVEs and DGWs must provide connectivity | a) As in Section 4.4.1, the NVEs and DGWs must provide connectivity | |||
between hosts in SN1, SN2, IP10 and hosts sitting at the other end | between hosts in SN1, SN2, and IP10 and hosts sitting at the | |||
of the WAN. | other end of the WAN. | |||
b) However, the NVE/DGWs are now connected through Ethernet NVO | b) However, the NVE/DGWs are now connected through Ethernet NVO | |||
tunnels terminated in the SBD instance. The IP-VRFs use IRB | tunnels terminated in the SBD instance. The IP-VRFs use IRB | |||
interfaces for their connectivity to the SBD. | interfaces for their connectivity to the SBD. | |||
c) Each SBD IRB has an IP and a MAC address, where the IP address | c) Each SBD IRB has an IP and a MAC address, where the IP address | |||
must be reachable from other NVEs or DGWs. | must be reachable from other NVEs or DGWs. | |||
d) The SBD is attached to all the NVE/DGWs in the tenant domain BDs. | d) The SBD is attached to all the NVE/DGWs in the tenant domain BDs. | |||
e) The solution must provide layer 3 connectivity for Ethernet NVO | e) The solution must provide Layer 3 connectivity for Ethernet NVO | |||
tunnels, for instance, VXLAN or GENEVE (with Ethernet payload). | tunnels -- for instance, VXLAN or GENEVE (with Ethernet payload). | |||
EVPN type 5 routes will be used to advertise the IP Prefixes, whereas | EVPN type 5 routes will be used to advertise the IP prefixes, whereas | |||
EVPN RT-2 routes will advertise the MAC/IP addresses of each SBD IRB | EVPN RT-2 routes will advertise the MAC/IP addresses of each SBD IRB | |||
interface. Each NVE/DGW will advertise an RT-5 for each of its | interface. Each NVE/DGW will advertise an RT-5 for each of its | |||
prefixes with the following fields: | prefixes with the following fields: | |||
o RD as per [RFC7432]. | * RD as per [RFC7432]. | |||
o Ethernet Tag ID=0. | * Ethernet Tag ID = 0. | |||
o IP Prefix Length and IP address, as explained in the previous | * IP prefix length and IP address, as explained in the previous | |||
Sections. | sections. | |||
o GW IP address=IRB-IP of the SBD (this is the Overlay Index | * GW IP address = IRB-IP of the SBD (this is the Overlay Index that | |||
that will be used for the recursive route resolution). | will be used for the recursive route resolution). | |||
o ESI=0 | * ESI = 0. | |||
o Label value should be zero since the RT-5 route requires a | * Label value should be zero since the RT-5 route requires a | |||
recursive lookup resolution to an RT-2 route. It is ignored on | recursive lookup resolution to an RT-2 route. It is ignored on | |||
reception, and, when forwarding packets, the MPLS label or VNI | reception, and the MPLS label or VNI from the RT-2's MPLS Label1 | |||
from the RT-2's MPLS Label1 field is used. | field is used when forwarding packets. | |||
Each RT-5 will be sent with a Route Target identifying the tenant | Each RT-5 will be sent with a Route Target identifying the tenant | |||
(IP-VRF). The Router's MAC Extended Community should not be sent in | (IP-VRF). The EVPN Router's MAC Extended Community should not be | |||
this case. | sent in this case. | |||
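For comparison with the interface-less case, the Python sketch below builds the RT-5 and the SBD IRB RT-2 for this model as plain dictionaries. The keys, RD, Route Target strings, addresses, and the fixed VXLAN encapsulation value are illustrative assumptions. The sketch shows where the forwarding information lives: GW IP carries the SBD IRB address, the RT-5 label is zero, and forwarding uses the MPLS Label1 of the RT-2.

```python
def sbd_irb_routes(rd: str, prefix: str, irb_ip: str, irb_mac: str,
                   sbd_label: int, tenant_rt: str, sbd_rt: str) -> tuple:
    """Return (RT-5, RT-2) as advertised in the interface-ful SBD IRB model."""
    rt5 = {
        "type": 5, "rd": rd, "ethernet_tag_id": 0,
        "prefix": prefix,
        "gw_ip": irb_ip,  # Overlay Index for the recursive resolution
        "esi": 0,
        "label": 0,       # should be zero; ignored on reception
        "communities": [("route-target", tenant_rt)],  # no Router's MAC EC
    }
    rt2 = {
        "type": 2, "rd": rd,
        "mac": irb_mac, "ip": irb_ip,  # MAC/IP of the SBD IRB (ML = 48, IPL = 32)
        "label1": sbd_label,           # used when forwarding packets to the prefix
        "communities": [("route-target", sbd_rt), ("encapsulation", "vxlan")],
    }
    return rt5, rt2
```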
The following example illustrates the procedure to advertise and | The following example illustrates the procedure to advertise and | |||
forward packets to SN1/24 (IPv4 prefix advertised from NVE1): | forward packets to SN1/24 (IPv4 prefix advertised from NVE1): | |||
(1) NVE1 advertises the following BGP routes: | (1) NVE1 advertises the following BGP routes: | |||
o Route type 5 (IP Prefix route) containing: | * Route type 5 (IP Prefix route) containing: | |||
. IPL=24, IP=SN1, Label= SHOULD be set to 0. | - IPL = 24, IP = SN1; Label SHOULD be set to 0. | |||
. GW IP=IP1 (SBD IRB's IP) | - GW IP = IP1 (SBD IRB's IP). | |||
. Route Target identifying the tenant (IP-VRF). | - Route Target identifying the tenant (IP-VRF). | |||
o Route type 2 (MAC/IP route for the SBD IRB) containing: | * Route type 2 (MAC/IP Advertisement route for the SBD IRB) | |||
containing: | ||||
. ML=48, M=M1, IPL=32, IP=IP1, Label=10. | - ML = 48, M = M1, IPL = 32, IP = IP1, Label = 10. | |||
. A [RFC5512] BGP Encapsulation Extended Community. | - A BGP Encapsulation Extended Community [RFC9012]. | |||
. Route Target identifying the SBD. This Route Target may be | - Route Target identifying the SBD. This Route Target may | |||
the same as the one used with the RT-5. | be the same as the one used with the RT-5. | |||
(2) DGW1 imports the received routes from NVE1: | (2) DGW1 imports the received routes from NVE1: | |||
o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 | * DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 | |||
Route Target. | Route Target. | |||
. Since GW IP is different from zero, the GW IP (IP1) will be | - Since GW IP is different from zero, the GW IP (IP1) will | |||
used as the Overlay Index for the recursive route resolution | be used as the Overlay Index for the recursive route | |||
to the RT-2 carrying IP1. | resolution to the RT-2 carrying IP1. | |||
(3) When DGW1 receives a packet from the WAN with destination IPx, | (3) When DGW1 receives a packet from the WAN with destination IPx, | |||
where IPx belongs to SN1/24: | where IPx belongs to SN1/24: | |||
o A destination IP lookup is performed on the DGW1 IP-VRF | * A destination IP lookup is performed on the DGW1 IP-VRF | |||
routing table. The lookup yields SN1/24, which is associated | table. The lookup yields SN1/24, which is associated with | |||
to the Overlay Index IP1. The forwarding information is | the Overlay Index IP1. The forwarding information is derived | |||
derived from the RT-2 received for IP1. | from the RT-2 received for IP1. | |||
o The IP packet destined to IPx is encapsulated with: Source | * The IP packet destined to IPx is encapsulated with: inner | |||
inner MAC = M3, Destination inner MAC = M1, Source outer IP | source MAC = M3, inner destination MAC = M1, outer source IP | |||
(source VTEP) = DGW1 IP, Destination outer IP (destination | (source VTEP) = DGW1 IP, and outer destination IP | |||
VTEP) = IP1. | (destination VTEP) = NVE1 IP. | |||
(4) When the packet arrives at NVE1: | (4) When the packet arrives at NVE1: | |||
o NVE1 will identify the IP-VRF for an IP lookup based on the | * NVE1 will identify the IP-VRF for an IP lookup based on the | |||
Label and the inner MAC DA. | label and the inner MAC DA. | |||
o An IP lookup is performed in the routing context, where SN1 | * An IP lookup is performed in the routing context, where SN1 | |||
turns out to be a local subnet associated to BD-2. A | turns out to be a local subnet associated with BD-2. A | |||
subsequent lookup in the ARP table and the BD FIB will provide | subsequent lookup in the ARP table and the BD FIB will | |||
the forwarding information for the packet in BD-2. | provide the forwarding information for the packet in BD-2. | |||
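The recursive resolution in steps (2) and (3) can be sketched in Python as a two-stage lookup. A longest-prefix match in the IP-VRF returns the GW IP Overlay Index, which is then resolved against the RT-2s received for SBD IRB addresses. The table contents and addresses are made up, and dropping the packet when the Overlay Index is unresolved is a simplification of this sketch. The outer destination IP comes from the BGP next hop of the RT-2 (NVE1), not from the IRB address IP1.

```python
import ipaddress
from typing import Optional

# RT-2s received for SBD IRB interfaces, keyed by IRB IP (the Overlay Index).
rt2_by_ip = {
    "198.51.100.1": {"mac": "M1", "label1": 10, "next_hop": "192.0.2.1"},  # NVE1
}
# RT-5s installed in DGW1's IP-VRF: prefix -> GW IP Overlay Index.
ip_vrf = {"10.1.1.0/24": "198.51.100.1"}


def forward(dst: str, local_irb_mac: str = "M3",
            local_vtep: str = "192.0.2.254") -> Optional[dict]:
    addr = ipaddress.ip_address(dst)
    hits = [p for p in ip_vrf if addr in ipaddress.ip_network(p)]
    if not hits:
        return None
    gw_ip = ip_vrf[max(hits, key=lambda p: ipaddress.ip_network(p).prefixlen)]
    rt2 = rt2_by_ip.get(gw_ip)
    if rt2 is None:
        return None  # Overlay Index not resolved; this sketch just drops the packet
    return {
        "inner_smac": local_irb_mac,
        "inner_dmac": rt2["mac"],
        "outer_src_ip": local_vtep,
        "outer_dst_ip": rt2["next_hop"],  # NVE1's VTEP, not the IRB address
        "vni": rt2["label1"],             # the RT-5 label is ignored
    }
```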
The model described above is called 'Interface-ful with SBD IRB | The model described above is called an "interface-ful with SBD IRB" | |||
model' because the tunnels connecting the DGWs and NVEs need to be | model because the tunnels connecting the DGWs and NVEs need to be | |||
terminated into the SBD. The SBD is connected to the IP-VRFs via SBD | terminated into the SBD. The SBD is connected to the IP-VRFs via SBD | |||
IRB interfaces, and that allows the recursive resolution of RT-5s to | IRB interfaces, and that allows the recursive resolution of RT-5s to | |||
GW IP addresses. | GW IP addresses. | |||
4.4.3 Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB | 4.4.3. Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB | |||
Figure 10 will be used for the description of this model. Note that | Figure 10 depicts the Interface-ful IP-VRF-to-IP-VRF with unnumbered | |||
this model is similar to the one described in Section 4.4.2, only | SBD IRB model. Note that this model is similar to the one described | |||
without IP addresses on the SBD IRB interfaces. | in Section 4.4.2, only without IP addresses on the SBD IRB | |||
interfaces. | ||||
NVE1 | NVE1 | |||
+------------+ DGW1 | +------------+ DGW1 | |||
IP1+----+(BD-1) | +---------------+ +------------+ | IP1+----+(BD-1) | +---------------+ +------------+ | |||
| \ | | | | | | | \ | | | | | | |||
|(IP-VRF)-(SBD)| (SBD)-(IP-VRF) |-----+ | |(IP-VRF)-(SBD)| (SBD)-(IP-VRF) |-----+ | |||
| / IRB(M1)| | IRB(M3) | | | | / IRB(M1)| | IRB(M3) | | | |||
+---+(BD-2) | | | +------------+ _+_ | +---+(BD-2) | | | +------------+ _+_ | |||
| +------------+ | | ( ) | | +------------+ | | ( ) | |||
SN1| | VXLAN/ | ( WAN )--H1 | SN1| | VXLAN/ | ( WAN )--H1 | |||
| NVE2 | GENEVE/ | (___) | | NVE2 | GENEVE/ | (___) | |||
| +------------+ | MPLS | DGW2 + | | +------------+ | MPLS | DGW2 + | |||
+---+(BD-2) | | | +------------+ | | +---+(BD-2) | | | +------------+ | | |||
| \ | | | | | | | | \ | | | | | | | |||
|(IP-VRF)-(SBD)| (SBD)-(IP-VRF) |-----+ | |(IP-VRF)-(SBD)| (SBD)-(IP-VRF) |-----+ | |||
| / IRB(M2)| | IRB(M4) | | | / IRB(M2)| | IRB(M4) | | |||
SN2+----+(BD-3) | +---------------+ +------------+ | SN2+----+(BD-3) | +---------------+ +------------+ | |||
+------------+ | +------------+ | |||
Figure 10 Interface-ful with unnumbered SBD IRB model | Figure 10: Interface-ful with Unnumbered SBD IRB Model | |||
In this model: | In this model: | |||
a) As in Section 4.4.1 and 4.4.2, the NVEs and DGWs must provide | a) As in Sections 4.4.1 and 4.4.2, the NVEs and DGWs must provide | |||
connectivity between hosts in SN1, SN2, IP1 and hosts sitting at | connectivity between hosts in SN1, SN2, and IP1 and hosts | |||
the other end of the WAN. | sitting at the other end of the WAN. | |||
b) As in Section 4.4.2, the NVE/DGWs are connected through Ethernet | b) As in Section 4.4.2, the NVE/DGWs are connected through Ethernet | |||
NVO tunnels terminated in the SBD instance. The IP-VRFs use IRB | NVO tunnels terminated in the SBD instance. The IP-VRFs use IRB | |||
interfaces for their connectivity to the SBD. | interfaces for their connectivity to the SBD. | |||
c) However, each SBD IRB has a MAC address only, and no IP address | c) However, each SBD IRB has a MAC address only and no IP address | |||
(that is why the model refers to an 'unnumbered' SBD IRB). In this | (which is why the model refers to an "unnumbered" SBD IRB). In | |||
model, there is no need to have IP reachability to the SBD IRB | this model, there is no need to have IP reachability to the SBD | |||
interfaces themselves and there is a requirement to limit the | IRB interfaces themselves, and there is a requirement to limit | |||
number of IP addresses used. | the number of IP addresses used. | |||
d) As in Section 4.4.2, the SBD is composed of all the NVE/DGW BDs of | d) As in Section 4.4.2, the SBD is composed of all the NVE/DGW BDs | |||
the tenant that need inter-subnet-forwarding. | of the tenant that need inter-subnet forwarding. | |||
e) As in Section 4.4.2, the solution must provide layer 3 | e) As in Section 4.4.2, the solution must provide Layer 3 | |||
connectivity for Ethernet NVO tunnels, for instance, VXLAN or | connectivity for Ethernet NVO tunnels -- for instance, VXLAN or | |||
GENEVE (with Ethernet payload). | GENEVE (with Ethernet payload). | |||
This model will also make use of the RT-5 recursive resolution. EVPN | This model will also make use of the RT-5 recursive resolution. EVPN | |||
type 5 routes will advertise the IP Prefixes along with the Router's | type 5 routes will advertise the IP prefixes along with the EVPN | |||
MAC Extended Community used for the recursive lookup, whereas EVPN | Router's MAC Extended Community used for the recursive lookup, | |||
RT-2 routes will advertise the MAC addresses of each SBD IRB | whereas EVPN RT-2 routes will advertise the MAC addresses of each SBD | |||
interface (this time without an IP). | IRB interface (this time without an IP). | |||
Each NVE/DGW will advertise an RT-5 for each of its prefixes with the | Each NVE/DGW will advertise an RT-5 for each of its prefixes with the | |||
same fields as described in 4.4.2 except for: | same fields as described in Section 4.4.2, except: | |||
o GW IP address= set to 0. | * GW IP address = set to 0. | |||
Each RT-5 will be sent with a Route Target identifying the tenant | Each RT-5 will be sent with a Route Target identifying the tenant | |||
(IP-VRF) and the Router's MAC Extended Community containing the MAC | (IP-VRF) and the EVPN Router's MAC Extended Community containing the | |||
address associated to SBD IRB interface. This MAC address may be | MAC address associated with the SBD IRB interface. This MAC address | |||
reused for all the IP-VRFs in the NVE. | may be reused for all the IP-VRFs in the NVE. | |||
The example is similar to the one in Section 4.4.2: | The example is similar to the one in Section 4.4.2: | |||
(1) NVE1 advertises the following BGP routes: | (1) NVE1 advertises the following BGP routes: | |||
o Route type 5 (IP Prefix route) containing the same values as | * Route type 5 (IP Prefix route) containing the same values as | |||
in the example in Section 4.4.2, except for: | in the example in Section 4.4.2, except: | |||
. GW IP= SHOULD be set to 0. | - GW IP SHOULD be set to 0. | |||
. Router's MAC Extended Community containing M1 (this will be | - EVPN Router's MAC Extended Community containing M1 (this | |||
used for the recursive lookup to a RT-2). | will be used for the recursive lookup to an RT-2). | |||
o Route type 2 (MAC route for the SBD IRB) with the same values | * Route type 2 (MAC route for the SBD IRB) with the same values | |||
as in Section 4.4.2 except for: | as in Section 4.4.2, except: | |||
. ML=48, M=M1, IPL=0, Label=10. | - ML = 48, M = M1, IPL = 0, Label = 10. | |||
(2) DGW1 imports the received routes from NVE1: | (2) DGW1 imports the received routes from NVE1: | |||
o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 | * DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 | |||
Route Target. | Route Target. | |||
. The MAC contained in the Router's MAC Extended Community | - The MAC contained in the EVPN Router's MAC Extended | |||
sent along with the RT-5 (M1) will be used as the Overlay | Community sent along with the RT-5 (M1) will be used as | |||
Index for the recursive route resolution to the RT-2 | the Overlay Index for the recursive route resolution to | |||
carrying M1. | the RT-2 carrying M1. | |||
(3) When DGW1 receives a packet from the WAN with destination IPx, | (3) When DGW1 receives a packet from the WAN with destination IPx, | |||
where IPx belongs to SN1/24: | where IPx belongs to SN1/24: | |||
o A destination IP lookup is performed on the DGW1 IP-VRF | * A destination IP lookup is performed on the DGW1 IP-VRF | |||
routing table. The lookup yields SN1/24, which is associated | table. The lookup yields SN1/24, which is associated with | |||
to the Overlay Index M1. The forwarding information is derived | the Overlay Index M1. The forwarding information is derived | |||
from the RT-2 received for M1. | from the RT-2 received for M1. | |||
o The IP packet destined to IPx is encapsulated with: Source | * The IP packet destined to IPx is encapsulated with: inner | |||
inner MAC = M3, Destination inner MAC = M1, Source outer IP | source MAC = M3, inner destination MAC = M1, outer source IP | |||
(source VTEP) = DGW1 IP, Destination outer IP (destination | (source VTEP) = DGW1 IP, and outer destination IP | |||
VTEP) = NVE1 IP. | (destination VTEP) = NVE1 IP. | |||
(4) When the packet arrives at NVE1: | (4) When the packet arrives at NVE1: | |||
o NVE1 will identify the IP-VRF for an IP lookup based on the | * NVE1 will identify the IP-VRF for an IP lookup based on the | |||
Label and the inner MAC DA. | label and the inner MAC DA. | |||
o An IP lookup is performed in the routing context, where SN1 | * An IP lookup is performed in the routing context, where SN1 | |||
turns out to be a local subnet associated to BD-2. A | turns out to be a local subnet associated with BD-2. A | |||
subsequent lookup in the ARP table and the BD FIB will provide | subsequent lookup in the ARP table and the BD FIB will | |||
the forwarding information for the packet in BD-2. | provide the forwarding information for the packet in BD-2. | |||
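The same two-stage lookup applies to the unnumbered model. The difference is that the Overlay Index is the MAC address from the EVPN Router's MAC Extended Community, and the RT-2s are keyed by MAC. As in the numbered case, the table contents, addresses, and function name below are made up for illustration.

```python
import ipaddress
from typing import Optional

# RT-2s for unnumbered SBD IRBs carry only a MAC (IPL = 0), so key them by MAC.
rt2_by_mac = {"M1": {"label1": 10, "next_hop": "192.0.2.1"}}  # NVE1

# RT-5s installed in DGW1's IP-VRF: prefix -> MAC from the EVPN Router's MAC EC.
ip_vrf = {"10.1.1.0/24": "M1"}


def forward(dst: str, local_irb_mac: str = "M3",
            local_vtep: str = "192.0.2.254") -> Optional[dict]:
    addr = ipaddress.ip_address(dst)
    hits = [p for p in ip_vrf if addr in ipaddress.ip_network(p)]
    if not hits:
        return None
    overlay_mac = ip_vrf[max(hits, key=lambda p: ipaddress.ip_network(p).prefixlen)]
    rt2 = rt2_by_mac.get(overlay_mac)
    if rt2 is None:
        return None  # no RT-2 for the Overlay Index MAC; dropped in this sketch
    return {
        "inner_smac": local_irb_mac,
        "inner_dmac": overlay_mac,        # M1, the Overlay Index itself
        "outer_src_ip": local_vtep,
        "outer_dst_ip": rt2["next_hop"],  # NVE1's VTEP
        "vni": rt2["label1"],
    }
```

Keying by MAC rather than IP is the only structural change from the numbered SBD IRB case, reflecting that no IP address is configured on the SBD IRB interfaces.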
The model described above is called Interface-ful with unnumbered SBD | The model described above is called an "interface-ful with unnumbered | |||
IRB model (as in Section 4.4.2), only this time the SBD IRB does not | SBD IRB" model (as in Section 4.4.2) but without the SBD IRB having | |||
have an IP address. | an IP address. | |||
5. Security Considerations | 5. Security Considerations | |||
This document provides a set of procedures to achieve Inter-Subnet | This document provides a set of procedures to achieve inter-subnet | |||
Forwarding across NVEs or PEs attached to a group of BDs that belong | forwarding across NVEs or PEs attached to a group of BDs that belong | |||
to the same tenant (or VPN). The security considerations discussed in | to the same tenant (or VPN). The security considerations discussed | |||
[RFC7432] apply to the Intra-Subnet Forwarding or communication | in [RFC7432] apply to the intra-subnet forwarding or communication | |||
within each of those BDs. In addition, the security considerations in | within each of those BDs. In addition, the security considerations | |||
[RFC4364] should also be understood, since this document and | in [RFC4364] should also be understood, since this document and | |||
[RFC4364] may be used in similar applications. | [RFC4364] may be used in similar applications. | |||
Contrary to [RFC4364], this document does not describe PE/CE route | Contrary to [RFC4364], this document does not describe PE/CE route | |||
distribution techniques, but rather considers the CEs as TSes or VAs | distribution techniques but rather considers the CEs as TSs or VAs | |||
that do not run dynamic routing protocols. This can be considered a | that do not run dynamic routing protocols. This can be considered a | |||
security advantage, since dynamic routing protocols can be blocked on | security advantage, since dynamic routing protocols can be blocked on | |||
the NVE/PE ACs, not allowing the tenant to interact with the | the NVE/PE ACs, not allowing the tenant to interact with the | |||
infrastructure's dynamic routing protocols. | infrastructure's dynamic routing protocols. | |||
In this document, the RT-5 may use a regular BGP Next Hop for its | In this document, the RT-5 may use a regular BGP next hop for its | |||
resolution or an Overlay Index that requires a recursive resolution | resolution or an Overlay Index that requires a recursive resolution | |||
to a different EVPN route (an RT-2 or an RT-1). In the latter case, | to a different EVPN route (an RT-2 or an RT-1). In the latter case, | |||
it is worth noting that any action that ends up filtering or | it is worth noting that any action that ends up filtering or | |||
modifying the RT-2/RT-1 routes used to convey the Overlay Indexes, | modifying the RT-2 or RT-1 routes used to convey the Overlay Indexes | |||
will modify the resolution of the RT-5 and therefore the forwarding | will modify the resolution of the RT-5 and therefore the forwarding | |||
of packets to the remote subnet. | of packets to the remote subnet. | |||
6. IANA Considerations | 6. IANA Considerations | |||
This document requests value 5 in the [EVPNRouteTypes] registry | IANA has registered value 5 in the "EVPN Route Types" registry | |||
defined by [RFC7432]: | [EVPNRouteTypes] defined by [RFC7432] as follows: | |||
Value Description Reference | +=======+=============+===========+ | |||
5 IP Prefix route [this document] | | Value | Description | Reference | | |||
+=======+=============+===========+ | ||||
| 5 | IP Prefix | RFC 9136 | | ||||
+-------+-------------+-----------+ | ||||
7. References | Table 3 | |||
7.1 Normative References | 7. References | |||
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., | 7.1. Normative References | |||
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet | ||||
VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015, <http://www.rfc- | ||||
editor.org/info/rfc7432>. | ||||
[RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation | [EVPNRouteTypes] | |||
Subsequent Address Family Identifier (SAFI) and the BGP Tunnel | IANA, "EVPN Route Types", | |||
Encapsulation Attribute", RFC 5512, DOI 10.17487/RFC5512, April 2009, | <https://www.iana.org/assignments/evpn>. | |||
<http://www.rfc-editor.org/info/rfc5512>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March | Requirement Levels", BCP 14, RFC 2119, | |||
1997, <http://www.rfc-editor.org/info/rfc2119>. | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | ||||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC2119 | [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., | |||
Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, | Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based | |||
<http://www.rfc-editor.org/info/rfc8174>. | Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February | |||
2015, <https://www.rfc-editor.org/info/rfc7432>. | ||||
[RFC8365] Sajassi-Drake et al., "A Network Virtualization Overlay | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
Solution using EVPN", RFC 8365, DOI 10.17487/RFC8365, March, 2018. | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
[EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in | [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | |||
EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-03.txt, work in | Uttaro, J., and W. Henderickx, "A Network Virtualization | |||
progress, February, 2017 | Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | |||
DOI 10.17487/RFC8365, March 2018, | ||||
<https://www.rfc-editor.org/info/rfc8365>. | ||||
[EVPNRouteTypes] IANA EVPN Route Type registry, | [RFC9012] Patel, K., Van de Velde, G., Sangli, S., and J. Scudder, | |||
https://www.iana.org/assignments/evpn | "The BGP Tunnel Encapsulation Attribute", RFC 9012, | |||
DOI 10.17487/RFC9012, April 2021, | ||||
<https://www.rfc-editor.org/info/rfc9012>. | ||||
7.2 Informative References | [RFC9135] Sajassi, A., Salam, S., Thoria, S., Drake, J., and J. | |||
Rabadan, "Integrated Routing and Bridging in Ethernet VPN | ||||
(EVPN)", RFC 9135, DOI 10.17487/RFC9135, October 2021, | ||||
<https://www.rfc-editor.org/info/rfc9135>. | ||||
7.2. Informative References | ||||
[IEEE-802.1Q] | ||||
IEEE, "IEEE Standard for Local and Metropolitan Area | ||||
Networks -- Bridges and Bridged Networks", | ||||
DOI 10.1109/IEEESTD.2018.8403927, IEEE Std 802.1Q, July | ||||
2018, | ||||
<https://standards.ieee.org/standard/802_1Q-2018.html>. | ||||
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | |||
Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006, | Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February | |||
<http://www.rfc-editor.org/info/rfc4364>. | 2006, <https://www.rfc-editor.org/info/rfc4364>. | |||
[RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel, | [RFC5227] Cheshire, S., "IPv4 Address Conflict Detection", RFC 5227, | |||
"Revised Error Handling for BGP UPDATE Messages", RFC 7606, August | DOI 10.17487/RFC5227, July 2008, | |||
2015, <http://www.rfc-editor.org/info/rfc7606>. | <https://www.rfc-editor.org/info/rfc5227>. | |||
[802.1D-REV] "IEEE Standard for Local and metropolitan area networks | [RFC5798] Nadas, S., Ed., "Virtual Router Redundancy Protocol (VRRP) | |||
- Media Access Control (MAC) Bridges", IEEE Std. 802.1D, June 2004. | Version 3 for IPv4 and IPv6", RFC 5798, | |||
DOI 10.17487/RFC5798, March 2010, | ||||
<https://www.rfc-editor.org/info/rfc5798>. | ||||
[802.1Q] "IEEE Standard for Local and metropolitan area networks - | [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | |||
Media Access Control (MAC) Bridges and Virtual Bridged Local Area | L., Sridhar, T., Bursell, M., and C. Wright, "Virtual | |||
Networks", IEEE Std 802.1Q(tm), 2014 Edition, November 2014. | eXtensible Local Area Network (VXLAN): A Framework for | |||
Overlaying Virtualized Layer 2 Networks over Layer 3 | ||||
Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, | ||||
<https://www.rfc-editor.org/info/rfc7348>. | ||||
[RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. | [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. | |||
Rekhter, "Framework for Data Center (DC) Network Virtualization", RFC | Rekhter, "Framework for Data Center (DC) Network | |||
7365, DOI 10.17487/RFC7365, October 2014, <https://www.rfc- | Virtualization", RFC 7365, DOI 10.17487/RFC7365, October | |||
editor.org/info/rfc7365>. | 2014, <https://www.rfc-editor.org/info/rfc7365>. | |||
[RFC5227] Cheshire, S., "IPv4 Address Conflict Detection", RFC 5227, | ||||
DOI 10.17487/RFC5227, July 2008, <https://www.rfc- | ||||
editor.org/info/rfc5227>. | ||||
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | [RFC7606] Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K. | |||
L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible | Patel, "Revised Error Handling for BGP UPDATE Messages", | |||
Local Area Network (VXLAN): A Framework for Overlaying Virtualized | RFC 7606, DOI 10.17487/RFC7606, August 2015, | |||
Layer 2 Networks over Layer 3 Networks", RFC 7348, DOI | <https://www.rfc-editor.org/info/rfc7606>. | |||
10.17487/RFC7348, August 2014, <https://www.rfc- | ||||
editor.org/info/rfc7348>. | ||||
[GENEVE] Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed., | [RFC8926] Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed., | |||
"Geneve: Generic Network Virtualization Encapsulation", Work in | "Geneve: Generic Network Virtualization Encapsulation", | |||
Progress, draft-ietf-nvo3-geneve-06, March 2018. | RFC 8926, DOI 10.17487/RFC8926, November 2020, | |||
<https://www.rfc-editor.org/info/rfc8926>. | ||||
8. Acknowledgments | Acknowledgments | |||
The authors would like to thank Mukul Katiyar and Jeffrey Zhang for | The authors would like to thank Mukul Katiyar, Jeffrey Zhang, and | |||
their valuable feedback and contributions. The following people also | Alex Nichol for their valuable feedback and contributions. Tony | |||
helped improving this document with their feedback: Tony Przygienda | Przygienda and Thomas Morin also helped improve this document with | |||
and Thomas Morin. Special THANK YOU to Eric Rosen for his detailed | their feedback. Special thanks to Eric Rosen for his detailed | |||
review, it really helped improve the readability and clarify the | review, which really helped improve the readability and clarify the | |||
concepts. Thank you to Alvaro Retana for his thorough review. | concepts. We also thank Alvaro Retana for his thorough review. | |||
9. Contributors | Contributors | |||
In addition to the authors listed on the front page, the following | In addition to the authors listed on the front page, the following | |||
co-authors have also contributed to this document: | coauthors have also contributed to this document: | |||
Senthil Sathappan | Senthil Sathappan | |||
Florin Balus | Florin Balus | |||
Aldrin Isaac | Aldrin Isaac | |||
Senad Palislamovic | Senad Palislamovic | |||
Samir Thoria | Samir Thoria | |||
10. Authors' Addresses | Authors' Addresses | |||
Jorge Rabadan (Editor) | Jorge Rabadan (editor) | |||
Nokia | Nokia | |||
777 E. Middlefield Road | 777 E. Middlefield Road | |||
Mountain View, CA 94043 USA | Mountain View, CA 94043 | |||
United States of America | ||||
Email: jorge.rabadan@nokia.com | Email: jorge.rabadan@nokia.com | |||
Wim Henderickx | Wim Henderickx | |||
Nokia | Nokia | |||
Email: wim.henderickx@nokia.com | Email: wim.henderickx@nokia.com | |||
John E. Drake | John Drake | |||
Juniper | Juniper | |||
Email: jdrake@juniper.net | ||||
Ali Sajassi | Email: jdrake@juniper.net | |||
Cisco | ||||
Email: sajassi@cisco.com | ||||
Wen Lin | Wen Lin | |||
Juniper | Juniper | |||
Email: wlin@juniper.net | Email: wlin@juniper.net | |||
Ali Sajassi | ||||
Cisco | ||||
Email: sajassi@cisco.com | ||||
End of changes. 371 change blocks. | ||||
1044 lines changed or deleted | 1098 lines changed or added | |||