draft-ietf-bess-evpn-optimized-ir-12 | RFC 9574 | |||
---|---|---|---|---|
BESS Workgroup J. Rabadan, Ed. | Internet Engineering Task Force (IETF) J. Rabadan, Ed. | |||
Internet-Draft S. Sathappan | Request for Comments: 9574 S. Sathappan | |||
Intended status: Standards Track Nokia | Category: Standards Track Nokia | |||
Expires: July 29, 2022 W. Lin | ISSN: 2070-1721 W. Lin | |||
Juniper Networks | Juniper Networks | |||
M. Katiyar | M. Katiyar | |||
Versa Networks | Versa Networks | |||
A. Sajassi | A. Sajassi | |||
Cisco Systems | Cisco Systems | |||
January 25, 2022 | May 2024 | |||
Optimized Ingress Replication Solution for Ethernet VPN (EVPN) | Optimized Ingress Replication Solution for Ethernet VPNs (EVPNs) | |||
draft-ietf-bess-evpn-optimized-ir-12 | ||||
Abstract | Abstract | |||
Network Virtualization Overlay networks using Ethernet VPN (EVPN) as | Network Virtualization Overlay (NVO) networks using Ethernet VPNs | |||
their control plane may use Ingress Replication or PIM (Protocol | (EVPNs) as their control plane may use trees based on ingress | |||
Independent Multicast)-based trees to convey the overlay Broadcast, | replication or Protocol Independent Multicast (PIM) to convey the | |||
Unknown unicast and Multicast (BUM) traffic. PIM provides an | overlay Broadcast, Unknown Unicast, or Multicast (BUM) traffic. PIM | |||
efficient solution to avoid sending multiple copies of the same | provides an efficient solution that prevents sending multiple copies | |||
packet over the same physical link, however it may not always be | of the same packet over the same physical link; however, it may not | |||
deployed in the Network Virtualization Overlay core network. Ingress | always be deployed in the NVO network core. Ingress replication | |||
Replication avoids the dependency on PIM in the Network | avoids the dependency on PIM in the NVO network core. While ingress | |||
Virtualization Overlay network core. While Ingress Replication | replication provides a simple multicast transport, some NVO networks | |||
provides a simple multicast transport, some Network Virtualization | with demanding multicast applications require a more efficient | |||
Overlay networks with demanding multicast applications require a more | solution without PIM in the core. This document describes a solution | |||
efficient solution without PIM in the core. This document describes | to optimize the efficiency of ingress replication trees. | |||
a solution to optimize the efficiency of Ingress Replication trees. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| | Internet Standards is available in Section 2 of RFC 7841. | |||
This Internet-Draft will expire on July 29, 2022. | Information about the current status of this document, any errata, | |||
| | and how to provide feedback on it may be obtained at | |||
| | https://www.rfc-editor.org/info/rfc9574. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Revised BSD License text as described in Section 4.e of the | |||
the Trust Legal Provisions and are provided without warranty as | Trust Legal Provisions and are provided without warranty as described | |||
described in the Simplified BSD License. | in the Revised BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
2. Terminology and Conventions . . . . . . . . . . . . . . . . . 6 | 2. Terminology and Conventions | |||
3. Solution Requirements . . . . . . . . . . . . . . . . . . . . 9 | 3. Solution Requirements | |||
4. EVPN BGP Attributes for Optimized Ingress Replication . . . . 9 | 4. EVPN BGP Attributes for Optimized Ingress Replication | |||
5. Non-Selective Assisted-Replication (AR) Solution Description 13 | 5. Non-selective Assisted Replication (AR) Solution Description | |||
5.1. Non-selective AR-REPLICATOR Procedures . . . . . . . . . 15 | 5.1. Non-selective AR-REPLICATOR Procedures | |||
5.2. Non-Selective AR-LEAF Procedures . . . . . . . . . . . . 17 | 5.2. Non-selective AR-LEAF Procedures | |||
5.3. RNVE Procedures . . . . . . . . . . . . . . . . . . . . . 19 | 5.3. RNVE Procedures | |||
6. Selective Assisted-Replication (AR) Solution Description . . 20 | 6. Selective Assisted Replication (AR) Solution Description | |||
6.1. Selective AR-REPLICATOR Procedures . . . . . . . . . . . 21 | 6.1. Selective AR-REPLICATOR Procedures | |||
6.2. Selective AR-LEAF Procedures . . . . . . . . . . . . . . 23 | 6.2. Selective AR-LEAF Procedures | |||
7. Pruned-Flood-Lists (PFL) . . . . . . . . . . . . . . . . . . 26 | 7. Pruned Flooding Lists (PFLs) | |||
7.1. A Pruned-Flood-List Example . . . . . . . . . . . . . . . 26 | 7.1. Example of a Pruned Flooding List | |||
8. AR Procedures for Single-IP AR-REPLICATORS . . . . . . . . . 28 | 8. AR Procedures for Single-IP AR-REPLICATORS | |||
9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon 28 | 9. AR Procedures and EVPN All-Active Multihoming Split-Horizon | |||
9.1. Ethernet Segments on AR-LEAF Nodes . . . . . . . . . . . 29 | 9.1. Ethernet Segments on AR-LEAF Nodes | |||
9.2. Ethernet Segments on AR-REPLICATOR nodes . . . . . . . . 29 | 9.2. Ethernet Segments on AR-REPLICATOR Nodes | |||
10. Security Considerations . . . . . . . . . . . . . . . . . . . 30 | 10. Security Considerations | |||
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | 11. IANA Considerations | |||
12. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 32 | 12. References | |||
13. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 32 | 12.1. Normative References | |||
14. References . . . . . . . . . . . . . . . . . . . . . . . . . 32 | 12.2. Informative References | |||
14.1. Normative References . . . . . . . . . . . . . . . . . . 32 | Acknowledgements | |||
14.2. Informative References . . . . . . . . . . . . . . . . . 33 | Contributors | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 34 | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
Ethernet Virtual Private Networks (EVPN) may be used as the control | Ethernet Virtual Private Networks (EVPNs) may be used as the control | |||
plane for a Network Virtualization Overlay network [RFC8365]. | plane for a Network Virtualization Overlay (NVO) network [RFC8365]. | |||
Network Virtualization Edge (NVE) and Provider Edge (PE) devices that | Network Virtualization Edge (NVE) and Provider Edge (PE) devices that | |||
are part of the same EVPN Broadcast Domain (BD) use Ingress | are part of the same EVPN Broadcast Domain (BD) use Ingress | |||
Replication or PIM-based trees to transport the tenant's Broadcast, | Replication (IR) or PIM-based trees to transport the tenant's | |||
Unknown unicast and Multicast (BUM) traffic. | Broadcast, Unknown Unicast, or Multicast (BUM) traffic. | |||
In the Ingress Replication approach, the ingress NVE receving a BUM | In the ingress replication approach, the ingress NVE receiving a BUM | |||
frame from the Tenant System will create as many copies of the frame | frame from the Tenant System (TS) will create as many copies of the | |||
as remote NVEs/PEs are attached to the BD. Each of those copies will | frame as the number of remote NVEs/PEs that are attached to the BD. | |||
be encapsulated into an IP packet where the outer IP Destination | Each of those copies will be encapsulated into an IP packet where the | |||
Address (IP DA) identifies the loopback of the egress NVE/PE. The IP | outer IP Destination Address (IP DA) identifies the loopback of the | |||
fabric core nodes (also known as Spines) will simply route the IP | egress NVE/PE. The IP fabric core nodes (also known as spines) will | |||
encapsulated BUM frames based on the outer IP DA. If PIM-based trees | simply route the IP-encapsulated BUM frames based on the outer IP DA. | |||
are used instead of Ingress Replication, the NVEs/PEs attached to the | If PIM-based trees are used instead of ingress replication, the NVEs/ | |||
same BD will join a PIM-based tree. The ingress NVE receiving a BUM | PEs attached to the same BD will join a PIM-based tree. The ingress | |||
frame will send a single copy of the frame, encapsulated into an IP | NVE receiving a BUM frame will send a single copy of the frame, | |||
packet where the outer IP DA is the multicast address that represents | encapsulated into an IP packet where the outer IP DA is the multicast | |||
the PIM-based tree. The IP fabric core nodes are part of the PIM | address that represents the PIM-based tree. The IP fabric core nodes | |||
tree and keep multicast state for the multicast group, so that IP | are part of the PIM tree and keep multicast state for the multicast | |||
encapsulated BUM frames can be routed to all the NVEs/PEs that joined | group, so that IP-encapsulated BUM frames can be routed to all the | |||
the tree. | NVEs/PEs that joined the tree. | |||
The two approaches are illustrated in Figure 1. On the left-hand | The two approaches are illustrated in Figure 1. On the left-hand | |||
side, NVE1 uses Ingress Replication to send a BUM frame (originated | side of the diagram, NVE1 uses ingress replication to send a BUM | |||
from Tenant System TS1) to the remote nodes attached to the BD, i.e., | frame (originated from Tenant System TS1) to the remote nodes | |||
NVE2, NV3, PE1. On the right-hand side of the diagram, the same | attached to the BD, i.e., NVE2, NVE3, and PE1. On the right-hand | |||
example is depicted but using a PIM-based tree, i.e., (S1,G1), | side, the same example is depicted but using a PIM-based tree, i.e., | |||
instead of Ingress Replication. While a single copy of the tunneled | (S1,G1), instead of ingress replication. While a single copy of the | |||
BUM frame is generated in the latter approach, all the routers in the | tunneled BUM frame is generated in the latter approach, all the | |||
fabric need to keep muticast state, e.g., the Spine keeps a PIM | routers in the fabric need to keep multicast state, e.g., the spine | |||
multicast routing entry for (S1,G1) with an Incoming Interface (IIF) | keeps a PIM routing entry for (S1,G1) with an Incoming Interface | |||
and three Outgoing Interfaces (OIFs). | (IIF) and three Outgoing Interfaces (OIFs). | |||
To-WAN To-WAN | To WAN To WAN | |||
^ ^ | ^ ^ | |||
| | | | | | |||
+-----+ +-----+ | +-----+ +-----+ | |||
+----------| PE1 |-----------+ +----------| PE1 |-----------+ | +----------| PE1 |-----------+ +----------| PE1 |-----------+ | |||
| +--^--+ | | +--^--+ | | | +--^--+ | | +--^--+ | | |||
| | IP Fabric | | | IP Fabric | | | | IP Fabric | | | IP Fabric | | |||
| PE | | (S1,G1) |OIF to-G | | | PE | | (S1,G1) |OIF to G1 | | |||
| +----PE->+-----+ No State | | IIF +-----+ OIF to-G | | | +----PE->+-----+ No State | | IIF +-----+ OIF to G1 | | |||
| | +---2->|Spine|------+ | | +------>Spine|------+ | | | | +---2->|Spine|------+ | | +------>Spine|------+ | | |||
| | | +-3->+-----+ | | | | +-----+ | | | | | | +-3->+-----+ | | | | +-----+ | | | |||
| | | | 2 3 | | |PIM |OIF to-G | | | | | | | 2 3 | | |PIM |OIF to G1| | | |||
| | | |IR | | | | |tree | | | | | | | |IR | | | | |tree | | | | |||
|+-----+ +--v--+ +--v--+ | |+-----+ +--v--+ +--v--+ | | |+-----+ +--v--+ +--v--+ | |+-----+ +--v--+ +--v--+ | | |||
+| NVE1|---| NVE2|---| NVE3|-+ +| NVE1|---| NVE2|---| NVE3|-+ | +| NVE1|---| NVE2|---| NVE3|-+ +| NVE1|---| NVE2|---| NVE3|-+ | |||
+--^--+ +-----+ +-----+ +--^--+ +-----+ +-----+ | +--^--+ +-----+ +-----+ +--^--+ +-----+ +-----+ | |||
| | | | | | | | | | | | | | |||
| v v | v v | | v v | v v | |||
TS1 TS2 TS3 TS1 TS2 TS3 | TS1 TS2 TS3 TS1 TS2 TS3 | |||
Figure 1: Ingress Replication vs PIM-based trees in NVO networks | Figure 1: Ingress Replication vs. PIM-Based Trees in NVO Networks | |||
In Network Virtualization Overlay networks where PIM-based trees | In NVO networks where PIM-based trees cannot be used, ingress | |||
cannot be used, Ingress Replication is the only option. Examples of | replication is the only option. Examples of these situations are NVO | |||
these situations are Network Virtualization Overlay networks where | networks where the core nodes do not support PIM or the network | |||
the core nodes do not support PIM or the network operator does not | operator does not want to run PIM in the core. | |||
want to run PIM in the core. | ||||
In some use-cases, the amount of replication for BUM traffic is kept | In some use cases, the amount of replication for BUM traffic is kept | |||
under control on the NVEs due to the following fairly common | under control on the NVEs due to the following fairly common | |||
assumptions: | assumptions: | |||
a. Broadcast is greatly reduced due to the proxy ARP (Address | a. Broadcast traffic is greatly reduced due to the proxy Address | |||
Resolution Protocol) and proxy ND (Neighbor Discovery) | Resolution Protocol (ARP) and proxy Neighbor Discovery (ND) | |||
capabilities supported by EVPN on the NVEs | capabilities supported by EVPNs [RFC9161] on the NVEs. Some NVEs | |||
[I-D.ietf-bess-evpn-proxy-arp-nd]. Some NVEs can even provide | can even provide Dynamic Host Configuration Protocol (DHCP) | |||
Dynamic Host Configuration Protocol (DHCP) server functions for | server functions for the attached TSs, reducing the broadcast | |||
the attached Tenant Systems, reducing the broadcast even further. | traffic even further. | |||
b. Unknown unicast traffic is greatly reduced in Network | b. Unknown unicast traffic is greatly reduced in NVO networks where | |||
Virtualization Overlay networks where all the MAC and IP | all the Media Access Control (MAC) and IP addresses from the TSs | |||
addresses from the Tenant Systems are learned in the control | are learned in the control plane. | |||
plane. | ||||
c. Multicast applications are not used. | c. Multicast applications are not used. | |||
If the above assumptions are true for a given Network Virtualization | If the above assumptions are true for a given NVO network, then | |||
Overlay network, then Ingress Replication provides a simple solution | ingress replication provides a simple solution for multi-destination | |||
for multi-destination traffic. However, the statement c) above is | traffic. However, statement c. above is not always true, and | |||
not always true and multicast applications are required in many use- | multicast applications are required in many use cases. | |||
cases. | ||||
When the multicast sources are attached to NVEs residing in | When the multicast sources are attached to NVEs residing in | |||
hypervisors or low-performance-replication TORs (Top Of Rack | hypervisors or low-performance-replication Top-of-Rack (ToR) | |||
switches), the ingress replication of a large amount of multicast | switches, the ingress replication of a large amount of multicast | |||
traffic to a significant number of remote NVEs/PEs can seriously | traffic to a significant number of remote NVEs/PEs can seriously | |||
degrade the performance of the NVE and impact the application. | degrade the performance of the NVE and impact the application. | |||
This document describes a solution that makes use of two Ingress | This document describes a solution that makes use of two ingress | |||
Replication optimizations: | replication optimizations: | |||
1. Assisted-Replication (AR) | 1. Assisted Replication (AR) | |||
2. Pruned-Flood-Lists (PFL) | 2. Pruned Flooding Lists (PFLs) | |||
Assisted-Replication consists of a set of procedures that allows the | Assisted Replication consists of a set of procedures that allows the | |||
ingress NVE/PE to send a single copy of a Broadcast or Multicast | ingress NVE/PE to send a single copy of a broadcast or multicast | |||
frame received from a Tenant System to the Broadcast Domain, without | frame received from a TS to the BD without the need for PIM in the | |||
the need for PIM in the underlay. Assisted Replication defines the | underlay. Assisted Replication defines the roles of AR-REPLICATOR | |||
roles of AR-REPLICATOR and AR-LEAF routers. The AR-LEAF is the | and AR-LEAF routers. The AR-LEAF is the ingress NVE/PE attached to | |||
ingress NVE/PE attached to the Tenant System. The AR-LEAF sends a | the TS. The AR-LEAF sends a single copy of a broadcast or multicast | |||
single copy of a Broadcast or Multicast packet to a selected AR- | packet to a selected AR-REPLICATOR that replicates the packet | |||
REPLICATOR that replicates the packet mutiple times to remote AR-LEAF | multiple times to remote AR-LEAF or AR-REPLICATOR routers and is | |||
or AR-REPLICATOR routers, and therefore "assisting" the ingress AR- | therefore "assisting" the ingress AR-LEAF in delivering the broadcast | |||
LEAF in delivering the Broadcast or Multicast traffic to the remote | or multicast traffic to the remote NVEs/PEs attached to the same BD. | |||
NVEs/PEs attached to the same Broadcast Domain. Assisted-Replication | Assisted Replication can use a single AR-REPLICATOR or two AR- | |||
can use a single AR-REPLICATOR or two AR-REPLICATOR routers in the | REPLICATOR routers in the path between the ingress AR-LEAF and the | |||
path between the ingress AR-LEAF and the remote destination NVE/PEs. | remote destination NVEs/PEs. The procedures that use a single AR- | |||
The procedures that use a single AR-REPLICATOR (Non-Selective | REPLICATOR (the non-selective Assisted Replication solution) are | |||
Assisted-Replication Solution) are specified in Section 5, whereas | specified in Section 5, whereas Section 6 describes how multi-stage | |||
Section 6 describes how multi-staged replication, i.e., two AR- | replication, i.e., two AR-REPLICATOR routers in the path between the | |||
REPLICATOR routers in the path between the ingress AR-LEAF and | ingress AR-LEAF and destination NVEs/PEs, is accomplished (the | |||
destination NVEs/PEs, is accomplished (Selective Assisted-Replication | selective Assisted Replication solution). The procedures for | |||
Solution). The Assisted-Replication procedures do not impact unknown | Assisted Replication do not impact unknown unicast traffic, which | |||
unicast traffic, which follows the same forwarding procedures as | follows the same forwarding procedures as known unicast traffic so | |||
known unicast traffic so that packet re-ordering does not occur. | that packet reordering does not occur. | |||
Pruned-Flood-Lists is a method for the ingress NVE/PE to prune or | PFLs provide a method for the ingress NVE/PE to prune or remove | |||
remove certain destination NVEs/PEs from a flood-list, depending on | certain destination NVEs/PEs from a flooding list, depending on the | |||
the interest of those NVEs/PEs in receiving Broadcast, Multicast or | interest of those NVEs/PEs in receiving BUM traffic. As specified in | |||
Unknown unicast. As specified in [RFC8365], an NVE/PE builds a | [RFC8365], an NVE/PE builds a flooding list for BUM traffic based on | |||
flood-list for BUM traffic based on the Next-Hops of the received | the next hops of the received EVPN Inclusive Multicast Ethernet Tag | |||
EVPN Inclusive Multicast Ethernet Tag routes for the Broadcast | routes for the BD. While [RFC8365] states that the flooding list is | |||
Domain. While [RFC8365] states that the flood-list is used for all | used for all BUM traffic, this document allows pruning certain next | |||
BUM traffic, this document allows pruning certain Next-Hops from the | hops from the list. As an example, suppose an ingress NVE creates a | |||
list. As an example, suppose an ingress NVE creates a flood-list | flooding list with next hops PE1, PE2, and PE3. If PE2 and PE3 did | |||
with Next-Hops PE1, PE2 and PE3. If PE2 and PE3 signaled no-interest | not signal any interest in receiving unknown unicast traffic in their | |||
in receiving Unknown Unicast in their Inclusive Multicast Ethernet | Inclusive Multicast Ethernet Tag routes, when the ingress NVE | |||
Tag routes, when the ingress NVE receives an Unknown Unicast frame | receives an unknown unicast frame from a TS, it will replicate it | |||
from a Tenant System it will replicate it only to PE1. That is, PE2 | only to PE1. That is, PE2 and PE3 are "pruned" from the NVE's | |||
and PE3 are "pruned" from the NVE's flood-list for Unknown Unicast | flooding list for unknown unicast traffic. PFLs can be used with | |||
traffic. Pruned-Flood-Lists can be used with Ingress Replication or | ingress replication or Assisted Replication and are described in | |||
Assisted-Replication, and it is described in Section 7. | Section 7. | |||
Both optimizations, Assisted-Replication and Pruned-Flood-Lists, may | Both optimizations -- Assisted Replication and PFLs -- may be used | |||
be used together or independently so that the performance and | together or independently so that the performance and efficiency of | |||
efficiency of the network to transport multicast can be improved. | the network to transport multicast can be improved. Both solutions | |||
Both solutions require some extensions to the BGP attributes used in | require some extensions to the BGP attributes used in [RFC7432]; see | |||
[RFC7432], and they are described in Section 4. | Section 4 for details. | |||
The Assisted-Replication solution described in this document is | The Assisted Replication solution described in this document is | |||
focused on Network Virtualization Overlay networks (hence it uses IP | focused on NVO networks (hence its use of IP tunnels). MPLS | |||
tunnels) and MPLS transport networks are out of scope. The Pruned- | transport networks are out of scope for this document. The PFLs | |||
Flood-Lists solution MAY be used in Network Virtualization Overlay | solution MAY be used in NVO and MPLS transport networks. | |||
and MPLS transport networks. | ||||
Section 3 lists the requirements of the combined optimized Ingress | Section 3 lists the requirements of the combined optimized ingress | |||
Replication solution, whereas Section 5 and Section 6 describe the | replication solution, whereas Sections 5 and 6 describe the Assisted | |||
Assisted-Replication solution (for Non-Selective and Selective | Replication solution for non-selective and selective procedures, | |||
procedures, respectively), and Section 7 the Pruned-Flood-Lists | respectively. Section 7 provides the PFLs solution. | |||
solution. | ||||
2. Terminology and Conventions | 2. Terminology and Conventions | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
The following terminology is used throughout the document: | The following terminology is used throughout this document: | |||
- Asisted Replication forwarding mode: for an AR-LEAF, it means | AR-IP: Assisted Replication - IP. Refers to an IP address owned by | |||
sending an Attachment Circuit BM packet to a single AR-REPLICATOR | the AR-REPLICATOR and used to differentiate the incoming traffic | |||
with tunnel destination IP AR-IP. For an AR-REPLICATOR, it means | that must follow the AR procedures. The AR-IP is also used in the | |||
sending a BM packet to a selected number or all the overlay | Tunnel Identifier and Next Hop fields of the Replicator-AR route. | |||
tunnels when the packet was previously received from an overlay | ||||
tunnel. | ||||
- AR-LEAF: Assisted Replication - LEAF, refers to an NVE/PE that | AR-LEAF: Assisted Replication - LEAF. Refers to an NVE/PE that | |||
sends all the Broadcast and Multicast traffic to an AR-REPLICATOR | sends all the BM traffic to an AR-REPLICATOR that can replicate | |||
that can replicate the traffic further on its behalf. An AR-LEAF | the traffic further on its behalf. An AR-LEAF is typically an | |||
is typically an NVE/PE with poor replication performance | NVE/PE with poor replication performance capabilities. | |||
capabilities. | ||||
- AR-REPLICATOR: Assisted Replication - REPLICATOR, refers to an | AR-REPLICATOR: Assisted Replication - REPLICATOR. Refers to an NVE/ | |||
NVE/PE that can replicate Broadcast or Multicast traffic received | PE that can replicate broadcast or multicast traffic received on | |||
on overlay tunnels to other overlay tunnels and local Attachment | overlay tunnels to other overlay tunnels and local Attachment | |||
Circuits. This document defines the control and data plane | Circuits (ACs). This document defines the control and data plane | |||
procedures that an AR-REPLICATOR needs to follow. | procedures that an AR-REPLICATOR needs to follow. | |||
- AR-IP: IP address owned by the AR-REPLICATOR and used to | AR-VNI: Assisted Replication - VNI. Refers to a Virtual eXtensible | |||
differentiate the incoming traffic that must follow the AR | Local Area Network (VXLAN) Network Identifier (VNI) advertised by | |||
procedures. The AR-IP is also used in the Tunnel Identifier and | the AR-REPLICATOR along with the Replicator-AR route. It is used | |||
Next-Hop fields of the Replicator-AR route. | to identify the incoming packets that must follow the AR | |||
| | procedures ONLY in the single-IP AR-REPLICATOR case (see | |||
| | Section 8). | |||
- AR-VNI: VNI advertised by the AR-REPLICATOR along with the | Assisted Replication forwarding mode: In the case of an AR-LEAF, | |||
Replicator-AR route. It is used to identify the incoming packets | sending an AC Broadcast and Multicast (BM) packet to a single AR- | |||
that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR | REPLICATOR with a tunnel destination address AR-IP. In the case | |||
case Section 8. | of an AR-REPLICATOR, this means sending a BM packet to a selected | |||
| | number of, or all of, the overlay tunnels when the packet was | |||
| | previously received from an overlay tunnel. | |||
- BM traffic: Refers to Broadcast and Multicast frames (excluding | BD: Broadcast Domain, as defined in [RFC7432]. | |||
unknown unicast frames). | ||||
- BD: Broadcast Domain, as defined in [RFC7432]. | BD label: Defined as the MPLS label that identifies the BD and is | |||
| | advertised in Regular-IR or Replicator-AR routes, when the | |||
| | encapsulation is MPLS over GRE (MPLSoGRE) or MPLS over UDP | |||
| | (MPLSoUDP). | |||
- BD label: defined as the MPLS label that identifies the Broadcast | BM traffic: Refers to broadcast and multicast frames (excluding | |||
Domain and is advertised in Regular-IR or Replicator-AR routes, | unknown unicast frames). | |||
when the encapsulation is MPLSoGRE or MPLSoUDP. | ||||
- DF and NDF: Designated Forwarder and Non-Designated Forwarder, are | DF and NDF: Designated Forwarder and Non-Designated Forwarder. | |||
roles defined in NVE/PEs attached to Multi-Homed Tenant Systems, | These are roles defined in NVEs/PEs attached to multihomed TSs, as | |||
as per [RFC7432] and [RFC8365]. | per [RFC7432] and [RFC8365]. | |||
- ES and ESI: Ethernet Segment and Ethernet Segment Identifier, as | ES and ESI: Ethernet Segment and Ethernet Segment Identifier. EVPN | |||
EVPN Multi-Homing concepts specified in [RFC7432]. | multihoming concepts as specified in [RFC7432]. | |||
- EVI: EVPN Instance. A group of Provider Edge (PE) devices | EVI: EVPN Instance. A group of Provider Edge (PE) devices | |||
participating in the same EVPN service, as specified in [RFC7432]. | participating in the same EVPN service, as specified in [RFC7432]. | |||
- GRE: Generic Routing Encapsulation [RFC4023]. | GRE: Generic Routing Encapsulation [RFC4023]. | |||
- Ingress Replication forwarding mode: it refers to the Ingress | Ingress Replication forwarding mode: Refers to the ingress | |||
Replication behavior explained in [RFC7432]. It means sending an | replication behavior explained in [RFC7432]. In this mode, an AC | |||
Attachment Circuit BM packet copy to each remote PE/NVE in the BD | BM packet copy is sent to each remote PE/NVE in the BD, and an | |||
and sending an overlay BM packet only to the Attachment Circuits | overlay BM packet is sent only to the ACs and not to other overlay | |||
and not other overlay tunnels. | tunnels. | |||
- IR-IP: local IP address of an NVE/PE that is used for the Ingress | IR-IP: Ingress Replication - IP. Refers to the local IP address of | |||
Replication signaling and procedures in [RFC7432]. Encapsulated | an NVE/PE that is used for the ingress replication signaling and | |||
incoming traffic with outer destination IP matching the IR-IP will | procedures provided in [RFC7432]. Encapsulated incoming traffic | |||
follow the Ingress Replication procedures and not the Assisted- | with an outer destination IP address matching the IR-IP will | |||
Replication procedures. The IR-IP is also used in the Tunnel | follow the procedures for ingress replication and not the | |||
Identifier and Next-hop fields of the Regular-IR route. | procedures for Assisted Replication. The IR-IP is also used in | |||
| | the Tunnel Identifier and Next Hop fields of the Regular-IR route. | |||
- IR-VNI: VNI advertised along with the Inclusive Multicast Ethernet | IR-VNI: Ingress Replication - VNI. Refers to a VNI advertised along | |||
Tag route for Ingress Replication Tunnel Type. | with the Inclusive Multicast Ethernet Tag route for the ingress | |||
| | replication tunnel type. | |||
- MPLS: Multi-Protocol Label Switching. | MPLS: Multi-Protocol Label Switching. | |||
- NVE: Network Virtualization Edge router, used in this document as | NVE: Network Virtualization Edge [RFC8365]. | |||
in [RFC8365]. | ||||
- NVGRE: Network Virtualization using Generic Routing Encapsulation, | NVGRE: Network virtualization using Generic Routing Encapsulation | |||
as in [RFC7637]. | [RFC7637]. | |||
- PE: Provider Edge router. | PE: Provider Edge. | |||
- PMSI: P-Multicast Service Interface - a conceptual interface for a | PMSI: P-Multicast Service Interface. A conceptual interface for a | |||
PE to send customer multicast traffic to all or some PEs in the | PE to send customer multicast traffic to all or some PEs in the | |||
same VPN [RFC6513]. | same VPN [RFC6513]. | |||
- RD: Route Distinguisher. | RD: Route Distinguisher. | |||
- Regular-IR route: an EVPN Inclusive Multicast Ethernet Tag route | Regular-IR route: An EVPN Inclusive Multicast Ethernet Tag route | |||
[RFC7432] that uses Ingress Replication Tunnel Type. | [RFC7432] that uses the ingress replication tunnel type. | |||
- RNVE: Regular NVE, refers to an NVE that supports the procedures | Replicator-AR route: An EVPN Inclusive Multicast Ethernet Tag route | |||
of [RFC8365] and does not support the procedures in this document. | that is advertised by an AR-REPLICATOR to signal its capabilities, | |||
However, this document defines procedures to interoperate with | as described in Section 4. | |||
RNVEs. | ||||
- Replicator-AR route: an EVPN Inclusive Multicast Ethernet Tag | RNVE: Regular NVE. Refers to an NVE that supports the procedures | |||
route that is advertised by an AR-REPLICATOR to signal its | provided in [RFC8365] and does not support the procedures provided | |||
capabilities, as described in Section 4. | in this document. However, this document defines procedures to | |||
interoperate with RNVEs. | ||||
- TOR: Top Of Rack switch. | ToR switch: Top-of-Rack switch. | |||
- TS and VM: Tenant System and Virtual Machine. In this document | TS and VM: Tenant System and Virtual Machine. In this document, TSs | |||
Tenant Systems and Virtual Machiness are the devices connected to | and VMs are the devices connected to the ACs of the PEs and NVEs. | |||
the Attachment Circuits of the PEs and NVEs. | ||||
- VNI: VXLAN Network Identifier, used in VXLAN tunnels. | VNI: VXLAN Network Identifier. Used in VXLAN tunnels. | |||
- VSID: Virtual Segment Identifier, used in NVGRE tunnels. | VSID: Virtual Segment Identifier. Used in NVGRE tunnels. | |||
- VXLAN: Virtual Extensible LAN [RFC7348]. | VXLAN: Virtual eXtensible Local Area Network [RFC7348]. | |||
3. Solution Requirements | 3. Solution Requirements | |||
The Ingress Replication optimization solution specified in this | The ingress replication optimization solution specified in this | |||
document meets the following requirements: | document meets the following requirements: | |||
a. It provides an Ingress Replication optimization for Broadcast and | a. The solution provides an ingress replication optimization for BM | |||
Multicast traffic without the need for PIM, while preserving the | traffic without the need for PIM while preserving the packet | |||
packet order for unicast applications, i.e., unknown unicast | order for unicast applications, i.e., unknown unicast traffic | |||
traffic should follow the same path as known unicast traffic. | should follow the same path as known unicast traffic. This | |||
This optimization is required in low-performance NVEs. | optimization is required in low-performance NVEs. | |||
b. It reduces the flooded traffic in Network Virtualization Overlay | b. The solution reduces the flooded traffic in NVO networks where | |||
networks where some NVEs do not need broadcast/multicast and/or | some NVEs do not need broadcast/multicast and/or unknown unicast | |||
unknown unicast traffic. | traffic. | |||
c. The solution is compatible with [RFC7432] and [RFC8365] and has | c. The solution is compatible with [RFC7432] and [RFC8365] and has | |||
no impact on the CE procedures for BM traffic. In particular, | no impact on the Customer Edge (CE) procedures for BM traffic. | |||
the solution supports the following EVPN functions: | In particular, the solution supports the following EVPN | |||
functions: | ||||
o All-active multi-homing, including the split-horizon and | * All-active multihoming, including the split-horizon and DF | |||
Designated Forwarder (DF) functions. | functions. | |||
o Single-active multi-homing, including the DF function. | * Single-active multihoming, including the DF function. | |||
o Handling of multi-destination traffic and processing of | * Handling of multi-destination traffic and processing of BM | |||
broadcast and multicast as per [RFC7432]. | traffic as per [RFC7432]. | |||
d. The solution is backwards compatible with existing NVEs using a | d. The solution is backward compatible with existing NVEs using a | |||
non-optimized version of Ingress Replication. A given BD can | non-optimized version of ingress replication. A given BD can | |||
have NVEs/PEs supporting regular Ingress Replication and | have NVEs/PEs supporting regular ingress replication and | |||
optimized Ingress Replication. | optimized ingress replication. | |||
e. The solution is independent of the Network Virtualization Overlay | e. The solution is independent of the NVO-specific data plane | |||
specific data plane encapsulation and the virtual identifiers | encapsulation and the virtual identifiers being used, e.g., VXLAN | |||
being used, e.g.: VXLAN VNIs, NVGRE VSIDs or MPLS labels, as long | VNIs, NVGRE VSIDs, or MPLS labels, as long as the tunnel is IP | |||
as the tunnel is IP-based. | based. | |||
4. EVPN BGP Attributes for Optimized Ingress Replication | 4. EVPN BGP Attributes for Optimized Ingress Replication | |||
This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag | The ingress replication optimization solution specified in this | |||
routes and attributes so that an NVE/PE can signal its optimized | document extends the Inclusive Multicast Ethernet Tag routes and | |||
Ingress Replication capabilities. | attributes described in [RFC7432] so that an NVE/PE can signal its | |||
optimized ingress replication capabilities. | ||||
The NLRI of the Inclusive Multicast Ethernet Tag route as in | The Network Layer Reachability Information (NLRI) of the Inclusive | |||
[RFC7432] is shown in Figure 2 and it is used in this document | Multicast Ethernet Tag route [RFC7432] is shown in Figure 2 and is | |||
without any modifications to its format. The PMSI Tunnel Attribute's | used in this document without any modifications to its format. The | |||
general format as in [RFC7432] (which takes it from [RFC6514]) is | PMSI Tunnel Attribute's general format as provided in [RFC7432] | |||
used in this document, only a new Tunnel Type and new flags are | (which takes it from [RFC6514]) is used in this document; only a new | |||
specified, as shown in Figure 3: | tunnel type and new flags are specified, as shown in Figure 3. | |||
+---------------------------------+ | +------------------------------------+ | |||
| RD (8 octets) | | | RD (8 octets) | | |||
+---------------------------------+ | +------------------------------------+ | |||
| Ethernet Tag ID (4 octets) | | | Ethernet Tag ID (4 octets) | | |||
+---------------------------------+ | +------------------------------------+ | |||
| IP Address Length (1 octet) | | | IP Address Length (1 octet) | | |||
+---------------------------------+ | +------------------------------------+ | |||
| Originating Router's IP Addr | | | Originating Router's IP Address | | |||
| (4 or 16 octets) | | | (4 or 16 octets) | | |||
+---------------------------------+ | +------------------------------------+ | |||
Figure 2: EVPN Inclusive Multicast Tag route's NLRI | Figure 2: EVPN Inclusive Multicast Ethernet Tag Route's NLRI | |||
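The NLRI layout in Figure 2 can be read with a short parser. This is a minimal sketch, not an implementation from either document. It assumes that the IP Address Length octet is expressed in bits (32 or 128), which is how EVPN implementations commonly encode it; the class and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


@dataclass
class ImetNlri:
    rd: bytes             # Route Distinguisher, 8 raw octets
    ethernet_tag_id: int  # Ethernet Tag ID, 4 octets
    originating_ip: IPAddress


def parse_imet_nlri(data: bytes) -> ImetNlri:
    """Parse the route-type-specific part of an IMET route (Figure 2)."""
    if len(data) < 13:
        raise ValueError("NLRI shorter than RD + Ethernet Tag ID + length octet")
    rd = data[0:8]
    ethernet_tag_id = int.from_bytes(data[8:12], "big")
    ip_len_bits = data[12]
    # Assumption: IP Address Length is expressed in bits (32 or 128).
    if ip_len_bits == 32:
        size, cls = 4, ipaddress.IPv4Address
    elif ip_len_bits == 128:
        size, cls = 16, ipaddress.IPv6Address
    else:
        raise ValueError(f"unsupported IP Address Length: {ip_len_bits}")
    if len(data) != 13 + size:
        raise ValueError("NLRI length does not match IP Address Length")
    return ImetNlri(rd, ethernet_tag_id, cls(data[13:13 + size]))
```

The parser does not interpret the RD. It keeps the 8 octets raw because the RD type is irrelevant to the fields this document uses.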
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+---------------------------------+ +--+--+--+--+--+--+--+--+ | +---------------------------------+ +--+--+--+--+--+--+--+--+ | |||
| Flags (1 octet) | -> |x |E |x | T |BM|U |L | | | Flags (1 octet) | -> |x |E |x | T |BM|U |L | | |||
+---------------------------------+ +--+--+--+--+--+--+--+--+ | +---------------------------------+ +--+--+--+--+--+--+--+--+ | |||
| Tunnel Type (1 octets) | T = Assisted-Replication Type | | Tunnel Type (1 octet) | T = Assisted Replication Type | |||
+---------------------------------+ BM = Broadcast and Multicast | +---------------------------------+ BM = Broadcast and Multicast | |||
| MPLS Label (3 octets) | U = Unknown unicast | | MPLS Label (3 octets) | U = Unknown (unknown unicast) | |||
+---------------------------------+ x = unassigned | +---------------------------------+ x = unassigned | |||
| Tunnel Identifier (variable) | | | Tunnel Identifier (variable) | | |||
+---------------------------------+ | +---------------------------------+ | |||
Figure 3: PMSI Tunnel Attribute | Figure 3: PMSI Tunnel Attribute | |||
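The PMSI Tunnel Attribute in Figure 3 can be split into its four fields as sketched below. This is a hedged illustration that returns the 3-octet MPLS Label field as a raw 24-bit value. The two helper methods reflect our reading of [RFC6514] and [RFC8365]: for MPLS, the label sits in the high-order 20 bits; for VXLAN or NVGRE, the VNI or VSID uses all 24 bits. All names are hypothetical.

```python
from dataclasses import dataclass

INGRESS_REPLICATION = 0x06  # tunnel type carried by Regular-IR routes


@dataclass
class PmsiTunnelAttribute:
    flags: int              # 1 octet, decoded separately
    tunnel_type: int        # 1 octet
    label_field: int        # raw 24-bit "MPLS Label" field
    tunnel_identifier: bytes  # variable length, e.g. a 4-octet IPv4 address

    def as_mpls_label(self) -> int:
        # MPLS: the label occupies the high-order 20 bits of the field.
        return self.label_field >> 4

    def as_vni(self) -> int:
        # VXLAN/NVGRE: the VNI/VSID occupies all 24 bits.
        return self.label_field


def parse_pmsi(data: bytes) -> PmsiTunnelAttribute:
    if len(data) < 5:
        raise ValueError("PMSI Tunnel Attribute shorter than 5 octets")
    return PmsiTunnelAttribute(
        flags=data[0],
        tunnel_type=data[1],
        label_field=int.from_bytes(data[2:5], "big"),
        tunnel_identifier=data[5:],
    )
```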
The Flags field in Figure 3 is 8 bits long as per [RFC7902], where | The Flags field in Figure 3 is 8 bits long as per [RFC7902]. The | |||
the Extension flag (E) and the Leaf Information Required (L) Flag are | Extension (E) flag was allocated by [RFC7902], and the Leaf | |||
already allocated. This document defines the use of 4 bits of this | Information Required (L) flag was allocated by [RFC6514]. This | |||
Flags field, and suggests the following allocation to IANA: | document defines the use of 4 bits of this Flags field: | |||
- bits 3 and 4, forming together the Assisted-Replication Type (T) | * Bits 3 and 4, which together form the Assisted Replication Type | |||
field | (T) field | |||
- bit 5, called the Broadcast and Multicast (BM) flag | * Bit 5, called the Broadcast and Multicast (BM) flag | |||
- bit 6, called the Unknown (U) flag | * Bit 6, called the Unknown (U) flag | |||
Bits 5 and 6 are collectively referred to as the Pruned-Flood Lists | Bits 5 and 6 are collectively referred to as the Pruned Flooding | |||
(PFL) flags. | Lists (PFLs) flags. | |||
The T field and Pruned-Flood-Lists flags are defined as follows: | The T field and PFLs flags are defined as follows: | |||
- T is the Assisted-Replication Type field (2 bits) that defines the | * T is the Assisted Replication Type field (2 bits), which defines | |||
AR role of the advertising router: | the AR role of the advertising router: | |||
o 00 (decimal 0) = RNVE (non-AR support) | - 00 (decimal 0) = RNVE (non-AR support) | |||
o 01 (decimal 1) = AR-REPLICATOR | - 01 (decimal 1) = AR-REPLICATOR | |||
o 10 (decimal 2) = AR-LEAF | - 10 (decimal 2) = AR-LEAF | |||
o 11 (decimal 3) = RESERVED | - 11 (decimal 3) = RESERVED | |||
- The Pruned-Flood-Lists flags define the desired behavior of the | * The PFLs flags define the desired behavior of the advertising | |||
advertising router for the different types of traffic: | router for the different types of traffic: | |||
o Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from | - Broadcast and Multicast (BM) flag. BM = 1 means "prune me from | |||
the BM flooding list. BM=0 means regular behavior. | the BM flooding list". BM = 0 indicates regular behavior. | |||
o Unknown (U) flag. U=1 means "prune-me" from the Unknown | - Unknown (U) flag. U = 1 means "prune me from the Unknown | |||
flooding list. U=0 means regular behavior. | flooding list". U = 0 indicates regular behavior. | |||
- Flag L is an existing flag defined in [RFC6514] (L=Leaf | * The L flag (bit 7) is defined in [RFC6514] and will be used only | |||
Information Required, bit 7) and it will be used only in the | in the selective AR solution. | |||
Selective AR Solution. | ||||
Please refer to Section 11 for the IANA considerations related to the | Please refer to Section 11 for the IANA considerations related to the | |||
PMSI Tunnel Attribute flags. | PMSI Tunnel Attribute flags. | |||
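The flag layout above can be made concrete with an encoder and decoder for the Flags octet. This is a sketch under two stated assumptions. First, it follows the usual RFC convention that bit 0 is the most significant bit of the octet. Second, it treats bit 3 as the high-order bit of T, so that 01 (AR-REPLICATOR) sets bit 4 and 10 (AR-LEAF) sets bit 3. That mapping is our reading of Figure 3, not a quotation of the IANA registry. Unassigned bits 0 and 2 are never set.

```python
from dataclasses import dataclass


def _mask(bit: int) -> int:
    # Bit 0 is the most significant bit of the octet (Figure 3 numbering).
    return 0x80 >> bit


E_FLAG = _mask(1)   # Extension [RFC7902]
BM_FLAG = _mask(5)  # Broadcast and Multicast: "prune me" from BM list
U_FLAG = _mask(6)   # Unknown: "prune me" from Unknown list
L_FLAG = _mask(7)   # Leaf Information Required [RFC6514]
T_SHIFT = 3         # T spans bits 3-4, i.e., values 0x10 and 0x08


@dataclass
class PmsiFlags:
    t: int = 0          # 0 RNVE, 1 AR-REPLICATOR, 2 AR-LEAF, 3 RESERVED
    bm: bool = False
    u: bool = False
    l: bool = False
    e: bool = False


def encode_flags(f: PmsiFlags) -> int:
    if not 0 <= f.t <= 3:
        raise ValueError("T is a 2-bit field")
    octet = f.t << T_SHIFT
    for present, mask in ((f.bm, BM_FLAG), (f.u, U_FLAG),
                          (f.l, L_FLAG), (f.e, E_FLAG)):
        if present:
            octet |= mask
    return octet


def decode_flags(octet: int) -> PmsiFlags:
    return PmsiFlags(
        t=(octet >> T_SHIFT) & 0b11,
        bm=bool(octet & BM_FLAG),
        u=bool(octet & U_FLAG),
        l=bool(octet & L_FLAG),
        e=bool(octet & E_FLAG),
    )
```

Under these assumptions, an AR-LEAF that asks to be pruned from BM flooding would advertise the octet 0x14, and a selective AR-REPLICATOR (T = 01, L = 1) would advertise 0x09.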
In this document, the above Inclusive Multicast Ethernet Tag route | In this document, the above Inclusive Multicast Ethernet Tag route | |||
Figure 2 and PMSI Tunnel Attribute Figure 3 can be used in two | (Figure 2) and PMSI Tunnel Attribute (Figure 3) can be used in two | |||
different modes for the same BD: | different modes for the same BD: | |||
- Regular-IR route: in this route, Originating Router's IP Address, | Regular-IR route: In this route, Originating Router's IP Address, | |||
Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used | Tunnel Type (0x06), MPLS Label, and Tunnel Identifier MUST be used | |||
as described in [RFC7432] when Ingress Replication is in use. The | as described in [RFC7432] when ingress replication is in use. The | |||
NVE/PE that advertises the route will set the Next-Hop to an IP | NVE/PE that advertises the route will set the Next Hop to an IP | |||
address that we denominate IR-IP in this document. When | address that we denominate IR-IP in this document. When | |||
advertised by an AR-LEAF node, the Regular-IR route MUST be | advertised by an AR-LEAF node, the Regular-IR route MUST be | |||
advertised with type T set to 10 (AR-LEAF). | advertised with the T field set to 10 (AR-LEAF). | |||
- Replicator-AR route: this route is used by the AR-REPLICATOR to | Replicator-AR route: This route is used by the AR-REPLICATOR to | |||
advertise its AR capabilities, with the fields set as follows: | advertise its AR capabilities, with the fields set as follows: | |||
o Originating Router's IP Address MUST be set to an IP address of | * Originating Router's IP Address MUST be set to an IP address of | |||
the advertising router that is common to all the EVIs on the PE | the advertising router that is common to all the EVIs on the PE | |||
(usually this is a loopback address of the PE). | (usually this is a loopback address of the PE). | |||
+ The Tunnel Identifier and Next-Hop SHOULD be set to the same | - The Tunnel Identifier and Next Hop fields SHOULD be set to | |||
IP address as the Originating Router's IP address when the | the same IP address as the Originating Router's IP Address | |||
NVE/PE originates the route, that is, when the NVE/PE is not | field when the NVE/PE originates the route -- that is, when | |||
an ASBR as in section 10.2 of [RFC8365]. Irrespective of | the NVE/PE is not an ASBR; see Section 10.2 of [RFC8365]. | |||
the values in the Tunnel Identifier and Originating Router's | Irrespective of the values in the Tunnel Identifier and | |||
IP Address fields, the ingress NVE/PE will process the | Originating Router's IP Address fields, the ingress NVE/PE | |||
received Replicator-AR route and will use the IP Address in | will process the received Replicator-AR route and will use | |||
the Next-Hop field to create IP tunnels to the AR- | the IP address setting in the Next Hop field to create IP | |||
REPLICATOR. | tunnels to the AR-REPLICATOR. | |||
+ The Next-Hop address is referred to as the AR-IP and MUST be | - The Next Hop address is referred to as the AR-IP and MUST be | |||
different from the IR-IP for a given PE/NVE, unless the | different from the IR-IP for a given PE/NVE, unless the | |||
procedures in Section 8 are followed. | procedures provided in Section 8 are followed. | |||
o Tunnel Type MUST be set to Assisted-Replication Tunnel. | * Tunnel Type MUST be set to Assisted Replication Tunnel. | |||
Section 11 provides the allocated type value. | Section 11 provides the allocated type value. | |||
o T (AR role type) MUST be set to 01 (AR-REPLICATOR). | * T (Assisted Replication type) MUST be set to 01 (AR- | |||
REPLICATOR). | ||||
o L (Leaf Information Required) MUST be set to 0 (for non- | * L (Leaf Information Required) MUST be set to 0 for non- | |||
selective AR), and MUST be set to 1 (for selective AR). | selective AR and MUST be set to 1 for selective AR. | |||
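The Replicator-AR field rules listed above can be expressed as a checker that reports which MUST and SHOULD statements an advertisement appears to break. This is a minimal sketch. The tunnel type is kept symbolic ("AR") because its numeric value is given in Section 11. The caller supplies the IR-IP, whether selective AR is in use, whether the node is an ASBR, and whether the Section 8 procedures apply. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

AR_REPLICATOR = 0b01  # T value for an AR-REPLICATOR


@dataclass
class ReplicatorArRoute:
    originating_ip: str
    next_hop: str            # the AR-IP
    tunnel_identifier: str
    tunnel_type: str         # symbolic "AR"; numeric value per Section 11
    t: int
    l: int


def check_replicator_ar(route: ReplicatorArRoute, ir_ip: str, selective: bool,
                        is_asbr: bool = False,
                        section8_procedures: bool = False) -> List[str]:
    """Return the Section 4 rules a Replicator-AR route appears to break."""
    problems: List[str] = []
    if route.tunnel_type != "AR":
        problems.append("MUST: Tunnel Type is Assisted Replication Tunnel")
    if route.t != AR_REPLICATOR:
        problems.append("MUST: T is 01 (AR-REPLICATOR)")
    if route.l != (1 if selective else 0):
        problems.append("MUST: L is 1 for selective AR, 0 otherwise")
    if route.next_hop == ir_ip and not section8_procedures:
        problems.append("MUST: AR-IP (Next Hop) differs from the IR-IP")
    # This SHOULD applies when the NVE/PE originates the route (not an ASBR).
    if not is_asbr and (route.tunnel_identifier != route.originating_ip
                        or route.next_hop != route.originating_ip):
        problems.append("SHOULD: Tunnel Identifier and Next Hop equal the "
                        "Originating Router's IP Address")
    return problems
```

For example, a non-selective AR-REPLICATOR whose loopback 192.0.2.10 is used as the Originating Router's IP Address, Tunnel Identifier, and Next Hop, with IR-IP 192.0.2.1, yields an empty list.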
An NVE/PE configured as AR-REPLICATOR for a BD MUST advertise a | An NVE/PE configured as an AR-REPLICATOR for a BD MUST advertise a | |||
Replicator-AR route for the BD and MAY advertise a Regular-IR route. | Replicator-AR route for the BD and MAY advertise a Regular-IR route. | |||
The advertisement of the Replicator-AR route will indicate the AR- | The advertisement of the Replicator-AR route will indicate to the AR- | |||
LEAFs what outer IP DA, i.e., the AR-IP, they need to use for IP | LEAFs which outer IP DA, i.e., which AR-IP, they need to use for IP- | |||
encapsulated BM frames that use Assisted Replication forwarding mode. | encapsulated BM frames that use Assisted Replication forwarding mode. | |||
The AR-REPLICATOR will forward an IP encapsulated BM frame in | The AR-REPLICATOR will forward an IP-encapsulated BM frame in | |||
Assisted Replication forwarding mode if the outer IP DA matches its | Assisted Replication forwarding mode if the outer IP DA matches its | |||
AR-IP, but will forward in Ingress Replication forwarding mode if the | AR-IP but will forward in Ingress Replication forwarding mode if the | |||
outer IP DA matches its IR-IP. | outer IP DA matches its IR-IP. | |||
In addition, this document also uses the Leaf Auto-Discovery (Leaf | In addition, this document also uses the Leaf Auto-Discovery (Leaf | |||
A-D) route defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in | A-D) route defined in [RFC9572] in cases where the selective AR mode | |||
case the selective AR mode is used. An AR-LEAF MAY send a Leaf A-D | is used. An AR-LEAF MAY send a Leaf A-D route in response to | |||
route in response to reception of a Replicator-AR route whose L flag | reception of a Replicator-AR route whose L flag is set. The Leaf A-D | |||
is set. The Leaf Auto-Discovery route is only used for selective AR | route is only used for selective AR, and the fields of such a route | |||
and the fields of such route are set as follows: | are set as follows: | |||
o Originating Router's IP Address is set to the advertising | * Originating Router's IP Address is set to the advertising router's | |||
router's IP address (same IP used by the AR-LEAF in regular-IR | IP address (the same IP address used by the AR-LEAF in Regular-IR | |||
routes). The Next-Hop address is set to the IR-IP, which | routes). The Next Hop address is set to the IR-IP, which SHOULD | |||
SHOULD be the same IP address as the advertising router's IP | be the same IP address as the advertising router's IP address, | |||
address, when the NVE/PE originates the route, i.e., when the | when the NVE/PE originates the route, i.e., when the NVE/PE is not | |||
NVE/PE is not an ASBR as in section 10.2 of [RFC8365]. | an ASBR; see Section 10.2 of [RFC8365]. | |||
o Route Key is the "Route Type Specific" NLRI of the Replicator- | * Route Key [RFC9572] is the "Route Type Specific" NLRI of the | |||
AR route for which this Leaf Auto-Discovery route is generated. | Replicator-AR route for which this Leaf A-D route is generated. | |||
o The AR-LEAF constructs an IP-address-specific route-target, | * The AR-LEAF constructs an IP-address-specific Route Target, | |||
analogously to [I-D.ietf-bess-evpn-bum-procedure-updates], by | analogously to [RFC9572], by placing the IP address carried in the | |||
placing the IP address carried in the Next-Hop field of the | Next Hop field of the received Replicator-AR route in the Global | |||
received Replicator-AR route in the Global Administrator field | Administrator field of the extended community, with the Local | |||
of the Community, with the Local Administrator field of this | Administrator field of this extended community set to 0, and | |||
Community set to 0, and setting the Extended Communities | setting the Extended Communities attribute of the Leaf A-D route | |||
attribute of the Leaf Auto-Discovery route to that Community. | to that extended community. The same IP-address-specific import | |||
The same IP-address-specific import route-target is auto- | Route Target is auto-configured by the AR-REPLICATOR that sent the | |||
configured by the AR-REPLICATOR that sent the Replicator-AR | Replicator-AR route, in order to control the acceptance of the | |||
route, in order to control the acceptance of the Leaf Auto- | Leaf A-D routes. | |||
Discovery routes. | ||||
o The Leaf Auto-Discovery route MUST include the PMSI Tunnel | * The Leaf A-D route MUST include the PMSI Tunnel Attribute with | |||
attribute with the Tunnel Type set to AR (Section 11), T (AR | Tunnel Type set to Assisted Replication Tunnel (Section 11), T | |||
role type) set to AR-LEAF and the Tunnel Identifier set to the | (Assisted Replication type) set to AR-LEAF, and Tunnel Identifier | |||
IP address of the advertising AR-LEAF. The PMSI Tunnel | set to the IP address of the advertising AR-LEAF. The PMSI Tunnel | |||
attribute MUST carry a downstream-assigned MPLS label or VNI | Attribute MUST carry a downstream-assigned MPLS label or VNI that | |||
that is used by the AR-REPLICATOR to send traffic to the AR- | is used by the AR-REPLICATOR to send traffic to the AR-LEAF. | |||
LEAF. | ||||
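The IP-address-specific Route Target exchange can be sketched for IPv4 using the IPv4-Address-Specific extended community of RFC 4360 (type 0x01, sub-type 0x02 for Route Target). Both sides build the same 8 octets. The AR-LEAF places the Next Hop of the received Replicator-AR route in the Global Administrator field. The AR-REPLICATOR builds the same value from the Next Hop it advertised. This sketch covers IPv4 only; an IPv6 Next Hop would need the IPv6-Address-Specific form of RFC 5701, which is not shown. Function names are hypothetical.

```python
import ipaddress
from typing import Iterable


def ip_specific_route_target(ip: str) -> bytes:
    """IPv4-Address-Specific Route Target with Local Administrator = 0."""
    addr = ipaddress.IPv4Address(ip)
    type_high = 0x01   # IPv4-Address-Specific, transitive (RFC 4360)
    sub_type = 0x02    # Route Target (RFC 4360)
    local_admin = (0).to_bytes(2, "big")
    return bytes([type_high, sub_type]) + addr.packed + local_admin


def leaf_ad_route_target(replicator_ar_next_hop: str) -> bytes:
    # AR-LEAF side: Global Administrator = Next Hop of the Replicator-AR route.
    return ip_specific_route_target(replicator_ar_next_hop)


def replicator_accepts(route_targets: Iterable[bytes],
                       advertised_next_hop: str) -> bool:
    # AR-REPLICATOR side: import only Leaf A-D routes that carry the RT built
    # from the Next Hop it placed in its own Replicator-AR route.
    return ip_specific_route_target(advertised_next_hop) in route_targets
```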
Each AR-enabled node understands and process the T (Assisted- | Each AR-enabled node understands and processes the T (Assisted | |||
Replication type) field in the PMSI Tunnel Attribute (Flags field) of | Replication type) field in the PMSI Tunnel Attribute (Flags field) of | |||
the routes, and MUST signal the corresponding type (AR-REPLICATOR or | the routes and MUST signal the corresponding type (AR-REPLICATOR or | |||
AR-LEAF type) according to its administrative choice. An NVE/PE | AR-LEAF type) according to its administrative choice. An NVE/PE | |||
following this specification is not expected to set the Assisted- | following this specification is not expected to set the Assisted | |||
Replication Type field to decimal 3 (which is a RESERVED value). If | Replication Type field to decimal 3 (which is a RESERVED value). If | |||
a route with the AR type field set to decimal 3 is received by an AR- | a route with the Assisted Replication Type field set to decimal 3 is | |||
REPLICATOR or AR-LEAF, the router will process the route as a | received by an AR-REPLICATOR or AR-LEAF, the router will process the | |||
Regular-IR route advertised by an RNVE. | route as a Regular-IR route advertised by an RNVE. | |||
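The handling of the T field, including the RESERVED value, can be sketched as a role mapping. This covers only the T-to-role step described above, in which a route carrying T = 3 is processed as if an RNVE had advertised it. The rest of route processing is out of scope. Names are hypothetical.

```python
from enum import Enum


class ArRole(Enum):
    RNVE = 0
    AR_REPLICATOR = 1
    AR_LEAF = 2


def role_from_t(t: int) -> ArRole:
    """Map the 2-bit Assisted Replication Type (T) field to a role."""
    if not 0 <= t <= 3:
        raise ValueError("T is a 2-bit field")
    if t == 3:
        # RESERVED: handle the route as a Regular-IR route from an RNVE.
        return ArRole.RNVE
    return ArRole(t)
```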
Each node attached to the BD may understand and process the BM/U | Each node attached to the BD may understand and process the BM/U | |||
flags (Pruned-Flood-Lists flags). Note that these BM/U flags may be | flags (PFLs flags). Note that these BM/U flags may be used to | |||
used to optimize the delivery of multi-destination traffic and their | optimize the delivery of multi-destination traffic; their use SHOULD | |||
use SHOULD be an administrative choice, and independent of the AR | be an administrative choice and independent of the AR role. When the | |||
role. When the Pruned-Flood-List capability is enabled, the BM/U | PFL capability is enabled, the BM/U flags can be used with the | |||
flags can be used with the Regular-IR, Replicator-AR and Leaf Auto- | Regular-IR, Replicator-AR, and Leaf A-D routes. | |||
Discovery routes. | ||||
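The effect of the BM and U flags on a sender can be sketched as two per-traffic-type flooding lists built from the flags received from remote nodes. A node that advertised BM = 1 is left out of the BM list, and a node that advertised U = 1 is left out of the Unknown list. This is an illustration only: local ACs are not modeled, and the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RemoteNode:
    name: str
    bm: bool = False  # BM = 1: "prune me from the BM flooding list"
    u: bool = False   # U = 1: "prune me from the Unknown flooding list"


def flooding_lists(remotes: Sequence[RemoteNode]) -> Tuple[List[str], List[str]]:
    """Return (BM flooding list, Unknown flooding list) of remote nodes."""
    bm_list = [n.name for n in remotes if not n.bm]
    unknown_list = [n.name for n in remotes if not n.u]
    return bm_list, unknown_list
```

With NVE2 advertising BM = 1 and NVE3 advertising U = 1, BM traffic is flooded to NVE1 and NVE3 only, and unknown unicast traffic is flooded to NVE1 and NVE2 only.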
Non-optimized Ingress Replication NVEs/PEs will be unaware of the new | Non-optimized ingress replication NVEs/PEs will be unaware of the new | |||
PMSI Tunnel Attribute flag definition as well as the new Tunnel Type | PMSI Tunnel Attribute flag definition as well as the new tunnel type | |||
(AR), i.e., non-upgraded NVEs/PEs will ignore the information | (AR), i.e., non-upgraded NVEs/PEs will ignore the information | |||
contained in the flags field or an unknown Tunnel Type (type AR in | contained in the Flags field or an unknown tunnel type (type AR in | |||
this case) for any Inclusive Multicast Ethernet Tag route. | this case) for any Inclusive Multicast Ethernet Tag route. | |||
5. Non-Selective Assisted-Replication (AR) Solution Description | 5. Non-selective Assisted Replication (AR) Solution Description | |||
Figure 4 illustrates an example Network Virtualization Overlay | Figure 4 illustrates an example NVO network where the non-selective | |||
network where the non-selective AR function is enabled. Three | AR function is enabled. Three different roles are defined for a | |||
different roles are defined for a given BD: AR-REPLICATOR, AR-LEAF | given BD: AR-REPLICATOR, AR-LEAF, and RNVE. The solution is called | |||
and RNVE (Regular NVE). The solution is called "non-selective" | "non-selective" because the chosen AR-REPLICATOR for a given flow | |||
because the chosen AR-REPLICATOR for a given flow MUST replicate the | MUST replicate the BM traffic to all the NVEs/PEs in the BD except | |||
BM traffic to all the NVE/PEs in the BD except for the source NVE/PE. | for the source NVE/PE. NVO tunnels, i.e., IP tunnels, exist among | |||
Network Virtualization Overlay tunnels, i.e., IP tunnels, exist among | ||||
all the PEs and NVEs in the diagram. The PEs and NVEs in the diagram | all the PEs and NVEs in the diagram. The PEs and NVEs in the diagram | |||
have Tenant Systems or Virtual Machines connected to their Attachment | have TSs or VMs connected to their ACs. | |||
Circuits. | ||||
( ) | ( ) | |||
(_ WAN _) | (_ WAN _) | |||
+---(_ _)----+ | +---(_ _)----+ | |||
| (_ _) | | | (_ _) | | |||
PE1 | PE2 | | PE1 | PE2 | | |||
+------+----+ +----+------+ | +------+----+ +----+------+ | |||
TS1--+ (BD-1) | | (BD-1) +--TS2 | TS1--+ (BD-1) | | (BD-1) +--TS2 | |||
|REPLICATOR | |REPLICATOR | | |REPLICATOR | |REPLICATOR | | |||
+--------+--+ +--+--------+ | +--------+--+ +--+--------+ | |||
| | | | | | |||
+--+----------------+--+ | +--+----------------+--+ | |||
| | | | | | |||
| | | | | | |||
+----+ VXLAN/nvGRE/MPLSoGRE +----+ | +----+ VXLAN/NVGRE/MPLSoGRE +----+ | |||
| | IP Fabric | | | | | IP Fabric | | | |||
| | | | | | | | | | |||
NVE1 | +-----------+----------+ | NVE3 | NVE1 | +-----------+----------+ | NVE3 | |||
Hypervisor| TOR | NVE2 |Hypervisor | Hypervisor| ToR | NVE2 |Hypervisor | |||
+---------+-+ +-----+-----+ +-+---------+ | +---------+-+ +-----+-----+ +-+---------+ | |||
| (BD-1) | | (BD-1) | | (BD-1) | | | (BD-1) | | (BD-1) | | (BD-1) | | |||
| LEAF | | RNVE | | LEAF | | | LEAF | | RNVE | | LEAF | | |||
+--+-----+--+ +--+-----+--+ +--+-----+--+ | +--+-----+--+ +--+-----+--+ +--+-----+--+ | |||
| | | | | | | | | | | | | | |||
VM11 VM12 TS3 TS4 VM31 VM32 | VM11 VM12 TS3 TS4 VM31 VM32 | |||
Figure 4: Non-Selective AR scenario | Figure 4: Non-selective AR Scenario | |||
In AR BDs such as BD-1 in the example, BM (Broadcast and Multicast) | In AR BDs, such as BD-1 in Figure 4, BM traffic between two NVEs may | |||
traffic between two NVEs may follow a different path than unicast | follow a different path than unicast traffic. This solution | |||
traffic. This solution recommends the replication of BM through the | recommends the replication of BM traffic through the AR-REPLICATOR | |||
AR-REPLICATOR node, whereas unknown/known unicast will be delivered | node, whereas unknown/known unicast traffic will be delivered | |||
directly from the source node to the destination node without being | directly from the source node to the destination node without being | |||
replicated by any intermediate node. | replicated by any intermediate node. | |||
Note that known unicast forwarding is not impacted by this solution, | Note that known unicast forwarding is not impacted by this solution, | |||
i.e., unknown unicast SHALL follow the same path as known unicast | i.e., unknown unicast traffic SHALL follow the same path as known | |||
traffic. | unicast traffic. | |||
5.1. Non-selective AR-REPLICATOR Procedures | 5.1. Non-selective AR-REPLICATOR Procedures | |||
An AR-REPLICATOR is defined as an NVE/PE capable of replicating | An AR-REPLICATOR is defined as an NVE/PE capable of replicating | |||
incoming BM traffic received on an overlay tunnel to other overlay | incoming BM traffic received on an overlay tunnel to other overlay | |||
tunnels and local Attachment Circuits. The AR-REPLICATOR signals its | tunnels and local ACs. The AR-REPLICATOR signals its role in the | |||
role in the control plane and understands where the other roles (AR- | control plane and understands where the other roles (AR-LEAF nodes, | |||
LEAF nodes, RNVEs and other AR-REPLICATORs) are located. A given AR- | RNVEs, and other AR-REPLICATORs) are located. A given AR-enabled BD | |||
enabled BD service may have zero, one or more AR-REPLICATORs. In our | service may have zero, one, or more AR-REPLICATORs. In our example | |||
example in Figure 4, PE1 and PE2 are defined as AR-REPLICATORs. The | in Figure 4, PE1 and PE2 are defined as AR-REPLICATORs. The | |||
following considerations apply to the AR-REPLICATOR role: | following considerations apply to the AR-REPLICATOR role: | |||
a. The AR-REPLICATOR role SHOULD be an administrative choice in any | a. The AR-REPLICATOR role SHOULD be an administrative choice in any | |||
NVE/PE that is part of an AR-enabled BD. This administrative | NVE/PE that is part of an AR-enabled BD. This administrative | |||
option to enable AR-REPLICATOR capabilities MAY be implemented as | option to enable AR-REPLICATOR capabilities MAY be implemented as | |||
a system level option as opposed to as a per-BD option. | a system-level option as opposed to a per-BD option. | |||
b. An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY | b. An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY | |||
advertise a Regular-IR route. The AR-REPLICATOR MUST NOT | advertise a Regular-IR route. The AR-REPLICATOR MUST NOT | |||
generate a Regular-IR route if it does not have local attachment | generate a Regular-IR route if it does not have local ACs. If | |||
circuits (AC). If the Regular-IR route is advertised, the | the Regular-IR route is advertised, the Assisted Replication Type | |||
Assisted-Replication Type field of the Regular-IR route MUST be | field of the Regular-IR route MUST be set to 0. | |||
set to zero. | ||||
c. The Replicator-AR and Regular-IR routes are generated according | c. The Replicator-AR and Regular-IR routes are generated according | |||
to Section 4. The AR-IP and IR-IP are different IP addresses | to Section 4. The AR-IP and IR-IP are different IP addresses | |||
owned by the AR-REPLICATOR. | owned by the AR-REPLICATOR. | |||
d. When a node defined as AR-REPLICATOR receives a BM packet on an | d. When a node defined as an AR-REPLICATOR receives a BM packet on | |||
overlay tunnel, it will do a tunnel destination IP address lookup | an overlay tunnel, it will do a tunnel destination IP address | |||
and apply the following procedures: | lookup and apply the following procedures: | |||
o If the destination IP address is the AR-REPLICATOR IR-IP | * If the destination IP address is the AR-REPLICATOR IR-IP | |||
Address the node will process the packet normally as in | address, the node will process the packet normally as | |||
[RFC7432]. | discussed in [RFC7432]. | |||
o If the destination IP address is the AR-REPLICATOR AR-IP | * If the destination IP address is the AR-REPLICATOR AR-IP | |||
Address the node MUST replicate the packet to local Attachment | address, the node MUST replicate the packet to local ACs and | |||
Circuits and overlay tunnels (excluding the overlay tunnel to | overlay tunnels (excluding the overlay tunnel to the source of | |||
the source of the packet). When replicating to remote AR- | the packet). When replicating to remote AR-REPLICATORs, the | |||
REPLICATORs the tunnel destination IP address will be an IR- | tunnel destination IP address will be an IR-IP. This will | |||
IP. That will be an indication for the remote AR-REPLICATOR | indicate to the remote AR-REPLICATOR that it MUST NOT | |||
that it MUST NOT replicate to overlay tunnels. The tunnel | replicate to overlay tunnels. The tunnel source IP address | |||
source IP address used by the AR-REPLICATOR MUST be its IR-IP | used by the AR-REPLICATOR MUST be its IR-IP when replicating | |||
when replicating to AR-REPLICATOR or AR-LEAF nodes. | to AR-REPLICATOR or AR-LEAF nodes. | |||
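Procedure (d) can be sketched as a function that returns the copies an AR-REPLICATOR produces for a BM packet received on an overlay tunnel. The outer destination address selects the mode. With the IR-IP, the packet goes to local ACs only. With the AR-IP, it goes to local ACs and to every tunnel except the one toward the source, using each peer's IR-IP as the outer destination. This is a sketch with two simplifications. It uses the node's IR-IP as the outer source toward every peer, although the text mandates this only toward AR-REPLICATOR and AR-LEAF nodes. It also ignores pruning of non-BM tunnels. Names and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Tunnel:
    peer: str
    peer_ir_ip: str  # outer DA toward this peer: its IR-IP


@dataclass(frozen=True)
class Copy:
    target: str                   # local AC name or remote peer
    src_ip: Optional[str] = None  # outer SA (None for local ACs)
    dst_ip: Optional[str] = None  # outer DA (None for local ACs)


def replicate_from_overlay(outer_dst: str, source_peer: str, ir_ip: str,
                           ar_ip: str, local_acs: Sequence[str],
                           tunnels: Sequence[Tunnel]) -> List[Copy]:
    """Copies an AR-REPLICATOR makes for a BM packet received on a tunnel."""
    copies = [Copy(ac) for ac in local_acs]
    if outer_dst == ir_ip:
        # Regular ingress replication [RFC7432]: local ACs only.
        return copies
    if outer_dst == ar_ip:
        for tun in tunnels:
            if tun.peer == source_peer:
                continue  # never send back toward the source of the packet
            # The outer DA is the peer's IR-IP, so a remote AR-REPLICATOR
            # will not replicate again; the outer SA is this node's IR-IP.
            copies.append(Copy(tun.peer, src_ip=ir_ip, dst_ip=tun.peer_ir_ip))
        return copies
    raise ValueError("outer DA matches neither the IR-IP nor the AR-IP")
```

In the Figure 4 topology, a packet from NVE1 addressed to PE1's AR-IP would be copied to PE1's local AC toward TS1 and to the tunnels toward PE2, NVE2, and NVE3. The same packet addressed to PE1's IR-IP would reach TS1 only.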
An AR-REPLICATOR MUST follow a data path implementation compatible | An AR-REPLICATOR MUST follow a data path implementation compatible | |||
with the following rules: | with the following rules: | |||
- The AR-REPLICATORs will build a flooding list composed of | * The AR-REPLICATORs will build a flooding list composed of ACs and | |||
Attachment Circuits and overlay tunnels to remote nodes in the BD. | overlay tunnels to remote nodes in the BD. Some of those overlay | |||
Some of those overlay tunnels MAY be flagged as non-BM receivers | tunnels MAY be flagged as non-BM receivers based on the BM flag | |||
based on the BM flag received from the remote nodes in the BD. | received from the remote nodes in the BD. | |||
- When an AR-REPLICATOR receives a BM packet on an Attachment | * When an AR-REPLICATOR receives a BM packet on an AC, it will | |||
Circuit, it will forward the BM packet to its flooding list | forward the BM packet to its flooding list (including local ACs | |||
(including local Attachment Circuits and remote NVE/PEs), skipping | and remote NVEs/PEs), skipping the non-BM overlay tunnels. | |||
the non-BM overlay tunnels. | ||||
- When an AR-REPLICATOR receives a BM packet on an overlay tunnel, | * When an AR-REPLICATOR receives a BM packet on an overlay tunnel, | |||
it will check the destination IP address of the underlay IP header | it will check the destination IP address of the underlay IP header | |||
and: | and: | |||
o If the destination IP address matches its IR-IP, the AR- | - If the destination IP address matches its IR-IP, the AR- | |||
REPLICATOR will skip all the overlay tunnels from the flooding | REPLICATOR will skip all the overlay tunnels from the flooding | |||
list, i.e. it will only replicate to local Attachment Circuits. | list, i.e., it will only replicate to local ACs. This is the | |||
This is the regular Ingress Replication behavior described in | regular ingress replication behavior described in [RFC7432]. | |||
[RFC7432]. | ||||
o If the destination IP address matches its AR-IP, the AR- | - If the destination IP address matches its AR-IP, the AR- | |||
REPLICATOR MUST forward the BM packet to its flooding list (ACs | REPLICATOR MUST forward the BM packet to its flooding list (ACs | |||
and overlay tunnels) excluding the non-BM overlay tunnels. The | and overlay tunnels), excluding the non-BM overlay tunnels. | |||
AR-REPLICATOR will ensure the traffic is not sent back to the | The AR-REPLICATOR will ensure that the traffic is not sent back | |||
originating AR-LEAF. | to the originating AR-LEAF. | |||
o If the encapsulation is MPLSoGRE or MPLSoUDP and the received | - If the encapsulation is MPLSoGRE or MPLSoUDP and the received | |||
BD label that the AR-REPLICATOR advertised in the Replicator-AR | BD label that the AR-REPLICATOR advertised in the Replicator-AR | |||
route is not the bottom of the stack, the AR-REPLICATOR MUST | route is not at the bottom of the stack, the AR-REPLICATOR MUST | |||
copy the all the labels below the BD label and propagate them | copy all the labels below the BD label and propagate them when | |||
when forwarding the packet to the egress overlay tunnels. | forwarding the packet to the egress overlay tunnels. | |||
- The AR-REPLICATOR/LEAF nodes will build an Unknown unicast flood- | * The AR-REPLICATOR/LEAF nodes will build an unknown unicast | |||
list composed of Attachment Circuits and overlay tunnels to the | flooding list composed of ACs and overlay tunnels to the IR-IP | |||
IR-IP Addresses of the remote nodes in the BD. Some of those | addresses of the remote nodes in the BD. Some of those overlay | |||
overlay tunnels MAY be flagged as non-U (Unknown unicast) | tunnels MAY be flagged as non-U (unknown unicast) receivers based | |||
receivers based on the U flag received from the remote nodes in | on the U flag received from the remote nodes in the BD. | |||
the BD. | ||||
o When an AR-REPLICATOR/LEAF receives an unknown unicast packet | - When an AR-REPLICATOR/LEAF receives an unknown unicast packet | |||
on an Attachment Circuit, it will forward the unknown unicast | on an AC, it will forward the unknown unicast packet to its | |||
packet to its flood-list, skipping the non-U overlay tunnels. | flooding list, skipping the non-U overlay tunnels. | |||
o When an AR-REPLICATOR/LEAF receives an unknown unicast packet | - When an AR-REPLICATOR/LEAF receives an unknown unicast packet | |||
on an overlay tunnel, it will forward the unknown unicast | on an overlay tunnel, it will forward the unknown unicast | |||
packet to its local Attachment Circuits and never to an overlay | packet to its local ACs and never to an overlay tunnel. This | |||
tunnel. This is the regular Ingress Replication behavior | is the regular ingress replication behavior described in | |||
described in [RFC7432]. | [RFC7432]. | |||
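The non-selective AR-REPLICATOR data path rules above reduce to a small decision on where a BM packet arrived and which outer destination IP it carries. The following is a minimal Python sketch under stated assumptions, not an implementation from the RFC. The names `Tunnel` and `ar_replicator_bm_egress`, the string egress labels, and the choice to identify the source tunnel by matching the outer source IP against a remote IR-IP are all hypothetical. MPLS label-stack propagation is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tunnel:
    """Hypothetical model of an overlay tunnel to one remote NVE/PE in the BD."""
    remote_ir_ip: str     # tunnel destination used for replicated copies
    non_bm: bool = False  # True if flagged as a non-BM receiver (from its BM flag)


def ar_replicator_bm_egress(ir_ip: str, ar_ip: str, local_acs: List[str],
                            tunnels: List[Tunnel], in_ac: Optional[str] = None,
                            outer_dst_ip: Optional[str] = None,
                            outer_src_ip: Optional[str] = None) -> List[str]:
    """Egress list for one BM packet on a non-selective AR-REPLICATOR.

    Pass in_ac for a packet received on an AC, or outer_dst_ip and
    outer_src_ip for a packet received on an overlay tunnel.
    """
    # Ordinary bridging split horizon: never send back out the ingress AC.
    acs = [f"ac:{ac}" for ac in local_acs if ac != in_ac]
    # Copies use the replicator's IR-IP as source and the remote IR-IP as
    # destination, which tells the receiver not to replicate further.
    # Assumption: the source tunnel is the one whose remote IR-IP equals
    # the outer source IP of the received packet.
    remote = [f"tunnel:{ir_ip}->{t.remote_ir_ip}" for t in tunnels
              if not t.non_bm and t.remote_ir_ip != outer_src_ip]
    if in_ac is not None:
        return acs + remote  # from an AC: flooding list minus non-BM tunnels
    if outer_dst_ip == ir_ip:
        return acs           # regular ingress replication: local ACs only
    if outer_dst_ip == ar_ip:
        return acs + remote  # assisted replication, excluding the source tunnel
    return []                # tunnel not terminated on this node


tunnels = [Tunnel("10.0.0.1"), Tunnel("10.0.0.2", non_bm=True), Tunnel("10.0.0.3")]
print(ar_replicator_bm_egress("10.0.0.9", "10.0.0.99", ["ac1", "ac2"], tunnels,
                              outer_dst_ip="10.0.0.99", outer_src_ip="10.0.0.1"))
# ['ac:ac1', 'ac:ac2', 'tunnel:10.0.0.9->10.0.0.3']
```

The same packet addressed to the IR-IP ("10.0.0.9") would reach only the two local ACs, which is the regular ingress replication behavior of [RFC7432].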
5.2. Non-Selective AR-LEAF Procedures | 5.2. Non-selective AR-LEAF Procedures | |||
AR-LEAF is defined as an NVE/PE that - given its poor replication | An AR-LEAF is defined as an NVE/PE that, given its poor replication | |||
performance - sends all the BM traffic to an AR-REPLICATOR that can | performance, sends all the BM traffic to an AR-REPLICATOR that can | |||
replicate the traffic further on its behalf. It MAY signal its AR- | replicate the traffic further on its behalf. It MAY signal its AR- | |||
LEAF capability in the control plane and understands where the other | LEAF capability in the control plane and understands where the other | |||
roles are located (AR-REPLICATOR and RNVEs). A given service can | roles are located (AR-REPLICATORs and RNVEs). A given service can | |||
have zero, one or more AR-LEAF nodes. Figure 4 shows NVE1 and NVE3 | have zero, one, or more AR-LEAF nodes. In Figure 4, NVE1 and NVE3 | |||
(both residing in hypervisors) acting as AR-LEAF. The following | (both residing in hypervisors) act as AR-LEAF nodes. The following | |||
considerations apply to the AR-LEAF role: | considerations apply to the AR-LEAF role: | |||
a. The AR-LEAF role SHOULD be an administrative choice in any NVE/PE | a. The AR-LEAF role SHOULD be an administrative choice in any NVE/PE | |||
that is part of an AR-enabled BD. This administrative option to | that is part of an AR-enabled BD. This administrative option to | |||
enable AR-LEAF capabilities MAY be implemented as a system level | enable AR-LEAF capabilities MAY be implemented as a system-level | |||
option as opposed to as per-BD option. | option as opposed to a per-BD option. | |||
b. In this non-selective AR solution, the AR-LEAF MUST advertise a | b. In this non-selective AR solution, the AR-LEAF MUST advertise a | |||
single Regular-IR inclusive multicast route as in [RFC7432]. The | single Regular-IR Inclusive Multicast Ethernet Tag route as | |||
AR-LEAF SHOULD set the Assisted-Replication Type field to AR- | described in [RFC7432]. The AR-LEAF SHOULD set the Assisted | |||
LEAF. Note that although this field does not make any difference | Replication Type field to AR-LEAF. Note that although this field | |||
for the remote nodes when creating an EVPN destination to the AR- | does not affect the remote nodes when creating an EVPN | |||
LEAF, this field is useful for an easy operation and | destination to the AR-LEAF, this field is useful from the | |||
troubleshooting of the BD. | standpoint of ease of operation and troubleshooting of the BD. | |||
c. In a BD where there are no AR-REPLICATORs due to the AR- | c. In a BD where there are no AR-REPLICATORs due to the AR- | |||
REPLICATORs being down or reconfigured, the AR-LEAF MUST use | REPLICATORs being down or reconfigured, the AR-LEAF MUST use | |||
regular Ingress Replication, based on the remote Regular-IR | regular ingress replication based on the remote Regular-IR | |||
Inclusive Multicast Routes as described in [RFC7432]. This may | Inclusive Multicast Ethernet Tag routes as described in | |||
happen in the following cases: | [RFC7432]. This may happen in the following cases: | |||
o The AR-LEAF has a list of AR-REPLICATORs for the BD, but it | * The AR-LEAF has a list of AR-REPLICATORs for the BD, but it | |||
detects that all the AR-REPLICATORs for the BD are down (via | detects that all the AR-REPLICATORs for the BD are down (via | |||
next-hop tracking in the IGP or any other detection | next-hop tracking in the IGP or some other detection | |||
mechanism). | mechanism). | |||
o The AR-LEAF receives updates from all the former AR- | * The AR-LEAF receives updates from all the former AR- | |||
REPLICATORs containing a non-REPLICATOR AR type in the | REPLICATORs containing a non-REPLICATOR AR type in the | |||
Inclusive Multicast Etherner Tag routes. | Inclusive Multicast Ethernet Tag routes. | |||
o The AR-LEAF never discovered an AR-REPLICATOR for the BD. | * The AR-LEAF never discovered an AR-REPLICATOR for the BD. | |||
d. In a service where there is one or more AR-REPLICATORs (based on | d. In a service where there are one or more AR-REPLICATORs (based on | |||
the received Replicator-AR routes for the BD), the AR-LEAF can | the received Replicator-AR routes for the BD), the AR-LEAF can | |||
locally select which AR-REPLICATOR it sends the BM traffic to: | locally select which AR-REPLICATOR it sends the BM traffic to: | |||
o A single AR-REPLICATOR MAY be selected for all the BM packets | * A single AR-REPLICATOR MAY be selected for all the BM packets | |||
received on the AR-LEAF attachment circuits (ACs) for a given | received on the AR-LEAF ACs for a given BD. This selection is | |||
BD. This selection is a local decision and it does not have | a local decision and does not have to match other AR-LEAFs' | |||
to match other AR-LEAFs' selections within the same BD. | selections within the same BD. | |||
o An AR-LEAF MAY select more than one AR-REPLICATOR and do | * An AR-LEAF MAY select more than one AR-REPLICATOR and do | |||
either per-flow or per-BD load balancing. | either per-flow or per-BD load balancing. | |||
o In case of a failure of the selected AR-REPLICATOR, another | * In the case of failure of the selected AR-REPLICATOR, another | |||
AR-REPLICATOR SHOULD be selected by the AR-LEAF. | AR-REPLICATOR SHOULD be selected by the AR-LEAF. | |||
o When an AR-REPLICATOR is selected for a given flow or BD, the | * When an AR-REPLICATOR is selected for a given flow or BD, the | |||
AR-LEAF MUST send all the BM packets targeted to that AR- | AR-LEAF MUST send all the BM packets targeted to that AR- | |||
REPLICATOR using the forwarding information given by the | REPLICATOR using the forwarding information given by the | |||
Replicator-AR route for the chosen AR-REPLICATOR, with tunnel | Replicator-AR route for the chosen AR-REPLICATOR, with Tunnel | |||
type = 0x0A (AR tunnel). The underlay destination IP address | Type = 0x0A (AR tunnel). The underlay destination IP address | |||
MUST be the AR-IP advertised by the AR-REPLICATOR in the | MUST be the AR-IP advertised by the AR-REPLICATOR in the | |||
Replicator-AR route. | Replicator-AR route. | |||
o An AR-LEAF MAY change the AR-REPLICATOR(s) selection | * An AR-LEAF MAY change the selection of AR-REPLICATOR(s) | |||
dynamically, due to an administrative or policy configuration | dynamically due to an administrative or policy configuration | |||
change. | change. | |||
o AR-LEAF nodes SHALL send service-level BM control plane | * AR-LEAF nodes SHALL send service-level BM control plane | |||
packets following regular Ingress Replication procedures. An | packets, following the procedures for regular ingress | |||
example would be IGMP, MLD or PIM multicast packets, and in | replication. An example would be IGMP, Multicast Listener | |||
general any packets using link-local scope multicast IPv4 or | Discovery (MLD), or PIM packets, and, in general, any packets | |||
IPv6 packets. The AR-REPLICATORs MUST NOT replicate these | using link-local scope multicast IPv4 or IPv6 packets. The | |||
control plane packets to other overlay tunnels since they will | AR-REPLICATORs MUST NOT replicate these control plane packets | |||
use the regular IR-IP Address. | to other overlay tunnels, since they will use the IR-IP | |||
address. | ||||
e. The use of an AR-REPLICATOR-activation-timer (in seconds, default | e. The use of an AR-REPLICATOR-activation-timer (in seconds, with a | |||
value is 3) on the AR-LEAF nodes is RECOMMENDED. Upon receiving | default value of 3) on the AR-LEAF nodes is RECOMMENDED. Upon | |||
a new Replicator-AR route where the AR-REPLICATOR is selected, | receiving a new Replicator-AR route where the AR-REPLICATOR is | |||
the AR-LEAF will run a timer before programming the new AR- | selected, the AR-LEAF will run a timer before programming the new | |||
REPLICATOR. In case of a new added AR-REPLICATOR, or in case the | AR-REPLICATOR. In the case of a newly added AR-REPLICATOR or if | |||
AR-REPLICATOR reboots, this timer will give the AR-REPLICATOR | an AR-REPLICATOR reboots, this timer will give the AR-REPLICATOR | |||
some time to program the AR-LEAF nodes before the AR-LEAF sends | some time to program the AR-LEAF nodes before the AR-LEAF sends | |||
BM traffic. The AR-REPLICATOR-activation-timer SHOULD be | BM traffic. The AR-REPLICATOR-activation-timer SHOULD be | |||
configurable in seconds, and its value account for the time it | configurable in seconds, and its value needs to account for the | |||
takes for the AR-LEAF Regular-IR inclusive multicast route to get | time it takes for the AR-LEAF Regular-IR Inclusive Multicast | |||
to the AR-REPLICATOR and be programmed. While the AR-REPLICATOR- | Ethernet Tag route to get to the AR-REPLICATOR and be programmed. | |||
activation-time is running, the AR-LEAF node will use regular | While the AR-REPLICATOR-activation-timer is running, the AR-LEAF | |||
ingress replication. | node will use regular ingress replication. | |||
f. If the AR-LEAF has selected an AR-REPLICATOR, it is a matter of | f. If the AR-LEAF has selected an AR-REPLICATOR, whether or not to | |||
local policy to change to a new preferred AR-REPLICATOR for the | change to a new preferred AR-REPLICATOR for the existing BM | |||
existing BM traffic flows. | traffic flows is a matter of local policy. | |||
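Items c through f describe how an AR-LEAF picks an AR-REPLICATOR: wait out the AR-REPLICATOR-activation-timer (default 3 seconds) for each newly learned Replicator-AR route, optionally load-balance across several AR-REPLICATORs, and fall back to regular ingress replication when none is usable. The Python sketch below illustrates that logic under stated assumptions. `ArReplicatorSelector`, its methods, and the CRC32 per-flow hash are hypothetical choices; the RFC leaves the selection and load-balancing method to local policy.

```python
import zlib
from typing import Dict, List, Optional


class ArReplicatorSelector:
    """Hypothetical AR-LEAF helper that chooses an AR-REPLICATOR per flow."""

    def __init__(self, activation_timer_s: float = 3.0) -> None:
        self.activation_timer_s = activation_timer_s  # default of 3 s from item e
        self.learned: Dict[str, float] = {}  # AR-IP -> arrival time of its route

    def on_route(self, ar_ip: str, now: float) -> None:
        # Only a new Replicator-AR route starts the activation timer.
        self.learned.setdefault(ar_ip, now)

    def on_withdraw(self, ar_ip: str) -> None:
        # Assumed to be called when the AR-REPLICATOR is detected down or
        # re-advertises its route with a non-REPLICATOR AR type (item c).
        self.learned.pop(ar_ip, None)

    def active(self, now: float) -> List[str]:
        # AR-REPLICATORs whose activation timer has expired.
        return sorted(ip for ip, t in self.learned.items()
                      if now - t >= self.activation_timer_s)

    def select(self, flow_key: bytes, now: float) -> Optional[str]:
        """AR-IP to use as tunnel destination, or None for regular ingress replication."""
        candidates = self.active(now)
        if not candidates:
            return None
        # Per-flow load balancing via CRC32 (an illustrative choice, not mandated).
        return candidates[zlib.crc32(flow_key) % len(candidates)]


sel = ArReplicatorSelector()
sel.on_route("192.0.2.1", now=0.0)
print(sel.select(b"flow-a", now=1.0))  # None: timer still running, use regular IR
print(sel.select(b"flow-a", now=3.0))  # 192.0.2.1
```

Because the hash is taken over the current candidate list, adding or removing an AR-REPLICATOR can move existing flows; item f leaves it to local policy whether that is acceptable or whether existing flows should stay pinned.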
An AR-LEAF MUST follow a data path implementation compatible with the | An AR-LEAF MUST follow a data path implementation compatible with the | |||
following rules: | following rules: | |||
- The AR-LEAF nodes will build two flood-lists: | * The AR-LEAF nodes will build two flooding lists: | |||
1. Flood-list #1 - composed of Attachment Circuits and an AR- | Flooding list #1: Composed of ACs and an AR-REPLICATOR-set of | |||
REPLICATOR-set of overlay tunnels. The AR-REPLICATOR-set is | overlay tunnels. The AR-REPLICATOR-set is defined as one or | |||
defined as one or more overlay tunnels to the AR-IP Addresses | more overlay tunnels to the AR-IP addresses of the remote AR- | |||
of the remote AR-REPLICATOR(s) in the BD. The selection of | REPLICATOR(s) in the BD. The selection of more than one AR- | |||
more than one AR-REPLICATOR is described in point d) above and | REPLICATOR is described in item d. above and is a local AR-LEAF | |||
it is a local AR-LEAF decision. | decision. | |||
2. Flood-list #2 - composed of Attachment Circuits and overlay | Flooding list #2: Composed of ACs and overlay tunnels to the | |||
tunnels to the remote IR-IP Addresses. | remote IR-IP addresses. | |||
- When an AR-LEAF receives a BM packet on an Attachment Circuit, it | * When an AR-LEAF receives a BM packet on an AC, it will check the | |||
will check the AR-REPLICATOR-set: | AR-REPLICATOR-set: | |||
o If the AR-REPLICATOR-set is empty, the AR-LEAF MUST send the | - If the AR-REPLICATOR-set is empty, the AR-LEAF MUST send the | |||
packet to flood-list #2. | packet to flooding list #2. | |||
o If the AR-REPLICATOR-set is NOT empty, the AR-LEAF MUST send | - If the AR-REPLICATOR-set is NOT empty, the AR-LEAF MUST send | |||
the packet to flood-list #1, where only one of the overlay | the packet to flooding list #1, where only one of the overlay | |||
tunnels of the AR-REPLICATOR-set is used. | tunnels of the AR-REPLICATOR-set is used. | |||
- When an AR-LEAF receives a BM packet on an overlay tunnel, it will | * When an AR-LEAF receives a BM packet on an overlay tunnel, it will | |||
forward the BM packet to its local Attachment Circuits and never | forward the BM packet to its local ACs and never to an overlay | |||
to an overlay tunnel. This is the regular Ingress Replication | tunnel. This is the regular ingress replication behavior | |||
behavior described in [RFC7432]. | described in [RFC7432]. | |||
- AR-LEAF nodes process Unknown unicast traffic in the same way AR- | * AR-LEAF nodes process unknown unicast traffic in the same way AR- | |||
REPLICATORs do, as described in Section 5.1. | REPLICATORs do, as described in Section 5.1. | |||
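The AR-LEAF data path rules, together with the control-plane exception in item d, can be sketched as a single Python function. The function name, its parameters (including `link_local_control` and `flow_hash`), and the string egress labels are hypothetical and for illustration only. Flooding list #1 is modeled as the ACs plus one tunnel chosen from the AR-REPLICATOR-set, and flooding list #2 as the ACs plus tunnels to every remote IR-IP.

```python
from typing import List, Optional


def ar_leaf_bm_egress(local_acs: List[str], remote_ir_ips: List[str],
                      ar_replicator_set: List[str], flow_hash: int = 0,
                      in_ac: Optional[str] = None,
                      link_local_control: bool = False) -> List[str]:
    """Egress list for one BM packet on a non-selective AR-LEAF.

    in_ac is None for a packet received on an overlay tunnel.
    ar_replicator_set holds the AR-IPs of the selected AR-REPLICATOR(s).
    """
    if in_ac is None:
        # From an overlay tunnel: local ACs only, never another tunnel.
        return [f"ac:{ac}" for ac in local_acs]
    acs = [f"ac:{ac}" for ac in local_acs if ac != in_ac]
    if not ar_replicator_set or link_local_control:
        # Flooding list #2: ingress replication to every remote IR-IP.
        # Link-local control traffic (IGMP, MLD, PIM) always takes this path.
        return acs + [f"tunnel:{ip}" for ip in remote_ir_ips]
    # Flooding list #1: exactly one tunnel from the AR-REPLICATOR-set.
    chosen = ar_replicator_set[flow_hash % len(ar_replicator_set)]
    return acs + [f"tunnel:{chosen}"]


print(ar_leaf_bm_egress(["ac1", "ac2"], ["10.0.0.2", "10.0.0.3"],
                        ["192.0.2.1"], in_ac="ac1"))
# ['ac:ac2', 'tunnel:192.0.2.1']
```

With an empty AR-REPLICATOR-set, the same call would return tunnels to both remote IR-IPs, which matches the fallback to regular ingress replication.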
5.3. RNVE Procedures | 5.3. RNVE Procedures | |||
RNVE (Regular Network Virtualization Edge node) is defined as an NVE/ | An RNVE is defined as an NVE/PE without AR-REPLICATOR or AR-LEAF | |||
PE without AR-REPLICATOR or AR-LEAF capabilities that does Ingress | capabilities that does ingress replication as described in [RFC7432]. | |||
Replication as described in [RFC7432]. The RNVE does not signal any | The RNVE does not signal any AR role and is unaware of the AR- | |||
AR role and is unaware of the AR-REPLICATOR/LEAF roles in the BD. | REPLICATOR/LEAF roles in the BD. The RNVE will ignore the flags in | |||
The RNVE will ignore the Flags in the Regular-IR routes and will | the Regular-IR routes and will ignore the Replicator-AR routes (due | |||
ignore the Replicator-AR routes (due to an unknown tunnel type in the | to an unknown tunnel type in the PMSI Tunnel Attribute) and the Leaf | |||
PMSI Tunnel Attribute) and the Leaf Auto-Discovery routes (due to the | A-D routes (due to the IP-address-specific Route Target). | |||
IP-address-specific route-target). | ||||
This role provides EVPN with the backwards compatibility required in | This role provides EVPNs with the backward compatibility required in | |||
optimized Ingress Replication BDs. Figure 4 shows NVE2 as RNVE. | optimized ingress replication BDs. In Figure 4, NVE2 acts as an | |||
RNVE. | ||||
6. Selective Assisted-Replication (AR) Solution Description | 6. Selective Assisted Replication (AR) Solution Description | |||
Figure 5 is used to describe the selective AR solution. | Figure 5 is used to describe the selective AR solution. | |||
( ) | ( ) | |||
(_ WAN _) | (_ WAN _) | |||
+---(_ _)----+ | +---(_ _)----+ | |||
| (_ _) | | | (_ _) | | |||
PE1 | PE2 | | PE1 | PE2 | | |||
+------+----+ +----+------+ | +------+----+ +----+------+ | |||
TS1--+ (BD-1) | | (BD-1) +--TS2 | TS1--+ (BD-1) | | (BD-1) +--TS2 | |||
|REPLICATOR | |REPLICATOR | | |REPLICATOR | |REPLICATOR | | |||
+--------+--+ +--+--------+ | +--------+--+ +--+--------+ | |||
| | | | | | |||
+--+----------------+--+ | +--+----------------+--+ | |||
| | | | | | |||
| | | | | | |||
+----+ VXLAN/nvGRE/MPLSoGRE +----+ | +----+ VXLAN/NVGRE/MPLSoGRE +----+ | |||
| | IP Fabric | | | | | IP Fabric | | | |||
| | | | | | | | | | |||
NVE1 | +-----------+----------+ | NVE3 | NVE1 | +-----------+----------+ | NVE3 | |||
Hypervisor| TOR | NVE2 |Hypervisor | Hypervisor| ToR | NVE2 |Hypervisor | |||
+---------+-+ +-----+-----+ +-+---------+ | +---------+-+ +-----+-----+ +-+---------+ | |||
| (BD-1) | | (BD-1) | | (BD-1) | | | (BD-1) | | (BD-1) | | (BD-1) | | |||
| LEAF-set1 | |LEAF-set-1 | |LEAF-set-2 | | |LEAF-set-1 | |LEAF-set-1 | |LEAF-set-2 | | |||
+--+-----+--+ +--+-----+--+ +--+-----+--+ | +--+-----+--+ +--+-----+--+ +--+-----+--+ | |||
| | | | | | | | | | | | | | |||
VM11 VM12 TS3 TS4 VM31 VM32 | VM11 VM12 TS3 TS4 VM31 VM32 | |||
Figure 5: Selective AR scenario | Figure 5: Selective AR Scenario | |||
The solution is called "selective" because a given AR-REPLICATOR MUST | The solution is called "selective" because a given AR-REPLICATOR MUST | |||
replicate the BM traffic to only the AR-LEAFs that requested the | replicate the BM traffic to only the AR-LEAFs that requested the | |||
replication (as opposed to all the AR-LEAF nodes) and MUST replicate | replication (as opposed to all the AR-LEAF nodes) and MUST replicate | |||
the BM traffic to the RNVEs (if there are any). The same AR roles | the BM traffic to the RNVEs (if there are any). The same AR roles as | |||
defined in Section 4 are used here, however the procedures are | those defined in Sections 4 and 5 are used here; however, the | |||
different. | procedures are different. | |||
The Selective AR procedures create multiple AR-LEAF-sets in the EVPN | The selective AR procedures create multiple AR-LEAF-sets in the EVPN | |||
BD, and build single-hop trees among AR-LEAFs of the same set (AR- | BD and build single-hop trees among AR-LEAFs of the same set (AR- | |||
LEAF->AR-REPLICATOR->AR-LEAF), and two-hop trees among AR-LEAFs of | LEAF->AR-REPLICATOR->AR-LEAF) and two-hop trees among AR-LEAFs of | |||
different sets (AR-LEAF->AR-REPLICATOR->AR-REPLICATOR->AR-LEAF). | different sets (AR-LEAF->AR-REPLICATOR->AR-REPLICATOR->AR-LEAF). | |||
Compared to the Selective solution, the Non-Selective AR method | Compared to the selective solution, the non-selective AR method | |||
assumes that all the AR-LEAFs of the BD are in the same set and | assumes that all the AR-LEAFs of the BD are in the same set and | |||
always creates two-hop trees among AR-LEAFs. While the Selective | always creates single-hop trees among AR-LEAFs. While the selective | |||
solution is more efficient than the Non-Selective solution in multi- | solution is more efficient than the non-selective solution in multi- | |||
stage IP fabrics, the trade-off is additional signaling and an | stage IP fabrics, the trade-off is additional signaling and an | |||
additional outer source IP address lookup. | additional outer source IP address lookup. | |||
The following sub-sections describe the differences in the procedures | The following subsections describe the differences in the procedures | |||
of AR-REPLICATOR/LEAFs compared to the non-selective AR solution. | for AR-REPLICATORs/LEAFs compared to the non-selective AR solution. | |||
There is no change on the RNVEs. | There are no changes applicable to RNVEs. | |||
6.1. Selective AR-REPLICATOR Procedures | 6.1. Selective AR-REPLICATOR Procedures | |||
In our example in Figure 5, PE1 and PE2 are defined as Selective AR- | In our example in Figure 5, PE1 and PE2 are defined as selective AR- | |||
REPLICATORs. The following considerations apply to the Selective AR- | REPLICATORs. The following considerations apply to the selective AR- | |||
REPLICATOR role: | REPLICATOR role: | |||
a. The Selective AR-REPLICATOR capability SHOULD be an | a. The selective AR-REPLICATOR role SHOULD be an administrative | |||
administrative choice in any NVE/PE that is part of an Assisted- | choice in any NVE/PE that is part of an AR-enabled BD. This | |||
Replication-enabled BD, as the AR role itself. This | administrative option MAY be implemented as a system-level option | |||
administrative option MAY be implemented as a system level option | as opposed to a per-BD option. | |||
as opposed to as a per-BD option. | ||||
b. Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF | b. Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF, | |||
and RNVE nodes. In spite of the 'Selective' administrative | and RNVE nodes. In spite of the "selective" administrative | |||
option, an AR-REPLICATOR MUST NOT behave as a Selective AR- | option, an AR-REPLICATOR MUST NOT behave as a selective AR- | |||
REPLICATOR if at least one of the AR-REPLICATORs has the L flag | REPLICATOR if at least one of the AR-REPLICATORs has the L flag | |||
NOT set. If at least one AR-REPLICATOR sends a Replicator-AR | NOT set. If at least one AR-REPLICATOR sends a Replicator-AR | |||
route with L=0 (in the BD context), the rest of the AR- | route with L = 0 (in the BD context), the rest of the AR- | |||
REPLICATORs will fall back to non-selective AR mode. | REPLICATORs will fall back to non-selective AR mode. | |||
c. The Selective AR-REPLICATOR MUST follow the procedures described | c. The selective AR-REPLICATOR MUST follow the procedures described | |||
in Section 5.1, except for the following differences: | in Section 5.1, except for the following differences: | |||
o The Replicator-AR route MUST include L=1 (Leaf Information | * The AR-REPLICATOR MUST have the L flag set to 1 when | |||
Required) in the Replicator-AR route. This flag is used by | advertising the Replicator-AR route. This flag is used by the | |||
the AR-REPLICATORs to advertise their 'selective' AR- | AR-REPLICATORs to advertise their "selective" AR-REPLICATOR | |||
REPLICATOR capabilities. In addition, the AR-REPLICATOR auto- | capabilities. In addition, the AR-REPLICATOR auto-configures | |||
configures its IP-address-specific import route-target as | its IP-address-specific import Route Target as described in | |||
described in the third bullet of the procedures for Leaf Auto- | the third bullet of the procedures for Leaf A-D routes in | |||
Discovery route in Section 4. | Section 4. | |||
o The AR-REPLICATOR will build a 'selective' AR-LEAF-set with | * The AR-REPLICATOR will build a "selective" AR-LEAF-set with | |||
the list of nodes that requested replication to its own AR-IP. | the list of nodes that requested replication to its own AR-IP. | |||
For instance, assuming NVE1 and NVE2 advertise a Leaf Auto- | For instance, assuming that NVE1 and NVE2 advertise a Leaf A-D | |||
Discovery route with PE1's IP-address-specific route-target | route with PE1's IP-address-specific Route Target and NVE3 | |||
and NVE3 advertises a Leaf Auto-Discovery route with PE2's IP- | advertises a Leaf A-D route with PE2's IP-address-specific | |||
address-specific route-target, PE1 will only add NVE1/NVE2 to | Route Target, PE1 will only add NVE1/NVE2 to its selective AR- | |||
its selective AR-LEAF-set for BD-1, and exclude NVE3. | LEAF-set for BD-1 and exclude NVE3. Likewise, PE2 will only | |||
Likewise, PE2 will only add NVE3 to its selective AR-LEAF-set | add NVE3 to its selective AR-LEAF-set for BD-1 and exclude | |||
for BD-1, and exclude NVE1/NVE2. | NVE1/NVE2. | |||
o When a node defined and operating as a Selective AR-REPLICATOR | * When a node defined and operating as a selective AR-REPLICATOR | |||
receives a packet on an overlay tunnel, it will do a tunnel | receives a packet on an overlay tunnel, it will do a tunnel | |||
destination IP lookup and if the destination IP address is the | destination IP lookup, and if the destination IP address is | |||
AR-REPLICATOR AR-IP Address, the node MUST replicate the | the AR-REPLICATOR AR-IP address, the node MUST replicate the | |||
packet to: | packet to: | |||
+ local Attachment Circuits | - Local ACs. | |||
+ overlay tunnels in the Selective AR-LEAF-set, excluding the | - Overlay tunnels in the selective AR-LEAF-set, excluding the | |||
overlay tunnel to the source AR-LEAF. | overlay tunnel to the source AR-LEAF. | |||
+ overlay tunnels to the RNVEs if the tunnel source IP | - Overlay tunnels to the RNVEs if the tunnel source IP | |||
address is the IR-IP of an AR-LEAF. In any other case, the | address is the IR-IP of an AR-LEAF. In any other case, the | |||
AR-REPLICATOR MUST NOT replicate the BM traffic to remote | AR-REPLICATOR MUST NOT replicate the BM traffic to remote | |||
RNVEs. In other words, only the first-hop selective AR- | RNVEs. In other words, only the first-hop selective AR- | |||
REPLICATOR will replicate to all the RNVEs. | REPLICATOR will replicate to all the RNVEs. | |||
+ overlay tunnels to the remote Selective AR-REPLICATORs if | - Overlay tunnels to the remote selective AR-REPLICATORs if | |||
the tunnel source IP address (of the encapsulated packet | the tunnel source IP address (of the encapsulated packet | |||
that arrived on the overlay tunnel) is an IR-IP of its own | that arrived on the overlay tunnel) is an IR-IP of its own | |||
AR-LEAF-set. In any other case, the AR-REPLICATOR MUST NOT | AR-LEAF-set. In any other case, the AR-REPLICATOR MUST NOT | |||
replicate the BM traffic to remote AR-REPLICATORs. When | replicate the BM traffic to remote AR-REPLICATORs. When | |||
doing this replication, the tunnel destination IP address | doing this replication, the tunnel destination IP address | |||
is the AR-IP of the remote Selective AR-REPLICATOR. The | is the AR-IP of the remote selective AR-REPLICATOR. The | |||
tunnel destination IP AR-IP will be an indication for the | tunnel destination address AR-IP will indicate to the | |||
remote Selective AR-REPLICATOR that the packet needs | remote selective AR-REPLICATOR that the packet needs | |||
further replication to its AR-LEAFs. | further replication to its AR-LEAFs. | |||
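The replication targets in item c, and the L-flag fallback in item b, can be illustrated with a Python sketch of a selective AR-REPLICATOR handling a packet sent to its AR-IP. All names are hypothetical. The sketch assumes that the replicator knows the IR-IPs of every AR-LEAF in the BD (for example, from Regular-IR routes whose Assisted Replication Type is AR-LEAF). It follows the considerations in item c only and does not model the flooding lists of the data path rules.

```python
from dataclasses import dataclass
from typing import List, Set


def selective_mode(own_l_flag: bool, peer_l_flags: List[bool]) -> bool:
    """Item b: act selectively only if every AR-REPLICATOR in the BD has L = 1."""
    return own_l_flag and all(peer_l_flags)


@dataclass
class SelectiveReplicator:
    """Hypothetical state of one selective AR-REPLICATOR in a BD."""
    local_acs: List[str]
    leaf_set: Set[str]         # IR-IPs of AR-LEAFs whose Leaf A-D route targets this node
    all_leaf_ir_ips: Set[str]  # IR-IPs of every AR-LEAF in the BD (assumed known)
    rnve_ir_ips: List[str]
    peer_ar_ips: List[str]     # AR-IPs of the remote selective AR-REPLICATORs

    def replicate_to_ar_ip(self, outer_src_ip: str) -> List[str]:
        """Targets for a packet whose tunnel destination is this node's AR-IP."""
        out = [f"ac:{ac}" for ac in self.local_acs]
        # Selective AR-LEAF-set, excluding the tunnel back to the source AR-LEAF.
        out += [f"leaf:{ip}" for ip in sorted(self.leaf_set) if ip != outer_src_ip]
        # Only the first-hop replicator (source is an AR-LEAF) serves the RNVEs.
        if outer_src_ip in self.all_leaf_ir_ips:
            out += [f"rnve:{ip}" for ip in self.rnve_ir_ips]
        # Only traffic from its own AR-LEAF-set goes on to other replicators.
        if outer_src_ip in self.leaf_set:
            out += [f"replicator:{ip}" for ip in self.peer_ar_ips]
        return out


# PE1 from Figure 5 with hypothetical addresses: NVE1 = 10.0.0.1, NVE2 = 10.0.0.2,
# NVE3 = 10.0.0.3, an extra RNVE at 10.0.0.4, and PE2's AR-IP = 10.1.0.102.
pe1 = SelectiveReplicator(["TS1"], {"10.0.0.1", "10.0.0.2"},
                          {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
                          ["10.0.0.4"], ["10.1.0.102"])
print(pe1.replicate_to_ar_ip("10.0.0.1"))
# ['ac:TS1', 'leaf:10.0.0.2', 'rnve:10.0.0.4', 'replicator:10.1.0.102']
```

A packet that PE2 forwards to PE1's AR-IP (outer source not in any AR-LEAF-set) reaches only TS1, NVE1, and NVE2, which produces the two-hop AR-LEAF->AR-REPLICATOR->AR-REPLICATOR->AR-LEAF tree described earlier in this section.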
A Selective AR-REPLICATOR data path implementation MUST be compatible | A selective AR-REPLICATOR data path implementation MUST be compatible | |||
with the following rules: | with the following rules: | |||
- The Selective AR-REPLICATORs will build two flood-lists: | * The selective AR-REPLICATORs will build two flooding lists: | |||
1. Flood-list #1 - composed of Attachment Circuits and overlay | Flooding list #1: Composed of ACs and overlay tunnels to the | |||
tunnels to the remote nodes in the BD, always using the IR-IPs | remote nodes in the BD, always using the IR-IPs in the tunnel | |||
in the tunnel destination IP addresses. | destination IP addresses. | |||
2. Flood-list #2 - composed of Attachment Circuits, a Selective | Flooding list #2: Composed of ACs, a selective AR-LEAF-set, and a | |||
AR-LEAF-set and a Selective AR-REPLICATOR-set, where: | selective AR-REPLICATOR-set, where: | |||
+ The Selective AR-LEAF-set is composed of the overlay | - The selective AR-LEAF-set is composed of the overlay tunnels | |||
tunnels to the AR-LEAFs that advertise a Leaf Auto- | to the AR-LEAFs that advertise a Leaf A-D route for the | |||
Discovery route for the local AR-REPLICATOR. This set is | local AR-REPLICATOR. This set is updated with every Leaf | |||
updated with every Leaf Auto-Discovery route received/ | A-D route received/withdrawn from a new AR-LEAF. | |||
withdrawn from a new AR-LEAF. | ||||
+ The Selective AR-REPLICATOR-set is composed of the overlay | - The selective AR-REPLICATOR-set is composed of the overlay | |||
tunnels to all the AR-REPLICATORs that send a Replicator-AR | tunnels to all the AR-REPLICATORs that send a Replicator-AR | |||
route with L=1. The AR-IP addresses are used as tunnel | route with L = 1. The AR-IP addresses are used as tunnel | |||
destination IP. | destination IP addresses. | |||
- Some of the overlay tunnels in the flood-lists MAY be flagged as | * Some of the overlay tunnels in the flooding lists MAY be flagged | |||
non-BM receivers based on the BM flag received from the remote | as non-BM receivers based on the BM flag received from the remote | |||
nodes in the routes. | nodes in the routes. | |||
- When a Selective AR-REPLICATOR receives a BM packet on an | * When a selective AR-REPLICATOR receives a BM packet on an AC, it | |||
Attachment Circuit, it MUST forward the BM packet to its flood- | MUST forward the BM packet to its flooding list #1, skipping the | |||
list #1, skipping the non-BM overlay tunnels. | non-BM overlay tunnels. | |||
- When a Selective AR-REPLICATOR receives a BM packet on an overlay | * When a selective AR-REPLICATOR receives a BM packet on an overlay | |||
tunnel, it will check the destination and source IPs of the | tunnel, it will check the destination and source IPs of the | |||
underlay IP header and: | underlay IP header and: | |||
o If the destination IP address matches its AR-IP and the source | - If the destination IP address matches its AR-IP and the source | |||
IP address matches an IP of its own Selective AR-LEAF-set, the | IP address matches an IP of its own selective AR-LEAF-set, the | |||
AR-REPLICATOR MUST forward the BM packet to its flood-list #2, | AR-REPLICATOR MUST forward the BM packet to its flooding list | |||
unless some AR-REPLICATOR within the BD has advertised L=0. In | #2, unless some AR-REPLICATOR within the BD has advertised L = | |||
the latter case, the node reverts back to non-selective mode | 0. In the latter case, the node reverts to Non-selective mode, | |||
and flood-list #1 MUST be used. Non-BM overlay tunnels are | and flooding list #1 MUST be used. Non-BM overlay tunnels are | |||
skipped when sending BM packets. | skipped when sending BM packets. | |||
o If the destination IP address matches its AR-IP and the source | - If the destination IP address matches its AR-IP and the source | |||
IP address does not match any IP address of its Selective AR- | IP address does not match any IP address of its selective AR- | |||
LEAF-set, the AR-REPLICATOR MUST forward the BM packet to | LEAF-set, the AR-REPLICATOR MUST forward the BM packet to | |||
flood-list #2 but skipping the AR-REPLICATOR-set. Non-BM | flooding list #2, skipping the AR-REPLICATOR-set. Non-BM | |||
overlay tunnels are skipped when sending BM packets. | overlay tunnels are skipped when sending BM packets. | |||
o If the destination IP address matches its IR-IP, the AR- | - If the destination IP address matches its IR-IP, the AR- | |||
REPLICATOR MUST use flood-list #1 but MUST skip all the overlay | REPLICATOR MUST use flooding list #1 but MUST skip all the | |||
tunnels from the flooding list, i.e. it will only replicate to | overlay tunnels from the flooding list, i.e., it will only | |||
local Attachment Circuits. This is the regular-IR behavior | replicate to local ACs. This is the regular ingress | |||
described in [RFC7432]. Non-BM overlay tunnels are skipped | replication behavior described in [RFC7432]. Non-BM overlay | |||
when sending BM packets. | tunnels are skipped when sending BM packets. | |||
- In any case, the AR-REPLICATOR ensures the traffic is not sent | * In any case, the AR-REPLICATOR ensures that the traffic is not | |||
back to the originating source. If the encapsulation is MPLSoGRE | sent back to the originating source. If the encapsulation is | |||
or MPLSoUDP and the received BD label (the label that the AR- | MPLSoGRE or MPLSoUDP and the received BD label (the label that the | |||
REPLICATOR advertised in the Replicator-AR route) is not the | AR-REPLICATOR advertised in the Replicator-AR route) is not at the | |||
bottom of the stack, the AR-REPLICATOR MUST copy the rest of the | bottom of the stack, the AR-REPLICATOR MUST copy the rest of the | |||
labels when forwarding them to the egress overlay tunnels. | labels when forwarding them to the egress overlay tunnels. | |||
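The overlay-ingress rules for a selective AR-REPLICATOR reduce to a lookup on the underlay destination and source IP addresses. The Python sketch below is illustrative only. The class and function names are hypothetical, overlay tunnels are identified by their destination IP address, and both flooding lists are assumed to have been built already as described earlier in Section 6.1. Copying the remaining MPLS labels for MPLSoGRE/MPLSoUDP is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SelectiveReplicatorState:
    """Hypothetical per-BD state of one selective AR-REPLICATOR.

    Overlay tunnels are identified by their destination IP address.
    """
    ir_ip: str
    ar_ip: str
    local_acs: frozenset
    flooding_list_1: frozenset   # overlay tunnels to remote IR-IPs
    flooding_list_2: frozenset   # AR-LEAF-set plus AR-REPLICATOR-set tunnels
    leaf_set: frozenset          # IR-IPs of the selective AR-LEAF-set
    replicator_set: frozenset    # AR-IPs of the selective AR-REPLICATOR-set
    non_bm: frozenset = frozenset()  # tunnels flagged as non-BM receivers
    any_replicator_l0: bool = False  # some AR-REPLICATOR in the BD sent L = 0


def forward_bm_from_overlay(st: SelectiveReplicatorState, dst_ip: str, src_ip: str):
    """Return (ACs, overlay tunnels) for a BM packet received on an overlay tunnel."""
    if dst_ip == st.ir_ip:
        # Regular ingress replication: replicate to local ACs only.
        return set(st.local_acs), set()
    if dst_ip != st.ar_ip:
        raise ValueError("packet is not addressed to this node")
    if src_ip in st.leaf_set:
        # Revert to non-selective mode if any AR-REPLICATOR advertised L = 0.
        chosen = st.flooding_list_1 if st.any_replicator_l0 else st.flooding_list_2
        tunnels = set(chosen)
    else:
        # Source is not in our AR-LEAF-set: skip the AR-REPLICATOR-set.
        tunnels = set(st.flooding_list_2) - st.replicator_set
    tunnels -= st.non_bm       # skip non-BM overlay tunnels
    tunnels.discard(src_ip)    # never send back to the originating source
    return set(st.local_acs), tunnels
```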
6.2. Selective AR-LEAF Procedures | 6.2. Selective AR-LEAF Procedures | |||
A Selective AR-LEAF chooses a single Selective AR-REPLICATOR per BD | A selective AR-LEAF chooses a single selective AR-REPLICATOR per BD | |||
and: | and: | |||
- Sends all the BD's BM traffic to that AR-REPLICATOR and | * Sends all the BD's BM traffic to that AR-REPLICATOR and | |||
- Expects to receive all the BM traffic for a given BD from the same | * Expects to receive all the BM traffic for a given BD from the same | |||
AR-REPLICATOR (except for the BM traffic from the RNVEs, which | AR-REPLICATOR (except for the BM traffic from the RNVEs, which | |||
comes directly from the RNVEs) | comes directly from the RNVEs) | |||
In the example of Figure 5, we consider NVE1/NVE2/NVE3 as Selective | In the example in Figure 5, we consider NVE1/NVE2/NVE3 as selective | |||
AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that | AR-LEAFs. NVE1 selects PE1 as its selective AR-REPLICATOR. If that | |||
is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other | is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other | |||
AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic | AR-LEAFs/REPLICATORs send BM traffic, NVE1 will receive that traffic | |||
from PE1. These are the differences in the behavior of a Selective | from PE1. A selective AR-LEAF and a non-selective AR-LEAF behave | |||
AR-LEAF compared to a non-selective AR-LEAF: | differently, as follows: | |||
a. The AR-LEAF role selective capability SHOULD be an administrative | a. The selective AR-LEAF role SHOULD be an administrative choice in | |||
choice in any NVE/PE that is part of an Assisted-Replication- | any NVE/PE that is part of an AR-enabled BD. This administrative | |||
enabled BD. This administrative option to enable AR-LEAF | option to enable AR-LEAF capabilities MAY be implemented as a | |||
capabilities MAY be implemented as a system level option as | system-level option as opposed to a per-BD option. | |||
opposed to as per-BD option. | ||||
b. The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs | b. The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs | |||
in the BD. The Selective AR-LEAF MUST advertise a Leaf Auto- | in the BD. The selective AR-LEAF MUST advertise a Leaf A-D route | |||
Discovery route after receiving a Replicator-AR route with L=1. | after receiving a Replicator-AR route with L = 1. It is | |||
It is RECOMMENDED that the Selective AR-LEAF waits for an AR- | RECOMMENDED that the selective AR-LEAF wait for a period | |||
LEAF-join-wait-timer (in seconds, default value is 3) before | specified by an AR-LEAF-join-wait-timer (in seconds, with a | |||
sending the Leaf Auto-Discovery route, so that the AR-LEAF can | default value of 3) before sending the Leaf A-D route, so that | |||
collect all the Replicator-AR routes for the BD before | the AR-LEAF can collect all the Replicator-AR routes for the BD | |||
advertising the Leaf Auto-Discovery route. If the Replicator-AR | before advertising the Leaf A-D route. If the Replicator-AR | |||
route with L=1 is withdrawn, the corresponding Leaf Auto- | route with L = 1 is withdrawn, the corresponding Leaf A-D route | |||
Discovery route is withdrawn too. | is withdrawn too. | |||
c. In a service where there is more than one Selective AR-REPLICATOR | c. In a service where there is more than one selective AR- | |||
the Selective AR-LEAF MUST locally select a single Selective AR- | REPLICATOR, the selective AR-LEAF MUST locally select a single | |||
REPLICATOR for the BD. Once selected: | selective AR-REPLICATOR for the BD. Once selected: | |||
o The Selective AR-LEAF MUST send a Leaf Auto-Discovery route | * The selective AR-LEAF MUST send a Leaf A-D route, including | |||
including the Route-key and IP-address-specific route-target | the route key and IP-address-specific Route Target of the | |||
of the selected AR-REPLICATOR. | selected AR-REPLICATOR. | |||
o The Selective AR-LEAF MUST send all the BM packets received on | * The selective AR-LEAF MUST send all the BM packets received on | |||
the attachment circuits (ACs) for a given BD to that AR- | the ACs for a given BD to that AR-REPLICATOR. | |||
REPLICATOR. | ||||
o In case of a failure on the selected AR-REPLICATOR (detected | * In the case of failure of the selected AR-REPLICATOR (detected | |||
when the Replicator-AR route becomes infeasible as the result | when the Replicator-AR route becomes infeasible as a result of | |||
of any of the underlying BGP mechanisms), another AR- | any of the underlying BGP mechanisms), another AR-REPLICATOR | |||
REPLICATOR will be selected and a new Leaf Auto-Discovery | will be selected and a new Leaf A-D update will be issued for | |||
update will be issued for the new AR-REPLICATOR. This new | the new AR-REPLICATOR. This new route will update the | |||
route will update the selective list in the new Selective AR- | selective list in the new selective AR-REPLICATOR. In the | |||
REPLICATOR. In case of failure of the active Selective AR- | case of failure of the active selective AR-REPLICATOR, it is | |||
REPLICATOR, it is RECOMMENDED for the Selective AR-LEAF to | RECOMMENDED that the selective AR-LEAF revert to ingress | |||
revert to Ingress Replication behavior for a timer AR- | replication behavior for an AR-REPLICATOR-activation-timer (in | |||
REPLICATOR-activation-timer (in seconds, default value is 3) | seconds, with a default value of 3) to mitigate the traffic | |||
to mitigate the traffic impact. When the timer expires, the | impact. When the timer expires, the selective AR-LEAF will | |||
Selective AR-LEAF will resume its AR mode with the new | resume its AR mode with the new selective AR-REPLICATOR. The | |||
Selective AR-REPLICATOR. The AR-REPLICATOR-activation-timer | AR-REPLICATOR-activation-timer MAY be the same configurable | |||
MAY be the same configurable parameter as in Section 5.2. | parameter as the parameter discussed in Section 5.2. | |||
o A Selective AR-LEAF MAY change the AR-REPLICATOR(s) selection | * A selective AR-LEAF MAY change the selection of AR- | |||
dynamically, due to an administrative or policy configuration | REPLICATOR(s) dynamically due to an administrative or policy | |||
change. | configuration change. | |||
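The control-plane behavior of a selective AR-LEAF (join-wait timer, single AR-REPLICATOR selection, and fallback to ingress replication on failure) can be sketched as a small state machine. This is a hypothetical Python model, not an implementation from the RFC. Time is passed in explicitly, both timers use their 3-second defaults, and "lowest AR-IP wins" is an assumed local selection policy, since the text leaves the choice to the AR-LEAF.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional

JOIN_WAIT_S = 3.0   # AR-LEAF-join-wait-timer (default 3 seconds)
ACTIVATION_S = 3.0  # AR-REPLICATOR-activation-timer (default 3 seconds)


@dataclass
class SelectiveArLeaf:
    """Hypothetical per-BD control state of a selective AR-LEAF."""
    replicators: set = field(default_factory=set)  # AR-IPs with L = 1
    selected: Optional[str] = None       # target of our Leaf A-D route
    wait_until: Optional[float] = None   # expiry of the running timer

    def _pick(self) -> Optional[str]:
        # Assumed local policy: lowest AR-IP (same address family assumed).
        return min(self.replicators, key=ipaddress.ip_address, default=None)

    def on_replicator_ar(self, ar_ip: str, now: float) -> None:
        self.replicators.add(ar_ip)
        if self.selected is None and self.wait_until is None:
            # Collect all Replicator-AR routes before sending a Leaf A-D route.
            self.wait_until = now + JOIN_WAIT_S

    def on_replicator_ar_withdrawn(self, ar_ip: str, now: float) -> None:
        self.replicators.discard(ar_ip)
        if ar_ip == self.selected:
            # Select another AR-REPLICATOR (a new Leaf A-D update would be
            # issued here) and use ingress replication until the
            # activation timer expires.
            self.selected = self._pick()
            self.wait_until = now + ACTIVATION_S if self.selected else None

    def mode_at(self, now: float) -> str:
        """Advance timers, then return 'AR' or 'IR' for BM traffic from ACs."""
        if self.wait_until is not None and now >= self.wait_until:
            self.wait_until = None
            if self.selected is None:
                # Join-wait expired: select and advertise the Leaf A-D route.
                self.selected = self._pick()
        if self.selected is None or self.wait_until is not None:
            return "IR"
        return "AR"
```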
All the AR-LEAFs in a BD are expected to be configured as either | All the AR-LEAFs in a BD are expected to be configured as either | |||
selective or non-selective. A mix of selective and non-selective AR- | selective or non-selective. A mix of selective and non-selective AR- | |||
LEAFs SHOULD NOT coexist in the same BD. In case there is a non- | LEAFs SHOULD NOT coexist in the same BD. If a non-selective AR-LEAF | |||
selective AR-LEAF, its BM traffic sent to a selective AR-REPLICATOR | is present, its BM traffic sent to a selective AR-REPLICATOR will not | |||
will not be replicated to other AR-LEAFs that are not in its | be replicated to other AR-LEAFs that are not in its selective AR- | |||
Selective AR-LEAF-set. | LEAF-set. | |||
A Selective AR-LEAF MUST follow a data path implementation compatible | A selective AR-LEAF MUST follow a data path implementation compatible | |||
with the following rules: | with the following rules: | |||
- The Selective AR-LEAF nodes will build two flood-lists: | * The selective AR-LEAF nodes will build two flooding lists: | |||
1. Flood-list #1 - composed of Attachment Circuits and the | Flooding list #1: Composed of ACs and the overlay tunnel to the | |||
overlay tunnel to the selected AR-REPLICATOR (using the AR-IP | selected AR-REPLICATOR (using the AR-IP as the tunnel | |||
as the tunnel destination IP address). | destination IP address). | |||
2. Flood-list #2 - composed of Attachment Circuits and overlay | Flooding list #2: Composed of ACs and overlay tunnels to the | |||
tunnels to the remote IR-IP addresses. | remote IR-IP addresses. | |||
- Some of the overlay tunnels in the flood-lists MAY be flagged as | * Some of the overlay tunnels in the flooding lists MAY be flagged | |||
non-BM receivers based on the BM flag received from the remote | as non-BM receivers based on the BM flag received from the remote | |||
nodes in the routes. | nodes in the routes. | |||
- When an AR-LEAF receives a BM packet on an Attachment Circuit, it | * When an AR-LEAF receives a BM packet on an AC, it will check to | |||
will check if there is any selected AR-REPLICATOR. If there is, | see if an AR-REPLICATOR was selected; if one is found, flooding | |||
flood-list #1 MUST be used. Otherwise, flood-list #2 MUST be | list #1 MUST be used. Otherwise, flooding list #2 MUST be used. | |||
used. Non-BM overlay tunnels are skipped when sending BM packets. | Non-BM overlay tunnels are skipped when sending BM packets. | |||
- When an AR-LEAF receives a BM packet on an overlay tunnel, it MUST | * When an AR-LEAF receives a BM packet on an overlay tunnel, it MUST | |||
forward the BM packet to its local Attachment Circuits and never | forward the BM packet to its local ACs and never to an overlay | |||
to an overlay tunnel. This is the regular Ingress Replication | tunnel. This is the regular ingress replication behavior | |||
behavior described in [RFC7432]. | described in [RFC7432]. | |||
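The selective AR-LEAF data path rules above can be sketched as one hypothetical Python helper. Excluding the ingress AC from the flood is ordinary bridging behavior assumed here, not a rule stated in this section.

```python
from typing import Optional


def ar_leaf_bm_flood(local_acs: set, selected_ar_ip: Optional[str],
                     remote_ir_ips: set, non_bm: frozenset = frozenset(),
                     ingress_ac: Optional[str] = None) -> tuple:
    """Return (ACs, overlay tunnel destinations) for a BM packet at a selective AR-LEAF.

    ingress_ac is the AC the packet arrived on, or None if it arrived on an
    overlay tunnel.
    """
    if ingress_ac is None:
        # From the overlay: regular ingress replication, local ACs only.
        return set(local_acs), set()
    acs = set(local_acs) - {ingress_ac}   # assumed: no flooding back to the ingress AC
    if selected_ar_ip is not None:
        tunnels = {selected_ar_ip}        # flooding list #1 (AR-IP of the selected node)
    else:
        tunnels = set(remote_ir_ips)      # flooding list #2 (remote IR-IPs)
    return acs, tunnels - set(non_bm)     # skip non-BM overlay tunnels
```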
7. Pruned-Flood-Lists (PFL) | 7. Pruned Flooding Lists (PFLs) | |||
In addition to AR, the second optimization supported by this solution | In addition to AR, the second optimization supported by the ingress | |||
is the ability for the all the BD nodes to signal Pruned-Flood-Lists | replication optimization solution specified in this document is the | |||
(PFL). As described in Section 4, an EVPN node can signal a given | ability of all the BD nodes to signal PFLs. As described in | |||
value for the BM and U Pruned-Food-Lists flags in the Regular-IR, | Section 4, an EVPN node can signal a given value for the BM and U | |||
Replicator-AR or Leaf Auto-Discovery routes, where: | PFLs flags in the Regular-IR, Replicator-AR, or Leaf A-D routes, | |||
where: | ||||
- BM is the Broadcast and Multicast flag. BM=1 means "prune-me" | * BM is the Broadcast and Multicast flag. BM = 1 means "prune me | |||
from the BM flood-list. BM=0 means regular behavior. | from the BM flooding list". BM = 0 indicates regular behavior. | |||
- U is the Unknown flag. U=1 means "prune-me" from the Unknown | * U is the Unknown flag. U = 1 means "prune me from the Unknown | |||
flood-list. U=0 means regular behavior. | flooding list". U = 0 indicates regular behavior. | |||
The ability to signal and process these Pruned-Flood-Lists flags | The ability to signal and process these PFLs flags SHOULD be an | |||
SHOULD be an administrative choice. If a node is configured to | administrative choice. If a node is configured to process the PFLs | |||
process the Pruned-Flood-Lists flags, upon receiving a non-zero | flags, upon receiving a non-zero PFLs flag for a route, an NVE/PE | |||
Pruned-Flood-Lists flag for a route, the NVE/PE will add the | will add the corresponding flag to the created overlay tunnel in the | |||
corresponding flag to the created overlay tunnel in the flood-list. | flooding list. When replicating a BM packet in the context of a | |||
When replicating a BM packet in the context of a flood-list, the NVE/ | flooding list, the NVE/PE will skip the overlay tunnels marked with | |||
PE will skip the overlay tunnels marked with the flag BM=1, since the | the flag BM = 1, since the NVEs/PEs at the end of those tunnels are | |||
NVE/PE at the end of those tunnels are not expecting BM packets. | not expecting BM packets. Similarly, when replicating unknown | |||
Similarly, when replicating Unknown unicast packets, the NVE/PE will | unicast packets, the NVE/PE will skip the overlay tunnels marked with | |||
skip the overlay tunnels marked with U=1. | U = 1. | |||
An NVE/PE not following this document or not configured for this | An NVE/PE not following this document or not configured for this | |||
optimization will ignore any of the received Pruned-Flood-Lists | optimization will ignore any of the received PFLs flags. An AR-LEAF | |||
flags. An AR-LEAF or RNVE receiving BUM traffic on an overlay tunnel | or RNVE receiving BUM traffic on an overlay tunnel MUST replicate the | |||
MUST replicate the traffic to its local Attachment Circuits, | traffic to its local ACs, regardless of the BM/U flags on the overlay | |||
regardless of the BM/U flags on the overlay tunnels. | tunnels. | |||
This optimization MAY be used along with the Assisted-Replication | This optimization MAY be used along with the Assisted Replication | |||
solution. | solution. | |||
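A minimal sketch of the PFLs check on the sending side, assuming each overlay tunnel in a flooding list records the BM and U flags received from its remote node. The type and function names are hypothetical. Pruning only changes what the sender replicates; a receiving AR-LEAF or RNVE still floods to its local ACs regardless of these flags.

```python
from dataclasses import dataclass
from enum import Enum


class Traffic(Enum):
    BM = "broadcast/multicast"
    UNKNOWN = "unknown unicast"


@dataclass(frozen=True)
class OverlayTunnel:
    remote_ip: str
    bm: bool = False  # remote signaled BM = 1: prune me from the BM flooding list
    u: bool = False   # remote signaled U = 1: prune me from the Unknown flooding list


def prune(flooding_list, traffic: Traffic, process_pfl: bool = True) -> set:
    """Return the remote IPs that should receive a copy of the packet."""
    if not process_pfl:
        # A node not configured for PFLs ignores the received flags.
        return {t.remote_ip for t in flooding_list}
    if traffic is Traffic.BM:
        return {t.remote_ip for t in flooding_list if not t.bm}
    return {t.remote_ip for t in flooding_list if not t.u}
```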
7.1. A Pruned-Flood-List Example | 7.1. Example of a Pruned Flooding List | |||
In order to illustrate the use of the solution described in this | In order to illustrate the use of the PFLs solution, we will assume | |||
document, we will assume that BD-1 in Figure 4 is optimized Ingress | that BD-1 in Figure 4 is optimized ingress replication enabled and: | |||
Replication enabled and: | ||||
- PE1 and PE2 are administratively configured as AR-REPLICATORs, due | * PE1 and PE2 are administratively configured as AR-REPLICATORs due | |||
to their high-performance replication capabilities. PE1 and PE2 | to their high-performance replication capabilities. PE1 and PE2 | |||
will send a Replicator-AR route with BM/U flags = 00. | will send a Replicator-AR route with BM/U flags = 00. | |||
- NVE1 and NVE3 are administratively configured as AR-LEAF nodes, | * NVE1 and NVE3 are administratively configured as AR-LEAF nodes due | |||
due to their low-performance software-based replication | to their low-performance software-based replication capabilities. | |||
capabilities. They will advertise a Regular-IR route with type | They will advertise a Regular-IR route with type AR-LEAF. | |||
AR-LEAF. Assuming both NVEs advertise all the attached Virtual | Assuming that both NVEs advertise all of the attached VMs' MAC and | |||
Machines MAC and IP addresses in EVPN as soon as they come up, and | IP addresses in EVPNs as soon as they come up and these NVEs do | |||
these NVEs do not have any Virtual Machines interested in | not have any VMs interested in multicast applications, they will | |||
multicast applications, they will be configured to signal BM/U | be configured to signal BM/U flags = 11 for BD-1. That is, | |||
flags = 11 for BD-1. That is, neither NVE1 nor NVE3 are | neither NVE1 nor NVE3 is interested in receiving BM or unknown | |||
interested in receiving BM or Unknown Unicast traffic since: | unicast traffic, since: | |||
o Their attached VMs (VM11, VM12, VM31, VM32) do not support | - Their attached VMs (VM11, VM12, VM31, VM32) do not support | |||
multicast applications. | multicast applications. | |||
o Their attached VMs will not receive ARP Requests. Proxy-ARP | - Their attached VMs will not receive ARP Requests. Proxy ARP | |||
[I-D.ietf-bess-evpn-proxy-arp-nd] on the remote NVE/PEs will | [RFC9161] on the remote NVEs/PEs will reply to ARP Requests | |||
reply ARP Requests locally, and no other Broadcast is expected. | locally, and no other broadcast traffic is expected. | |||
o Their attached VMs will not receive unknown unicast traffic, | - Their attached VMs will not receive unknown unicast traffic, | |||
since the VMs' MAC and IP addresses are always advertised by | since the VMs' MAC and IP addresses are always advertised by | |||
EVPN as long as the VMs are active. | EVPNs as long as the VMs are active. | |||
- NVE2 is optimized Ingress Replication unaware; therefore it takes | * NVE2 is optimized ingress replication unaware; therefore, it takes | |||
on the RNVE role in BD-1. | on the RNVE role in BD-1. | |||
Based on the above assumptions the following forwarding behavior will | Based on the above assumptions, the following forwarding behavior | |||
take place: | will take place: | |||
1. Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1 | 1. Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1 | |||
will forward further the BM packets to TS1, WAN link, PE2 and | will then forward the BM packets on to TS1, the WAN link, PE2, | |||
NVE2, but not to NVE3. PE2 and NVE2 will replicate the BM | and NVE2 but not to NVE3. PE2 and NVE2 will replicate the BM | |||
packets to their local Attachment Circuits but we will avoid NVE3 | packets to their local ACs, but NVE3 will be prevented from | |||
having to replicate unnecessarily those BM packets to VM31 and | having to replicate those BM packets to VM31 and VM32 | |||
VM32. | unnecessarily. | |||
2. Any BM packets received on PE2 from the WAN will be sent to PE1 | 2. Any BM packets received on PE2 from the WAN will be sent to PE1 | |||
and NVE2, but not to NVE1 and NVE3, sparing the two hypervisors | and NVE2 but not to NVE1 and NVE3, sparing the two hypervisors | |||
from replicating unnecessarily to their local Virtual Machines. | from replicating unnecessarily to their local VMs. PE1 and NVE2 | |||
PE1 and NVE2 will replicate to their local Attachment Circuits | will replicate to their local ACs only. | |||
only. | ||||
3. Any Unknown unicast packet sent from VM31 will be forwarded by | 3. Any unknown unicast packet sent from VM31 will be forwarded by | |||
NVE3 to NVE2, PE1 and PE2 but not NVE1. The solution avoids the | NVE3 to NVE2, PE1, and PE2 but not to NVE1. The solution | |||
unnecessary replication to NVE1, since the destination of the | prevents unnecessary replication to NVE1, since the destination | |||
unknown traffic cannot be at NVE1. | of the unknown traffic cannot be NVE1. | |||
4. Any Unknown unicast packet sent from TS1 will be forwarded by PE1 | 4. Any unknown unicast packet sent from TS1 will be forwarded by PE1 | |||
to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the | to the WAN link, PE2, and NVE2 but not to NVE1 and NVE3, since | |||
target of the unknown traffic cannot be at those NVEs. | the target of the unknown traffic cannot be NVE1 or NVE3. | |||
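The overlay fan-out in items 1 to 4 can be checked against a small table of the BM/U flags assumed in this example. The helper below is hypothetical and only computes which remote nodes receive an overlay copy; local ACs, TS1, and the WAN links are not modeled. NVE2, an RNVE that signals no flags, is treated as BM/U = 00, and PE1 in item 1 is assumed to act as a non-selective AR-REPLICATOR that excludes the originating NVE1.

```python
# BM/U flags per node in BD-1 as assumed in this example. NVE2 is an RNVE
# that signals no flags, which is treated here like BM/U = 00.
FLAGS = {"PE1": (0, 0), "PE2": (0, 0), "NVE1": (1, 1), "NVE2": (0, 0), "NVE3": (1, 1)}


def overlay_copies(sender: str, kind: str, exclude=frozenset()) -> set:
    """Remote nodes that get an overlay copy of a 'BM' or 'U' packet from sender.

    exclude lists nodes that must not get a copy anyway, e.g. the AR-LEAF
    on whose behalf an AR-REPLICATOR is replicating.
    """
    bit = 0 if kind == "BM" else 1
    return {node for node, flags in FLAGS.items()
            if node != sender and node not in exclude and not flags[bit]}


assert overlay_copies("PE1", "BM", exclude={"NVE1"}) == {"PE2", "NVE2"}  # item 1
assert overlay_copies("PE2", "BM") == {"PE1", "NVE2"}                     # item 2
assert overlay_copies("NVE3", "U") == {"NVE2", "PE1", "PE2"}              # item 3
assert overlay_copies("PE1", "U") == {"PE2", "NVE2"}                      # item 4
```

Unknown unicast in item 3 is sent directly by NVE3, because Assisted Replication applies only to BM traffic.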
8. AR Procedures for Single-IP AR-REPLICATORS | 8. AR Procedures for Single-IP AR-REPLICATORS | |||
The procedures explained in sections Section 5 and Section 6 assume | The procedures explained in Sections 5 and 6 assume that the AR- | |||
that the AR-REPLICATOR can use two local routable IP addresses to | REPLICATOR can use two local routable IP addresses to terminate and | |||
terminate and originate Network Virtualization Overlay tunnels, i.e. | originate NVO tunnels, i.e., IR-IP and AR-IP addresses. This is | |||
IR-IP and AR-IP addresses. This is usually the case for PE-based AR- | usually the case for PE-based AR-REPLICATOR nodes. | |||
REPLICATOR nodes. | ||||
In some cases, the AR-REPLICATOR node does not support more than one | In some cases, the AR-REPLICATOR node does not support more than one | |||
IP address to terminate and originate Network Virtualization Overlay | IP address to terminate and originate NVO tunnels, i.e., the IR-IP | |||
tunnels, i.e. the IR-IP and AR-IP are the same IP addresses. This | and AR-IP are the same IP addresses. This may be the case in some | |||
may be the case in some software-based or low-end AR-REPLICATOR | software-based or low-end AR-REPLICATOR nodes. If this is the case, | |||
nodes. If this is the case, the procedures in sections Section 5 and | the procedures provided in Sections 5 and 6 MUST be modified in the | |||
Section 6 MUST be modified in the following way: | following way: | |||
- The Replicator-AR routes generated by the AR-REPLICATOR use an AR- | * The Replicator-AR routes generated by the AR-REPLICATOR use an AR- | |||
IP that will match its IR-IP. In order to differentiate the data | IP that will match its IR-IP. In order to differentiate the data | |||
plane packets that need to use Ingress Replication from the | plane packets that need to use ingress replication from the | |||
packets that must use Assisted Replication forwarding mode, the | packets that must use Assisted Replication forwarding mode, the | |||
Replicator-AR route MUST advertise a different VNI/VSID than the | Replicator-AR route MUST advertise a different VNI/VSID than the | |||
one used by the Regular-IR route. For instance, the AR-REPLICATOR | one used by the Regular-IR route. For instance, the AR-REPLICATOR | |||
will advertise AR-VNI along with the Replicator-AR route and IR- | will advertise an AR-VNI along with the Replicator-AR route and an | |||
VNI along with the Regular-IR route. Since both routes have the | IR-VNI along with the Regular-IR route. Since both routes have | |||
same key, different Route Distinguishers are needed in each route. | the same key, different Route Distinguishers are needed in each | |||
route. | ||||
- An AR-REPLICATOR will perform Ingress Replication or Assisted | * An AR-REPLICATOR will perform Ingress Replication forwarding mode | |||
Replication forwarding mode for the incoming Overlay packets based | or Assisted Replication forwarding mode for the incoming overlay | |||
on an ingress VNI lookup, as opposed to the tunnel IP DA lookup. | packets based on an ingress VNI lookup as opposed to the tunnel IP | |||
Note that, when replicating to remote AR-REPLICATOR nodes, the use | DA lookup. Note that when replicating to remote AR-REPLICATOR | |||
of the IR-VNI or AR-VNI advertised by the egress node will | nodes, the use of the IR-VNI or AR-VNI advertised by the egress | |||
determine the Ingress Replication or Assisted Replication | node will determine whether Ingress Replication forwarding mode or | |||
forwarding mode at the subsequent AR-REPLICATOR. | Assisted Replication forwarding mode is used at the subsequent AR- | |||
REPLICATOR. | ||||
The rest of the procedures will follow what is described in sections | The rest of the procedures will follow those described in Sections 5 | |||
Section 5 and Section 6. | and 6. | |||
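For a single-IP AR-REPLICATOR, the forwarding mode is keyed on the received VNI rather than the tunnel IP DA. The sketch below uses hypothetical names and VNI values (5000 as the IR-VNI, 5001 as the AR-VNI) and only illustrates the lookup; the distinct Route Distinguishers required for the two routes are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleIpReplicator:
    """Hypothetical AR-REPLICATOR whose IR-IP and AR-IP are the same address."""
    ip: str      # used as both IR-IP and AR-IP
    ir_vni: int  # advertised with the Regular-IR route
    ar_vni: int  # advertised with the Replicator-AR route

    def mode_for(self, dst_ip: str, vni: int) -> str:
        """Select the forwarding mode from the ingress VNI, not the tunnel DA."""
        if dst_ip != self.ip:
            raise ValueError("packet is not addressed to this node")
        if vni == self.ar_vni:
            return "AR"  # Assisted Replication forwarding mode
        if vni == self.ir_vni:
            return "IR"  # Ingress Replication forwarding mode
        raise LookupError(f"VNI {vni} is not configured in this BD")


def vni_toward(peer: SingleIpReplicator, want_ar_at_peer: bool) -> int:
    """The VNI used toward a peer AR-REPLICATOR selects its forwarding mode."""
    return peer.ar_vni if want_ar_at_peer else peer.ir_vni
```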
9. AR Procedures and EVPN All-Active Multi-homing Split-Horizon | 9. AR Procedures and EVPN All-Active Multihoming Split-Horizon | |||
This section extends the procedures for the cases where two or more | This section extends the procedures for the cases where two or more | |||
AR-LEAF nodes are attached to the same Ethernet Segment, and two or | AR-LEAF nodes are attached to the same ES and two or more AR- | |||
more AR-REPLICATOR nodes are attached to the same Ethernet Segment in | REPLICATOR nodes are attached to the same ES in the BD. The mixed | |||
the BD. The mixed case, that is, an AR-LEAF node and an AR- | case -- where an AR-LEAF node and an AR-REPLICATOR node are attached | |||
REPLICATOR node are attached to the same Ethernet Segment, would | to the same ES -- would require extended procedures that are out of | |||
require extended procedures and it is out of scope. | scope for this document. | |||
9.1. Ethernet Segments on AR-LEAF Nodes | 9.1. Ethernet Segments on AR-LEAF Nodes | |||
If VXLAN or NVGRE are used, and if the Split-horizon is based on the | If a VXLAN or NVGRE is used and if the split-horizon is based on the | |||
tunnel IP Source Address and "Local-Bias" as described in [RFC8365], | tunnel source IP address and "local bias" as described in [RFC8365], | |||
the Split-horizon check will not work if there is an Ethernet-Segment | the split-horizon check will not work if an ES is shared between two | |||
shared between two AR-LEAF nodes, and the AR-REPLICATOR replaces the | AR-LEAF nodes, and the AR-REPLICATOR replaces the tunnel source IP | |||
tunnel IP Source Address of the packets with its own AR-IP. | address of the packets with its own AR-IP. | |||
In order to be compatible with the IP Source Address split-horizon | In order to be compatible with the source IP address split-horizon | |||
check, the AR-REPLICATOR MAY keep the original received tunnel IP | check, the AR-REPLICATOR MAY keep the original received tunnel source | |||
Source Address when replicating packets to a remote AR-LEAF or RNVE. | IP address when replicating packets to a remote AR-LEAF or RNVE. | |||
This will allow AR-LEAF nodes to apply Split-horizon check procedures | This will allow AR-LEAF nodes to apply split-horizon check procedures | |||
for BM packets, before sending them to the local Ethernet-Segment. | for BM packets before sending them to the local ES. Even if the AR- | |||
Even if the AR-LEAF's IP Source Address is preserved when replicating | LEAF's source IP address is preserved when replicating to AR-LEAFs or | |||
to AR-LEAFs or RNVEs, the AR-REPLICATOR MUST always use its IR-IP as | RNVEs, the AR-REPLICATOR MUST always use its IR-IP as the source IP | |||
the IP Source Address when replicating to other AR-REPLICATORs. | address when replicating to other AR-REPLICATORs. | |||
When EVPN is used for MPLS over GRE (or UDP), the ESI-label based | When EVPNs are used for MPLSoGRE or MPLSoUDP, the ESI-label-based | |||
split-horizon procedure as in [RFC7432] will not work for multi-homed | split-horizon procedure provided in [RFC7432] will not work for | |||
Ethernet-Segments defined on AR-LEAF nodes. "Local-Bias" is | multihomed ESs defined on AR-LEAF nodes. Local bias is recommended | |||
recommended in this case, as in the case of VXLAN or NVGRE explained | in this case, as it is in the case of a VXLAN or NVGRE as explained | |||
above. The "Local-Bias" and tunnel IP Source Address preservation | above. The local-bias and tunnel source IP address preservation | |||
mechanisms provide the required split-horizon behavior in non- | mechanisms provide the required split-horizon behavior in non- | |||
selective or selective AR. | selective or selective AR. | |||
Note that if the AR-REPLICATOR implementation keeps the received | Note that if the AR-REPLICATOR implementation keeps the received | |||
tunnel IP Source Address, the use of uRPF (unicast Reverse Path | tunnel source IP address, the use of unicast Reverse Path Forwarding | |||
Forwarding) checks in the IP fabric based on the tunnel IP Source | (uRPF) checks in the IP fabric based on the tunnel source IP address | |||
Address MUST be disabled. | MUST be disabled. | |||
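The following sketch shows why preserving the AR-LEAF's tunnel source IP address matters for the local-bias check. It assumes, per our reading of [RFC8365], that BM traffic received from the overlay is not sent to local ESs shared with the originating node and is sent to the remaining ESs only where the receiver is the DF. ES names, addresses, and the function name are hypothetical.

```python
def leaf_es_targets(local_es_peers: dict, df_es: set, src_ip: str) -> set:
    """Local ESs an AR-LEAF floods a BM packet to when it arrives from the overlay.

    local_es_peers maps each local ES to the IR-IPs of the other nodes
    attached to it; df_es holds the local ESs for which this node is DF.
    """
    return {
        es
        for es, peers in local_es_peers.items()
        if es in df_es and src_ip not in peers  # local-bias split-horizon check
    }


# Hypothetical scenario at NVE3: ES-1 is shared with NVE1 (IR-IP 192.0.2.11),
# which already flooded the packet to ES-1 under local bias.
es_peers = {"ES-1": {"192.0.2.11"}, "ES-2": set()}
df = {"ES-1", "ES-2"}
preserved = leaf_es_targets(es_peers, df, src_ip="192.0.2.11")  # {"ES-2"}
rewritten = leaf_es_targets(es_peers, df, src_ip="10.0.1.1")    # AR-IP: ES-1 gets a duplicate
```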
9.2. Ethernet Segments on AR-REPLICATOR nodes | 9.2. Ethernet Segments on AR-REPLICATOR Nodes | |||
AR-REPLICATOR nodes attached to the same all-active Ethernet Segment | AR-REPLICATOR nodes attached to the same all-active ES will follow | |||
will follow "Local-Bias" procedures [RFC8365], as follows: | local-bias procedures [RFC8365] as follows: | |||
a. For BUM traffic received on a local AR-REPLICATOR's Attachment | a. For BUM traffic received on a local AR-REPLICATOR's AC, local- | |||
Circuit, "Local-Bias" procedures as in [RFC8365] MUST be | bias procedures as provided in [RFC8365] MUST be followed. | |||
followed. | ||||
b. For BUM traffic received on an AR-REPLICATOR overlay tunnel with | b. For BUM traffic received on an AR-REPLICATOR overlay tunnel with | |||
AR-IP as the IP Destination Address, "Local-Bias" MUST also be | AR-IP as the IP DA, local bias MUST also be followed. That is, | |||
followed. That is, traffic received with AR-IP as IP Destination | traffic received with AR-IP as the IP DA will be treated as | |||
Address will be treated as though it had been received on a local | though it had been received on a local AC that is part of the ES | |||
Attachment Circuit that is part of the Ethernet Segment and will | and will be forwarded to all local ESs, irrespective of their DF | |||
be forwarded to all local Ethernet Segments, irrespective of | or NDF state. | |||
their DF or NDF state. | ||||
c. BUM traffic received on an AR-REPLICATOR overlay tunnel with IR- | c. BUM traffic received on an AR-REPLICATOR overlay tunnel with IR- | |||
IP as the IP Destination Address, will follow regular [RFC8365] | IP as the IP DA will follow regular local-bias rules [RFC8365] | |||
"Local-Bias" rules and will not be forwarded to local Ethernet | and will not be forwarded to local ESs that are shared with the | |||
Segments that are shared with the AR-LEAF or AR-REPLICATOR | AR-LEAF or AR-REPLICATOR originating the traffic. | |||
originating the traffic. | ||||
d. In cases where the AR-REPLICATOR supports a single IP address, | d. In cases where the AR-REPLICATOR supports a single IP address, | |||
the IR-IP and the AR-IP are the same IP address, as discussed in | the IR-IP and the AR-IP are the same IP address, as discussed in | |||
Section 8. The received BUM traffic will be treated as in 'b' | Section 8. The received BUM traffic will be treated as specified | |||
above if the received VNI is the AR-VNI, and as in 'c' if the VNI | in item b above if the received VNI is the AR-VNI and as | |||
is the IR-VNI. | specified in item c if the VNI is the IR-VNI. | |||
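Rules a through d amount to a choice between two local-bias behaviors. In this hypothetical Python sketch, 'ac' stands for traffic from a local AC (rule a), 'ar' for traffic addressed to the AR-IP or carrying the AR-VNI (rules b and d), and 'ir' for traffic addressed to the IR-IP or carrying the IR-VNI (rules c and d). Not flooding back to the ES a packet arrived on is assumed ordinary bridging behavior, and the DF handling for 'ir' follows our reading of [RFC8365].

```python
from typing import Optional


def ingress_kind(dst_ip: str, vni: int, ir_ip: str, ar_ip: str, ar_vni: int) -> str:
    """Classify BUM traffic received from the overlay as 'ar' or 'ir'."""
    if ir_ip == ar_ip:
        # Single-IP AR-REPLICATOR (Section 8): the VNI decides (rule d).
        return "ar" if vni == ar_vni else "ir"
    return "ar" if dst_ip == ar_ip else "ir"


def replicator_es_targets(local_es_peers: dict, df_es: set, kind: str,
                          src_ip: Optional[str] = None,
                          ingress_es: Optional[str] = None) -> set:
    """Local ESs an AR-REPLICATOR floods a BUM packet to (rules a to c)."""
    all_es = set(local_es_peers)
    if kind in ("ac", "ar"):
        # Treat as a local AC: every local ES irrespective of DF/NDF state,
        # except the ES the packet arrived on (if any).
        return all_es - {ingress_es}
    if kind == "ir":
        # Regular local bias: skip ESs shared with the originator; DF elsewhere.
        return {es for es in all_es & df_es if src_ip not in local_es_peers[es]}
    raise ValueError(f"unknown ingress kind: {kind}")
```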
10. Security Considerations | 10. Security Considerations | |||
The Security Considerations in [RFC7432] and [RFC8365] apply to this | The security considerations in [RFC7432] and [RFC8365] apply to this | |||
document. The Security Considerations related to the Leaf Auto- | document. The security considerations related to the Leaf A-D route | |||
Discovery route in [I-D.ietf-bess-evpn-bum-procedure-updates] apply | in [RFC9572] apply too. | |||
too. | ||||
In addition, the Assisted-Replication method introduced by this | In addition, the Assisted Replication method introduced by this | |||
document may bring some new risks for the successful delivery of BM | document may introduce some new risks that could affect the | |||
traffic. Unicast traffic is not affected by Assisted-Replication | successful delivery of BM traffic. Unicast traffic is not affected | |||
(although Unknown unicast traffic is affected by the Pruned-Flood- | by Assisted Replication (although unknown unicast traffic is affected | |||
Lists procedures). The forwarding of Broadcast and Multicast (BM) | by the procedures for PFLs). The forwarding of BM traffic is | |||
traffic is modified, and BM traffic from the AR-LEAF nodes will be | modified, and BM traffic from the AR-LEAF nodes will be drawn toward | |||
attracted by the existence of AR-REPLICATORs in the BD. An AR-LEAF | AR-REPLICATORs in the BD. An AR-LEAF will forward BM traffic to its | |||
will forward BM traffic to its selected AR-REPLICATOR, therefore an | selected AR-REPLICATOR; therefore, an attack on the AR-REPLICATOR | |||
attack on the AR-REPLICATOR could impact the delivery of the BM | could impact the delivery of the BM traffic using that node. Also, | |||
traffic using that node. Also, an attack on the AR-REPLICATOR and | an attack on the AR-REPLICATOR and any change to the advertised AR | |||
change of the advertised AR type will modify the selection on the AR- | type will modify the selections made by the AR-LEAF nodes. If no | |||
LEAF nodes. If no other AR-REPLICATOR is selected, the AR-LEAF nodes | other AR-REPLICATOR is selected, the AR-LEAF nodes will be forced to | |||
will be forced to use Ingress Replication forwarding mode, which will | use Ingress Replication forwarding mode, which will impact their | |||
impact on their performance, since the AR-LEAF nodes are usually | performance, since the AR-LEAF nodes are usually NVEs/PEs with poor | |||
NVEs/PEs with poor replication performance. | replication performance. | |||
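The performance risk described above comes from the AR-LEAF's fallback behaviour. The Python below is a hedged sketch that makes it concrete. It assumes the AR type values 0 (RNVE), 1 (AR-REPLICATOR) and 2 (AR-LEAF) and identifies each node by a single address. It also uses a hypothetical "lowest address wins" selection policy; this document does not mandate that policy.

```python
import ipaddress
from typing import Dict, List, Tuple

AR_REPLICATOR = 1  # assumed AR type value advertised by AR-REPLICATORs


def bm_copies(remote_ar_types: Dict[str, int]) -> Tuple[str, List[str]]:
    """Return the mode an AR-LEAF uses for BM traffic and the nodes it sends copies to.

    remote_ar_types maps each remote NVE/PE address in the BD to its advertised
    AR type (hypothetical input format).
    """
    replicators = [ip for ip, t in remote_ar_types.items() if t == AR_REPLICATOR]
    if replicators:
        # Hypothetical selection policy: the numerically lowest address wins.
        chosen = min(replicators, key=ipaddress.ip_address)
        return "assisted-replication", [chosen]
    # No AR-REPLICATOR available: ingress replication, one copy per remote node.
    return "ingress-replication", sorted(remote_ar_types, key=ipaddress.ip_address)
```

If an attacker changes the advertised type of the only AR-REPLICATOR, the AR-LEAF moves from the first branch to the second. Its copy count then grows from one to the number of remote nodes in the BD.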
This document introduces the ability for the AR-REPLICATOR to forward | This document introduces the ability of the AR-REPLICATOR to forward | |||
traffic received on an overlay tunnel to another overlay tunnel. The | traffic received on an overlay tunnel to another overlay tunnel. The | |||
reader may interpret that this introduces the risk of BM loops. That | reader may determine that this introduces the risk of BM loops -- | |||
is, an AR-LEAF receiving a BM encapsulated packet that the AR-LEAF | that is, an AR-LEAF receiving a BM-encapsulated packet that the AR- | |||
originated in the first place, due to one or two AR-REPLICATORs | LEAF originated in the first place due to one or two AR-REPLICATORs | |||
"looping" the BM traffic back to the AR-LEAF. The procedures in this | "looping" the BM traffic back to the AR-LEAF. Following the | |||
document prevent these BM loops, since the AR-REPLICATOR will always | procedures provided in this document will prevent these BM loops, | |||
forward the BM traffic using the correct tunnel IP Destination | since the AR-REPLICATOR will always forward the BM traffic using the | |||
Address (or correct VNI in case of single-IP AR-REPLICATORs) that | correct tunnel IP DA (or the correct VNI in the case of single-IP AR- | |||
instructs the remote nodes how to forward the traffic. This is true | REPLICATORs), which instructs the remote nodes regarding how to | |||
in both the Non-Selective and Selective modes defined in this | forward the traffic. This is true for both the Non-selective and | |||
document. However, a wrong implementation of the procedures in this | Selective modes defined in this document. However, incorrect | |||
document may lead to those unexpected BM loops. | implementation of the procedures provided in this document may lead | |||
to those unexpected BM loops. | ||||
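The loop-prevention argument can be checked with a small simulation. The Python below is a minimal sketch under these assumptions: a Non-selective AR-REPLICATOR replicates a BM packet received on its AR-IP to every other node except the sender, and it addresses each copy to that node's IR-IP. Any copy received on an IR-IP is delivered locally only and is never sent back into the overlay. Node names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def flood_from_leaf(leaf: str, replicator: str, nodes: List[str]) -> Counter:
    """Count overlay copies each node receives for one BM packet from an AR-LEAF."""
    received: Counter = Counter()
    # Work queue of overlay copies: (sender, receiver, kind of IP DA used).
    queue: List[Tuple[str, str, str]] = [(leaf, replicator, "AR-IP")]
    while queue:
        sender, receiver, da_kind = queue.pop()
        received[receiver] += 1
        if da_kind == "AR-IP":
            # Replicate once to every other node, addressed to its IR-IP.
            for node in nodes:
                if node not in (receiver, sender):
                    queue.append((receiver, node, "IR-IP"))
        # IR-IP copies are terminal: local delivery only, no overlay re-forwarding.
    return received
```

Copies addressed to an IR-IP never generate further overlay copies. The queue therefore drains after one replication stage, and the originating AR-LEAF never receives its own packet. An implementation that re-forwarded IR-IP traffic would break exactly this property.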
The Selective mode provides a multi-staged replication solution, | The Selective mode provides a multi-stage replication solution, where | |||
where a proper configuration of all the AR-REPLICATORs will avoid any | proper configuration of all the AR-REPLICATORs will prevent any | |||
issues. A mix of mistakenly configured Selective and Non-Selective | issues. A mix of mistakenly configured selective and non-selective | |||
AR-REPLICATORs in the same BD could theoretically create packet | AR-REPLICATORs in the same BD could theoretically create packet | |||
duplication in some AR-LEAFs, however this document specifies a fall | duplication in some AR-LEAFs; however, this document specifies a | |||
back solution to Non-Selective mode in case the AR-REPLICATORs | fallback solution -- falling back to Non-selective mode in cases | |||
advertised an inconsistent AR Replication mode. | where the AR-REPLICATORs advertised an inconsistent AR mode. | |||
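The fallback rule can be sketched as a single function. The Python below is illustrative only. It assumes that Selective mode is used in a BD only when every AR-REPLICATOR in that BD advertises Selective mode, and that any mix (or no advertisement) falls back to Non-selective mode. How the mode is signaled is outside this excerpt and is not modeled; `ArMode` and `effective_ar_mode` are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class ArMode(Enum):
    NON_SELECTIVE = "non-selective"
    SELECTIVE = "selective"


def effective_ar_mode(advertised: Iterable[ArMode]) -> ArMode:
    """Return the AR mode to use in a BD given the modes its AR-REPLICATORs advertise."""
    modes = set(advertised)
    if modes == {ArMode.SELECTIVE}:
        return ArMode.SELECTIVE
    # Inconsistent (or absent) advertisements: fall back to Non-selective mode.
    return ArMode.NON_SELECTIVE
```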
This document allows the AR-REPLICATOR to preserve the tunnel IP | This document allows the AR-REPLICATOR to preserve the tunnel source | |||
Source Address of the AR-LEAF (as an option) when forwarding BM | IP address of the AR-LEAF (as an option) when forwarding BM packets | |||
packets from an overlay tunnel to another overlay tunnel. Preserving | from an overlay tunnel to another overlay tunnel. Preserving the AR- | |||
the AR-LEAF IP Source Address makes the "Local Bias" filtering | LEAF source IP address makes the local-bias filtering procedures | |||
procedures possible for AR-LEAF nodes that are attached to the same | possible for AR-LEAF nodes that are attached to the same ES. If the | |||
Ethernet Segment. If the AR-REPLICATOR does not preserve the AR-LEAF | AR-REPLICATOR does not preserve the AR-LEAF source IP address, AR- | |||
IP Source Address, AR-LEAF nodes attached to all-active Ethernet | LEAF nodes attached to all-active ESs will cause packet duplication | |||
Segments will cause packet duplication on the multi-homed CE. | on the multihomed CE. | |||
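Local-bias filtering identifies the originating node from the outer source IP address. That dependency is why the AR-LEAF source IP address must be preserved. The Python below is a minimal sketch of the egress filter. It assumes each local attachment circuit (AC) is tagged with its ES identifier and that the node knows which peer addresses share each ES. The data layout and the names are hypothetical.

```python
from typing import Dict, List, Set


def local_bias_targets(outer_src_ip: str, local_acs: Dict[str, str],
                       es_peers: Dict[str, Set[str]]) -> List[str]:
    """Return the local ACs a BM packet received from the overlay may be sent to.

    local_acs maps AC name -> ES identifier ("" for single-homed ACs).
    es_peers maps ES identifier -> addresses of the other nodes attached to that ES.
    """
    targets = []
    for ac, es in sorted(local_acs.items()):
        if es and outer_src_ip in es_peers.get(es, set()):
            # The originating node shares this ES and has already delivered locally.
            continue
        targets.append(ac)
    return targets
```

Suppose the AR-REPLICATOR rewrites the source IP address to its own. The filter then no longer matches the AR-LEAF, and the multihomed CE receives a second copy on its shared ES.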
The AR-REPLICATOR nodes are, by design, using more bandwidth than | The AR-REPLICATOR nodes are, by design, using more bandwidth than PEs | |||
[RFC7432] PEs or [RFC8365] NVEs would use. Certain network events or | [RFC7432] or NVEs [RFC8365] would use. Certain network events or | |||
unexpected low performance may exceed the AR-REPLICATOR local | unexpected low performance may exceed the AR-REPLICATOR's local | |||
bandwidth and cause service disruption. | bandwidth and cause service disruption. | |||
Finally, the use of PFL as in Section 7, should be handled with care. | Finally, PFLs (Section 7) should be used with care. Intentional or | |||
An intentional or unintentional misconfiguration of the BDs on a | unintentional misconfiguration of the BDs on a given leaf node may | |||
given leaf node may result in the leaf not receiving the required BM | result in the leaf not receiving the required BM or unknown unicast | |||
or Unknown unicast traffic. | traffic. | |||
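The Python below sketches how a misconfiguration can silently prune a leaf. It assumes a set BM or U flag in a node's PMSI Tunnel Attribute means "exclude me from the corresponding flood list". The helper name and input format are hypothetical.

```python
from typing import Dict, List, Tuple


def pruned_flood_lists(remote_flags: Dict[str, Tuple[bool, bool]]) -> Tuple[List[str], List[str]]:
    """Build the BM and unknown-unicast flood lists for one BD.

    remote_flags maps remote node -> (bm_flag, u_flag) as advertised; a set
    flag is taken to mean that the node asks to be pruned.
    """
    bm_list = sorted(n for n, (bm, _) in remote_flags.items() if not bm)
    u_list = sorted(n for n, (_, u) in remote_flags.items() if not u)
    return bm_list, u_list
```

A leaf that sets the BM flag by mistake drops out of every BM flood list for that BD. No error is raised anywhere, which is why the configuration deserves the care asked for above.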
11. IANA Considerations | 11. IANA Considerations | |||
IANA has allocated the following Border Gateway Protocol (BGP) | IANA has allocated the following Border Gateway Protocol (BGP) | |||
Parameters: | parameters: | |||
- Allocation in the P-Multicast Service Interface Tunnel (PMSI | ||||
Tunnel) Tunnel Types registry: | ||||
Value Meaning Reference | ||||
0x0A Assisted-Replication Tunnel [This document] | ||||
- Allocations in the P-Multicast Service Interface (PMSI) Tunnel | ||||
Attribute Flags registry: | ||||
Value Name Reference | ||||
3-4 Assisted-Replication Type (T) [This document] | ||||
5 Broadcast and Multicast (BM) [This document] | ||||
6 Unknown (U) [This document] | ||||
12. Contributors | ||||
In addition to the names in the front page, the following co-authors | ||||
also contributed to this document: | ||||
Wim Henderickx | ||||
Nokia | ||||
Kiran Nagaraj | ||||
Nokia | ||||
Ravi Shekhar | * Allocation in the "P-Multicast Service Interface Tunnel (PMSI | |||
Juniper Networks | Tunnel) Tunnel Types" registry: | |||
Nischal Sheth | +=======+=============================+===========+ | |||
Juniper Networks | | Value | Meaning | Reference | | |||
+=======+=============================+===========+ | ||||
| 0x0A | Assisted Replication Tunnel | RFC 9574 | | ||||
+-------+-----------------------------+-----------+ | ||||
Aldrin Isaac | Table 1 | |||
Juniper | ||||
Mudassir Tufail | * Allocations in the "P-Multicast Service Interface (PMSI) Tunnel | |||
Citibank | Attribute Flags" registry: | |||
13. Acknowledgments | +=======+===============================+===========+ | |||
| Value | Name | Reference | | ||||
+=======+===============================+===========+ | ||||
| 3-4 | Assisted Replication Type (T) | RFC 9574 | | ||||
+-------+-------------------------------+-----------+ | ||||
| 5 | Broadcast and Multicast (BM) | RFC 9574 | | ||||
+-------+-------------------------------+-----------+ | ||||
| 6 | Unknown (U) | RFC 9574 | | ||||
+-------+-------------------------------+-----------+ | ||||
The authors would like to thank Neil Hart, David Motz, Dai Truong, | Table 2 | |||
Thomas Morin, Jeffrey Zhang, Shankar Murthy and Krzysztof Szarkowicz | ||||
for their valuable feedback and contributions. Also thanks to John | ||||
Scudder for his thorough review that improved the quality of the | ||||
document significantly. | ||||
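The IANA allocations in Tables 1 and 2 can be exercised with a small encoder and decoder for the PMSI Tunnel Attribute Flags octet. The Python below is a hedged sketch. It assumes the bit numbering of [RFC7902] (bit 0 is the high-order bit), so T (bits 3-4) masks to 0x18, BM (bit 5) to 0x04 and U (bit 6) to 0x02. It also assumes the T values 0 = RNVE, 1 = AR-REPLICATOR and 2 = AR-LEAF, with 3 reserved. Function and constant names are hypothetical.

```python
from enum import IntEnum
from typing import Tuple

AR_TUNNEL_TYPE = 0x0A  # Assisted Replication Tunnel (Table 1)

T_MASK = 0x18   # bits 3-4, RFC 7902 numbering (bit 0 = high-order bit)
T_SHIFT = 3
BM_MASK = 0x04  # bit 5
U_MASK = 0x02   # bit 6


class ArType(IntEnum):
    RNVE = 0
    AR_REPLICATOR = 1
    AR_LEAF = 2


def encode_flags(ar_type: ArType, bm: bool = False, u: bool = False, other: int = 0) -> int:
    """Build the Flags octet, keeping any unrelated flag bits passed in `other`."""
    flags = other & ~(T_MASK | BM_MASK | U_MASK) & 0xFF
    flags |= (int(ar_type) << T_SHIFT) & T_MASK
    if bm:
        flags |= BM_MASK
    if u:
        flags |= U_MASK
    return flags


def decode_flags(flags: int) -> Tuple[ArType, bool, bool]:
    """Extract (AR type, BM, U); the reserved T value 3 raises ValueError."""
    return ArType((flags & T_MASK) >> T_SHIFT), bool(flags & BM_MASK), bool(flags & U_MASK)
```

For example, an AR-LEAF that also asks to be pruned from BM flooding would advertise the octet 0x14.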
14. References | 12. References | |||
14.1. Normative References | 12.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/ | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | 2012, <https://www.rfc-editor.org/info/rfc6513>. | |||
[RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP | [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP | |||
Encodings and Procedures for Multicast in MPLS/BGP IP | Encodings and Procedures for Multicast in MPLS/BGP IP | |||
VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, | VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, | |||
<https://www.rfc-editor.org/info/rfc6514>. | <https://www.rfc-editor.org/info/rfc6514>. | |||
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., | [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., | |||
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based | Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based | |||
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February | Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February | |||
2015, <https://www.rfc-editor.org/info/rfc7432>. | 2015, <https://www.rfc-editor.org/info/rfc7432>. | |||
[I-D.ietf-bess-evpn-bum-procedure-updates] | ||||
Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A. | ||||
Sajassi, "Updates on EVPN BUM Procedures", draft-ietf- | ||||
bess-evpn-bum-procedure-updates-14 (work in progress), | ||||
November 2021. | ||||
[RFC7902] Rosen, E. and T. Morin, "Registry and Extensions for | [RFC7902] Rosen, E. and T. Morin, "Registry and Extensions for | |||
P-Multicast Service Interface Tunnel Attribute Flags", | P-Multicast Service Interface Tunnel Attribute Flags", | |||
RFC 7902, DOI 10.17487/RFC7902, June 2016, | RFC 7902, DOI 10.17487/RFC7902, June 2016, | |||
<https://www.rfc-editor.org/info/rfc7902>. | <https://www.rfc-editor.org/info/rfc7902>. | |||
[RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/ | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
2012, <https://www.rfc-editor.org/info/rfc6513>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | |||
Uttaro, J., and W. Henderickx, "A Network Virtualization | Uttaro, J., and W. Henderickx, "A Network Virtualization | |||
Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | |||
DOI 10.17487/RFC8365, March 2018, | DOI 10.17487/RFC8365, March 2018, | |||
<https://www.rfc-editor.org/info/rfc8365>. | <https://www.rfc-editor.org/info/rfc8365>. | |||
14.2. Informative References | [RFC9572] Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A. | |||
Sajassi, "Updates to EVPN Broadcast, Unknown Unicast, or | ||||
Multicast (BUM) Procedures", RFC 9572, | ||||
DOI 10.17487/RFC9572, May 2024, | ||||
<https://www.rfc-editor.org/info/rfc9572>. | ||||
12.2. Informative References | ||||
[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., | ||||
"Encapsulating MPLS in IP or Generic Routing Encapsulation | ||||
(GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, | ||||
<https://www.rfc-editor.org/info/rfc4023>. | ||||
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | |||
L., Sridhar, T., Bursell, M., and C. Wright, "Virtual | L., Sridhar, T., Bursell, M., and C. Wright, "Virtual | |||
eXtensible Local Area Network (VXLAN): A Framework for | eXtensible Local Area Network (VXLAN): A Framework for | |||
Overlaying Virtualized Layer 2 Networks over Layer 3 | Overlaying Virtualized Layer 2 Networks over Layer 3 | |||
Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, | Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, | |||
<https://www.rfc-editor.org/info/rfc7348>. | <https://www.rfc-editor.org/info/rfc7348>. | |||
[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., | ||||
"Encapsulating MPLS in IP or Generic Routing Encapsulation | ||||
(GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, | ||||
<https://www.rfc-editor.org/info/rfc4023>. | ||||
[RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network | [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network | |||
Virtualization Using Generic Routing Encapsulation", | Virtualization Using Generic Routing Encapsulation", | |||
RFC 7637, DOI 10.17487/RFC7637, September 2015, | RFC 7637, DOI 10.17487/RFC7637, September 2015, | |||
<https://www.rfc-editor.org/info/rfc7637>. | <https://www.rfc-editor.org/info/rfc7637>. | |||
[I-D.ietf-bess-evpn-proxy-arp-nd] | [RFC9161] Rabadan, J., Ed., Sathappan, S., Nagaraj, K., Hankins, G., | |||
Rabadan, J., Sathappan, S., Nagaraj, K., Hankins, G., and | and T. King, "Operational Aspects of Proxy ARP/ND in | |||
T. King, "Operational Aspects of Proxy ARP/ND in Ethernet | Ethernet Virtual Private Networks", RFC 9161, | |||
Virtual Private Networks", draft-ietf-bess-evpn-proxy-arp- | DOI 10.17487/RFC9161, January 2022, | |||
nd-16 (work in progress), October 2021. | <https://www.rfc-editor.org/info/rfc9161>. | |||
Acknowledgements | ||||
The authors would like to thank Neil Hart, David Motz, Dai Truong, | ||||
Thomas Morin, Jeffrey Zhang, Shankar Murthy, and Krzysztof Szarkowicz | ||||
for their valuable feedback and contributions. Also, thanks to John | ||||
Scudder for his thorough review, which improved the quality of the | ||||
document significantly. | ||||
Contributors | ||||
In addition to the authors listed on the front page, the following | ||||
people also contributed to this document and should be considered | ||||
coauthors: | ||||
Wim Henderickx | ||||
Nokia | ||||
Kiran Nagaraj | ||||
Nokia | ||||
Ravi Shekhar | ||||
Juniper Networks | ||||
Nischal Sheth | ||||
Juniper Networks | ||||
Aldrin Isaac | ||||
Juniper | ||||
Mudassir Tufail | ||||
Citibank | ||||
Authors' Addresses | Authors' Addresses | |||
J. Rabadan (editor) | Jorge Rabadan (editor) | |||
Nokia | Nokia | |||
777 Middlefield Road | 777 Middlefield Road | |||
Mountain View, CA 94043 | Mountain View, CA 94043 | |||
USA | United States of America | |||
Email: jorge.rabadan@nokia.com | Email: jorge.rabadan@nokia.com | |||
S. Sathappan | Senthil Sathappan | |||
Nokia | Nokia | |||
Email: senthil.sathappan@nokia.com | Email: senthil.sathappan@nokia.com | |||
W. Lin | Wen Lin | |||
Juniper Networks | Juniper Networks | |||
Email: wlin@juniper.net | Email: wlin@juniper.net | |||
M. Katiyar | Mukul Katiyar | |||
Versa Networks | Versa Networks | |||
Email: mukul@versa-networks.com | Email: mukul@versa-networks.com | |||
A. Sajassi | Ali Sajassi | |||
Cisco Systems | Cisco Systems | |||
Email: sajassi@cisco.com | Email: sajassi@cisco.com | |||
End of changes. 311 change blocks. | ||||
985 lines changed or deleted | 973 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |