Internet Engineering Task Force (IETF)                   J. Rabadan, Ed.
Request for Comments: 9574                                  S. Sathappan
Category: Standards Track                                          Nokia
ISSN: 2070-1721                                                   W. Lin
                                                        Juniper Networks
                                                              M. Katiyar
                                                          Versa Networks
                                                              A. Sajassi
                                                           Cisco Systems
                                                                May 2024
Optimized Ingress Replication Solution for Ethernet VPNs (EVPNs)
Abstract
Network Virtualization Overlay (NVO) networks using Ethernet VPNs
(EVPNs) as their control plane may use trees based on ingress
replication or Protocol Independent Multicast (PIM) to convey the
overlay Broadcast, Unknown Unicast, or Multicast (BUM) traffic. PIM
provides an efficient solution that prevents sending multiple copies
of the same packet over the same physical link; however, it may not
always be deployed in the NVO network core. Ingress replication
avoids the dependency on PIM in the NVO network core. While ingress
replication provides a simple multicast transport, some NVO networks
with demanding multicast applications require a more efficient
solution without PIM in the core. This document describes a solution
to optimize the efficiency of ingress replication trees.
Status of This Memo
This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9574.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Revised BSD License text as described in Section 4.e of the
Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents
1. Introduction
2. Terminology and Conventions
3. Solution Requirements
4. EVPN BGP Attributes for Optimized Ingress Replication
5. Non-selective Assisted Replication (AR) Solution Description
   5.1. Non-selective AR-REPLICATOR Procedures
   5.2. Non-selective AR-LEAF Procedures
   5.3. RNVE Procedures
6. Selective Assisted Replication (AR) Solution Description
   6.1. Selective AR-REPLICATOR Procedures
   6.2. Selective AR-LEAF Procedures
7. Pruned Flooding Lists (PFLs)
   7.1. Example of a Pruned Flooding List
8. AR Procedures for Single-IP AR-REPLICATORS
9. AR Procedures and EVPN All-Active Multihoming Split-Horizon
   9.1. Ethernet Segments on AR-LEAF Nodes
   9.2. Ethernet Segments on AR-REPLICATOR Nodes
10. Security Considerations
11. IANA Considerations
12. References
   12.1. Normative References
   12.2. Informative References
Acknowledgements
Contributors
Authors' Addresses
1. Introduction
Ethernet Virtual Private Networks (EVPNs) may be used as the control
plane for a Network Virtualization Overlay (NVO) network [RFC8365].
Network Virtualization Edge (NVE) and Provider Edge (PE) devices that
are part of the same EVPN Broadcast Domain (BD) use Ingress
Replication (IR) or PIM-based trees to transport the tenant's
Broadcast, Unknown Unicast, or Multicast (BUM) traffic.
In the ingress replication approach, the ingress NVE receiving a BUM
frame from the Tenant System (TS) will create as many copies of the
frame as the number of remote NVEs/PEs that are attached to the BD.
Each of those copies will be encapsulated into an IP packet where the
outer IP Destination Address (IP DA) identifies the loopback of the
egress NVE/PE. The IP fabric core nodes (also known as spines) will
simply route the IP-encapsulated BUM frames based on the outer IP DA.
If PIM-based trees are used instead of ingress replication, the
NVEs/PEs attached to the same BD will join a PIM-based tree. The
ingress NVE receiving a BUM frame will send a single copy of the
frame, encapsulated into an IP packet where the outer IP DA is the
multicast address that represents the PIM-based tree. The IP fabric
core nodes are part of the PIM tree and keep multicast state for the
multicast group, so that IP-encapsulated BUM frames can be routed to
all the NVEs/PEs that joined the tree.
The two approaches are illustrated in Figure 1. On the left-hand
side of the diagram, NVE1 uses ingress replication to send a BUM
frame (originated from Tenant System TS1) to the remote nodes
attached to the BD, i.e., NVE2, NVE3, and PE1. On the right-hand
side, the same example is depicted but using a PIM-based tree, i.e.,
(S1,G1), instead of ingress replication. While a single copy of the
tunneled BUM frame is generated in the latter approach, all the
routers in the fabric need to keep multicast state, e.g., the spine
keeps a PIM routing entry for (S1,G1) with an Incoming Interface
(IIF) and three Outgoing Interfaces (OIFs).
           To WAN                            To WAN
              ^                                 ^
              |                                 |
           +-----+                           +-----+
+----------| PE1 |-----------+    +----------| PE1 |-----------+
|          +--^--+           |    |          +--^--+           |
|             |  IP Fabric   |    |             |  IP Fabric   |
|             PE             |    | (S1,G1)     |OIF to G1     |
| +----PE->+-----+ No State  |    |     IIF  +-----+ OIF to G1 |
| | +---2->|Spine|------+    |    |   +------>Spine|------+    |
| | | +-3->+-----+      |    |    |   |      +-----+      |    |
| | | |       2         3    |    |   |PIM      |OIF to G1|    |
| | | |IR     |         |    |    |   |tree     |         |    |
|+-----+   +--v--+   +--v--+ |    |+-----+   +--v--+   +--v--+ |
+| NVE1|---| NVE2|---| NVE3|-+    +| NVE1|---| NVE2|---| NVE3|-+
 +--^--+   +-----+   +-----+       +--^--+   +-----+   +-----+
    |         |         |             |         |         |
    |         v         v             |         v         v
   TS1       TS2       TS3           TS1       TS2       TS3
Figure 1: Ingress Replication vs. PIM-Based Trees in NVO Networks
In NVO networks where PIM-based trees cannot be used, ingress
replication is the only option. Examples of these situations are NVO
networks where the core nodes do not support PIM or the network
operator does not want to run PIM in the core.
In some use cases, the amount of replication for BUM traffic is kept
under control on the NVEs due to the following fairly common
assumptions:
a. Broadcast traffic is greatly reduced due to the proxy Address
Resolution Protocol (ARP) and proxy Neighbor Discovery (ND)
capabilities supported by EVPNs [RFC9161] on the NVEs. Some NVEs
can even provide Dynamic Host Configuration Protocol (DHCP)
server functions for the attached TSs, reducing the broadcast
traffic even further.
b. Unknown unicast traffic is greatly reduced in NVO networks where
all the Media Access Control (MAC) and IP addresses from the TSs
are learned in the control plane.
c. Multicast applications are not used.
If the above assumptions are true for a given NVO network, then
ingress replication provides a simple solution for multi-destination
traffic. However, statement c. above is not always true, and
multicast applications are required in many use cases.
When the multicast sources are attached to NVEs residing in
hypervisors or low-performance-replication Top-of-Rack (ToR)
switches, the ingress replication of a large amount of multicast
traffic to a significant number of remote NVEs/PEs can seriously
degrade the performance of the NVE and impact the application.
This document describes a solution that makes use of two ingress
replication optimizations:
1. Assisted Replication (AR)
2. Pruned Flooding Lists (PFLs)
Assisted Replication consists of a set of procedures that allows the
ingress NVE/PE to send a single copy of a broadcast or multicast
frame received from a TS to the BD without the need for PIM in the
underlay. Assisted Replication defines the roles of AR-REPLICATOR
and AR-LEAF routers. The AR-LEAF is the ingress NVE/PE attached to
the TS. The AR-LEAF sends a single copy of a broadcast or multicast
packet to a selected AR-REPLICATOR that replicates the packet
multiple times to remote AR-LEAF or AR-REPLICATOR routers and is
therefore "assisting" the ingress AR-LEAF in delivering the broadcast
or multicast traffic to the remote NVEs/PEs attached to the same BD.
Assisted Replication can use a single AR-REPLICATOR or two
AR-REPLICATOR routers in the path between the ingress AR-LEAF and the
remote destination NVEs/PEs. The procedures that use a single
AR-REPLICATOR (the non-selective Assisted Replication solution) are
specified in Section 5, whereas Section 6 describes how multi-stage
replication, i.e., two AR-REPLICATOR routers in the path between the
ingress AR-LEAF and destination NVEs/PEs, is accomplished (the
selective Assisted Replication solution). The procedures for
Assisted Replication do not impact unknown unicast traffic, which
follows the same forwarding procedures as known unicast traffic so
that packet reordering does not occur.
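The replication load on the ingress node can be illustrated with a
minimal Python sketch. The function name, the traffic labels, and the
simplification that unknown unicast is always replicated once per
remote node are assumptions made for illustration only; they are not
defined by this document.

```python
BM_TRAFFIC = {"broadcast", "multicast"}
ALL_TRAFFIC = BM_TRAFFIC | {"unknown-unicast"}


def ingress_copies(remote_nodes: int, traffic: str, assisted: bool) -> int:
    """Copies the ingress NVE/PE itself generates for one received frame.

    Hypothetical helper: 'remote_nodes' is the number of remote NVEs/PEs
    attached to the BD; 'assisted' is True when the ingress node acts as
    an AR-LEAF with an AR-REPLICATOR available.
    """
    if traffic not in ALL_TRAFFIC:
        raise ValueError(f"unknown traffic type: {traffic!r}")
    if remote_nodes < 0:
        raise ValueError("remote_nodes must be non-negative")
    if remote_nodes == 0:
        return 0
    # Assisted Replication applies to broadcast and multicast only:
    # a single copy goes to the selected AR-REPLICATOR.
    if assisted and traffic in BM_TRAFFIC:
        return 1
    # Plain ingress replication (and unknown unicast in either mode, in
    # this simplified model): one copy per remote NVE/PE.
    return remote_nodes
```

With three remote nodes, as in Figure 1, the ingress node generates
three copies of a multicast frame under ingress replication and one
copy when acting as an AR-LEAF.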
PFLs provide a method for the ingress NVE/PE to prune or remove
certain destination NVEs/PEs from a flooding list, depending on the
interest of those NVEs/PEs in receiving BUM traffic. As specified in
[RFC8365], an NVE/PE builds a flooding list for BUM traffic based on
the next hops of the received EVPN Inclusive Multicast Ethernet Tag
routes for the BD. While [RFC8365] states that the flooding list is
used for all BUM traffic, this document allows pruning certain next
hops from the list. As an example, suppose an ingress NVE creates a
flooding list with next hops PE1, PE2, and PE3. If PE2 and PE3 did
not signal any interest in receiving unknown unicast traffic in their
Inclusive Multicast Ethernet Tag routes, when the ingress NVE
receives an unknown unicast frame from a TS, it will replicate it
only to PE1. That is, PE2 and PE3 are "pruned" from the NVE's
flooding list for unknown unicast traffic. PFLs can be used with
ingress replication or Assisted Replication and are described in
Section 7.
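The PE1/PE2/PE3 example above can be sketched in Python as follows.
The `ImetRoute` class and its `wants_bm` and `wants_unknown` fields
are hypothetical names chosen for this illustration; they stand for
the interest that each remote node signals in its Inclusive Multicast
Ethernet Tag route.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ImetRoute:
    """Illustrative view of a received Inclusive Multicast Ethernet Tag
    route; field names are assumptions, not protocol field names."""
    next_hop: str
    wants_bm: bool = True        # interested in broadcast/multicast
    wants_unknown: bool = True   # interested in unknown unicast


def flooding_list(routes: Iterable[ImetRoute], traffic: str) -> List[str]:
    """Return the next hops to replicate to for the given traffic type."""
    if traffic in ("broadcast", "multicast"):
        return [r.next_hop for r in routes if r.wants_bm]
    if traffic == "unknown-unicast":
        return [r.next_hop for r in routes if r.wants_unknown]
    raise ValueError(f"unknown traffic type: {traffic!r}")


routes = [
    ImetRoute("PE1"),
    ImetRoute("PE2", wants_unknown=False),
    ImetRoute("PE3", wants_unknown=False),
]
unknown_targets = flooding_list(routes, "unknown-unicast")
multicast_targets = flooding_list(routes, "multicast")
```

Here `unknown_targets` contains only PE1, while `multicast_targets`
still contains all three next hops.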
Both optimizations, Assisted Replication and PFLs, may be used
together or independently so that the performance and efficiency of
the network to transport multicast can be improved. Both solutions
require some extensions to the BGP attributes used in [RFC7432]; see
Section 4 for details.
The Assisted Replication solution described in this document is
focused on NVO networks (hence its use of IP tunnels). MPLS
transport networks are out of scope for this document. The PFLs
solution MAY be used in NVO and MPLS transport networks.
Section 3 lists the requirements of the combined optimized ingress
replication solution, whereas Sections 5 and 6 describe the Assisted
Replication solution for non-selective and selective procedures,
respectively. Section 7 provides the PFLs solution.
2. Terminology and Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
The following terminology is used throughout this document:
AR-IP: Assisted Replication - IP. Refers to an IP address owned by
the AR-REPLICATOR and used to differentiate the incoming traffic
that must follow the AR procedures. The AR-IP is also used in the
Tunnel Identifier and Next Hop fields of the Replicator-AR route.
AR-LEAF: Assisted Replication - LEAF. Refers to an NVE/PE that
sends all the BM traffic to an AR-REPLICATOR that can replicate
the traffic further on its behalf. An AR-LEAF is typically an
NVE/PE with poor replication performance capabilities.
AR-REPLICATOR: Assisted Replication - REPLICATOR. Refers to an
NVE/PE that can replicate broadcast or multicast traffic received
on overlay tunnels to other overlay tunnels and local Attachment
Circuits (ACs). This document defines the control and data plane
procedures that an AR-REPLICATOR needs to follow.
AR-VNI: Assisted Replication - VNI. Refers to a Virtual eXtensible
Local Area Network (VXLAN) Network Identifier (VNI) advertised by
the AR-REPLICATOR along with the Replicator-AR route. It is used
to identify the incoming packets that must follow the AR
procedures ONLY in the single-IP AR-REPLICATOR case (see
Section 8).
Assisted Replication forwarding mode: In the case of an AR-LEAF,
sending an AC Broadcast and Multicast (BM) packet to a single
AR-REPLICATOR with a tunnel destination address AR-IP. In the case
of an AR-REPLICATOR, this means sending a BM packet to a selected
number of, or all of, the overlay tunnels when the packet was
previously received from an overlay tunnel.
BD: Broadcast Domain, as defined in [RFC7432].
BD label: Defined as the MPLS label that identifies the BD and is
advertised in Regular-IR or Replicator-AR routes when the
encapsulation is MPLS over GRE (MPLSoGRE) or MPLS over UDP
(MPLSoUDP).
BM traffic: Refers to broadcast and multicast frames (excluding
unknown unicast frames).
DF and NDF: Designated Forwarder and Non-Designated Forwarder.
These are roles defined in NVEs/PEs attached to multihomed TSs, as
per [RFC7432] and [RFC8365].
ES and ESI: Ethernet Segment and Ethernet Segment Identifier. EVPN
multihoming concepts as specified in [RFC7432].
EVI: EVPN Instance. A group of Provider Edge (PE) devices
participating in the same EVPN service, as specified in [RFC7432].
GRE: Generic Routing Encapsulation [RFC4023].
Ingress Replication forwarding mode: Refers to the ingress
replication behavior explained in [RFC7432]. In this mode, an
Attachment Circuit (AC) BM packet copy is sent to each remote
PE/NVE in the BD, and an overlay BM packet is sent only to the ACs
and not to other overlay tunnels.
IR-IP: Ingress Replication - IP. Refers to the local IP address of
an NVE/PE that is used for the ingress replication signaling and
procedures provided in [RFC7432]. Encapsulated incoming traffic
with an outer destination IP address matching the IR-IP will
follow the procedures for ingress replication and not the
procedures for Assisted Replication. The IR-IP is also used in
the Tunnel Identifier and Next Hop fields of the Regular-IR route.
IR-VNI: Ingress Replication - VNI. Refers to a VNI advertised along
with the Inclusive Multicast Ethernet Tag route for the ingress
replication tunnel type.
MPLS: Multi-Protocol Label Switching.
NVE: Network Virtualization Edge router, used in this document as
in [RFC8365].
NVGRE: Network Virtualization using Generic Routing Encapsulation
[RFC7637].
PE: Provider Edge.
PMSI: P-Multicast Service Interface. A conceptual interface for a
PE to send customer multicast traffic to all or some PEs in the
same VPN [RFC6513].
RD: Route Distinguisher.
Regular-IR route: An EVPN Inclusive Multicast Ethernet Tag route
[RFC7432] that uses the ingress replication tunnel type.
Replicator-AR route: An EVPN Inclusive Multicast Ethernet Tag route
that is advertised by an AR-REPLICATOR to signal its capabilities,
as described in Section 4.
RNVE: Regular NVE. Refers to an NVE that supports the procedures
provided in [RFC8365] and does not support the procedures provided
in this document. However, this document defines procedures to
interoperate with RNVEs.
ToR switch: Top-of-Rack switch.
TS and VM: Tenant System and Virtual Machine. In this document, TSs
and VMs are the devices connected to the ACs of the PEs and NVEs.
VNI: VXLAN Network Identifier. Used in VXLAN tunnels.
VSID: Virtual Segment Identifier. Used in NVGRE tunnels.
VXLAN: Virtual eXtensible Local Area Network [RFC7348].
3. Solution Requirements
The ingress replication optimization solution specified in this
document meets the following requirements:
a. The solution provides an ingress replication optimization for BM
traffic without the need for PIM while preserving the packet
order for unicast applications, i.e., unknown unicast traffic
should follow the same path as known unicast traffic. This
optimization is required in low-performance NVEs.
b. The solution reduces the flooded traffic in NVO networks where
some NVEs do not need broadcast/multicast and/or unknown unicast
traffic.
c. The solution is compatible with [RFC7432] and [RFC8365] and has
no impact on the Customer Edge (CE) procedures for BM traffic.
In particular, the solution supports the following EVPN
functions:
   * All-active multihoming, including the split-horizon and DF
     functions.
   * Single-active multihoming, including the DF function.
   * Handling of multi-destination traffic and processing of BM
     traffic as per [RFC7432].
d. The solution is backward compatible with existing NVEs using a
non-optimized version of ingress replication. A given BD can
have NVEs/PEs supporting regular ingress replication and
optimized ingress replication.
e. The solution is independent of the NVO-specific data plane
encapsulation and the virtual identifiers being used, e.g., VXLAN
VNIs, NVGRE VSIDs, or MPLS labels, as long as the tunnel is IP
based.
4. EVPN BGP Attributes for Optimized Ingress Replication
The ingress replication optimization solution specified in this
document extends the Inclusive Multicast Ethernet Tag routes and
attributes described in [RFC7432] so that an NVE/PE can signal its
optimized ingress replication capabilities.

The Network Layer Reachability Information (NLRI) of the Inclusive
Multicast Ethernet Tag route [RFC7432] is shown in Figure 2 and is
used in this document without any modifications to its format. The
PMSI Tunnel Attribute's general format as provided in [RFC7432]
(which takes it from [RFC6514]) is used in this document; only a new
tunnel type and new flags are specified, as shown in Figure 3.
+------------------------------------+
|  RD (8 octets)                     |
+------------------------------------+
|  Ethernet Tag ID (4 octets)        |
+------------------------------------+
|  IP Address Length (1 octet)       |
+------------------------------------+
|  Originating Router's IP Address   |
|          (4 or 16 octets)          |
+------------------------------------+
Figure 2: EVPN Inclusive Multicast Ethernet Tag Route's NLRI
                                        0  1  2  3  4  5  6  7
+---------------------------------+    +--+--+--+--+--+--+--+--+
|  Flags (1 octet)                | -> |x |E |x |  T  |BM|U |L |
+---------------------------------+    +--+--+--+--+--+--+--+--+
|  Tunnel Type (1 octet)          |    T  = Assisted Replication Type
+---------------------------------+    BM = Broadcast and Multicast
|  MPLS Label (3 octets)          |    U  = Unknown (unknown unicast)
+---------------------------------+    x  = unassigned
|  Tunnel Identifier (variable)   |
+---------------------------------+
Figure 3: PMSI Tunnel Attribute
The Flags field in Figure 3 is 8 bits long as per [RFC7902]. The
Extension (E) flag was allocated by [RFC7902], and the Leaf
Information Required (L) flag was allocated by [RFC6514]. This
document defines the use of 4 bits of this Flags field:
* Bits 3 and 4, which together form the Assisted Replication Type
  (T) field
* Bit 5, called the Broadcast and Multicast (BM) flag
* Bit 6, called the Unknown (U) flag
Bits 5 and 6 are collectively referred to as the Pruned Flooding
Lists (PFLs) flags.
The T field and PFLs flags are defined as follows:
* T is the Assisted Replication Type field (2 bits), which defines
  the AR role of the advertising router:
   - 00 (decimal 0) = RNVE (non-AR support)
   - 01 (decimal 1) = AR-REPLICATOR
   - 10 (decimal 2) = AR-LEAF
   - 11 (decimal 3) = RESERVED
* The PFLs flags define the desired behavior of the advertising
  router for the different types of traffic:
   - Broadcast and Multicast (BM) flag. BM = 1 means "prune me from
     the BM flooding list". BM = 0 indicates regular behavior.
   - Unknown (U) flag. U = 1 means "prune me from the Unknown
     flooding list". U = 0 indicates regular behavior.
* The L flag (bit 7) is defined in [RFC6514] and will be used only
  in the selective AR solution.
Please refer to Section 11 for the IANA considerations related to the
PMSI Tunnel Attribute flags.
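The bit layout of the Flags octet can be exercised with a short
Python sketch. It assumes that bit 0 in Figure 3 is the most
significant bit of the octet (the usual convention in IETF packet
diagrams), so the L flag is the least significant bit. The class and
function names are hypothetical and chosen only for this example.

```python
from enum import IntEnum
from typing import NamedTuple

# Masks assume bit 0 of Figure 3 is the most significant bit (0x80).
E_FLAG = 0x40    # bit 1, Extension
T_MASK = 0x18    # bits 3-4, Assisted Replication Type
T_SHIFT = 3
BM_FLAG = 0x04   # bit 5, Broadcast and Multicast
U_FLAG = 0x02    # bit 6, Unknown
L_FLAG = 0x01    # bit 7, Leaf Information Required


class ARType(IntEnum):
    RNVE = 0
    AR_REPLICATOR = 1
    AR_LEAF = 2
    RESERVED = 3


class PmsiFlags(NamedTuple):
    ar_type: ARType
    bm: bool = False
    u: bool = False
    leaf_info_required: bool = False
    extension: bool = False


def encode_flags(flags: PmsiFlags) -> int:
    """Pack the fields into the one-octet Flags value."""
    octet = int(flags.ar_type) << T_SHIFT
    if flags.bm:
        octet |= BM_FLAG
    if flags.u:
        octet |= U_FLAG
    if flags.leaf_info_required:
        octet |= L_FLAG
    if flags.extension:
        octet |= E_FLAG
    return octet


def decode_flags(octet: int) -> PmsiFlags:
    """Unpack a Flags octet; unassigned bits are ignored."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("Flags field is a single octet")
    return PmsiFlags(
        ar_type=ARType((octet & T_MASK) >> T_SHIFT),
        bm=bool(octet & BM_FLAG),
        u=bool(octet & U_FLAG),
        leaf_info_required=bool(octet & L_FLAG),
        extension=bool(octet & E_FLAG),
    )
```

Under these assumptions, an AR-REPLICATOR with the L flag set encodes
to 0x09, and an AR-LEAF that asks to be pruned from the Unknown
flooding list encodes to 0x12.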
In this document, the above Inclusive Multicast Ethernet Tag route
(Figure 2) and PMSI Tunnel Attribute (Figure 3) can be used in two
different modes for the same BD:
Regular-IR route: In this route, Originating Router's IP Address,
Tunnel Type (0x06), MPLS Label, and Tunnel Identifier MUST be used
as described in [RFC7432] when ingress replication is in use. The
NVE/PE that advertises the route will set the Next Hop to an IP
address that is referred to as the IR-IP in this document. When
advertised by an AR-LEAF node, the Regular-IR route MUST be
advertised with the T field set to 10 (AR-LEAF).
Replicator-AR route: This route is used by the AR-REPLICATOR to
advertise its AR capabilities, with the fields set as follows:
* Originating Router's IP Address MUST be set to an IP address of
  the advertising router that is common to all the EVIs on the PE
  (usually this is a loopback address of the PE).
   - The Tunnel Identifier and Next Hop fields SHOULD be set to
     the same IP address as the Originating Router's IP Address
     field when the NVE/PE originates the route, that is, when
     the NVE/PE is not an ASBR; see Section 10.2 of [RFC8365].
     Irrespective of the values in the Tunnel Identifier and
     Originating Router's IP Address fields, the ingress NVE/PE
     will process the received Replicator-AR route and will use
     the IP address setting in the Next Hop field to create IP
     tunnels to the AR-REPLICATOR.
   - The Next Hop address is referred to as the AR-IP and MUST be
     different from the IR-IP for a given PE/NVE, unless the
     procedures provided in Section 8 are followed.
* Tunnel Type MUST be set to Assisted Replication Tunnel.
  Section 11 provides the allocated type value.
* T (Assisted Replication type) MUST be set to 01
  (AR-REPLICATOR).
* L (Leaf Information Required) MUST be set to 0 for
  non-selective AR and MUST be set to 1 for selective AR.
An NVE/PE configured as an AR-REPLICATOR for a BD MUST advertise a
Replicator-AR route for the BD and MAY advertise a Regular-IR route.
The advertisement of the Replicator-AR route will indicate to the
AR-LEAFs which outer IP DA, i.e., which AR-IP, they need to use for
IP-encapsulated BM frames that use Assisted Replication forwarding
mode. The AR-REPLICATOR will forward an IP-encapsulated BM frame in
Assisted Replication forwarding mode if the outer IP DA matches its
AR-IP but will forward in Ingress Replication forwarding mode if the
outer IP DA matches its IR-IP.
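The choice of forwarding mode on the AR-REPLICATOR can be sketched as
a lookup on the outer IP DA. This is a simplified illustration that
assumes the AR-IP and IR-IP are distinct (the single-IP case in
Section 8 is not modeled); the function name, the return labels, and
the documentation-range addresses are hypothetical.

```python
import ipaddress


def forwarding_mode(outer_da: str, ar_ip: str, ir_ip: str) -> str:
    """Pick the forwarding mode for a received IP-encapsulated BM frame.

    Assumes distinct AR-IP and IR-IP addresses on the AR-REPLICATOR.
    """
    da = ipaddress.ip_address(outer_da)
    ar = ipaddress.ip_address(ar_ip)
    ir = ipaddress.ip_address(ir_ip)
    if ar == ir:
        raise ValueError("single-IP AR-REPLICATOR case is not modeled here")
    if da == ar:
        # Replicate to overlay tunnels and local ACs on behalf of the leaf.
        return "assisted-replication"
    if da == ir:
        # Regular behavior: deliver to local ACs only.
        return "ingress-replication"
    return "not-local"


mode_ar = forwarding_mode("192.0.2.1", ar_ip="192.0.2.1", ir_ip="192.0.2.2")
mode_ir = forwarding_mode("192.0.2.2", ar_ip="192.0.2.1", ir_ip="192.0.2.2")
```

Keeping the decision keyed only on the destination address reflects
why the Replicator-AR route advertises a Next Hop that differs from
the IR-IP.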
In addition, this document also uses the Leaf Auto-Discovery (Leaf
A-D) route defined in [RFC9572] in cases where the selective AR mode
is used. An AR-LEAF MAY send a Leaf A-D route in response to
reception of a Replicator-AR route whose L flag is set. The Leaf A-D
route is only used for selective AR, and the fields of such a route
are set as follows:
* Originating Router's IP Address is set to the advertising router's
  IP address (the same IP address used by the AR-LEAF in Regular-IR
  routes). The Next Hop address is set to the IR-IP, which SHOULD
  be the same IP address as the advertising router's IP address,
  when the NVE/PE originates the route, i.e., when the NVE/PE is not
  an ASBR; see Section 10.2 of [RFC8365].
* Route Key [RFC9572] is the "Route Type Specific" NLRI of the
  Replicator-AR route for which this Leaf A-D route is generated.
* The AR-LEAF constructs an IP-address-specific Route Target,
  analogously to [RFC9572], by placing the IP address carried in the
  Next Hop field of the received Replicator-AR route in the Global
  Administrator field of the extended community, with the Local
  Administrator field of this extended community set to 0, and
  setting the Extended Communities attribute of the Leaf A-D route
  to that extended community. The same IP-address-specific import
  Route Target is auto-configured by the AR-REPLICATOR that sent the
  Replicator-AR route, in order to control the acceptance of the
  Leaf A-D routes.
* The Leaf A-D route MUST include the PMSI Tunnel Attribute with the
  Tunnel Type set to Assisted Replication Tunnel (Section 11), T
  (Assisted Replication type) set to AR-LEAF, and the Tunnel
  Identifier set to the IP address of the advertising AR-LEAF. The
  PMSI Tunnel Attribute MUST carry a downstream-assigned MPLS label
  or VNI that is used by the AR-REPLICATOR to send traffic to the
  AR-LEAF.
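The Route Target construction described above can be sketched for the
IPv4 case. The sketch assumes the IPv4-address-specific extended
community encoding of RFC 4360 (type 0x01, Route Target sub-type
0x02, a 4-octet Global Administrator, and a 2-octet Local
Administrator); an IPv6 next hop would need a different encoding and
is not covered. The function name is hypothetical.

```python
import ipaddress
import struct

# Assumed encoding per RFC 4360: transitive IPv4-address-specific
# extended community, Route Target sub-type.
EXT_COMM_TYPE_IPV4 = 0x01
EXT_COMM_SUBTYPE_ROUTE_TARGET = 0x02


def leaf_ad_route_target(replicator_next_hop: str) -> bytes:
    """Build the 8-octet Route Target an AR-LEAF attaches to a Leaf A-D
    route, from the Next Hop of the received Replicator-AR route."""
    global_admin = ipaddress.IPv4Address(replicator_next_hop).packed
    local_admin = 0  # Local Administrator field is set to 0
    return struct.pack(
        "!BB4sH",
        EXT_COMM_TYPE_IPV4,
        EXT_COMM_SUBTYPE_ROUTE_TARGET,
        global_admin,
        local_admin,
    )


rt = leaf_ad_route_target("192.0.2.1")
```

The AR-REPLICATOR would derive the same value from its own AR-IP and
use it as an import Route Target, so only Leaf A-D routes generated
in response to its Replicator-AR route are accepted.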
Each AR-enabled node understands and processes the T (Assisted
Replication type) field in the PMSI Tunnel Attribute (Flags field) of
the routes and MUST signal the corresponding type (AR-REPLICATOR or
AR-LEAF type) according to its administrative choice. An NVE/PE
following this specification is not expected to set the Assisted
Replication Type field to decimal 3 (which is a RESERVED value). If
a route with the Assisted Replication Type field set to decimal 3 is
received by an AR-REPLICATOR or AR-LEAF, the router will process the
route as a Regular-IR route advertised by an RNVE.
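The handling of the RESERVED value on reception can be captured in a
few lines of Python. The function name and the string labels are
hypothetical; the mapping itself follows the rule stated above.

```python
def effective_role(t_field: int) -> str:
    """Role a receiving AR-REPLICATOR or AR-LEAF assigns to the
    advertising router, given the 2-bit T field of a received route."""
    if not 0 <= t_field <= 3:
        raise ValueError("T is a 2-bit field")
    roles = {1: "AR-REPLICATOR", 2: "AR-LEAF"}
    # Values 0 (RNVE) and 3 (RESERVED) are both processed as an RNVE.
    return roles.get(t_field, "RNVE")
```

Treating the RESERVED value as an RNVE keeps the receiver on regular
ingress replication toward that router rather than discarding the
route.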
Each node attached to the BD may understand and process the BM/U
flags (PFLs flags). Note that these BM/U flags may be used to
optimize the delivery of multi-destination traffic; their use SHOULD
be an administrative choice and independent of the AR role. When the
PFL capability is enabled, the BM/U flags can be used with the
Regular-IR, Replicator-AR, and Leaf A-D routes.
Non-optimized ingress replication NVEs/PEs will be unaware of the new
PMSI Tunnel Attribute flag definition as well as the new tunnel type
(AR), i.e., non-upgraded NVEs/PEs will ignore the information
contained in the Flags field or an unknown tunnel type (type AR in
this case) for any Inclusive Multicast Ethernet Tag route.
5. Non-selective Assisted Replication (AR) Solution Description
Figure 4 illustrates an example NVO network where the non-selective
AR function is enabled. Three different roles are defined for a
given BD: AR-REPLICATOR, AR-LEAF, and RNVE. The solution is called
"non-selective" because the chosen AR-REPLICATOR for a given flow
MUST replicate the BM traffic to all the NVEs/PEs in the BD except
for the source NVE/PE.
NVO tunnels, i.e., IP tunnels, exist among all the PEs and NVEs in
the diagram. The PEs and NVEs in the diagram have TSs or VMs
connected to their ACs.
( )
(_ WAN _)
+---(_ _)----+
| (_ _) |
PE1 | PE2 |
+------+----+ +----+------+
TS1--+ (BD-1) | | (BD-1) +--TS2
|REPLICATOR | |REPLICATOR |
+--------+--+ +--+--------+
| |
+--+----------------+--+
| |
| |
+----+ VXLAN/NVGRE/MPLSoGRE +----+
| | IP Fabric | |
| | | |
NVE1 | +-----------+----------+ | NVE3
Hypervisor| ToR | NVE2 |Hypervisor
+---------+-+ +-----+-----+ +-+---------+
| (BD-1) | | (BD-1) | | (BD-1) |
| LEAF | | RNVE | | LEAF |
+--+-----+--+ +--+-----+--+ +--+-----+--+
| | | | | |
VM11 VM12 TS3 TS4 VM31 VM32
Figure 4: Non-selective AR Scenario
In AR BDs, such as BD-1 in Figure 4, BM (Broadcast and Multicast)
traffic between two NVEs may follow a different path than unicast
traffic. This solution
recommends the replication of BM traffic through the AR-REPLICATOR
node, whereas unknown/known unicast traffic will be delivered
directly from the source node to the destination node without being
replicated by any intermediate node.
Note that known unicast forwarding is not impacted by this solution,
i.e., unknown unicast traffic SHALL follow the same path as known
unicast traffic.
5.1. Non-selective AR-REPLICATOR Procedures
An AR-REPLICATOR is defined as an NVE/PE capable of replicating
incoming BM traffic received on an overlay tunnel to other overlay
tunnels and local ACs. The AR-REPLICATOR signals its role in the
control plane and understands where the other roles (AR-LEAF nodes,
RNVEs, and other AR-REPLICATORs) are located. A given AR-enabled BD
service may have zero, one, or more AR-REPLICATORs. In our example
in Figure 4, PE1 and PE2 are defined as AR-REPLICATORs. The
following considerations apply to the AR-REPLICATOR role:
a. The AR-REPLICATOR role SHOULD be an administrative choice in any
NVE/PE that is part of an AR-enabled BD. This administrative
option to enable AR-REPLICATOR capabilities MAY be implemented as
a system-level option as opposed to a per-BD option.
b. An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY
advertise a Regular-IR route. The AR-REPLICATOR MUST NOT
generate a Regular-IR route if it does not have local ACs. If the
Regular-IR route is advertised, the Assisted Replication Type field
of the Regular-IR route MUST be set to 0.
c. The Replicator-AR and Regular-IR routes are generated according
to Section 4. The AR-IP and IR-IP are different IP addresses
owned by the AR-REPLICATOR.
d. When a node defined as an AR-REPLICATOR receives a BM packet on
an overlay tunnel, it will do a tunnel destination IP address
lookup and apply the following procedures:
* If the destination IP address is the AR-REPLICATOR IR-IP
address, the node will process the packet normally as
discussed in [RFC7432].
* If the destination IP address is the AR-REPLICATOR AR-IP
address, the node MUST replicate the packet to local ACs and
overlay tunnels (excluding the overlay tunnel to the source of
the packet). When replicating to remote AR-REPLICATORs, the
tunnel destination IP address will be an IR-IP. This will
indicate to the remote AR-REPLICATOR that it MUST NOT
replicate to overlay tunnels. The tunnel source IP address
used by the AR-REPLICATOR MUST be its IR-IP when replicating
to AR-REPLICATOR or AR-LEAF nodes.
An AR-REPLICATOR MUST follow a data path implementation compatible
with the following rules:
* The AR-REPLICATORs will build a flooding list composed of ACs and
overlay tunnels to remote nodes in the BD. Some of those overlay
tunnels MAY be flagged as non-BM receivers based on the BM flag
received from the remote nodes in the BD.
* When an AR-REPLICATOR receives a BM packet on an AC, it will
forward the BM packet to its flooding list (including local ACs
and remote NVEs/PEs), skipping the non-BM overlay tunnels.
* When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
it will check the destination IP address of the underlay IP header
and:
- If the destination IP address matches its IR-IP, the
AR-REPLICATOR will skip all the overlay tunnels from the flooding
list, i.e., it will only replicate to local ACs. This is the
regular ingress replication behavior described in [RFC7432].
- If the destination IP address matches its AR-IP, the
AR-REPLICATOR MUST forward the BM packet to its flooding list (ACs
and overlay tunnels), excluding the non-BM overlay tunnels.
The AR-REPLICATOR will ensure that the traffic is not sent back
to the originating AR-LEAF.
- If the encapsulation is MPLSoGRE or MPLSoUDP and the received
BD label that the AR-REPLICATOR advertised in the Replicator-AR
route is not at the bottom of the stack, the AR-REPLICATOR MUST
copy all the labels below the BD label and propagate them when
forwarding the packet to the egress overlay tunnels.
* The AR-REPLICATOR/LEAF nodes will build an unknown unicast
flooding list composed of ACs and overlay tunnels to the IR-IP
addresses of the remote nodes in the BD. Some of those overlay
tunnels MAY be flagged as non-U (unknown unicast) receivers based
on the U flag received from the remote nodes in the BD.
- When an AR-REPLICATOR/LEAF receives an unknown unicast packet
on an AC, it will forward the unknown unicast packet to its
flooding list, skipping the non-U overlay tunnels.
- When an AR-REPLICATOR/LEAF receives an unknown unicast packet
on an overlay tunnel, it will forward the unknown unicast
packet to its local ACs and never to an overlay tunnel. This
is the regular ingress replication behavior described in
[RFC7432].
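The BM data path rules above can be sketched as a small model. This is an illustrative sketch under stated assumptions, not a normative implementation: the class and method names (Tunnel, NonSelectiveReplicator, bm_from_ac, bm_from_tunnel) are hypothetical, the addresses are documentation addresses, the sender is identified by the underlay source IP, and the MPLS label copy rule and the exclusion of the receiving AC are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    remote_ip: str        # IR-IP of the remote node in the BD
    non_bm: bool = False  # True if the remote node signaled BM = 1


@dataclass
class NonSelectiveReplicator:
    ir_ip: str
    ar_ip: str
    acs: list[str]
    tunnels: list[Tunnel]

    def _bm_tunnels(self, exclude_ip: str | None = None) -> list[str]:
        # Flooding list tunnels, skipping non-BM receivers and the source.
        return [t.remote_ip for t in self.tunnels
                if not t.non_bm and t.remote_ip != exclude_ip]

    def bm_from_ac(self) -> tuple[list[str], list[str]]:
        # BM packet from a local AC: local ACs plus BM-capable tunnels.
        return list(self.acs), self._bm_tunnels()

    def bm_from_tunnel(self, dst_ip: str,
                       src_ip: str) -> tuple[list[str], list[str]]:
        if dst_ip == self.ir_ip:
            # Regular ingress replication behavior: local ACs only.
            return list(self.acs), []
        if dst_ip == self.ar_ip:
            # Assisted replication: ACs and tunnels, never back to source.
            return list(self.acs), self._bm_tunnels(exclude_ip=src_ip)
        raise ValueError("destination is neither the IR-IP nor the AR-IP")
```

Each method returns a pair (local ACs, tunnel destinations). With PE1 of Figure 4 modeled this way, a packet arriving on the AR-IP from an AR-LEAF is replicated to every other BM-capable tunnel, while a packet arriving on the IR-IP only reaches local ACs.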
5.2. Non-selective AR-LEAF Procedures
An AR-LEAF is defined as an NVE/PE that, given its poor replication
performance, sends all the BM traffic to an AR-REPLICATOR that can
replicate the traffic further on its behalf. It MAY signal its
AR-LEAF capability in the control plane and understands where the
other roles are located (AR-REPLICATORs and RNVEs). A given service
can have zero, one, or more AR-LEAF nodes. In Figure 4, NVE1 and NVE3
(both residing in hypervisors) act as AR-LEAF nodes. The following
considerations apply to the AR-LEAF role:
a. The AR-LEAF role SHOULD be an administrative choice in any NVE/PE
that is part of an AR-enabled BD. This administrative option to
enable AR-LEAF capabilities MAY be implemented as a system-level
option as opposed to a per-BD option.
b. In this non-selective AR solution, the AR-LEAF MUST advertise a
single Regular-IR Inclusive Multicast Ethernet Tag route as
described in [RFC7432]. The AR-LEAF SHOULD set the Assisted
Replication Type field to AR-LEAF. Note that although this field
does not affect the remote nodes when creating an EVPN
destination to the AR-LEAF, this field is useful from the
standpoint of ease of operation and troubleshooting of the BD.
c. In a BD where there are no AR-REPLICATORs due to the
AR-REPLICATORs being down or reconfigured, the AR-LEAF MUST use
regular ingress replication based on the remote Regular-IR
Inclusive Multicast Ethernet Tag routes as described in
[RFC7432]. This may happen in the following cases:
* The AR-LEAF has a list of AR-REPLICATORs for the BD, but it
detects that all the AR-REPLICATORs for the BD are down (via
next-hop tracking in the IGP or some other detection
mechanism).
* The AR-LEAF receives updates from all the former
AR-REPLICATORs containing a non-REPLICATOR AR type in the
Inclusive Multicast Ethernet Tag routes.
* The AR-LEAF never discovered an AR-REPLICATOR for the BD.
d. In a service where there are one or more AR-REPLICATORs (based on
the received Replicator-AR routes for the BD), the AR-LEAF can
locally select which AR-REPLICATOR it sends the BM traffic to:
* A single AR-REPLICATOR MAY be selected for all the BM packets
received on the AR-LEAF ACs for a given BD. This selection is
a local decision and it does not have to match other AR-LEAFs'
selections within the same BD.
* An AR-LEAF MAY select more than one AR-REPLICATOR and do
either per-flow or per-BD load balancing.
* In the case of a failure of the selected AR-REPLICATOR, another
AR-REPLICATOR SHOULD be selected by the AR-LEAF.
* When an AR-REPLICATOR is selected for a given flow or BD, the
AR-LEAF MUST send all the BM packets targeted to that
AR-REPLICATOR using the forwarding information given by the
Replicator-AR route for the chosen AR-REPLICATOR, with Tunnel
Type = 0x0A (AR tunnel). The underlay destination IP address
MUST be the AR-IP advertised by the AR-REPLICATOR in the
Replicator-AR route.
* An AR-LEAF MAY change the selection of AR-REPLICATOR(s)
dynamically due to an administrative or policy configuration
change.
* AR-LEAF nodes SHALL send service-level BM control plane
packets, following the procedures for regular ingress
replication. An example would be IGMP, Multicast Listener
Discovery (MLD), or PIM multicast packets, and, in general, any
link-local scope multicast IPv4 or IPv6 packets. The
AR-REPLICATORs MUST NOT replicate these control plane packets
to other overlay tunnels, since they will use the regular IR-IP
address.
e. The use of an AR-REPLICATOR-activation-timer (in seconds, with a
default value of 3) on the AR-LEAF nodes is RECOMMENDED. Upon
receiving a new Replicator-AR route where the AR-REPLICATOR is
selected, the AR-LEAF will run a timer before programming the new
AR-REPLICATOR. In the case of a newly added AR-REPLICATOR or if
an AR-REPLICATOR reboots, this timer will give the AR-REPLICATOR
some time to program the AR-LEAF nodes before the AR-LEAF sends
BM traffic. The AR-REPLICATOR-activation-timer SHOULD be
configurable in seconds, and its value needs to account for the
time it takes for the AR-LEAF Regular-IR Inclusive Multicast
Ethernet Tag route to get to the AR-REPLICATOR and be programmed.
While the AR-REPLICATOR-activation-timer is running, the AR-LEAF
node will use regular ingress replication.
f. If the AR-LEAF has selected an AR-REPLICATOR, whether or not to
change to a new preferred AR-REPLICATOR for the existing BM
traffic flows is a matter of local policy.
An AR-LEAF MUST follow a data path implementation compatible with the
following rules:
* The AR-LEAF nodes will build two flooding lists:
Flooding list #1: Composed of ACs and an AR-REPLICATOR-set of
overlay tunnels. The AR-REPLICATOR-set is defined as one or
more overlay tunnels to the AR-IP addresses of the remote
AR-REPLICATOR(s) in the BD. The selection of more than one
AR-REPLICATOR is described in item d. above and is a local
AR-LEAF decision.
Flooding list #2: Composed of ACs and overlay tunnels to the
remote IR-IP addresses.
* When an AR-LEAF receives a BM packet on an AC, it will check the
AR-REPLICATOR-set:
- If the AR-REPLICATOR-set is empty, the AR-LEAF MUST send the
packet to flooding list #2.
- If the AR-REPLICATOR-set is NOT empty, the AR-LEAF MUST send
the packet to flooding list #1, where only one of the overlay
tunnels of the AR-REPLICATOR-set is used.
* When an AR-LEAF receives a BM packet on an overlay tunnel, it will
forward the BM packet to its local ACs and never to an overlay
tunnel. This is the regular ingress replication behavior
described in [RFC7432].
* AR-LEAF nodes process unknown unicast traffic in the same way
AR-REPLICATORs do, as described in Section 5.1.
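The AR-LEAF choice between the two flooding lists can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the per-flow choice of a single tunnel from the AR-REPLICATOR-set is modeled as a hash modulo the set size (one possible local policy, not one mandated by the text), and the exclusion of the receiving AC is left out.

```python
from __future__ import annotations


def leaf_bm_from_ac(acs: list[str],
                    replicator_ar_ips: list[str],
                    remote_ir_ips: list[str],
                    flow_hash: int = 0) -> tuple[list[str], list[str]]:
    """Return (local ACs, tunnel destinations) for a BM packet from an AC."""
    if not replicator_ar_ips:
        # Empty AR-REPLICATOR-set: flooding list #2 (ingress replication).
        return list(acs), list(remote_ir_ips)
    # Non-empty set: flooding list #1, using only one tunnel of the set.
    chosen = replicator_ar_ips[flow_hash % len(replicator_ar_ips)]
    return list(acs), [chosen]


def leaf_bm_from_tunnel(acs: list[str]) -> tuple[list[str], list[str]]:
    """BM packet from an overlay tunnel: local ACs, never another tunnel."""
    return list(acs), []
```

With two AR-REPLICATORs in the set, different flow_hash values spread flows across them, which corresponds to the per-flow load balancing option in item d.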
5.3. RNVE Procedures
An RNVE (Regular Network Virtualization Edge node) is defined as an
NVE/PE without AR-REPLICATOR or AR-LEAF capabilities that does
ingress replication as described in [RFC7432]. The RNVE does not
signal any AR role and is unaware of the AR-REPLICATOR/LEAF roles in
the BD. The RNVE will ignore the flags in the Regular-IR routes and
will ignore the Replicator-AR routes (due to an unknown tunnel type
in the PMSI Tunnel Attribute) and the Leaf A-D routes (due to the
IP-address-specific Route Target).
This role provides EVPN with the backward compatibility required in
optimized ingress replication BDs. In Figure 4, NVE2 acts as an
RNVE.
6. Selective Assisted Replication (AR) Solution Description
Figure 5 is used to describe the selective AR solution.
( )
(_ WAN _)
+---(_ _)----+
| (_ _) |
PE1 | PE2 |
+------+----+ +----+------+
TS1--+ (BD-1) | | (BD-1) +--TS2
|REPLICATOR | |REPLICATOR |
+--------+--+ +--+--------+
| |
+--+----------------+--+
| |
| |
+----+ VXLAN/NVGRE/MPLSoGRE +----+
| | IP Fabric | |
| | | |
NVE1 | +-----------+----------+ | NVE3
Hypervisor| ToR | NVE2 |Hypervisor
+---------+-+ +-----+-----+ +-+---------+
| (BD-1) | | (BD-1) | | (BD-1) |
|LEAF-set-1 | |LEAF-set-1 | |LEAF-set-2 |
+--+-----+--+ +--+-----+--+ +--+-----+--+
| | | | | |
VM11 VM12 TS3 TS4 VM31 VM32
Figure 5: Selective AR Scenario
The solution is called "selective" because a given AR-REPLICATOR MUST
replicate the BM traffic to only the AR-LEAFs that requested the
replication (as opposed to all the AR-LEAF nodes) and MUST replicate
the BM traffic to the RNVEs (if there are any). The same AR roles as
those defined in Sections 4 and 5 are used here; however, the
procedures are different.
The selective AR procedures create multiple AR-LEAF-sets in the EVPN
BD and build single-hop trees among AR-LEAFs of the same set
(AR-LEAF->AR-REPLICATOR->AR-LEAF) and two-hop trees among AR-LEAFs of
different sets (AR-LEAF->AR-REPLICATOR->AR-REPLICATOR->AR-LEAF).
Compared to the selective solution, the non-selective AR method
assumes that all the AR-LEAFs of the BD are in the same set and
always creates single-hop trees among AR-LEAFs. While the selective
solution is more efficient than the non-selective solution in
multi-stage IP fabrics, the trade-off is additional signaling and an
additional outer source IP address lookup.
The following subsections describe the differences in the procedures
for AR-REPLICATORs/LEAFs compared to the non-selective AR solution.
There are no changes applicable to RNVEs.
6.1. Selective AR-REPLICATOR Procedures
In our example in Figure 5, PE1 and PE2 are defined as selective
AR-REPLICATORs. The following considerations apply to the selective
AR-REPLICATOR role:
a. The selective AR-REPLICATOR role SHOULD be an administrative
choice in any NVE/PE that is part of an AR-enabled BD. This
administrative option MAY be implemented as a system-level option
as opposed to a per-BD option.
b. Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF,
and RNVE nodes. In spite of the "selective" administrative
option, an AR-REPLICATOR MUST NOT behave as a selective
AR-REPLICATOR if at least one of the AR-REPLICATORs has the L flag
NOT set. If at least one AR-REPLICATOR sends a Replicator-AR
route with L = 0 (in the BD context), the rest of the
AR-REPLICATORs will fall back to non-selective AR mode.
c. The selective AR-REPLICATOR MUST follow the procedures described
in Section 5.1, except for the following differences:
* The AR-REPLICATOR MUST have the L flag set to 1 when
advertising the Replicator-AR route. This flag is used by the
AR-REPLICATORs to advertise their "selective" AR-REPLICATOR
capabilities. In addition, the AR-REPLICATOR auto-configures
its IP-address-specific import Route Target as described in
the third bullet of the procedures for Leaf A-D routes in
Section 4.
* The AR-REPLICATOR will build a "selective" AR-LEAF-set with
the list of nodes that requested replication to its own AR-IP.
For instance, assuming that NVE1 and NVE2 advertise a Leaf A-D
route with PE1's IP-address-specific Route Target and NVE3
advertises a Leaf A-D route with PE2's IP-address-specific
Route Target, PE1 will only add NVE1/NVE2 to its selective
AR-LEAF-set for BD-1 and exclude NVE3. Likewise, PE2 will only
add NVE3 to its selective AR-LEAF-set for BD-1 and exclude
NVE1/NVE2.
* When a node defined and operating as a selective AR-REPLICATOR
receives a packet on an overlay tunnel, it will do a tunnel
destination IP lookup, and if the destination IP address is
the AR-REPLICATOR AR-IP address, the node MUST replicate the
packet to:
- Local ACs.
- Overlay tunnels in the selective AR-LEAF-set, excluding the
overlay tunnel to the source AR-LEAF.
- Overlay tunnels to the RNVEs if the tunnel source IP
address is the IR-IP of an AR-LEAF. In any other case, the
AR-REPLICATOR MUST NOT replicate the BM traffic to remote
RNVEs. In other words, only the first-hop selective
AR-REPLICATOR will replicate to all the RNVEs.
- Overlay tunnels to the remote selective AR-REPLICATORs if
the tunnel source IP address (of the encapsulated packet
that arrived on the overlay tunnel) is an IR-IP of its own
AR-LEAF-set. In any other case, the AR-REPLICATOR MUST NOT
replicate the BM traffic to remote AR-REPLICATORs. When
doing this replication, the tunnel destination IP address
is the AR-IP of the remote selective AR-REPLICATOR. The
tunnel destination IP address AR-IP will indicate to the
remote selective AR-REPLICATOR that the packet needs
further replication to its AR-LEAFs.
A selective AR-REPLICATOR data path implementation MUST be compatible
with the following rules:
* The selective AR-REPLICATORs will build two flooding lists:
Flooding list #1: Composed of ACs and overlay tunnels to the
remote nodes in the BD, always using the IR-IPs in the tunnel
destination IP addresses.
Flooding list #2: Composed of ACs, a selective AR-LEAF-set, and a
selective AR-REPLICATOR-set, where:
- The selective AR-LEAF-set is composed of the overlay tunnels
to the AR-LEAFs that advertise a Leaf A-D route for the
local AR-REPLICATOR. This set is updated with every Leaf
A-D route received/withdrawn from a new AR-LEAF.
- The selective AR-REPLICATOR-set is composed of the overlay
tunnels to all the AR-REPLICATORs that send a Replicator-AR
route with L = 1. The AR-IP addresses are used as tunnel
destination IP addresses.
* Some of the overlay tunnels in the flooding lists MAY be flagged
as non-BM receivers based on the BM flag received from the remote
nodes in the routes.
* When a selective AR-REPLICATOR receives a BM packet on an AC, it
MUST forward the BM packet to its flooding list #1, skipping the
non-BM overlay tunnels.
* When a selective AR-REPLICATOR receives a BM packet on an overlay
tunnel, it will check the destination and source IPs of the
underlay IP header and:
- If the destination IP address matches its AR-IP and the source
IP address matches an IP of its own selective AR-LEAF-set, the
AR-REPLICATOR MUST forward the BM packet to its flooding list
#2, unless some AR-REPLICATOR within the BD has advertised
L = 0. In the latter case, the node reverts to non-selective
mode, and flooding list #1 MUST be used. Non-BM overlay tunnels
are skipped when sending BM packets.
- If the destination IP address matches its AR-IP and the source
IP address does not match any IP address of its selective
AR-LEAF-set, the AR-REPLICATOR MUST forward the BM packet to
flooding list #2, skipping the AR-REPLICATOR-set. Non-BM
overlay tunnels are skipped when sending BM packets.
- If the destination IP address matches its IR-IP, the
AR-REPLICATOR MUST use flooding list #1 but MUST skip all the
overlay tunnels from the flooding list, i.e., it will only
replicate to local ACs. This is the regular ingress
replication behavior described in [RFC7432].
* In any case, the AR-REPLICATOR ensures that the traffic is not
sent back to the originating source. If the encapsulation is
MPLSoGRE or MPLSoUDP and the received BD label (the label that the
AR-REPLICATOR advertised in the Replicator-AR route) is not at the
bottom of the stack, the AR-REPLICATOR MUST copy the rest of the
labels when forwarding the packet to the egress overlay tunnels.
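The selective replication decision for a packet received on an overlay tunnel can be sketched as follows. The sketch follows the per-destination conditions listed in item c of this section (local ACs, selective AR-LEAF-set, RNVEs, remote selective AR-REPLICATORs). It is an illustration under stated assumptions rather than a definitive implementation: the class and field names are hypothetical, the addresses are documentation addresses, and the L = 0 fallback, BM flag pruning, and MPLS label copying are deliberately omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SelectiveReplicator:
    ir_ip: str
    ar_ip: str
    acs: list[str]
    leaf_set: set[str]        # IR-IPs of AR-LEAFs that joined this node
    bd_leaf_ir_ips: set[str]  # IR-IPs of all AR-LEAFs known in the BD
    rnve_ir_ips: list[str] = field(default_factory=list)
    remote_replicator_ar_ips: list[str] = field(default_factory=list)

    def replicate_from_tunnel(self, dst_ip: str,
                              src_ip: str) -> tuple[list[str], list[str]]:
        """Return (local ACs, tunnel destinations) for a received packet."""
        if dst_ip == self.ir_ip:
            # Regular ingress replication behavior: local ACs only.
            return list(self.acs), []
        if dst_ip != self.ar_ip:
            raise ValueError("destination is neither the IR-IP nor the AR-IP")
        # Selective AR-LEAF-set, excluding the tunnel to the source AR-LEAF.
        out = sorted(self.leaf_set - {src_ip})
        if src_ip in self.bd_leaf_ir_ips:
            # Only the first-hop AR-REPLICATOR replicates to the RNVEs.
            out += self.rnve_ir_ips
        if src_ip in self.leaf_set:
            # Source is one of our own leaves: hand off to remote
            # selective AR-REPLICATORs using their AR-IPs.
            out += self.remote_replicator_ar_ips
        return list(self.acs), out
```

In this model, a packet from a leaf in the local set is replicated to the other local leaves, the RNVEs, and the remote selective AR-REPLICATORs, whereas a packet relayed by a remote AR-REPLICATOR only reaches the local leaves and ACs.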
6.2. Selective AR-LEAF Procedures
A selective AR-LEAF chooses a single selective AR-REPLICATOR per BD
and:
* Sends all the BD's BM traffic to that AR-REPLICATOR and
* Expects to receive all the BM traffic for a given BD from the same
AR-REPLICATOR (except for the BM traffic from the RNVEs, which
comes directly from the RNVEs)
In the example in Figure 5, we consider NVE1/NVE2/NVE3 as selective
AR-LEAFs. NVE1 selects PE1 as its selective AR-REPLICATOR. If that
is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other
AR-LEAFs/REPLICATORs send BM traffic, NVE1 will receive that traffic
from PE1. A selective AR-LEAF and a non-selective AR-LEAF behave
differently, as follows:
a. The selective AR-LEAF role SHOULD be an administrative choice in
any NVE/PE that is part of an AR-enabled BD. This administrative
option to enable AR-LEAF capabilities MAY be implemented as a
system-level option as opposed to a per-BD option.
b. The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs
in the BD. The selective AR-LEAF MUST advertise a Leaf A-D route
after receiving a Replicator-AR route with L = 1. It is
RECOMMENDED that the selective AR-LEAF wait for a period
specified by an AR-LEAF-join-wait-timer (in seconds, with a
default value of 3) before sending the Leaf A-D route, so that
the AR-LEAF can collect all the Replicator-AR routes for the BD
before advertising the Leaf A-D route. If the Replicator-AR
route with L = 1 is withdrawn, the corresponding Leaf A-D route
is withdrawn too.
c. In a service where there is more than one selective
AR-REPLICATOR, the selective AR-LEAF MUST locally select a single
selective AR-REPLICATOR for the BD. Once selected:
* The selective AR-LEAF MUST send a Leaf A-D route, including
the route key and IP-address-specific Route Target of the
selected AR-REPLICATOR.
* The selective AR-LEAF MUST send all the BM packets received on
the ACs for a given BD to that AR-REPLICATOR.
* In the case of a failure of the selected AR-REPLICATOR (detected
when the Replicator-AR route becomes infeasible as a result of
any of the underlying BGP mechanisms), another AR-REPLICATOR
will be selected and a new Leaf A-D update will be issued for
the new AR-REPLICATOR. This new route will update the
selective list in the new selective AR-REPLICATOR. In the
case of failure of the active selective AR-REPLICATOR, it is
RECOMMENDED that the selective AR-LEAF revert to ingress
replication behavior for an AR-REPLICATOR-activation-timer (in
seconds, with a default value of 3) to mitigate the traffic
impact. When the timer expires, the selective AR-LEAF will
resume its AR mode with the new selective AR-REPLICATOR. The
AR-REPLICATOR-activation-timer MAY be the same configurable
parameter as the parameter discussed in Section 5.2.
* A selective AR-LEAF MAY change the selection of
AR-REPLICATOR(s) dynamically due to an administrative or policy
configuration change.
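The selection and failover behavior of a selective AR-LEAF can be sketched with an explicit clock. This is a minimal sketch under stated assumptions: the class SelectiveLeaf and its methods are hypothetical, the "lowest AR-IP string" selection rule is an arbitrary stand-in for local policy, and the timer is applied to every change of the selected AR-REPLICATOR (initial selection as well as failover). Advertising or withdrawing the Leaf A-D route is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SelectiveLeaf:
    activation_timer: float = 3.0  # seconds; default value from the text
    selected: str | None = None    # AR-IP of the selected AR-REPLICATOR
    ar_allowed_at: float = 0.0     # time at which AR mode may be used

    def select(self, feasible_ar_ips: list[str], now: float) -> str | None:
        """Pick one AR-REPLICATOR among those with a feasible route."""
        # Arbitrary local policy for this sketch: lowest AR-IP string.
        new = min(feasible_ar_ips) if feasible_ar_ips else None
        if new != self.selected:
            # Selection changed: restart the activation timer. A Leaf A-D
            # route would be issued for the new AR-REPLICATOR here.
            self.selected = new
            self.ar_allowed_at = now + self.activation_timer
        return new

    def uses_ar(self, now: float) -> bool:
        """False means regular ingress replication is in use."""
        return self.selected is not None and now >= self.ar_allowed_at
```

While uses_ar returns False, the leaf would send BM traffic using regular ingress replication; once the timer expires, it resumes AR mode toward the newly selected AR-REPLICATOR.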
All the AR-LEAFs in a BD are expected to be configured as either
selective or non-selective. A mix of selective and non-selective
AR-LEAFs SHOULD NOT coexist in the same BD. If a non-selective AR-LEAF
is present, its BM traffic sent to a selective AR-REPLICATOR will not
be replicated to other AR-LEAFs that are not in its selective
AR-LEAF-set.
A selective AR-LEAF MUST follow a data path implementation compatible
with the following rules:
* The selective AR-LEAF nodes will build two flooding lists:
Flooding list #1: Composed of ACs and the overlay tunnel to the
selected AR-REPLICATOR (using the AR-IP as the tunnel
destination IP address).
Flooding list #2: Composed of ACs and overlay tunnels to the
remote IR-IP addresses.
* Some of the overlay tunnels in the flooding lists MAY be flagged
as non-BM receivers based on the BM flag received from the remote
nodes in the routes.
* When an AR-LEAF receives a BM packet on an AC, it will check to
see if an AR-REPLICATOR was selected; if one is found, flooding
list #1 MUST be used. Otherwise, flooding list #2 MUST be used.
Non-BM overlay tunnels are skipped when sending BM packets.
* When an AR-LEAF receives a BM packet on an overlay tunnel, it MUST
forward the BM packet to its local ACs and never to an overlay
tunnel. This is the regular ingress replication behavior
described in [RFC7432].
7. Pruned Flooding Lists (PFLs)
In addition to AR, the second optimization supported by the ingress
replication optimization solution specified in this document is the
ability of all the BD nodes to signal PFLs. As described in
Section 4, an EVPN node can signal a given value for the BM and U
PFLs flags in the Regular-IR, Replicator-AR, or Leaf A-D routes,
where:
* BM is the Broadcast and Multicast flag. BM = 1 means "prune me
from the BM flooding list". BM = 0 indicates regular behavior.
* U is the Unknown flag. U = 1 means "prune me from the Unknown
flooding list". U = 0 indicates regular behavior.
The ability to signal and process these PFLs flags SHOULD be an
administrative choice. If a node is configured to process the PFLs
flags, upon receiving a non-zero PFLs flag for a route, an NVE/PE
will add the corresponding flag to the created overlay tunnel in the
flooding list. When replicating a BM packet in the context of a
flooding list, the NVE/PE will skip the overlay tunnels marked with
the flag BM = 1, since the NVEs/PEs at the end of those tunnels are
not expecting BM packets. Similarly, when replicating unknown
unicast packets, the NVE/PE will skip the overlay tunnels marked with
U = 1.
An NVE/PE not following this document or not configured for this
optimization will ignore any of the received PFLs flags. An AR-LEAF
or RNVE receiving BUM traffic on an overlay tunnel MUST replicate the
traffic to its local ACs, regardless of the BM/U flags on the overlay
tunnels.
This optimization MAY be used along with the Assisted Replication
solution.
7.1. Example of a Pruned Flooding List
In order to illustrate the use of the PFLs solution, we will assume
that BD-1 in Figure 4 is optimized ingress replication enabled and:
* PE1 and PE2 are administratively configured as AR-REPLICATORs due
to their high-performance replication capabilities. PE1 and PE2
will send a Replicator-AR route with BM/U flags = 00.
* NVE1 and NVE3 are administratively configured as AR-LEAF nodes due
to their low-performance software-based replication capabilities.
They will advertise a Regular-IR route with type AR-LEAF.
Assuming that both NVEs advertise all of the attached VMs' MAC and
IP addresses in EVPN as soon as they come up and these NVEs do
not have any VMs interested in multicast applications, they will
be configured to signal BM/U flags = 11 for BD-1. That is,
neither NVE1 nor NVE3 is interested in receiving BM or unknown
unicast traffic, since:
- Their attached VMs (VM11, VM12, VM31, VM32) do not support
multicast applications.
- Their attached VMs will not receive ARP Requests. Proxy ARP
[RFC9161] on the remote NVEs/PEs will reply to ARP Requests
locally, and no other broadcast traffic is expected.
- Their attached VMs will not receive unknown unicast traffic,
since the VMs' MAC and IP addresses are always advertised by
EVPN as long as the VMs are active.
* NVE2 is optimized ingress replication unaware; therefore, it takes
on the RNVE role in BD-1.
Based on the above assumptions, the following forwarding behavior
will take place:
1. Any BM packets sent from VM11 will be sent to VM12 and PE1. PE1
will then forward the BM packets on to TS1, the WAN link, PE2,
and NVE2 but not to NVE3. PE2 and NVE2 will replicate the BM
packets to their local ACs, but NVE3 will be prevented from
having to replicate those BM packets to VM31 and VM32
unnecessarily.
2. Any BM packets received on PE2 from the WAN will be sent to PE1
and NVE2 but not to NVE1 and NVE3, sparing the two hypervisors
from replicating unnecessarily to their local VMs. PE1 and NVE2
will replicate to their local ACs only.
3. Any unknown unicast packet sent from VM31 will be forwarded by
NVE3 to NVE2, PE1, and PE2 but not to NVE1. The solution
prevents unnecessary replication to NVE1, since the destination
of the unknown traffic cannot be at NVE1.
4. Any unknown unicast packet sent from TS1 will be forwarded by PE1
to the WAN link, PE2, and NVE2 but not to NVE1 and NVE3, since
the target of the unknown traffic cannot be at NVE1 or NVE3.
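The pruning decisions in this example can be sketched in a few lines
of Python. This is only an illustrative model and not part of the
specification: the Remote class and the flood_targets function are
hypothetical names, only the overlay side of each node is modeled, and
NVE2 (the RNVE) is assumed to signal BM/U flags = 00 because it is
unaware of the optimization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remote:
    """One remote overlay tunnel endpoint in BD-1 (hypothetical model)."""
    name: str
    bm: int = 0  # BM flag: 1 means the node does not expect BM packets
    u: int = 0   # U flag: 1 means the node does not expect unknown unicast


def flood_targets(remotes, traffic):
    """Return the names of the remote nodes that stay in the flooding list.

    traffic is "BM" (Broadcast/Multicast) or "U" (unknown unicast).
    Overlay tunnels whose matching flag is 1 are skipped.
    """
    if traffic == "BM":
        return [r.name for r in remotes if r.bm == 0]
    if traffic == "U":
        return [r.name for r in remotes if r.u == 0]
    raise ValueError(f"unsupported traffic type: {traffic!r}")


# Flags used in the example: the AR-REPLICATORs signal 00 and the
# AR-LEAF nodes NVE1 and NVE3 signal 11. NVE2 signaling 00 is an
# assumption of this sketch.
PE1 = Remote("PE1")
PE2 = Remote("PE2")
NVE1 = Remote("NVE1", bm=1, u=1)
NVE2 = Remote("NVE2")
NVE3 = Remote("NVE3", bm=1, u=1)

# Case 1: PE1 replicating BM traffic that arrived from NVE1.
case1 = flood_targets([PE2, NVE2, NVE3], "BM")
# Case 2: PE2 replicating BM traffic received from the WAN.
case2 = flood_targets([PE1, NVE1, NVE2, NVE3], "BM")
# Case 3: NVE3 flooding an unknown unicast packet sent from VM31.
case3 = flood_targets([NVE2, PE1, PE2, NVE1], "U")
print(case1, case2, case3)
```

In this model, the three lists contain PE2 and NVE2 for case 1, PE1
and NVE2 for case 2, and NVE2, PE1, and PE2 for case 3, matching the
overlay destinations listed above.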
8. AR Procedures for Single-IP AR-REPLICATORs
The procedures explained in Sections 5 and 6 assume that the
AR-REPLICATOR can use two local routable IP addresses to terminate and
originate NVO tunnels, i.e., IR-IP and AR-IP addresses. This is
usually the case for PE-based AR-REPLICATOR nodes.
In some cases, the AR-REPLICATOR node does not support more than one
IP address to terminate and originate NVO tunnels, i.e., the IR-IP
and AR-IP are the same IP addresses. This may be the case in some
software-based or low-end AR-REPLICATOR nodes. If this is the case,
the procedures provided in Sections 5 and 6 MUST be modified in the
following way:
* The Replicator-AR routes generated by the AR-REPLICATOR use an
AR-IP that will match its IR-IP. In order to differentiate the data
plane packets that need to use ingress replication from the
packets that must use Assisted Replication forwarding mode, the
Replicator-AR route MUST advertise a different VNI/VSID than the
one used by the Regular-IR route. For instance, the AR-REPLICATOR
will advertise an AR-VNI along with the Replicator-AR route and an
IR-VNI along with the Regular-IR route. Since both routes have
the same key, different Route Distinguishers are needed in each
route.
* An AR-REPLICATOR will perform Ingress Replication forwarding mode
or Assisted Replication forwarding mode for the incoming overlay
packets based on an ingress VNI lookup as opposed to the tunnel IP
DA lookup. Note that when replicating to remote AR-REPLICATOR
nodes, the use of the IR-VNI or AR-VNI advertised by the egress
node will determine whether Ingress Replication forwarding mode or
Assisted Replication forwarding mode is used at the subsequent
AR-REPLICATOR.
The rest of the procedures will follow those described in Sections 5
and 6.
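As a rough illustration of the lookup change described in this
section, the following Python sketch selects the forwarding mode from
the tunnel IP DA when the node has two addresses and from the ingress
VNI when IR-IP and AR-IP are the same. The function name, the mode
strings, the addresses (taken from the 192.0.2.0/24 documentation
range), and the VNI values are all hypothetical.

```python
def forwarding_mode(ip_da, vni, ir_ip, ar_ip, ir_vni, ar_vni):
    """Pick the forwarding mode for a packet arriving on an overlay tunnel."""
    if ir_ip != ar_ip:
        # Two-address AR-REPLICATOR: the tunnel IP DA selects the mode.
        if ip_da == ar_ip:
            return "assisted-replication"
        if ip_da == ir_ip:
            return "ingress-replication"
        raise ValueError("packet is not addressed to this node")
    # Single-IP AR-REPLICATOR: the ingress VNI selects the mode, so the
    # two routes must have advertised different VNIs.
    if ir_vni == ar_vni:
        raise ValueError("single-IP AR-REPLICATOR needs distinct VNIs")
    if vni == ar_vni:
        return "assisted-replication"
    if vni == ir_vni:
        return "ingress-replication"
    raise ValueError("unknown VNI for this BD")


# Single-IP node: IR-IP and AR-IP are both 192.0.2.1.
single = dict(ir_ip="192.0.2.1", ar_ip="192.0.2.1", ir_vni=10001, ar_vni=20001)
print(forwarding_mode("192.0.2.1", 20001, **single))
print(forwarding_mode("192.0.2.1", 10001, **single))
```

A sender replicating toward such a node makes the same choice in
reverse: it encapsulates with the AR-VNI or the IR-VNI advertised by
the egress node, depending on the mode it wants applied there.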
9. AR Procedures and EVPN All-Active Multihoming Split-Horizon
This section extends the procedures for the cases where two or more
AR-LEAF nodes are attached to the same ES and two or more
AR-REPLICATOR nodes are attached to the same ES in the BD. The mixed
case, where an AR-LEAF node and an AR-REPLICATOR node are attached
to the same ES, would require extended procedures that are out of
scope for this document.
9.1. Ethernet Segments on AR-LEAF Nodes
If a VXLAN or NVGRE is used and if the split-horizon is based on the
tunnel source IP address and "local bias" as described in [RFC8365],
the split-horizon check will not work if an ES is shared between two
AR-LEAF nodes, and the AR-REPLICATOR replaces the tunnel source IP
address of the packets with its own AR-IP.
In order to be compatible with the source IP address split-horizon
check, the AR-REPLICATOR MAY keep the original received tunnel source
IP address when replicating packets to a remote AR-LEAF or RNVE.
This will allow AR-LEAF nodes to apply split-horizon check procedures
for BM packets before sending them to the local ES. Even if the
AR-LEAF's source IP address is preserved when replicating to AR-LEAFs
or RNVEs, the AR-REPLICATOR MUST always use its IR-IP as the source IP
address when replicating to other AR-REPLICATORs.
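The source address choice described above can be summarized with a
small Python sketch. It is a simplified model under stated
assumptions: the function name, the role strings, and the
preserve_source switch are hypothetical, and the addresses come from
the 192.0.2.0/24 documentation range.

```python
def tunnel_source_ip(target_role, received_src_ip, ir_ip, ar_ip,
                     preserve_source):
    """Choose the outer source IP an AR-REPLICATOR uses when replicating."""
    if target_role == "AR-REPLICATOR":
        # Replication to other AR-REPLICATORs always uses the IR-IP.
        return ir_ip
    if target_role in ("AR-LEAF", "RNVE"):
        # Optional behavior: keep the source address of the originating
        # AR-LEAF so that the receiver can run its split-horizon check.
        # Otherwise the address is replaced with the local AR-IP.
        return received_src_ip if preserve_source else ar_ip
    raise ValueError(f"unknown role: {target_role!r}")


leaf_ip, ir_ip, ar_ip = "192.0.2.11", "192.0.2.1", "192.0.2.2"
print(tunnel_source_ip("AR-LEAF", leaf_ip, ir_ip, ar_ip, True))
print(tunnel_source_ip("AR-REPLICATOR", leaf_ip, ir_ip, ar_ip, True))
```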
When EVPNs are used for MPLSoGRE or MPLSoUDP, the ESI-label-based
split-horizon procedure as provided in [RFC7432] will not work for
multihomed ESs defined on AR-LEAF nodes. Local bias is recommended
in this case, as it is in the case of a VXLAN or NVGRE as explained
above. The local-bias and tunnel source IP address preservation
mechanisms provide the required split-horizon behavior in
non-selective or selective AR.
Note that if the AR-REPLICATOR implementation keeps the received
tunnel source IP address, the use of unicast Reverse Path Forwarding
(uRPF) checks in the IP fabric based on the tunnel source IP address
MUST be disabled.
9.2. Ethernet Segments on AR-REPLICATOR Nodes
AR-REPLICATOR nodes attached to the same all-active ES will follow
local-bias procedures [RFC8365] as follows:
a. For BUM traffic received on a local AR-REPLICATOR's AC,
local-bias procedures as provided in [RFC8365] MUST be followed.
b. For BUM traffic received on an AR-REPLICATOR overlay tunnel with
AR-IP as the IP DA, local bias MUST also be followed. That is,
traffic received with AR-IP as the IP DA will be treated as
though it had been received on a local AC that is part of the ES
and will be forwarded to all local ESs, irrespective of their DF
or NDF state.
c. BUM traffic received on an AR-REPLICATOR overlay tunnel with
IR-IP as the IP DA will follow regular local-bias rules [RFC8365]
and will not be forwarded to local ESs that are shared with the
AR-LEAF or AR-REPLICATOR originating the traffic.
d. In cases where the AR-REPLICATOR supports a single IP address,
the IR-IP and the AR-IP are the same IP address, as discussed in
Section 8. The received BUM traffic will be treated as specified
in item b above if the received VNI is the AR-VNI and as
specified in item c if the VNI is the IR-VNI.
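A minimal sketch of items b and c, for traffic arriving on an overlay
tunnel, might look as follows. The function and variable names are
hypothetical. The sketch only computes which local ESs remain
candidates for forwarding, and it does not model the DF/NDF state that
regular local-bias rules may still apply in item c.

```python
def candidate_local_ess(local_ess, originator, da_is_ar_ip):
    """Return the local ESs that remain candidates for forwarding.

    local_ess maps each local ES name to the set of remote nodes that
    share that ES with this AR-REPLICATOR.
    """
    if da_is_ar_ip:
        # Item b: treated like traffic from a local AC, so all local
        # ESs are candidates, irrespective of DF or NDF state.
        return sorted(local_ess)
    # Item c: skip every ES shared with the node that originated the
    # traffic.
    return sorted(es for es, peers in local_ess.items()
                  if originator not in peers)


# Hypothetical AR-REPLICATOR with two local ESs; ES-1 is shared with PE2.
local_ess = {"ES-1": {"PE2"}, "ES-2": set()}
print(candidate_local_ess(local_ess, "NVE1", da_is_ar_ip=True))
print(candidate_local_ess(local_ess, "PE2", da_is_ar_ip=False))
```

For a single-IP AR-REPLICATOR (item d), the caller would set
da_is_ar_ip according to whether the received VNI is the AR-VNI.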
10. Security Considerations
The security considerations in [RFC7432] and [RFC8365] apply to this
document. The security considerations related to the Leaf A-D route
in [RFC9572] apply too.
In addition, the Assisted Replication method introduced by this
document may introduce some new risks that could affect the
successful delivery of BM traffic. Unicast traffic is not affected
by Assisted Replication (although unknown unicast traffic is affected
by the procedures for PFLs). The forwarding of BM traffic is
modified, and BM traffic from the AR-LEAF nodes will be drawn toward
AR-REPLICATORs in the BD. An AR-LEAF will forward BM traffic to its
selected AR-REPLICATOR; therefore, an attack on the AR-REPLICATOR
could impact the delivery of the BM traffic using that node. Also,
an attack on the AR-REPLICATOR and any change to the advertised AR
type will modify the selections made by the AR-LEAF nodes. If no
other AR-REPLICATOR is selected, the AR-LEAF nodes will be forced to
use Ingress Replication forwarding mode, which will impact their
performance, since the AR-LEAF nodes are usually NVEs/PEs with poor
replication performance.
This document introduces the ability of the AR-REPLICATOR to forward
traffic received on an overlay tunnel to another overlay tunnel. The
reader may determine that this introduces the risk of BM loops, that
is, an AR-LEAF receiving a BM-encapsulated packet that the AR-LEAF
originated in the first place due to one or two AR-REPLICATORs
"looping" the BM traffic back to the AR-LEAF. Following the
procedures provided in this document will prevent these BM loops,
since the AR-REPLICATOR will always forward the BM traffic using the
correct tunnel IP DA (or the correct VNI in the case of single-IP
AR-REPLICATORs), which instructs the remote nodes regarding how to
forward the traffic. This is true for both the Non-selective and
Selective modes defined in this document. However, an incorrect
implementation of the procedures provided in this document may lead
to those unexpected BM loops.
The Selective mode provides a multi-stage replication solution, where
a proper configuration of all the AR-REPLICATORs will prevent any
issues. A mix of mistakenly configured selective and non-selective
AR-REPLICATORs in the same BD could theoretically create packet
duplication in some AR-LEAFs; however, this document specifies a
fallback solution, falling back to Non-selective mode in cases
where the AR-REPLICATORs advertised an inconsistent AR Replication
mode.
This document allows the AR-REPLICATOR to preserve the tunnel source
IP address of the AR-LEAF (as an option) when forwarding BM packets
from an overlay tunnel to another overlay tunnel. Preserving the
AR-LEAF source IP address makes the local-bias filtering procedures
possible for AR-LEAF nodes that are attached to the same ES. If the
AR-REPLICATOR does not preserve the AR-LEAF source IP address,
AR-LEAF nodes attached to all-active ESs will cause packet duplication
on the multihomed CE.
The AR-REPLICATOR nodes are, by design, using more bandwidth than PEs
[RFC7432] or NVEs [RFC8365] would use. Certain network events or
unexpected low performance may exceed the AR-REPLICATOR's local
bandwidth and cause service disruption.
Finally, PFLs (Section 7) should be used with care. Intentional or
unintentional misconfiguration of the BDs on a given leaf node may
result in the leaf not receiving the required BM or unknown unicast
traffic.
11. IANA Considerations
IANA has allocated the following Border Gateway Protocol (BGP)
parameters:
* Allocation in the "P-Multicast Service Interface Tunnel (PMSI
Tunnel) Tunnel Types" registry:
+=======+=============================+===========+
| Value | Meaning                     | Reference |
+=======+=============================+===========+
| 0x0A  | Assisted Replication Tunnel | RFC 9574  |
+-------+-----------------------------+-----------+
Table 1
* Allocations in the "P-Multicast Service Interface (PMSI) Tunnel
Attribute Flags" registry:
+=======+===============================+===========+
| Value | Name                          | Reference |
+=======+===============================+===========+
| 3-4   | Assisted Replication Type (T) | RFC 9574  |
+-------+-------------------------------+-----------+
| 5     | Broadcast and Multicast (BM)  | RFC 9574  |
+-------+-------------------------------+-----------+
| 6     | Unknown (U)                   | RFC 9574  |
+-------+-------------------------------+-----------+
Table 2
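To make the bit positions in Table 2 concrete, the following Python
sketch extracts the T, BM, and U fields from a one-octet Flags value.
It assumes that the Flags field is 8 bits wide and that bit 0 is the
most significant bit, which is the numbering convention used for this
registry in [RFC7902]. The function name and the sample values are
hypothetical.

```python
def decode_pmsi_flags(flags):
    """Split a one-octet PMSI Tunnel Attribute Flags value into fields.

    Assumes bit 0 is the most significant bit of the octet.
    """
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one octet")
    return {
        "T": (flags >> 3) & 0b11,   # bits 3-4: Assisted Replication Type
        "BM": (flags >> 2) & 0b1,   # bit 5: Broadcast and Multicast
        "U": (flags >> 1) & 0b1,    # bit 6: Unknown
    }


# 0x16 is 0b00010110: T = 0b10, BM = 1, U = 1.
print(decode_pmsi_flags(0x16))
# 0x08 is 0b00001000: T = 0b01, BM = 0, U = 0.
print(decode_pmsi_flags(0x08))
```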
12. Contributors
In addition to the authors listed on the front page, the following
people also contributed to this document and should be considered
coauthors:
Wim Henderickx
Nokia
Kiran Nagaraj
Nokia
Ravi Shekhar
Juniper Networks
Nischal Sheth
Juniper Networks
Aldrin Isaac
Juniper
Mudassir Tufail
Citibank
13. Acknowledgments
The authors would like to thank Neil Hart, David Motz, Dai Truong,
Thomas Morin, Jeffrey Zhang, Shankar Murthy, and Krzysztof Szarkowicz
for their valuable feedback and contributions. Also, thanks to John
Scudder for his thorough review, which improved the quality of the
document significantly.
14. References
14.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
2012, <https://www.rfc-editor.org/info/rfc6513>.
[RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
Encodings and Procedures for Multicast in MPLS/BGP IP
VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012,
<https://www.rfc-editor.org/info/rfc6514>.
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
2015, <https://www.rfc-editor.org/info/rfc7432>.
[RFC7902] Rosen, E. and T. Morin, "Registry and Extensions for
P-Multicast Service Interface Tunnel Attribute Flags",
RFC 7902, DOI 10.17487/RFC7902, June 2016,
<https://www.rfc-editor.org/info/rfc7902>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
Uttaro, J., and W. Henderickx, "A Network Virtualization
Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
DOI 10.17487/RFC8365, March 2018,
<https://www.rfc-editor.org/info/rfc8365>.
[RFC9572] Zhang, Z., Lin, W., Rabadan, J., Patel, K., and A.
Sajassi, "Updates to EVPN Broadcast, Unknown Unicast, or
Multicast (BUM) Procedures", RFC 9572,
DOI 10.17487/RFC9572, May 2024,
<https://www.rfc-editor.org/info/rfc9572>.
14.2. Informative References
[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed.,
"Encapsulating MPLS in IP or Generic Routing Encapsulation
(GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005,
<https://www.rfc-editor.org/info/rfc4023>.
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
eXtensible Local Area Network (VXLAN): A Framework for
Overlaying Virtualized Layer 2 Networks over Layer 3
Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
<https://www.rfc-editor.org/info/rfc7348>.
[RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
Virtualization Using Generic Routing Encapsulation",
RFC 7637, DOI 10.17487/RFC7637, September 2015,
<https://www.rfc-editor.org/info/rfc7637>.
[RFC9161] Rabadan, J., Ed., Sathappan, S., Nagaraj, K., Hankins, G.,
and T. King, "Operational Aspects of Proxy ARP/ND in
Ethernet Virtual Private Networks", RFC 9161,
DOI 10.17487/RFC9161, January 2022,
<https://www.rfc-editor.org/info/rfc9161>.
Authors' Addresses
Jorge Rabadan (editor)
Nokia
777 Middlefield Road
Mountain View, CA 94043
United States of America
Email: jorge.rabadan@nokia.com
Senthil Sathappan
Nokia
Email: senthil.sathappan@nokia.com
Wen Lin
Juniper Networks
Email: wlin@juniper.net
Mukul Katiyar
Versa Networks
Email: mukul@versa-networks.com
Ali Sajassi
Cisco Systems
Email: sajassi@cisco.com