rfc9125.original | rfc9125.txt | |||
---|---|---|---|---|
BESS Working Group A. Farrel | Internet Engineering Task Force (IETF) A. Farrel | |||
Internet-Draft Old Dog Consulting | Request for Comments: 9125 Old Dog Consulting | |||
Intended status: Standards Track J. Drake | Category: Standards Track J. Drake | |||
Expires: January 23, 2022 E. Rosen | ISSN: 2070-1721 E. Rosen | |||
Juniper Networks | Juniper Networks | |||
K. Patel | K. Patel | |||
Arrcus, Inc. | Arrcus, Inc. | |||
L. Jalil | L. Jalil | |||
Verizon | Verizon | |||
July 22, 2021 | August 2021 | |||
Gateway Auto-Discovery and Route Advertisement for Segment Routing | Gateway Auto-Discovery and Route Advertisement for Site Interconnection | |||
Enabled Site Interconnection | Using Segment Routing | |||
draft-ietf-bess-datacenter-gateway-13 | ||||
Abstract | Abstract | |||
Data centers are attached to the Internet or a backbone network by | Data centers are attached to the Internet or a backbone network by | |||
gateway routers. One data center typically has more than one gateway | gateway routers. One data center typically has more than one gateway | |||
for commercial, load balancing, and resiliency reasons. Other sites, | for commercial, load-balancing, and resiliency reasons. Other sites, | |||
such as access networks, also need to be connected across backbone | such as access networks, also need to be connected across backbone | |||
networks through gateways. | networks through gateways. | |||
This document defines a mechanism using the BGP Tunnel Encapsulation | This document defines a mechanism using the BGP Tunnel Encapsulation | |||
attribute to allow data center gateway routers to advertise routes to | attribute to allow data center gateway routers to advertise routes to | |||
the prefixes reachable in the site, including advertising them on | the prefixes reachable in the site, including advertising them on | |||
behalf of other gateways at the same site. This allows segment | behalf of other gateways at the same site. This allows segment | |||
routing to be used to identify multiple paths across the Internet or | routing to be used to identify multiple paths across the Internet or | |||
backbone network between different gateways. The paths can be | backbone network between different gateways. The paths can be | |||
selected for load-balancing, resilience, and quality purposes. | selected for load-balancing, resilience, and quality purposes. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on January 23, 2022. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9125. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
2. Requirements Language | 2. Requirements Language | |||
3. Site Gateway Auto-Discovery | 3. Site Gateway Auto-Discovery | |||
4. Relationship to BGP Link State and Egress Peer Engineering | 4. Relationship to BGP - Link State and Egress Peer Engineering | |||
5. Advertising a Site Route Externally | 5. Advertising a Site Route Externally | |||
6. Encapsulation | 6. Encapsulation | |||
7. IANA Considerations | 7. IANA Considerations | |||
8. Security Considerations | 8. Security Considerations | |||
9. Manageability Considerations | 9. Manageability Considerations | |||
9.1. Relationship to Route Target Constraint | 9.1. Relationship to Route Target Constraint | |||
10. Acknowledgements | 10. References | |||
11. References | 10.1. Normative References | |||
11.1. Normative References | 10.2. Informative References | |||
11.2. Informative References | Acknowledgements | |||
Authors' Addresses | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
Data centers (DCs) are critical components of the infrastructure used | Data centers (DCs) are critical components of the infrastructure used | |||
by network operators to provide services to their customers. DCs | by network operators to provide services to their customers. DCs | |||
(sites) are interconnected by a backbone network, which consists of | (sites) are interconnected by a backbone network, which consists of | |||
any number of private networks and/or the Internet. DCs are attached | any number of private networks and/or the Internet. DCs are attached | |||
to the backbone network by gateway routers (GWs). One DC typically | to the backbone network by routers that are gateways (GWs). One DC | |||
has more than one GW for various reasons including commercial | typically has more than one GW for various reasons including | |||
preferences, load balancing, or resiliency against connection or | commercial preferences, load balancing, or resiliency against | |||
device failure. | connection or device failure. | |||
Segment Routing (SR) [RFC8402] is a protocol mechanism that can be | Segment Routing (SR) ([RFC8402]) is a protocol mechanism that can be | |||
used within a DC, and also for steering traffic that flows between | used within a DC as well as for steering traffic that flows between | |||
two DC sites. In order for a source site (also known as an ingress | two DC sites. In order for a source site (also known as an ingress | |||
site) that uses SR to load balance the flows it sends to a | site) that uses SR to load-balance the flows it sends to a | |||
destination site (also known as an egress site), it needs to know the | destination site (also known as an egress site), it needs to know the | |||
complete set of entry nodes (i.e., GWs) for that egress DC from the | complete set of entry nodes (i.e., GWs) for that egress DC from the | |||
backbone network connecting the two DCs. Note that it is assumed | backbone network connecting the two DCs. Note that it is assumed | |||
that the connected set of DC sites and the border nodes in the | that the connected set of DC sites and the border nodes in the | |||
backbone network on the paths that connect the DC sites are part of | backbone network on the paths that connect the DC sites are part of | |||
the same SR BGP Link State (LS) instance ([RFC7752] and | the same SR BGP - Link State (LS) instance (see [RFC7752] and | |||
[I-D.ietf-idr-bgpls-segment-routing-epe]) so that traffic engineering | [RFC9086]) so that traffic engineering using SR may be used for these | |||
using SR may be used for these flows. | flows. | |||
Other sites, such as access networks, also need to be connected | Other sites, such as access networks, also need to be connected | |||
across backbone networks through gateways. For illustrative | across backbone networks through gateways. For illustrative | |||
purposes, consider the ingress and egress sites shown in Figure 1 as | purposes, consider the ingress and egress sites shown in Figure 1 as | |||
separate ASes (noting that the sites could be implemented as part of | separate Autonomous Systems (ASes) (noting that the sites could be | |||
the ASes to which they are attached, or as separate ASes). The | implemented as part of the ASes to which they are attached, or as | |||
various ASes that provide connectivity between the ingress and egress | separate ASes). The various ASes that provide connectivity between | |||
sites could each be constructed differently and use different | the ingress and egress sites could each be constructed differently | |||
technologies such as IP, MPLS using global table routing information | and use different technologies such as IP; MPLS using global table | |||
from native BGP, MPLS IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN. That | routing information from BGP; MPLS IP VPN; SR-MPLS IP VPN; or SRv6 IP | |||
is, the ingress and egress sites can be connected by tunnels across a | VPN. That is, the ingress and egress sites can be connected by | |||
variety of technologies. This document describes how SR identifiers | tunnels across a variety of technologies. This document describes | |||
(SIDs) are used to identify the paths between the ingress and egress | how SR Segment Identifiers (SIDs) are used to identify the paths | |||
sites. | between the ingress and egress sites. | |||
The solution described in this document is agnostic as to whether the | The solution described in this document is agnostic as to whether the | |||
transit ASes do or do not have SR capabilities. The solution uses SR | transit ASes do or do not have SR capabilities. The solution uses SR | |||
to stitch together path segments between GWs and through the ASBRs. | to stitch together path segments between GWs and through the | |||
Thus, there is a requirement that the GWs and ASBRs are SR-capable. | Autonomous System Border Routers (ASBRs). Thus, there is a | |||
The solution supports the SR path being extended into the ingress and | requirement that the GWs and ASBRs are SR capable. The solution | |||
egress sites if they are SR-capable. | supports the SR path being extended into the ingress and egress sites | |||
if they are SR capable. | ||||
The solution defined in this document can be seen in the broader | The solution defined in this document can be seen in the broader | |||
context of site interconnection in | context of site interconnection in [SR-INTERCONNECT]. That document | |||
[I-D.farrel-spring-sr-domain-interconnect]. That document shows how | shows how other existing protocol elements may be combined with the | |||
other existing protocol elements may be combined with the solution | solution defined in this document to provide a full system, but it is | |||
defined in this document to provide a full system, but is not a | not a necessary reference for understanding this document. | |||
necessary reference for understanding this document. | ||||
Suppose that there are two gateways, GW1 and GW2 as shown in | Suppose that there are two gateways, GW1 and GW2 as shown in | |||
Figure 1, for a given egress site and that they each advertise a | Figure 1, for a given egress site and that they each advertise a | |||
route to prefix X which is located within the egress site with each | route to prefix X, which is located within the egress site with each | |||
setting itself as next hop. One might think that the GWs for X could | setting itself as next hop. One might think that the GWs for X could | |||
be inferred from the routes' next hop fields, but typically it is not | be inferred from the routes' next-hop fields, but typically it is not | |||
the case that both routes get distributed across the backbone: rather | the case that both routes get distributed across the backbone: rather | |||
only the best route, as selected by BGP, is distributed. This | only the best route, as selected by BGP, is distributed. This | |||
precludes load balancing flows across both GWs. | precludes load-balancing flows across both GWs. | |||
----------------- --------------------- | ----------------- --------------------- | |||
| Ingress | | Egress ------ | | | Ingress | | Egress ------ | | |||
| Site | | Site |Prefix| | | | Site | | Site |Prefix| | | |||
| | | | X | | | | | | | X | | | |||
| | | ------ | | | | | ------ | | |||
| -- | | --- --- | | | -- | | --- --- | | |||
| |GW| | | |GW1| |GW2| | | | |GW| | | |GW1| |GW2| | | |||
-------++-------- ----+-----------+-+-- | -------++-------- ----+-----------+-+-- | |||
| \ | / | | | \ | / | | |||
skipping to change at page 4, line 30 | skipping to change at line 165 | |||
| | ----| |---- | | | | | ----| |---- | | | |||
| | AS1 |ASBR+------+ASBR| AS2 | | | | | AS1 |ASBR+------+ASBR| AS2 | | | |||
| | ----| |---- | | | | | ----| |---- | | | |||
| --------------- -------------------- | | | --------------- -------------------- | | |||
--+-----------------------------------------------+-- | --+-----------------------------------------------+-- | |||
| |ASBR| |ASBR| | | | |ASBR| |ASBR| | | |||
| ---- AS3 ---- | | | ---- AS3 ---- | | |||
| | | | | | |||
----------------------------------------------------- | ----------------------------------------------------- | |||
Figure 1: Example Site Interconnection | Figure 1: Example Site Interconnection | |||
The obvious solution to this problem is to use the BGP feature that | The obvious solution to this problem is to use the BGP feature that | |||
allows the advertisement of multiple paths in BGP (known as Add- | allows the advertisement of multiple paths in BGP (known as Add- | |||
Paths) [RFC7911] to ensure that all routes to X get advertised by | Paths) ([RFC7911]) to ensure that all routes to X get advertised by | |||
BGP. However, even if this is done, the identity of the GWs will be | BGP. However, even if this is done, the identity of the GWs will be | |||
lost as soon as the routes get distributed through an Autonomous | lost as soon as the routes get distributed through an ASBR that will | |||
System Border Router (ASBR) that will set itself to be the next hop. | set itself to be the next hop. And if there are multiple ASes in the | |||
And if there are multiple Autonomous Systems (ASes) in the backbone, | backbone, not only will the next hop change several times, but the | |||
not only will the next hop change several times, but the Add-Paths | Add-Paths technique will experience scaling issues. This all means | |||
technique will experience scaling issues. This all means that the | that the Add-Paths approach is effectively limited to sites connected | |||
Add-Paths approach is effectively limited to sites connected over a | over a single AS. | |||
single AS. | ||||
This document defines a solution that overcomes this limitation and | This document defines a solution that overcomes this limitation and | |||
works equally well with a backbone constructed from one or more ASes | works equally well with a backbone constructed from one or more ASes | |||
using the Tunnel Encapsulation attribute [RFC9012] as follows: | using the Tunnel Encapsulation attribute ([RFC9012]) as follows: | |||
When a GW to a given site advertises a route to a prefix X within | When a GW to a given site advertises a route to a prefix X within | |||
that site, it will include a Tunnel Encapsulation attribute that | that site, it will include a Tunnel Encapsulation attribute that | |||
contains the union of the Tunnel Encapsulation attributes | contains the union of the Tunnel Encapsulation attributes | |||
advertised by each of the GWs to that site, including itself. | advertised by each of the GWs to that site, including itself. | |||
In other words, each route advertised by a GW identifies all of the | In other words, each route advertised by a GW identifies all of the | |||
GWs to the same site (see Section 3 for a discussion of how GWs | GWs to the same site (see Section 3 for a discussion of how GWs | |||
discover each other). I.e., the Tunnel Encapsulation attribute | discover each other), i.e., the Tunnel Encapsulation attribute | |||
advertised by each GW contains multiple Tunnel TLVs, one or more from | advertised by each GW contains multiple Tunnel TLVs, one or more from | |||
each active GW, and each Tunnel TLV will contain a Tunnel Egress | each active GW, and each Tunnel TLV will contain a Tunnel Egress | |||
Endpoint Sub-TLV that identifies the GW for that Tunnel TLV. | Endpoint sub-TLV that identifies the GW for that Tunnel TLV. | |||
Therefore, even if only one of the routes is distributed to other | Therefore, even if only one of the routes is distributed to other | |||
ASes, it will not matter how many times the next hop changes, as the | ASes, it will not matter how many times the next hop changes, as the | |||
Tunnel Encapsulation attribute will remain unchanged. | Tunnel Encapsulation attribute will remain unchanged. | |||
To put this in the context of Figure 1, GW1 and GW2 discover each | To put this in the context of Figure 1, GW1 and GW2 discover each | |||
other as gateways for the egress site. Both GW1 and GW2 advertise | other as gateways for the egress site. Both GW1 and GW2 advertise | |||
themselves as having routes to prefix X. Furthermore, GW1 includes a | themselves as having routes to prefix X. Furthermore, GW1 includes a | |||
Tunnel Encapsulation attribute which is the union of its Tunnel | Tunnel Encapsulation attribute, which is the union of its Tunnel | |||
Encapsulation attribute and GW2's Tunnel Encapsulation attribute. | Encapsulation attribute and GW2's Tunnel Encapsulation attribute. | |||
Similarly, GW2 includes a Tunnel Encapsulation attribute which is the | Similarly, GW2 includes a Tunnel Encapsulation attribute, which is | |||
union of its Tunnel Encapsulation attribute and GW1's Tunnel | the union of its Tunnel Encapsulation attribute and GW1's Tunnel | |||
Encapsulation attribute. The gateway in the ingress site can now see | Encapsulation attribute. The gateway in the ingress site can now see | |||
all possible paths to X in the egress site regardless of which route | all possible paths to X in the egress site regardless of which route | |||
is propagated to it, and it can choose one, or balance traffic flows | is propagated to it, and it can choose one or balance traffic flows | |||
as it sees fit. | as it sees fit. | |||
2. Requirements Language | 2. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
3. Site Gateway Auto-Discovery | 3. Site Gateway Auto-Discovery | |||
To allow a given site's GWs to auto-discover each other and to | To allow a given site's GWs to auto-discover each other and to | |||
coordinate their operations, the following procedures are | coordinate their operations, the following procedures are | |||
implemented: | implemented: | |||
o A route target ([RFC4360]) MUST be attached to each GW's auto- | * A route target ([RFC4360]) MUST be attached to each GW's auto- | |||
discovery route (defined below) and its value MUST be set to a | discovery route (defined below), and its value MUST be set to a | |||
value that indicates the site identifier. The rules for | value that indicates the site identifier. The rules for | |||
constructing a route target are detailed in [RFC4360]. It is | constructing a route target are detailed in [RFC4360]. It is | |||
RECOMMENDED that a Type 0x00 or 0x02 route target be used. | RECOMMENDED that a Type 0x00 or 0x02 route target be used. | |||
o Site identifiers are set through configuration. The site | * Site identifiers are set through configuration. The site | |||
identifiers MUST be the same across all GWs to the site (i.e., the | identifiers MUST be the same across all GWs to the site (i.e., the | |||
same identifier is used by all GWs to the same site), and MUST be | same identifier is used by all GWs to the same site) and MUST be | |||
unique across all sites that are connected (i.e., across all GWs | unique across all sites that are connected (i.e., across all GWs | |||
to all sites that are interconnected). | to all sites that are interconnected). | |||
o Each GW MUST construct an import filtering rule to import any | * Each GW MUST construct an import filtering rule to import any | |||
route that carries a route target with the same site identifier | route that carries a route target with the same site identifier | |||
that the GW itself uses. This means that only these GWs will | that the GW itself uses. This means that only these GWs will | |||
import those routes, and that all GWs to the same site will import | import those routes, and that all GWs to the same site will import | |||
each other's routes and will learn (auto-discover) the current set | each other's routes and will learn (auto-discover) the current set | |||
of active GWs for the site. | of active GWs for the site. | |||
The auto-discovery route that each GW advertises consists of the | The auto-discovery route that each GW advertises consists of the | |||
following: | following: | |||
o An IPv4 or IPv6 Network Layer Reachability Information (NLRI) | * IPv4 or IPv6 Network Layer Reachability Information (NLRI) | |||
[RFC4760] containing one of the GW's loopback addresses (that is, | ([RFC4760]) containing one of the GW's loopback addresses (that | |||
with an AFI/SAFI pair that is one of IPv4/NLRI used for unicast | is, with an AFI/SAFI pair that is one of the following: IPv4/NLRI | |||
forwarding (1/1), IPv6/NLRI used for unicast forwarding (2/1), | used for unicast forwarding (1/1); IPv6/NLRI used for unicast | |||
IPv4/NLRI with MPLS Labels (1/4), or IPv6/NLRI with MPLS Labels | forwarding (2/1); IPv4/NLRI with MPLS Labels (1/4); or IPv6/NLRI | |||
(2/4)). | with MPLS Labels (2/4)). | |||
o A Tunnel Encapsulation attribute [RFC9012] containing the GW's | * A Tunnel Encapsulation attribute ([RFC9012]) containing the GW's | |||
encapsulation information encoded in one or more Tunnel TLVs. | encapsulation information encoded in one or more Tunnel TLVs. | |||
To avoid the side effect of applying the Tunnel Encapsulation | To avoid the side effect of applying the Tunnel Encapsulation | |||
attribute to any packet that is addressed to the GW itself, the | attribute to any packet that is addressed to the GW itself, the | |||
address advertised for auto-discovery MUST be a different loopback | address advertised for auto-discovery MUST be a different loopback | |||
address than is advertised for packets directed to the gateway | address than is advertised for packets directed to the gateway | |||
itself. | itself. | |||
As described in Section 1, each GW will include a Tunnel | As described in Section 1, each GW will include a Tunnel | |||
Encapsulation attribute with the GW encapsulation information for | Encapsulation attribute with the GW encapsulation information for | |||
each of the site's active GWs (including itself) in every route | each of the site's active GWs (including itself) in every route | |||
advertised externally to that site. As the current set of active GWs | advertised externally to that site. As the current set of active GWs | |||
changes (due to the addition of a new GW or the failure/removal of an | changes (due to the addition of a new GW or the failure/removal of an | |||
existing GW) each externally advertised route will be re-advertised | existing GW), each externally advertised route will be re-advertised | |||
with a new Tunnel Encapsulation attribute which reflects the current | with a new Tunnel Encapsulation attribute, which reflects the current | |||
set of active GWs. | set of active GWs. | |||
If a gateway becomes disconnected from the backbone network, or if | If a gateway becomes disconnected from the backbone network, or if | |||
the site operator decides to terminate the gateway's activity, it | the site operator decides to terminate the gateway's activity, it | |||
MUST withdraw the advertisements described above. This means that | MUST withdraw the advertisements described above. This means that | |||
remote gateways at other sites will stop seeing advertisements from | remote gateways at other sites will stop seeing advertisements from | |||
or about this gateway. Note that if the routing within a site is | or about this gateway. Note that if the routing within a site is | |||
broken (for example, such that there is a route from one GW to | broken (for example, such that there is a route from one GW to | |||
another, but not in the reverse direction), then it is possible that | another but not in the reverse direction), then it is possible that | |||
incoming traffic will be routed to the wrong GW to reach the | incoming traffic will be routed to the wrong GW to reach the | |||
destination prefix - in this degraded network situation, traffic may | destination prefix; in this degraded network situation, traffic may | |||
be dropped. | be dropped. | |||
Note that if a GW is (mis)configured with a different site identifier | Note that if a GW is (mis)configured with a different site identifier | |||
from the other GWs to the same site then it will not be auto- | from the other GWs to the same site, then it will not be auto- | |||
discovered by the other GWs (and will not auto-discover the other | discovered by the other GWs (and will not auto-discover the other | |||
GWs). This would result in a GW for another site receiving only the | GWs). This would result in a GW for another site receiving only the | |||
Tunnel Encapsulation attribute included in the BGP best route; i.e., | Tunnel Encapsulation attribute included in the BGP best route, i.e., | |||
the Tunnel Encapsulation attribute of the (mis)configured GW or that | the Tunnel Encapsulation attribute of the (mis)configured GW or that | |||
of the other GWs. | of the other GWs. | |||
4. Relationship to BGP Link State and Egress Peer Engineering | 4. Relationship to BGP - Link State and Egress Peer Engineering | |||
When a remote GW receives a route to a prefix X, it uses the Tunnel | When a remote GW receives a route to a prefix X, it uses the Tunnel | |||
Egress Endpoint Sub-TLVs in the containing Tunnel Encapsulation | Egress Endpoint sub-TLVs in the containing Tunnel Encapsulation | |||
attribute to identify the GWs through which X can be reached. It | attribute to identify the GWs through which X can be reached. It | |||
uses this information to compute SR Traffic Engineering (SR TE) paths | uses this information to compute SR Traffic Engineering (SR TE) paths | |||
across the backbone network looking at the information advertised to | across the backbone network looking at the information advertised to | |||
it in SR BGP Link State (BGP-LS) | it in SR BGP - Link State (BGP-LS) ([RFC9085]) and correlated using | |||
[I-D.ietf-idr-bgp-ls-segment-routing-ext] and correlated using the | the site identity. SR Egress Peer Engineering (EPE) ([RFC9086]) can | |||
site identity. SR Egress Peer Engineering (EPE) | be used to supplement the information advertised in BGP-LS. | |||
[I-D.ietf-idr-bgpls-segment-routing-epe] can be used to supplement | ||||
the information advertised in BGP-LS. | ||||
5. Advertising a Site Route Externally | 5. Advertising a Site Route Externally | |||
When a packet destined for prefix X is sent on an SR TE path to a GW | When a packet destined for prefix X is sent on an SR TE path to a GW | |||
for the site containing X (that is, the packet is sent in the ingress | for the site containing X (that is, the packet is sent in the ingress | |||
site on an SR TE path that describes the whole path including those | site on an SR TE path that describes the whole path including those | |||
parts that are within the egress site), it needs to carry the | parts that are within the egress site), it needs to carry the | |||
receiving GW's SID for X such that this SID becomes the next SID that | receiving GW's SID for X such that this SID becomes the next SID that | |||
is due to be processed before the GW completes its processing of the | is due to be processed before the GW completes its processing of the | |||
packet. To achieve this, each Tunnel TLV in the Tunnel Encapsulation | packet. To achieve this, each Tunnel TLV in the Tunnel Encapsulation | |||
attribute contains a Prefix-SID sub-TLV [RFC9012] for X. | attribute contains a Prefix-SID sub-TLV ([RFC9012]) for X. | |||
As defined in [RFC9012], the Prefix-SID sub-TLV is only for IPv4/IPv6 | As defined in [RFC9012], the Prefix-SID sub-TLV is only for IPv4/IPv6 | |||
labelled unicast routes, so the solution described in this document | Labeled Unicast routes, so the solution described in this document | |||
only applies to routes of those types. If the use of the Prefix-SID | only applies to routes of those types. If the use of the Prefix-SID | |||
sub-TLV for routes of other types is defined in the future, further | sub-TLV for routes of other types is defined in the future, further | |||
documents will be needed to describe their use for site | documents will be needed to describe their use for site | |||
interconnection consistent with this document. | interconnection consistent with this document. | |||
Alternatively, if MPLS SR is in use and if the GWs for a given egress | Alternatively, if MPLS SR is in use and if the GWs for a given egress | |||
site are configured to allow GWs at remote ingress sites to perform | site are configured to allow GWs at remote ingress sites to perform | |||
SR TE through that egress site for a prefix X, then each GW to the | SR TE through that egress site for a prefix X, then each GW to the | |||
egress site computes an SR TE path through the egress site to X, and | egress site computes an SR TE path through the egress site to X and | |||
places each in an MPLS label stack sub-TLV [RFC9012] in the SR Tunnel | places each in an MPLS Label Stack sub-TLV ([RFC9012]) in the SR | |||
TLV for that GW. | Tunnel TLV for that GW. | |||
Please refer to Section 7 of | Please refer to Section 7 of [SR-INTERCONNECT] for worked examples of | |||
[I-D.farrel-spring-sr-domain-interconnect] for worked examples of how | how the SID stack is constructed in this case and how the | |||
the SID stack is constructed in this case, and how the advertisements | advertisements would work. | |||
would work. | ||||
6. Encapsulation | 6. Encapsulation | |||
If a site is configured to allow remote GWs send packets to the site | If a site is configured to allow remote GWs to send packets to the | |||
in the site's native encapsulation, then each GW to the site will | site in the site's native encapsulation, then each GW to the site | |||
also include multiple instances of a Tunnel TLV for that native | will also include multiple instances of a Tunnel TLV for that native | |||
encapsulation in externally advertised routes: one for each GW and | encapsulation in externally advertised routes: one for each GW. Each | |||
each containing a Tunnel Egress Endpoint sub-TLV with that GW's | Tunnel TLV contains a Tunnel Egress Endpoint sub-TLV with the address | |||
address. A remote GW may then encapsulate a packet according to the | of the GW that the Tunnel TLV identifies. A remote GW may then | |||
rules defined via the sub-TLVs included in each of the Tunnel TLVs. | encapsulate a packet according to the rules defined via the sub-TLVs | |||
included in each of the Tunnel TLVs. | ||||
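The per-GW rule in Section 6 can be illustrated with a minimal sketch. It is not a wire encoding and not taken from either version of the document: the names TunnelTLV and native_encap_tunnel_tlvs, and the use of a free-form string for the tunnel type, are assumptions made only for illustration. It shows one Tunnel TLV being built for each GW to the site, each carrying that GW's address as its Tunnel Egress Endpoint.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Iterable, List, Union

Address = Union[IPv4Address, IPv6Address]


@dataclass(frozen=True)
class TunnelTLV:
    """Abstract stand-in for a Tunnel TLV (hypothetical model, not wire format)."""
    tunnel_type: str           # label for the site's native encapsulation (assumption)
    egress_endpoint: Address   # value carried in the Tunnel Egress Endpoint sub-TLV


def native_encap_tunnel_tlvs(tunnel_type: str,
                             site_gateways: Iterable[str]) -> List[TunnelTLV]:
    """Return one Tunnel TLV per GW to the site.

    Duplicate addresses are collapsed so that each GW appears exactly once,
    and the result is sorted so that every GW to the site would produce the
    same list from the same set of discovered GWs.
    """
    unique = {ip_address(addr) for addr in site_gateways}
    ordered = sorted(unique, key=lambda a: (a.version, int(a)))
    return [TunnelTLV(tunnel_type, addr) for addr in ordered]


if __name__ == "__main__":
    # Documentation addresses; "native" is a placeholder tunnel type label.
    for tlv in native_encap_tunnel_tlvs("native", ["192.0.2.2", "192.0.2.1"]):
        print(tlv.tunnel_type, tlv.egress_endpoint)
```

A remote GW receiving such a route would then choose one of the Tunnel TLVs and encapsulate toward the egress endpoint it names. How that choice is made is outside the scope of this sketch.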
7. IANA Considerations | 7. IANA Considerations | |||
IANA maintains a registry called "Border Gateway Protocol (BGP) | IANA maintains the "BGP Tunnel Encapsulation Attribute Tunnel Types" | |||
Parameters" with a sub-registry called "BGP Tunnel Encapsulation | registry in the "Border Gateway Protocol (BGP) Tunnel Encapsulation" | |||
Attribute Tunnel Types." The registration policy for this registry | registry. | |||
is First-Come First-Served [RFC8126]. | ||||
IANA previously assigned the value 17 from this sub-registry for "SR | IANA had previously assigned the value 17 from this subregistry for | |||
Tunnel", referencing this document. IANA is now requested to mark | "SR Tunnel", referencing this document as an Internet-Draft. At that | |||
that assignment as deprecated. IANA may reclaim that codepoint at | time, the assignment policy for this range of the registry was "First | |||
such a time that the registry is depleted. | Come First Served" [RFC8126]. | |||
IANA has marked that assignment as deprecated. IANA may reclaim that | ||||
codepoint at such a time that the registry is depleted. | ||||
8. Security Considerations | 8. Security Considerations | |||
From a protocol point of view, the mechanisms described in this | From a protocol point of view, the mechanisms described in this | |||
document can leverage the security mechanisms already defined for | document can leverage the security mechanisms already defined for | |||
BGP. Further discussion of security considerations for BGP may be | BGP. Further discussion of security considerations for BGP may be | |||
found in the BGP specification itself [RFC4271] and in the security | found in the BGP specification itself ([RFC4271]) and in the security | |||
analysis for BGP [RFC4272]. The original discussion of the use of | analysis for BGP ([RFC4272]). The original discussion of the use of | |||
the TCP MD5 signature option to protect BGP sessions is found in | the TCP MD5 signature option to protect BGP sessions is found in | |||
[RFC5925], while [RFC6952] includes an analysis of BGP keying and | [RFC5925], while [RFC6952] includes an analysis of BGP keying and | |||
authentication issues. | authentication issues. | |||
The mechanisms described in this document involve sharing routing or | The mechanisms described in this document involve sharing routing or | |||
reachability information between sites: that may mean disclosing | reachability information between sites, which may mean disclosing | |||
information that is normally contained within a site. So it needs to | information that is normally contained within a site. So it needs to | |||
be understood that normal security paradigms based on the boundaries | be understood that normal security paradigms based on the boundaries | |||
of sites are weakened and interception of BGP messages may result in | of sites are weakened and interception of BGP messages may result in | |||
information being disclosed to third parties. Discussion of these | information being disclosed to third parties. Discussion of these | |||
issues with respect to VPNs can be found in [RFC4364], while | issues with respect to VPNs can be found in [RFC4364], while | |||
[RFC7926] describes many of the issues associated with the exchange | [RFC7926] describes many of the issues associated with the exchange | |||
of topology or TE information between sites. | of topology or TE information between sites. | |||
Particular exposures resulting from this work include: | Particular exposures resulting from this work include: | |||
o Gateways to a site will know about all other gateways to the same | * Gateways to a site will know about all other gateways to the same | |||
site. This feature applies within a site and so is not a | site. This feature applies within a site, so it is not a | |||
substantial exposure, but it does mean that if the BGP exchanges | substantial exposure, but it does mean that if the BGP exchanges | |||
within a site can be snooped or if a gateway can be subverted then | within a site can be snooped or if a gateway can be subverted, | |||
an attacker may learn the full set of gateways to a site. This | then an attacker may learn the full set of gateways to a site. | |||
would facilitate more effective attacks on that site. | This would facilitate more effective attacks on that site. | |||
o The existence of multiple gateways to a site becomes more visible | * The existence of multiple gateways to a site becomes more visible | |||
across the backbone and even into remote sites. This means that | across the backbone and even into remote sites. This means that | |||
an attacker is able to prepare a more comprehensive attack than | an attacker is able to prepare a more comprehensive attack than | |||
exists when only the locally attached backbone network (e.g., the | exists when only the locally attached backbone network (e.g., the | |||
AS that hosts the site) can see all of the gateways to a site. | AS that hosts the site) can see all of the gateways to a site. | |||
For example, a Denial of Service attack on a single GW is | For example, a Denial-of-Service attack on a single GW is | |||
mitigated by the existence of other GWs, but if the attacker knows | mitigated by the existence of other GWs, but if the attacker knows | |||
about all the gateways then the whole set can be attacked at once. | about all the gateways, then the whole set can be attacked at | |||
once. | ||||
o A node in a site that does not have external BGP peering (i.e., is | * A node in a site that does not have external BGP peering (i.e., is | |||
not really a site gateway and cannot speak BGP into the backbone | not really a site gateway and cannot speak BGP into the backbone | |||
network) may be able to get itself advertised as a gateway by | network) may be able to get itself advertised as a gateway by | |||
letting other genuine gateways discover it (by speaking BGP to | letting other genuine gateways discover it (by speaking BGP to | |||
them within the site) and so may get those genuine gateways to | them within the site), so it may get those genuine gateways to | |||
advertise it as a gateway into the backbone network. This would | advertise it as a gateway into the backbone network. This would | |||
allow the malicious node to attract traffic without having to have | allow the malicious node to attract traffic without having to have | |||
secure BGP peerings with out-of-site nodes. | secure BGP peerings with out-of-site nodes. | |||
o An external party intercepting BGP messages anywhere between sites | * An external party intercepting BGP messages anywhere between sites | |||
may learn information about the functioning of the sites and the | may learn information about the functioning of the sites and the | |||
locations of end points. While this is not necessarily a | locations of endpoints. While this is not necessarily a | |||
significant security or privacy risk, it is possible that the | significant security or privacy risk, it is possible that the | |||
disclosure of this information could be used by an attacker. | disclosure of this information could be used by an attacker. | |||
o If it is possible to modify a BGP message within the backbone, it | * If it is possible to modify a BGP message within the backbone, it | |||
may be possible to spoof the existence of a gateway. This could | may be possible to spoof the existence of a gateway. This could | |||
cause traffic to be attracted to a specific node and might result | cause traffic to be attracted to a specific node and might result | |||
in black-holing of traffic. | in traffic not being delivered. | |||
All of the issues in the list above could cause disruption to site | All of the issues in the list above could cause disruption to site | |||
interconnection, but are not new protocol vulnerabilities so much as | interconnection, but they are not new protocol vulnerabilities so | |||
new exposures of information that SHOULD be protected against using | much as new exposures of information that SHOULD be protected against | |||
existing protocol mechanisms such as securing the TCP sessions over | using existing protocol mechanisms such as securing the TCP sessions | |||
which the BGP messages flow. Furthermore, it is a general | over which the BGP messages flow. Furthermore, it is a general | |||
observation that if these attacks are possible then it is highly | observation that if these attacks are possible, then it is highly | |||
likely that far more significant attacks can be made on the routing | likely that far more significant attacks can be made on the routing | |||
system. It should be noted that BGP peerings are not discovered, but | system. It should be noted that BGP peerings are not discovered but | |||
always arise from explicit configuration. | always arise from explicit configuration. | |||
Given that the gateways and ASBRs are connected by tunnels that may | Given that the gateways and ASBRs are connected by tunnels that may | |||
run across parts of the network that are not trusted, data center | run across parts of the network that are not trusted, data center | |||
operators using the approach set out in this document MUST consider | operators using the approach set out in this document MUST consider | |||
using gateway-to-gateway encryption to protect the data center | using gateway-to-gateway encryption to protect the data center | |||
traffic. Additionally, due consideration MUST be given to encrypting | traffic. Additionally, due consideration MUST be given to encrypting | |||
end-to-end traffic as it would be for any traffic that uses a public | end-to-end traffic as it would be for any traffic that uses a public | |||
or untrusted network for transport. | or untrusted network for transport. | |||
9. Manageability Considerations | 9. Manageability Considerations | |||
The principal configuration item added by this solution is the | The principal configuration item added by this solution is the | |||
allocation of a site identifier. The same identifier MUST be | allocation of a site identifier. The same identifier MUST be | |||
assigned to every GW to the same site, and each site MUST have a | assigned to every GW to the same site, and each site MUST have a | |||
different identifier. This requires coordination, probably through a | different identifier. This requires coordination, probably through a | |||
central management agent. | central management agent. | |||
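The two requirements on site identifiers (one identifier shared by every GW to a site, and a different identifier for each site) lend themselves to a simple consistency check by a central management agent. The following is a minimal sketch under stated assumptions: the function name check_site_identifiers and the input shape (a mapping from GW name to a site name and site identifier pair) are hypothetical and are not defined by this document.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def check_site_identifiers(gateways: Dict[str, Tuple[str, int]]) -> List[str]:
    """Check a provisioning table of GW name -> (site name, site identifier).

    Returns a list of human-readable problems; an empty list means that
    every GW to the same site shares one identifier and that no identifier
    is used by more than one site.
    """
    ids_per_site = defaultdict(set)
    sites_per_id = defaultdict(set)
    for _gw, (site, site_id) in gateways.items():
        ids_per_site[site].add(site_id)
        sites_per_id[site_id].add(site)

    problems = []
    # Rule 1: the same identifier MUST be assigned to every GW to a site.
    for site, ids in sorted(ids_per_site.items()):
        if len(ids) > 1:
            problems.append(f"site {site} has multiple identifiers: {sorted(ids)}")
    # Rule 2: each site MUST have a different identifier.
    for site_id, sites in sorted(sites_per_id.items()):
        if len(sites) > 1:
            problems.append(f"identifier {site_id} is shared by sites: {sorted(sites)}")
    return problems


if __name__ == "__main__":
    table = {"gw1": ("A", 100), "gw2": ("A", 100), "gw3": ("B", 100)}
    for problem in check_site_identifiers(table):
        print(problem)
```

Running such a check before pushing configuration would catch the misconfigurations that otherwise cause GWs to fail to discover, or to wrongly discover, each other.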
It should be noted that BGP peerings are not discovered, but always | It should be noted that BGP peerings are not discovered but always | |||
arise from explicit configuration. This is no different from any | arise from explicit configuration. This is no different from any | |||
other BGP operation. | other BGP operation. | |||
The site identifiers that are configured and carried in route targets | The site identifiers that are configured and carried in route targets | |||
(see Section 3) are an important feature to ensure that all of the | (see Section 3) are an important feature to ensure that all of the | |||
gateways to a site discover each other. It is, therefore, important | gateways to a site discover each other. Therefore, it is important | |||
that this value is not misconfigured since that would result in the | that this value is not misconfigured since that would result in the | |||
gateways not discovering each other and not advertising each other. | gateways not discovering each other and not advertising each other. | |||
9.1. Relationship to Route Target Constraint | 9.1. Relationship to Route Target Constraint | |||
In order to limit the VPN routing information that is maintained at a | In order to limit the VPN routing information that is maintained at a | |||
given route reflector, [RFC4364] suggests the use of "Cooperative | given route reflector, [RFC4364] suggests that route reflectors use | |||
Route Filtering" [RFC5291] between route reflectors. [RFC4684] | "Cooperative Route Filtering", which was renamed "Outbound Route | |||
defines an extension to that mechanism to include support for | Filtering" and defined in [RFC5291]. [RFC4684] defines an extension | |||
multiple autonomous systems and asymmetric VPN topologies such as | to that mechanism to include support for multiple autonomous systems | |||
hub-and-spoke. The mechanism in RFC 4684 is known as Route Target | and asymmetric VPN topologies such as hub-and-spoke. The mechanism | |||
Constraint (RTC). | in RFC 4684 is known as Route Target Constraint (RTC). | |||
An operator would not normally configure RTC by default for any AFI/ | An operator would not normally configure RTC by default for any AFI/ | |||
SAFI combination, and would only enable it after careful | SAFI combination and would only enable it after careful | |||
consideration. When using the mechanisms defined in this document, | consideration. When using the mechanisms defined in this document, | |||
the operator should consider carefully the effects of filtering | the operator should carefully consider the effects of filtering | |||
routes. In some cases this may be desirable, and in others it could | routes. In some cases, this may be desirable, and in others, it | |||
limit the effectiveness of the procedures. | could limit the effectiveness of the procedures. | |||
10. Acknowledgements | ||||
Thanks to Bruno Rijsman, Stephane Litkowski, Boris Hassanov, Linda | ||||
Dunbar, Ravi Singh, and Daniel Migault for review comments, and to | ||||
Robert Raszuk for useful discussions. Gyan Mishra provided a helpful | ||||
GenArt review, and John Scudder and Benjamin Kaduk made helpful | ||||
comments during IESG review. | ||||
11. References | 10. References | |||
11.1. Normative References | 10.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A | [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A | |||
Border Gateway Protocol 4 (BGP-4)", RFC 4271, | Border Gateway Protocol 4 (BGP-4)", RFC 4271, | |||
DOI 10.17487/RFC4271, January 2006, | DOI 10.17487/RFC4271, January 2006, | |||
<https://www.rfc-editor.org/info/rfc4271>. | <https://www.rfc-editor.org/info/rfc4271>. | |||
skipping to change at page 11, line 47 ¶ | skipping to change at line 509 ¶ | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC9012] Patel, K., Van de Velde, G., Sangli, S., and J. Scudder, | [RFC9012] Patel, K., Van de Velde, G., Sangli, S., and J. Scudder, | |||
"The BGP Tunnel Encapsulation Attribute", RFC 9012, | "The BGP Tunnel Encapsulation Attribute", RFC 9012, | |||
DOI 10.17487/RFC9012, April 2021, | DOI 10.17487/RFC9012, April 2021, | |||
<https://www.rfc-editor.org/info/rfc9012>. | <https://www.rfc-editor.org/info/rfc9012>. | |||
11.2. Informative References | 10.2. Informative References | |||
[I-D.farrel-spring-sr-domain-interconnect] | ||||
Farrel, A. and J. Drake, "Interconnection of Segment | ||||
Routing Sites - Problem Statement and Solution Landscape", | ||||
draft-farrel-spring-sr-domain-interconnect-06 (work in | ||||
progress), May 2021. | ||||
[I-D.ietf-idr-bgp-ls-segment-routing-ext] | ||||
Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H., | ||||
and M. Chen, "BGP Link-State extensions for Segment | ||||
Routing", draft-ietf-idr-bgp-ls-segment-routing-ext-18 | ||||
(work in progress), April 2021. | ||||
[I-D.ietf-idr-bgpls-segment-routing-epe] | ||||
Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray, | ||||
S., and J. Dong, "BGP-LS extensions for Segment Routing | ||||
BGP Egress Peer Engineering", draft-ietf-idr-bgpls- | ||||
segment-routing-epe-19 (work in progress), May 2019. | ||||
[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", | [RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", | |||
RFC 4272, DOI 10.17487/RFC4272, January 2006, | RFC 4272, DOI 10.17487/RFC4272, January 2006, | |||
<https://www.rfc-editor.org/info/rfc4272>. | <https://www.rfc-editor.org/info/rfc4272>. | |||
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | |||
Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February | Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February | |||
2006, <https://www.rfc-editor.org/info/rfc4364>. | 2006, <https://www.rfc-editor.org/info/rfc4364>. | |||
[RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, | [RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, | |||
skipping to change at page 13, line 22 ¶ | skipping to change at line 558 ¶ | |||
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | |||
Writing an IANA Considerations Section in RFCs", BCP 26, | Writing an IANA Considerations Section in RFCs", BCP 26, | |||
RFC 8126, DOI 10.17487/RFC8126, June 2017, | RFC 8126, DOI 10.17487/RFC8126, June 2017, | |||
<https://www.rfc-editor.org/info/rfc8126>. | <https://www.rfc-editor.org/info/rfc8126>. | |||
[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., | [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., | |||
Decraene, B., Litkowski, S., and R. Shakir, "Segment | Decraene, B., Litkowski, S., and R. Shakir, "Segment | |||
Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, | Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, | |||
July 2018, <https://www.rfc-editor.org/info/rfc8402>. | July 2018, <https://www.rfc-editor.org/info/rfc8402>. | |||
[RFC9085] Previdi, S., Talaulikar, K., Ed., Filsfils, C., Gredler, | ||||
H., and M. Chen, "Border Gateway Protocol - Link State | ||||
(BGP-LS) Extensions for Segment Routing", RFC 9085, | ||||
DOI 10.17487/RFC9085, August 2021, | ||||
<https://www.rfc-editor.org/info/rfc9085>. | ||||
[RFC9086] Previdi, S., Talaulikar, K., Ed., Filsfils, C., Patel, K., | ||||
Ray, S., and J. Dong, "Border Gateway Protocol - Link | ||||
State (BGP-LS) Extensions for Segment Routing BGP Egress | ||||
Peer Engineering", RFC 9086, DOI 10.17487/RFC9086, August | ||||
2021, <https://www.rfc-editor.org/info/rfc9086>. | ||||
[SR-INTERCONNECT] | ||||
Farrel, A. and J. Drake, "Interconnection of Segment | ||||
Routing Sites - Problem Statement and Solution Landscape", | ||||
Work in Progress, Internet-Draft, draft-farrel-spring-sr- | ||||
domain-interconnect-06, 19 May 2021, | ||||
<https://datatracker.ietf.org/doc/html/draft-farrel- | ||||
spring-sr-domain-interconnect-06>. | ||||
Acknowledgements | ||||
Thanks to Bruno Rijsman, Stephane Litkowski, Boris Hassanov, Linda | ||||
Dunbar, Ravi Singh, and Daniel Migault for review comments, and to | ||||
Robert Raszuk for useful discussions. Gyan Mishra provided a helpful | ||||
GenArt review, and John Scudder and Benjamin Kaduk made helpful | ||||
comments during IESG review. | ||||
Authors' Addresses | Authors' Addresses | |||
Adrian Farrel | Adrian Farrel | |||
Old Dog Consulting | Old Dog Consulting | |||
Email: adrian@olddog.co.uk | Email: adrian@olddog.co.uk | |||
John Drake | John Drake | |||
Juniper Networks | Juniper Networks | |||
End of changes. 73 change blocks. | ||||
199 lines changed or deleted | 197 lines changed or added | |||