Internet Engineering Task Force (IETF)                         J. Uttaro
Request for Comments: 9494                       Independent Contributor
Updates: 6368                                                    E. Chen
Category: Standards Track                             Palo Alto Networks
ISSN: 2070-1721                                              B. Decraene
                                                                  Orange
                                                            J.G. Scudder
                                                        Juniper Networks
                                                           November 2023

Long-Lived Graceful Restart for BGP

Abstract

This document introduces a BGP capability called the "Long-Lived Graceful Restart Capability" (or "LLGR Capability"). The benefit of this capability is that stale routes can be retained for a longer time upon session failure than is provided for by BGP Graceful Restart (as described in RFC 4724). A well-known BGP community called "LLGR_STALE" is introduced for marking stale routes retained for a longer time. A second well-known BGP community called "NO_LLGR" is introduced for marking routes for which these procedures should not be applied. We also specify that such long-lived stale routes be treated as the least preferred and that their advertisements be limited to BGP speakers that have advertised the capability. Use of this extension is not advisable in all cases, and we provide guidelines to help determine if it is.

This memo updates RFC 6368 by specifying that the LLGR_STALE community must be propagated into, or out of, the path attributes exchanged between the Provider Edge (PE) and Customer Edge (CE) routers.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9494.

Copyright Notice

Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
     2.1.  Definitions
     2.2.  Abbreviations
     2.3.  Requirements Language
   3.  Protocol Extensions
     3.1.  Long-Lived Graceful Restart Capability
     3.2.  LLGR_STALE Community
     3.3.  NO_LLGR Community
   4.  Theory of Operation
     4.1.  Use of the Graceful Restart Capability
     4.2.  Session Resets
     4.3.  Processing LLGR_STALE Routes
     4.4.  Route Selection
     4.5.  Errors
     4.6.  Optional Partial Deployment Procedure
     4.7.  Procedures When BGP Is the PE-CE Protocol in a VPN
       4.7.1.  Procedures When EBGP Is the PE-CE Protocol in a VPN
       4.7.2.  Procedures When IBGP Is the PE-CE Protocol in a VPN
   5.  Deployment Considerations
     5.1.  When BGP Is the PE-CE Protocol in a VPN
     5.2.  Risks of Depreferencing Routes
   6.  Security Considerations
   7.  Examples of Operation
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

Routing protocols in general, and BGP in particular, have historically been designed with a focus on "correctness", where a key part of correctness is for each network element's forwarding state to converge to the current state of the network as quickly as possible. For this reason, the protocol was designed to remove state advertised by routers that went down (from a BGP perspective) as quickly as possible.
Over time, this has been relaxed somewhat, notably by BGP Graceful Restart (GR) [RFC4724]; however, the paradigm has remained one of attempting to rapidly remove stale state from the network.

Over time, two phenomena have arisen that call into question the underlying assumptions of this paradigm.

   1.  The widespread adoption of tunneled forwarding infrastructures (for example, MPLS). Such infrastructures eliminate the risk of some types of forwarding loops that can arise in hop-by-hop forwarding; thus, they reduce one of the motivations for strong consistency between forwarding elements.

   2.  The increasing use of BGP as a transport for data that is less closely associated with packet forwarding than was originally the case. Examples include the use of BGP for auto-discovery (Virtual Private LAN Service (VPLS) [RFC4761]) and filter programming (Flow Specification (FLOWSPEC) [RFC8955]). In these cases, BGP data takes on a character more akin to configuration than to conventional routing.

The observations above motivate a desire to offer network operators the ability to choose to retain BGP data for a longer period than has hitherto been possible when the BGP control plane fails for some reason. Although the semantics of BGP Graceful Restart [RFC4724] are close to those desired, several gaps exist, most notably in the maximum time for which stale information can be retained: Graceful Restart imposes a 4095-second upper bound.

In this document, we introduce a BGP capability called the "Long-Lived Graceful Restart Capability". The goal of this capability is that stale information can be retained for a longer time across a session reset.
We also introduce two BGP well-known communities:

   *  LLGR_STALE to mark such information, and

   *  NO_LLGR to indicate that these procedures should not be applied to the marked route.

Long-lived stale information is to be treated as least preferred, and its advertisement limited to BGP speakers that support the capability. Where possible, we reference the semantics of BGP Graceful Restart [RFC4724] rather than specifying similar semantics in this document.

The expected deployment model for this extension is that it will only be invoked for certain address families. This is discussed in more detail in Section 5. The use of this extension may be combined with that of conventional Graceful Restart; in such a case, it is invoked after the conventional Graceful Restart interval has elapsed. When not combined, LLGR is invoked immediately. Apart from the potential to greatly extend the timer, the most obvious difference between LLGR and conventional Graceful Restart is that in LLGR, routes are "depreferenced"; that is, they are treated as least preferred. Contrarily, in conventional GR, route preference is not affected. The design choice to treat long-lived stale routes as least preferred was informed by the expectation that they might be retained for (potentially) an almost unbounded period of time; whereas, in the conventional Graceful Restart case, stale routes are retained for only a brief interval.
In the case of Graceful Restart, the trade-off between advertising new route status (at the cost of routing churn) and not advertising it (at the cost of suboptimal or incorrect route selection) is resolved in favor of not advertising. In the case of LLGR, it is resolved in favor of advertising new state, using stale information only as a last resort.

Section 7 provides some simple examples illustrating the operation of this extension.

2.  Terminology

2.1.  Definitions

   Depreference:  A route is said to be depreferenced if it has its route selection preference reduced in reaction to some event.

   Helper:  Sometimes referred to as "helper router". During Graceful Restart or Long-Lived Graceful Restart, the router that detects a session failure and applies the listed procedures. [RFC4724] refers to this as the "receiving speaker".

   Route:  In this document, "route" means any information encoded as BGP Network Layer Reachability Information (NLRI) and a set of path attributes. As discussed above, the connection between such routes and the installation of forwarding state may be quite remote.
Further note that, for brevity, in this document when we reference conventional Graceful Restart, we cite its base specification, [RFC4724]. That specification has been updated by [RFC8538]. The citation to [RFC4724] is not intended to be limiting.

2.2.  Abbreviations

   CE:  Customer Edge (See [RFC4364] for more information on Customer Edge routers.)

   EoR:  End-of-RIB (See Section 2 of [RFC4724] for more information on End-of-RIB markers.)

   GR:  Graceful Restart (See [RFC4724] for more information on GR.) This term is also sometimes referred to herein as "conventional Graceful Restart" or "conventional GR" to distinguish it from the "Long-Lived Graceful Restart" or "LLGR" defined by this document.

   LLGR:  Long-Lived Graceful Restart

   LLST:  Long-Lived Stale Time

   PE:  Provider Edge (See [RFC4364] for more information on Provider Edge routers.)

   VRF:  VPN Routing and Forwarding (See [RFC4364] for more information on VRF tables.)

2.3.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3.  Protocol Extensions

A BGP capability and two BGP communities are introduced in the subsections that follow.

3.1.  Long-Lived Graceful Restart Capability

The "Long-Lived Graceful Restart Capability", or "LLGR Capability", (value: 71) is a BGP capability [RFC5492] that can be used by a BGP speaker to indicate its ability to preserve its state according to the procedures of this document. If the LLGR capability is advertised, the Graceful Restart capability [RFC4724] MUST also be advertised; see Section 4.1.
The capability value consists of zero or more tuples <AFI, SAFI, Flags, LLST> as follows:

      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+
      | Long-Lived Stale Time (24 bits)                  |
      +--------------------------------------------------+
      | ...                                              |
      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+
      | Long-Lived Stale Time (24 bits)                  |
      +--------------------------------------------------+

The meanings of the fields are as follows:

   Address Family Identifier (AFI), Subsequent Address Family Identifier (SAFI):  The AFI and SAFI, taken in combination, indicate that the BGP speaker has the ability to preserve its forwarding state for the address family during a subsequent BGP restart. Routes may be either:

      *  explicitly associated with a particular AFI and SAFI if using the encoding described in [RFC4760], or

      *  implicitly associated with <AFI=IPv4, SAFI=Unicast> if using the encoding described in [RFC4271].

   Flags for Address Family:  This field contains bit flags relating to routes that were advertised with the given AFI and SAFI.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |F|   Reserved  |
         +-+-+-+-+-+-+-+-+

      The most significant bit is used to indicate whether the state for routes that were advertised with the given AFI and SAFI has indeed been preserved during the previous BGP restart. When set (value 1), the bit indicates that the state has been preserved.
      This bit is called the "F bit" since it was historically used to indicate the preservation of forwarding state. Use of the F bit is detailed in Section 4.2. The remaining bits are reserved and MUST be set to zero by the sender and ignored by the receiver.

   Long-Lived Stale Time:  This time (in seconds) specifies how long stale information (for this AFI/SAFI) may be retained by the receiver (in addition to the period specified by the "Restart Time" in the Graceful Restart Capability). Because the potential use cases for this extension vary widely, there is no suggested default value for the LLST.

3.2.  LLGR_STALE Community

The well-known BGP community LLGR_STALE (value: 0xFFFF0006) can be used to mark stale routes retained for a longer period of time (see [RFC1997] for more information on BGP communities). Such long-lived stale routes are to be handled according to the procedures specified in Section 4.

An implementation MAY allow users to configure policies that accept, reject, or modify routes based on the presence or absence of this community.

3.3.  NO_LLGR Community

The well-known BGP community NO_LLGR (value: 0xFFFF0007) can be used to mark routes that a BGP speaker does not want to be treated according to these procedures, as detailed in Section 4.

An implementation MAY allow users to configure policies that accept, reject, or modify routes based on the presence or absence of this community.

4.  Theory of Operation

If a BGP speaker is configured to support the procedures of this document, it MUST use BGP Capabilities Advertisement [RFC5492] to advertise the Long-Lived Graceful Restart Capability. The setting of the parameters for an AFI/SAFI depends on the properties of the BGP speaker, network scale, and local configuration.
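As an illustration of the capability value layout in Section 3.1 and the community values in Sections 3.2 and 3.3, the following is a minimal, non-normative Python sketch. The names LlgrTuple, encode_llgr_value, and decode_llgr_value are hypothetical, and rejecting a value whose length is not a multiple of seven octets is an assumption of this sketch rather than a rule stated in this document.

```python
import struct
from typing import List, NamedTuple

LLGR_CAPABILITY_CODE = 71   # capability value from Section 3.1
LLGR_STALE = 0xFFFF0006     # well-known community from Section 3.2
NO_LLGR = 0xFFFF0007        # well-known community from Section 3.3
F_BIT = 0x80                # most significant bit of the Flags octet


class LlgrTuple(NamedTuple):
    """One <AFI, SAFI, Flags, LLST> tuple (hypothetical helper type)."""
    afi: int
    safi: int
    f_bit: bool
    llst: int  # Long-Lived Stale Time in seconds (24 bits)


def encode_llgr_value(tuples: List[LlgrTuple]) -> bytes:
    """Encode zero or more tuples as the capability value (7 octets each)."""
    out = bytearray()
    for t in tuples:
        if not 0 <= t.llst < (1 << 24):
            raise ValueError("LLST must fit in 24 bits")
        flags = F_BIT if t.f_bit else 0  # reserved bits are sent as zero
        out += struct.pack("!HBB", t.afi, t.safi, flags)
        out += t.llst.to_bytes(3, "big")
    return bytes(out)


def decode_llgr_value(data: bytes) -> List[LlgrTuple]:
    """Decode a capability value; reserved flag bits are ignored."""
    if len(data) % 7 != 0:
        # Assumption of this sketch: treat a partial tuple as an error.
        raise ValueError("value is not a whole number of 7-octet tuples")
    result = []
    for off in range(0, len(data), 7):
        afi, safi, flags = struct.unpack_from("!HBB", data, off)
        llst = int.from_bytes(data[off + 4:off + 7], "big")
        result.append(LlgrTuple(afi, safi, bool(flags & F_BIT), llst))
    return result
```

For example, a single tuple for AFI 1, SAFI 1 with the F bit set and an LLST of 86400 seconds encodes to seven octets, and decoding those octets yields the same tuple.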
In the presence of the Long-Lived Graceful Restart Capability, the procedures specified in [RFC4724] continue to apply unless explicitly revised by this document.

4.1.  Use of the Graceful Restart Capability

If the LLGR Capability is advertised, the Graceful Restart capability MUST also be advertised. If it is not so advertised, the LLGR Capability MUST be disregarded. The purpose for mandating this is to enable the reuse of certain base mechanisms that are common to both "flavors", notably the origination, collection, and processing of EoR, as well as the finite-state-machine modifications and connection-reset logic introduced by GR.

We observe that, if support for conventional Graceful Restart is not desired for the session, the conventional GR phase can be skipped by omitting all AFIs/SAFIs from the GR Capability, advertising a Restart Time of zero, or both. Section 4.2 discusses the interaction of conventional GR and LLGR.

4.2.  Session Resets

BGP Graceful Restart [RFC4724] defines conditions under which a BGP session can reset and have its associated routes retained. If such a reset occurs for a session in which the LLGR Capability has also been exchanged, the following procedures apply:

   *  If the Graceful Restart Capability that was received does not list all AFIs/SAFIs supported by the session, then the GR Restart Time shall be deemed zero for those AFIs/SAFIs that are not listed.

   *  Similarly, if the received LLGR Capability does not list all AFIs/SAFIs supported by the session, then the Long-Lived Stale Time shall be deemed zero for those AFIs/SAFIs that are not listed.
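The two rules above, together with the Section 4.1 requirement that an LLGR Capability received without a GR Capability be disregarded, can be sketched as follows. This is a non-normative illustration; the function name effective_timers and its parameters are hypothetical, and local configuration overrides are deliberately left out.

```python
from typing import Dict, Optional, Set, Tuple

Family = Tuple[int, int]  # (AFI, SAFI)


def effective_timers(
    session_families: Set[Family],
    gr_restart_time: Optional[int],
    gr_families: Set[Family],
    llgr_llst: Dict[Family, int],
) -> Dict[Family, Tuple[int, int]]:
    """Return (Restart Time, LLST) in seconds per AFI/SAFI of the session.

    gr_restart_time is None when no GR Capability was received; in that
    case any received LLGR Capability is disregarded (Section 4.1).
    """
    timers = {}
    for fam in session_families:
        if gr_restart_time is None:
            timers[fam] = (0, 0)
            continue
        # An AFI/SAFI not listed in a capability has its timer deemed zero.
        restart = gr_restart_time if fam in gr_families else 0
        llst = llgr_llst.get(fam, 0)
        timers[fam] = (restart, llst)
    return timers
```

With a session carrying (1, 1) and (1, 128), a GR Capability listing only (1, 1) with a Restart Time of 120, and an LLGR Capability listing only (1, 128) with an LLST of 3600, the sketch yields (120, 0) for (1, 1) and (0, 3600) for (1, 128).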
The following text in Section 4.2 of [RFC4724] no longer applies:

   |  If the session does not get re-established within the "Restart Time" that the peer advertised previously, the Receiving Speaker MUST delete all the stale routes from the peer that it is retaining.

and the following procedures are specified instead:

After the session goes down, and before the session is re-established, the stale routes for an AFI/SAFI MUST be retained. The interval for which they are retained is limited by the sum of the Restart Time in the received Graceful Restart Capability and the Long-Lived Stale Time in the received Long-Lived Graceful Restart Capability. The timers received in the Long-Lived Graceful Restart Capability SHOULD be modifiable by local configuration, which may impose an upper bound, a lower bound, or both on their respective values.

If the value of the Restart Time or the Long-Lived Stale Time is zero, the duration of the corresponding period would be zero seconds. For example, if the Restart Time is zero and the Long-Lived Stale Time is nonzero, only the procedures particular to LLGR would apply. Conversely, if the Long-Lived Stale Time is zero and the Restart Time is nonzero, only the procedures of GR would apply. If both are zero, none of these procedures would apply, only those of the base BGP specification [RFC4271] (although EoR would still be used as detailed in [RFC4724]). And finally, if both are nonzero, then the procedures would be applied serially: first those of GR and then those of LLGR. During the first interval, we observe that, while the procedures of GR are in effect, route preference would not be affected.
During the second interval, while LLGR procedures are in effect, routes would be treated as least preferred as specified elsewhere in this document.

Once the Restart Time period ends (including the case in which the Restart Time is zero), the LLGR period is said to have begun and the following procedures MUST be performed:

   *  For each AFI/SAFI for which it has received a nonzero Long-Lived Stale Time, the helper router MUST start a timer for that Long-Lived Stale Time. If the timer for the Long-Lived Stale Time for a given AFI/SAFI expires before the session is re-established, the helper MUST delete all stale routes of that AFI/SAFI from the neighbor that it is retaining.

   *  The helper router MUST attach the LLGR_STALE community to the stale routes being retained. Note that this requirement implies that the routes would need to be readvertised in order to disseminate the modified community.

   *  If any of the routes from the peer have been marked with the NO_LLGR community, either as sent by the peer or as the result of a configured policy, they MUST NOT be retained and MUST be removed as per the normal operation of [RFC4271].

   *  The helper router MUST perform the procedures listed in Section 4.3.

Once the session is re-established, the procedures specified in [RFC4724] apply for the stale routes irrespective of whether the stale routes are retained during the Restart Time period or the Long-Lived Stale Time period.
However, in the case of consecutive restarts, the previously marked stale routes MUST NOT be deleted before the timer for the Long-Lived Stale Time expires.

Similar to [RFC4724], once the LLGR Period begins, the Helper MUST immediately remove all the stale routes from the peer that it is retaining for that address family if any of the following occur:

   *  the F bit for a specific address family is not set in the newly received LLGR Capability, or

   *  a specific address family is not included in the newly received LLGR Capability, or

   *  the LLGR and accompanying GR Capability are not received in the re-established session at all.

If a Long-Lived Stale Time timer is running for routes with a given AFI/SAFI received from a peer, it MUST NOT be updated (other than by manual operator intervention) until the peer has established and synchronized a new session. The session is termed "synchronized" for a given AFI/SAFI once the EoR for that AFI/SAFI has been received from the peer or once the Selection_Deferral_Timer discussed in [RFC4724] expires.

The value of a Long-Lived Stale Time in the capability received from a neighbor MAY be reduced by local configuration.

While the session is down, the expiration of a Long-Lived Stale Time timer is treated analogously to the expiration of the Restart Time timer in [RFC4724], other than applying only to the AFI/SAFI it accompanies. However, the timer continues to run once the session has re-established. The timer is neither stopped nor updated until the EoR marker is received for the relevant AFI/SAFI from the peer. If the timer expires during synchronization with the peer, any stale routes that the peer has not refreshed are removed.
If the session subsequently resets prior to becoming synchronized, any remaining routes (for the AFI/SAFI whose LLST timer expired) MUST be removed immediately.

4.3.  Processing LLGR_STALE Routes

A BGP speaker that has advertised the Long-Lived Graceful Restart Capability to a neighbor MUST perform the following upon receiving a route from that neighbor with the LLGR_STALE community or upon attaching the LLGR_STALE community itself per Section 4.2:

   *  Treat the route as the least preferred in route selection (see Section 4.4). See Section 5.2 for a discussion of potential risks inherent in doing this.

   *  The route SHOULD NOT be advertised to any neighbor from which the Long-Lived Graceful Restart Capability has not been received. The exception is described in Section 4.6. Note that this requirement implies that such routes should be withdrawn from any such neighbor.

   *  The LLGR_STALE community MUST NOT be removed when the route is further advertised.

4.4.  Route Selection

A least preferred route MUST be treated as less preferred than any other route that is not also least preferred. When performing route selection between two routes that are both least preferred, normal tiebreaking applies. Note that this would only be expected to happen if the only routes available for selection were least preferred; in all other cases, such routes would have been eliminated from consideration.

4.5.  Errors

If the LLGR Capability is received without an accompanying GR Capability, the LLGR Capability MUST be ignored, that is, the implementation MUST behave as though no LLGR Capability has been received.

4.6.  Optional Partial Deployment Procedure

Ideally, all routers in an Autonomous System (AS) would support this specification before it were enabled. However, to facilitate incremental deployment, stale routes MAY be advertised to neighbors that have not advertised the Long-Lived Graceful Restart Capability under the following conditions:

   *  The neighbors MUST be internal (Internal BGP (IBGP) or Confederation) neighbors.

   *  The NO_EXPORT community [RFC1997] MUST be attached to the stale routes.

   *  The stale routes MUST have their LOCAL_PREF set to zero. See Section 5.2 for a discussion of potential risks inherent in doing this.

If this strategy for partial deployment is used, the network operator should set the LOCAL_PREF to zero for all long-lived stale routes throughout the Autonomous System. This trades off a small reduction in flexibility (ordering may not be preserved between competing long-lived stale routes) for consistency between routers that do, and do not, support this specification. Since the consistency of route selection can be important for preventing forwarding loops, the latter consideration dominates.

4.7.  Procedures When BGP Is the PE-CE Protocol in a VPN

4.7.1.  Procedures When EBGP Is the PE-CE Protocol in a VPN

In VPN deployments (for example, [RFC4364]), External BGP (EBGP) is often used as a PE-CE protocol. It may be a practical necessity in such deployments to accommodate interoperation with peer routers that cannot easily be upgraded to support specifications such as this one.
This leads to a problem: the procedures defined elsewhere in this document generally prevent LLGR stale routes from being sent across EBGP sessions that don't support LLGR, but this could prevent the VPN routes from being used for their intended purpose.

We observe that the principal motivation for restricting the propagation of "stale" routing information is the desire to prevent it from spreading without limit once it exits the "safe" perimeter. We further observe that VPN deployments are typically topologically constrained, making this concern moot. For this reason, an implementation MAY advertise stale routes over a PE-CE session, when explicitly configured to do so. That is, the second rule listed in Section 4.3 MAY be disregarded in such cases. All other rules continue to apply. Finally, if this exception is used, the implementation SHOULD, by default, attach the NO_EXPORT community to the routes in question, as an additional protection against stale routes spreading without limit. Attachment of the NO_EXPORT community MAY be disabled by explicit configuration in order to accommodate exceptional cases.

See further discussion of using an explicitly configured policy to mitigate this issue in Section 5.1.

4.7.2.  Procedures When IBGP Is the PE-CE Protocol in a VPN

If IBGP is used as the PE-CE protocol, following the procedures of [RFC6368], then when a PE router imports a VPN route that contains the ATTR_SET attribute into a destination VRF and subsequently advertises that route to a CE router:

   *  If the CE router supports the procedures of this document (in other words, if the CE router has advertised the LLGR Capability): In addition to including the path attributes derived from the ATTR_SET attribute in the advertised route as per [RFC6368], the PE router MUST also include the LLGR_STALE community if it is present in the path attributes of the imported route, even if it is not present in the ATTR_SET attribute.

   *  If the CE router does not support the procedures of this document: The optional procedures of Section 4.6 MAY be followed, attaching the NO_EXPORT community and setting the value of LOCAL_PREF to zero, overriding the value found in the ATTR_SET.

Similarly, when a PE router receives a route from a CE into its VRF and subsequently exports that route to a VPN address family:

   *  If the PE router supports the procedures of this document (in other words, if the PE router has advertised the LLGR Capability): In addition to including in the VPN route the ATTR_SET derived from the path attributes as per [RFC6368], the PE router MUST also include the LLGR_STALE community in the VPN route if it is present in the path attributes of the route as received from the CE.

   *  If the PE router does not support the procedures of this document: There exists no ideal solution. The CE could advertise a route with LLGR_STALE, with the understanding that the LLGR_STALE marking will only be honored by the provider network if appropriate policy configuration exists on the PE (see Section 5.1).
      It is at least guaranteed that LLGR_STALE will be propagated when the route is propagated beyond the provider network. Alternatively, the CE could refrain from advertising the LLGR_STALE route to the incapable PE.

5.  Deployment Considerations

The deployment considerations discussed in [RFC4724] apply to this document. In addition, network operators are cautioned to carefully consider the potential disadvantages of deploying these procedures for a given AFI/SAFI. Most notably, if used for an AFI/SAFI that conveys conventional reachability information, the use of a long-lived stale route could result in a loss of connectivity for the covered prefix. This specification takes pains to mitigate this risk where possible by making such routes least preferred and by restricting the scope of such routes to routers that support these procedures (or, optionally, a single Autonomous System, see Section 4.6). However, if a stale route is chosen as best for a given prefix, then according to the normal rules of IP forwarding, that route will be used for matching destinations, even if a non-stale less specific matching route is also available. Networks in which the deployment of these procedures would be especially concerning include those that do not use "tunneled" forwarding (in other words, those using conventional hop-by-hop forwarding).

Implementations MUST NOT enable these procedures by default. They MUST require affirmative configuration per AFI/SAFI in order to enable them.

The procedures of this document do not alter the route resolvability requirement of Section 9.1.2.1 of [RFC4271].
Because of this, it will commonly be the case that "stale" IBGP routes will only continue to be used if the router depicted in the next hop remains resolvable, even if its BGP component is down. Details of IGP fault-tolerance strategies are beyond the scope of this document.

In addition to the foregoing, it may be advisable to check the viability of the next hop through other means, for example, Bidirectional Forwarding Detection (BFD) [RFC5880]. This may be especially useful in cases where the next hop is known directly at the network layer, notably EBGP.

As discussed in this document, after a BGP session goes down and before the session is re-established, stale routes may be retained for up to two consecutive periods, controlled by the Restart Time and the Long-Lived Stale Time, respectively:

* During the first period, routing churn would be prevented, but with potential persistent packet loss.

* During the second period, potential persistent packet loss may be reduced, but routing churn would be visible throughout the network.

The setting of the relevant parameters for a particular application should take into account trade-offs, network dynamics, and potential failure scenarios. If needed, the first period can be bypassed either by local configuration or by setting the Restart Time in the Graceful Restart Capability to zero and/or not listing the AFI/SAFI in that capability.

The setting of the F bit (and the Forwarding State bit of the accompanying GR Capability) depends, in part, on deployment considerations. The F bit can be understood as an indication that the Helper should flush associated routes (if the bit is left clear).
As discussed in Section 1, an important use case for LLGR is for routes that are more akin to configuration than to conventional routing. For such routes, it may make sense to always set the F bit, regardless of other considerations. Likewise, for control-plane-only entities, such as dedicated route reflectors that do not participate in the forwarding plane, it makes sense to always set the F bit.

Overall, the rule of thumb is that if loss of state on the restarting router can reasonably be expected to cause a forwarding loop or persistent packet loss, the F bit should be set scrupulously according to whether state has been retained. Specifics of whether or not the F bit is set are implementation dependent and may also be controlled by configuration.

Also, for every AFI/SAFI represented in the LLGR Capability that is also represented in the GR Capability, there will be two corresponding F bits: the LLGR F bit and the GR F bit. If the LLGR F bit is set, the corresponding GR F bit should also be set, since to do otherwise would cause the state to be cleared on the Receiving Router per the normal rules of GR, violating the intent of the set LLGR bit.

5.1. When BGP Is the PE-CE Protocol in a VPN

As discussed in Section 4.7, it may be necessary for a PE to advertise stale routes to a CE in some VPN deployments, even if the CE does not support this specification. In that case, the operator configuring their PE to advertise such routes should notify the operator of the CE receiving the routes, and the CE should be configured to depreference the routes.

Similarly, it may be necessary for a CE to advertise stale routes to a PE, even if the PE does not support this specification.
In that case, the operator configuring their CE to advertise such routes should notify the operator of the PE receiving the routes, and the PE should be configured to depreference the routes.

Typical BGP implementations can be configured to depreference routes by matching on the LLGR_STALE community and setting the LOCAL_PREF for matching routes to zero, similar to the procedure described in Section 4.6.

5.2. Risks of Depreferencing Routes

Depreferencing EBGP routes is considered safe, no different from the common practice of applying a routing policy to an EBGP session. However, the same is not always true of IBGP. Consistent route selection is a fundamental tenet of IBGP correctness and safe operation in hop-by-hop routed networks. When routers within an AS apply different criteria in selecting routes, they can arrive at inconsistent route selections. This can lead to the formation of forwarding loops unless some form of tunneled forwarding is used to prevent "core" routers from making a (potentially inconsistent) forwarding decision based on the IP header.

This specification uses the state of a peering session as an input to the selection criteria, depreferencing routes that are associated with a session that has gone down but that have not yet aged out. Since different routers within an AS might have different notions as to whether their respective sessions with a given peer are up or down, they might apply different selection criteria to routes from that peer. This could result in a forwarding loop forming between such routers.

For an example of such a forwarding loop, consider the following simple topology:

       A ---- B ---- C ------------------------- D
       ^                                         ^
       |                                         |
       R1                                        R2

                         Figure 1

In this example, A through D are routers with a full mesh of IBGP sessions between them (the sessions are not shown). The short links have unit cost; the long link has cost 5.
Routers A and D are AS border routers, each advertising some route, R, with the same LOCAL_PREF into the AS; these are denoted R1 and R2 in the diagram. In ordinary operation, it can be seen that routers B and C will select R1 for forwarding and will forward toward A.

Suppose that the session between A and B goes down for some reason, and it stays down long enough for LLGR processing to be invoked on B. Then, on B, route R1 will be depreferenced, leading to the selection of R2 by B. However, C will continue to prefer R1. In this case, it can be seen that a forwarding loop for packets destined to R would form between B and C. (We note that other forwarding loop scenarios can be constructed for conventional GR, but these are generally considered less severe since GR can remain in effect for a much more limited interval.)

The potential benefits of this specification can outweigh the risks discussed above, as long as care is exercised in deployment. The cardinal rule to be followed is that if a given set of routes is being used within an AS for hop-by-hop forwarding, enabling LLGR procedures is not recommended. If tunneled forwarding (such as MPLS) is used within the AS, or if routes are being used for purposes other than hop-by-hop forwarding, less caution is needed; however, the operator should still carefully consider the consequences of enabling LLGR.

6. Security Considerations

The security implications of the LLGR mechanism defined in this document are akin to those incurred by the maintenance of stale routing information within a network. However, since the retention time may be much longer, the window during which certain attacks are feasible may substantially increase. This is particularly relevant when considering the maintenance of routing information that is used for service segregation, such as MPLS label entries.
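
To put rough numbers on that window, the following is a minimal sketch and not part of the specification. It assumes the 12-bit Restart Time field of the Graceful Restart Capability [RFC4724] and the 24-bit Long-Lived Stale Time field of the LLGR Capability, both expressed in seconds. The function names are hypothetical, and the sample values of 1 second and 3600 seconds are purely illustrative.

```python
# Illustrative arithmetic only; not part of the specification.
# Assumed field widths: 12-bit Restart Time (GR Capability) and
# 24-bit Long-Lived Stale Time (LLGR Capability), both in seconds.
GR_RESTART_TIME_BITS = 12
LLST_BITS = 24


def max_field_value(bits: int) -> int:
    """Largest unsigned value that fits in a field of the given width."""
    return (1 << bits) - 1


def worst_case_stale_seconds(restart_time: int, llst: int) -> int:
    """Upper bound on how long a route may be retained as stale.

    The two periods are consecutive: the GR period runs first, and the
    long-lived period starts only once the Restart Time has expired.
    """
    if not 0 <= restart_time <= max_field_value(GR_RESTART_TIME_BITS):
        raise ValueError("Restart Time does not fit in 12 bits")
    if not 0 <= llst <= max_field_value(LLST_BITS):
        raise ValueError("LLST does not fit in 24 bits")
    return restart_time + llst


if __name__ == "__main__":
    print(max_field_value(GR_RESTART_TIME_BITS))  # 4095 seconds
    print(max_field_value(LLST_BITS))             # 16777215 seconds
    print(worst_case_stale_seconds(1, 3600))      # 3601 seconds
```

Under these assumptions, GR alone bounds the stale window at a little over an hour, whereas the largest encodable LLST is roughly 194 days. The operator-configured upper bound on the LLST therefore largely determines how long stale state can persist.
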

For MPLS VPN services, the effectiveness of the traffic isolation between VPNs relies on the correctness of the MPLS labels between ingress and egress PEs. In particular, when an egress PE withdraws a label L1 allocated to a VPN1 route, this label must not be assigned to a VPN route of a different VPN until all ingress PEs stop using the old VPN1 route using L1.

Such a corner case may happen today if the propagation of VPN routes by BGP messages between PEs takes more time than the label reallocation delay on a PE. Given that we can generally bound the worst-case BGP propagation time to a few minutes (for example, 2-5 minutes), the security breach will not occur if PEs are designed to not reallocate a previously used and withdrawn label until a few minutes have passed.

The problem is made worse with BGP GR between PEs because VPN routes can be stalled for a longer period of time (for example, 20 minutes).

This is further aggravated by the LLGR extension specified in this document because VPN routes can be stalled for a much longer period of time (for example, 2 hours or 1 day).

In order to exploit the vulnerability described above, an attacker needs to engineer a specific LLGR state between two PE devices and also cause the label reallocation to occur such that the two topologies overlap. To avoid the potential for a VPN breach, the operator should ensure that the lower bound for label reuse is greater than the upper bound on the LLST before enabling LLGR for a VPN address family. Section 4.2 discusses the provision of an upper bound on LLST.
Details of features for setting a lower bound on label reuse time are beyond the scope of this document; however, factors that might need to be taken into account when setting this value include:

* The load of the BGP route churn on a PE (in terms of the number of VPN labels advertised and the churn rate).

* The label allocation policy on the PE, which possibly depends upon the size of the pool of the VPN labels (which can be restricted by hardware considerations or other MPLS usages), the label allocation scheme (for example, per route or per VRF/CE), and the reallocation policy (for example, least recently used label).

Note that [RFC4781], which defines the Graceful Restart Mechanism for BGP with MPLS, is also applicable to LLGR.

7. Examples of Operation

For illustrative purposes, we present a few examples of how this specification might be used in practice. These examples are neither exhaustive nor normative.

Consider the following scenario:

A border router, ASBR1, has an IBGP peering with a route reflector, RR1, from which it learns routes. It has an EBGP peering with an external peer, EXT, to which it advertises those routes. The external peer has advertised the GR and LLGR Capabilities to ASBR1. ASBR1 is configured to support GR and LLGR on its sessions with RR1 and EXT. RR1 advertises a GR Restart Time of 1 (second) and an LLST of 3600 (seconds):

+==========+=====================================================+
| Time     | Event                                               |
+==========+=====================================================+
| t        | ASBR1's IBGP session with RR1 fails. ASBR1 retains  |
|          | RR1's routes according to the rules of GR           |
|          | [RFC4724].                                          |
+----------+-----------------------------------------------------+
| t+1      | GR Restart Time expires. ASBR1 transitions RR1's    |
|          | routes to long-lived stale routes by attaching the  |
|          | LLGR_STALE community and depreferencing them.       |
|          | However, since it has no backup routes, it          |
|          | continues to make use of them. It re-announces      |
|          | them to EXT with the LLGR_STALE community attached. |
+----------+-----------------------------------------------------+
| t+1+3600 | LLST expires. ASBR1 removes RR1's stale routes from |
|          | its own RIB and sends BGP updates to withdraw them  |
|          | from EXT.                                           |
+----------+-----------------------------------------------------+

                             Table 1

Next, imagine the same scenario, but suppose RR1 advertised a GR Restart Time of zero, effectively disabling GR. Equally, ASBR1 could have used a local configuration to override RR1's offered Restart Time, setting it to a locally configured value of zero:

+==========+=======================================================+
| Time     | Event                                                 |
+==========+=======================================================+
| t        | ASBR1's IBGP session with RR1 fails. ASBR1            |
|          | transitions RR1's routes to long-lived stale routes   |
|          | by attaching the LLGR_STALE community and             |
|          | depreferencing them. However, since it has no backup  |
|          | routes, it continues to make use of them. It          |
|          | re-announces them to EXT with the LLGR_STALE          |
|          | community attached.                                   |
+----------+-------------------------------------------------------+
| t+0+3600 | LLST expires. ASBR1 removes RR1's stale routes from   |
|          | its own RIB and sends BGP updates to withdraw them    |
|          | from EXT.                                             |
+----------+-------------------------------------------------------+

                              Table 2

Next, imagine the original scenario, but consider that the ASBR1-RR1 session comes back up and becomes synchronized 180 seconds after the failure was detected:

+=========+=====================================================+
| Time    | Event                                               |
+=========+=====================================================+
| t       | ASBR1's IBGP session with RR1 fails. ASBR1 retains  |
|         | RR1's routes according to the rules of GR           |
|         | [RFC4724].                                          |
+---------+-----------------------------------------------------+
| t+1     | GR Restart Time expires. ASBR1 transitions RR1's    |
|         | routes to long-lived stale routes by attaching the  |
|         | LLGR_STALE community and depreferencing them.       |
|         | However, since it has no backup routes, it          |
|         | continues to make use of them. It re-announces      |
|         | them to EXT with the LLGR_STALE community attached. |
+---------+-----------------------------------------------------+
| t+1+179 | Session is re-established and resynchronized.       |
|         | ASBR1 removes the LLGR_STALE community from RR1's   |
|         | routes and re-announces them to EXT with the        |
|         | LLGR_STALE community removed.                       |
+---------+-----------------------------------------------------+

                             Table 3

Finally, imagine the original scenario, but consider that EXT has not advertised the LLGR Capability to ASBR1:

+==========+======================================================+
| Time     | Event                                                |
+==========+======================================================+
| t        | ASBR1's IBGP session with RR1 fails. ASBR1 retains   |
|          | RR1's routes according to the rules of GR [RFC4724]. |
+----------+------------------------------------------------------+
| t+1      | GR Restart Time expires. ASBR1 transitions RR1's     |
|          | routes to long-lived stale routes by attaching the   |
|          | LLGR_STALE community and depreferencing them.        |
|          | However, since it has no backup routes, it continues |
|          | to make use of them. It withdraws them from EXT.     |
+----------+------------------------------------------------------+
| t+1+3600 | LLST expires. ASBR1 removes RR1's stale routes from  |
|          | its own RIB.                                         |
+----------+------------------------------------------------------+

                              Table 4

8. IANA Considerations

This document defines a BGP capability called the "Long-Lived Graceful Restart Capability". IANA has assigned a value of 71 from the "Capability Codes" registry.

This document introduces two BGP well-known communities:

* the first called "LLGR_STALE" for marking long-lived stale routes, and

* the second called "NO_LLGR" for marking routes that should not be retained if stale.

IANA has assigned these well-known community values 0xFFFF0006 and 0xFFFF0007, respectively, from the "BGP Well-known Communities" registry.

IANA has established a registry called the "Long-Lived Graceful Restart Flags for Address Family" registry under the "Border Gateway Protocol (BGP) Parameters" group. The registration procedures are Standards Action (see [RFC8126]).
The registry is initially populated as follows:

+==============+=======================+============+===========+
| Bit Position | Name                  | Short Name | Reference |
+==============+=======================+============+===========+
| 0            | Preservation of state | F          | RFC 9494  |
+--------------+-----------------------+------------+-----------+
| 1-7          | Unassigned            |            |           |
+--------------+-----------------------+------------+-----------+

                             Table 5

9. References

9.1. Normative References

[RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996, <https://www.rfc-editor.org/info/rfc1997>.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006, <https://www.rfc-editor.org/info/rfc4271>.

[RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, DOI 10.17487/RFC4724, January 2007, <https://www.rfc-editor.org/info/rfc4724>.

[RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, DOI 10.17487/RFC4760, January 2007, <https://www.rfc-editor.org/info/rfc4760>.

[RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February 2009, <https://www.rfc-editor.org/info/rfc5492>.

[RFC6368]  Marques, P., Raszuk, R., Patel, K., Kumaki, K., and T. Yamagata, "Internal BGP as the Provider/Customer Edge Protocol for BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 6368, DOI 10.17487/RFC6368, September 2011, <https://www.rfc-editor.org/info/rfc6368>.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[RFC8538]  Patel, K., Fernando, R., Scudder, J., and J. Haas, "Notification Message Support for BGP Graceful Restart", RFC 8538, DOI 10.17487/RFC8538, March 2019, <https://www.rfc-editor.org/info/rfc8538>.

9.2. Informative References

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006, <https://www.rfc-editor.org/info/rfc4364>.

[RFC4761]  Kompella, K., Ed. and Y. Rekhter, Ed., "Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761, DOI 10.17487/RFC4761, January 2007, <https://www.rfc-editor.org/info/rfc4761>.

[RFC4781]  Rekhter, Y. and R. Aggarwal, "Graceful Restart Mechanism for BGP with MPLS", RFC 4781, DOI 10.17487/RFC4781, January 2007, <https://www.rfc-editor.org/info/rfc4781>.

[RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010, <https://www.rfc-editor.org/info/rfc5880>.

[RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017, <https://www.rfc-editor.org/info/rfc8126>.

[RFC8955]  Loibl, C., Hares, S., Raszuk, R., McPherson, D., and M. Bacher, "Dissemination of Flow Specification Rules", RFC 8955, DOI 10.17487/RFC8955, December 2020, <https://www.rfc-editor.org/info/rfc8955>.

Acknowledgements

We would like to thank Nabil Bitar, Martin Djernaes, Roberto Fragassi, Jeffrey Haas, Jakob Heitz, Daniam Henriques, Nicolai Leymann, Mike McBride, Paul Mattes, John Medamana, Pranav Mehta, Han Nguyen, Saikat Ray, Valery Smyslov, and Bo Wu for their valuable input and contributions to the discussion and solution.

Contributors

Clarence Filsfils
Cisco Systems
1150 Brussels
Belgium
Email: cf@cisco.com

Pradosh Mohapatra
Sproute Networks
Email: mpradosh@yahoo.com

Yakov Rekhter

Eric Rosen
Email: erosen52@gmail.com

Rob Shakir
Google, Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States of America
Email: robjs@google.com

Adam Simpson
Nokia
Email: adam.1.simpson@nokia.com

Authors' Addresses

James Uttaro
Independent Contributor
Email: juttaro@ieee.org

Enke Chen
Palo Alto Networks
Email: enchen@paloaltonetworks.com

Bruno Decraene
Orange
Email: bruno.decraene@orange.com

John G. Scudder
Juniper Networks
Email: jgs@juniper.net