Internet Engineering Task Force (IETF) S. Litkowski Request for Comments: 8541 Orange Business Service Category: Informational B. Decraene ISSN: 2070-1721 Orange M. Horneffer Deutsche Telekom February 2019Link State ProtocolsImpact of Shortest Path First (SPF) Trigger and DelayAlgorithm ImpactStrategies on IGP Micro-loops Abstract A micro-loop is a packet-forwarding loop that may occur transiently among two or more routers in a hop-by-hop packet-forwarding paradigm. This document analyzes the impact of using different link state IGP implementations in a single network with respect to micro-loops. The analysis is focused on the Shortest Path First (SPF) delayalgorithm.algorithm but also mentions the impact of SPF trigger strategies. Status of This Memo This document is not an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8541. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 3 3. SPF Trigger Strategies . . . . . . . . . . . . . . . . . . . 5 4. SPF Delay Strategies . . . . . . . . . . . . . . . . . . . . 5 4.1. Two-Step SPF Delay . . . . . . . . . . . . . . . . . . . 6 4.2. ExponentialBackoff . . . .Back-Off Delay . . . . . . . . . . . . . . . 6 5. Mixing Strategies . . . . . . . . . . . . . . . . . . . . . .78 6. Benefits of Standardized SPF Delay Behavior . . . . . . . . . 11 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 9.1. Normative References . . . . . . . . . . . . . . . . . . 13 9.2. Informative References . . . . . . . . . . . . . . . . . 13 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 14 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14 1. Introduction Link state IGP protocols are based on a topology database on which the SPF algorithm is run to find a consistent set of non-looping routing paths. Specifications like IS-IS [RFC1195] propose some optimizations of the route computation (see Appendix C.1 of [RFC1195]), but not all implementations follow those non-mandatory optimizations. In this document, we refer to the events that lead to a new SPF computation based on the topology as "SPF triggers". Link state IGP protocols, like OSPF [RFC2328] and IS-IS [RFC1195], use multiple timers to control the router behavior in case of churn: SPF delay, Partial Route Computation (PRC) delay, Link State Packet (LSP) generation delay, LSP flooding delay, and LSP retransmission interval. Some of the values and behaviors of these timers(values and behavior)are standardized in protocol specifications, and some are not. The SPF computation- related timers have generally remained unspecified. Implementations are free to implement non-standardized timers in any way. For some standardized timers, implementations may offer dynamically adjusted timers to help control the churn rather than use static configurable values. "SPF delay" refers to the timer in most implementations that specifies the required delay before running an SPF computation after an SPF trigger is received. A micro-loop is a packet-forwarding loop that may occur transiently among two or more routers in a hop-by-hop packet-forwarding paradigm.We observe that theseThese micro-loops are formed when two routers do not update their Forwarding Information Bases (FIBs) for a certain prefix at the same time. The micro-loop phenomenon is described in [MICROLOOP-LSRP]. Two micro-loop mitigation techniques have been defined by IETF. The mechanism in [RFC6976] has not been widely implemented, presumably due to the complexity of the technique. The mechanism in [RFC8333] has been implemented. However, it does not prevent all micro-loops that can occur for a given topology and failure scenario. In multi-vendor networks, using different implementations of a link state protocol may favor micro-loop creation during the convergence process due to discrepancies in timers. Service providers already know to usesimilartimers(valueswith similar values andbehavior)behaviors for all of the network as a best practice, but this is sometimes not possible due to the limitations of implementations. This document presents reasons for service providers to have consistent implementation of link state protocols across vendors. In particular, this document analyzes the impact of using different link state IGP implementations in a single network with regard to micro- loops. The analysis focuses on the SPF delay algorithm. [RFC8405] defines a solution that partially addresses this problem statement, and this document captures the reasoning of the provided solution. 2. Problem Statement S ---- E | | 10 | | 10 | | D ---- A | 2 Px Figure 1: Network TopologySuffering fromExperiencing Micro-loops Figure 1 represents a small network composed of four routers (S, D, E, and A). Router S primarily uses the SD link to reach the prefixes behind router D (named Px). When the SD link fails, the IGP convergence occurs. If S converges before E, S will forward the traffic to Px through E; however, because E has not converged yet, E will loop the traffic back to S, leading to a micro-loop. The micro-loop appears due to the asynchronous convergence of nodes in a network when an event occurs. Multiple factors (or a combination of factors) may increase the probability of a micro-loop appearing: o Delay of failure notification: The greater the time gap between E and S being advised of the failure, the greater the chance that a micro-loop may appear. o SPF delay: Most implementations support a delay for the SPF computation to catch as many events as possible. If S uses an SPF delay timer of x ms, E uses an SPF delay timer of y ms, and x < y, E would start converging after S, leading to a potential micro- loop. o SPF computation time: This is mostly a matter of CPU power and optimizations like incremental SPF. If S computes its SPF faster than E, there is a chance for a micro-loop to appear. Today, CPUs are fast enough to consider the SPF computation time as negligible (on the order of milliseconds in a large network). o SPF computation ordering: An SPF trigger can be common to multiple IGP areas or levels (e.g., IS-IS Level 1 and Level 2) or to multiple address families with multi-topologies. There is no specified order for SPF computation today, and it is implementation dependent. In such scenarios, if the order of SPF computation done in S and E for each area, level, topology, or SPF algorithm is different, there is a possibility for a micro-loop to appear. o RIB and FIB prefix insertion speed or ordering: This is highly dependent on the implementation. Even if all of these factors increase the probability of a micro-loop appearing, the SPF delay plays a significant role, especially in case of churn. As the number of IGP events increases, the delta between the SPF delay values used by routers becomessignificant andsignificant; in fact, it becomes the dominating factor (especially when one router increases its timer exponentially while another one increases it in a smoother way). Another important factor is the time to update the FIB. As of today, the total FIB update time is the major factor for IGP convergence. However, for micro-loops, what matters is not the total time but the difference in installing the same prefix between nodes. The time to update the FIB may be the main part for the first iteration but not for subsequent IGP events. In addition, the time to update the FIB is very implementation specific and difficult or impossible to standardize, while the SPF delay algorithm may be standardized. As a consequence, this document will focus on an analysis of SPF delay behavior and associated triggers. 3. SPF Trigger Strategies Depending on the change advertised in the LSP(Link State Protocol Data Unit)or LSA (Link State Advertisement), the topology may or may not be affected. An implementation may avoid running the SPF computation (and may only run an IP reachability computation instead) if the advertised change does not affect the topology. Different strategies can trigger the SPF computation: 1. An implementation may always run a full SPF for any type of change. 2. An implementation may run a full SPF only when required. For example, if a link fails, a local node will run an SPF for its local LSP update. If the LSP from the neighbor (describing the same failure) is received after SPF has started, the local node can decide that a new full SPF is not required as the topology has not changed. 3. If the topology does not change, an implementation may only recompute the IP reachability. As noted in Section 1, SPF optimizations are not mandatory in specifications. This has led to the implementation of different strategies. 4. SPF Delay Strategies Implementations of link state routing protocols use different strategies to delay SPF computation. The two most common SPF delay behaviors are the following: 1. Two-step SPF delay 2. Exponentialbackoffback-off delay These behaviors are explained in the following sections. 4.1. Two-Step SPF Delay The SPF delay is managed by four parameters: o rapid delay: the amount of time to wait before running SPF after the initial SPF trigger event. o rapid runs: the number of consecutive SPF runs that can use the rapid delay. When the number is exceeded, the delay moves to the slow delay value. o slow delay: the amount of time to wait before running an SPF. o wait time: the amount of time to wait without detecting SPF trigger events before going back to the rapid delay.Example:Figure 2 displays the evolution of the SPF delay timer (based on a two-step delay algorithm) upon the reception of multiple events. Figure 2 considers the following parameters for the algorithm: rapid delay (RD) = 50 ms, rapid runs = 3, slow delay (SD) = 1 s, wait time = 2ss. SPF delay time ^ | | SD- | x xx x | | | RD- | x x x x | +---------------------------------> Events | | | | || | | < wait time > Figure 2:Two-PhaseTwo-Step SPF Delay Algorithm 4.2. ExponentialBackoffBack-Off Delay The algorithm has two modes: fast mode andbackoffback-off mode. In fast mode, the SPF delay is usually delayed by a very small amount of time (fast reaction). When an SPF computation is run in fast mode, the algorithm automatically moves tobackoffback-off mode (a single SPF run is authorized in fast mode). Inbackoffback-off mode, the SPF delay increases exponentially in each run. When the network becomes stable, the algorithm moves back to fast mode. The SPF delay is managed by four parameters: o first delay: amount of time to wait before running SPF. This delay is used only when SPF is in fast mode. o incremental delay: amount of time to wait before running SPF. This delay is used only when SPF is inbackoffback-off mode and increments exponentially at each SPF run. o maximum delay: maximum amount of time to wait before running SPF. o wait time: amount of time to wait without events before going back to fast mode.Example:Figure 3 displays the evolution of the SPF delay timer (based on an exponential back-off delay algorithm) upon the reception of multiple events. Figure 3 considers the following parameters for the algorithm: first delay (FD) = 50 ms, incremental delay (ID) = 50 ms, maximum delay (MD) = 1 s, wait time = 2 s SPF delay time ^ MD- | xx x | | | | | | x | | | | x | FD- | x x x ID | +---------------------------------> Events | | | | || | | < wait time > FM->BM -------------------->FM Figure 3: Exponential Back-Off Delay Algorithm 5. Mixing StrategiesInFigure1, we consider1 illustrates a flow of packets from S to D.We observe thatSis usinguses optimized SPF triggering (full SPF is triggered only when necessary) anda two-steptwo- step SPF delay(rapid=150(rapid delay = 150 ms, rapidruns=3, slow=1runs = 3, slow delay = 1 s). As the implementation of S is optimized,Partial Reachability Computation (PRC)PRC is available.We observe thatFor PRC delay, we consider the same timers as for SPFdelay the PRC. We observe thatdelay. Eis usinguses an SPF trigger strategy that always computes a full SPF for any change and uses the exponentialbackoffback-off strategy foranSPF delay(start=150(first delay = 150 ms,inc=150incremental delay = 150 ms,max=1maximum delay = 1 s).We also considerConsider the following sequence of events: o t0=0 ms: A prefix is declared down in the network.We consider thisThis eventto have happenedhappens at time=0. o 200 ms: The prefix is declared up. o 400 ms: The prefix is declared down in the network. o 1000 ms: S-D link fails. +---------+-------------------+------------------+------------------+ | Time | Network Event | Router S Events | Router E Events | +---------+-------------------+------------------+------------------+ | t0=0 | Prefix DOWN | | | | 10 ms | | Schedule PRC (in | Schedule SPF (in | | | | 150 ms) | 150 ms) | | | | | | | | | | | | 160 ms | | PRC starts | SPF starts | | 161 ms | | PRC ends | | | 162 ms | | RIB/FIB starts | | | 163 ms | | | SPF ends | | 164 ms | | | RIB/FIB starts | | 175 ms | | RIB/FIB ends | | | 178 ms | | | RIB/FIB ends | | | | | | | 200 ms | Prefix UP | | | | 212 ms | | Schedule PRC (in | | | | | 150 ms) | | | 214 ms | | | Schedule SPF (in | | | | | 150 ms) | | | | | | | | | | | | 370 ms | | PRC starts | | | 372 ms | | PRC ends | | | 373 ms | | | SPF starts | | 373 ms | | RIB/FIB starts | | | 375 ms | | | SPF ends | | 376 ms | | | RIB/FIB starts | | 383 ms | | RIB/FIB ends | | | 385 ms | | | RIB/FIB ends | | | | | | | 400 ms | Prefix DOWN | | | | 410 ms | | Schedule PRC (in | Schedule SPF (in | | | | 300 ms) | 300 ms) | | | | | | | | | | | | | | | | | | | | | | 710 ms | | PRC starts | SPF starts | | 711 ms | | PRC ends | | | 712 ms | | RIB/FIB starts | | | 713 ms | | | SPF ends | | 714 ms | | | RIB/FIB starts | | 716 ms | | RIB/FIB ends | RIB/FIB ends | | | | | | | 1000 ms | S-D link DOWN | | | | 1010 ms | | Schedule SPF (in | Schedule SPF (in | | | | 150 ms) | 600 ms) | | | | | | | | | | | | 1160 ms | | SPF starts | | | 1161 ms | | SPF ends | | | 1162 ms | Micro-loop may | RIB/FIB starts | | | | start from here | | | | 1175 ms | | RIB/FIB ends | | | | | | | | | | | | | | | | | | | | | | | 1612 ms | | | SPF starts | | 1615 ms | | | SPF ends | | 1616 ms | | | RIB/FIB starts | | 1626 ms | Micro-loop ends | | RIB/FIB ends | +---------+-------------------+------------------+------------------+ Table 1: Route Computation When S and E Use Different Behaviors and Multiple Events Appear In Table 1, due to discrepancies in the SPF management and after multiple events of different types, the values of the SPF delay are completely misaligned between node S and node E, leading to the creation of micro-loops. The same issue can also appear with only a single type of event as shown below: +---------+-------------------+------------------+------------------+ | Time | Network Event | Router S Events | Router E Events | +---------+-------------------+------------------+------------------+ | t0=0 | Link DOWN | | | | 10 ms | | Schedule SPF (in | Schedule SPF (in | | | | 150 ms) | 150 ms) | | | | | | | | | | | | 160 ms | | SPF starts | SPF starts | | 161 ms | | SPF ends | | | 162 ms | | RIB/FIB starts | | | 163 ms | | | SPF ends | | 164 ms | | | RIB/FIB starts | | 175 ms | | RIB/FIB ends | | | 178 ms | | | RIB/FIB ends | | | | | | | 200 ms | Link DOWN | | | | 212 ms | | Schedule SPF (in | | | | | 150 ms) | | | 214 ms | | | Schedule SPF (in | | | | | 150 ms) | | | | | | | | | | | | 370 ms | | SPF starts | | | 372 ms | | SPF ends | | | 373 ms | | | SPF starts | | 373 ms | | RIB/FIB starts | | | 375 ms | | | SPF ends | | 376 ms | | | RIB/FIB starts | | 383 ms | | RIB/FIB ends | | | 385 ms | | | RIB/FIB ends | | | | | | | 400 ms | Link DOWN | | | | 410 ms | | Schedule SPF (in | Schedule SPF (in | | | | 150 ms) | 300 ms) | | | | | | | | | | | | 560 ms | | SPF starts | | | 561 ms | | SPF ends | | | 562 ms | Micro-loop may | RIB/FIB starts | | | | start from here | | | | 568 ms | | RIB/FIB ends | | | | | | | | | | | | | 710 ms | | | SPF starts | | 713 ms | | | SPF ends | | 714 ms | | | RIB/FIB starts | | 716 ms | Micro-loop ends | | RIB/FIB ends | | | | | | | 1000 ms | Link DOWN | | | | 1010 ms | | Schedule SPF (in | Schedule SPF (in | | | | 1 s) | 600 ms) | | | | | | | | | | | | | | | | | | | | | | 1612 ms | | | SPF starts | | 1615 ms | | | SPF ends | | 1616 ms | Micro-loop may | | RIB/FIB starts | | | start from here | | | | 1626 ms | | | RIB/FIB ends | | | | | | | | | | | | | | | | | | | | | | 2012 ms | | SPF starts | | | 2014 ms | | SPF ends | | | 2015 ms | | RIB/FIB starts | | | 2025 ms | Micro-loop ends | RIB/FIB ends | | | | | | | | | | | | +---------+-------------------+------------------+------------------+ Table 2: Route Computation upon Multiple Link Down Events When S and E Use Different Behaviors 6. Benefits of Standardized SPF Delay BehaviorUsingTable 3 uses the same event sequence asinTable1, we may expect fewer and/ or1. Fewer and/or shorter micro-loops are expected using a standardized SPF delay. +---------+-------------------+------------------+------------------+ | Time | Network Event | Router S Events | Router E Events | +---------+-------------------+------------------+------------------+ | t0=0 | Prefix DOWN | | | | 10 ms | | Schedule PRC (in | Schedule PRC (in | | | | 150 ms) | 150 ms) | | | | | | | | | | | | 160 ms | | PRC starts | PRC starts | | 161 ms | | PRC ends | | | 162 ms | | RIB/FIB starts | PRC ends | | 163 ms | | | RIB/FIB starts | | 175 ms | | RIB/FIB ends | | | 176 ms | | | RIB/FIB ends | | | | | | | 200 ms | Prefix UP | | | | 212 ms | | Schedule PRC (in | | | | | 150 ms) | | | 213 ms | | | Schedule PRC (in | | | | | 150 ms) | | | | | | | | | | | | 370 ms | | PRC starts | PRC starts | | 372 ms | | PRC ends | | | 373 ms | | RIB/FIB starts | PRC ends | | 374 ms | | | RIB/FIB starts | | 383 ms | | RIB/FIB ends | | | 384 ms | | | RIB/FIB ends | | | | | | | 400 ms | Prefix DOWN | | | | 410 ms | | Schedule PRC (in | Schedule PRC (in | | | | 300 ms) | 300 ms) | | | | | | | | | | | | | | | | | | | | | | 710 ms | | PRC starts | PRC starts | | 711 ms | | PRC ends | PRC ends | | 712 ms | | RIB/FIB starts | | | 713 ms | | | RIB/FIB starts | | 716 ms | | RIB/FIB ends | RIB/FIB ends | | | | | | | 1000 ms | S-D link DOWN | | | | 1010 ms | | Schedule SPF (in | Schedule SPF (in | | | | 150 ms) | 150 ms) | | | | | | | | | | | | 1160 ms | | SPF starts | | | 1161 ms | | SPF ends | SPF starts | | 1162 ms | Micro-loop may | RIB/FIB starts | SPF ends | | | start from here | | | | 1163 ms | | | RIB/FIB starts | | 1175 ms | | RIB/FIB ends | | | 1177 ms | Micro-loop ends | | RIB/FIB ends | +---------+-------------------+------------------+------------------+ Table 3: Route Computation When S and E Use the Same Standardized Behavior As displayed above, there can be other parameters, like router computation power and flooding timers, that may also influence micro- loops. In all the examples in this document comparing the SPF timer behavior of router S and router E, we have made router E a bit slower than router S. This can lead to micro-loops even when both S and E use a common standardized SPF behavior. However, by aligning implementations of the SPF delay, we expect that service providers may reduce the number and duration of micro-loops. 7. Security Considerations This document does not introduce any security considerations. 8. IANA Considerations This document has no actions for IANA. 9. References 9.1. Normative References [RFC1195] Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual environments", RFC 1195, DOI 10.17487/RFC1195, December 1990, <https://www.rfc-editor.org/info/rfc1195>. [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, DOI 10.17487/RFC2328, April 1998, <https://www.rfc-editor.org/info/rfc2328>. [RFC8405] Decraene, B., Litkowski, S., Gredler, H., Lindem, A., Francois, P., and C. Bowers, "Shortest Path First (SPF) Back-Off Delay Algorithm for Link-State IGPs", RFC 8405, DOI 10.17487/RFC8405, June 2018, <https://www.rfc-editor.org/info/rfc8405>. 9.2. Informative References [MICROLOOP-LSRP] Zinin, A., "Analysis and Minimization of Microloops in Link-state Routing Protocols", Work in Progress, draft- ietf-rtgwg-microloop-analysis-01, October 2005. [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., Francois, P., and O. Bonaventure, "Framework for Loop-Free Convergence Using the Ordered Forwarding Information Base (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July 2013, <https://www.rfc-editor.org/info/rfc6976>. [RFC8333] Litkowski, S., Decraene, B., Filsfils, C., and P. Francois, "Micro-loop Prevention by Introducing a Local Convergence Delay", RFC 8333, DOI 10.17487/RFC8333, March 2018, <https://www.rfc-editor.org/info/rfc8333>. Acknowledgements The authors would like to thank Mike Shand and Chris Bowers for their useful comments. Authors' Addresses Stephane Litkowski Orange Business Service Email: stephane.litkowski@orange.com Bruno Decraene Orange Email: bruno.decraene@orange.com Martin Horneffer Deutsche Telekom Email: martin.horneffer@telekom.de