rfc9616.original   rfc9616.txt 
Network Working Group B. Jonglez Internet Engineering Task Force (IETF) B. Jonglez
Internet-Draft ENS Lyon Request for Comments: 9616 ENS Lyon
Intended status: Standards Track J. Chroboczek Category: Standards Track J. Chroboczek
Expires: 23 October 2024 IRIF, Université Paris Cité ISSN: 2070-1721 IRIF, Université Paris Cité
21 April 2024 September 2024
Delay-based Metric Extension for the Babel Routing Protocol Delay-Based Metric Extension for the Babel Routing Protocol
draft-ietf-babel-rtt-extension-07
Abstract Abstract
This document defines an extension to the Babel routing protocol that This document defines an extension to the Babel routing protocol that
measures the round-trip time (RTT) between routers and makes it measures the round-trip time (RTT) between routers and makes it
possible to prefer lower latency links over higher latency ones. possible to prefer lower-latency links over higher-latency ones.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on 23 October 2024. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9616.
Copyright Notice Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction
1.1. Applicability . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Applicability
2. Specification of Requirements . . . . . . . . . . . . . . . . 4 2. Specification of Requirements
3. RTT sampling . . . . . . . . . . . . . . . . . . . . . . . . 4 3. RTT Sampling
3.1. Data structures . . . . . . . . . . . . . . . . . . . . . 4 3.1. Data Structures
3.2. Protocol operation . . . . . . . . . . . . . . . . . . . 5 3.2. Protocol Operation
3.3. Wrap-around and node restart . . . . . . . . . . . . . . 7 3.3. Wrap-Around and Node Restart
3.4. Implementation notes . . . . . . . . . . . . . . . . . . 7 3.4. Implementation Notes
4. RTT-based route selection . . . . . . . . . . . . . . . . . . 8 4. RTT-Based Route Selection
4.1. Smoothing . . . . . . . . . . . . . . . . . . . . . . . . 9 4.1. Smoothing
4.2. Cost computation . . . . . . . . . . . . . . . . . . . . 9 4.2. Cost Computation
4.3. Hysteresis . . . . . . . . . . . . . . . . . . . . . . . 11 4.3. Hysteresis
5. Backwards and forwards compatibility . . . . . . . . . . . . 11 5. Backwards and Forwards Compatibility
6. Packet format . . . . . . . . . . . . . . . . . . . . . . . . 11 6. Packet Format
6.1. Timestamp sub-TLV in Hello TLVs . . . . . . . . . . . . . 11 6.1. Timestamp Sub-TLV in Hello TLVs
6.2. Timestamp sub-TLV in IHU TLVs . . . . . . . . . . . . . . 12 6.2. Timestamp Sub-TLV in IHU TLVs
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 7. IANA Considerations
8. Security Considerations . . . . . . . . . . . . . . . . . . . 13 8. Security Considerations
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 13 9. References
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 9.1. Normative References
10.1. Normative References . . . . . . . . . . . . . . . . . . 13 9.2. Informative References
10.2. Informative References . . . . . . . . . . . . . . . . . 14 Acknowledgements
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14 Authors' Addresses
1. Introduction 1. Introduction
The Babel routing protocol [RFC8966] does not mandate a specific The Babel routing protocol [RFC8966] does not mandate a specific
algorithm for computing metrics; existing implementations use a algorithm for computing metrics; existing implementations use a
packet-loss based metric on wireless links and a simple hop-count packet-loss-based metric on wireless links and a simple hop-count
metric on all other types of links. While this strategy works metric on all other types of links. While this strategy works
reasonably well in many networks, it fails to select reasonable reasonably well in many networks, it fails to select reasonable
routes in some topologies involving tunnels or VPNs. routes in some topologies involving tunnels or VPNs.
+------------+ +------------+
| A (Paris) +---------------+ | A (Paris) +---------------+
+------------+ \ +------------+ \
/ \ / \
/ \ / \
/ \ / \
+------------+ +------------+ +------------+ +------------+
| B (Paris) | | C (Tokyo) | | B (Paris) | | C (Tokyo) |
+------------+ +------------+ +------------+ +------------+
\ / \ /
\ / \ /
\ / \ /
+------------+ / +------------+ /
| D (Paris) +---------------+ | D (Paris) +---------------+
+------------+ +------------+
Figure 1: Four routers in a diamond topology Figure 1: Four Routers in a Diamond Topology
Consider for example the topology described in Figure 1, with three For example, consider the topology described in Figure 1, with three
routers A, B and D located in Paris and a fourth router C located in routers A, B, and D located in Paris and a fourth router C located in
Tokyo, connected through tunnels in a diamond topology. When routing Tokyo, connected through tunnels in a diamond topology. When routing
traffic from A to D, it is obviously preferable to use the local traffic from A to D, it is obviously preferable to use the local
route through B, as this is likely to provide better service quality route through B as this is likely to provide better service quality
and lower monetary cost than the distant route through C. However, and lower monetary cost than the distant route through C. However,
the existing implementations of Babel consider both routes as having the existing implementations of Babel consider both routes as having
the same metric, and will therefore route the traffic through C in the same metric; therefore, they will route the traffic through C in
roughly half the cases. roughly half the cases.
In the first part of this document (Section 3), we specify an In the first part of this document (Section 3), we specify an
extension to the Babel routing protocol that produces a sequence of extension to the Babel routing protocol that produces a sequence of
accurate measurements of the round-trip time (RTT) between two Babel accurate measurements of the round-trip time (RTT) between two Babel
neighbours. These measurements are not directly usable as an input neighbours. These measurements are not directly usable as an input
to Babel's route selection procedure, since they tend to be noisy and to Babel's route selection procedure since they tend to be noisy and
to cause a negative feedback loop, which might give rise to frequent to cause a negative feedback loop, which might give rise to frequent
oscillations. In a second part (Section 4), we define an algorithm oscillations. In the second part (Section 4), we define an algorithm
that maps the sequence of RTT samples to a link cost that can be used that maps the sequence of RTT samples to a link cost that can be used
for route selection. for route selection.
1.1. Applicability 1.1. Applicability
The extension defined in Section 3 provides a sequence of accurate The extension defined in Section 3 provides a sequence of accurate
but potentially noisy RTT samples. Since the round-trip time is a but potentially noisy RTT samples. Since the RTT is a symmetric
symmetric measure of delay, this protocol is only applicable in measure of delay, this protocol is only applicable in environments
environments where the symmetric delay is a good predictor of whether where the symmetric delay is a good predictor of whether a link
a link should be taken by routing traffic, which might not should be taken by routing traffic, which might not necessarily be
necessarily be the case in networks built over exotic link the case in networks built over exotic link technologies.
technologies.
The extension makes minimal requirements on the nodes. In The extension makes minimal requirements on the nodes. In
particular, it does not assume synchronised clocks, but only requires particular, it does not assume synchronised clocks, and only requires
that clock drift be negligible during the time interval between two that clock drift be negligible during the time interval between two
Hello TLVs. Since that is on the order of a few seconds, this Hello TLVs. Since that is on the order of a few seconds, this
requirement is met even with cheap crystal oscillators such as the requirement is met even with cheap crystal oscillators, such as the
ones used in consumer electronics. ones used in consumer electronics.
The algorithm defined in Section 4 makes a number of assumptions The algorithm defined in Section 4 depends on a number of assumptions
about the network. The assumption with most severe consequences is about the network. The assumption with the most severe consequences
that all links below a certain RTT (rtt-min in Section 4.2) can be is that all links below a certain RTT (rtt-min in Section 4.2) can be
grouped in a single category of "good" links. While this is the case grouped in a single category of "good" links. While this is the case
in wide-area overlay networks, it makes the algorithm inapplicable in in wide-area overlay networks, it makes the algorithm inapplicable in
networks where distinguishing between low-latency links is important. networks where distinguishing between low-latency links is important.
There are other assumptions, but they are less likely to limit the There are other assumptions, but they are less likely to limit the
algorithm's applicability. The algorithm assumes that all links algorithm's applicability. The algorithm assumes that all links
above a certain RTT (rtt-max in Section 4.2) can be assumed to be above a certain RTT (rtt-max in Section 4.2) are equally bad, and
equally bad, and will only be used as a last resort. In addition, in they will only be used as a last resort. In addition, in order to
order to avoid oscillations, the algorithm is designed to react avoid oscillations, the algorithm is designed to react slowly to RTT
slowly to RTT variations, thus causing suboptimal routing for seconds variations, thus causing suboptimal routing for seconds or even
or even minutes after an RTT change; while this is a desirable minutes after an RTT change; while this is a desirable property in
property in fixed networks, as it avoid excessive route oscillations, fixed networks, as it avoid excessive route oscillations, it might be
it might be an issue with networks with high rates of node mobility. an issue with networks with high rates of node mobility.
2. Specification of Requirements 2. Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in
14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
3. RTT sampling 3. RTT Sampling
3.1. Data structures 3.1. Data Structures
We assume that every Babel speaker maintains a local clock, that We assume that every Babel speaker maintains a local clock that
counts microseconds from an arbitrary origin. We do not assume that counts microseconds from an arbitrary origin. We do not assume that
clocks are synchronised: clocks local to distinct nodes need not clocks are synchronised: clocks local to distinct nodes need not
share a common origin. The protocol will eventually recover if the share a common origin. The protocol will eventually recover if the
clock is stepped, so clocks need not persist across node reboots. clock is stepped, so clocks need not persist across node reboots.
Every Babel speaker maintains a Neighbour Table, described in Every Babel speaker maintains a Neighbour Table, described in
Section 3.2.4 of [RFC8966]. This extension extends every entry in Section 3.2.4 of [RFC8966]. This extension extends every entry in
the Neighbour Table with the following data: the Neighbour Table with the following data:
* the Origin Timestamp, a 32-bit timestamp (modulo 2^32) according * the Origin Timestamp, a 32-bit timestamp (modulo 2^32) according
to the neighbour's clock; to the neighbour's clock;
* the Receive Timestamp, a 32-bit timestamp (modulo 2^32) according * the Receive Timestamp, a 32-bit timestamp (modulo 2^32) according
to the local clock. to the local clock.
Both values are initially undefined. Both values are initially undefined.
3.2. Protocol operation 3.2. Protocol Operation
The RTT to a neighbour is estimated using an algorithm due to Mills The RTT to a neighbour is estimated using an algorithm due to Mills
[MILLS], originally developed for the HELLO routing protocol and [RFC891], originally developed for the HELLO routing protocol and
later used in NTP [NTP]. later used in NTP [RFC5905].
A Babel speaker periodically sends Hello messages to its neighbours A Babel speaker periodically sends Hello messages to its neighbours
(Section 3.4.1 of [RFC8966]). Additionally, it occasionally sends a (Section 3.4.1 of [RFC8966]). Additionally, it occasionally sends a
set of IHU ("I Heard You") messages, at most one per neighbour set of IHU ("I Heard You") messages, at most one per neighbour
(Section 3.4.2 of [RFC8966]). (Section 3.4.2 of [RFC8966]).
A B A B
| | | |
t1 + | t1 + |
|\ | |\ |
skipping to change at page 5, line 45 skipping to change at line 216
| /| | /|
| / | | / |
| / | | / |
| / | Hello(t2') | / | Hello(t2')
| / | IHU(t1, t1') | / | IHU(t1, t1')
|/ | |/ |
t2 + | t2 + |
| | | |
v v v v
Figure 2: Mill's algorithm Figure 2: Mills' Algorithm
In order to enable the computation of RTTs, a node A MUST include in In order to enable the computation of RTTs, a node A MUST include, in
every Hello that it sends a timestamp t1 (according to A's local every Hello that it sends, a timestamp t1 (according to A's local
clock), as illustrated in Figure 2. When a node B receives A's clock), as illustrated in Figure 2. When a node B receives A's
timestamped Hello, it computes the time t1' at which the Hello was timestamped Hello, it computes the time t1' at which the Hello was
received (according to B's local clock). It then MUST record the received (according to B's local clock). It then MUST record the
value t1 in the Origin Timestamp field of the Neighbour Table entry value t1 in the Origin Timestamp field of the Neighbour Table entry
corresponding to A, and the value t1' in the Receive Timestamp field corresponding to A and the value t1' in the Receive Timestamp field
of the Neighbour Table entry. of the Neighbour Table entry.
When B sends an IHU to A, it checks whether both timestamps are When B sends an IHU to A, it checks whether both timestamps are
defined in the Neighbour Table. If that is the case, then it MUST defined in the Neighbour Table. If that is the case, then it MUST
ensure that its IHU TLV is sent in a packet that also contains a ensure that its IHU TLV is sent in a packet that also contains a
timestamped Hello TLV (either a normally scheduled Hello, or an timestamped Hello TLV (either a normally scheduled Hello or an
unscheduled Hello, see Section 3.4.1 of [RFC8966]). It MUST include unscheduled Hello, see Section 3.4.1 of [RFC8966]). It MUST include
in the IHU both the Origin Timestamp and the Receive Timestamp stored in the IHU both the Origin Timestamp and the Receive Timestamp stored
in the Neighbour Table. in the Neighbour Table.
Upon receiving B's packet, A computes the time t2 (according to its Upon receiving B's packet, A computes the time t2 (according to its
local clock) at which it was received. A MUST then verify that it local clock) at which it was received. Node A MUST then verify that
contains both a Hello TLV with timestamp t2' and an IHU TLV with two it contains both a Hello TLV with timestamp t2' and an IHU TLV with
timestamps t1 and t1'. If that is the case, A computes the value two timestamps t1 and t1'. If that is the case, A computes the
RTT = (t2 - t1) - (t2' - t1') (where all computations are done modulo value:
2^32), which is a measurement of the RTT between A and B. (A then
stores the values t2' and t2 in its Neighbour Table, as B did
before.)
This algorithm has a number of desirable properties. First, the RTT = (t2 - t1) - (t2' - t1')
algorithm is symmetric: A and B use the same procedures for
timestamping packets and computing RTT samples, and both nodes (where all computations are done modulo 2^32), which is a measurement
produce one RTT sample for each received (Hello, IHU) pair. Second, of the RTT between A and B. (A then stores the values t2' and t2 in
since there is no requirement that t1' and t2' be equal, the protocol its Neighbour Table, as B did before.)
is asynchronous: the only change to Babel's message scheduling is the
requirement that a packet containing an IHU also contain a Hello. This algorithm has a number of desirable properties:
Third, since it only ever computes differences of timestamps
according to a single clock, it does not require synchronised clocks. 1. The algorithm is symmetric: A and B use the same procedures for
Fourth, it requires very little additional state: a node only needs timestamping packets and computing RTT samples, and both nodes
to store the two timestamps associated with the last hello received produce one RTT sample for each received (Hello, IHU) pair.
from each neighbour. Finally, since it only requires piggybacking
one or two timestamps on each Hello and IHU TLV, it makes efficient 2. Since there is no requirement that t1' and t2' be equal, the
use of network resources. protocol is asynchronous: the only change to Babel's message
scheduling is the requirement that a packet containing an IHU
also contain a Hello.
3. Since the algorithm only ever computes differences of timestamps
according to a single clock, it does not require synchronised
clocks.
4. The algorithm requires very little additional state: a node only
needs to store the two timestamps associated with the last hello
received from each neighbour.
5. Since the algorithm only requires piggybacking one or two
timestamps on each Hello and IHU TLV, it makes efficient use of
network resources.
In principle, this algorithm is inaccurate in the presence of clock In principle, this algorithm is inaccurate in the presence of clock
drift (i.e., when A's and B's clocks are running at different drift (i.e., when A's clock and B's clock are running at different
frequencies). However, t2' - t1' is usually on the order of a few frequencies). However, t2' - t1' is usually on the order of a few
seconds, and significant clock drift is unlikely to happen at that seconds, and significant clock drift is unlikely to happen at that
time scale. time scale.
In order for RTT values to be consistent between implementations, In order for RTT values to be consistent between implementations,
timestamps need to be computed at roughly the same point in the timestamps need to be computed at roughly the same point in the
network stack. Transmit timestamps SHOULD be computed just before network stack. Transmit timestamps SHOULD be computed just before
the packet is passed to the network stack (i.e., before it is the packet is passed to the network stack (i.e., before it is
subjected to any queueing delays), and receive timestamps SHOULD be subjected to any queueing delays); receive timestamps SHOULD be
computed just after the packet is received from the network stack. computed just after the packet is received from the network stack.
3.3. Wrap-around and node restart 3.3. Wrap-Around and Node Restart
Timestamp values are a count of microseconds stored as a 32-bit Timestamp values are a count of microseconds stored as a 32-bit
unsigned integer; thus, they wrap around every 71 minutes or so. unsigned integer; thus, they wrap around every 71 minutes or so.
What is more, a node may occasionally reboot and restart its clock at What is more, a node may occasionally reboot and restart its clock at
an arbitrary origin. For these reasons, very old timestamps or an arbitrary origin. For these reasons, very old timestamps or
nonsensical timestamps MUST NOT be used to yield RTT samples. nonsensical timestamps MUST NOT be used to yield RTT samples.
The following algorithm can be used to achieve that. When a node The following algorithm can be used to discard obsolete samples.
receives a packet containing a Hello and an IHU, it compares the When a node receives a packet containing a Hello and an IHU, it
current local time t2 with the Origin Timestamp contained in the IHU; compares the current local time t2 with the Origin Timestamp
if the Origin Timestamp appears to be in the future, or if it is in contained in the IHU; if the Origin Timestamp appears to be in the
the past by more than a time T (the value T = 3 minutes is future, or if it is in the past by more than a time T (the value T =
recommended), then the timestamps are still recorded in the neighbour 3 minutes is recommended), then the timestamps are still recorded in
table, but are not used for computation of an RTT sample. the Neighbour Table, but they are not used for computation of an RTT
sample.
Similary, the node compares the Hello's timestamp with the Receive Similarly, the node compares the Hello's timestamp with the Receive
Timestamp recorded in the Neighbour Table; if the Hello's timestamp Timestamp recorded in the Neighbour Table; if the Hello's timestamp
appears to be older than the recorded timestamp, or if it appears to appears to be older than the recorded timestamp, or if it appears to
be more recent by an interval larger than the value T, then the be more recent by an interval larger than the value T, then the
timestamps are not used for computation of an RTT sample. timestamps are not used for computation of an RTT sample.
3.4. Implementation notes 3.4. Implementation Notes
The accuracy of the computed RTT samples depends on Transmit The accuracy of the computed RTT samples depends on Transmit
Timestamps being computed as late as possible before a packet Timestamps being computed as late as possible before a packet
containing a Hello TLV is passed to the network stack, and Receive containing a Hello TLV is passed to the network stack, and Receive
Timestamps being computed as early as possible after reception of a Timestamps being computed as early as possible after reception of a
packet containing a (Hello, IHU) pair. We have found the following packet containing a (Hello, IHU) pair. We have found the following
implementation strategy to be useful. implementation strategy to be useful.
When a Hello TLV is buffered for transmission, we insert a PadN sub- When a Hello TLV is buffered for transmission, we insert a PadN sub-
TLV (Section 4.7.2 of [RFC8966]) with a length of 4 octets within the TLV (Section 4.7.2 of [RFC8966]) with a length of 4 octets within the
TLV. When the packet is ready to be sent, we check whether it TLV. When the packet is ready to be sent, we check whether it
contains a 4-octet PadN sub-TLV; if that's the case, we overwrite the contains a 4-octet PadN sub-TLV; if that's the case, we overwrite the
PadN sub-TLV with a Timestamp sub-TLV with the current time, and send PadN sub-TLV with a Timestamp sub-TLV with the current time, and send
out the packet. out the packet.
Conversely, when a packet is received, we immediately compute the Conversely, when a packet is received, we immediately compute the
current time and record it with the received packet. We then process current time and record it with the received packet. We then process
the packet as usual, and use the recorded timestamp in order to the packet as usual and use the recorded timestamp in order to
compute an RTT sample. compute an RTT sample.
The protocol is designed to survive the clock being reset when a node The protocol is designed to survive the clock being reset when a node
reboots; on POSIX systems, this makes it possible to use the reboots; on POSIX systems, this makes it possible to use the
CLOCK_MONOTONIC clock for computing timestamps. If CLOCK_MONOTONIC CLOCK_MONOTONIC clock for computing timestamps. If CLOCK_MONOTONIC
is not available, CLOCK_REALTIME may be used, since the protocol is is not available, CLOCK_REALTIME may be used, since the protocol is
able to survive the clock being occasionally stepped. able to survive the clock being occasionally stepped.
4. RTT-based route selection 4. RTT-Based Route Selection
The protocol described above yields a series of RTT samples. While The protocol described above yields a series of RTT samples. While
these samples are fairly accurate, they are not directly usable as an these samples are fairly accurate, they are not directly usable as an
input to the route selection procedure, for at least three reasons. input to the route selection procedure, for at least three reasons:
First of all, in the presence of bursty traffic, routers experience 1. In the presence of bursty traffic, routers experience transient
transient congestion, which causes occasional spikes in the measured congestion, which causes occasional spikes in the measured RTT.
RTT. Thus, the RTT signal may be noisy, and requires smoothing Thus, the RTT signal may be noisy and require smoothing before it
before it can be used for route selection. can be used for route selection.
Second, using the RTT signal for route selection gives rise to a 2. Using the RTT signal for route selection gives rise to a negative
negative feedback loop: when a route has a low RTT, it is deemed to feedback loop. When a route has a low RTT, it is deemed to be
be more desirable, which causes it to be used for more data traffic, more desirable; this causes it to be used for more data traffic,
which may lead to congestion, which in turn increases the RTT. which may lead to congestion, which in turn increases the RTT.
Without some form of hysteresis, using RTT for route selection would Without some form of hysteresis, using RTT for route selection
lead to oscillations between parallel routes, which would lead to would lead to oscillations between parallel routes, which would
packet reordering and negatively affect upper-layer protocols (such lead to packet reordering and negatively affect upper-layer
as TCP). protocols (such as TCP).
Third, even in the absence of congestion, the RTT tends to exhibit 3. Even in the absence of congestion, the RTT tends to exhibit some
some variation. If the RTTs of two parallel routes oscillate around variation. If the RTTs of two parallel routes oscillate around a
a common value, using the RTT as input to route selection will cause common value, using the RTT as input to route selection will
frequent routing oscillations, which, again, indicates the need for cause frequent routing oscillations, which, again, indicates the
some form of hysteresis. need for some form of hysteresis.
In this section, we describe an algorithm that integrates smoothing In this section, we describe an algorithm that integrates smoothing
and hysteresis and has been shown to behave well both in simulation and hysteresis. It has been shown to behave well both in simulation
and experimentally over the Internet [DELAY-BASED], and is and experimentally over the Internet [DELAY-BASED] and is RECOMMENDED
RECOMMENDED when RTT information is being used for route selection. when RTT information is being used for route selection. The
The algorithm is structured as follows: algorithm is structured as follows:
* the RTT values are first smoothed in order to avoid instabilities * the RTT values are first smoothed in order to avoid instabilities
due to outliers (Section 4.1); due to outliers (Section 4.1);
* the resulting smoothed samples are mapped to a cost using a * the resulting smoothed samples are mapped to a cost using a
bounded, non-linear mapping, which avoids instabilities at the bounded, non-linear mapping, which avoids instabilities at the
lower and at the upper end of the RTT range (Section 4.2); lower and upper end of the RTT range (Section 4.2);
* finally, a hysteresis filter is applied in order to limit the * a hysteresis filter is applied in order to limit the amount of
amount of oscillation in the middle of the RTT range oscillation in the middle of the RTT range (Section 4.3).
(Section 4.3).
4.1. Smoothing 4.1. Smoothing
The RTT samples provided by Mills' algorithm are fairly accurate, but The RTT samples provided by Mills' algorithm are fairly accurate, but
noisy: experiments indicate the occasional presence of individual noisy: experiments indicate the occasional presence of individual
samples that are much larger than the expected value. Thus, some samples that are much larger than the expected value. Thus, some
form of smoothing SHOULD be applied in order to avoid instabilities form of smoothing SHOULD be applied in order to avoid instabilities
due to occasional outliers. due to occasional outliers.
An implementation MAY use the exponential average algorithm, which is An implementation MAY use the exponential average algorithm, which is
simple to implement and appears to yield good results in practice simple to implement and appears to yield good results in practice
[DELAY-BASED]. The algorithm is parameterised by a constant α, where [DELAY-BASED]. The algorithm is parameterised by a constant α, where
0 < α < 1, which controls the amount of smoothing being applied. Fr 0 < α < 1, which controls the amount of smoothing being applied. For
each neighbour, it maintains a smoothed value RTT which is initially each neighbour, it maintains a smoothed value RTT, which is initially
undefined. When the first sample RTT0 is measured, the smoothed undefined. When the first sample RTT0 is measured, the smoothed
value is set to the value of RTT0. At each new sample RTTn, the value is set to the value of RTT0. At each new sample RTTn, the
smoothed value is set to a weighted average of the previous smoothed smoothed value is set to a weighted average of the previous smoothed
value and the new sample: value and the new sample:
RTT := α RTT + (1 - α) RTTn RTT := α RTT + (1 - α) RTTn
The smoothing constant α SHOULD be between 0.8 and 0.9; the value The smoothing constant α SHOULD be between 0.8 and 0.9; the value
0.836 is the RECOMMENDED default. 0.836 is the RECOMMENDED default.
4.2. Cost computation 4.2. Cost Computation
The smoothed RTT value obtained in the previous step needs to be The smoothed RTT value obtained in the previous step needs to be
mapped to a link cost, suitable for input to the metric computation mapped to a link cost, suitable for input to the metric computation
procedure (Section 3.5.2 of [RFC8966]). Obviously, the mapping procedure (Section 3.5.2 of [RFC8966]). Obviously, the mapping
should be monotonic (larger RTTs imply larger costs). In addition, should be monotonic (larger RTTs imply larger costs). In addition,
the mapping should be constant beyond a certain value (all very bad the mapping should be constant beyond a certain value (all very bad
links are equally bad), so that congested links do not contribute to links are equally bad) so that congested links do not contribute to
routing instability. The mapping should also be constant around 0, routing instability. The mapping should also be constant around 0,
so that small oscillations in the RTT of low-RTT links do not so that small oscillations in the RTT of low-RTT links do not
contribute to routing instability. contribute to routing instability.
cost cost
^ ^
| |
| |
| C + max-rtt-penalty | C + max-rtt-penalty
| +--------------------------- | +---------------------------
skipping to change at page 10, line 29 skipping to change at line 434
| / . | / .
| / . | / .
C +------------+ . C +------------+ .
| . . | . .
| . . | . .
| . . | . .
| . . | . .
0 +----------------------------------------------------> 0 +---------------------------------------------------->
0 rtt-min rtt-max RTT 0 rtt-min rtt-max RTT
Figure 3: Mapping from RTT to link cost Figure 3: Mapping from RTT to Link Cost
Implementations SHOULD use the mapping described in figure Figure 3, Implementations SHOULD use the mapping described in Figure 3, which
which is parameterised by three parameters rtt-min, rtt-max, and max- is parameterised by three parameters: rtt-min, rtt-max, and max-rtt-
rtt-penalty. For RTT values below rtt-min, the link cost is just the penalty. For RTT values below rtt-min, the link cost is just the
nominal cost C of a single hop. Between rtt-min and rtt-max, the nominal cost C of a single hop. Between rtt-min and rtt-max, the
cost increases linearly; above rtt-max, the constant value max-rtt- cost increases linearly; above rtt-max, the constant value max-rtt-
penalty is added to the nominal cost. penalty is added to the nominal cost.
The value rtt-min should be slightly larger than the RTT of a local, The value rtt-min should be slightly larger than the RTT of a local,
uncongested link. The value rtt-max should be the RTT above which a uncongested link. The value rtt-max should be the RTT above which a
link should be avoided if possible, either because it is a long- link should be avoided if possible, either because it is a long-
distance link or because it is congested; reducing the value of rtt- distance link or because it is congested; reducing the value of rtt-
max improves stability, but prevents the protocol from discriminating max improves stability, but prevents the protocol from discriminating
between high-latency links. As to max-rtt-penalty, it controls how between high-latency links. As for max-rtt-penalty, it controls how
much the protocol will penalise long-distance links. The default much the protocol will penalise long-distance links. The default
values rtt-min = 10ms, rtt-max = 120ms, and max-rtt-penalty = 150 are values rtt-min = 10 ms, rtt-max = 120 ms, and max-rtt-penalty = 150
RECOMMENDED. are RECOMMENDED.
4.3. Hysteresis 4.3. Hysteresis
Even after applying a bounded mapping from smoothed RTT to a cost Even after applying a bounded mapping from smoothed RTT to a cost
value, the cost may fluctuate when a link's RTT is between rtt-min value, the cost may fluctuate when a link's RTT is between rtt-min
and rtt-max. Implementations SHOULD use a robust hysteresis and rtt-max. Implementations SHOULD use a robust hysteresis
algorithm, such as the one described in Appendix A.3 of [RFC8966]. algorithm, such as the one described in Appendix A.3 of [RFC8966].
5. Backwards and forwards compatibility 5. Backwards and Forwards Compatibility
This protocol extension stores the data that it requires within sub- This protocol extension stores the data that it requires within sub-
TLVs of Babel's Hello and IHU TLVs. As discussed in Appendix D of TLVs of Babel's Hello and IHU TLVs. As discussed in Appendix D of
[RFC8966], implementations that do not understand this extension will [RFC8966], implementations that do not understand this extension will
silently ignore the sub-TLVs while parsing the rest of the TLVs that silently ignore the sub-TLVs while parsing the rest of the TLVs that
they contain. In effect, this extension supports building hybrid they contain. In effect, this extension supports building hybrid
networks consisting of extended and unextended routers, and while networks consisting of extended and unextended routers; while such
such networks might suffer from sub-optimal routing, they will not networks might suffer from sub-optimal routing, they will not suffer
suffer from blackholes or routing loops. from routing loops or other pathologies.
If a sub-TLV defined in this extension is longer than expected, the If a sub-TLV defined in this extension is longer than expected, the
additional data is silently ignored. This provision is made in order additional data is silently ignored. This provision is made in order
to allow a future version of this protocol to extend the packet to allow a future version of this protocol to extend the packet
format with additional data, for example high-precision or absolute format with additional data, for example high-precision or absolute
timestamps. timestamps.
6. Packet format 6. Packet Format
This extension defines the Timestamp sub-TLV whose Type field has This extension defines the Timestamp sub-TLV whose Type field has the
value 3. This sub-TLV can be contained within a Hello sub-TLV, in value 3. This sub-TLV can be contained within a Hello sub-TLV, in
which case it carries a single timestamp, or within an IHU sub-TLV, which case it carries a single timestamp, or within an IHU sub-TLV,
in which case it carries two timestamps. in which case it carries two timestamps.
Timestamps are encoded as 32-bit unsigned integers (modulo 2^32), Timestamps are encoded as 32-bit unsigned integers (modulo 2^32),
expressed in units of one microsecond, counting from an arbitrary expressed in units of one microsecond, counting from an arbitrary
origin. Timestamps wrap around every 4295 seconds, or rougly 71 origin. Timestamps wrap around every 4295 seconds, or roughly 71
minutes (see also Section 3.3). minutes (see also Section 3.3).
6.1. Timestamp sub-TLV in Hello TLVs 6.1. Timestamp Sub-TLV in Hello TLVs
When contained within a Hello TLV, the Timestamp sub-TLV has the When contained within a Hello TLV, the Timestamp sub-TLV has the
following format: following format:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 3 | Length | Transmit timestamp | | Type = 3 | Length | Transmit Timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (continued) | | (continued) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields:
Type Set to 3 to indicate a Timestamp sub-TLV. Type: Set to 3 to indicate a Timestamp sub-TLV.
Length The length of the body in octets, exclusive of the Type and Length: The length of the body in octets, exclusive of the Type and
Length fields. Length fields.
Transmit timestamp The time at which the packet containing this sub- Transmit Timestamp: The time at which the packet containing this
TLV was sent, according to the sender's clock. sub-TLV was sent, according to the sender's clock.
If the Length field is larger than the expected 4 octets, the sub-TLV If the Length field is larger than the expected 4 octets, the sub-TLV
MUST be processed normally (the first 4 octets are interpreted as MUST be processed normally (the first 4 octets are interpreted as
described above), and any extra data contained in this sub-TLV MUST described above) and any extra data contained in this sub-TLV MUST be
be silently ignored. If the Length field is smaller than the silently ignored. If the Length field is smaller than the expected 4
expected 4 octets, then this sub-TLV MUST be ignored (and the octets, then this sub-TLV MUST be ignored (and the remainder of the
remainder of the enclosing TLV processed as usual). enclosing TLV processed as usual).
6.2. Timestamp sub-TLV in IHU TLVs 6.2. Timestamp Sub-TLV in IHU TLVs
When contained in an IHU TLV, the Timestamp sub-TLV has the following When contained in an IHU TLV, the Timestamp sub-TLV has the following
format: format:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 3 | Length | Origin timestamp | | Type = 3 | Length | Origin Timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (continued) | Receive timestamp | | (continued) | Receive Timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (continued) | | (continued) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields: Type: Set to 3 to indicate a Timestamp sub-TLV.
Type Set to 3 to indicate a Timestamp sub-TLV.
Length The length of the body in octets, exclusive of the Type and Length: The length of the body in octets, exclusive of the Type and
Length fields. Length fields.
Origin timestamp A copy of the transmit timestamp of the last Origin Timestamp: A copy of the Transmit Timestamp of the last
Timestamp sub-TLV contained in a Hello TLV received from Timestamp sub-TLV contained in a Hello TLV received from
the node to which the enclosing IHU TLV applies. the node to which the enclosing IHU TLV applies.
Receive timestamp The time, according to the sender's clock, at Receive Timestamp: The time, according to the sender's clock, at
which the last timestamped Hello TLV was received from the which the last timestamped Hello TLV was received from the
node to which the enclosing IHU TLV applies. node to which the enclosing IHU TLV applies.
If the Length field is larger than the expected 8 octets, the sub-TLV If the Length field is larger than the expected 8 octets, the sub-TLV
MUST be processed normally (the first 8 octets are interpreted as MUST be processed normally (the first 8 octets are interpreted as
described above), and any extra data contained in this sub-TLV MUST described above), and any extra data contained in this sub-TLV MUST
be silently ignored. If the Length field is smaller than the be silently ignored. If the Length field is smaller than the
expected 8 octets, then this sub-TLV MUST be ignored (and the expected 8 octets, then this sub-TLV MUST be ignored (and the
remainder of the enclosing TLV processed as usual). remainder of the enclosing TLV processed as usual).
7. IANA Considerations 7. IANA Considerations
IANA has added the following entry to the "Babel Sub-TLV Types" IANA has added the following entry to the "Babel Sub-TLV Types"
registry: registry:
+======+===========+=================+ +======+===========+===========+
| Type | Name | Reference | | Type | Name | Reference |
+======+===========+=================+ +======+===========+===========+
| 3 | Timestamp | (this document) | | 3 | Timestamp | RFC 9616 |
+------+-----------+-----------------+ +------+-----------+-----------+
Table 1 Table 1
8. Security Considerations 8. Security Considerations
This extension adds additional timestamping data to two of the TLVs This extension adds timestamping data to two of the TLVs sent by a
sent by a Babel router. By broadcasting the value of a reasonably Babel router. By broadcasting the value of a reasonably accurate
accurate local clock, they might make a node more susceptible to local clock, these additional data might make a node more susceptible
timing attacks. to timing attacks.
Broadcasting an accurate time raises privacy issues. The timestamps Broadcasting an accurate time raises privacy issues. The timestamps
used by this protocol have an arbitrary origin, and therefore do not used by this protocol have an arbitrary origin; therefore, they do
leak a node's boot time or timezone. However, having access to not leak a node's boot time or time zone. However, having access to
accurate timestamps could allow an attacker to determine the physical accurate timestamps could allow an attacker to determine the physical
location of a node. Nodes might avoid disclosure of location location of a node. Nodes might avoid disclosure of location
information by not including timestamp sub-TLVs in the TLVs that they information by not including Timestamp sub-TLVs in the TLVs that they
send, which will cause their neighbours to fall back to hop-count send, which will cause their neighbours to fall back to hop-count
routing. routing.
9. Acknowledgements 9. References
The authors are indebted to Jean-Paul Smets, who prompted the
investigation that originally lead to this protocol. We are also
grateful to Donald Eastlake, Toke Høiland-Jørgensen, Maria Matejka,
David Schinazi, Pacal Thubert, Steffen Vogel, and Ondřej Zajiček.
10. References
10.1. Normative References
[RFC8966] Chroboczek, J. and D. Schinazi, "The Babel Routing 9.1. Normative References
Protocol", RFC 8966, DOI 10.17487/RFC8966, January 2021,
<https://www.rfc-editor.org/info/rfc8966>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
10.2. Informative References [RFC8966] Chroboczek, J. and D. Schinazi, "The Babel Routing
Protocol", RFC 8966, DOI 10.17487/RFC8966, January 2021,
<https://www.rfc-editor.org/info/rfc8966>.
9.2. Informative References
[DELAY-BASED] [DELAY-BASED]
Jonglez, B. and J. Chroboczek, "A delay-based routing Jonglez, B., Boutier, M., and J. Chroboczek, "A delay-
metric", March 2014. Available online from based routing metric", DOI 10.48550/arXiv.1403.3488, March
http://arxiv.org/abs/1403.3488 2014, <http://arxiv.org/abs/1403.3488>.
[MILLS] Mills, D., "DCN Local-Network Protocols", RFC 891, [RFC891] Mills, D., "DCN Local-Network Protocols", STD 44, RFC 891,
December 1983, <https://www.rfc-editor.org/rfc/rfc891>. DOI 10.17487/RFC0891, December 1983,
<https://www.rfc-editor.org/info/rfc891>.
[NTP] Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
Time Protocol Version 4: Protocol and Algorithms "Network Time Protocol Version 4: Protocol and Algorithms
Specification", RFC 5905, June 2010, Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
<https://www.rfc-editor.org/rfc/rfc5905>. <https://www.rfc-editor.org/info/rfc5905>.
Acknowledgements
The authors are indebted to Jean-Paul Smets, who prompted the
investigation that originally lead to this protocol. We are also
grateful to Donald Eastlake, 3rd, Toke Høiland-Jørgensen, Maria
Matejka, David Schinazi, Pascal Thubert, Steffen Vogel, and Ondřej
Zajiček.
Authors' Addresses Authors' Addresses
Baptiste Jonglez Baptiste Jonglez
ENS Lyon ENS Lyon
France France
Email: baptiste.jonglez@ens-lyon.org Email: baptiste.jonglez@ens-lyon.org
Juliusz Chroboczek Juliusz Chroboczek
IRIF, Université Paris Cité IRIF, Université Paris Cité
 End of changes. 85 change blocks. 
225 lines changed or deleted 231 lines changed or added

This html diff was produced by rfcdiff 1.48.