Independent Submission                                          G. Moura
Request for Comments: 9199                            SIDN Labs/TU Delft
Category: Informational                                      W. Hardaker
ISSN: 2070-1721                                             J. Heidemann
                                    USC/Information Sciences Institute
                                                               M. Davids
                                                               SIDN Labs
                                                              March 2022


     Considerations for Large Authoritative DNS Server Operators
Abstract

   Recent research work has explored the deployment characteristics and
   configuration of the Domain Name System (DNS).  This document
   summarizes the conclusions from these research efforts and offers
   specific, tangible considerations or advice to authoritative DNS
   server operators.  Authoritative server operators may wish to follow
   these considerations to improve their DNS services.

   It is possible that the results presented in this document could be
   applicable in a wider context than just the DNS protocol, as some of
   the results may generically apply to any stateless/short-duration
   anycasted service.

   This document is not an IETF consensus document: it is published for
   informational purposes.
Status of This Memo

   This document is not an Internet Standards Track specification; it
   is published for informational purposes.

   This is a contribution to the RFC Series, independently of any other
   RFC stream.  The RFC Editor has chosen to publish this document at
   its discretion and makes no statement about its value for
   implementation or deployment.  Documents approved for publication by
   the RFC Editor are not candidates for any level of Internet
   Standard; see Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc9199.
Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
Table of Contents

   1.  Introduction
   2.  Background
   3.  Considerations
     3.1.  C1: Deploy Anycast in Every Authoritative Server to Enhance
           Distribution and Latency
       3.1.1.  Research Background
       3.1.2.  Resulting Considerations
     3.2.  C2: Optimizing Routing is More Important than Location
           Count and Diversity
       3.2.1.  Research Background
       3.2.2.  Resulting Considerations
     3.3.  C3: Collect Anycast Catchment Maps to Improve Design
       3.3.1.  Research Background
       3.3.2.  Resulting Considerations
     3.4.  C4: Employ Two Strategies When under Stress
       3.4.1.  Research Background
       3.4.2.  Resulting Considerations
     3.5.  C5: Consider Longer Time-to-Live Values Whenever Possible
       3.5.1.  Research Background
       3.5.2.  Resulting Considerations
     3.6.  C6: Consider the Difference in Parent and Children's TTL
           Values
       3.6.1.  Research Background
       3.6.2.  Resulting Considerations
   4.  Security Considerations
   5.  Privacy Considerations
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Acknowledgements
   Contributors
   Authors' Addresses
1.  Introduction

This document summarizes recent research that explored the deployed
DNS configurations and offers derived, specific, tangible advice to
DNS authoritative server operators (referred to as "DNS operators"
hereafter).  The considerations (C1-C6) presented in this document are
backed by peer-reviewed research, which used wide-scale Internet
measurements to draw their conclusions.  This document summarizes the
research results and describes the resulting key engineering options.
In each section, readers are pointed to the pertinent publications
where additional details are presented.

These considerations are designed for operators of "large"
authoritative DNS servers, which, in this context, are servers with a
significant global user population, like top-level domain (TLD)
operators, run by either a single operator or multiple operators.
Typically, these networks are deployed on wide anycast networks
[RFC1546] [AnyBest].  These considerations may not be appropriate for
smaller domains, such as those used by an organization with users in
one unicast network or in a single city or region, where operational
goals such as uniform, global low latency are less required.

It is possible that the results presented in this document could be
applicable in a wider context than just the DNS protocol, as some of
the results may generically apply to any stateless/short-duration
anycasted service.  Because the conclusions of the reviewed studies
don't measure smaller networks, the wording in this document
concentrates solely on discussing large-scale DNS authoritative
services.

This document is not an IETF consensus document: it is published for
informational purposes.
2.  Background

The DNS has two main types of DNS servers: authoritative servers and
recursive resolvers, shown by a representational deployment model in
Figure 1.  An authoritative server (shown as AT1-AT4 in Figure 1)
knows the content of a DNS zone and is responsible for answering
queries about that zone.  It runs using local (possibly automatically
updated) copies of the zone and does not need to query other servers
[RFC2181] in order to answer requests.  A recursive resolver (Re1-Re3)
is a server that iteratively queries authoritative and other servers
to answer queries received from client requests [RFC1034].  A client
typically employs a software library called a "stub resolver" ("stub"
in Figure 1) to issue its query to the upstream recursive resolvers
[RFC1034].
      +-----+   +-----+   +-----+   +-----+
      | AT1 |   | AT2 |   | AT3 |   | AT4 |
      +-----+   +-----+   +-----+   +-----+
         ^         ^         ^         ^
         |         |         |         |
         |      +-----+      |         |
         +------| Re1 |------+         |
         |      +-----+                |
         |         ^                   |
         |         |                   |
         |  +----+ +----+              |
         +--|Re2 | |Re3 |--------------+
            +----+ +----+
              ^       ^
              |       |
              | +------+ |
              +-| stub |-+
                +------+

     Figure 1: Relationship between Recursive Resolvers (Re) and
                  Authoritative Name Servers (ATn)
DNS queries issued by a client contribute to a user's perceived
latency and affect the user experience [Singla2014] depending on how
long it takes for responses to be returned.  The DNS system has been
subject to repeated Denial-of-Service (DoS) attacks (for example, in
November 2015 [Moura16b]) in order to specifically degrade the user
experience.
To reduce latency and improve resiliency against DoS attacks, the DNS
uses several types of service replication.  Replication at the
authoritative server level can be achieved with the following:

   i.    the deployment of multiple servers for the same zone [RFC1035]
         (AT1-AT4 in Figure 1);

   ii.   the use of IP anycast [RFC1546] [RFC4786] [RFC7094], which
         allows the same IP address to be announced from multiple
         locations (each of which is referred to as an "anycast
         instance" [RFC8499]); and

   iii.  the use of load balancers to support multiple servers inside
         a single (potentially anycasted) instance.

As a consequence, there are many possible ways an authoritative DNS
provider can engineer its production authoritative server network with
multiple viable choices, and there is not necessarily a single optimal
design.
3.  Considerations

In the next sections, we cover the specific considerations (C1-C6) for
conclusions drawn within academic papers about large authoritative DNS
server operators.  These considerations are conclusions reached from
academic work that authoritative server operators may wish to consider
in order to improve their DNS service.  Each consideration offers
different improvements that may impact service latency, routing,
anycast deployment, and defensive strategies, for example.

3.1.  C1: Deploy Anycast in Every Authoritative Server to Enhance
      Distribution and Latency

3.1.1.  Research Background

Authoritative DNS server operators announce their service using NS
records [RFC1034].  Different authoritative servers for a given zone
should return the same content; typically, they stay synchronized
using DNS zone transfers (authoritative transfer (AXFR) [RFC5936] and
incremental zone transfer (IXFR) [RFC1995]), coordinating the zone
data they all return to their clients.
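
As a non-normative illustration of the zone transfer mechanism
mentioned above, the following sketch uses the dnspython library to
pull a full copy of a zone via AXFR.  The zone name and server address
are placeholders, and the server must be configured to permit the
transfer.

   # Sketch: pulling a zone copy via AXFR with dnspython.  The zone
   # name and server address are placeholders (assumptions), and the
   # primary must allow AXFR from this client.
   import dns.query
   import dns.zone

   PRIMARY = "192.0.2.53"   # hypothetical primary server address
   ZONE = "example.org."    # hypothetical zone

   # Request a full zone transfer (AXFR) and build an in-memory copy.
   zone = dns.zone.from_xfr(dns.query.xfr(PRIMARY, ZONE))

   # List the NS records the transferred copy would now serve.
   for name, ttl, rdata in zone.iterate_rdatas("NS"):
       print(f"{name} {ttl} NS {rdata.target}")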
As discussed above, the DNS heavily relies upon replication to support
high reliability, ensure capacity, and reduce latency [Moura16b].  The
DNS has two complementary mechanisms for service replication: name
server replication (multiple NS records) and anycast (multiple
physical locations).  Name server replication is strongly recommended
for all zones (multiple NS records), and IP anycast is used by many
larger zones such as the DNS root [AnyFRoot], most top-level domains
[Moura16b], and many large commercial enterprises, governments, and
other organizations.

Most DNS operators strive to reduce service latency for users, which
is greatly affected by both of these replication techniques.  However,
because operators only have control over their authoritative servers
and not over the client's recursive resolvers, it is difficult to
ensure that recursives will be served by the closest authoritative
server.  Server selection is ultimately up to the recursive resolver's
software implementation, and different vendors and even different
releases employ different criteria to choose the authoritative servers
with which to communicate.
Understanding how recursive resolvers choose authoritative servers is
a key step in improving the effectiveness of authoritative server
deployments.  To measure and evaluate server deployments, [Mueller17b]
describes the deployment of seven unicast authoritative name servers
in different global locations, which were then queried from more than
9,000 Reseaux IP Europeens (RIPE) Atlas vantage points and their
respective recursive resolvers.
It was found in [Mueller17b] that recursive resolvers in the wild
query all available authoritative servers, regardless of the observed
latency.  But the distribution of queries tends to be skewed towards
authoritatives with lower latency: the lower the latency between a
recursive resolver and an authoritative server, the more often the
recursive will send queries to that server.  These results were
obtained by aggregating results from all of the vantage points, and
they were not specific to any vendor or version.

The authors believe this behavior is a consequence of combining the
two main criteria employed by resolvers when selecting authoritative
servers: resolvers regularly check all listed authoritative servers in
an NS set to determine which is closer (the least latent), and when
one isn't available, it selects one of the alternatives.

3.1.2.  Resulting Considerations
For an authoritative DNS operator, this result means that the latency
of all authoritative servers (NS records) matters, so they all must be
similarly capable -- all available authoritatives will be queried by
most recursive resolvers.  Unicasted services, unfortunately, cannot
deliver good latency worldwide (a unicast authoritative server in
Europe will always have high latency to resolvers in California and
Australia, for example, given its geographical distance).

[Mueller17b] recommends that DNS operators deploy equally strong IP
anycast instances for every authoritative server (i.e., for each NS
record).  Each large authoritative DNS server provider should phase
out its usage of unicast and deploy a number of well-engineered
anycast instances with good peering strategies so they can provide
good latency to their global clients.
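
The following sketch illustrates why the latency of every NS matters:
it queries each authoritative server of a zone and reports the
observed RTT from a single vantage point.  It assumes the dnspython
library; the zone name is a placeholder, and a real assessment would
use many vantage points rather than one.

   # Sketch: measure the query RTT to every authoritative server of a
   # zone from this vantage point.  The zone name is a placeholder.
   import time
   import dns.message
   import dns.query
   import dns.resolver

   ZONE = "example.org."  # hypothetical zone

   for ns in dns.resolver.resolve(ZONE, "NS"):
       nsname = str(ns.target)
       for addr in dns.resolver.resolve(nsname, "A"):
           query = dns.message.make_query(ZONE, "SOA")
           start = time.monotonic()
           dns.query.udp(query, str(addr), timeout=2.0)
           rtt_ms = (time.monotonic() - start) * 1000
           print(f"{nsname} ({addr}): {rtt_ms:.1f} ms")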
As a case study, the ".nl" TLD zone was originally served on seven
authoritative servers with a mixed unicast/anycast setup.  In early
2018, .nl moved to a setup with 4 anycast authoritative servers.

The contribution of [Mueller17b] to DNS service engineering shows that
because unicast cannot deliver good latency worldwide, anycast needs
to be used to provide a low-latency service worldwide.

3.2.  C2: Optimizing Routing is More Important than Location Count
      and Diversity

3.2.1.  Research Background

When selecting an anycast DNS provider or setting up an anycast
service, choosing the best number of anycast instances [RFC4786]
[RFC7094] to deploy is a challenging problem.  Selecting the right
quantity and set of global locations that should send BGP
announcements is tricky.  Intuitively, one could naively think that
more instances are better and that simply "more" will always lead to
shorter response times.

This is not necessarily true, however.  In fact, proper route
engineering can matter more than the total number of locations, as
found in [Schmidt17a].  To study the relationship between the number
of anycast instances and the associated service performance, the
authors measured the round-trip time (RTT) latency of four DNS root
servers.  The root DNS servers are implemented by 12 separate
organizations serving the DNS root zone at 13 different IPv4/IPv6
address pairs.
The results documented in [Schmidt17a] measured the performance of the
{c,f,k,l}.root-servers.net (referred to as "C", "F", "K", and "L"
hereafter) servers from more than 7,900 RIPE Atlas probes.  RIPE Atlas
is an Internet measurement platform with more than 12,000 global
vantage points called "Atlas probes", and it is used regularly by both
researchers and operators [RipeAtlas15a] [RipeAtlas19a].

In [Schmidt17a], the authors found that the C server, a smaller
anycast deployment consisting of only 8 instances, provided very
similar overall performance in comparison to the much larger
deployments of K and L, with 33 and 144 instances, respectively.  The
median RTTs for the C, K, and L root servers were all between 30 and
32 ms.
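
As a rough, single-vantage-point analogue of that comparison (not a
substitute for the thousands of RIPE Atlas probes used in
[Schmidt17a]), the following sketch measures the median RTT to a few
root letters with the dnspython library.  The addresses are the
commonly published root server addresses and should be verified before
use.

   # Sketch: compare median RTTs to a few root letters from one
   # vantage point.  Addresses are the commonly published root server
   # addresses (verify before use).
   import statistics
   import time
   import dns.message
   import dns.query

   ROOTS = {"C": "192.33.4.12", "K": "193.0.14.129", "L": "199.7.83.42"}

   for letter, addr in ROOTS.items():
       samples = []
       for _ in range(5):
           q = dns.message.make_query(".", "SOA")
           start = time.monotonic()
           dns.query.udp(q, addr, timeout=2.0)
           samples.append((time.monotonic() - start) * 1000)
       print(f"{letter}-root: median RTT "
             f"{statistics.median(samples):.1f} ms")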
Because RIPE Atlas is known to have better coverage in Europe than
other regions, the authors specifically analyzed the results per
region and per country (Figure 5 in [Schmidt17a]) and show that known
Atlas bias toward Europe does not change the conclusion that properly
selected anycast locations are more important to latency than the
number of sites.

3.2.2.  Resulting Considerations

The important conclusion from [Schmidt17a] is that when engineering
anycast services for performance, factors other than just the number
of instances (such as local routing connectivity) must be considered.
Specifically, optimizing routing policies is more important than
simply adding new instances.  The authors showed that 12 instances can
provide reasonable latency, assuming they are globally distributed and
have good local interconnectivity.  However, additional instances can
still be useful for other reasons, such as when handling DoS attacks
[Moura16b].

3.3.  C3: Collect Anycast Catchment Maps to Improve Design

3.3.1.  Research Background
An anycast DNS service may be deployed from anywhere from several
locations to hundreds of locations (for example, l.root-servers.net
has over 150 anycast instances at the time this was written).  Anycast
leverages Internet routing to distribute incoming queries to a
service's nearest distributed anycast locations, measured by the
number of routing hops.  However, queries are usually not evenly
distributed across all anycast locations, as found in the case of
L-Root when analyzed using Hedgehog [IcannHedgehog].
Adding locations to or removing locations from a deployed anycast
network changes the load distribution across all of its locations.
When a new location is announced by BGP, locations may receive more or
less traffic than they were engineered for, leading to suboptimal
service performance or even stressing some locations while leaving
others underutilized.  Operators constantly face this scenario when
expanding an anycast service.  Operators cannot easily directly
estimate future query distributions based on proposed anycast network
engineering decisions.
To address this need and estimate the query loads of an anycast
service undergoing changes (in particular, expanding), [Vries17b]
describes the development of a new technique enabling operators to
carry out active measurements using an open-source tool called
Verfploeter (available at [VerfSrc]).  The results allow the creation
of detailed anycast maps and catchment estimates.  By running
Verfploeter combined with a published IPv4 "hit list", an operator can
precisely calculate which remote prefixes will be matched to each
anycast instance in a network.  At the time of this writing,
Verfploeter still does not support IPv6, as the IPv4 hit lists used
are generated via frequent large-scale ICMP echo scans, which is not
possible using IPv6.
As proof of concept, [Vries17b] documents how Verfploeter was used to
predict both the catchment and query load distribution for a new
anycast instance deployed for b.root-servers.net.  Using two anycast
test instances in Miami (MIA) and Los Angeles (LAX), an ICMP echo
query was sent from an IP anycast address to each IPv4 /24 network
routing block on the Internet.
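
The probing idea can be sketched as follows (this is not the
Verfploeter implementation, which is available at [VerfSrc]): ICMP
echo requests are sent with the anycast service address as the source,
so each reply is routed back to whichever anycast site the responding
network's BGP view selects.  The sketch uses the scapy library,
requires raw-socket privileges, and uses placeholder addresses.

   # Sketch of the probing idea only (NOT the Verfploeter code; see
   # [VerfSrc]): send ICMP echo requests sourced from the anycast
   # address to one target per /24 and let every anycast site log the
   # replies it receives.  Addresses are placeholders.
   from scapy.all import IP, ICMP, send

   ANYCAST_SRC = "198.51.100.1"            # placeholder anycast address
   targets = ["203.0.113.1", "192.0.2.1"]  # one address per /24 (hit list)

   for dst in targets:
       # The reply returns to the anycast address and therefore reaches
       # whichever site "catches" the target network, revealing the
       # catchment.
       send(IP(src=ANYCAST_SRC, dst=dst) / ICMP(), verbose=False)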
The ICMP echo responses were recorded at both sites, analyzed, and
overlaid onto a graphical world map, resulting in an Internet-scale
catchment map.  To calculate expected load once the production network
was enabled, the quantity of traffic received by b.root-servers.net's
single site at LAX was recorded based on a single day's traffic
(2017-04-12, "day in the life" (DITL) datasets [Ditl17]).  In
[Vries17b], it was predicted that 81.6% of the traffic load would
remain at the LAX site.  This Verfploeter estimate turned out to be
very accurate; the actual measured traffic volume when production
service at MIA was enabled was 81.4%.
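
The load-estimation step can be sketched as follows: combine the
measured catchment (which site answered each probed /24) with per-/24
query counts from a day's capture to estimate each site's expected
share of queries.  All data below is invented for illustration.

   # Sketch: estimate per-site load from a catchment map plus per-/24
   # query counts (all data below is invented for illustration).
   from collections import defaultdict

   catchment = {            # /24 prefix -> site that saw its ICMP reply
       "203.0.113.0/24": "LAX",
       "198.51.100.0/24": "MIA",
       "192.0.2.0/24": "LAX",
   }
   queries_per_prefix = {   # per-/24 query counts from a DITL-style capture
       "203.0.113.0/24": 120_000,
       "198.51.100.0/24": 30_000,
       "192.0.2.0/24": 50_000,
   }

   load = defaultdict(int)
   for prefix, site in catchment.items():
       load[site] += queries_per_prefix.get(prefix, 0)

   total = sum(load.values())
   for site, count in sorted(load.items()):
       print(f"{site}: {count / total:.1%} of expected queries")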
Verfploeter can also be used to estimate traffic shifts based on other
BGP route engineering techniques (for example, Autonomous System (AS)
path prepending or BGP community use) in advance of operational
deployment.  This was studied in [Vries17b] using prepending with 1-3
hops at each instance, and the results were compared against real
operational changes to validate the accuracy of the techniques.

3.3.2.  Resulting Considerations
An important operational takeaway [Vries17b] provides is how DNS
operators can make informed engineering choices when changing DNS
anycast network deployments by using Verfploeter in advance.
Operators can identify suboptimal routing situations in advance with
significantly better coverage than other active measurement platforms
such as RIPE Atlas provide.  To date, Verfploeter has been deployed on
an operational testbed (anycast testbed) [AnyTest] and on a large
unnamed operator, and it is run daily at b.root-servers.net
[Vries17b].
Operators should use active measurement techniques like Verfploeter in
advance of potential anycast network changes to accurately measure the
benefits and potential issues ahead of time.

3.4.  C4: Employ Two Strategies When under Stress

3.4.1.  Research Background

DDoS attacks are becoming bigger, cheaper, and more frequent
[Moura16b].  The most powerful recorded DDoS attack against DNS
servers to date reached 1.2 Tbps by using Internet of Things (IoT)
devices [Perlroth16].  How should a DNS operator engineer its anycast
authoritative DNS server to react to such a DDoS attack?  [Moura16b]
investigates this question using empirical observations grounded with
theoretical option evaluations.

An authoritative DNS server deployed using anycast will have many
server instances distributed over many networks.  Ultimately, the
relationship between the DNS provider's network and a client's ISP
will determine which anycast instance will answer queries for a given
client, given that the BGP protocol maps clients to specific anycast
instances using routing information.  As a consequence, when an
anycast authoritative server is under attack, the load that each
anycast instance receives is likely to be unevenly distributed (a
function of the source of the attacks); thus, some instances may be
more overloaded than others, which is what was observed when analyzing
the root DNS events of November 2015 [Moura16b].  Given the fact that
different instances may have different capacities (bandwidth, CPU,
etc.), making a decision about how to react to stress becomes even
more difficult.
In practice, when an anycast instance is overloaded with incoming
traffic, operators have two options:

*  They can withdraw its routes, prepend its AS path in announcements
   to some or all of its neighbors, perform other traffic-shifting
   tricks (such as reducing route announcement propagation using BGP
   communities [RFC1997]), or communicate with its upstream network
   providers to apply filtering (potentially using FlowSpec [RFC8955]
   or the DDoS Open Threat Signaling (DOTS) protocol [RFC8811]
   [RFC9132] [RFC8783]).  These techniques shift both legitimate and
   attack traffic to other anycast instances (with hopefully greater
   capacity) or block traffic entirely.
*  Alternatively, operators can become degraded absorbers by
   continuing to operate, knowing that they are dropping incoming
   legitimate requests due to queue overflow.  However, this approach
   will also absorb attack traffic directed toward its catchment,
   hopefully protecting the other anycast instances.
[Moura16b] describes seeing both of these behaviors deployed in
practice when studying instance reachability and RTTs in the DNS root
events.  When withdraw strategies were deployed, the stress of
increased query loads was displaced from one instance to multiple
other sites.  In other observed events, one site was left to absorb
the brunt of an attack, leaving the other sites relatively less
affected.
3.4.2.  Resulting Considerations

Operators should consider having both an anycast site withdraw
strategy and an absorption strategy ready to be used before a network
overload occurs.  Operators should be able to deploy one or both of
these strategies rapidly.  Ideally, these should be encoded into
operating playbooks with defined site measurement guidelines for which
strategy to employ based on measured data from past events.
[Moura16b] speculates that careful, explicit, and automated management
policies may provide stronger defenses to overload events.
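
A playbook entry of this kind could be as simple as a threshold rule
over per-site measurements.  The following sketch is purely
illustrative, with invented site data and thresholds that an operator
would replace with values derived from past events.

   # Sketch of a playbook-style decision aid (thresholds and site data
   # are invented): flag sites that should absorb or shift traffic.
   SITES = {
       # site: (observed queries/s, engineered capacity in queries/s)
       "ams": (900_000, 1_000_000),
       "mia": (2_400_000, 1_000_000),
       "syd": (150_000, 500_000),
   }

   def recommend(observed: float, capacity: float) -> str:
       utilization = observed / capacity
       if utilization < 0.8:
           return "normal operation"
       if utilization < 1.5:
           # Moderate overload: absorbing keeps the attack away from
           # other sites, at the cost of local degradation.
           return "absorb (degrade locally, protect other sites)"
       # Severe overload: shift traffic away (withdraw, prepend, or use
       # communities), accepting that load moves to other catchments.
       return "withdraw or prepend routes (shift load elsewhere)"

   for site, (observed, capacity) in SITES.items():
       print(f"{site}: {recommend(observed, capacity)}")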
DNS operators should be ready to employ both common filtering
approaches and other routing load-balancing techniques (such as
withdrawing routes, prepending Autonomous Systems (ASes), adding
communities, or isolating instances), where the best choice depends on
the specifics of the attack.

Note that this consideration refers to the operation of just one
anycast service point, i.e., just one anycasted IP address block
covering one NS record.  However, DNS zones with multiple
authoritative anycast servers may also expect loads to shift from one
anycasted server to another, as resolvers switch from one
authoritative service point to another when attempting to resolve a
name [Mueller17b].
3.5.  C5: Consider Longer Time-to-Live Values Whenever Possible

3.5.1.  Research Background

Caching is the cornerstone of good DNS performance and reliability.  A
50 ms response to a new DNS query may be considered fast, but a
response of less than 1 ms to a cached entry is far faster.  In
[Moura18b], it was shown that caching also protects users from short
outages and even significant DDoS attacks.
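
The latency difference is easy to observe from any client.  The
following sketch (using the dnspython library and a placeholder name)
times a first query against an immediate repeat, where the second
answer is typically served from the recursive resolver's cache.

   # Sketch: time a cold query and an immediate repeat through the
   # system resolver.  The name is a placeholder; results depend on
   # the local resolver's cache state.
   import time
   import dns.resolver

   NAME = "example.org."

   def timed_query(name: str) -> float:
       start = time.monotonic()
       dns.resolver.resolve(name, "A")
       return (time.monotonic() - start) * 1000

   cold = timed_query(NAME)  # likely needs a trip to authoritatives
   warm = timed_query(NAME)  # typically answered from the resolver cache
   print(f"cold: {cold:.1f} ms, warm: {warm:.1f} ms")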
Time-to-live (TTL) values [RFC1034] [RFC1035] for DNS records directly
control cache durations and affect latency, resilience, and the role
of DNS in Content Delivery Network (CDN) server selection.  Some early
work modeled caches as a function of their TTLs [Jung03a], and recent
work has examined cache interactions with DNS [Moura18b], but until
[Moura19b], no research had provided considerations about the benefits
of various TTL value choices.  To study this, Moura et al. [Moura19b]
carried out a measurement study investigating TTL choices and their
impact on user experiences in the wild.  They performed this study
independent of specific resolvers (and their caching architectures),
vendors, or setups.

First, they identified several reasons why operators and zone owners
may want to choose longer or shorter TTLs:
*  Longer TTLs, as discussed, lead to a longer cache life, resulting
   in faster responses.  In [Moura19b], this was measured in the wild,
   and it was shown that by increasing the TTL for the .uy TLD from 5
   minutes (300 s) to 1 day (86,400 s), the latency measured from
   15,000 Atlas vantage points changed significantly: the median RTT
   decreased from 28.7 ms to 8 ms, and the 75th percentile decreased
   from 183 ms to 21 ms.
*  Longer caching times also result in lower DNS traffic:
   authoritative servers will experience less traffic with extended
   TTLs, as repeated queries are answered by resolver caches.

*  Longer caching consequently results in a lower overall cost if the
   DNS is metered: some providers that offer DNS as a Service charge a
   per-query (metered) cost (often in addition to a fixed monthly
   cost).

*  Longer caching is more robust to DDoS attacks on DNS
   infrastructure.  DNS caching was also measured in [Moura18b], and
   it showed that the effects of a DDoS on DNS can be greatly reduced,
   provided that the caches last longer than the attack.

*  Shorter caching, however, supports deployments that may require
   rapid operational changes: an easy way to transition from an old
   server to a new one is to simply change the DNS records.  Since
   there is no method to remotely remove cached DNS records, the TTL
   duration represents a necessary transition delay to fully shift
   from one server to another.  Thus, low TTLs allow for more rapid
   transitions.  However, when deployments are planned in advance
   (that is, longer than the TTL), it is possible to lower the TTLs
   just before a major operational change and raise them again
   afterward.

*  Shorter caching can also help with a DNS-based response to DDoS
   attacks.  Specifically, some DDoS-scrubbing services use the DNS to
   redirect traffic during an attack.  Since DDoS attacks arrive
   unannounced, DNS-based traffic redirection requires that the TTL be
   kept quite low at all times to allow operators to suddenly have
   their zone served by a DDoS-scrubbing service.

*  Shorter caching helps DNS-based load balancing.  Many large
   services are known to rotate traffic among their servers using
   DNS-based load balancing.  Each arriving DNS request provides an
   opportunity to adjust the service load by rotating IP address
   records (A and AAAA) to the lowest unused server.  Shorter TTLs may
   be desired in these architectures to react more quickly to traffic
   dynamics.  Many recursive resolvers, however, have minimum caching
   times of tens of seconds, placing a limit on this form of agility.
3.5.2.  Resulting Considerations

Given these considerations, the proper choice for a TTL depends in
part on multiple external factors -- no single recommendation is
appropriate for all scenarios.  Organizations must weigh these trade-
offs and find a good balance for their situation.  Still, some
guidelines can be reached when choosing TTLs:

*  For general DNS zone owners, [Moura19b] recommends a longer TTL of
   at least one hour and ideally 4, 8, or 24 hours.  Assuming planned
   maintenance can be scheduled at least a day in advance, long TTLs
   have little cost and may even literally provide cost savings.
*  For TLD and other public registration operators (for example, most
   ccTLDs and .com, .net, and .org) that host many delegations (NS
   records, DS records, and "glue" records), [Moura19b] demonstrates
   that most resolvers will use the TTL values provided by the child
   delegations, while some others will choose the TTL provided by the
   parent's copy of the record.  As such, [Moura19b] recommends longer
   TTLs (at least an hour or more) for registry operators as well, for
   child NS and other records.
*  Users of DNS-based load balancing or DDoS-prevention services may
   require shorter TTLs: TTLs may even need to be as short as 5
   minutes, although 15 minutes may provide sufficient agility for
   many operators.  There is always a tussle between using shorter
   TTLs that provide more agility and using longer TTLs that include
   all the benefits listed above.
* Regarding the use of A/AAAA and NS records, the TTLs for A/AAAA records should be shorter than or equal to the TTL for the corresponding NS records for in-bailiwick authoritative DNS servers, since [Moura19b] finds that once an NS record expires, its associated A/AAAA records will also be requeried when glue is required to be sent by the parents. For out-of-bailiwick servers, A, AAAA, and NS records are usually all cached independently, so different TTLs can be used effectively if desired. In either case, short A and AAAA TTLs may still be desired if DDoS mitigation services are required. A short sketch of checking this relationship for a zone follows this list.
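For illustration only, the following sketch shows one way to check the A/AAAA-versus-NS TTL guidance above for a single zone. It assumes the dnspython library and uses "example.org." purely as a placeholder; it is not part of the methodology in [Moura19b], and TTLs observed through a recursive resolver may already be decremented by caching (query the authoritative servers directly to see the configured values).

   import dns.resolver  # provided by the dnspython package

   def check_address_vs_ns_ttl(zone: str) -> None:
       # Fetch the zone's NS set and remember its TTL.
       ns_answer = dns.resolver.resolve(zone, "NS")
       ns_ttl = ns_answer.rrset.ttl

       for ns in ns_answer:
           ns_name = ns.target.to_text()
           # Rough in-bailiwick test: the server name lies inside the zone.
           if not ns_name.rstrip(".").endswith(zone.rstrip(".")):
               continue  # out-of-bailiwick: cached independently anyway
           for rdtype in ("A", "AAAA"):
               try:
                   addr = dns.resolver.resolve(ns_name, rdtype)
               except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                   continue
               if addr.rrset.ttl > ns_ttl:
                   print(f"{ns_name} {rdtype} TTL {addr.rrset.ttl} "
                         f"exceeds NS TTL {ns_ttl} for {zone}")

   check_address_vs_ns_ttl("example.org.")  # placeholder zone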
3.6. C6: Consider the Difference in Parent and Children's TTL Values

3.6.1. Research Background
Multiple record types exist or are related between the parent of a zone and the child. At a minimum, NS records are supposed to be identical in the parent (but often are not), as are the corresponding IP addresses in "glue" A/AAAA records that must exist for in-bailiwick authoritative servers. Additionally, if DNSSEC [RFC4033] [RFC4034] [RFC4035] [RFC4509] is deployed for a zone, the parent's DS record must cryptographically refer to a child's DNSKEY record.
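As an illustration of the DS/DNSKEY relationship (a minimal sketch assuming the dnspython library; "example.org." is a placeholder for any signed zone), the parent-published DS set can be checked against digests computed from the child's DNSKEYs:

   import dns.dnssec
   import dns.name
   import dns.resolver  # all provided by the dnspython package

   ZONE = "example.org."  # placeholder; substitute the signed zone of interest

   parent_ds = dns.resolver.resolve(ZONE, "DS")       # published by the parent
   child_keys = dns.resolver.resolve(ZONE, "DNSKEY")  # published by the child

   zone_name = dns.name.from_text(ZONE)
   published = list(parent_ds.rrset)

   # A DS record is a digest of one of the child's DNSKEYs.  SHA-256 is the
   # most common digest type; a thorough check would also try SHA-384.
   matched = any(
       dns.dnssec.make_ds(zone_name, key, "SHA256") in published
       for key in child_keys
   )
   print(f"{ZONE}: parent DS matches a child DNSKEY: {matched}")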
Because some information exists in both the parent and a child, it is possible for the TTL values to differ between the parent's copy and the child's. [Moura19b] examines resolver behaviors when these values differ in the wild, as they frequently do -- often, parent zones have de facto TTL values that a child has no control over. For example, NS records for TLDs in the root zone are all set to 2 days (48 hours), but some TLDs have lower values within their published records (the TTLs for .cl's NS records from their authoritative servers are 1 hour). [Moura19b] also examines the differences in the TTLs between the NS records and the corresponding A/AAAA records for the addresses of a name server. RIPE Atlas nodes are used to determine what resolvers in the wild do with different information and whether the parent's TTL is used for cache lifetimes ("parent-centric") or the child's ("child-centric").
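This parent/child gap is easy to observe directly. The sketch below (an illustration built on the dnspython library, not a tool from [Moura19b]) asks a root server and one of the child's own name servers for the same NS set; for .cl this typically shows the root's 2-day delegation TTL next to the child's 1-hour value. The root-server address (a.root-servers.net) is the only hard-coded value; the child's server is discovered at run time.

   import dns.message
   import dns.query
   import dns.rdatatype
   import dns.resolver  # all provided by the dnspython package

   PARENT_SERVER = "198.41.0.4"  # a.root-servers.net; the root is the TLD parent
   ZONE = "cl."                  # the TLD used as an example in the text

   def ns_ttl_from(server_ip, zone):
       # A parent answers with a referral (NS set in the AUTHORITY section),
       # while the child answers authoritatively (NS set in the ANSWER
       # section), so both sections are inspected.  TCP avoids truncation.
       query = dns.message.make_query(zone, dns.rdatatype.NS)
       response = dns.query.tcp(query, server_ip, timeout=5)
       for section in (response.answer, response.authority):
           for rrset in section:
               if rrset.rdtype == dns.rdatatype.NS:
                   return rrset.ttl
       return None

   # Find one of the child's own name servers via the local resolver.
   child_ns = dns.resolver.resolve(ZONE, "NS")[0].target.to_text()
   child_ip = dns.resolver.resolve(child_ns, "A")[0].address

   print(f"parent NS TTL for {ZONE}: {ns_ttl_from(PARENT_SERVER, ZONE)}s")
   print(f"child  NS TTL for {ZONE}: {ns_ttl_from(child_ip, ZONE)}s")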
[Moura19b] found that roughly 90% of resolvers follow the child's view of the TTL, while 10% appear parent-centric. Additionally, it found that resolvers behave differently for cache lifetimes for in-bailiwick vs. out-of-bailiwick NS/A/AAAA TTL combinations. Specifically, when NS TTLs are shorter than the corresponding address records, most resolvers will requery for A/AAAA records for the in-bailiwick name servers and switch to new address records even if the cache indicates the original A/AAAA records could be kept longer. On the other hand, the inverse is true for out-of-bailiwick name servers: if the NS record expires first, resolvers will honor the original cache time of the name server's address.
3.6.2. Resulting Considerations

The important conclusion from this study is that operators cannot depend on their published TTL values alone -- the parent's values are also used for timing cache entries in the wild. Operators that are planning on infrastructure changes should assume that the older infrastructure must be left on and operational for at least the maximum of both the parent and child's TTLs.
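As a toy numerical illustration (values taken from the .cl example in Section 3.6.1, not from new measurements), the minimum time the retired servers must remain reachable is simply the larger of the two TTLs:

   # Example figures from Section 3.6.1: the root zone delegates TLDs with a
   # 2-day NS TTL, while the child in this example publishes a 1-hour NS TTL.
   PARENT_NS_TTL = 172_800  # seconds (2 days)
   CHILD_NS_TTL = 3_600     # seconds (1 hour)

   # Resolvers may honor either copy, so plan around the one expiring last.
   keep_old_servers_for = max(PARENT_NS_TTL, CHILD_NS_TTL)
   print(f"keep the old infrastructure reachable for at least "
         f"{keep_old_servers_for} seconds")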
4. Security Considerations

This document discusses applying measured research results to operational deployments. Most of the considerations affect operational practice, though a few have security-related impacts. Specifically, C4 discusses a couple of strategies to employ when a service is under stress from DDoS attacks and offers operators additional guidance when handling excess traffic.

Similarly, C5 identifies the trade-offs with respect to the operational and security benefits of using longer TTL values.
5. Privacy Considerations

This document does not add any new, practical privacy issues, aside from possible benefits in deploying longer TTLs as suggested in C5. Longer TTLs may help preserve a user's privacy by reducing the number of requests that get transmitted in both the client-to-resolver and resolver-to-authoritative cases.
6. IANA Considerations

This document has no IANA actions.

7. References
7.1. Normative References

[RFC1034]  Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, <https://www.rfc-editor.org/info/rfc1034>.

[RFC1035]  Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, November 1987, <https://www.rfc-editor.org/info/rfc1035>.
[RFC1546]  Partridge, C., Mendez, T., and W. Milliken, "Host Anycasting Service", RFC 1546, DOI 10.17487/RFC1546, November 1993, <https://www.rfc-editor.org/info/rfc1546>.
[RFC7094]  McPherson, D., Oran, D., Thaler, D., and E. Osterweil, "Architectural Considerations of IP Anycast", RFC 7094, DOI 10.17487/RFC7094, January 2014, <https://www.rfc-editor.org/info/rfc7094>.

[RFC8499]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499, January 2019, <https://www.rfc-editor.org/info/rfc8499>.
[RFC8783]  Boucadair, M., Ed. and T. Reddy.K, Ed., "Distributed Denial-of-Service Open Threat Signaling (DOTS) Data Channel Specification", RFC 8783, DOI 10.17487/RFC8783, May 2020, <https://www.rfc-editor.org/info/rfc8783>.

[RFC8955]  Loibl, C., Hares, S., Raszuk, R., McPherson, D., and M. Bacher, "Dissemination of Flow Specification Rules", RFC 8955, DOI 10.17487/RFC8955, December 2020, <https://www.rfc-editor.org/info/rfc8955>.
[RFC9132]  Boucadair, M., Ed., Shallow, J., and T. Reddy.K, "Distributed Denial-of-Service Open Threat Signaling (DOTS) Signal Channel Specification", RFC 9132, DOI 10.17487/RFC9132, September 2021, <https://www.rfc-editor.org/info/rfc9132>.

7.2. Informative References
[AnyBest]  Woodcock, B., "Best Practices in DNS Service-Provision Architecture", Version 1.2, March 2016, <https://meetings.icann.org/en/marrakech55/schedule/mon-tech/presentation-dns-service-provision-07mar16-en.pdf>.
[AnyFRoot] Woolf, S., "Anycasting f.root-servers.net", January 2003, <https://archive.nanog.org/meetings/nanog27/presentations/suzanne.pdf>.
[AnyTest]  Tangled, "Tangled Anycast Testbed", <http://www.anycast-testbed.com/>.
[Ditl17]   DNS-OARC, "2017 DITL Data", April 2017, <https://www.dns-oarc.net/oarc/data/ditl/2017>.
[IcannHedgehog]
           "hedgehog", commit b136eb0, May 2021, <https://github.com/dns-stats/hedgehog>.
[Jung03a]  Jung, J., Berger, A., and H. Balakrishnan, "Modeling TTL-based Internet Caches", ACM 2003 IEEE INFOCOM, DOI 10.1109/INFCOM.2003.1208693, July 2003, <http://www.ieee-infocom.org/2003/papers/11_01.PDF>.
[Moura16b] Moura, G.C.M., Schmidt, R. de O., Heidemann, J., de Vries, W., Müller, M., Wei, L., and C. Hesselman, "Anycast vs. DDoS: Evaluating the November 2015 Root DNS Event", ACM 2016 Internet Measurement Conference, DOI 10.1145/2987443.2987446, November 2016, <https://www.isi.edu/~johnh/PAPERS/Moura16b.pdf>.
[Moura18b] Moura, G.C.M., Heidemann, J., Müller, M., Schmidt, R. de O., and M. Davids, "When the Dike Breaks: Dissecting DNS Defenses During DDoS", ACM 2018 Internet Measurement Conference, DOI 10.1145/3278532.3278534, October 2018, <https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf>.
[Moura19b] Moura, G.C.M., Hardaker, W., Heidemann, J., and R. de O. Schmidt, "Cache Me If You Can: Effects of DNS Time-to-Live", ACM 2019 Internet Measurement Conference, DOI 10.1145/3355369.3355568, October 2019, <https://www.isi.edu/~hardaker/papers/2019-10-cache-me-ttls.pdf>.
[Mueller17b]
           Müller, M., Moura, G.C.M., Schmidt, R. de O., and J. Heidemann, "Recursives in the Wild: Engineering Authoritative DNS Servers", ACM 2017 Internet Measurement Conference, DOI 10.1145/3131365.3131366, November 2017, <https://www.isi.edu/%7ejohnh/PAPERS/Mueller17b.pdf>.
[Perlroth16]
           Perlroth, N., "Hackers Used New Weapons to Disrupt Major Websites Across U.S.", October 2016, <https://www.nytimes.com/2016/10/22/business/internet-problems-attack.html>.
[RFC4033]  Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, "DNS Security Introduction and Requirements", RFC 4033, DOI 10.17487/RFC4033, March 2005, <https://www.rfc-editor.org/info/rfc4033>.

[RFC4034]  Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, "Resource Records for the DNS Security Extensions", RFC 4034, DOI 10.17487/RFC4034, March 2005, <https://www.rfc-editor.org/info/rfc4034>.

[RFC4035]  Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, "Protocol Modifications for the DNS Security Extensions", RFC 4035, DOI 10.17487/RFC4035, March 2005, <https://www.rfc-editor.org/info/rfc4035>.
[RFC4509]  Hardaker, W., "Use of SHA-256 in DNSSEC Delegation Signer (DS) Resource Records (RRs)", RFC 4509, DOI 10.17487/RFC4509, May 2006, <https://www.rfc-editor.org/info/rfc4509>.
[RFC8811]  Mortensen, A., Ed., Reddy.K, T., Ed., Andreasen, F., Teague, N., and R. Compton, "DDoS Open Threat Signaling (DOTS) Architecture", RFC 8811, DOI 10.17487/RFC8811, August 2020, <https://www.rfc-editor.org/info/rfc8811>.
[RipeAtlas15a]
           RIPE Network Coordination Centre (RIPE NCC), "RIPE Atlas: A Global Internet Measurement Network", October 2015, <http://ipj.dreamhosters.com/wp-content/uploads/issues/2015/ipj18-3.pdf>.
[RipeAtlas19a]
           RIPE Network Coordination Centre (RIPE NCC), "RIPE Atlas", <https://atlas.ripe.net>.
[Schmidt17a]
           Schmidt, R. de O., Heidemann, J., and J. Kuipers, "Anycast Latency: How Many Sites Are Enough?", PAM 2017 Passive and Active Measurement Conference, DOI 10.1007/978-3-319-54328-4_14, March 2017, <https://www.isi.edu/%7ejohnh/PAPERS/Schmidt17a.pdf>.
[Singla2014]
           Singla, A., Chandrasekaran, B., Godfrey, P., and B. Maggs, "The Internet at the Speed of Light", 13th ACM Workshop on Hot Topics in Networks, DOI 10.1145/2670518.2673876, October 2014, <http://speedierweb.web.engr.illinois.edu/cspeed/papers/hotnets14.pdf>.
[VerfSrc]  "Verfploeter Source Code", commit f4792dc, May 2019, <https://github.com/Woutifier/verfploeter>.
[Vries17b] de Vries, W., Schmidt, R. de O., Hardaker, W., Heidemann, J., de Boer, P-T., and A. Pras, "Broad and Load-Aware Anycast Mapping with Verfploeter", ACM 2017 Internet Measurement Conference, DOI 10.1145/3131365.3131371, November 2017, <https://www.isi.edu/%7ejohnh/PAPERS/Vries17b.pdf>.
Acknowledgements
We would like to thank the reviewers of this document who offered
valuable suggestions as well as comments at the IETF DNSOP session
(IETF 104): Duane Wessels, Joe Abley, Toema Gavrichenkov, John
Levine, Michael StJohns, Kristof Tuyteleers, Stefan Ubbink, Klaus
Darilion, and Samir Jafferali.
Additionally, we would like to thank those acknowledged in the papers
this document summarizes for helping produce the results: RIPE NCC
and DNS OARC for their tools and datasets used in this research, as
well as the funding agencies sponsoring the individual research.
Contributors
This document is a summary of the main considerations of six research
papers written by the authors and the following people who
contributed substantially to the content and should be considered
coauthors; this document would not have been possible without their
hard work:
* Ricardo de O. Schmidt
* Wouter B. de Vries
* Moritz Mueller
* Lan Wei
* Cristian Hesselman
* Jan Harm Kuipers
* Pieter-Tjerk de Boer
* Aiko Pras
Authors' Addresses

Giovane C. M. Moura
SIDN Labs/TU Delft
Meander 501
6825 MD Arnhem
Netherlands

Phone: +31 26 352 5500
Email: giovane.moura@sidn.nl

Wes Hardaker
USC/Information Sciences Institute
PO Box 382
Davis, CA 95617-0382
United States of America

Phone: +1 (530) 404-0099
Email: ietf@hardakers.net

John Heidemann
USC/Information Sciences Institute
4676 Admiralty Way
Marina Del Rey, CA 90292-6695
United States of America

Phone: +1 (310) 448-8708
Email: johnh@isi.edu

Marco Davids
SIDN Labs
Meander 501
6825 MD Arnhem
Netherlands

Phone: +31 26 352 5500
Email: marco.davids@sidn.nl