rfc9520.original   rfc9520.txt 
Internet Engineering Task Force D. Wessels Internet Engineering Task Force (IETF) D. Wessels
Internet-Draft W. Carroll Request for Comments: 9520 W. Carroll
Updates: 2308, 4035, 4697 (if approved) M. Thomas Updates: 2308, 4035, 4697 M. Thomas
Intended status: Standards Track Verisign Category: Standards Track Verisign
Expires: 24 March 2024 21 September 2023 ISSN: 2070-1721 December 2023
Negative Caching of DNS Resolution Failures Negative Caching of DNS Resolution Failures
draft-ietf-dnsop-caching-resolution-failures-08
Abstract Abstract
In the DNS, resolvers employ caching to reduce both latency for end In the DNS, resolvers employ caching to reduce both latency for end
users and load on authoritative name servers. The process of users and load on authoritative name servers. The process of
resolution may result in one of three types of responses: (1) a resolution may result in one of three types of responses: (1) a
response containing the requested data; (2) a response indicating the response containing the requested data, (2) a response indicating the
requested data does not exist; or (3) a non-response due to a requested data does not exist, or (3) a non-response due to a
resolution failure in which the resolver does not receive any useful resolution failure in which the resolver does not receive any useful
information regarding the data's existence. This document concerns information regarding the data's existence. This document concerns
itself only with the third type. itself only with the third type.
RFC 2308 specifies requirements for DNS negative caching. There, RFC 2308 specifies requirements for DNS negative caching. There,
caching of type (2) responses is mandatory and caching of type (3) caching of TYPE 2 responses is mandatory and caching of TYPE 3
responses is optional. This document updates RFC 2308 to require responses is optional. This document updates RFC 2308 to require
negative caching for DNS resolution failures. negative caching for DNS resolution failures.
RFC 4035 allows DNSSEC validation failure caching. This document RFC 4035 allows DNSSEC validation failure caching. This document
updates RFC 4035 to require caching for DNSSEC validation failures. updates RFC 4035 to require caching for DNSSEC validation failures.
RFC 4697 prohibits aggressive requerying for NS records at a failed RFC 4697 prohibits aggressive requerying for NS records at a failed
zone's parent zone. This document updates RFC 4697 to expand this zone's parent zone. This document updates RFC 4697 to expand this
requirement to all query types and to all ancestor zones. requirement to all query types and to all ancestor zones.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering This document is a product of the Internet Engineering Task Force
Task Force (IETF). Note that other groups may also distribute (IETF). It represents the consensus of the IETF community. It has
working documents as Internet-Drafts. The list of current Internet- received public review and has been approved for publication by the
Drafts is at https://datatracker.ietf.org/drafts/current/. Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
Internet-Drafts are draft documents valid for a maximum of six months Information about the current status of this document, any errata,
and may be updated, replaced, or obsoleted by other documents at any and how to provide feedback on it may be obtained at
time. It is inappropriate to use Internet-Drafts as reference https://www.rfc-editor.org/info/rfc9520.
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 24 March 2024.
Copyright Notice Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Motivation
1.2. Related Work . . . . . . . . . . . . . . . . . . . . . . 5 1.2. Related Work
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.3. Terminology
2. Conditions That Lead to DNS Resolution Failures . . . . . . . 6 2. Conditions That Lead to DNS Resolution Failures
2.1. SERVFAIL Responses . . . . . . . . . . . . . . . . . . . 7 2.1. SERVFAIL Responses
2.2. REFUSED Responses . . . . . . . . . . . . . . . . . . . . 7 2.2. REFUSED Responses
2.3. Timeouts and Unreachable Servers . . . . . . . . . . . . 8 2.3. Timeouts and Unreachable Servers
2.4. Delegation Loops . . . . . . . . . . . . . . . . . . . . 8 2.4. Delegation Loops
2.5. Alias Loops . . . . . . . . . . . . . . . . . . . . . . . 9 2.5. Alias Loops
2.6. DNSSEC Validation Failures . . . . . . . . . . . . . . . 9 2.6. DNSSEC Validation Failures
2.7. FORMERR Responses . . . . . . . . . . . . . . . . . . . . 9 2.7. FORMERR Responses
3. Requirements for Caching DNS Resolution Failures . . . . . . 10 3. Requirements for Caching DNS Resolution Failures
3.1. Retries and Timeouts . . . . . . . . . . . . . . . . . . 10 3.1. Retries and Timeouts
3.2. Caching . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2. Caching
3.3. Requerying Delegation Information . . . . . . . . . . . . 11 3.3. Requerying Delegation Information
3.4. DNSSEC Validation Failures . . . . . . . . . . . . . . . 11 3.4. DNSSEC Validation Failures
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 4. IANA Considerations
5. Security Considerations . . . . . . . . . . . . . . . . . . . 12 5. Security Considerations
6. Privacy Considerations . . . . . . . . . . . . . . . . . . . 12 6. Privacy Considerations
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 12 7. References
8. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . 12 7.1. Normative References
9. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 7.2. Informative References
9.1. BIND . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Acknowledgments
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 Authors' Addresses
10.1. Normative References . . . . . . . . . . . . . . . . . . 16
10.2. Informative References . . . . . . . . . . . . . . . . . 17
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19
1. Introduction 1. Introduction
Caching has always been a fundamental component of DNS resolution on Caching has always been a fundamental component of DNS resolution on
the Internet. For example [RFC0882] states: the Internet. For example, [RFC0882] states:
"The sheer size of the database and frequency of updates suggest that | The sheer size of the database and frequency of updates suggest
it must be maintained in a distributed manner, with local caching to | that it must be maintained in a distributed manner, with local
improve performance." | caching to improve performance.
The early DNS RFCs ([RFC0882], [RFC0883], [RFC1034], and [RFC1035]) The early DNS RFCs ([RFC0882], [RFC0883], [RFC1034], and [RFC1035])
primarily discuss caching in the context of what [RFC2308] calls primarily discuss caching in the context of what [RFC2308] calls
"positive" responses, that is, when the response includes the "positive responses", that is, when the response includes the
requested data. In this case, a TTL is associated with each resource requested data. In this case, a TTL is associated with each Resource
record in the response. Resolvers can cache and reuse the data until Record (RR) in the response. Resolvers can cache and reuse the data
the TTL expires. until the TTL expires.
Section 4.3.4 of [RFC1034] describes negative response caching, but Section 4.3.4 of [RFC1034] describes negative response caching, but
notes it is optional and only talks about name errors (NXDOMAIN). notes it is optional and only talks about name errors (NXDOMAIN).
This is the origin of using the SOA MINIMUM field as a negative This is the origin of using the SOA MINIMUM field as a negative
caching TTL. caching TTL.
[RFC2308] updated [RFC1034] to specify new requirements for DNS [RFC2308] updated [RFC1034] to specify new requirements for DNS
negative caching, including making it mandatory for caching resolvers negative caching, including making it mandatory for caching resolvers
to cache name error (NXDOMAIN) and no data (NODATA) responses when a to cache name error (NXDOMAIN) and no data (NODATA) responses when an
SOA record is available to provide a TTL. [RFC2308] further SOA record is available to provide a TTL. [RFC2308] further
specified optional negative caching for two DNS resolution failure specified optional negative caching for two DNS resolution failure
cases: server failure and dead / unreachable servers. cases: server failure and dead/unreachable servers.
This document updates [RFC2308] to require negative caching of all This document updates [RFC2308] to require negative caching of all
DNS resolution failures and provides additional examples of DNS resolution failures and provides additional examples of
resolution failures. This document also updates [RFC4035] to require resolution failures, [RFC4035] to require caching for DNSSEC
caching for DNSSEC validation failures as well as [RFC4697] to expand validation failures, as well as [RFC4697] to expand the scope of
the scope of prohibiting aggressive requerying for NS records at a prohibiting aggressive requerying for NS records at a failed zone's
failed zone's parent zone to all query types and to all ancestor parent zone to all query types and to all ancestor zones.
zones.
1.1. Motivation 1.1. Motivation
Operators of DNS services have known for some time that recursive Operators of DNS services have known for some time that recursive
resolvers become more aggressive when they experience resolution resolvers become more aggressive when they experience resolution
failures. A number of different anecdotes, experiments, and failures. A number of different anecdotes, experiments, and
incidents support this claim. incidents support this claim.
In December 2009, a secondary server for a number of in-addr.arpa In December 2009, a secondary server for a number of in-addr.arpa
subdomains saw its traffic suddenly double, and queries of type subdomains saw its traffic suddenly double, and queries of type
DNSKEY in particular increase by approximately two orders of DNSKEY in particular increase by approximately two orders of
magnitude, coinciding with a DNSSEC key rollover by the zone operator magnitude, coinciding with a DNSSEC key rollover by the zone operator
[roll-over-and-die]. This predated a signed root zone and an [DNSSEC-ROLLOVER]. This predated a signed root zone, and an
operating system vendor was providing non-root trust anchors to the operating system vendor was providing non-root trust anchors to the
recursive resolver, which became out of date following the rollover. recursive resolver, which became out of date following the rollover.
Unable to validate responses for the affected in-addr.arpa zones, Unable to validate responses for the affected in-addr.arpa zones,
recursive resolvers aggressively retried their queries. recursive resolvers aggressively retried their queries.
In 2016, the internet infrastructure company Dyn experienced a large In 2016, the Internet infrastructure company Dyn experienced a large
attack that impacted many high-profile customers. As documented in a attack that impacted many high-profile customers. As documented in a
technical presentation detailing the attack [dyn-attack], Dyn staff technical presentation detailing the attack (see [RETRY-STORM]), Dyn
wrote: "At this point we are now experiencing botnet attack traffic staff wrote:
and what is best classified as a 'retry storm'. Looking at certain
large recursive platforms > 10x normal volume."
In 2018 the root zone key signing key (KSK) was rolled over | At this point we are now experiencing botnet attack traffic and
[root-ksk-roll]. Throughout the rollover period, the root servers | what is best classified as a "retry storm"
|
| Looking at certain large recursive platforms > 10x normal volume
In 2018, the root zone Key Signing Key (KSK) was rolled over
[KSK-ROLLOVER]. Throughout the rollover period, the root servers
experienced a significant increase in DNSKEY queries. Before the experienced a significant increase in DNSKEY queries. Before the
rollover, a.root-servers.net and j.root-servers.net together received rollover, a.root-servers.net and j.root-servers.net together received
about 15 million DNSKEY queries per day. At the end of the about 15 million DNSKEY queries per day. At the end of the
revocation period, they received 1.2 billion per day -- an 80x revocation period, they received 1.2 billion per day: an 80x
increase. Removal of the revoked key from the zone caused DNSKEY increase. Removal of the revoked key from the zone caused DNSKEY
queries to drop to post-rollover but pre-revoke levels, indicating queries to drop to post-rollover but pre-revoke levels, indicating
there is still a population of recursive resolvers using the previous there is still a population of recursive resolvers using the previous
root trust anchor and aggressively retrying DNSKEY queries. root trust anchor and aggressively retrying DNSKEY queries.
In 2021, Verisign researchers used botnet query traffic to In 2021, Verisign researchers used botnet query traffic to
demonstrate that certain large, public recursive DNS services exhibit demonstrate that certain large public recursive DNS services exhibit
very high query rates when all authoritative name servers for a zone very high query rates when all authoritative name servers for a zone
return REFUSED or SERVFAIL [botnet]. When the authoritative servers return refused (REFUSED) or server failure (SERVFAIL) responses (see
were configured normally, query rates for a single botnet domain [BOTNET]). When the authoritative servers were configured normally,
averaged approximately 50 queries per second. However, with the query rates for a single botnet domain averaged approximately 50
servers configured to return SERVFAIL, the query rate increased to queries per second. However, with the servers configured to return
60,000 per second. Furthermore, increases were also observed at the SERVFAIL, the query rate increased to 60,000 per second.
Root and TLD levels, even though delegations at those levels were Furthermore, increases were also observed at the root and Top-Level
Domain (TLD) levels, even though delegations at those levels were
unchanged and continued operating normally. unchanged and continued operating normally.
Later that same year, on October 4, Facebook experienced a widespread Later that same year, on October 4, Facebook experienced a widespread
and well-publicized outage [fb-outage]. During the 6-hour outage, and well-publicized outage [FB-OUTAGE]. During the 6-hour outage,
none of Facebook's authoritative name servers were reachable and did none of Facebook's authoritative name servers were reachable and did
not respond to queries. Recursive name servers attempting to resolve not respond to queries. Recursive name servers attempting to resolve
Facebook domains experienced timeouts. During this time, query Facebook domains experienced timeouts. During this time, query
traffic on the .COM/.NET infrastructure increased from 7,000 to traffic on the .COM/.NET infrastructure increased from 7,000 to
900,000 queries per second [fb-outage-verisign]. 900,000 queries per second [OUTAGE-RESOLVER].
1.2. Related Work 1.2. Related Work
[RFC2308] describes negative caching for four types of DNS queries [RFC2308] describes negative caching for four types of DNS queries
and responses: Name errors, no data, server failures, and dead / and responses: name errors, no data, server failures, and dead/
unreachable servers. It places the strongest requirements on unreachable servers. It places the strongest requirements on
negative caching for name errors and no data responses, while server negative caching for name errors and no data responses, while server
failures and dead servers are left as optional. failures and dead servers are left as optional.
[RFC4697] is a Best Current Practice that documents observed [RFC4697] is a Best Current Practice that documents observed
resolution misbehaviors. It describes a number of situations that resolution misbehaviors. It describes a number of situations that
can lead to excessive queries from recursive resolvers, including: can lead to excessive queries from recursive resolvers, including
requerying for delegation data, lame servers, responses blocked by requerying for delegation data, lame servers, responses blocked by
firewalls, and records with zero TTL. [RFC4697] makes a number of firewalls, and records with zero TTL. [RFC4697] makes a number of
recommendations, varying from "SHOULD" to "MUST." recommendations, varying from "SHOULD" to "MUST".
An expired Internet-Draft describes "The DNS thundering herd problem" [THUNDERING-HERD] describes "The DNS thundering herd problem" as a
[thundering-herd] as a situation arising when cached data expires at situation arising when cached data expires at the same time for a
the same time for a large number of users. Although that document is large number of users. Although that document is not focused on
not focused on negative caching, it does describe the benefits of negative caching, it does describe the benefits of combining multiple
combining multiple, identical queries to upstream name servers. That identical queries to upstream name servers. That is, when a
is, when a recursive resolver receives multiple queries for the same recursive resolver receives multiple queries for the same name,
name, class, and type that cannot be answered from cached data, it class, and type that cannot be answered from cached data, it should
should combine or join them into a single upstream query, rather than combine or join them into a single upstream query rather than emit
emit repeated, identical upstream queries. repeated identical upstream queries.
[RFC5452], "Measures for Making DNS More Resilient against Forged [RFC5452], "Measures for Making DNS More Resilient against Forged
Answers," includes a section that describes the phenomenon known as Answers", includes a section that describes the phenomenon known as
birthday attacks. Here, again, the problem arises when a recursive "Birthday Attacks". Here, again, the problem arises when a recursive
resolver emits multiple, identical upstream queries. Multiple resolver emits multiple identical upstream queries. Multiple
outstanding queries makes it easier for an attacker to guess and outstanding queries make it easier for an attacker to guess and
correctly match some of the DNS message parameters, such as the port correctly match some of the DNS message parameters, such as the port
number and ID field. This situation is further exacerbated in the number and ID field. This situation is further exacerbated in the
case of timeout-based resolution failures. DNSSEC, of course, is a case of timeout-based resolution failures. Of course, DNSSEC is a
suitable defense to spoofing attacks. suitable defense to spoofing attacks.
[RFC8767] describes "Serving Stale Data to Improve DNS Resiliency." [RFC8767] describes "Serving Stale Data to Improve DNS Resiliency".
This permits a recursive resolver to return possibly stale data when This permits a recursive resolver to return possibly stale data when
it is unable to refresh cached, expired data. It introduces the idea it is unable to refresh cached, expired data. It introduces the idea
of a failure recheck timer and says: "Attempts to refresh from non- of a failure recheck timer and says:
responsive or otherwise failing authoritative nameservers are
recommended to be done no more frequently than every 30 seconds." | Attempts to refresh from non-responsive or otherwise failing
| authoritative nameservers are recommended to be done no more
| frequently than every 30 seconds.
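The failure recheck timer quoted above can be pictured with a minimal Python sketch. It is an illustration only, not text from [RFC8767] or from this document: the class name FailureRecheckTimer, its methods, and the injectable clock are hypothetical, and only the 30-second value comes from the quotation.

```python
import time

# Interval taken from the RFC 8767 quotation above.
FAILURE_RECHECK_SECONDS = 30.0


class FailureRecheckTimer:
    """Hypothetical helper that spaces out refresh attempts to failing servers."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the timer can be tested
        self._last_failure = {}      # server -> time of the last failed refresh

    def record_failure(self, server):
        """Remember when a refresh attempt to this server last failed."""
        self._last_failure[server] = self._clock()

    def may_refresh(self, server):
        """True if no failure is recorded or the recheck interval has elapsed."""
        last = self._last_failure.get(server)
        return last is None or self._clock() - last >= FAILURE_RECHECK_SECONDS
```

A resolver built along these lines would call may_refresh() before contacting a server that recently failed and would continue to answer from its cache while the call returns False.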
1.3. Terminology 1.3. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in
14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
* DNS Transport: In this document, DNS transport means a protocol DNS transport: In this document, "DNS transport" means a protocol
used to transport DNS messages between a client and a server. used to transport DNS messages between a client and a server.
This includes "classic DNS" transports, i.e., DNS-over-UDP and This includes "classic DNS" transports, i.e., DNS-over-UDP and
DNS-over-TCP [RFC1034] [RFC7766], as well as newer encrypted DNS DNS-over-TCP [RFC1034] [RFC7766], as well as newer encrypted DNS
transports such as DNS-over-TLS [RFC7858], DNS-over-HTTPS transports, such as DNS-over-TLS [RFC7858], DNS-over-HTTPS
[RFC8484], DNS-over-QUIC [RFC9250], and similar communication of [RFC8484], DNS-over-QUIC [RFC9250], and similar communication of
DNS messages using other protocols. NOTE: at the time of this DNS messages using other protocols. Note: at the time of writing,
writing not all DNS transports are standardized for all types of not all DNS transports are standardized for all types of servers
servers, but may become standardized in the future. but may become standardized in the future.
2. Conditions That Lead to DNS Resolution Failures 2. Conditions That Lead to DNS Resolution Failures
A DNS resolution failure occurs when none of the servers available to A DNS resolution failure occurs when none of the servers available to
a resolver client provide any useful response data for a particular a resolver client provide any useful response data for a particular
query name, type, and class. A response is considered useful when it query name, type, and class. A response is considered useful when it
provides either the requested data, a referral to a descendant zone, provides either the requested data, a referral to a descendant zone,
or an indication that no data exists at the given name. or an indication that no data exists at the given name.
It is common for resolvers to have multiple servers from which to It is common for resolvers to have multiple servers from which to
choose for a particular query. For example, in the case of stub-to- choose for a particular query. For example, in the case of stub-to-
recursive, the stub resolver may be configured with multiple recursive, the stub resolver may be configured with multiple
recursive resolver addresses. In the case of recursive-to- recursive resolver addresses. In the case of recursive-to-
authoritative, a given zone usually has more than one name server (NS authoritative, a given zone usually has more than one name server (NS
record), each of which can have multiple IP addresses and multiple record), each of which can have multiple IP addresses and multiple
DNS transports. DNS transports.
Nothing in this document prevents a resolver from retrying a query at Nothing in this document prevents a resolver from retrying a query at
a different server, or the same server over a different DNS a different server or the same server over a different DNS transport.
transport. In the case of timeouts, a resolver can retry the same In the case of timeouts, a resolver can retry the same server and DNS
server and DNS transport a limited number of times. transport a limited number of times.
If any one of the available servers provides a useful response, then If any one of the available servers provides a useful response, then
it is not considered a resolution failure. However, if none of the it is not considered a resolution failure. However, if none of the
servers for a given query tuple <name, type, class> provide a useful servers for a given query tuple <name, type, class> provide a useful
response, the result is a resolution failure. response, the result is a resolution failure.
Note that NXDOMAIN and NOERROR/NODATA responses are not conditions Note that NXDOMAIN and NOERROR/NODATA responses are not conditions
for resolution failure. In these cases, the server is providing a for resolution failure. In these cases, the server is providing a
useful response, either indicating that a name does not exist, or useful response, indicating either that a name does not exist or that
that no data of the requested type exists at the name. These no data of the requested type exists at the name. These negative
negative responses can be cached as described in [RFC2308]. responses can be cached as described in [RFC2308].
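As an illustration of the definition above, the following Python sketch classifies the outcomes of querying every available server for one <name, type, class> tuple. It is a sketch under stated assumptions: the Outcome labels and the function is_resolution_failure are hypothetical names chosen for this example, and the list of failing outcomes is not exhaustive.

```python
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical labels for what a single server attempt produced."""
    ANSWER = auto()      # response contains the requested data
    REFERRAL = auto()    # referral to a descendant zone
    NXDOMAIN = auto()    # the name does not exist
    NODATA = auto()      # NOERROR, but no data of the requested type
    SERVFAIL = auto()
    REFUSED = auto()
    TIMEOUT = auto()     # no response, or the server was unreachable


# Outcomes that the text above treats as useful responses.
USEFUL = {Outcome.ANSWER, Outcome.REFERRAL, Outcome.NXDOMAIN, Outcome.NODATA}


def is_resolution_failure(outcomes):
    """True when none of the attempted servers gave a useful response."""
    return not any(outcome in USEFUL for outcome in outcomes)
```

For example, is_resolution_failure([Outcome.SERVFAIL, Outcome.TIMEOUT]) is True, while a list that contains Outcome.NXDOMAIN is not a resolution failure because one server supplied a useful negative response.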
The remainder of this section describes a number of different The remainder of this section describes a number of different
conditions that can lead to resolution failure. This section is not conditions that can lead to resolution failure. This section is not
exhaustive. Additional conditions may be expected to cause similar exhaustive. Additional conditions may be expected to cause similar
resolution failures. resolution failures.
2.1. SERVFAIL Responses 2.1. SERVFAIL Responses
Server failure is defined in [RFC1035] as "The name server was unable Server failure is defined in [RFC1035] as: "The name server was
to process this query due to a problem with the name server." A unable to process this query due to a problem with the name server."
server failure is signaled by setting the RCODE field to SERVFAIL. A server failure is signaled by setting the RCODE field to SERVFAIL.
Authoritative servers return SERVFAIL when they don't have any valid Authoritative servers return SERVFAIL when they don't have any valid
data for a zone. For example, a secondary server has been configured data for a zone. For example, a secondary server has been configured
to serve a particular zone, but is unable to retrieve or refresh the to serve a particular zone but is unable to retrieve or refresh the
zone data from the primary server. zone data from the primary server.
Recursive servers return SERVFAIL in response to a number of Recursive servers return SERVFAIL in response to a number of
different conditions, including many described below. different conditions, including many described below.
Although the extended DNS errors method exists "primarily to extend Although the extended DNS errors method exists "primarily to extend
SERVFAIL to provide additional information," it "does not change the SERVFAIL to provide additional information," it "does not change the
processing of RCODEs" [RFC8914]. This document operates at the level processing of RCODEs" [RFC8914]. This document operates at the level
of resolution failure and does not concern particular causes. of resolution failure and does not concern particular causes.
2.2. REFUSED Responses 2.2. REFUSED Responses
A name server returns a message with the RCODE field set to REFUSED A name server returns a message with the RCODE field set to REFUSED
when it refuses to process the query, e.g., for policy or other when it refuses to process the query, e.g., for policy or other
reasons [RFC1035]. reasons [RFC1035].
Authoritative servers generally return REFUSED when processing a Authoritative servers generally return REFUSED when processing a
query for which they are not authoritative. For example, a server query for which they are not authoritative. For example, a server
that is configured to be authoritative for only the example.net zone, that is configured to be authoritative for only the example.net zone
may return REFUSED in response to a query for example.com. may return REFUSED in response to a query for example.com.
Recursive servers generally return REFUSED for query sources that do Recursive servers generally return REFUSED for query sources that do
not match configured access control lists. For example, a server not match configured access control lists. For example, a server
that is configured to allow queries from only 2001:db8:1::/48 may that is configured to allow queries from only 2001:db8:1::/48 may
return REFUSED in response to a query from 2001:db8:5::1. return REFUSED in response to a query from 2001:db8:5::1.
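The access control example above can be checked with Python's standard ipaddress module. This is only a sketch: the names ALLOWED_SOURCES and acl_decision are hypothetical, and real servers express access control lists in their own configuration syntax.

```python
import ipaddress

# The prefix from the example above; queries from other sources are refused.
ALLOWED_SOURCES = ipaddress.ip_network("2001:db8:1::/48")


def acl_decision(source):
    """Return "REFUSED" for sources outside the allowed prefix, else "process"."""
    address = ipaddress.ip_address(source)
    return "process" if address in ALLOWED_SOURCES else "REFUSED"
```

Here acl_decision("2001:db8:5::1") yields "REFUSED" because the third 16-bit group (5) lies outside the /48, whereas an address such as 2001:db8:1::53 falls inside the prefix and is processed.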
2.3. Timeouts and Unreachable Servers 2.3. Timeouts and Unreachable Servers
A timeout occurs when a resolver fails to receive any response from a A timeout occurs when a resolver fails to receive any response from a
server within a reasonable amount of time. Additionally, a DNS server within a reasonable amount of time. Additionally, a DNS
transport may more quickly indicate lack of reachability in a way transport may more quickly indicate lack of reachability in a way
that wouldn't be considered a timeout. For example: an ICMP port that wouldn't be considered a timeout: for example, an ICMP port
unreachable message, a TCP "connection refused" error, or a TLS unreachable message, a TCP "connection refused" error, or a TLS
handshake failure. [RFC2308] refers to these conditions collectively handshake failure. [RFC2308] refers to these conditions collectively
as "dead / unreachable servers." as "dead / unreachable servers".
Note that resolver implementations may have two types of timeouts: a Note that resolver implementations may have two types of timeouts: a
smaller timeout which might trigger a query retry and a larger smaller timeout that might trigger a query retry and a larger timeout
timeout after which the server is considered unresponsive. after which the server is considered unresponsive. Section 3.1
Section 3.1 discusses the requirements for resolvers when retrying discusses the requirements for resolvers when retrying queries.
queries.
Timeouts can present a particular problem for negative caching, Timeouts can present a particular problem for negative caching,
depending on how the resolver handles multiple, outstanding queries depending on how the resolver handles multiple outstanding queries
for the same <query name, type, class> tuple. For example, consider for the same <query name, type, class> tuple. For example, consider
a very popular website in a zone whose name servers are all a very popular website in a zone whose name servers are all
unresponsive. A recursive resolver might receive tens or hundreds of unresponsive. A recursive resolver might receive tens or hundreds of
queries per second for the popular website. If the recursive server queries per second for that website. If the recursive server
implementation "joins" these outstanding queries together, then it implementation joins these outstanding queries together, then it only
only sends one recursive-to-authoritative query for the numerous sends one recursive-to-authoritative query for the numerous pending
pending stub-to-recursive queries. If, however, the implementation stub-to-recursive queries. However, if the implementation does not
does not join outstanding queries together, then it sends one join outstanding queries together, then it sends one recursive-to-
recursive-to-authoritative query for each stub-to-recursive query. authoritative query for each stub-to-recursive query. If the
If the incoming query rate is high and the timeout is large, this incoming query rate is high and the timeout is large, this might
might result in hundreds or thousands of recursive-to-authoritative result in hundreds or thousands of recursive-to-authoritative queries
queries while waiting for an authoritative server to time out. while waiting for an authoritative server to time out.
A recursive resolver that does not join outstanding queries together A recursive resolver that does not join outstanding queries together
is more susceptible to birthday attacks ([RFC5452] Section 5), is more susceptible to Birthday Attacks ([RFC5452], Section 5),
especially when those queries result in timeouts. especially when those queries result in timeouts.
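As a non-normative illustration of the "joining" behavior described
above, the following Python sketch shares one upstream query among
all pending callers for the same <query name, type, class> tuple.
The class name QueryJoiner and the send_upstream coroutine are
hypothetical; they stand in for whatever mechanism a real resolver
uses to contact authoritative servers.

```python
import asyncio


class QueryJoiner:
    """Share one upstream query among callers asking the same question."""

    def __init__(self, send_upstream):
        # send_upstream is a hypothetical coroutine: key -> answer.
        self._send_upstream = send_upstream
        self._in_flight = {}  # (qname, qtype, qclass) -> asyncio.Task

    async def resolve(self, qname, qtype, qclass="IN"):
        key = (qname.lower(), qtype, qclass)
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this tuple: start the only upstream query.
            task = asyncio.create_task(self._send_upstream(key))
            self._in_flight[key] = task
            # Forget the entry once the upstream query has finished.
            task.add_done_callback(
                lambda _t, k=key: self._in_flight.pop(k, None))
        # shield() keeps the shared query alive if one caller is cancelled.
        return await asyncio.shield(task)
```

Under this sketch, 100 stub queries per second arriving during a
single 10-second timeout would wait on one upstream query rather
than generating roughly 1000 of them.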
2.4. Delegation Loops 2.4. Delegation Loops
A delegation loop, or cycle, can occur when one domain utilizes name A delegation loop, or cycle, can occur when one domain utilizes name
servers in a second domain, and the second domain uses name servers servers in a second domain, and the second domain uses name servers
in the first. For example: in the first. For example:
FOO.EXAMPLE. NS NS1.EXAMPLE.COM. FOO.EXAMPLE. NS NS1.EXAMPLE.COM.
FOO.EXAMPLE. NS NS2.EXAMPLE.COM. FOO.EXAMPLE. NS NS2.EXAMPLE.COM.
skipping to change at page 9, line 15 skipping to change at line 373
In this example, no names under foo.example or example.com can be In this example, no names under foo.example or example.com can be
resolved because of the delegation loop. Note that a delegation loop resolved because of the delegation loop. Note that a delegation loop
may involve more than two domains. A resolver that does not detect may involve more than two domains. A resolver that does not detect
delegation loops may generate DDoS-levels of attack traffic to delegation loops may generate DDoS-levels of attack traffic to
authoritative name servers, as documented in the TsuNAME authoritative name servers, as documented in the TsuNAME
vulnerability [TsuNAME]. vulnerability [TsuNAME].
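One possible way to detect such loops, shown here only as a sketch,
is a depth-first walk over the "zone depends on zone" relationship.
The function name and the depends_on mapping are assumptions made
for illustration: depends_on maps each zone to the set of zones
that contain its name servers, and how a resolver would build that
mapping is outside the scope of this example.

```python
def find_delegation_cycle(depends_on, zone):
    """Return the zones forming a loop reachable from `zone`, or None."""
    path = []        # zones on the current dependency path, in order
    on_path = set()  # same zones, for fast membership tests
    done = set()     # zones fully explored without finding a loop

    def visit(z):
        if z in on_path:
            # Revisiting a zone on the current path closes a loop.
            return path[path.index(z):] + [z]
        if z in done:
            return None
        path.append(z)
        on_path.add(z)
        for nxt in sorted(depends_on.get(z, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(z)
        done.add(z)
        return None

    return visit(zone)
```

Because the walk follows the whole dependency path, it also reports
loops that involve more than two domains.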
2.5. Alias Loops 2.5. Alias Loops
An alias loop, or cycle, can occur when one CNAME or DNAME RR refers An alias loop, or cycle, can occur when one CNAME or DNAME RR refers
to a second name, which in turn is specified as an alias for the to a second name, which, in turn, is specified as an alias for the
first. For example: first. For example:
APP.FOO.EXAMPLE. CNAME APP.EXAMPLE.NET. APP.FOO.EXAMPLE. CNAME APP.EXAMPLE.NET.
APP.EXAMPLE.NET. CNAME APP.FOO.EXAMPLE. APP.EXAMPLE.NET. CNAME APP.FOO.EXAMPLE.
The need to detect CNAME loops has been known since at least The need to detect CNAME loops has been known since at least
[RFC1034] which states in Section 3.6.2: [RFC1034], which states in Section 3.6.2:
"Of course, by the robustness principle, domain software should not | Of course, by the robustness principle, domain software should not
fail when presented with CNAME chains or loops; CNAME chains should | fail when presented with CNAME chains or loops; CNAME chains
be followed and CNAME loops signaled as an error." | should be followed and CNAME loops signalled as an error.
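A minimal sketch of that behavior follows: CNAME chains are
followed, and a loop is signaled as an error. The names
follow_cnames and AliasLoopError are hypothetical, the cname_of
mapping (owner name to target, keys in lowercase) is an assumption
standing in for real lookups, and DNAME substitution is not
modeled.

```python
class AliasLoopError(Exception):
    """Raised when following CNAMEs returns to a name already seen."""


def follow_cnames(cname_of, qname):
    """Follow a CNAME chain and return the final (non-alias) name."""
    name = qname.lower()
    seen = {name}
    chain = [name]
    while name in cname_of:
        name = cname_of[name].lower()
        chain.append(name)
        if name in seen:
            # The chain has come back to an earlier name: an alias loop.
            raise AliasLoopError(" -> ".join(chain))
        seen.add(name)
    return name
```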
2.6. DNSSEC Validation Failures 2.6. DNSSEC Validation Failures
For zones that are signed with DNSSEC, a resolution failure can occur For zones that are signed with DNSSEC, a resolution failure can occur
when a security-aware resolver believes it should be able to when a security-aware resolver believes it should be able to
establish a chain-of-trust for an RRset but is unable to do so, establish a chain of trust for an RRset but is unable to do so,
possibly after trying multiple authoritative name servers. DNSSEC possibly after trying multiple authoritative name servers. DNSSEC
validation failures may be due to signature mismatch, missing DNSKEY validation failures may be due to signature mismatch, missing DNSKEY
RRs, problems with denial-of-existence records, clock skew, or other RRs, problems with denial-of-existence records, clock skew, or other
reasons. reasons.
Section 4.7 of [RFC4035] already discusses the requirements and Section 4.7 of [RFC4035] already discusses the requirements and
reasons for caching validation failures. Section 3.4 of this reasons for caching validation failures. Section 3.4 of this
document strengthens those requirements. document strengthens those requirements.
2.7. FORMERR Responses 2.7. FORMERR Responses
A name server returns a message with the RCODE field set to FORMERR A name server returns a message with the RCODE field set to FORMERR
when it is unable to interpret the query [RFC1035]. FORMERR when it is unable to interpret the query [RFC1035]. FORMERR
responses are often associated with problems processing EDNS(0) responses are often associated with problems processing Extension
Extensions [RFC6891]. Authoritative servers may return FORMERR when Mechanisms for DNS (EDNS(0)) [RFC6891]. Authoritative servers may
they do not implement EDNS(0), or when EDNS(0) option fields are return FORMERR when they do not implement EDNS(0), or when EDNS(0)
malformed, but not for unknown EDNS(0) options. option fields are malformed, but not for unknown EDNS(0) options.
Upon receipt of a FORMERR response, some recursive clients will retry Upon receipt of a FORMERR response, some recursive clients will retry
their queries without EDNS(0), while others will not. Nonetheless, their queries without EDNS(0), while others will not. Nonetheless,
resolution failures from FORMERR responses are rare. resolution failures from FORMERR responses are rare.
3. Requirements for Caching DNS Resolution Failures 3. Requirements for Caching DNS Resolution Failures
3.1. Retries and Timeouts 3.1. Retries and Timeouts
A resolver MUST NOT retry a given query to a server address over a A resolver MUST NOT retry a given query to a server address over a
given DNS transport more than twice (i.e., three queries in total) given DNS transport more than twice (i.e., three queries in total)
before considering the server address unresponsive over that DNS before considering the server address unresponsive over that DNS
transport for that query. transport for that query.
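The retry limit above can be sketched as follows. This is an
illustration only: send_query is a hypothetical callable that
either returns a response or raises TimeoutError, and the function
name query_with_retries is likewise an assumption.

```python
MAX_ATTEMPTS = 3  # one initial query plus at most two retries


def query_with_retries(send_query, server_addr, transport, query):
    """Return a response, or None if the address is unresponsive.

    "Unresponsive" here applies only to this server address, over this
    transport, for this query.
    """
    for _attempt in range(MAX_ATTEMPTS):
        try:
            return send_query(server_addr, transport, query)
        except TimeoutError:
            continue  # retry, up to the limit
    return None
```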
A resolver MAY retry a given query over a different DNS transport to A resolver MAY retry a given query over a different DNS transport to
the same server if it has reason to believe the DNS transport is the same server if it has reason to believe the DNS transport is
available for that server and is compatible with the resolver's available for that server and is compatible with the resolver's
security policies. security policies.
This document does not place any requirements on how long an This document does not place any requirements on how long an
implementation should wait before retrying a query (aka timeout implementation should wait before retrying a query (aka a timeout
value), which may be implementation- or configuration-dependent. It value), which may be implementation or configuration dependent. It
is generally expected that typical timeout values range from 3 to 30 is generally expected that typical timeout values range from 3 to 30
seconds. seconds.
3.2. Caching 3.2. Caching
Resolvers MUST implement a cache for resolution failures. The Resolvers MUST implement a cache for resolution failures. The
purpose of this cache is to eliminate repeated upstream queries that purpose of this cache is to eliminate repeated upstream queries that
cannot be resolved. When an incoming query matches a cached cannot be resolved. When an incoming query matches a cached
resolution failure, the resolver MUST NOT send any corresponding resolution failure, the resolver MUST NOT send any corresponding
outgoing queries until after the cache entries expire. outgoing queries until after the cache entries expire.
skipping to change at page 11, line 7 skipping to change at line 457
implementation choices so that operators know what behaviors to implementation choices so that operators know what behaviors to
expect when resolution failures are cached. expect when resolution failures are cached.
Resolvers MUST cache resolution failures for at least 1 second. Resolvers MUST cache resolution failures for at least 1 second.
Resolvers MAY cache different types of resolution failures for Resolvers MAY cache different types of resolution failures for
different (i.e., longer) amounts of time. Consistent with [RFC2308], different (i.e., longer) amounts of time. Consistent with [RFC2308],
resolution failures MUST NOT be cached for longer than 5 minutes. resolution failures MUST NOT be cached for longer than 5 minutes.
The minimum cache duration SHOULD be configurable by the operator. A The minimum cache duration SHOULD be configurable by the operator. A
longer cache duration for resolution failures will reduce the longer cache duration for resolution failures will reduce the
processing burden from repeated queries, but may also increase the processing burden from repeated queries but may also increase the
time to recover from transitory issues. time to recover from transitory issues.
Resolvers SHOULD employ an exponential or linear backoff algorithm to Resolvers SHOULD employ an exponential or linear backoff algorithm to
increase the cache duration for persistent resolution failures. For increase the cache duration for persistent resolution failures. For
example, the initial time for negatively caching a resolution failure example, the initial time for negatively caching a resolution failure
might be set to 5 seconds, and increased after each retry that might be set to 5 seconds and increased after each retry that results
results in another resolution failure, up to a configurable maximum, in another resolution failure, up to a configurable maximum, not to
not to exceed the 5-minute upper limit. exceed the 5-minute upper limit.
Notwithstanding the above, resolvers SHOULD implement measures to Notwithstanding the above, resolvers SHOULD implement measures to
mitigate resource exhaustion attacks on the failed resolution cache. mitigate resource exhaustion attacks on the failed resolution cache.
That is, the resolver should limit the amount of memory and/or That is, the resolver should limit the amount of memory and/or
processing time devoted to this cache. processing time devoted to this cache.
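The caching requirements of this section might be combined as in
the following sketch, which is not a definitive implementation.
The class name FailureCache, the doubling backoff, the default
entry limit, and the oldest-first eviction policy are all
assumptions chosen for illustration; only the 1-second minimum,
the 5-minute maximum, and the 5-second example starting value come
from the text.

```python
import time
from collections import OrderedDict

MIN_TTL = 1.0    # seconds; failures are cached at least this long
MAX_TTL = 300.0  # seconds; the 5-minute upper limit


class FailureCache:
    """Bounded negative cache for resolution failures with backoff."""

    def __init__(self, initial_ttl=5.0, max_ttl=MAX_TTL,
                 max_entries=10000, clock=time.monotonic):
        self.initial_ttl = max(MIN_TTL, initial_ttl)
        self.max_ttl = min(MAX_TTL, max(self.initial_ttl, max_ttl))
        self.max_entries = max_entries  # assumed memory bound
        self.clock = clock
        self._entries = OrderedDict()   # key -> (expires_at, ttl)

    def is_cached(self, key):
        """True while outgoing queries for `key` must be suppressed."""
        entry = self._entries.get(key)
        return entry is not None and self.clock() < entry[0]

    def record_failure(self, key):
        """Cache a failure; double the previous duration if one exists."""
        previous = self._entries.pop(key, None)
        if previous is None:
            ttl = self.initial_ttl
        else:
            ttl = min(previous[1] * 2, self.max_ttl)
        self._entries[key] = (self.clock() + ttl, ttl)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry
        return ttl

    def record_success(self, key):
        """Forget the failure history once resolution succeeds."""
        self._entries.pop(key, None)
```

In this sketch, an expired entry is kept until it is evicted or
the name resolves, so that the next failure for the same key can
lengthen the cache duration rather than start over.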
3.3. Requerying Delegation Information 3.3. Requerying Delegation Information
Section 2.1 of [RFC4697] identifies circumstances in which "every Section 2.1 of [RFC4697] identifies circumstances in which:
name server in a zone's NS RRSet is unreachable (e.g., during a
network outage), unavailable (e.g., the name server process is not | ...every name server in a zone's NS RRSet is unreachable (e.g.,
running on the server host), or misconfigured (e.g., the name server | during a network outage), unavailable (e.g., the name server
is not authoritative for the given zone, also known as 'lame')." It | process is not running on the server host), or misconfigured
prohibits unnecessary "aggressive requerying" to the parent of a non- | (e.g., the name server is not authoritative for the given zone,
responsive zone by sending NS queries. | also known as "lame").
It prohibits unnecessary "aggressive requerying" to the parent of a
non-responsive zone by sending NS queries.
The problem of aggressive requerying to parent zones is not limited The problem of aggressive requerying to parent zones is not limited
to queries of type NS. This document updates the requirement from to queries of type NS. This document updates the requirement from
section 2.1.1 of [RFC4697] to apply more generally: Upon encountering Section 2.1.1 of [RFC4697] to apply more generally:
a zone whose name servers are all non-responsive, a resolver MUST
cache the resolution failure. Furthermore, the resolver MUST limit | Upon encountering a zone whose name servers are all non-
queries to the non-responsive zone's parent zone (and to other | responsive, a resolver MUST cache the resolution failure.
ancestor zones) just as it would limit subsequent queries to the non- | Furthermore, the resolver MUST limit queries to the non-responsive
responsive zone. | zone's parent zone (and to other ancestor zones) just as it would
| limit subsequent queries to the non-responsive zone.
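One way to apply this requirement, given only as a sketch, is to
check each query name against the set of zones currently cached as
non-responsive before sending any outgoing query, including
queries to the parent or other ancestor zones. The function name
enclosing_failed_zone and the failed_zones set (fully qualified,
lowercase zone names) are assumptions for illustration.

```python
def enclosing_failed_zone(qname, failed_zones):
    """Return the cached non-responsive zone at or above qname, or None."""
    labels = qname.lower().rstrip(".").split(".")
    # Compare whole labels, from the full name up toward the root.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate in failed_zones:
            return candidate
    return None
```

A resolver using such a check would suppress all outgoing queries
for a name while the function returns a zone, rather than asking
the parent zone again for delegation information.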
3.4. DNSSEC Validation Failures 3.4. DNSSEC Validation Failures
Section 4.7 of [RFC4035] states: Section 4.7 of [RFC4035] states:
To prevent such unnecessary DNS traffic, security-aware resolvers MAY | To prevent such unnecessary DNS traffic, security-aware resolvers
cache data with invalid signatures, with some restrictions. | MAY cache data with invalid signatures, with some restrictions.
This document updates [RFC4035] with the following, stronger This document updates [RFC4035] with the following, stronger,
requirement: requirement:
To prevent such unnecessary DNS traffic, security-aware resolvers | To prevent such unnecessary DNS traffic, security-aware resolvers
MUST cache DNSSEC validation failures, with some restrictions. | MUST cache DNSSEC validation failures, with some restrictions.
One of the restrictions mentioned in [RFC4035] is to use a small TTL One of the restrictions mentioned in [RFC4035] is to use a small TTL
when caching data that fails DNSSEC validation. This is, in part, when caching data that fails DNSSEC validation. This is, in part,
because the provided TTL cannot be trusted. The advice from because the provided TTL cannot be trusted. The advice from
Section 3.2 herein can be used as guidance on TTLs for caching DNSSEC Section 3.2 herein can be used as guidance on TTLs for caching DNSSEC
validation failures. validation failures.
4. IANA Considerations 4. IANA Considerations
This document has no IANA actions. This document has no IANA actions.
5. Security Considerations 5. Security Considerations
As noted in Section 3.2, an attacker might attempt a resource As noted in Section 3.2, an attacker might attempt a resource
exhaustion attack by sending queries for a large number of names and/ exhaustion attack by sending queries for a large number of names and/
or types that result in resolution failure. Resolvers SHOULD or types that result in resolution failure. Resolvers SHOULD
implement measures to protect themselves and bound the amount of implement measures to protect themselves and bound the amount of
memory devoted to caching resolution failures. memory devoted to caching resolution failures.
A cache poisoning attack (see section 2.2 of [RFC7873]) resulting in A cache poisoning attack (see Section 2.2 of [RFC7873]) resulting in
denial of service may be possible because failure messages cannot be denial of service may be possible because failure messages cannot be
signed. An attacker might generate queries and send forged failure signed. An attacker might generate queries and send forged failure
messages, causing the resolver to cease sending queries to the messages, causing the resolver to cease sending queries to the
authoritative name server (see 2.6 of [RFC4732] for a similar "data authoritative name server (see Section 2.6 of [RFC4732] for a similar
corruption attack"). However, this would require continued spoofing "data corruption attack" and Section 5.2 of [TuDoor] for a "DNSDoS
throughout the backoff period and required attacks due to the 5 attack"). However, this would require continued spoofing throughout
minute cache limit. As in section 4.1.12 of [RFC4686], this attack's the backoff period and repeated attacks due to the 5-minute cache
effects would be "localized and of limited duration." limit. As in Section 4.1.12 of [RFC4686], this attack's effects
would be "localized and of limited duration".
6. Privacy Considerations 6. Privacy Considerations
This specification has no impact on user privacy. This specification has no impact on user privacy.
7. Acknowledgments 7. References
The authors wish to thank Mukund Sivaraman, Petr Spacek, Peter van
Dijk, Tim Wicinski, Joe Abley, Evan Hunt, Barry Leiba, Lucas Pardue,
Paul Wouters, and other members of the DNSOP working group for their
feedback and contributions.
8. Change Log
RFC Editor: Please remove this section before publication.
This section lists substantial changes to the document as it is being
worked on.
From -00 to -01:
* use phrase "the initial TTL for negatively caching a resolution
failure" instead of "negative cache TTL"
* typos, etc
From dwmtwc-01 to ietf-00:
* Adopted by WG
From -00 to -01:
* Clarify retries and timeouts to apply on a per-query basis.
* Say more about the 5 second caching requirement in TTLs section.
* Expanded opening paragraphs of section 2, now titled "Conditions
That Lead To DNS Resolution Failures".
* Text from the former section 3.3 ("Scope") moved to top of section
2.
* Section 3.2 was formerly "TTLs" and is now "Caching". The draft
no longer requires e.g. caching by tuples, but now just requires
caching failures so that repeated queries are not sent out.
* State that resolvers should protect themselves from cache resource
exhaustion attacks.
From -01 to -02:
* Added cache poisoning attack to Security Considerations.
From -02 to -03:
* Added missing reference to Verisign blog post.
From -03 to -04:
* Address most of Peter van Dijk's DNS Directorate review comments.
* Removed "For Discussion" section from introduction referencing
apparent inconsistent RFC2119 keyword use in RFC2308.
* Replaced "For Discussion" section from "Requerying Delegation
Information" to generalize RFC 4697 requirements not to requery
parent zones to cover all query types.
* Replaced "For Discussion" section from "DNSSEC Validation
Failures" to strengthen RFC 4035 to require caching of DNSSEC
validation failures.
* Added RFC 4035 and RFC 4697 to updated RFCs list.
* Added (empty) Implementation Status section.
From -04 to -05:
* Expanded abstract to include updates to RFCs 4035 and 4697.
* Removed reference to unused terms from RFC 8126.
* Reworded "server transport" to "a server address over a given
transport".
* Added explanatory text in "Server Failure" section for exclusion
of extended DNS errors
* Changed "Timeouts" section to "Timeouts and Unreachable Servers"
and added reference to transport layer indicators from RFC 2308.
* Clarified meaning of "timeout value".
From -05 to -06:
* Changed minimum 5 second caching to 1 second, with other changes
to give implementors and operators more leeway.
* Changed "exponential backoff" to more general concept of
increasing backoff.
* Added some implementation status notes for BIND, from dnsop list
email.
From -06 to -07:
* Artart review: minor editorial clarifications
* Genart review: remove confusing and superfluous section
references.
* Genart review: clarify resolution failure caching time range.
* Genart review: better define DNS transports
* Dnsdir review: clarify FORMERR response retries.
From -07 to -08:
* "only exacerbated" -> "further exacerbated"
* lowercase IPv6 addresses
* lowercase example domain in text
* updated introduction to include all updated RFCs
* change 3.2 SHOULD to should
* section 3.4: say a little about "some restrictions" from RFC 4035
* Intdir telechat review: a few grammatical nits
* Various IESG reviewer suggestions
9. Implementation Status
RFC Editor: Please remove this section before publication.
This section records the status of known implementations of the
protocol defined by this specification at the time of posting of this
Internet-Draft, and is based on a proposal described in RFC 7942.
The description of implementations in this section is intended to
assist the IETF in its decision processes in progressing drafts to
RFCs. Please note that the listing of any individual implementation
here does not imply endorsement by the IETF. Furthermore, no effort
has been spent to verify the information presented here that was
supplied by IETF contributors. This is not intended as, and must not
be construed to be, a catalog of available implementations or their
features. Readers are advised to note that other implementations may
exist.
9.1. BIND
The following is excerpted from a message to the dnsop mailing list
regarding how BIND caches resolution failures:
BIND implemented a SERVFAIL cache in 2014 with a default cache
duration of 10 seconds; after a slew of complaints, in 2015 we
lowered it to 1 second, and also reduced the configurable maximum
from 5 minutes to 30 seconds. The reason was that certain common
failure conditions are transitory, and it's not unreasonable to
prioritize rapid recovery.
Now, to be clear, the comparison isn't exactly apples to apples: the
BIND SERVFAIL cache is a somewhat stupider mechanism than the one
outlined in the draft. It caches *all* SERVFAIL responses,
regardless of the reason they were generated. For example: when the
cache is cold, a query may time out or hit DDoS mitigation limits
before it's finished getting through the whole iteration process; an
immediate retry would start further along the delegation chain and
would succeed. Such problems weren't noticeable until we implemented
the 10-second cache, but became very noticeable afterward.
If we were able to selectively cache *only* those SERVFAILs that are
unlikely to recover soon, then five seconds might indeed be a good
starting point. But, with our relatively dumb cache, we found that
one second did a fairly good job reducing the processing burden from
repeated queries, and eliminated the user complaints about the
resolver taking forever to recover from short-lived problems. It's
been working well enough that it hasn't been a priority to develop a
more complex failure cache.
10. References
10.1. Normative References 7.1. Normative References
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
<https://www.rfc-editor.org/info/rfc1034>. <https://www.rfc-editor.org/info/rfc1034>.
[RFC1035] Mockapetris, P., "Domain names - implementation and [RFC1035] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
November 1987, <https://www.rfc-editor.org/info/rfc1035>. November 1987, <https://www.rfc-editor.org/info/rfc1035>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
skipping to change at page 17, line 9 skipping to change at line 575
<https://www.rfc-editor.org/info/rfc4035>. <https://www.rfc-editor.org/info/rfc4035>.
[RFC4697] Larson, M. and P. Barber, "Observed DNS Resolution [RFC4697] Larson, M. and P. Barber, "Observed DNS Resolution
Misbehavior", BCP 123, RFC 4697, DOI 10.17487/RFC4697, Misbehavior", BCP 123, RFC 4697, DOI 10.17487/RFC4697,
October 2006, <https://www.rfc-editor.org/info/rfc4697>. October 2006, <https://www.rfc-editor.org/info/rfc4697>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
10.2. Informative References 7.2. Informative References
[botnet] Wessels, D. and M. Thomas, "Botnet Traffic Observed at [BOTNET] Wessels, D. and M. Thomas, "Botnet Traffic Observed at
Various Levels of the DNS Hierarchy", May 2021, Various Levels of the DNS Hierarchy", May 2021,
<https://indico.dns-oarc.net/event/38/contributions/841/>. <https://indico.dns-oarc.net/event/38/contributions/841/>.
[dyn-attack] [DNSSEC-ROLLOVER]
Sullivan, A., "Dyn, DDoS, and DNS", March 2017, Michaelson, G., Wallström, P., Arends, R., and G. Huston,
<https://ccnso.icann.org/sites/default/files/file/field- "Roll Over and Die?", February 2010,
file-attach/2017-04/presentation-oracle-dyn-ddos-dns- <https://www.potaroo.net/ispcol/2010-02/rollover.html>.
13mar17-en.pdf>.
[fb-outage] [FB-OUTAGE]
Janardhan, S., "More details about the October 4 outage", Janardhan, S., "More details about the October 4 outage",
October 2021, <https://engineering.fb.com/2021/10/05/ October 2021, <https://engineering.fb.com/2021/10/05/
networking-traffic/outage-details/>. networking-traffic/outage-details/>.
[fb-outage-verisign] [KSK-ROLLOVER]
Müller, M., Thomas, M., Wessels, D., Hardaker, W., Chung,
T., Toorop, W., and R. van Rijswijk-Deij, "Roll, Roll,
Roll Your Root: A Comprehensive Analysis of the First Ever
DNSSEC Root KSK Rollover", IMC '19: Proceedings of the
Internet Measurement Conference, Pages 1-14,
DOI 10.1145/3355369.3355570, October 2019,
<https://doi.org/10.1145/3355369.3355570>.
[OUTAGE-RESOLVER]
Verisign, "Observations on Resolver Behavior During DNS Verisign, "Observations on Resolver Behavior During DNS
Outages", 20 January 2022, Outages", January 2022,
<https://blog.verisign.com/security/facebook-dns-outage/>. <https://blog.verisign.com/security/facebook-dns-outage/>.
[RETRY-STORM]
Sullivan, A., "Dyn, DDoS, and DNS", March 2017,
<https://ccnso.icann.org/sites/default/files/file/field-
file-attach/2017-04/presentation-oracle-dyn-ddos-dns-
13mar17-en.pdf>.
[RFC0882] Mockapetris, P., "Domain names: Concepts and facilities", [RFC0882] Mockapetris, P., "Domain names: Concepts and facilities",
RFC 882, DOI 10.17487/RFC0882, November 1983, RFC 882, DOI 10.17487/RFC0882, November 1983,
<https://www.rfc-editor.org/info/rfc882>. <https://www.rfc-editor.org/info/rfc882>.
[RFC0883] Mockapetris, P., "Domain names: Implementation [RFC0883] Mockapetris, P., "Domain names: Implementation
specification", RFC 883, DOI 10.17487/RFC0883, November specification", RFC 883, DOI 10.17487/RFC0883, November
1983, <https://www.rfc-editor.org/info/rfc883>. 1983, <https://www.rfc-editor.org/info/rfc883>.
[RFC4686] Fenton, J., "Analysis of Threats Motivating DomainKeys [RFC4686] Fenton, J., "Analysis of Threats Motivating DomainKeys
Identified Mail (DKIM)", RFC 4686, DOI 10.17487/RFC4686, Identified Mail (DKIM)", RFC 4686, DOI 10.17487/RFC4686,
skipping to change at page 18, line 43 skipping to change at line 671
[RFC8914] Kumari, W., Hunt, E., Arends, R., Hardaker, W., and D. [RFC8914] Kumari, W., Hunt, E., Arends, R., Hardaker, W., and D.
Lawrence, "Extended DNS Errors", RFC 8914, Lawrence, "Extended DNS Errors", RFC 8914,
DOI 10.17487/RFC8914, October 2020, DOI 10.17487/RFC8914, October 2020,
<https://www.rfc-editor.org/info/rfc8914>. <https://www.rfc-editor.org/info/rfc8914>.
[RFC9250] Huitema, C., Dickinson, S., and A. Mankin, "DNS over [RFC9250] Huitema, C., Dickinson, S., and A. Mankin, "DNS over
Dedicated QUIC Connections", RFC 9250, Dedicated QUIC Connections", RFC 9250,
DOI 10.17487/RFC9250, May 2022, DOI 10.17487/RFC9250, May 2022,
<https://www.rfc-editor.org/info/rfc9250>. <https://www.rfc-editor.org/info/rfc9250>.
[roll-over-and-die] [THUNDERING-HERD]
Michaelson, G., Wallström, P., Arends, R., and G. Huston, Sivaraman, M. and C. Liu, "The DNS thundering herd
"Roll Over and Die?", February 2010, problem", Work in Progress, Internet-Draft, draft-muks-
<https://www.potaroo.net/ispcol/2010-02/rollover.html>. dnsop-dns-thundering-herd-00, 25 June 2020,
<https://datatracker.ietf.org/doc/html/draft-muks-dnsop-
[root-ksk-roll] dns-thundering-herd-00>.
Müller, M., Thomas, M., Wessels, D., Hardaker, W., Chung,
T., Toorop, W., and R.v. Rijswijk-Deij, "Roll, Roll, Roll
Your Root: A Comprehensive Analysis of the First Ever
DNSSEC Root KSK Rollover", October 2019,
<https://dl.acm.org/doi/10.1145/3355369.3355570>.
[thundering-herd]
Sivaraman, M. and C. Liu, "The DNS thundering herd problem
(expired Internet-Draft)", June 2020,
<https://datatracker.ietf.org/doc/draft-muks-dnsop-dns-
thundering-herd/>.
[TsuNAME] Moura, G. C. M., Castro, S., Heidemann, J., and W. [TsuNAME] Moura, G. C. M., Castro, S., Heidemann, J., and W.
Hardaker, "TsuNAME: exploiting misconfiguration and Hardaker, "TsuNAME: exploiting misconfiguration and
vulnerability to DDoS DNS", November 2021, vulnerability to DDoS DNS", IMC '21: Proceedings of the
<https://dl.acm.org/doi/10.1145/3487552.3487824>. 21st ACM Internet Measurement Conference, Pages 398-418,
DOI 10.1145/3487552.3487824, November 2021,
<https://doi.org/10.1145/3487552.3487824>.
[TuDoor] Li, X., Xu, W., Liu, B., Zhang, M., Li, Z., Zhang, J.,
Chang, D., Zheng, X., Wang, C., Chen, J., Duan, H., and Q.
Li, "TuDoor Attack: Systematically Exploring and
Exploiting Logic Vulnerabilities in DNS Response Pre-
processing with Malformed Packets", IEEE Symposium on
Security and Privacy (SP), DOI 10.1109/SP54263.2024.00046,
2024, <https://doi.ieeecomputersociety.org/10.1109/
SP54263.2024.00046>.
Acknowledgments
The authors wish to thank Mukund Sivaraman, Petr Spacek, Peter van
Dijk, Tim Wicinski, Joe Abley, Evan Hunt, Barry Leiba, Lucas Pardue,
Paul Wouters, and other members of the DNSOP Working Group for their
feedback and contributions.
Authors' Addresses Authors' Addresses
Duane Wessels Duane Wessels
Verisign Verisign
12061 Bluemont Way 12061 Bluemont Way
Reston, VA 20190 Reston, VA 20190
United States of America United States of America
Phone: +1 703 948-3200 Phone: +1 703 948-3200
Email: dwessels@verisign.com Email: dwessels@verisign.com
 End of changes. 73 change blocks. 
393 lines changed or deleted 240 lines changed or added
