rfc9520.original | rfc9520.txt | |||
---|---|---|---|---|
Internet Engineering Task Force D. Wessels | Internet Engineering Task Force (IETF) D. Wessels | |||
Internet-Draft W. Carroll | Request for Comments: 9520 W. Carroll | |||
Updates: 2308, 4035, 4697 (if approved) M. Thomas | Updates: 2308, 4035, 4697 M. Thomas | |||
Intended status: Standards Track Verisign | Category: Standards Track Verisign | |||
Expires: 24 March 2024 21 September 2023 | ISSN: 2070-1721 December 2023 | |||
Negative Caching of DNS Resolution Failures | Negative Caching of DNS Resolution Failures | |||
draft-ietf-dnsop-caching-resolution-failures-08 | ||||
Abstract | Abstract | |||
In the DNS, resolvers employ caching to reduce both latency for end | In the DNS, resolvers employ caching to reduce both latency for end | |||
users and load on authoritative name servers. The process of | users and load on authoritative name servers. The process of | |||
resolution may result in one of three types of responses: (1) a | resolution may result in one of three types of responses: (1) a | |||
response containing the requested data; (2) a response indicating the | response containing the requested data, (2) a response indicating the | |||
requested data does not exist; or (3) a non-response due to a | requested data does not exist, or (3) a non-response due to a | |||
resolution failure in which the resolver does not receive any useful | resolution failure in which the resolver does not receive any useful | |||
information regarding the data's existence. This document concerns | information regarding the data's existence. This document concerns | |||
itself only with the third type. | itself only with the third type. | |||
RFC 2308 specifies requirements for DNS negative caching. There, | RFC 2308 specifies requirements for DNS negative caching. There, | |||
caching of type (2) responses is mandatory and caching of type (3) | caching of TYPE 2 responses is mandatory and caching of TYPE 3 | |||
responses is optional. This document updates RFC 2308 to require | responses is optional. This document updates RFC 2308 to require | |||
negative caching for DNS resolution failures. | negative caching for DNS resolution failures. | |||
RFC 4035 allows DNSSEC validation failure caching. This document | RFC 4035 allows DNSSEC validation failure caching. This document | |||
updates RFC 4035 to require caching for DNSSEC validation failures. | updates RFC 4035 to require caching for DNSSEC validation failures. | |||
RFC 4697 prohibits aggressive requerying for NS records at a failed | RFC 4697 prohibits aggressive requerying for NS records at a failed | |||
zone's parent zone. This document updates RFC 4697 to expand this | zone's parent zone. This document updates RFC 4697 to expand this | |||
requirement to all query types and to all ancestor zones. | requirement to all query types and to all ancestor zones. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | This document is a product of the Internet Engineering Task Force | |||
Task Force (IETF). Note that other groups may also distribute | (IETF). It represents the consensus of the IETF community. It has | |||
working documents as Internet-Drafts. The list of current Internet- | received public review and has been approved for publication by the | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Internet Engineering Steering Group (IESG). Further information on | |||
 | Internet Standards is available in Section 2 of RFC 7841. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Information about the current status of this document, any errata, | |||
and may be updated, replaced, or obsoleted by other documents at any | and how to provide feedback on it may be obtained at | |||
time. It is inappropriate to use Internet-Drafts as reference | https://www.rfc-editor.org/info/rfc9520. | |||
material or to cite them other than as "work in progress." | ||||
This Internet-Draft will expire on 24 March 2024. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2023 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
 | in the Revised BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Motivation | |||
1.2. Related Work . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Related Work | |||
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Terminology | |||
2. Conditions That Lead to DNS Resolution Failures . . . . . . . 6 | 2. Conditions That Lead to DNS Resolution Failures | |||
2.1. SERVFAIL Responses . . . . . . . . . . . . . . . . . . . 7 | 2.1. SERVFAIL Responses | |||
2.2. REFUSED Responses . . . . . . . . . . . . . . . . . . . . 7 | 2.2. REFUSED Responses | |||
2.3. Timeouts and Unreachable Servers . . . . . . . . . . . . 8 | 2.3. Timeouts and Unreachable Servers | |||
2.4. Delegation Loops . . . . . . . . . . . . . . . . . . . . 8 | 2.4. Delegation Loops | |||
2.5. Alias Loops . . . . . . . . . . . . . . . . . . . . . . . 9 | 2.5. Alias Loops | |||
2.6. DNSSEC Validation Failures . . . . . . . . . . . . . . . 9 | 2.6. DNSSEC Validation Failures | |||
2.7. FORMERR Responses . . . . . . . . . . . . . . . . . . . . 9 | 2.7. FORMERR Responses | |||
3. Requirements for Caching DNS Resolution Failures . . . . . . 10 | 3. Requirements for Caching DNS Resolution Failures | |||
3.1. Retries and Timeouts . . . . . . . . . . . . . . . . . . 10 | 3.1. Retries and Timeouts | |||
3.2. Caching . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 3.2. Caching | |||
3.3. Requerying Delegation Information . . . . . . . . . . . . 11 | 3.3. Requerying Delegation Information | |||
3.4. DNSSEC Validation Failures . . . . . . . . . . . . . . . 11 | 3.4. DNSSEC Validation Failures | |||
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 | 4. IANA Considerations | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 12 | 5. Security Considerations | |||
6. Privacy Considerations . . . . . . . . . . . . . . . . . . . 12 | 6. Privacy Considerations | |||
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 12 | 7. References | |||
8. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . 12 | 7.1. Normative References | |||
9. Implementation Status . . . . . . . . . . . . . . . . . . . . 15 | 7.2. Informative References | |||
9.1. BIND . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | Acknowledgments | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 | Authors' Addresses | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . 16 | ||||
10.2. Informative References . . . . . . . . . . . . . . . . . 17 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 | ||||
1. Introduction | 1. Introduction | |||
Caching has always been a fundamental component of DNS resolution on | Caching has always been a fundamental component of DNS resolution on | |||
the Internet. For example [RFC0882] states: | the Internet. For example, [RFC0882] states: | |||
"The sheer size of the database and frequency of updates suggest that | | The sheer size of the database and frequency of updates suggest | |||
it must be maintained in a distributed manner, with local caching to | | that it must be maintained in a distributed manner, with local | |||
improve performance." | | caching to improve performance. | |||
The early DNS RFCs ([RFC0882], [RFC0883], [RFC1034], and [RFC1035]) | The early DNS RFCs ([RFC0882], [RFC0883], [RFC1034], and [RFC1035]) | |||
primarily discuss caching in the context of what [RFC2308] calls | primarily discuss caching in the context of what [RFC2308] calls | |||
"positive" responses, that is, when the response includes the | "positive responses", that is, when the response includes the | |||
requested data. In this case, a TTL is associated with each resource | requested data. In this case, a TTL is associated with each Resource | |||
record in the response. Resolvers can cache and reuse the data until | Record (RR) in the response. Resolvers can cache and reuse the data | |||
the TTL expires. | until the TTL expires. | |||
Section 4.3.4 of [RFC1034] describes negative response caching, but | Section 4.3.4 of [RFC1034] describes negative response caching, but | |||
notes it is optional and only talks about name errors (NXDOMAIN). | notes it is optional and only talks about name errors (NXDOMAIN). | |||
This is the origin of using the SOA MINIMUM field as a negative | This is the origin of using the SOA MINIMUM field as a negative | |||
caching TTL. | caching TTL. | |||
[RFC2308] updated [RFC1034] to specify new requirements for DNS | [RFC2308] updated [RFC1034] to specify new requirements for DNS | |||
negative caching, including making it mandatory for caching resolvers | negative caching, including making it mandatory for caching resolvers | |||
to cache name error (NXDOMAIN) and no data (NODATA) responses when a | to cache name error (NXDOMAIN) and no data (NODATA) responses when an | |||
SOA record is available to provide a TTL. [RFC2308] further | SOA record is available to provide a TTL. [RFC2308] further | |||
specified optional negative caching for two DNS resolution failure | specified optional negative caching for two DNS resolution failure | |||
cases: server failure and dead / unreachable servers. | cases: server failure and dead/unreachable servers. | |||
This document updates [RFC2308] to require negative caching of all | This document updates [RFC2308] to require negative caching of all | |||
DNS resolution failures and provides additional examples of | DNS resolution failures and provides additional examples of | |||
resolution failures. This document also updates [RFC4035] to require | resolution failures, [RFC4035] to require caching for DNSSEC | |||
caching for DNSSEC validation failures as well as [RFC4697] to expand | validation failures, as well as [RFC4697] to expand the scope of | |||
the scope of prohibiting aggressive requerying for NS records at a | prohibiting aggressive requerying for NS records at a failed zone's | |||
failed zone's parent zone to all query types and to all ancestor | parent zone to all query types and to all ancestor zones. | |||
zones. | ||||
1.1. Motivation | 1.1. Motivation | |||
Operators of DNS services have known for some time that recursive | Operators of DNS services have known for some time that recursive | |||
resolvers become more aggressive when they experience resolution | resolvers become more aggressive when they experience resolution | |||
failures. A number of different anecdotes, experiments, and | failures. A number of different anecdotes, experiments, and | |||
incidents support this claim. | incidents support this claim. | |||
In December 2009, a secondary server for a number of in-addr.arpa | In December 2009, a secondary server for a number of in-addr.arpa | |||
subdomains saw its traffic suddenly double, and queries of type | subdomains saw its traffic suddenly double, and queries of type | |||
DNSKEY in particular increase by approximately two orders of | DNSKEY in particular increase by approximately two orders of | |||
magnitude, coinciding with a DNSSEC key rollover by the zone operator | magnitude, coinciding with a DNSSEC key rollover by the zone operator | |||
[roll-over-and-die]. This predated a signed root zone and an | [DNSSEC-ROLLOVER]. This predated a signed root zone, and an | |||
operating system vendor was providing non-root trust anchors to the | operating system vendor was providing non-root trust anchors to the | |||
recursive resolver, which became out of date following the rollover. | recursive resolver, which became out of date following the rollover. | |||
Unable to validate responses for the affected in-addr.arpa zones, | Unable to validate responses for the affected in-addr.arpa zones, | |||
recursive resolvers aggressively retried their queries. | recursive resolvers aggressively retried their queries. | |||
In 2016, the internet infrastructure company Dyn experienced a large | In 2016, the Internet infrastructure company Dyn experienced a large | |||
attack that impacted many high-profile customers. As documented in a | attack that impacted many high-profile customers. As documented in a | |||
technical presentation detailing the attack [dyn-attack], Dyn staff | technical presentation detailing the attack (see [RETRY-STORM]), Dyn | |||
wrote: "At this point we are now experiencing botnet attack traffic | staff wrote: | |||
and what is best classified as a 'retry storm'. Looking at certain | ||||
large recursive platforms > 10x normal volume." | ||||
In 2018 the root zone key signing key (KSK) was rolled over | | At this point we are now experiencing botnet attack traffic and | |||
[root-ksk-roll]. Throughout the rollover period, the root servers | | what is best classified as a "retry storm" | |||
 | | | |||
 | | Looking at certain large recursive platforms > 10x normal volume | |||
 | In 2018, the root zone Key Signing Key (KSK) was rolled over | |||
 | [KSK-ROLLOVER]. Throughout the rollover period, the root servers | |||
experienced a significant increase in DNSKEY queries. Before the | experienced a significant increase in DNSKEY queries. Before the | |||
rollover, a.root-servers.net and j.root-servers.net together received | rollover, a.root-servers.net and j.root-servers.net together received | |||
about 15 million DNSKEY queries per day. At the end of the | about 15 million DNSKEY queries per day. At the end of the | |||
revocation period, they received 1.2 billion per day -- an 80x | revocation period, they received 1.2 billion per day: an 80x | |||
increase. Removal of the revoked key from the zone caused DNSKEY | increase. Removal of the revoked key from the zone caused DNSKEY | |||
queries to drop to post-rollover but pre-revoke levels, indicating | queries to drop to post-rollover but pre-revoke levels, indicating | |||
there is still a population of recursive resolvers using the previous | there is still a population of recursive resolvers using the previous | |||
root trust anchor and aggressively retrying DNSKEY queries. | root trust anchor and aggressively retrying DNSKEY queries. | |||
In 2021, Verisign researchers used botnet query traffic to | In 2021, Verisign researchers used botnet query traffic to | |||
demonstrate that certain large, public recursive DNS services exhibit | demonstrate that certain large public recursive DNS services exhibit | |||
very high query rates when all authoritative name servers for a zone | very high query rates when all authoritative name servers for a zone | |||
return REFUSED or SERVFAIL [botnet]. When the authoritative servers | return refused (REFUSED) or server failure (SERVFAIL) responses (see | |||
were configured normally, query rates for a single botnet domain | [BOTNET]). When the authoritative servers were configured normally, | |||
averaged approximately 50 queries per second. However, with the | query rates for a single botnet domain averaged approximately 50 | |||
servers configured to return SERVFAIL, the query rate increased to | queries per second. However, with the servers configured to return | |||
60,000 per second. Furthermore, increases were also observed at the | SERVFAIL, the query rate increased to 60,000 per second. | |||
Root and TLD levels, even though delegations at those levels were | Furthermore, increases were also observed at the root and Top-Level | |||
 | Domain (TLD) levels, even though delegations at those levels were | |||
unchanged and continued operating normally. | unchanged and continued operating normally. | |||
Later that same year, on October 4, Facebook experienced a widespread | Later that same year, on October 4, Facebook experienced a widespread | |||
and well-publicized outage [fb-outage]. During the 6-hour outage, | and well-publicized outage [FB-OUTAGE]. During the 6-hour outage, | |||
none of Facebook's authoritative name servers were reachable and did | none of Facebook's authoritative name servers were reachable and did | |||
not respond to queries. Recursive name servers attempting to resolve | not respond to queries. Recursive name servers attempting to resolve | |||
Facebook domains experienced timeouts. During this time, query | Facebook domains experienced timeouts. During this time, query | |||
traffic on the .COM/.NET infrastructure increased from 7,000 to | traffic on the .COM/.NET infrastructure increased from 7,000 to | |||
900,000 queries per second [fb-outage-verisign]. | 900,000 queries per second [OUTAGE-RESOLVER]. | |||
1.2. Related Work | 1.2. Related Work | |||
[RFC2308] describes negative caching for four types of DNS queries | [RFC2308] describes negative caching for four types of DNS queries | |||
and responses: Name errors, no data, server failures, and dead / | and responses: name errors, no data, server failures, and dead/ | |||
unreachable servers. It places the strongest requirements on | unreachable servers. It places the strongest requirements on | |||
negative caching for name errors and no data responses, while server | negative caching for name errors and no data responses, while server | |||
failures and dead servers are left as optional. | failures and dead servers are left as optional. | |||
[RFC4697] is a Best Current Practice that documents observed | [RFC4697] is a Best Current Practice that documents observed | |||
resolution misbehaviors. It describes a number of situations that | resolution misbehaviors. It describes a number of situations that | |||
can lead to excessive queries from recursive resolvers, including: | can lead to excessive queries from recursive resolvers, including | |||
requerying for delegation data, lame servers, responses blocked by | requerying for delegation data, lame servers, responses blocked by | |||
firewalls, and records with zero TTL. [RFC4697] makes a number of | firewalls, and records with zero TTL. [RFC4697] makes a number of | |||
recommendations, varying from "SHOULD" to "MUST." | recommendations, varying from "SHOULD" to "MUST". | |||
An expired Internet-Draft describes "The DNS thundering herd problem" | [THUNDERING-HERD] describes "The DNS thundering herd problem" as a | |||
[thundering-herd] as a situation arising when cached data expires at | situation arising when cached data expires at the same time for a | |||
the same time for a large number of users. Although that document is | large number of users. Although that document is not focused on | |||
not focused on negative caching, it does describe the benefits of | negative caching, it does describe the benefits of combining multiple | |||
combining multiple, identical queries to upstream name servers. That | identical queries to upstream name servers. That is, when a | |||
is, when a recursive resolver receives multiple queries for the same | recursive resolver receives multiple queries for the same name, | |||
name, class, and type that cannot be answered from cached data, it | class, and type that cannot be answered from cached data, it should | |||
should combine or join them into a single upstream query, rather than | combine or join them into a single upstream query rather than emit | |||
emit repeated, identical upstream queries. | repeated identical upstream queries. | |||
[RFC5452], "Measures for Making DNS More Resilient against Forged | [RFC5452], "Measures for Making DNS More Resilient against Forged | |||
Answers," includes a section that describes the phenomenon known as | Answers", includes a section that describes the phenomenon known as | |||
birthday attacks. Here, again, the problem arises when a recursive | "Birthday Attacks". Here, again, the problem arises when a recursive | |||
resolver emits multiple, identical upstream queries. Multiple | resolver emits multiple identical upstream queries. Multiple | |||
outstanding queries makes it easier for an attacker to guess and | outstanding queries make it easier for an attacker to guess and | |||
correctly match some of the DNS message parameters, such as the port | correctly match some of the DNS message parameters, such as the port | |||
number and ID field. This situation is further exacerbated in the | number and ID field. This situation is further exacerbated in the | |||
case of timeout-based resolution failures. DNSSEC, of course, is a | case of timeout-based resolution failures. Of course, DNSSEC is a | |||
suitable defense to spoofing attacks. | suitable defense to spoofing attacks. | |||
[RFC8767] describes "Serving Stale Data to Improve DNS Resiliency." | [RFC8767] describes "Serving Stale Data to Improve DNS Resiliency". | |||
This permits a recursive resolver to return possibly stale data when | This permits a recursive resolver to return possibly stale data when | |||
it is unable to refresh cached, expired data. It introduces the idea | it is unable to refresh cached, expired data. It introduces the idea | |||
of a failure recheck timer and says: "Attempts to refresh from non- | of a failure recheck timer and says: | |||
responsive or otherwise failing authoritative nameservers are | ||||
recommended to be done no more frequently than every 30 seconds." | | Attempts to refresh from non-responsive or otherwise failing | |||
 | | authoritative nameservers are recommended to be done no more | |||
 | | frequently than every 30 seconds. | |||
1.3. Terminology | 1.3. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
* DNS Transport: In this document, DNS transport means a protocol | DNS transport: In this document, "DNS transport" means a protocol | |||
used to transport DNS messages between a client and a server. | used to transport DNS messages between a client and a server. | |||
This includes "classic DNS" transports, i.e., DNS-over-UDP and | This includes "classic DNS" transports, i.e., DNS-over-UDP and | |||
DNS-over-TCP [RFC1034] [RFC7766], as well as newer encrypted DNS | DNS-over-TCP [RFC1034] [RFC7766], as well as newer encrypted DNS | |||
transports such as DNS-over-TLS [RFC7858], DNS-over-HTTPS | transports, such as DNS-over-TLS [RFC7858], DNS-over-HTTPS | |||
[RFC8484], DNS-over-QUIC [RFC9250], and similar communication of | [RFC8484], DNS-over-QUIC [RFC9250], and similar communication of | |||
DNS messages using other protocols. NOTE: at the time of this | DNS messages using other protocols. Note: at the time of writing, | |||
writing not all DNS transports are standardized for all types of | not all DNS transports are standardized for all types of servers | |||
servers, but may become standardized in the future. | but may become standardized in the future. | |||
2. Conditions That Lead to DNS Resolution Failures | 2. Conditions That Lead to DNS Resolution Failures | |||
A DNS resolution failure occurs when none of the servers available to | A DNS resolution failure occurs when none of the servers available to | |||
a resolver client provide any useful response data for a particular | a resolver client provide any useful response data for a particular | |||
query name, type, and class. A response is considered useful when it | query name, type, and class. A response is considered useful when it | |||
provides either the requested data, a referral to a descendant zone, | provides either the requested data, a referral to a descendant zone, | |||
or an indication that no data exists at the given name. | or an indication that no data exists at the given name. | |||
It is common for resolvers to have multiple servers from which to | It is common for resolvers to have multiple servers from which to | |||
choose for a particular query. For example, in the case of stub-to- | choose for a particular query. For example, in the case of stub-to- | |||
recursive, the stub resolver may be configured with multiple | recursive, the stub resolver may be configured with multiple | |||
recursive resolver addresses. In the case of recursive-to- | recursive resolver addresses. In the case of recursive-to- | |||
authoritative, a given zone usually has more than one name server (NS | authoritative, a given zone usually has more than one name server (NS | |||
record), each of which can have multiple IP addresses and multiple | record), each of which can have multiple IP addresses and multiple | |||
DNS transports. | DNS transports. | |||
Nothing in this document prevents a resolver from retrying a query at | Nothing in this document prevents a resolver from retrying a query at | |||
a different server, or the same server over a different DNS | a different server or the same server over a different DNS transport. | |||
transport. In the case of timeouts, a resolver can retry the same | In the case of timeouts, a resolver can retry the same server and DNS | |||
server and DNS transport a limited number of times. | transport a limited number of times. | |||
If any one of the available servers provides a useful response, then | If any one of the available servers provides a useful response, then | |||
it is not considered a resolution failure. However, if none of the | it is not considered a resolution failure. However, if none of the | |||
servers for a given query tuple <name, type, class> provide a useful | servers for a given query tuple <name, type, class> provide a useful | |||
response, the result is a resolution failure. | response, the result is a resolution failure. | |||
Note that NXDOMAIN and NOERROR/NODATA responses are not conditions | Note that NXDOMAIN and NOERROR/NODATA responses are not conditions | |||
for resolution failure. In these cases, the server is providing a | for resolution failure. In these cases, the server is providing a | |||
useful response, either indicating that a name does not exist, or | useful response, indicating either that a name does not exist or that | |||
that no data of the requested type exists at the name. These | no data of the requested type exists at the name. These negative | |||
negative responses can be cached as described in [RFC2308]. | responses can be cached as described in [RFC2308]. | |||
The remainder of this section describes a number of different | The remainder of this section describes a number of different | |||
conditions that can lead to resolution failure. This section is not | conditions that can lead to resolution failure. This section is not | |||
exhaustive. Additional conditions may be expected to cause similar | exhaustive. Additional conditions may be expected to cause similar | |||
resolution failures. | resolution failures. | |||
2.1. SERVFAIL Responses | 2.1. SERVFAIL Responses | |||
Server failure is defined in [RFC1035] as "The name server was unable | Server failure is defined in [RFC1035] as: "The name server was | |||
to process this query due to a problem with the name server." A | unable to process this query due to a problem with the name server." | |||
server failure is signaled by setting the RCODE field to SERVFAIL. | A server failure is signaled by setting the RCODE field to SERVFAIL. | |||
Authoritative servers return SERVFAIL when they don't have any valid | Authoritative servers return SERVFAIL when they don't have any valid | |||
data for a zone. For example, a secondary server has been configured | data for a zone. For example, a secondary server has been configured | |||
to serve a particular zone, but is unable to retrieve or refresh the | to serve a particular zone but is unable to retrieve or refresh the | |||
zone data from the primary server. | zone data from the primary server. | |||
Recursive servers return SERVFAIL in response to a number of | Recursive servers return SERVFAIL in response to a number of | |||
different conditions, including many described below. | different conditions, including many described below. | |||
Although the extended DNS errors method exists "primarily to extend | Although the extended DNS errors method exists "primarily to extend | |||
SERVFAIL to provide additional information," it "does not change the | SERVFAIL to provide additional information," it "does not change the | |||
processing of RCODEs" [RFC8914]. This document operates at the level | processing of RCODEs" [RFC8914]. This document operates at the level | |||
of resolution failure and does not concern particular causes. | of resolution failure and does not concern particular causes. | |||
2.2. REFUSED Responses | 2.2. REFUSED Responses | |||
A name server returns a message with the RCODE field set to REFUSED | A name server returns a message with the RCODE field set to REFUSED | |||
when it refuses to process the query, e.g., for policy or other | when it refuses to process the query, e.g., for policy or other | |||
reasons [RFC1035]. | reasons [RFC1035]. | |||
Authoritative servers generally return REFUSED when processing a | Authoritative servers generally return REFUSED when processing a | |||
query for which they are not authoritative. For example, a server | query for which they are not authoritative. For example, a server | |||
that is configured to be authoritative for only the example.net zone, | that is configured to be authoritative for only the example.net zone | |||
may return REFUSED in response to a query for example.com. | may return REFUSED in response to a query for example.com. | |||
Recursive servers generally return REFUSED for query sources that do | Recursive servers generally return REFUSED for query sources that do | |||
not match configured access control lists. For example, a server | not match configured access control lists. For example, a server | |||
that is configured to allow queries from only 2001:db8:1::/48 may | that is configured to allow queries from only 2001:db8:1::/48 may | |||
return REFUSED in response to a query from 2001:db8:5::1. | return REFUSED in response to a query from 2001:db8:5::1. | |||
2.3. Timeouts and Unreachable Servers | 2.3. Timeouts and Unreachable Servers | |||
A timeout occurs when a resolver fails to receive any response from a | A timeout occurs when a resolver fails to receive any response from a | |||
server within a reasonable amount of time. Additionally, a DNS | server within a reasonable amount of time. Additionally, a DNS | |||
transport may more quickly indicate lack of reachability in a way | transport may more quickly indicate lack of reachability in a way | |||
that wouldn't be considered a timeout. For example: an ICMP port | that wouldn't be considered a timeout: for example, an ICMP port | |||
unreachable message, a TCP "connection refused" error, or a TLS | unreachable message, a TCP "connection refused" error, or a TLS | |||
handshake failure. [RFC2308] refers to these conditions collectively | handshake failure. [RFC2308] refers to these conditions collectively | |||
as "dead / unreachable servers." | as "dead / unreachable servers". | |||
Note that resolver implementations may have two types of timeouts: a | Note that resolver implementations may have two types of timeouts: a | |||
smaller timeout which might trigger a query retry and a larger | smaller timeout that might trigger a query retry and a larger timeout | |||
timeout after which the server is considered unresponsive. | after which the server is considered unresponsive. Section 3.1 | |||
Section 3.1 discusses the requirements for resolvers when retrying | discusses the requirements for resolvers when retrying queries. | |||
queries. | ||||
Timeouts can present a particular problem for negative caching, | Timeouts can present a particular problem for negative caching, | |||
depending on how the resolver handles multiple, outstanding queries | depending on how the resolver handles multiple outstanding queries | |||
for the same <query name, type, class> tuple. For example, consider | for the same <query name, type, class> tuple. For example, consider | |||
a very popular website in a zone whose name servers are all | a very popular website in a zone whose name servers are all | |||
unresponsive. A recursive resolver might receive tens or hundreds of | unresponsive. A recursive resolver might receive tens or hundreds of | |||
queries per second for the popular website. If the recursive server | queries per second for that website. If the recursive server | |||
implementation "joins" these outstanding queries together, then it | implementation joins these outstanding queries together, then it only | |||
only sends one recursive-to-authoritative query for the numerous | sends one recursive-to-authoritative query for the numerous pending | |||
pending stub-to-recursive queries. If, however, the implementation | stub-to-recursive queries. However, if the implementation does not | |||
does not join outstanding queries together, then it sends one | join outstanding queries together, then it sends one recursive-to- | |||
recursive-to-authoritative query for each stub-to-recursive query. | authoritative query for each stub-to-recursive query. If the | |||
If the incoming query rate is high and the timeout is large, this | incoming query rate is high and the timeout is large, this might | |||
might result in hundreds or thousands of recursive-to-authoritative | result in hundreds or thousands of recursive-to-authoritative queries | |||
queries while waiting for an authoritative server to time out. | while waiting for an authoritative server to time out. | |||
A recursive resolver that does not join outstanding queries together | A recursive resolver that does not join outstanding queries together | |||
is more susceptible to birthday attacks ([RFC5452] Section 5), | is more susceptible to Birthday Attacks ([RFC5452], Section 5), | |||
especially when those queries result in timeouts. | especially when those queries result in timeouts. | |||
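
The effect of joining outstanding queries can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of any particular resolver: the class name JoiningResolver and its methods are hypothetical, and the model only counts recursive-to-authoritative queries instead of sending anything.

```python
from collections import defaultdict


class JoiningResolver:
    """Toy model that counts upstream queries while lookups are outstanding."""

    def __init__(self, join=True):
        self.join = join
        # (query name, type, class) -> client ids waiting for an answer
        self.pending = defaultdict(list)
        self.upstream_sent = 0

    def client_query(self, client_id, qname, qtype="A", qclass="IN"):
        key = (qname.lower(), qtype, qclass)
        already_outstanding = key in self.pending
        self.pending[key].append(client_id)
        # When joining, only the first query for a tuple goes upstream.
        if not (self.join and already_outstanding):
            self.upstream_sent += 1

    def upstream_done(self, qname, qtype="A", qclass="IN"):
        """Upstream answered or timed out: release every waiting client."""
        return self.pending.pop((qname.lower(), qtype, qclass), [])
```

With 100 stub-to-recursive queries arriving for the same tuple before the timeout fires, the joining variant records one upstream query and the non-joining variant records 100.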
2.4. Delegation Loops | 2.4. Delegation Loops | |||
A delegation loop, or cycle, can occur when one domain utilizes name | A delegation loop, or cycle, can occur when one domain utilizes name | |||
servers in a second domain, and the second domain uses name servers | servers in a second domain, and the second domain uses name servers | |||
in the first. For example: | in the first. For example: | |||
FOO.EXAMPLE. NS NS1.EXAMPLE.COM. | FOO.EXAMPLE. NS NS1.EXAMPLE.COM. | |||
FOO.EXAMPLE. NS NS2.EXAMPLE.COM. | FOO.EXAMPLE. NS NS2.EXAMPLE.COM. | |||
skipping to change at page 9, line 15 ¶ | skipping to change at line 373 ¶ | |||
In this example, no names under foo.example or example.com can be | In this example, no names under foo.example or example.com can be | |||
resolved because of the delegation loop. Note that a delegation loop | resolved because of the delegation loop. Note that a delegation loop | |||
may involve more than two domains. A resolver that does not detect | may involve more than two domains. A resolver that does not detect | |||
delegation loops may generate DDoS-levels of attack traffic to | delegation loops may generate DDoS-levels of attack traffic to | |||
authoritative name servers, as documented in the TsuNAME | authoritative name servers, as documented in the TsuNAME | |||
vulnerability [TsuNAME]. | vulnerability [TsuNAME]. | |||
2.5. Alias Loops | 2.5. Alias Loops | |||
An alias loop, or cycle, can occur when one CNAME or DNAME RR refers | An alias loop, or cycle, can occur when one CNAME or DNAME RR refers | |||
to a second name, which in turn is specified as an alias for the | to a second name, which, in turn, is specified as an alias for the | |||
first. For example: | first. For example: | |||
APP.FOO.EXAMPLE. CNAME APP.EXAMPLE.NET. | APP.FOO.EXAMPLE. CNAME APP.EXAMPLE.NET. | |||
APP.EXAMPLE.NET. CNAME APP.FOO.EXAMPLE. | APP.EXAMPLE.NET. CNAME APP.FOO.EXAMPLE. | |||
The need to detect CNAME loops has been known since at least | The need to detect CNAME loops has been known since at least | |||
[RFC1034] which states in Section 3.6.2: | [RFC1034], which states in Section 3.6.2: | |||
"Of course, by the robustness principle, domain software should not | | Of course, by the robustness principle, domain software should not | |||
fail when presented with CNAME chains or loops; CNAME chains should | | fail when presented with CNAME chains or loops; CNAME chains | |||
be followed and CNAME loops signaled as an error." | | should be followed and CNAME loops signalled as an error. | |||
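
One simple way to signal such a loop is to remember the names already visited while following a chain. The sketch below is illustrative only; follow_aliases, AliasLoopError, and the max_hops default of 16 are assumptions made for this example and are not taken from this document or from [RFC1034].

```python
class AliasLoopError(Exception):
    """Raised when a CNAME chain loops or exceeds the hop limit."""


def _norm(name):
    # Compare names case-insensitively and without the trailing dot.
    return name.lower().rstrip(".")


def follow_aliases(name, cname_records, max_hops=16):
    """Follow CNAME records from `name`; return the final target name."""
    aliases = {_norm(owner): _norm(target) for owner, target in cname_records.items()}
    seen = set()
    current = _norm(name)
    while current in aliases:
        if current in seen:
            raise AliasLoopError(f"alias loop detected at {current}")
        if len(seen) >= max_hops:
            raise AliasLoopError(f"alias chain longer than {max_hops} hops")
        seen.add(current)
        current = aliases[current]
    return current
```

Applied to the two records in the example above, the function raises AliasLoopError on the second visit to app.foo.example, which a resolver could then treat as a resolution failure.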
2.6. DNSSEC Validation Failures | 2.6. DNSSEC Validation Failures | |||
For zones that are signed with DNSSEC, a resolution failure can occur | For zones that are signed with DNSSEC, a resolution failure can occur | |||
when a security-aware resolver believes it should be able to | when a security-aware resolver believes it should be able to | |||
establish a chain-of-trust for an RRset but is unable to do so, | establish a chain of trust for an RRset but is unable to do so, | |||
possibly after trying multiple authoritative name servers. DNSSEC | possibly after trying multiple authoritative name servers. DNSSEC | |||
validation failures may be due to signature mismatch, missing DNSKEY | validation failures may be due to signature mismatch, missing DNSKEY | |||
RRs, problems with denial-of-existence records, clock skew, or other | RRs, problems with denial-of-existence records, clock skew, or other | |||
reasons. | reasons. | |||
Section 4.7 of [RFC4035] already discusses the requirements and | Section 4.7 of [RFC4035] already discusses the requirements and | |||
reasons for caching validation failures. Section 3.4 of this | reasons for caching validation failures. Section 3.4 of this | |||
document strengthens those requirements. | document strengthens those requirements. | |||
2.7. FORMERR Responses | 2.7. FORMERR Responses | |||
A name server returns a message with the RCODE field set to FORMERR | A name server returns a message with the RCODE field set to FORMERR | |||
when it is unable to interpret the query [RFC1035]. FORMERR | when it is unable to interpret the query [RFC1035]. FORMERR | |||
responses are often associated with problems processing EDNS(0) | responses are often associated with problems processing Extension | |||
Extensions [RFC6891]. Authoritative servers may return FORMERR when | Mechanisms for DNS (EDNS(0)) [RFC6891]. Authoritative servers may | |||
they do not implement EDNS(0), or when EDNS(0) option fields are | return FORMERR when they do not implement EDNS(0), or when EDNS(0) | |||
malformed, but not for unknown EDNS(0) options. | option fields are malformed, but not for unknown EDNS(0) options. | |||
Upon receipt of a FORMERR response, some recursive clients will retry | Upon receipt of a FORMERR response, some recursive clients will retry | |||
their queries without EDNS(0), while others will not. Nonetheless, | their queries without EDNS(0), while others will not. Nonetheless, | |||
resolution failures from FORMERR responses are rare. | resolution failures from FORMERR responses are rare. | |||
3. Requirements for Caching DNS Resolution Failures | 3. Requirements for Caching DNS Resolution Failures | |||
3.1. Retries and Timeouts | 3.1. Retries and Timeouts | |||
A resolver MUST NOT retry a given query to a server address over a | A resolver MUST NOT retry a given query to a server address over a | |||
given DNS transport more than twice (i.e., three queries in total) | given DNS transport more than twice (i.e., three queries in total) | |||
before considering the server address unresponsive over that DNS | before considering the server address unresponsive over that DNS | |||
transport for that query. | transport for that query. | |||
A resolver MAY retry a given query over a different DNS transport to | A resolver MAY retry a given query over a different DNS transport to | |||
the same server if it has reason to believe the DNS transport is | the same server if it has reason to believe the DNS transport is | |||
available for that server and is compatible with the resolver's | available for that server and is compatible with the resolver's | |||
security policies. | security policies. | |||
This document does not place any requirements on how long an | This document does not place any requirements on how long an | |||
implementation should wait before retrying a query (aka timeout | implementation should wait before retrying a query (aka a timeout | |||
value), which may be implementation- or configuration-dependent. It | value), which may be implementation or configuration dependent. It | |||
is generally expected that typical timeout values range from 3 to 30 | is generally expected that typical timeout values range from 3 to 30 | |||
seconds. | seconds. | |||
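
The retry limit can be sketched as a bounded loop. This is a minimal illustration: send_query is a hypothetical caller-supplied callable that returns a response or raises TimeoutError, and the 5-second default timeout is an arbitrary value chosen from within the typical range mentioned above.

```python
MAX_QUERIES_PER_ADDRESS = 3  # one initial query plus at most two retries


def query_with_retries(send_query, server_addr, transport, question, timeout=5.0):
    """Query one server address over one transport, at most three times.

    Returns the response, or None once the address is to be considered
    unresponsive over this transport for this query.
    """
    for _attempt in range(MAX_QUERIES_PER_ADDRESS):
        try:
            return send_query(server_addr, transport, question, timeout)
        except TimeoutError:
            continue  # retry, up to the limit
    return None
```

A caller that receives None might then try another server address or, where policy allows, a different DNS transport to the same server.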
3.2. Caching | 3.2. Caching | |||
Resolvers MUST implement a cache for resolution failures. The | Resolvers MUST implement a cache for resolution failures. The | |||
purpose of this cache is to eliminate repeated upstream queries that | purpose of this cache is to eliminate repeated upstream queries that | |||
cannot be resolved. When an incoming query matches a cached | cannot be resolved. When an incoming query matches a cached | |||
resolution failure, the resolver MUST NOT send any corresponding | resolution failure, the resolver MUST NOT send any corresponding | |||
outgoing queries until after the cache entries expire. | outgoing queries until after the cache entries expire. | |||
skipping to change at page 11, line 7 ¶ | skipping to change at line 457 ¶ | |||
implementation choices so that operators know what behaviors to | implementation choices so that operators know what behaviors to | |||
expect when resolution failures are cached. | expect when resolution failures are cached. | |||
Resolvers MUST cache resolution failures for at least 1 second. | Resolvers MUST cache resolution failures for at least 1 second. | |||
Resolvers MAY cache different types of resolution failures for | Resolvers MAY cache different types of resolution failures for | |||
different (i.e., longer) amounts of time. Consistent with [RFC2308], | different (i.e., longer) amounts of time. Consistent with [RFC2308], | |||
resolution failures MUST NOT be cached for longer than 5 minutes. | resolution failures MUST NOT be cached for longer than 5 minutes. | |||
The minimum cache duration SHOULD be configurable by the operator. A | The minimum cache duration SHOULD be configurable by the operator. A | |||
longer cache duration for resolution failures will reduce the | longer cache duration for resolution failures will reduce the | |||
processing burden from repeated queries, but may also increase the | processing burden from repeated queries but may also increase the | |||
time to recover from transitory issues. | time to recover from transitory issues. | |||
Resolvers SHOULD employ an exponential or linear backoff algorithm to | Resolvers SHOULD employ an exponential or linear backoff algorithm to | |||
increase the cache duration for persistent resolution failures. For | increase the cache duration for persistent resolution failures. For | |||
example, the initial time for negatively caching a resolution failure | example, the initial time for negatively caching a resolution failure | |||
might be set to 5 seconds, and increased after each retry that | might be set to 5 seconds and increased after each retry that results | |||
results in another resolution failure, up to a configurable maximum, | in another resolution failure, up to a configurable maximum, not to | |||
not to exceed the 5-minute upper limit. | exceed the 5-minute upper limit. | |||
Notwithstanding the above, resolvers SHOULD implement measures to | Notwithstanding the above, resolvers SHOULD implement measures to | |||
mitigate resource exhaustion attacks on the failed resolution cache. | mitigate resource exhaustion attacks on the failed resolution cache. | |||
That is, the resolver should limit the amount of memory and/or | That is, the resolver should limit the amount of memory and/or | |||
processing time devoted to this cache. | processing time devoted to this cache. | |||
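
The caching requirements above can be combined in a small sketch. It assumes a doubling backoff, a 5-second initial duration, and oldest-first eviction to bound memory; those choices, the FailureCache class, and its method names are illustrative assumptions, not requirements of this document.

```python
import time
from collections import OrderedDict

MIN_TTL = 1.0    # seconds; failures are cached for at least this long
MAX_TTL = 300.0  # seconds; the 5-minute upper limit


class FailureCache:
    """Bounded cache of resolution failures with exponential backoff."""

    def __init__(self, initial_ttl=5.0, max_ttl=MAX_TTL, max_entries=10000,
                 clock=time.monotonic):
        self.initial_ttl = min(max(MIN_TTL, initial_ttl), MAX_TTL)
        self.max_ttl = min(MAX_TTL, max(max_ttl, self.initial_ttl))
        self.max_entries = max_entries
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, ttl_used)

    def record_failure(self, key):
        """Cache a failure; double the duration if this key failed before."""
        previous = self._entries.pop(key, None)
        if previous is None:
            ttl = self.initial_ttl
        else:
            ttl = min(previous[1] * 2, self.max_ttl)
        self._entries[key] = (self.clock() + ttl, ttl)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry
        return ttl

    def is_blocked(self, key):
        """True while a cached failure forbids sending upstream queries."""
        entry = self._entries.get(key)
        return entry is not None and self.clock() < entry[0]

    def record_success(self, key):
        """Forget the backoff state once resolution succeeds."""
        self._entries.pop(key, None)
```

In this sketch, expired entries are kept until eviction or success so that a retry that fails again can build on the previous duration. A linear backoff would replace the doubling step with an addition.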
3.3. Requerying Delegation Information | 3.3. Requerying Delegation Information | |||
Section 2.1 of [RFC4697] identifies circumstances in which "every | Section 2.1 of [RFC4697] identifies circumstances in which: | |||
name server in a zone's NS RRSet is unreachable (e.g., during a | ||||
network outage), unavailable (e.g., the name server process is not | | ...every name server in a zone's NS RRSet is unreachable (e.g., | |||
running on the server host), or misconfigured (e.g., the name server | | during a network outage), unavailable (e.g., the name server | |||
is not authoritative for the given zone, also known as 'lame')." It | | process is not running on the server host), or misconfigured | |||
prohibits unnecessary "aggressive requerying" to the parent of a non- | | (e.g., the name server is not authoritative for the given zone, | |||
responsive zone by sending NS queries. | | also known as "lame"). | |||
It prohibits unnecessary "aggressive requerying" to the parent of a | ||||
non-responsive zone by sending NS queries. | ||||
The problem of aggressive requerying to parent zones is not limited | The problem of aggressive requerying to parent zones is not limited | |||
to queries of type NS. This document updates the requirement from | to queries of type NS. This document updates the requirement from | |||
section 2.1.1 of [RFC4697] to apply more generally: Upon encountering | Section 2.1.1 of [RFC4697] to apply more generally: | |||
a zone whose name servers are all non-responsive, a resolver MUST | ||||
cache the resolution failure. Furthermore, the resolver MUST limit | | Upon encountering a zone whose name servers are all non- | |||
queries to the non-responsive zone's parent zone (and to other | | responsive, a resolver MUST cache the resolution failure. | |||
ancestor zones) just as it would limit subsequent queries to the non- | | Furthermore, the resolver MUST limit queries to the non-responsive | |||
responsive zone. | | zone's parent zone (and to other ancestor zones) just as it would | |||
| limit subsequent queries to the non-responsive zone. | ||||
3.4. DNSSEC Validation Failures | 3.4. DNSSEC Validation Failures | |||
Section 4.7 of [RFC4035] states: | Section 4.7 of [RFC4035] states: | |||
To prevent such unnecessary DNS traffic, security-aware resolvers MAY | | To prevent such unnecessary DNS traffic, security-aware resolvers | |||
cache data with invalid signatures, with some restrictions. | | MAY cache data with invalid signatures, with some restrictions. | |||
This document updates [RFC4035] with the following, stronger | This document updates [RFC4035] with the following, stronger, | |||
requirement: | requirement: | |||
To prevent such unnecessary DNS traffic, security-aware resolvers | | To prevent such unnecessary DNS traffic, security-aware resolvers | |||
MUST cache DNSSEC validation failures, with some restrictions. | | MUST cache DNSSEC validation failures, with some restrictions. | |||
One of the restrictions mentioned in [RFC4035] is to use a small TTL | One of the restrictions mentioned in [RFC4035] is to use a small TTL | |||
when caching data that fails DNSSEC validation. This is, in part, | when caching data that fails DNSSEC validation. This is, in part, | |||
because the provided TTL cannot be trusted. The advice from | because the provided TTL cannot be trusted. The advice from | |||
Section 3.2 herein can be used as guidance on TTLs for caching DNSSEC | Section 3.2 herein can be used as guidance on TTLs for caching DNSSEC | |||
validation failures. | validation failures. | |||
4. IANA Considerations | 4. IANA Considerations | |||
This document has no IANA actions. | This document has no IANA actions. | |||
5. Security Considerations | 5. Security Considerations | |||
As noted in Section 3.2, an attacker might attempt a resource | As noted in Section 3.2, an attacker might attempt a resource | |||
exhaustion attack by sending queries for a large number of names and/ | exhaustion attack by sending queries for a large number of names and/ | |||
or types that result in resolution failure. Resolvers SHOULD | or types that result in resolution failure. Resolvers SHOULD | |||
implement measures to protect themselves and bound the amount of | implement measures to protect themselves and bound the amount of | |||
memory devoted to caching resolution failures. | memory devoted to caching resolution failures. | |||
A cache poisoning attack (see section 2.2 of [RFC7873]) resulting in | A cache poisoning attack (see Section 2.2 of [RFC7873]) resulting in | |||
denial of service may be possible because failure messages cannot be | denial of service may be possible because failure messages cannot be | |||
signed. An attacker might generate queries and send forged failure | signed. An attacker might generate queries and send forged failure | |||
messages, causing the resolver to cease sending queries to the | messages, causing the resolver to cease sending queries to the | |||
authoritative name server (see 2.6 of [RFC4732] for a similar "data | authoritative name server (see Section 2.6 of [RFC4732] for a similar | |||
corruption attack"). However, this would require continued spoofing | "data corruption attack" and Section 5.2 of [TuDoor] for a "DNSDoS | |||
throughout the backoff period and required attacks due to the 5 | attack"). However, this would require continued spoofing throughout | |||
minute cache limit. As in section 4.1.12 of [RFC4686], this attack's | the backoff period and repeated attacks due to the 5-minute cache | |||
effects would be "localized and of limited duration." | limit. As in Section 4.1.12 of [RFC4686], this attack's effects | |||
would be "localized and of limited duration". | ||||
6. Privacy Considerations | 6. Privacy Considerations | |||
This specification has no impact on user privacy. | This specification has no impact on user privacy. | |||
7. Acknowledgments | 7. References | |||
The authors wish to thank Mukund Sivaraman, Petr Spacek, Peter van | ||||
Dijk, Tim Wicinski, Joe Abley, Evan Hunt, Barry Leiba, Lucas Pardue, | ||||
Paul Wouters, and other members of the DNSOP working group for their | ||||
feedback and contributions. | ||||
8. Change Log | ||||
RFC Editor: Please remove this section before publication. | ||||
This section lists substantial changes to the document as it is being | ||||
worked on. | ||||
From -00 to -01: | ||||
* use phrase "the initial TTL for negatively caching a resolution | ||||
failure" instead of "negative cache TTL" | ||||
* typos, etc | ||||
From dwmtwc-01 to ietf-00: | ||||
* Adopted by WG | ||||
From -00 to -01: | ||||
* Clarify retries and timeouts to apply on a per-query basis. | ||||
* Say more about the 5 second caching requirement in TTLs section. | ||||
* Expanded opening paragraphs of section 2, now titled "Conditions | ||||
That Lead To DNS Resolution Failures". | ||||
* Text from the former section 3.3 ("Scope") moved to top of section | ||||
2. | ||||
* Section 3.2 was formerly "TTLs" and is now "Caching". The draft | ||||
no longer requires e.g. caching by tuples, but now just requires | ||||
caching failures so that repeated queries are not sent out. | ||||
* State that resolvers should protect themselves from cache resource | ||||
exhaustion attacks. | ||||
From -01 to -02: | ||||
* Added cache poisoning attack to Security Considerations. | ||||
From -02 to -03: | ||||
* Added missing reference to Verisign blog post. | ||||
From -03 to -04: | ||||
* Address most of Peter van Dijk's DNS Directorate review comments. | ||||
* Removed "For Discussion" section from introduction referencing | ||||
apparent inconsistent RFC2119 keyword use in RFC2308. | ||||
* Replaced "For Discussion" section from "Requerying Delegation | ||||
Information" to generalize RFC 4697 requirements not to requery | ||||
parent zones to cover all query types. | ||||
* Replaced "For Discussion" section from "DNSSEC Validation | ||||
Failures" to strengthen RFC 4035 to require caching of DNSSEC | ||||
validation failures. | ||||
* Added RFC 4035 and RFC 4697 to updated RFCs list. | ||||
* Added (empty) Implementation Status section. | ||||
From -04 to -05: | ||||
* Expanded abstract to include updates to RFCs 4035 and 4697. | ||||
* Removed reference to unused terms from RFC 8126. | ||||
* Reworded "server transport" to "a server address over a given | ||||
transport". | ||||
* Added explanatory text in "Server Failure" section for exclusion | ||||
of extended DNS errors | ||||
* Changed "Timeouts" section to "Timeouts and Unreachable Servers" | ||||
and added reference to transport layer indicators from RFC 2308. | ||||
* Clarified meaning of "timeout value". | ||||
From -05 to -06: | ||||
* Changed minimum 5 second caching to 1 second, with other changes | ||||
to give implementors and operators more leeway. | ||||
* Changed "exponential backoff" to more general concept of | ||||
increasing backoff. | ||||
* Added some implementation status notes for BIND, from dnsop list | ||||
email. | ||||
From -06 to -07: | ||||
* Artart review: minor editorial clarifications | ||||
* Genart review: remove confusing and superfluous section | ||||
references. | ||||
* Genart review: clarify resolution failure caching time range. | ||||
* Genart review: better define DNS transports | ||||
* Dnsdir review: clarify FORMERR response retries. | ||||
From -07 to -08: | ||||
* "only exacerbated" -> "further exacerbated" | ||||
* lowercase IPv6 addresses | ||||
* lowercase example domain in text | ||||
* updated introduction to include all updated RFCs | ||||
* change 3.2 SHOULD to should | ||||
* section 3.4: say a little about "some restrictions" from RFC 4035 | ||||
* Intdir telechat review: a few grammatical nits | ||||
* Various IESG reviewer suggestions | ||||
9. Implementation Status | ||||
RFC Editor: Please remove this section before publication. | ||||
This section records the status of known implementations of the | ||||
protocol defined by this specification at the time of posting of this | ||||
Internet-Draft, and is based on a proposal described in RFC 7942. | ||||
The description of implementations in this section is intended to | ||||
assist the IETF in its decision processes in progressing drafts to | ||||
RFCs. Please note that the listing of any individual implementation | ||||
here does not imply endorsement by the IETF. Furthermore, no effort | ||||
has been spent to verify the information presented here that was | ||||
supplied by IETF contributors. This is not intended as, and must not | ||||
be construed to be, a catalog of available implementations or their | ||||
features. Readers are advised to note that other implementations may | ||||
exist. | ||||
9.1. BIND | ||||
The following is excerpted from a message to the dnsop mailing list | ||||
regarding how BIND caches resolution failures: | ||||
BIND implemented a SERVFAIL cache in 2014 with a default cache | ||||
duration of 10 seconds; after a slew of complaints, in 2015 we | ||||
lowered it to 1 second, and also reduced the configurable maximum | ||||
from 5 minutes to 30 seconds. The reason was that certain common | ||||
failure conditions are transitory, and it's not unreasonable to | ||||
prioritize rapid recovery. | ||||
Now, to be clear, the comparison isn't exactly apples to apples: the | ||||
BIND SERVFAIL cache is a somewhat stupider mechanism than the one | ||||
outlined in the draft. It caches *all* SERVFAIL responses, | ||||
regardless of the reason they were generated. For example: when the | ||||
cache is cold, a query may time out or hit DDoS mitigation limits | ||||
before it's finished getting through the whole iteration process; an | ||||
immediate retry would start further along the delegation chain and | ||||
would succeed. Such problems weren't noticeable until we implemented | ||||
the 10-second cache, but became very noticeable afterward. | ||||
If we were able to selectively cache *only* those SERVFAILs that are | ||||
unlikely to recover soon, then five seconds might indeed be a good | ||||
starting point. But, with our relatively dumb cache, we found that | ||||
one second did a fairly good job reducing the processing burden from | ||||
repeated queries, and eliminated the user complaints about the | ||||
resolver taking forever to recover from short-lived problems. It's | ||||
been working well enough that it hasn't been a priority to develop a | ||||
more complex failure cache. | ||||
10. References | ||||
10.1. Normative References | 7.1. Normative References | |||
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", | |||
STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, | STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, | |||
<https://www.rfc-editor.org/info/rfc1034>. | <https://www.rfc-editor.org/info/rfc1034>. | |||
[RFC1035] Mockapetris, P., "Domain names - implementation and | [RFC1035] Mockapetris, P., "Domain names - implementation and | |||
specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, | specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, | |||
November 1987, <https://www.rfc-editor.org/info/rfc1035>. | November 1987, <https://www.rfc-editor.org/info/rfc1035>. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
skipping to change at page 17, line 9 ¶ | skipping to change at line 575 ¶ | |||
<https://www.rfc-editor.org/info/rfc4035>. | <https://www.rfc-editor.org/info/rfc4035>. | |||
[RFC4697] Larson, M. and P. Barber, "Observed DNS Resolution | [RFC4697] Larson, M. and P. Barber, "Observed DNS Resolution | |||
Misbehavior", BCP 123, RFC 4697, DOI 10.17487/RFC4697, | Misbehavior", BCP 123, RFC 4697, DOI 10.17487/RFC4697, | |||
October 2006, <https://www.rfc-editor.org/info/rfc4697>. | October 2006, <https://www.rfc-editor.org/info/rfc4697>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
10.2. Informative References | 7.2. Informative References | |||
[botnet] Wessels, D. and M. Thomas, "Botnet Traffic Observed at | [BOTNET] Wessels, D. and M. Thomas, "Botnet Traffic Observed at | |||
Various Levels of the DNS Hierarchy", May 2021, | Various Levels of the DNS Hierarchy", May 2021, | |||
<https://indico.dns-oarc.net/event/38/contributions/841/>. | <https://indico.dns-oarc.net/event/38/contributions/841/>. | |||
[dyn-attack] | [DNSSEC-ROLLOVER] | |||
Sullivan, A., "Dyn, DDoS, and DNS", March 2017, | Michaleson, G., Wallström, P., Arends, R., and G. Huston, | |||
<https://ccnso.icann.org/sites/default/files/file/field- | "Roll Over and Die?", February 2010, | |||
file-attach/2017-04/presentation-oracle-dyn-ddos-dns- | <https://www.potaroo.net/ispcol/2010-02/rollover.html>. | |||
13mar17-en.pdf>. | ||||
[fb-outage] | [FB-OUTAGE] | |||
Janardhan, S., "More details about the October 4 outage", | Janardhan, S., "More details about the October 4 outage", | |||
October 2021, <https://engineering.fb.com/2021/10/05/ | October 2021, <https://engineering.fb.com/2021/10/05/ | |||
networking-traffic/outage-details/>. | networking-traffic/outage-details/>. | |||
[fb-outage-verisign] | [KSK-ROLLOVER] | |||
Müller, M., Thomas, M., Wessels, D., Hardaker, W., Chung, | ||||
T., Toorop, W., and R. van Rijswijk-Deij, "Roll, Roll, | ||||
Roll Your Root: A Comprehensive Analysis of the First Ever | ||||
DNSSEC Root KSK Rollover", IMC '19: Proceedings of the | ||||
Internet Measurement Conference, Pages 1-14, | ||||
DOI 10.1145/3355369.3355570, October 2019, | ||||
<https://doi.org/10.1145/3355369.3355570>. | ||||
[OUTAGE-RESOLVER] | ||||
Verisign, "Observations on Resolver Behavior During DNS | Verisign, "Observations on Resolver Behavior During DNS | |||
Outages", 20 January 2022, | Outages", January 2022, | |||
<https://blog.verisign.com/security/facebook-dns-outage/>. | <https://blog.verisign.com/security/facebook-dns-outage/>. | |||
[RETRY-STORM] | ||||
Sullivan, A., "Dyn, DDoS, and DNS", March 2017, | ||||
<https://ccnso.icann.org/sites/default/files/file/field- | ||||
file-attach/2017-04/presentation-oracle-dyn-ddos-dns- | ||||
13mar17-en.pdf>. | ||||
[RFC0882] Mockapetris, P., "Domain names: Concepts and facilities", | [RFC0882] Mockapetris, P., "Domain names: Concepts and facilities", | |||
RFC 882, DOI 10.17487/RFC0882, November 1983, | RFC 882, DOI 10.17487/RFC0882, November 1983, | |||
<https://www.rfc-editor.org/info/rfc882>. | <https://www.rfc-editor.org/info/rfc882>. | |||
[RFC0883] Mockapetris, P., "Domain names: Implementation | [RFC0883] Mockapetris, P., "Domain names: Implementation | |||
specification", RFC 883, DOI 10.17487/RFC0883, November | specification", RFC 883, DOI 10.17487/RFC0883, November | |||
1983, <https://www.rfc-editor.org/info/rfc883>. | 1983, <https://www.rfc-editor.org/info/rfc883>. | |||
[RFC4686] Fenton, J., "Analysis of Threats Motivating DomainKeys | [RFC4686] Fenton, J., "Analysis of Threats Motivating DomainKeys | |||
Identified Mail (DKIM)", RFC 4686, DOI 10.17487/RFC4686, | Identified Mail (DKIM)", RFC 4686, DOI 10.17487/RFC4686, | |||
skipping to change at page 18, line 43 ¶ | skipping to change at line 671 ¶ | |||
[RFC8914] Kumari, W., Hunt, E., Arends, R., Hardaker, W., and D. | [RFC8914] Kumari, W., Hunt, E., Arends, R., Hardaker, W., and D. | |||
Lawrence, "Extended DNS Errors", RFC 8914, | Lawrence, "Extended DNS Errors", RFC 8914, | |||
DOI 10.17487/RFC8914, October 2020, | DOI 10.17487/RFC8914, October 2020, | |||
<https://www.rfc-editor.org/info/rfc8914>. | <https://www.rfc-editor.org/info/rfc8914>. | |||
[RFC9250] Huitema, C., Dickinson, S., and A. Mankin, "DNS over | [RFC9250] Huitema, C., Dickinson, S., and A. Mankin, "DNS over | |||
Dedicated QUIC Connections", RFC 9250, | Dedicated QUIC Connections", RFC 9250, | |||
DOI 10.17487/RFC9250, May 2022, | DOI 10.17487/RFC9250, May 2022, | |||
<https://www.rfc-editor.org/info/rfc9250>. | <https://www.rfc-editor.org/info/rfc9250>. | |||
[roll-over-and-die] | [THUNDERING-HERD] | |||
Michaelson, G., Wallström, P., Arends, R., and G. Huston, | Sivaraman, M. and C. Liu, "The DNS thundering herd | |||
"Roll Over and Die?", February 2010, | problem", Work in Progress, Internet-Draft, draft-muks- | |||
<https://www.potaroo.net/ispcol/2010-02/rollover.html>. | dnsop-dns-thundering-herd-00, 25 June 2020, | |||
<https://datatracker.ietf.org/doc/html/draft-muks-dnsop- | ||||
[root-ksk-roll] | dns-thundering-herd-00>. | |||
Müller, M., Thomas, M., Wessels, D., Hardaker, W., Chung, | ||||
T., Toorop, W., and R.v. Rijswijk-Deij, "Roll, Roll, Roll | ||||
Your Root: A Comprehensive Analysis of the First Ever | ||||
DNSSEC Root KSK Rollover", October 2019, | ||||
<https://dl.acm.org/doi/10.1145/3355369.3355570>. | ||||
[thundering-herd] | ||||
Sivaraman, M. and C. Liu, "The DNS thundering herd problem | ||||
(expired Internet-Draft)", June 2020, | ||||
<https://datatracker.ietf.org/doc/draft-muks-dnsop-dns- | ||||
thundering-herd/>. | ||||
[TsuNAME] Moura, G. C. M., Castro, S., Heidemann, J., and W. | [TsuNAME] Moura, G. C. M., Castro, S., Heidemann, J., and W. | |||
Hardaker, "TsuNAME: exploiting misconfiguration and | Hardaker, "TsuNAME: exploiting misconfiguration and | |||
vulnerability to DDoS DNS", November 2021, | vulnerability to DDoS DNS", IMC '21: Proceedings of the | |||
<https://dl.acm.org/doi/10.1145/3487552.3487824>. | 21st ACM Internet Measurement Conference, Pages 398-418, | |||
DOI 10.1145/3487552.3487824, November 2021, | ||||
<https://doi.org/10.1145/3487552.3487824>. | ||||
[TuDoor] Li, X., Xu, W., Liu, B., Zhang, M., Li, Z., Zhang, J., | ||||
Chang, D., Zheng, X., Wang, C., Chen, J., Duan, H., and Q. | ||||
Li, "TuDoor Attack: Systematically Exploring and | ||||
Exploiting Logic Vulnerabilities in DNS Response Pre- | ||||
processing with Malformed Packets", IEEE Symposium on | ||||
Security and Privacy (SP), DOI 10.1109/SP54263.2024.00046, | ||||
2024, <https://doi.ieeecomputersociety.org/10.1109/ | ||||
SP54263.2024.00046>. | ||||
Acknowledgments | ||||
The authors wish to thank Mukund Sivaraman, Petr Spacek, Peter van | ||||
Dijk, Tim Wicinski, Joe Abley, Evan Hunt, Barry Leiba, Lucas Pardue, | ||||
Paul Wouters, and other members of the DNSOP Working Group for their | ||||
feedback and contributions. | ||||
Authors' Addresses | Authors' Addresses | |||
Duane Wessels | Duane Wessels | |||
Verisign | Verisign | |||
12061 Bluemont Way | 12061 Bluemont Way | |||
Reston, VA 20190 | Reston, VA 20190 | |||
United States of America | United States of America | |||
Phone: +1 703 948-3200 | Phone: +1 703 948-3200 | |||
Email: dwessels@verisign.com | Email: dwessels@verisign.com | |||
End of changes. 73 change blocks. | ||||
393 lines changed or deleted | 240 lines changed or added | |||