IETF
Recommendations Regarding Active Queue Management
Cisco Systems
Santa Barbara
93117
California
USA
fred@cisco.com
Internet Engineering Task Force
This memo presents recommendations to the Internet community
concerning measures to improve and preserve Internet performance. It
presents a strong recommendation for testing, standardization, and
widespread deployment of active queue management in routers, to improve
the performance of today's Internet. It also urges a concerted effort of
research, measurement, and ultimate deployment of router mechanisms to
protect the Internet from flows that are not sufficiently responsive to
congestion notification.
The note largely repeats the recommendations of RFC 2309, updated after fifteen years of experience and new research.
The Internet protocol architecture is based on a connectionless end-
to-end packet service using the Internet Protocol, whether IPv4 or IPv6. The
advantages of its connectionless design, flexibility and robustness,
have been amply demonstrated. However, these advantages are not without
cost: careful design is required to provide good service under heavy
load. In fact, lack of attention to the dynamics of packet forwarding
can result in severe service degradation or "Internet meltdown". This
phenomenon was first observed during the early growth phase of the
Internet in the mid-1980s, and is technically called "congestive
collapse".
The original fix for Internet meltdown was provided by Van Jacobson.
Beginning in 1986, Jacobson developed the congestion avoidance
mechanisms that are now required in TCP implementations. These
mechanisms operate in the hosts to cause TCP connections to "back off"
during congestion. We say that TCP flows are "responsive" to congestion
signals (i.e., marked or dropped packets) from the network. It is
primarily these TCP congestion avoidance algorithms that prevent the
congestive collapse of today's Internet.
However, that is not the end of the story. Considerable research has
been done on Internet dynamics since 1988, and the Internet has grown.
It has become clear that the TCP congestion
avoidance mechanisms, while necessary and powerful, are not
sufficient to provide good service in all circumstances. Basically,
there is a limit to how much control can be accomplished from the edges
of the network. Some mechanisms are needed in the routers to complement
the endpoint congestion avoidance mechanisms.
It is useful to distinguish between two classes of router algorithms
related to congestion control: "queue management" versus "scheduling"
algorithms. To a rough approximation, queue management algorithms manage
the length of packet queues by marking or dropping packets when
necessary or appropriate, while scheduling algorithms determine which
packet to send next and are used primarily to manage the allocation of
bandwidth among flows. While these two router mechanisms are closely
related, they address rather different performance issues.
This memo highlights two performance issues. The first issue is the
need for an advanced form of queue management that we call "active queue
management." This memo summarizes the benefits
that active queue management can bring. A number of Active Queue
Management procedures are described in the literature, with different
characteristics. This document does not recommend any of them in
particular, but does make recommendations that ideally would affect the
choice of procedure used in a given implementation.
The second issue, discussed later in
this memo, is the potential for future congestive collapse of the
Internet due to flows that are unresponsive, or not sufficiently
responsive, to congestion indications. Unfortunately, there is no
consensus solution to controlling congestion caused by such aggressive
flows; significant research and engineering will be required before any
solution will be available. It is imperative that this work be
energetically pursued, to ensure the future stability of the
Internet.
The memo concludes with a set of
recommendations to the Internet community concerning these topics.
The discussion in this memo applies to "best-effort" traffic, which
is to say, traffic generated by applications that accept the occasional
loss, duplication, or reordering of traffic in flight. It is most
effective, on time scales of a single RTT or a small number of RTTs, for
elastic traffic, but also impacts real-time
traffic generated by adaptive applications.
RFC 2309 resulted from past discussions of
end-to-end performance, Internet congestion, and RED in the End-to-End
Research Group of the Internet Research Task Force (IRTF). This update
results from experience with that and other algorithms, and the Active
Queue Management discussion within the IETF.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
The traditional technique for managing router queue lengths is to set
a maximum length (in terms of packets) for each queue, accept packets
for the queue until the maximum length is reached, then reject (drop)
subsequent incoming packets until the queue decreases because a packet
from the queue has been transmitted. This technique is known as "tail
drop", since the packet that arrived most recently (i.e., the one on the
tail of the queue) is dropped when the queue is full. This method has
served the Internet well for years, but it has two important drawbacks.
Lock-Out: In some situations, tail drop
allows a single connection or a few flows to monopolize queue space,
preventing other connections from getting room in the queue. This
"lock-out" phenomenon is often the result of synchronization or
other timing effects.
Full Queues: The tail drop discipline
allows queues to maintain a full (or, almost full) status for long
periods of time, since tail drop signals congestion (via a packet
drop) only when the queue has become full. It is important to reduce
the steady-state queue size, and this is perhaps queue management's
most important goal. The naive assumption
might be that there is a simple tradeoff between delay and
throughput, and that the recommendation that queues be maintained in
a "non-full" state essentially translates to a recommendation that
low end-to-end delay is more important than high throughput.
However, this does not take into account the critical role that
packet bursts play in Internet performance. Even though TCP
constrains a flow's window size, packets often arrive at routers in
bursts. If the queue is full or
almost full, an arriving burst will cause multiple packets to be
dropped. This can result in a global synchronization of flows
throttling back, followed by a sustained period of lowered link
utilization, reducing overall throughput.
The point of buffering in the network is to absorb data bursts and
to transmit them during the (hopefully) ensuing bursts of silence.
This is essential to permit the transmission of bursty data. It
should be clear why we would like to have normally-small queues in
routers: we want to have queue capacity to absorb the bursts. The
counter-intuitive result is that maintaining normally-small queues
can result in higher throughput as well as lower end-to-end delay.
In short, queue limits should not reflect the steady state queues we
want maintained in the network; instead, they should reflect the
size of bursts we need to absorb.
Besides tail drop, two alternative queue disciplines that can be
applied when the queue becomes full are "random drop on full" or "drop
front on full". Under the random drop on full discipline, a router drops
a randomly selected packet from the queue (which can be an expensive
operation, since it naively requires an O(N) walk through the packet
queue) when the queue is full and a new packet arrives. Under the "drop
front on full" discipline, the router
drops the packet at the front of the queue when the queue is full and a
new packet arrives. Both of these solve the lock-out problem, but
neither solves the full-queues problem described above.
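The three on-full disciplines described above can be compared in a minimal sketch. The function name `enqueue_on_full` and the discipline labels are hypothetical, chosen only for illustration; a real router would operate on hardware queues rather than a Python `deque`. Note that in all three cases a packet is lost only once the queue is already full, which is why none of them addresses the full-queues problem.

```python
import random
from collections import deque


def enqueue_on_full(queue, packet, limit, discipline, rng=random):
    """Add packet to a bounded FIFO; return the dropped packet, or None."""
    if len(queue) < limit:
        queue.append(packet)
        return None
    if discipline == "tail_drop":
        return packet                      # reject the arriving packet
    if discipline == "drop_front":
        victim = queue.popleft()           # discard the oldest queued packet
    elif discipline == "random_drop":
        index = rng.randrange(len(queue))  # pick a queued packet at random
        victim = queue[index]
        del queue[index]                   # O(N) removal from the middle
    else:
        raise ValueError(f"unknown discipline: {discipline}")
    queue.append(packet)                   # room now exists for the arrival
    return victim
```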
We know in general how to solve the full-queues problem for
"responsive" flows, i.e., those flows that throttle back in response to
congestion notification. In the current Internet, dropped packets serve
as a critical mechanism of congestion notification to end nodes. The
solution to the full-queues problem is for routers to drop packets
before a queue becomes full, so that end nodes can respond to congestion
before buffers overflow. We call such a proactive approach "active queue
management". By dropping packets before buffers overflow, active queue
management allows routers to control when and how many packets to
drop.
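As an illustration of dropping before the buffer overflows, the following sketch tracks a smoothed queue length and drops arrivals with a probability that rises as that average grows. It is loosely in the spirit of early-drop schemes such as RED, but it is not an algorithm this memo recommends; the class name, the thresholds `min_th` and `max_th`, and the defaults for `max_p` and `weight` are assumptions made for the example, and such fixed parameters are exactly the kind of tuning a deployable algorithm would try to avoid.

```python
import random
from collections import deque


class EarlyDropQueue:
    """Toy AQM queue: signals congestion before the buffer is full."""

    def __init__(self, limit, min_th, max_th, max_p=0.1, weight=0.2, rng=None):
        self.limit = limit      # hard buffer limit, sized to absorb bursts
        self.min_th = min_th    # below this average, never drop early
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # early-drop probability just below max_th
        self.weight = weight    # smoothing factor for the average
        self.avg = 0.0          # smoothed queue length, in packets
        self.queue = deque()
        self.rng = rng or random.Random()

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        span = self.max_th - self.min_th
        return self.max_p * (self.avg - self.min_th) / span

    def enqueue(self, packet):
        """Return True if the packet was queued, False if it was dropped."""
        # Update the moving average from the instantaneous queue length.
        self.avg += self.weight * (len(self.queue) - self.avg)
        if len(self.queue) >= self.limit:
            return False        # buffer overflow: forced drop
        if self.rng.random() < self.drop_probability():
            return False        # early drop, before the buffer is full
        self.queue.append(packet)
        return True

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```

Because the decision is driven by the average rather than the instantaneous length, a short burst arriving at a mostly empty queue is absorbed, while a persistently long queue produces drops well before `limit` is reached.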
In summary, an active queue management mechanism can provide the
following advantages for responsive flows.
Reduce number of packets dropped in routers: Packet bursts are an
unavoidable aspect of packet networks. If all the queue space
in a router is already committed to "steady state" traffic or if the
buffer space is inadequate, then the router will have no ability to
buffer bursts. By keeping the average queue size small, active queue
management will provide greater capacity to absorb naturally occurring
bursts without dropping packets.
Furthermore, without active queue management, more packets will be
dropped when a queue does overflow. This is undesirable for several
reasons. First, with a shared queue and the tail drop discipline, an
unnecessary global synchronization of flows cutting back can result
in lowered average link utilization, and hence lowered network
throughput. Second, TCP recovers with more difficulty from a burst
of packet drops than from a single packet drop. Third, unnecessary
packet drops represent a possible waste of bandwidth on the way to
the drop point. We note that while Active
Queue Management can manage queue lengths and reduce end-to-end
latency even in the absence of end-to-end congestion control, Active
Queue Management will be able to reduce packet dropping only in an
environment that continues to be dominated by end-to-end congestion
control.
Provide lower-delay interactive service:
By keeping the average queue size small, queue management will
reduce the delays seen by flows. This is particularly important for
interactive applications such as short Web transfers, Telnet
traffic, or interactive audio-video sessions, whose subjective (and
objective) performance is better when the end-to-end delay is
low.
Avoid lock-out behavior: Active queue
management can prevent lock-out behavior by ensuring that there will
almost always be a buffer available for an incoming packet. For the
same reason, active queue management can prevent a router bias
against low bandwidth but highly bursty flows. It is clear that lock-out is undesirable because
it constitutes a gross unfairness among groups of flows. However, we
stop short of calling this benefit "increased fairness", because
general fairness among flows requires per-flow state, which is not
provided by queue management. For example, in a router using queue
management but only FIFO scheduling, two TCP flows may receive very
different bandwidths simply because they have different round-trip
times, and a flow that does not use
congestion control may receive more bandwidth than a flow that does.
Per-flow state to achieve general fairness might be maintained by a
per-flow scheduling algorithm such as Fair Queueing (FQ), or a class-based scheduling algorithm
such as CBQ, for example. On the other hand, active queue management is
needed even for routers that use per-flow scheduling algorithms such
as FQ or class-based scheduling algorithms such as CBQ. This is
because per-flow scheduling algorithms by themselves do nothing to
control the overall queue size or the size of individual queues.
Active queue management is needed to control the overall average
queue sizes, so that arriving bursts can be accommodated without
dropping packets. In addition, active queue management should be
used to control the queue size for each individual flow or class, so
that they do not experience unnecessarily high delays. Therefore,
active queue management should be applied across the classes or
flows as well as within each class or flow. In short, scheduling algorithms and queue
management should be seen as complementary, not as replacements for
each other.
One of the keys to the success of the Internet has been the
congestion avoidance mechanisms of TCP. Because TCP "backs off" during
congestion, a large number of TCP connections can share a single,
congested link in such a way that bandwidth is shared reasonably
equitably among similarly situated flows. The equitable sharing of
bandwidth among flows depends on the fact that all flows are running
basically the same congestion avoidance algorithms, conformant with the
current TCP specification.
A flow that behaves under congestion like a flow produced by a
conformant TCP has come to be called "TCP
Friendly". A TCP Friendly flow is responsive to congestion
notification, and in steady state it uses no more bandwidth than a
conformant TCP running under comparable conditions (drop rate, RTT, MTU,
etc.).
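The definition above compares a flow against what a conformant TCP would achieve under the same drop rate, RTT, and MTU. This memo gives no formula for that comparison; the sketch below assumes the widely cited square-root approximation from the TCP modeling literature, in which steady-state throughput is roughly 1.22 * MTU / (RTT * sqrt(p)) for drop rate p. The function names are hypothetical, and the approximation is only reasonable for small drop rates.

```python
import math


def tcp_friendly_rate(mtu_bytes, rtt_s, drop_rate):
    """Approximate steady-state rate of a conformant TCP, in bytes/second.

    Assumes the square-root model: rate ~ 1.22 * MTU / (RTT * sqrt(p)).
    """
    if not 0 < drop_rate <= 1:
        raise ValueError("drop_rate must be in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(drop_rate))


def is_tcp_friendly(measured_bytes_per_s, mtu_bytes, rtt_s, drop_rate):
    """True if a flow uses no more than the modeled conformant-TCP rate."""
    return measured_bytes_per_s <= tcp_friendly_rate(mtu_bytes, rtt_s, drop_rate)
```

For example, with a 1500-byte MTU, a 100 ms RTT, and a 1% drop rate, the model gives roughly 183,000 bytes per second; a flow sustaining several times that under the same conditions would not be TCP Friendly by this test.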
It is convenient to divide flows into three classes: (1) TCP Friendly
flows, (2) unresponsive flows, i.e., flows that do not slow down when
congestion occurs, and (3) flows that are responsive but are not TCP
Friendly. The last two classes contain more aggressive flows that pose
significant threats to Internet performance, as we will now discuss.
Non-Responsive Flows: There is a growing
set of UDP-based applications whose congestion avoidance algorithms
are inadequate or nonexistent (i.e., the flow does not throttle back
upon receipt of congestion notification). Such UDP applications
include streaming applications like packet voice and video, and also
multicast bulk data transport. If no
action is taken, such unresponsive flows could lead to a new
congestive collapse. In general, all
UDP-based streaming applications should incorporate effective
congestion avoidance mechanisms. For example, recent research has
shown the possibility of incorporating congestion avoidance
mechanisms such as Receiver-driven Layered Multicast (RLM) within
UDP-based streaming applications such as packet video. Further
research and development on ways to accomplish congestion avoidance
for streaming applications will be very important. However, it will also be important for the network
to be able to protect itself against unresponsive flows, and
mechanisms to accomplish this must be developed and deployed.
Deployment of such mechanisms would provide incentive for every
streaming application to become responsive by incorporating its own
congestion control.
Non-TCP-Friendly Transport Protocols:
The second threat is posed by transport protocol implementations
that are responsive to congestion notification but, either
deliberately or through faulty implementations, are not TCP
Friendly. Such applications can grab an unfair share of the network
bandwidth. For example, the popularity of
the Internet has caused a proliferation in the number of TCP
implementations. Some of these may fail to implement the TCP
congestion avoidance mechanisms correctly because of poor
implementation. Others may deliberately be implemented with
congestion avoidance algorithms that are more aggressive in their
use of bandwidth than other TCP implementations; this would allow a
vendor to claim to have a "faster TCP". The logical consequence of
such implementations would be a spiral of increasingly aggressive
TCP implementations, leading back to the point where there is
effectively no congestion avoidance and the Internet is chronically
congested. Another example of such flows
is RTP/UDP video data flows in which the application uses an
adaptive codec. Such data flows are not responsive to congestion
signals in a time frame comparable to a small number of end-to-end
transmission delays. However, over a longer timescale, perhaps
seconds in duration, they will moderate their speed, or will
increase their speed if they determine bandwidth to be available.
Note that there is a well-known way to
achieve more aggressive TCP performance without even changing TCP:
open multiple connections to the same place, as has been done in
multiple Web browsers and in Peer-to-Peer applications such as
BitTorrent.
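The effect of opening multiple connections can be shown with simple arithmetic. The sketch below assumes an idealized bottleneck at which every connection receives an equal share of bandwidth, which real networks only approximate (round-trip times and other factors also matter); the function name is hypothetical.

```python
def bandwidth_share(app_connections, other_connections):
    """Fraction of a bottleneck link obtained by one application.

    Assumes every connection gets an equal share (an idealization).
    """
    total = app_connections + other_connections
    if total <= 0:
        raise ValueError("at least one connection is required")
    return app_connections / total
```

Under that assumption, an application with one connection competing against nine others gets a tenth of the link, while the same application opening six connections against the same nine others gets 6/15, or 40 percent, without any change to TCP itself.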
The projected increase in more aggressive flows of both these
classes, as a fraction of total Internet traffic, clearly poses a threat
to the future Internet. There is an urgent need for measurements of
current conditions and for further research into the various ways of
managing such flows. There are many difficult issues in identifying and
isolating unresponsive or Non-TCP-Friendly flows at an acceptable router
overhead cost. Finally, there is little measurement or simulation
evidence available about the rate at which these threats are likely to
be realized, or about the expected benefit of router algorithms for
managing such flows.
There is an issue about the appropriate granularity of a "flow".
There are a few "natural" answers: 1) a TCP or UDP connection (source
address/port, destination address/port); 2) a source/destination host
pair; 3) a given source host or a given destination host. We would guess
that the source/destination host pair gives the most appropriate
granularity in many circumstances. However, it is possible that
different vendors/providers could set different granularities for
defining a flow (as a way of "distinguishing" themselves from one
another), or that different granularities could be chosen for different
places in the network. It may be the case that the granularity is less
important than the fact that we are dealing with more unresponsive flows
at *some* granularity. The granularity of flows for congestion
management is, at least in part, a policy question that needs to be
addressed in the wider IETF community.
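The candidate granularities listed above amount to different choices of classification key. The following sketch makes that concrete; the `Packet` record, the granularity labels, and the inclusion of the protocol in the connection key are assumptions for illustration, not definitions taken from this memo.

```python
from collections import namedtuple

# Hypothetical packet header summary used only for this example.
Packet = namedtuple("Packet", "proto src_addr src_port dst_addr dst_port")


def flow_key(packet, granularity):
    """Map a packet to the identifier of the flow it belongs to."""
    if granularity == "connection":
        # 1) A TCP or UDP connection: addresses and ports.
        return (packet.proto, packet.src_addr, packet.src_port,
                packet.dst_addr, packet.dst_port)
    if granularity == "host_pair":
        # 2) A source/destination host pair.
        return (packet.src_addr, packet.dst_addr)
    if granularity == "source_host":
        # 3a) Everything sent by a given source host.
        return (packet.src_addr,)
    if granularity == "destination_host":
        # 3b) Everything sent to a given destination host.
        return (packet.dst_addr,)
    raise ValueError(f"unknown granularity: {granularity}")
```

Two connections between the same pair of hosts map to different keys at connection granularity but to the same key at host-pair granularity, so the choice determines whether opening parallel connections changes how a flow is accounted for.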
The IRTF, in developing RFC 2309, and the IETF,
in subsequent discussion, have developed a set of specific
recommendations regarding the implementation and operational use of
Active Queue Management procedures. These include:
1. Internet routers SHOULD implement some active queue management
   mechanism to manage queue lengths, reduce end-to-end latency, reduce
   packet dropping, and avoid lock-out phenomena within the
   Internet.
2. Deployed Active Queue Management SHOULD use ECN as well as loss
   in signaling congestion to endpoints.
3. Active Queue Management algorithms deployed SHOULD NOT require
   operational (especially manual) configuration or tuning.
4. Active Queue Management algorithms deployed SHOULD be effective
   on all common Internet traffic, including traffic that uses TCP,
   SCTP, UDP, and DCCP as transports.
5. TCP and SCTP congestion control algorithms SHOULD maximize their
   use of available bandwidth without incurring loss or undue round
   trip delay when possible.
6. It is urgent to continue research, engineering, and measurement
   efforts contributing to the design of mechanisms to deal with flows
   that are unresponsive to congestion notification or are responsive
   but more aggressive than TCP.
These recommendations are expressed using the word "SHOULD". This is
in recognition that there may be use cases unenvisaged in this document
in which the recommendation does not apply. However, care should be
taken in concluding that one's use case falls in that category; during
the life of the Internet, such use cases have been rarely if ever
observed and reported on. To the contrary, available research says that even high speed links
in network cores that are normally very stable in depth and behavior
experience occasional issues that need moderation.
In short, Active Queue Management procedures are designed to
minimize delay induced in the network by queues which have filled as a
result of host behavior. Marking and loss behaviors signal to the
senders of data that network buffers are becoming unnecessarily full,
and they would do well to moderate their behavior.
Means of signaling to an endpoint regarding its effect on the
network and how it might consider adapting include, at least:
Delaying data segments in flight, such as in a queue, which
affects Ack Clocking and as a result the transmission of new
data.
Marking traffic, such as by using Explicit Congestion Notification (ECN).
Dropping traffic in transit.
The use of advanced scheduling mechanisms, such as priority
queuing, classful queuing, and fair queuing, is often effective in
networks to help a network to serve the needs of an application. It
can be used to manage traffic passing a choke point. This is discussed
in and .
They are used operationally when an operator considers it important to
do so.
Loss has two effects. It protects the network, which is the primary
reason the network imposes it. Its use as a signal to TCP or SCTP is a
pragmatic heuristic; "when the network discards a message in flight,
it may imply the presence of faulty equipment or media in a path, and
it may imply the presence of congestion. Presume the latter." However,
it also has an effect on the efficiency of the data flow. The data in
question must be retransmitted, or its absence must otherwise be
adapted to by the application in question, which implies at least
inefficient use of available bandwidth and may affect other data
flows. Hence, loss is not entirely positive; it is a necessary
evil.
Explicit Congestion Notification, however, communicates information
about network congestion that is assuredly about congestion, and
avoids the unintended consequences of loss.
Hence, network communication to the host regarding the moderation
of its traffic flow SHOULD use an AQM algorithm to determine which
packets it should affect, and then implement that effect by marking
ECN-capable traffic "Congestion Experienced (CE)" or dropping
non-ECN-capable traffic.
Due to the possibility of abuse, the queue must also impose an
upper bound, so that even ECN-capable traffic experiences tail drop if
necessary. Equipment must be designed for this case, although in
theory it should be very uncommon.
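The mark-or-drop behavior described above can be summarized as a small decision function. This is a sketch only: the function name, its arguments, and the returned action labels are hypothetical, and the AQM algorithm that decides which packets to affect is represented here by a single boolean input.

```python
def signal_congestion(queue_len, hard_limit, aqm_selected, ecn_capable):
    """Decide what to do with an arriving packet.

    queue_len    -- packets currently queued
    hard_limit   -- upper bound on the queue, enforced for all traffic
    aqm_selected -- True if the AQM algorithm chose this packet
    ecn_capable  -- True if the packet is from an ECN-capable transport
    """
    if queue_len >= hard_limit:
        return "drop"            # tail drop, even for ECN-capable traffic
    if not aqm_selected:
        return "forward"         # no congestion signal for this packet
    if ecn_capable:
        return "mark_ce"         # set Congestion Experienced and forward
    return "drop"                # non-ECN traffic is signaled by loss
```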
A number of algorithms have been proposed. Many require some form
of tuning or initial condition, which makes them difficult to use
operationally. Hence, self-tuning algorithms are to be preferred.
Active Queue Management algorithms often target TCP, as it is by far the predominant transport
in the Internet today. However, we have significant use of UDP in voice and video services, and find
utility in SCTP and DCCP. Hence, Active Queue Management
algorithms that are effective with all of those transports and the
applications that use them are to be preferred.
The terms "knee" and "cliff" are defined by . They respectively refer to the minimum and
maximum values of the effective window that have the effect of
maximizing transmission rate in a congestion control algorithm such as
is used by TCP or SCTP. For the sender of data, exceeding the cliff is
ineffective, as it (by definition) induces loss; operating at a point
close to the cliff has a negative impact on other traffic and
applications, triggering operator activities such as discussed in
.
Operating below the knee is also ineffective, as it fails to use
available network capacity. If the objective is to deliver data from
its source to its recipient in the least possible time, as a result,
the behavior of any TCP/SCTP congestion control algorithm SHOULD be to
seek and use effective window values at or above the knee and well
below the cliff.
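A rough way to picture the knee and the cliff is an idealized single flow crossing one bottleneck. In the sketch below, which is an assumption of this example and not a definition from the memo, the knee is taken to be the bandwidth-delay product (the smallest window that fills the link) and the cliff is the knee plus the bottleneck buffer (the largest window that does not overflow it). All names are hypothetical.

```python
def operating_point(window_pkts, capacity_pps, base_rtt_s, buffer_pkts):
    """Classify a window size for one flow over one bottleneck.

    Returns (region, throughput in packets/s, queueing delay in seconds).
    """
    knee = capacity_pps * base_rtt_s       # window that just fills the link
    cliff = knee + buffer_pkts             # window that just fills the buffer
    throughput = min(window_pkts / base_rtt_s, capacity_pps)
    queued = min(max(window_pkts - knee, 0.0), buffer_pkts)
    delay = queued / capacity_pps          # time spent waiting in the queue
    if window_pkts < knee:
        region = "below knee"              # link capacity left unused
    elif window_pkts <= cliff:
        region = "knee to cliff"           # full rate, growing delay
    else:
        region = "beyond cliff"            # buffer overflows, loss results
    return region, throughput, delay
```

In this model, throughput stops improving once the window reaches the knee, while every additional packet of window adds queueing delay until the cliff is reached; this is why windows at or slightly above the knee are preferable to windows near the cliff.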
RFC 2309 called for, as its second
recommendation, further research in the interaction between network
queues and host applications, and the means of signaling between them.
This research occurred, and we as a community have learned a lot.
However, we are not done. An obvious example in 2013 is in the use of
Map/Reduce applications in data centers; do we need to extend our
taxonomy of TCP/SCTP sessions to include not only "mice" and
"elephants", but "lemmings"? "Lemmings" are flash crowds of "mice"
that the network inadvertently tries to signal to as if they were
elephant flows, resulting in head of line blocking in data center
applications.
Hence, this document reiterates the call: we need continuing
research as applications develop.
This memo asks the IANA for no new parameters.
While security is a very important issue, it is largely orthogonal to
the performance issues discussed in this memo. We note, however, that
denial-of-service attacks may create unresponsive traffic flows that are
indistinguishable from flows from normal high-bandwidth isochronous
applications, and the mechanism suggested in the recommendation in
support of ongoing research will be equally applicable to such
attacks.
This document, by itself, presents no new privacy issues.
The original recommendation in RFC 2309 was
written by the End-to-End Research Group, which is to say Bob Braden,
Dave Clark, Jon Crowcroft, Bruce Davie, Steve Deering, Deborah Estrin,
Sally Floyd, Van Jacobson, Greg Minshall, Craig Partridge, Larry
Peterson, KK Ramakrishnan, Scott Shenker, John Wroclawski, and Lixia
Zhang. This is an edited version of that document, with much of its text
and arguments unchanged.
The need for an updated document was agreed to in the tsvarea meeting
at IETF 86. This document was reviewed on the aqm@ietf.org list. Comments
came from Colin Perkins, Richard Scheffenegger, and Dave Taht.