Internet Engineering Task Force (IETF)                        L. Avramov
Request for Comments: 8239                                        Google
Category: Informational                                          J. Rapp
ISSN: 2070-1721                                                   VMware
                                                             August 2017

                  Data Center Benchmarking Methodology

Abstract

The purpose of this informational document is to establish test and evaluation methodology and measurement techniques for physical network equipment in the data center. RFC 8238 is a prerequisite for this document, as it contains terminology that is considered normative. Many of these terms and methods may be applicable beyond the scope of this document as the technologies originally applied in the data center are deployed elsewhere.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8239.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology Format and Repeatability Recommendation
2. Line-Rate Testing
   2.1. Objective
   2.2. Methodology
   2.3. Reporting Format
3. Buffering Testing
   3.1. Objective
   3.2. Methodology
   3.3. Reporting Format
4. Microburst Testing
   4.1. Objective
   4.2. Methodology
   4.3. Reporting Format
5. Head-of-Line Blocking
   5.1. Objective
   5.2. Methodology
   5.3. Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1. Objective
   6.2. Methodology
   6.3. Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
Acknowledgments
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center. They can be largely east-west traffic flows (server to server inside the data center) in one data center and north-south (from the outside of the data center to the server) in another, while others may combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may also be small and latency sensitive or large and throughput sensitive while containing a mix of UDP and TCP traffic. All of these can coexist in a single cluster and flow through a single network device simultaneously.
Benchmarking tests for network devices have long used [RFC1242], [RFC2432], [RFC2544], [RFC2889], and [RFC3918], which have largely been focused around various latency attributes and throughput [RFC2889] of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical throughput, forwarding rates, and latency under testing conditions; however, they do not represent real traffic patterns that may affect these networking devices.

Currently, typical data center networking devices are characterized by:

- High port density (48 ports or more).

- High speed (currently, up to 100 GB/s per port).

- High throughput (line rate on all ports for Layer 2 and/or Layer 3).

- Low latency (in the microsecond or nanosecond range).

- Low amount of buffer (in the MB range per networking device).

- Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory).

This document provides a methodology for benchmarking data center physical network equipment DUTs, including congestion scenarios, switch buffer analysis, microburst, and head-of-line blocking, while also using a wide mix of traffic conditions. [RFC8238] is a prerequisite for this document, as it contains terminology that is considered normative.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Methodology Format and Repeatability Recommendation

The following format is used in Sections 2 through 6 of this document:

- Objective

- Methodology

- Reporting Format

For each test methodology described in this document, it is critical that repeatability of the results be obtained. The recommendation is to perform enough iterations of the given test and to make sure that the result is consistent. This is especially important in the context of the tests described in Section 3, as the buffering testing has historically been the least reliable. The number of iterations SHOULD be explicitly reported. The relative standard deviation SHOULD be below 10%.

2. Line-Rate Testing

2.1. Objective

The objective of this test is to provide a "maximum rate" test for the performance values for throughput, latency, and jitter. It is meant to provide (1) the tests to perform and (2) methodology for verifying that a DUT is capable of forwarding packets at line rate under non-congested conditions.

2.2. Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two tests MUST be conducted: (1) a port-pair test [RFC2544] [RFC3918] and (2) a test using a full-mesh DUT [RFC2889] [RFC3918].
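As a non-normative illustration, the two required test topologies can be expressed as lists of unidirectional flows. The following Python sketch assumes only that DUT ports are numbered 1 to N; it does not imply any particular traffic-generator API.

   # Illustrative sketch only (not part of the methodology): enumerate
   # the unidirectional flows for the two required tests, assuming DUT
   # ports are simply numbered 1..N.

   def port_pair_flows(n_ports):
       # Port-pair test: disjoint pairs (1 -> 2), (3 -> 4), ...
       return [(p, p + 1) for p in range(1, n_ports, 2)]

   def full_mesh_flows(n_ports):
       # Full-mesh test: every port sends to every other port.
       return [(src, dst)
               for src in range(1, n_ports + 1)
               for dst in range(1, n_ports + 1)
               if src != dst]

   print(port_pair_flows(8))        # [(1, 2), (3, 4), (5, 6), (7, 8)]
   print(len(full_mesh_flows(8)))   # 56 flows on an 8-port DUT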
For all tests, the traffic generator's sending rate MUST be less than or equal to 99.98% of the nominal value of the line rate (with no further Parts Per Million (PPM) adjustment to account for interface clock tolerances), to ensure stressing of the DUT in reasonable worst-case conditions (see [RFC8238], Section 5 for more details). Test results at a lower rate MAY be provided for better understanding of performance increase in terms of latency and jitter when the rate is lower than 99.98%. The receiving rate of the traffic SHOULD be captured during this test as a percentage of line rate.

The test MUST provide the statistics of minimum, average, and maximum of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average, and maximum of the jitter distribution, for the exact same iteration of the test.

Alternatively, when a traffic generator cannot be connected to all ports on the DUT, a snake test MUST be used for line-rate testing, excluding latency and jitter, as those would become irrelevant. The snake test is performed as follows:

- Connect the first and last port of the DUT to a traffic generator.

- Connect, back to back and sequentially, all the ports in between: port 2 to port 3, port 4 to port 5, etc., to port N-2 to port N-1, where N is the total number of ports of the DUT.

- Configure port 1 and port 2 in the same VLAN X, port 3 and port 4 in the same VLAN Y, etc., and port N-1 and port N in the same VLAN Z.

This snake test provides the capability to test line rate for Layer 2 and Layer 3 [RFC2544] [RFC3918] in instances where a traffic generator with only two ports is available. Latency and jitter are not to be considered for this test.

2.3. Reporting Format

The report MUST include the following:

- Physical-layer calibration information, as defined in [RFC8238], Section 4.

- Number of ports used.

- Reading for throughput received as a percentage of bandwidth, while sending 99.98% of the nominal value of the line rate on each port, for each packet size from 64 bytes to 9216 bytes. As guidance, with a packet-size increment of 64 bytes between each iteration being ideal, 256-byte and 512-byte packets are also often used. The most common packet-size ordering for the report is 64 bytes, 128 bytes, 256 bytes, 512 bytes, 1024 bytes, 1518 bytes, 4096 bytes, 8000 bytes, and 9216 bytes. The pattern for testing can be expressed using [RFC6985].

- Throughput needs to be expressed as a percentage of total transmitted frames.

- Packet drops MUST be expressed as a count of packets and SHOULD be expressed as a percentage of line rate.

- For latency and jitter, values are expressed in units of time (usually microseconds or nanoseconds), reading across packet sizes from 64 bytes to 9216 bytes.

- For latency and jitter, provide minimum, average, and maximum values. If different iterations are done to gather the minimum, average, and maximum values, this SHOULD be specified in the report, along with a justification for why the information could not have been gathered in the same test iteration.

- For jitter, a histogram describing the population of packets measured per latency or latency buckets is RECOMMENDED.

- The tests for throughput, latency, and jitter MAY be conducted as individual independent trials, with proper documentation provided in the report, but SHOULD be conducted at the same time.

- The methodology assumes that the DUT has at least nine ports, as certain methodologies require nine or more ports.
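To make the 99.98% sending rate and the packet-size sweep above concrete, the following non-normative sketch converts the rate into frames per second. It assumes a 10 Gb/s port speed as an example (not a requirement of this document) and the standard Ethernet per-frame overhead of 20 bytes (8-byte preamble plus 12-byte minimum interframe gap).

   # Illustrative calculation only: frames per second at 99.98% of the
   # nominal line rate for a given Ethernet frame size.

   OVERHEAD_BYTES = 20  # preamble (8) + minimum interframe gap (12)

   def frames_per_second(line_rate_bps, frame_size_bytes, fraction=0.9998):
       bits_on_wire = (frame_size_bytes + OVERHEAD_BYTES) * 8
       return line_rate_bps * fraction / bits_on_wire

   for size in (64, 128, 256, 512, 1024, 1518, 4096, 8000, 9216):
       print(size, round(frames_per_second(10e9, size)))
   # 64-byte frames on a 10 Gb/s port: ~14.88 million frames/s;
   # 9216-byte frames: ~135 thousand frames/s.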
3. Buffering Testing

3.1. Objective

The objective of this test is to measure the size of the buffer of a DUT under typical/many/multiple conditions. Buffer architectures between multiple DUTs can differ and include egress buffering, shared egress buffering SoC (Switch-on-Chip), ingress buffering, or a combination thereof. The test methodology covers the buffer measurement, regardless of buffer architecture used in the DUT.

3.2. Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data center switch is based on using known congestion of known fixed packet size, along with maximum latency value measurements. The maximum latency will increase until the first packet drop occurs. At this point, the maximum latency value will remain constant. This is the point of inflection of this maximum latency change to a constant value. There MUST be multiple ingress ports receiving a known amount of frames at a known fixed size, destined for the same egress port in order to create a known congestion condition. The total amount of packets sent from the oversubscribed port minus one, multiplied by the packet size, represents the maximum port buffer size at the measured inflection point.
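The inflection-point logic just described can be located programmatically. The sketch below is a simplified, non-normative illustration; it assumes a list of (frames_sent, max_latency) samples collected under a steadily increasing frame count, which is an assumption about how the measurements are gathered rather than something this document specifies.

   # Illustrative sketch only: find the inflection point at which the
   # maximum latency stops increasing (the buffer is full and the
   # first drops occur), then derive the port buffer size as above.

   def inflection_frames(samples):
       for (prev_frames, prev_lat), (_, lat) in zip(samples, samples[1:]):
           if lat <= prev_lat:      # max latency has become constant
               return prev_frames
       return None                  # inflection point not reached

   def port_buffer_bytes(samples, frame_size):
       frames = inflection_frames(samples)
       if frames is None:
           return None
       # Frames sent from the oversubscribed port minus one,
       # multiplied by the frame size (see the paragraph above).
       return (frames - 1) * frame_size

   # Invented measurements: latency grows, then flattens at 300 frames.
   samples = [(100, 10.0), (200, 20.0), (300, 30.0), (400, 30.0)]
   print(port_buffer_bytes(samples, 64))  # (300 - 1) * 64 = 19136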
Note that the tests described in procedures 1), 2), 3), and 4) in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

1) Measure the highest buffer efficiency.

   o First iteration: Ingress port 1 sending 64-byte packets at line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size of 64 bytes to egress port 2. Measure the buffer size value of the number of frames sent from the port sending the oversubscribed traffic up to the inflection point multiplied by the frame size.

   o Second iteration: Ingress port 1 sending 65-byte packets at line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size of 65 bytes to egress port 2. Measure the buffer size value of the number of frames sent from the port sending the oversubscribed traffic up to the inflection point multiplied by the frame size.

   o Last iteration: Ingress port 1 sending packets of size B bytes at line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size of B bytes to egress port 2. Measure the buffer size value of the number of frames sent from the port sending the oversubscribed traffic up to the inflection point multiplied by the frame size.

   When the B value is found to provide the largest buffer size, then size B allows the highest buffer efficiency.

2) Measure maximum port buffer size. At fixed packet size B as determined in procedure 1), for a fixed default Differentiated Services Code Point (DSCP) / Class of Service (CoS) value of 0 and for unicast traffic, proceed with the following:

   o First iteration: Ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 2. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

   o Second iteration: Ingress port 2 sending line rate to egress port 3, while port 4 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 3. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

   o Last iteration: Ingress port N-2 sending line rate to egress port N-1, while port N is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port N. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

   This test series MAY be repeated using all different DSCP/CoS values of traffic, and then using multicast traffic, in order to find out if there is any DSCP/CoS impact on the buffer size.

3) Measure maximum port pair buffer sizes.

   o First iteration: Ingress port 1 sending line rate to egress port 2, ingress port 3 sending line rate to egress port 4, etc. Ingress port N-1 and port N will oversubscribe, at 1% of line rate, egress port 2 and port 3, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

   o Second iteration: Ingress port 1 sending line rate to egress port 2, ingress port 3 sending line rate to egress port 4, etc. Ingress port N-1 and port N will oversubscribe, at 1% of line rate, egress port 4 and port 5, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

   o Last iteration: Ingress port 1 sending line rate to egress port 2, ingress port 3 sending line rate to egress port 4, etc. Ingress port N-1 and port N will oversubscribe, at 1% of line rate, egress port N-3 and port N-2, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

   This test series MAY be repeated using all different DSCP/CoS values of traffic and then using multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports. (A worked example of the per-port rate used here appears at the end of this section.)

   o First iteration: Ingress ports 1,2,... N-1 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N.

   o Second iteration: Ingress ports 2,... N each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port 1.

   o Last iteration: Ingress ports N,1,2...N-2 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N-1.

   This test series MAY be repeated using all different CoS values of traffic and then using multicast traffic.

Unicast traffic, and then multicast traffic, SHOULD be used in order to determine the proportion of buffer for the documented selection of tests. Also, the CoS value for the packets SHOULD be provided for each test iteration, as the buffer allocation size MAY differ per CoS value. It is RECOMMENDED that the ingress and egress ports be varied in a random but documented fashion in multiple tests in order to measure the buffer size for each port of the DUT.
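As the worked example referenced in procedure 4), the per-port sending rate [(1/[N-1])*99.98]+[1/[N-1]] % of line rate can be evaluated numerically. This is a non-normative illustration; N = 48 is an example port count, not a value mandated by this document.

   # Illustrative evaluation only of the per-port rate in procedure 4):
   # each of the N-1 ingress ports sends [(1/[N-1])*99.98]+[1/[N-1]] %
   # of line rate, so the egress port is oversubscribed by about 1%.

   def per_port_percent(n_ports):
       share = 1.0 / (n_ports - 1)
       return share * 99.98 + share

   N = 48  # example port count, an assumption for illustration
   rate = per_port_percent(N)
   print(round(rate, 4))             # ~2.1485 (% of line rate per port)
   print(round(rate * (N - 1), 2))   # ~100.98 (% aggregate -> congestion)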
3.3. Reporting Format

The report MUST include the following:

- The packet size used for the most efficient buffer used, along with the DSCP/CoS value.

- The maximum port buffer size for each port.

- The maximum DUT buffer size.

- The packet size used in the test.

- The amount of oversubscription, if different than 1%.

- The number of ingress and egress ports, along with their location on the DUT.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results for each of the tests (min, max, avg).

The percentage of variation is a metric providing a sense of how big the difference is between the measured value and the previous values. For example, for a latency test where the minimum latency is measured, the percentage of variation (PV) of the minimum latency will indicate by how much this value has varied between the current test executed and the previous one.

   PV = ((x2-x1)/x1)*100, where x2 is the minimum latency value in the current test and x1 is the minimum latency value obtained in the previous test.

The same formula is used for maximum and average variations measured.
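The PV formula above can be spelled out as a short, non-normative sketch; the latency readings used here are invented for illustration.

   # Illustrative sketch only of the percentage-of-variation formula,
   # PV = ((x2 - x1) / x1) * 100.

   def percentage_of_variation(x1, x2):
       # x1: value from the previous test; x2: value from the current test
       return (x2 - x1) / x1 * 100.0

   previous_min_latency = 1.00   # microseconds (invented)
   current_min_latency = 1.05    # microseconds (invented)
   print(round(percentage_of_variation(previous_min_latency,
                                       current_min_latency), 2))  # 5.0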
4. Microburst Testing

4.1. Objective

The objective of this test is to find the maximum amount of packet bursts that a DUT can sustain under various configurations.

This test provides additional methodology that supplements the tests described in [RFC1242], [RFC2432], [RFC2544], [RFC2889], and [RFC3918]:

- All bursts should be sent with 100% intensity. Note: "Intensity" is defined in [RFC8238], Section 6.1.1.

- All ports of the DUT must be used for this test.

- It is recommended that all ports be tested simultaneously.

4.2. Methodology

A traffic generator MUST be connected to all ports on the DUT. In order to cause congestion, two or more ingress ports MUST send bursts of packets destined for the same egress port. The simplest of the setups would be two ingress ports and one egress port (2 to 1).

The burst MUST be sent with an intensity (as defined in [RFC8238], Section 6.1.1) of 100%, meaning that the burst of packets will be sent with a minimum interpacket gap. The amount of packets contained in the burst will be trial variable and increase until there is a non-zero packet loss measured. The aggregate amount of packets from all the senders will be used to calculate the maximum microburst amount that the DUT can sustain. (A sketch of this trial procedure appears at the end of this section.)

It is RECOMMENDED that the ingress and egress ports be varied in multiple tests in order to measure the maximum microburst capacity.

The intensity of a microburst (see [RFC8238], Section 6.1.1) MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be tested simultaneously, and in various configurations, in order to understand all the combinations of ingress ports, egress ports, and intensities.

An example would be:

o First iteration: N-1 ingress ports sending to one egress port.

o Second iteration: N-2 ingress ports sending to two egress ports.

o Last iteration: Two ingress ports sending to N-2 egress ports.

4.3. Reporting Format

The report MUST include the following:

- The maximum number of packets received per ingress port with the maximum burst size obtained with zero packet loss.

- The packet size used in the test.

- The number of ingress and egress ports, along with their location on the DUT.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg).
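As the sketch referenced in Section 4.2, the burst-growth trial can be automated. This is a non-normative illustration: send_burst is a hypothetical hook into a traffic generator (assumed to send a burst at 100% intensity from every listed ingress port and return the measured packet loss), not a real API, and the fake DUT model is invented.

   # Illustrative sketch only: grow the burst size until the first
   # non-zero packet loss and report the largest burst sustained
   # with zero loss.

   def max_sustained_burst(send_burst, ingress_ports, egress_port, step=1):
       burst = step
       while True:
           lost = send_burst(ingress_ports, egress_port, burst)
           if lost > 0:
               return burst - step   # last burst size with zero loss
           burst += step

   # Stand-in for real hardware: pretend the egress buffers 1000 packets.
   def fake_send_burst(ingress_ports, egress_port, burst):
       return max(0, burst * len(ingress_ports) - 1000)

   print(max_sustained_burst(fake_send_burst, [1, 2], 3, step=10))  # 500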
5. Head-of-Line Blocking

5.1. Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon that occurs when packets are held up by the first packet ahead waiting to be transmitted to a different output port. This is defined in RFC 2889, Section 5.5 ("Congestion Control"). This section expands on RFC 2889 in the context of data center benchmarking.

The objective of this test is to understand the DUT's behavior in the HOLB scenario and measure the packet loss.

The differences between this HOLB test and RFC 2889 are as follows:

- This HOLB test starts with eight ports in two groups of four ports each, instead of four ports (as compared with Section 5.5 of RFC 2889).

- This HOLB test shifts all the port numbers by one in a second iteration of the test; this is new, as compared to the HOLB test described in RFC 2889. The shifting port numbers continue until all ports are the first in the group; the purpose of this is to make sure that all permutations are tested in order to cover differences in behavior in the SoC of the DUT.

- Another test within this HOLB test expands the group of ports, such that traffic is divided among four ports instead of two (25% instead of 50% per port).

- Section 5.3 lists requirements that supplement the requirements listed in RFC 2889, Section 5.5.

5.2. Methodology

In order to cause congestion in the form of HOLB, groups of four ports are used. A group has two ingress ports and two egress ports. The first ingress port MUST have two flows configured, each going to a different egress port. The second ingress port will congest the second egress port by sending line rate. The goal is to measure if there is loss on the flow for the first egress port, which is not oversubscribed.

A traffic generator MUST be connected to at least eight ports on the DUT and SHOULD be connected using all the DUT ports.

Note that the tests described in procedures 1) and 2) in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

1) Measure two groups with eight DUT ports.

   o First iteration: Measure the packet loss for two groups with consecutive ports.

     The composition of the first group is as follows: Ingress port 1 is sending 50% of traffic to egress port 3, and ingress port 1 is sending 50% of traffic to egress port 4. Ingress port 2 is sending line rate to egress port 4. Measure the amount of traffic loss for the traffic from ingress port 1 to egress port 3.

     The composition of the second group is as follows: Ingress port 5 is sending 50% of traffic to egress port 7, and ingress port 5 is sending 50% of traffic to egress port 8. Ingress port 6 is sending line rate to egress port 8. Measure the amount of traffic loss for the traffic from ingress port 5 to egress port 7.

   o Second iteration: Repeat the first iteration by shifting all the ports from N to N+1.

     The composition of the first group is as follows: Ingress port 2 is sending 50% of traffic to egress port 4, and ingress port 2 is sending 50% of traffic to egress port 5. Ingress port 3 is sending line rate to egress port 5. Measure the amount of traffic loss for the traffic from ingress port 2 to egress port 4.

     The composition of the second group is as follows: Ingress port 6 is sending 50% of traffic to egress port 8, and ingress port 6 is sending 50% of traffic to egress port 9. Ingress port 7 is sending line rate to egress port 9. Measure the amount of traffic loss for the traffic from ingress port 6 to egress port 8.
   o Last iteration: When the first port of the first group is connected to the last DUT port and the last port of the second group is connected to the seventh port of the DUT.

     Measure the amount of traffic loss for the traffic from ingress port N to egress port 2 and from ingress port 4 to egress port 6.

2) Measure with N/4 groups with N DUT ports. (The group enumeration and per-iteration shift are illustrated by the sketch at the end of this section.)

   The traffic from the ingress port is split across four egress ports (100/4 = 25%).

   o First iteration: Expand to fully utilize all the DUT ports in increments of four. Repeat the methodology of procedure 1) with all the groups of ports possible to achieve on the device, and measure the amount of traffic loss for each port group.

   o Second iteration: Shift by +1 the start of each consecutive port of the port groups.

   o Last iteration: Shift by N-1 the start of each consecutive port of the port groups, and measure the amount of traffic loss for each port group.

5.3. Reporting Format

For each test, the report MUST include the following:

- The port configuration, including the number and location of ingress and egress ports located on the DUT.

- If HOLB was observed in accordance with the HOLB test described in Section 5.

- Percent of traffic loss.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg).
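As the sketch referenced in Section 5.2, the shifted groups of four consecutive DUT ports can be enumerated mechanically. This is a non-normative illustration; ports numbered 1 to n_ports, with wraparound past the last port, is an assumption made for the example.

   # Illustrative sketch only: enumerate the shifted groups of four
   # consecutive DUT ports used by the HOLB test, wrapping around
   # the last port.

   def holb_groups(n_ports, shift):
       ports = [((p + shift - 1) % n_ports) + 1
                for p in range(1, n_ports + 1)]
       return [tuple(ports[i:i + 4]) for i in range(0, n_ports - 3, 4)]

   print(holb_groups(8, 0))  # first iteration: [(1, 2, 3, 4), (5, 6, 7, 8)]
   print(holb_groups(8, 1))  # second iteration: [(2, 3, 4, 5), (6, 7, 8, 1)]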
6. Incast Stateful and Stateless Traffic

6.1. Objective

The objective of this test is to measure the values for TCP Goodput [TCP-INCAST] and latency with a mix of large and small flows. The test is designed to simulate a mixed environment of stateful flows that require high rates of goodput and stateless flows that require low latency. Stateful flows are created by generating TCP traffic, and stateless flows are created using UDP traffic.

6.2. Methodology

In order to simulate the effects of stateless and stateful traffic on the DUT, there MUST be multiple ingress ports receiving traffic destined for the same egress port. There also MAY be a mix of stateful and stateless traffic arriving on a single ingress port. The simplest setup would be two ingress ports receiving traffic destined to the same egress port.

One ingress port MUST maintain a TCP connection through the ingress port to a receiver connected to an egress port. Traffic in the TCP stream MUST be sent at the maximum rate allowed by the traffic generator. At the same time, the TCP traffic is flowing through the DUT, and the stateless traffic is sent destined to a receiver on the same egress port. The stateless traffic MUST be a microburst of 100% intensity.

It is RECOMMENDED that the ingress and egress ports be varied in multiple tests in order to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

For example:

Stateful traffic port variation (TCP traffic): TCP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.

o First iteration: One ingress port receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

o Second iteration: Two ingress ports receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

o Last iteration: N-2 ingress ports receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

Stateless traffic port variation (UDP traffic): UDP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.

o First iteration: One ingress port receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

o Second iteration: One ingress port receiving stateful TCP traffic and two ingress ports receiving stateless traffic destined to one egress port.

o Last iteration: One ingress port receiving stateful TCP traffic and N-2 ingress ports receiving stateless traffic destined to one egress port.

6.3. Reporting Format

The report MUST include the following:

- Number of ingress and egress ports, along with designation of stateful or stateless flow assignment.

- Stateful flow goodput.

- Stateless flow latency.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg).
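Goodput, as reported above, counts only application-payload bytes actually delivered, excluding retransmissions and protocol overhead. The following non-normative sketch shows the basic derivation over a measurement interval; the numbers are invented for illustration.

   # Illustrative sketch only: derive goodput from application bytes
   # delivered over the measurement interval.

   def goodput_bps(app_bytes_delivered, elapsed_seconds):
       return app_bytes_delivered * 8 / elapsed_seconds

   # Invented example: 1 GB of application data delivered in 1.2 s.
   print(round(goodput_bps(1_000_000_000, 1.2) / 1e9, 2), "Gb/s")  # 6.67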
7. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT.

Special capabilities SHOULD NOT exist in the DUT specifically for benchmarking purposes. Any implications for network security arising from the DUT SHOULD be identical in the lab and in production networks.

8. IANA Considerations

This document does not require any IANA actions.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991, <https://www.rfc-editor.org/info/rfc1242>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999, <https://www.rfc-editor.org/info/rfc2544>.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[RFC8238] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology", RFC 8238, DOI 10.17487/RFC8238, August 2017, <https://www.rfc-editor.org/info/rfc8238>.

9.2. Informative References

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", RFC 2432, DOI 10.17487/RFC2432, October 1998, <https://www.rfc-editor.org/info/rfc2432>.

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, August 2000, <https://www.rfc-editor.org/info/rfc2889>.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October 2004, <https://www.rfc-editor.org/info/rfc3918>.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet Sizes for Additional Testing", RFC 6985, DOI 10.17487/RFC6985, July 2013, <https://www.rfc-editor.org/info/rfc6985>.

[TCP-INCAST] Chen, Y., Griffith, R., Zats, D., Joseph, A., and R. Katz, "Understanding TCP Incast and Its Implications for Big Data Workloads", April 2012, <http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

Acknowledgments

The authors would like to thank Al Morton and Scott Bradner for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States of America

Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave.
Palo Alto, CA 94304
United States of America

Email: jhrapp@gmail.com