Internet Engineering Task Force (IETF)                        L. Avramov
Request for Comments: 8239                                        Google
Category: Informational                                          J. Rapp
ISSN: 2070-1721                                                   VMware
                                                             August 2017

                  Data Center Benchmarking Methodology

Abstract

The purpose of this informational document is to establish test and evaluation methodology and measurement techniques for physical network equipment in the data center. RFC 8238 is a prerequisite for this document, as it contains terminology that is considered normative. Many of these terms and methods may be applicable beyond the scope of this document as the technologies originally applied in the data center are deployed elsewhere.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8239.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Requirements Language
   1.2. Methodology Format and Repeatability Recommendation
2. Line-Rate Testing
   2.1. Objective
   2.2. Methodology
   2.3. Reporting Format
3. Buffering Testing
   3.1. Objective
   3.2. Methodology
   3.3. Reporting Format
4. Microburst Testing
   4.1. Objective
   4.2. Methodology
   4.3. Reporting Format
5. Head-of-Line Blocking
   5.1. Objective
   5.2. Methodology
   5.3. Reporting Format
6. Incast Stateful and Stateless Traffic
   6.1. Objective
   6.2. Methodology
   6.3. Reporting Format
7. Security Considerations
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
Acknowledgments
Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center. They can be largely east-west traffic flows (server to server inside the data center) in one data center and north-south (from the outside of the data center to the server) in another, while others may combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may also be small and latency sensitive or large and throughput sensitive while containing a mix of UDP and TCP traffic. All of these can coexist in a single cluster and flow through a single network device simultaneously.
Benchmarking tests for network devices have long used [RFC1242], [RFC2432], [RFC2544], [RFC2889], and [RFC3918], which have largely been focused around various latency attributes and throughput [RFC2889] of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical throughput, forwarding rates, and latency under testing conditions; however, they do not represent real traffic patterns that may affect these networking devices.

Currently, typical data center networking devices are characterized by:

- High port density (48 ports or more).

- High speed (currently, up to 100 GB/s per port).

- High throughput (line rate on all ports for Layer 2 and/or Layer 3).

- Low latency (in the microsecond or nanosecond range).

- Low amount of buffer (in the MB range per networking device).

- Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory).

This document provides a methodology for benchmarking data center physical network equipment DUTs, including congestion scenarios, switch buffer analysis, microburst, and head-of-line blocking, while also using a wide mix of traffic conditions. [RFC8238] is a prerequisite for this document, as it contains terminology that is considered normative.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Methodology Format and Repeatability Recommendation

The following format is used in Sections 2 through 6 of this document:

- Objective

- Methodology

- Reporting Format

For each test methodology described in this document, it is critical that repeatability of the results be obtained. The recommendation is to perform enough iterations of the given test and to make sure that the result is consistent. This is especially important in the context of the tests described in Section 3, as the buffering testing has historically been the least reliable. The number of iterations SHOULD be explicitly reported. The relative standard deviation SHOULD be below 10%.

2. Line-Rate Testing

2.1. Objective

The objective of this test is to provide a "maximum rate" test for the performance values for throughput, latency, and jitter. It is meant to provide (1) the tests to perform and (2) methodology for verifying that a DUT is capable of forwarding packets at line rate under non-congested conditions.

2.2. Methodology

A traffic generator SHOULD be connected to all ports on the DUT. Two tests MUST be conducted: (1) a port-pair test [RFC2544] [RFC3918] and (2) a test using a full-mesh DUT [RFC2889] [RFC3918].
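As a non-normative illustration, the two required test topologies can be expressed as lists of unidirectional flows. The following Python sketch assumes only that DUT ports are numbered 1 to N; it does not imply any particular traffic-generator API.

   # Illustrative sketch only (not part of the methodology): enumerate
   # the unidirectional flows for the two required tests, assuming DUT
   # ports are simply numbered 1..N.

   def port_pair_flows(n_ports):
       # Port-pair test: disjoint pairs (1 -> 2), (3 -> 4), ...
       return [(p, p + 1) for p in range(1, n_ports, 2)]

   def full_mesh_flows(n_ports):
       # Full-mesh test: every port sends to every other port.
       return [(src, dst)
               for src in range(1, n_ports + 1)
               for dst in range(1, n_ports + 1)
               if src != dst]

   print(port_pair_flows(8))        # [(1, 2), (3, 4), (5, 6), (7, 8)]
   print(len(full_mesh_flows(8)))   # 56 flows on an 8-port DUT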
For all tests, the traffic generator's sending rate MUST be less than or equal to 99.98% of the nominal value of the line rate (with no further Parts Per Million (PPM) adjustment to account for interface clock tolerances), to ensure stressing of the DUT in reasonable worst-case conditions (see [RFC8238], Section 5 for more details). Test results at a lower rate MAY be provided for better understanding of performance increase in terms of latency and jitter when the rate is lower than 99.98%. The receiving rate of the traffic SHOULD be captured during this test as a percentage of line rate.

The test MUST provide the statistics of minimum, average, and maximum of the latency distribution, for the exact same iteration of the test.

The test MUST provide the statistics of minimum, average, and maximum of the jitter distribution, for the exact same iteration of the test.

Alternatively, when a traffic generator cannot be connected to all ports on the DUT, a snake test MUST be used for line-rate testing, excluding latency and jitter, as those would become irrelevant. The snake test is performed as follows:

- Connect the first and last port of the DUT to a traffic generator.

- Connect, back to back and sequentially, all the ports in between: port 2 to port 3, port 4 to port 5, etc., to port N-2 to port N-1, where N is the total number of ports of the DUT.

- Configure port 1 and port 2 in the same VLAN X, port 3 and port 4 in the same VLAN Y, etc., and port N-1 and port N in the same VLAN Z.

This snake test provides the capability to test line rate for Layer 2 and Layer 3 [RFC2544] [RFC3918] in instances where a traffic generator with only two ports is available. Latency and jitter are not to be considered for this test.

2.3. Reporting Format

The report MUST include the following:

- Physical-layer calibration information, as defined in [RFC8238], Section 4.

- Number of ports used.

- Reading for throughput received as a percentage of bandwidth, while sending 99.98% of the nominal value of the line rate on each port, for each packet size from 64 bytes to 9216 bytes. As guidance, with a packet-size increment of 64 bytes between each iteration being ideal, 256-byte and 512-byte packets are also often used. The most common packet-size ordering for the report is 64 bytes, 128 bytes, 256 bytes, 512 bytes, 1024 bytes, 1518 bytes, 4096 bytes, 8000 bytes, and 9216 bytes. The pattern for testing can be expressed using [RFC6985].

- Throughput needs to be expressed as a percentage of total transmitted frames.

- Packet drops MUST be expressed as a count of packets and SHOULD be expressed as a percentage of line rate.

- For latency and jitter, values are expressed in units of time (usually microseconds or nanoseconds), reading across packet sizes from 64 bytes to 9216 bytes.

- For latency and jitter, provide minimum, average, and maximum values. If different iterations are done to gather the minimum, average, and maximum values, this SHOULD be specified in the report, along with a justification for why the information could not have been gathered in the same test iteration.

- For jitter, a histogram describing the population of packets measured per latency or latency buckets is RECOMMENDED.

- The tests for throughput, latency, and jitter MAY be conducted as individual independent trials, with proper documentation provided in the report, but SHOULD be conducted at the same time.

- The methodology assumes that the DUT has at least nine ports, as certain methodologies require nine or more ports.
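To make the 99.98% sending rate and the packet-size sweep above concrete, the following non-normative sketch converts the rate into frames per second. It assumes a 10 Gb/s port speed as an example (not a requirement of this document) and the standard Ethernet per-frame overhead of 20 bytes (8-byte preamble plus 12-byte minimum interframe gap).

   # Illustrative calculation only: frames per second at 99.98% of the
   # nominal line rate for a given Ethernet frame size.

   OVERHEAD_BYTES = 20  # preamble (8) + minimum interframe gap (12)

   def frames_per_second(line_rate_bps, frame_size_bytes, fraction=0.9998):
       bits_on_wire = (frame_size_bytes + OVERHEAD_BYTES) * 8
       return line_rate_bps * fraction / bits_on_wire

   for size in (64, 128, 256, 512, 1024, 1518, 4096, 8000, 9216):
       print(size, round(frames_per_second(10e9, size)))
   # 64-byte frames on a 10 Gb/s port: ~14.88 million frames/s;
   # 9216-byte frames: ~135 thousand frames/s.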
3. Buffering Testing

3.1. Objective

The objective of this test is to measure the size of the buffer of a DUT under typical/many/multiple conditions. Buffer architectures between multiple DUTs can differ and include egress buffering, shared egress buffering SoC (Switch-on-Chip), ingress buffering, or a combination thereof. The test methodology covers the buffer measurement, regardless of buffer architecture used in the DUT.

3.2. Methodology

A traffic generator MUST be connected to all ports on the DUT.

The methodology for measuring buffering for a data center switch is based on using known congestion of known fixed packet size, along with maximum latency value measurements. The maximum latency will increase until the first packet drop occurs. At this point, the maximum latency value will remain constant. This is the point of inflection of this maximum latency change to a constant value. There MUST be multiple ingress ports receiving a known amount of frames at a known fixed size, destined for the same egress port in order to create a known congestion condition. The total amount of packets sent from the oversubscribed port minus one, multiplied by the packet size, represents the maximum port buffer size at the measured inflection point.
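The inflection-point logic just described can be located programmatically. The sketch below is a simplified, non-normative illustration; it assumes a list of (frames_sent, max_latency) samples collected under a steadily increasing frame count, which is an assumption about how the measurements are gathered rather than something this document specifies.

   # Illustrative sketch only: find the inflection point at which the
   # maximum latency stops increasing (the buffer is full and the
   # first drops occur), then derive the port buffer size as above.

   def inflection_frames(samples):
       for (prev_frames, prev_lat), (_, lat) in zip(samples, samples[1:]):
           if lat <= prev_lat:      # max latency has become constant
               return prev_frames
       return None                  # inflection point not reached

   def port_buffer_bytes(samples, frame_size):
       frames = inflection_frames(samples)
       if frames is None:
           return None
       # Frames sent from the oversubscribed port minus one,
       # multiplied by the frame size (see the paragraph above).
       return (frames - 1) * frame_size

   # Invented measurements: latency grows, then flattens at 300 frames.
   samples = [(100, 10.0), (200, 20.0), (300, 30.0), (400, 30.0)]
   print(port_buffer_bytes(samples, 64))  # (300 - 1) * 64 = 19136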
Note that the tests described in procedures 1), 2), 3), and 4) in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

1) Measure the highest buffer efficiency.

   o First iteration: Ingress port 1 sending 64-byte packets at line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size of 64 bytes to egress port 2. Measure the buffer size value of the number of frames sent from the port sending the oversubscribed traffic up to the inflection point multiplied by the frame size.

   o Second iteration: Ingress port 1 sending 65-byte packets at line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size of 65 bytes to egress port 2. Measure the buffer size value of the number of frames sent from the port sending the oversubscribed traffic up to the inflection point multiplied by the frame size.

   o Last iteration: Ingress port 1 sending packets of size B bytes at line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size of B bytes to egress port 2. Measure the buffer size value of the number of frames sent from the port sending the oversubscribed traffic up to the inflection point multiplied by the frame size.

   When the B value is found to provide the largest buffer size, then size B allows the highest buffer efficiency.

2) Measure maximum port buffer size. At fixed packet size B as determined in procedure 1), for a fixed default Differentiated Services Code Point (DSCP) / Class of Service (CoS) value of 0 and for unicast traffic, proceed with the following:

   o First iteration: Ingress port 1 sending line rate to egress port 2, while port 3 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 2. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

   o Second iteration: Ingress port 2 sending line rate to egress port 3, while port 4 is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port 3. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

   o Last iteration: Ingress port N-2 sending line rate to egress port N-1, while port N is sending a known low amount of oversubscription traffic (1% recommended) with the same packet size to egress port N. Measure the buffer size value by multiplying the number of extra frames sent by the frame size.

   This test series MAY be repeated using all different DSCP/CoS values of traffic, and then using multicast traffic, in order to find out if there is any DSCP/CoS impact on the buffer size.

3) Measure maximum port pair buffer sizes.

   o First iteration: Ingress port 1 sending line rate to egress port 2, ingress port 3 sending line rate to egress port 4, etc. Ingress port N-1 and port N will oversubscribe, at 1% of line rate, egress port 2 and port 3, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

   o Second iteration: Ingress port 1 sending line rate to egress port 2, ingress port 3 sending line rate to egress port 4, etc. Ingress port N-1 and port N will oversubscribe, at 1% of line rate, egress port 4 and port 5, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

   o Last iteration: Ingress port 1 sending line rate to egress port 2, ingress port 3 sending line rate to egress port 4, etc. Ingress port N-1 and port N will oversubscribe, at 1% of line rate, egress port N-3 and port N-2, respectively. Measure the buffer size value by multiplying the number of extra frames sent by the frame size for each egress port.

   This test series MAY be repeated using all different DSCP/CoS values of traffic and then using multicast traffic.

4) Measure maximum DUT buffer size with many-to-one ports. (A worked example of the per-port rate used here appears at the end of this section.)

   o First iteration: Ingress ports 1,2,... N-1 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N.

   o Second iteration: Ingress ports 2,... N each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port 1.

   o Last iteration: Ingress ports N,1,2...N-2 each sending [(1/[N-1])*99.98]+[1/[N-1]] % of line rate per port to egress port N-1.

   This test series MAY be repeated using all different CoS values of traffic and then using multicast traffic.

Unicast traffic, and then multicast traffic, SHOULD be used in order to determine the proportion of buffer for the documented selection of tests. Also, the CoS value for the packets SHOULD be provided for each test iteration, as the buffer allocation size MAY differ per CoS value. It is RECOMMENDED that the ingress and egress ports be varied in a random but documented fashion in multiple tests in order to measure the buffer size for each port of the DUT.
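As the worked example referenced in procedure 4), the per-port sending rate [(1/[N-1])*99.98]+[1/[N-1]] % of line rate can be evaluated numerically. This is a non-normative illustration; N = 48 is an example port count, not a value mandated by this document.

   # Illustrative evaluation only of the per-port rate in procedure 4):
   # each of the N-1 ingress ports sends [(1/[N-1])*99.98]+[1/[N-1]] %
   # of line rate, so the egress port is oversubscribed by about 1%.

   def per_port_percent(n_ports):
       share = 1.0 / (n_ports - 1)
       return share * 99.98 + share

   N = 48  # example port count, an assumption for illustration
   rate = per_port_percent(N)
   print(round(rate, 4))             # ~2.1485 (% of line rate per port)
   print(round(rate * (N - 1), 2))   # ~100.98 (% aggregate -> congestion)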
3.3. Reporting Format

The report MUST include the following:

- The packet size used for the most efficient buffer used, along with the DSCP/CoS value.

- The maximum port buffer size for each port.

- The maximum DUT buffer size.

- The packet size used in the test.

- The amount of oversubscription, if different than 1%.

- The number of ingress and egress ports, along with their location on the DUT.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results for each of the tests (min, max, avg).

The percentage of variation is a metric providing a sense of how big the difference is between the measured value and the previous values. For example, for a latency test where the minimum latency is measured, the percentage of variation (PV) of the minimum latency will indicate by how much this value has varied between the current test executed and the previous one.

   PV = ((x2-x1)/x1)*100, where x2 is the minimum latency value in the current test and x1 is the minimum latency value obtained in the previous test.

The same formula is used for maximum and average variations measured.
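The PV formula above can be spelled out as a short, non-normative sketch; the latency readings used here are invented for illustration.

   # Illustrative sketch only of the percentage-of-variation formula,
   # PV = ((x2 - x1) / x1) * 100.

   def percentage_of_variation(x1, x2):
       # x1: value from the previous test; x2: value from the current test
       return (x2 - x1) / x1 * 100.0

   previous_min_latency = 1.00   # microseconds (invented)
   current_min_latency = 1.05    # microseconds (invented)
   print(round(percentage_of_variation(previous_min_latency,
                                       current_min_latency), 2))  # 5.0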
4. Microburst Testing

4.1. Objective

The objective of this test is to find the maximum amount of packet bursts that a DUT can sustain under various configurations.

This test provides additional methodology that supplements the tests described in [RFC1242], [RFC2432], [RFC2544], [RFC2889], and [RFC3918]:

- All bursts should be sent with 100% intensity. Note: "Intensity" is defined in [RFC8238], Section 6.1.1.

- All ports of the DUT must be used for this test.

- It is recommended that all ports be tested simultaneously.

4.2. Methodology

A traffic generator MUST be connected to all ports on the DUT. In order to cause congestion, two or more ingress ports MUST send bursts of packets destined for the same egress port. The simplest of the setups would be two ingress ports and one egress port (2 to 1).

The burst MUST be sent with an intensity (as defined in [RFC8238], Section 6.1.1) of 100%, meaning that the burst of packets will be sent with a minimum interpacket gap. The amount of packets contained in the burst will be trial variable and increase until there is a non-zero packet loss measured. The aggregate amount of packets from all the senders will be used to calculate the maximum microburst amount that the DUT can sustain. (A sketch of this trial procedure appears at the end of this section.)

It is RECOMMENDED that the ingress and egress ports be varied in multiple tests in order to measure the maximum microburst capacity.

The intensity of a microburst (see [RFC8238], Section 6.1.1) MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be tested simultaneously, and in various configurations, in order to understand all the combinations of ingress ports, egress ports, and intensities.

An example would be:

o First iteration: N-1 ingress ports sending to one egress port.

o Second iteration: N-2 ingress ports sending to two egress ports.

o Last iteration: Two ingress ports sending to N-2 egress ports.

4.3. Reporting Format

The report MUST include the following:

- The maximum number of packets received per ingress port with the maximum burst size obtained with zero packet loss.

- The packet size used in the test.

- The number of ingress and egress ports, along with their location on the DUT.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg).
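As the sketch referenced in Section 4.2, the burst-growth trial can be automated. This is a non-normative illustration: send_burst is a hypothetical hook into a traffic generator (assumed to send a burst at 100% intensity from every listed ingress port and return the measured packet loss), not a real API, and the fake DUT model is invented.

   # Illustrative sketch only: grow the burst size until the first
   # non-zero packet loss and report the largest burst sustained
   # with zero loss.

   def max_sustained_burst(send_burst, ingress_ports, egress_port, step=1):
       burst = step
       while True:
           lost = send_burst(ingress_ports, egress_port, burst)
           if lost > 0:
               return burst - step   # last burst size with zero loss
           burst += step

   # Stand-in for real hardware: pretend the egress buffers 1000 packets.
   def fake_send_burst(ingress_ports, egress_port, burst):
       return max(0, burst * len(ingress_ports) - 1000)

   print(max_sustained_burst(fake_send_burst, [1, 2], 3, step=10))  # 500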
5. Head-of-Line Blocking

5.1. Objective

Head-of-line blocking (HOLB) is a performance-limiting phenomenon that occurs when packets are held up by the first packet ahead waiting to be transmitted to a different output port. This is defined in RFC 2889, Section 5.5 ("Congestion Control"). This section expands on RFC 2889 in the context of data center benchmarking.

The objective of this test is to understand the DUT's behavior in the HOLB scenario and measure the packet loss.

The differences between this HOLB test and RFC 2889 are as follows:

- This HOLB test starts with eight ports in two groups of four ports each, instead of four ports (as compared with Section 5.5 of RFC 2889).

- This HOLB test shifts all the port numbers by one in a second iteration of the test; this is new, as compared to the HOLB test described in RFC 2889. The shifting port numbers continue until all ports are the first in the group; the purpose of this is to make sure that all permutations are tested in order to cover differences in behavior in the SoC of the DUT.

- Another test within this HOLB test expands the group of ports, such that traffic is divided among four ports instead of two (25% instead of 50% per port).

- Section 5.3 lists requirements that supplement the requirements listed in RFC 2889, Section 5.5.

5.2. Methodology

In order to cause congestion in the form of HOLB, groups of four ports are used. A group has two ingress ports and two egress ports. The first ingress port MUST have two flows configured, each going to a different egress port. The second ingress port will congest the second egress port by sending line rate. The goal is to measure if there is loss on the flow for the first egress port, which is not oversubscribed.

A traffic generator MUST be connected to at least eight ports on the DUT and SHOULD be connected using all the DUT ports.

Note that the tests described in procedures 1) and 2) in this section have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

1) Measure two groups with eight DUT ports.

   o First iteration: Measure the packet loss for two groups with consecutive ports.

     The composition of the first group is as follows: Ingress port 1 is sending 50% of traffic to egress port 3, and ingress port 1 is sending 50% of traffic to egress port 4. Ingress port 2 is sending line rate to egress port 4. Measure the amount of traffic loss for the traffic from ingress port 1 to egress port 3.

     The composition of the second group is as follows: Ingress port 5 is sending 50% of traffic to egress port 7, and ingress port 5 is sending 50% of traffic to egress port 8. Ingress port 6 is sending line rate to egress port 8. Measure the amount of traffic loss for the traffic from ingress port 5 to egress port 7.

   o Second iteration: Repeat the first iteration by shifting all the ports from N to N+1.

     The composition of the first group is as follows: Ingress port 2 is sending 50% of traffic to egress port 4, and ingress port 2 is sending 50% of traffic to egress port 5. Ingress port 3 is sending line rate to egress port 5. Measure the amount of traffic loss for the traffic from ingress port 2 to egress port 4.

     The composition of the second group is as follows: Ingress port 6 is sending 50% of traffic to egress port 8, and ingress port 6 is sending 50% of traffic to egress port 9. Ingress port 7 is sending line rate to egress port 9. Measure the amount of traffic loss for the traffic from ingress port 6 to egress port 8.
   o Last iteration: When the first port of the first group is connected to the last DUT port and the last port of the second group is connected to the seventh port of the DUT.

     Measure the amount of traffic loss for the traffic from ingress port N to egress port 2 and from ingress port 4 to egress port 6.

2) Measure with N/4 groups with N DUT ports. (The group enumeration and per-iteration shift are illustrated by the sketch at the end of this section.)

   The traffic from the ingress port is split across four egress ports (100/4 = 25%).

   o First iteration: Expand to fully utilize all the DUT ports in increments of four. Repeat the methodology of procedure 1) with all the groups of ports possible to achieve on the device, and measure the amount of traffic loss for each port group.

   o Second iteration: Shift by +1 the start of each consecutive port of the port groups.

   o Last iteration: Shift by N-1 the start of each consecutive port of the port groups, and measure the amount of traffic loss for each port group.

5.3. Reporting Format

For each test, the report MUST include the following:

- The port configuration, including the number and location of ingress and egress ports located on the DUT.

- If HOLB was observed in accordance with the HOLB test described in Section 5.

- Percent of traffic loss.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg).
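As the sketch referenced in Section 5.2, the shifted groups of four consecutive DUT ports can be enumerated mechanically. This is a non-normative illustration; ports numbered 1 to n_ports, with wraparound past the last port, is an assumption made for the example.

   # Illustrative sketch only: enumerate the shifted groups of four
   # consecutive DUT ports used by the HOLB test, wrapping around
   # the last port.

   def holb_groups(n_ports, shift):
       ports = [((p + shift - 1) % n_ports) + 1
                for p in range(1, n_ports + 1)]
       return [tuple(ports[i:i + 4]) for i in range(0, n_ports - 3, 4)]

   print(holb_groups(8, 0))  # first iteration: [(1, 2, 3, 4), (5, 6, 7, 8)]
   print(holb_groups(8, 1))  # second iteration: [(2, 3, 4, 5), (6, 7, 8, 1)]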
6. Incast Stateful and Stateless Traffic

6.1. Objective

The objective of this test is to measure the values for TCP Goodput [TCP-INCAST] and latency with a mix of large and small flows. The test is designed to simulate a mixed environment of stateful flows that require high rates of goodput and stateless flows that require low latency. Stateful flows are created by generating TCP traffic, and stateless flows are created using UDP traffic.

6.2. Methodology

In order to simulate the effects of stateless and stateful traffic on the DUT, there MUST be multiple ingress ports receiving traffic destined for the same egress port. There also MAY be a mix of stateful and stateless traffic arriving on a single ingress port. The simplest setup would be two ingress ports receiving traffic destined to the same egress port.

One ingress port MUST maintain a TCP connection through the ingress port to a receiver connected to an egress port. Traffic in the TCP stream MUST be sent at the maximum rate allowed by the traffic generator. At the same time, the TCP traffic is flowing through the DUT, and the stateless traffic is sent destined to a receiver on the same egress port. The stateless traffic MUST be a microburst of 100% intensity.

It is RECOMMENDED that the ingress and egress ports be varied in multiple tests in order to measure the maximum microburst capacity.

The intensity of a microburst MAY be varied in order to obtain the microburst capacity at various ingress rates.

It is RECOMMENDED that all ports on the DUT be used in the test.

The tests described below have iterations called "first iteration", "second iteration", and "last iteration". The idea is to show the first two iterations so the reader understands the logic of how to keep incrementing the iterations. The last iteration shows the end state of the variables.

For example:

Stateful traffic port variation (TCP traffic): TCP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.

o First iteration: One ingress port receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

o Second iteration: Two ingress ports receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

o Last iteration: N-2 ingress ports receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

Stateless traffic port variation (UDP traffic): UDP traffic needs to be generated for this test. During the iterations, the number of egress ports MAY vary as well.

o First iteration: One ingress port receiving stateful TCP traffic and one ingress port receiving stateless traffic destined to one egress port.

o Second iteration: One ingress port receiving stateful TCP traffic and two ingress ports receiving stateless traffic destined to one egress port.

o Last iteration: One ingress port receiving stateful TCP traffic and N-2 ingress ports receiving stateless traffic destined to one egress port.

6.3. Reporting Format

The report MUST include the following:

- Number of ingress and egress ports, along with designation of stateful or stateless flow assignment.

- Stateful flow goodput.

- Stateless flow latency.

- The repeatability of the test needs to be indicated: the number of iterations of the same test and the percentage of variation between results (min, max, avg).
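Goodput, as reported above, counts only application-payload bytes actually delivered, excluding retransmissions and protocol overhead. The following non-normative sketch shows the basic derivation over a measurement interval; the numbers are invented for illustration.

   # Illustrative sketch only: derive goodput from application bytes
   # delivered over the measurement interval.

   def goodput_bps(app_bytes_delivered, elapsed_seconds):
       return app_bytes_delivered * 8 / elapsed_seconds

   # Invented example: 1 GB of application data delivered in 1.2 s.
   print(round(goodput_bps(1_000_000_000, 1.2) / 1e9, 2), "Gb/s")  # 6.67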
7. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT.

Special capabilities SHOULD NOT exist in the DUT specifically for benchmarking purposes. Any implications for network security arising from the DUT SHOULD be identical in the lab and in production networks.

8. IANA Considerations

This document does not require any IANA actions.

9. References

9.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991, <https://www.rfc-editor.org/info/rfc1242>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999, <https://www.rfc-editor.org/info/rfc2544>.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[RFC8238] Avramov, L. and J. Rapp, "Data Center Benchmarking Terminology", RFC 8238, DOI 10.17487/RFC8238, August 2017, <https://www.rfc-editor.org/info/rfc8238>.

9.2. Informative References

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", RFC 2432, DOI 10.17487/RFC2432, October 1998, <https://www.rfc-editor.org/info/rfc2432>.

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, August 2000, <https://www.rfc-editor.org/info/rfc2889>.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October 2004, <https://www.rfc-editor.org/info/rfc3918>.

[RFC6985] Morton, A., "IMIX Genome: Specification of Variable Packet Sizes for Additional Testing", RFC 6985, DOI 10.17487/RFC6985, July 2013, <https://www.rfc-editor.org/info/rfc6985>.

[TCP-INCAST] Chen, Y., Griffith, R., Zats, D., Joseph, A., and R. Katz, "Understanding TCP Incast and Its Implications for Big Data Workloads", April 2012, <http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

Acknowledgments

The authors would like to thank Al Morton and Scott Bradner for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States of America

Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave.
Palo Alto, CA 94304
United States of America

Email: jhrapp@gmail.com