Internet Engineering Task Force (IETF)                        L. Avramov
Request for Comments: 8238                                        Google
Category: Informational                                          J. Rapp
ISSN: 2070-1721                                                   VMware
                                                             August 2017


                  Data Center Benchmarking Terminology

Abstract

The purposes of this informational document are to establish definitions and describe measurement techniques for data center benchmarking, as well as to introduce new terminology applicable to performance evaluations of data center network equipment. This document establishes the important concepts for benchmarking network switches and routers in the data center and is a prerequisite for the test methodology document (RFC 8239). Many of these terms and methods may be applicable to network equipment beyond the scope of this document as the technologies originally applied in the data center are deployed elsewhere.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8238.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
      1.2. Definition Format
   2. Latency
      2.1. Definition
      2.2. Discussion
      2.3. Measurement Units
   3. Jitter
      3.1. Definition
      3.2. Discussion
      3.3. Measurement Units
   4. Calibration of the Physical Layer
      4.1. Definition
      4.2. Discussion
      4.3. Measurement Units
   5. Line Rate
      5.1. Definition
      5.2. Discussion
      5.3. Measurement Units
   6. Buffering
      6.1. Buffer
           6.1.1. Definition
           6.1.2. Discussion
           6.1.3. Measurement Units
      6.2. Incast
           6.2.1. Definition
           6.2.2. Discussion
           6.2.3. Measurement Units
   7. Application Throughput: Data Center Goodput
      7.1. Definition
      7.2. Discussion
      7.3. Measurement Units
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1. Normative References
      10.2. Informative References
   Acknowledgments
   Authors' Addresses

1. Introduction

Traffic patterns in the data center are not uniform and are constantly changing. They are dictated by the nature and variety of applications utilized in the data center. They can be largely east-west traffic flows (server to server inside the data center) in one data center and north-south (from the outside of the data center to the server) in another, while some may combine both. Traffic patterns can be bursty in nature and contain many-to-one, many-to-many, or one-to-many flows. Each flow may also be small and latency sensitive or large and throughput sensitive while containing a mix of UDP and TCP traffic. All of these may coexist in a single cluster and flow through a single network device simultaneously. Benchmarking tests for network devices have long used [RFC1242], [RFC2432], [RFC2544], [RFC2889], and [RFC3918]. These benchmarks have largely been focused around various latency attributes and max throughput of the Device Under Test (DUT) being benchmarked. These standards are good at measuring theoretical max throughput, forwarding rates, and latency under testing conditions, but they do not represent real traffic patterns that may affect these networking devices.

The data center networking devices covered are switches and routers.

Currently, typical data center networking devices are characterized by:

- High port density (48 ports or more).
- High speed (currently, up to 100 GB/s per port).
- High throughput (line rate on all ports for Layer 2 and/or Layer 3).
- Low latency (in the microsecond or nanosecond range).
- Low amount of buffer (in the MB range per networking device).
- Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory).

This document defines a set of definitions, metrics, and new terminology, including congestion scenarios and switch buffer analysis, and redefines basic definitions in order to represent a wide mix of traffic conditions. The test methodologies are defined in [RFC8239].

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Definition Format

- Term to be defined (e.g., "latency").
- Definition: The specific definition for the term.
- Discussion: A brief discussion about the term, its application, and any restrictions on measurement procedures.
- Measurement Units: Methodology for the measurements and units used to report measurements of the term in question, if applicable.

2. Latency

2.1. Definition

Latency is the amount of time it takes a frame to transit the DUT. Latency is measured in units of time (seconds, milliseconds, microseconds, and so on).
The purpose of measuring latency is to understand the impact of adding a device in the communication path.

The latency interval can be assessed between different combinations of events, regardless of the type of switching device (bit forwarding, aka cut-through; or a store-and-forward device). [RFC1242] defined latency differently for each of these types of devices.

Traditionally, the latency measurement definitions are:

- FILO (First In Last Out): The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the last bit of the output frame is seen on the output port.

- FIFO (First In First Out): The time interval starting when the end of the first bit of the input frame reaches the input port and ending when the start of the first bit of the output frame is seen on the output port. Latency (as defined in [RFC1242]) for bit-forwarding devices uses these events.

- LILO (Last In Last Out): The time interval starting when the last bit of the input frame reaches the input port and the last bit of the output frame is seen on the output port.

- LIFO (Last In First Out): The time interval starting when the last bit of the input frame reaches the input port and ending when the first bit of the output frame is seen on the output port. Latency (as defined in [RFC1242]) for store-and-forward devices uses these events.

Another possible way to summarize the four definitions above is to refer to the bit positions as they normally occur: input to output.

- FILO is FL (First bit Last bit).
- FIFO is FF (First bit First bit).
- LILO is LL (Last bit Last bit).
- LIFO is LF (Last bit First bit).

This definition, as explained in this section in the context of data center switch benchmarking, is in lieu of the previous definition of "latency" as provided in RFC 1242, Section 3.8 and quoted here:

   For store and forward devices: The time interval starting when the
   last bit of the input frame reaches the input port and ending when
   the first bit of the output frame is seen on the output port.

   For bit forwarding devices: The time interval starting when the end
   of the first bit of the input frame reaches the input port and
   ending when the start of the first bit of the output frame is seen
   on the output port.

To accommodate both types of network devices and hybrids of the two types that have emerged, switch latency measurements made according to this document MUST be measured with the FILO events. FILO will include the latency of the switch and the latency of the frame as well as the serialization delay. It is a picture of the "whole" latency going through the DUT. For applications that are latency sensitive and can function with initial bytes of the frame, FIFO (or, for bit-forwarding devices, latency per RFC 1242) MAY be used. In all cases, the event combinations used in latency measurements MUST be reported.
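As an illustration of the four event combinations above, the following minimal sketch (Python; the timestamps and variable names are hypothetical) derives FILO, FIFO, LILO, and LIFO from first-bit/last-bit timestamps captured by a traffic generator:

   # Hypothetical first-bit/last-bit timestamps (seconds) captured by a
   # traffic generator for one frame crossing the DUT; values are illustrative.
   first_bit_in = 0.000000000    # end of the first bit of the input frame, input port
   last_bit_in = 0.000000512     # last bit of the input frame, input port
   first_bit_out = 0.000001200   # start of the first bit of the output frame, output port
   last_bit_out = 0.000001712    # last bit of the output frame, output port

   filo = last_bit_out - first_bit_in    # FILO: MUST be used per Section 2.3
   fifo = first_bit_out - first_bit_in   # FIFO: RFC 1242 latency for bit-forwarding devices
   lilo = last_bit_out - last_bit_in     # LILO
   lifo = first_bit_out - last_bit_in    # LIFO: can be negative for cut-through DUTs

   # The event combination used MUST be reported along with the result.
   print(f"FILO={filo:.9f}s FIFO={fifo:.9f}s LILO={lilo:.9f}s LIFO={lifo:.9f}s")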
2.2. Discussion

As mentioned in Section 2.1, FILO is the most important measuring definition.

Not all DUTs are exclusively cut-through or store-and-forward. Data center DUTs are frequently store-and-forward for smaller packet sizes and then change to cut-through behavior at specific larger packet sizes. The value of the packet size at which the behavior changes MAY be configurable, depending on the DUT manufacturer. FILO covers both scenarios: store-and-forward and cut-through. The threshold for the change in behavior does not matter for benchmarking, since FILO covers both possible scenarios.

The LIFO mechanism can be used with store-and-forward switches but not with cut-through switches, as it will provide negative latency values for larger packet sizes because LIFO removes the serialization delay. Therefore, this mechanism MUST NOT be used when comparing the latencies of two different DUTs.

2.3. Measurement Units

The measuring methods to use for benchmarking purposes are as follows:

1) FILO MUST be used as a measuring method, as this will include the latency of the packet; today, the application commonly needs to read the whole packet to process the information and take an action.

2) FIFO MAY be used for certain applications able to process the data as the first bits arrive -- for example, with a Field-Programmable Gate Array (FPGA).

3) LIFO MUST NOT be used because, unlike all the other methods, it subtracts the latency of the packet.

3. Jitter

3.1. Definition

In the context of the data center, jitter is synonymous with the common term "delay variation". It is derived from multiple measurements of one-way delay, as described in RFC 3393. The mandatory definition of "delay variation" is the Packet Delay Variation (PDV) as defined in Section 4.2 of [RFC5481]. When considering a stream of packets, the delays of all packets are subtracted from the minimum delay over all packets in the stream. This facilitates the assessment of the range of delay variation (Max - Min) or a high percentile of PDV (99th percentile, for robustness against outliers).

When First-bit to Last-bit timestamps are used for delay measurement, then delay variation MUST be measured using packets or frames of the same size, since the definition of latency includes the serialization time for each packet. Otherwise, if using First-bit to First-bit, the size restriction does not apply.

3.2. Discussion

In addition to a PDV range and/or a high percentile of PDV, Inter-Packet Delay Variation (IPDV) as defined in Section 4.1 of [RFC5481] (differences between two consecutive packets) MAY be used for the purpose of determining how packet spacing has changed during transfer -- for example, to see if a packet stream has become closely spaced or "bursty". However, the absolute value of IPDV SHOULD NOT be used, as this "collapses" the "bursty" and "dispersed" sides of the IPDV distribution together.

3.3. Measurement Units

The measurement of delay variation is expressed in units of seconds. A PDV histogram MAY be provided for the population of packets measured.
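As a minimal sketch of the calculations above (Python; the delay values are hypothetical), the PDV samples, the PDV range, a 99th-percentile PDV, and the signed IPDV series can be derived from a list of measured one-way delays for same-size frames:

   import statistics

   # Hypothetical one-way delays (seconds) measured for a stream of same-size frames.
   delays = [10.2e-6, 10.4e-6, 10.3e-6, 12.9e-6, 10.2e-6, 10.5e-6, 11.1e-6]

   # PDV (Section 4.2 of [RFC5481]): subtract the minimum delay of the stream
   # from the delay of every packet.
   min_delay = min(delays)
   pdv = [d - min_delay for d in delays]
   pdv_range = max(pdv) - min(pdv)                 # range of delay variation (Max - Min)
   pdv_p99 = statistics.quantiles(pdv, n=100)[98]  # 99th percentile, robust against outliers

   # IPDV (Section 4.1 of [RFC5481]): difference between two consecutive packets.
   # The sign is kept; the absolute value SHOULD NOT be used.
   ipdv = [b - a for a, b in zip(delays, delays[1:])]

   print(f"PDV range = {pdv_range:.9f} s, 99th-percentile PDV = {pdv_p99:.9f} s")
   print("IPDV:", ["%+.9f" % x for x in ipdv])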
4. Calibration of the Physical Layer

4.1. Definition

Calibration of the physical layer consists of defining and measuring the latency of the physical devices used to perform tests on the DUT. It includes the list of all physical-layer components used, as specified here:

- Type of device used to generate traffic / measure traffic.
- Type of line cards used on the traffic generator.
- Type of transceivers on the traffic generator.
- Type of transceivers on the DUT.
- Type of cables.
- Length of cables.
- Software name and version of the traffic generator and DUT.
- A list of enabled features on the DUT MAY be provided and is recommended (especially in the case of control-plane protocols, such as the Link Layer Discovery Protocol and Spanning Tree). A comprehensive configuration file MAY be provided to this effect.

4.2. Discussion

Calibration of the physical layer contributes to end-to-end latency and should be taken into account when evaluating the DUT. Small variations in the physical components of the test may impact the latency being measured; therefore, they MUST be described when presenting results.

4.3. Measurement Units

It is RECOMMENDED that all cables used for testing (1) be of the same type and length and (2) come from the same vendor whenever possible. It is a MUST to document the cable specifications listed in Section 4.1, along with the test results. The test report MUST specify whether or not the cable latency has been subtracted from the test measurements. The accuracy of the traffic-generator measurements MUST be provided (for current test equipment, this is usually a value within a range of 20 ns).

5. Line Rate

5.1. Definition

The transmit timing, or maximum transmitted data rate, is controlled by the "transmit clock" in the DUT. The receive timing (maximum ingress data rate) is derived from the transmit clock of the connected interface.

The line rate or physical-layer frame rate is the maximum capacity to send frames of a specific size at the transmit clock frequency of the DUT.

The term "nominal value of line rate" defines the maximum speed capability for the given port -- for example (expressed as Gigabit Ethernet), 1 GE, 10 GE, 40 GE, 100 GE.

The frequency ("clock rate") of the transmit clock in any two connected interfaces will never be precisely the same; therefore, a tolerance is needed. This will be expressed by a Parts Per Million (PPM) value. The IEEE standards allow a specific +/- variance in the transmit clock rate, and Ethernet is designed to allow for small, normal variations between the two clock rates. This results in a tolerance of the line-rate value when traffic is generated from test equipment to a DUT.

Line rate SHOULD be measured in frames per second (FPS).

5.2. Discussion

For a transmit clock source, most Ethernet switches use "clock modules" (also called "oscillator modules") that are sealed, internally temperature-compensated, and very accurate. The output frequency of these modules is not adjustable because it is not necessary. Many test sets, however, offer a software-controlled adjustment of the transmit clock rate. These adjustments SHOULD be used to "compensate" the test equipment in order to not send more than the line rate of the DUT.
To allow for the minor variations typically found in the clock rate of commercially available clock modules and other crystal-based oscillators, Ethernet standards specify the maximum transmit clock-rate variation to be not more than +/- 100 PPM from a calculated center frequency. Therefore, a DUT must be able to accept frames at a rate within +/- 100 PPM to comply with the standards.

Very few clock circuits are precisely +/- 0.0 PPM because:

1. The Ethernet standards allow a maximum variance of +/- 100 PPM over time. Therefore, it is normal for the frequency of the oscillator circuits to experience variation over time and over a wide temperature range, among other external factors.

2. The crystals, or clock modules, usually have a specific +/- PPM variance that is significantly better than +/- 100 PPM. Oftentimes, this is +/- 30 PPM or better in order to be considered a "certification instrument".

When testing an Ethernet switch throughput at "line rate", any specific switch will have a clock-rate variance. If a test set is running +1 PPM faster than a switch under test and a sustained line-rate test is performed, a gradual increase in latency and, eventually, packet drops as buffers fill and overflow in the switch can be observed. Depending on how much clock variance there is between the two connected systems, the effect may be seen after the traffic stream has been running for a few hundred microseconds, a few milliseconds, or seconds. The same low latency, and no packet loss, can be demonstrated by setting the test set's link occupancy to slightly less than 100 percent link occupancy. Typically, 99 percent link occupancy produces excellent low latency and no packet loss. No Ethernet switch or router will have a transmit clock rate of exactly +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is precisely +/- 0.0 PPM.

Test-set equipment manufacturers are well aware of the standards and allow a software-controlled +/- 100 PPM "offset" (clock-rate adjustment) to compensate for normal variations in the clock speed of DUTs. This offset adjustment allows engineers to determine the approximate speed at which the connected device is operating and verify that it is within parameters allowed by standards.

5.3. Measurement Units

"Line rate" can be measured in terms of "frame rate":

   Frame Rate = Transmit-Clock-Frequency / (Frame-Length*8 + Minimum_Gap
                + Preamble + Start-Frame Delimiter)

Minimum_Gap represents the interframe gap. This formula "scales up" or "scales down" to represent 1 GB Ethernet, 10 GB Ethernet, and so on.

Example for 1 GB Ethernet speed with 64-byte frames:

   Frame Rate = 1,000,000,000 / (64*8 + 96 + 56 + 8)
              = 1,000,000,000 / 672
              = 1,488,095.2 FPS

Considering the allowance of +/- 100 PPM, a switch may "legally" transmit traffic at a frame rate between 1,487,946.4 FPS and 1,488,244 FPS.

Each 1 PPM variation in clock rate will translate to a frame-rate increase or decrease of 1.488 FPS.

In a production network, it is very unlikely that one would see precise line rate over a very brief period. There is no observable difference between dropping packets at 99% of line rate and 100% of line rate.
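The frame-rate formula and the +/- 100 PPM allowance can be reproduced with simple arithmetic; the following minimal sketch (Python) recomputes the 1 GB Ethernet, 64-byte-frame example above together with the corresponding "legal" bounds:

   def frame_rate(clock_hz, frame_len_bytes, min_gap=96, preamble=56, sfd=8):
       """Frame Rate = Transmit-Clock-Frequency /
          (Frame-Length*8 + Minimum_Gap + Preamble + Start-Frame Delimiter)."""
       return clock_hz / (frame_len_bytes * 8 + min_gap + preamble + sfd)

   nominal = frame_rate(1_000_000_000, 64)   # 1 GB Ethernet, 64-byte frames
   low = nominal * (1 - 100 / 1_000_000)     # slowest "legal" rate at -100 PPM
   high = nominal * (1 + 100 / 1_000_000)    # fastest "legal" rate at +100 PPM

   print(f"nominal = {nominal:,.1f} FPS")                       # ~1,488,095.2 FPS
   print(f"+/-100 PPM bounds = {low:,.1f} .. {high:,.1f} FPS")  # ~1,487,946.4 .. 1,488,244 FPS
   print(f"1 PPM ~ {nominal / 1_000_000:.3f} FPS")              # ~1.488 FPS per PPM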
Line rate can be measured at 100% of line rate with a -100 PPM adjustment.

Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.

The PPM adjustment SHOULD only be used for a line-rate measurement.

6. Buffering

6.1. Buffer

6.1.1. Definition

Buffer Size: The term "buffer size" represents the total amount of frame-buffering memory available on a DUT. This size is expressed in B (bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). When the buffer size is expressed, an indication of the frame MTU (Maximum Transmission Unit) used for that measurement is also necessary, as well as the CoS (Class of Service) or DSCP (Differentiated Services Code Point) value set, as oftentimes the buffers are carved by a quality-of-service implementation. Please refer to Section 3 of [RFC8239] for further details.

Example: The Buffer Size of the DUT when sending 1518-byte frames is 18 MB.

Port Buffer Size: The port buffer size is the amount of buffer for a single ingress port, a single egress port, or a combination of ingress and egress buffering locations for a single port. We mention the three locations for the port buffer because the DUT's buffering scheme can be unknown or untested, so knowing the buffer location helps clarify the buffer architecture and, consequently, the total buffer size. The Port Buffer Size is an informational value that MAY be provided by the DUT vendor. It is not a value that is tested by benchmarking. Benchmarking will be done using the Maximum Port Buffer Size or Maximum Buffer Size methodology.

Maximum Port Buffer Size: In most cases, this is the same as the Port Buffer Size. In a certain type of switch architecture called "SoC" (switch on chip), there is a port buffer and a shared buffer pool available for all ports. The Maximum Port Buffer Size, in terms of an SoC buffer, represents the sum of the port buffer and the maximum value of shared buffer allowed for this port, defined in terms of B (bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes). The Maximum Port Buffer Size needs to be expressed along with the frame MTU used for the measurement and the CoS or DSCP bit value set for the test.

Example: A DUT has been measured to have 3 KB of port buffer for 1518-byte frames, and a total of 4.7 MB of maximum port buffer for 1518-byte frames and a CoS of 0.

Maximum DUT Buffer Size: This is the total buffer size that a DUT can be measured to have. It is most likely different than the Maximum Port Buffer Size. It can also be different from the sum of Maximum Port Buffer Size. The Maximum Buffer Size needs to be expressed along with the frame MTU used for the measurement and along with the CoS or DSCP value set during the test.

Example: A DUT has been measured to have 3 KB of port buffer for 1518-byte frames and a total of 4.7 MB of maximum port buffer for 1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at 1500 B and a CoS of 0.

Burst: A burst is a fixed number of packets sent over a percentage of line rate for a defined port speed. The amount of frames sent is evenly distributed across the interval T. A constant, C, can be defined to provide the average time between two evenly spaced consecutive packets.

Microburst: A microburst is a type of burst where packet drops occur when there is not sustained or noticeable congestion on a link or device. One characteristic of a microburst is when the burst is not evenly distributed over T and is less than the constant C (C = the average time between two evenly spaced consecutive packets).

Intensity of Microburst: This is a percentage and represents the level, between 1 and 100%, of the microburst. The higher the number, the higher the microburst is.

   I = [1 - [(Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))] / Sum(packets)] * 100

The above definitions are not meant to comment on the ideal sizing of a buffer but rather on how to measure it. A larger buffer is not necessarily better and can cause issues with bufferbloat.
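Because the usable buffer depends on the frame size, a reported buffer size is only meaningful together with the MTU (and CoS or DSCP value) used for the measurement. The following minimal sketch (Python; the 18 MB value echoes the Buffer Size example above, and the other frame sizes are hypothetical) converts a measured buffer size into an approximate number of buffered frames:

   # Illustrative only: the measured Buffer Size itself may differ at other MTUs
   # or CoS/DSCP settings, which is why those parameters MUST be reported.
   buffer_size_bytes = 18 * 10**6   # Buffer Size measured with 1518-byte frames

   for frame_bytes in (64, 1518, 9216):
       frames = buffer_size_bytes // frame_bytes   # complete frames held by the buffer
       print(f"{frame_bytes:>5}-byte frames: about {frames:,} frames buffered")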
6.1.2. Discussion

When measuring buffering on a DUT, it is important to understand the behavior of each and every port. This provides data for the total amount of buffering available on the switch. The terms of buffer efficiency here help one understand the optimum packet size for the buffer or the real volume of the buffer available for a specific packet size. This section does not discuss how to conduct the test methodology; instead, it explains the buffer definitions and what metrics should be provided for comprehensive data center device-buffering benchmarking.

6.1.3. Measurement Units

When the DUT buffer is measured:

- The buffer size MUST be measured.
- The port buffer size MAY be provided for each port.
- The maximum port buffer size MUST be measured.
- The maximum DUT buffer size MUST be measured.
- The intensity of the microburst MAY be mentioned when a microburst test is performed.
- The CoS or DSCP value set during the test SHOULD be provided.

6.2. Incast

6.2.1. Definition

The term "Incast", very commonly utilized in the data center, refers to many-to-one or many-to-many traffic patterns. As defined in this section, it measures the number of ingress and egress ports and the percentage of synchronization attributed to them. Typically, in the data center, it would refer to many different ingress server ports (many), sending traffic to a common uplink (many-to-one), or multiple uplinks (many-to-many). This pattern is generalized for any network as many incoming ports sending traffic to one or a few uplinks.

Synchronous arrival time: When two or more frames of sizes L1 and L2 arrive at their respective ingress port or multiple ingress ports and there is an overlap of arrival times for any of the bits on the DUT, then the L1 and L2 frames have synchronous arrival times. This is called "Incast", regardless of whether the pattern is many-to-one (simpler) or many-to-many.

Asynchronous arrival time: This is any condition not defined by "synchronous arrival time".

Percentage of synchronization: This defines the level of overlap (amount of bits) between frames of sizes L1,L2..Ln.

Example: Two 64-byte frames of length L1 and L2 arrive at ingress port 1 and port 2 of the DUT. There is an overlap of 6.4 bytes between the two, where the L1 and L2 frames were on their respective ingress ports at the same time. Therefore, the percentage of synchronization is 10%.

Stateful traffic: Stateful traffic is packets exchanged with a stateful protocol, such as TCP.

Stateless traffic: Stateless traffic is packets exchanged with a stateless protocol, such as UDP.

6.2.2. Discussion

In this scenario, buffers are used on the DUT. In an ingress buffering mechanism, the ingress port buffers would be used along with virtual output queues, when available, whereas in an egress buffering mechanism, the egress buffer of the one outgoing port would be used.

In either case, regardless of where the buffer memory is located in the switch architecture, the Incast creates buffer utilization.

When one or more frames have synchronous arrival times at the DUT, they are considered to be forming an Incast.

6.2.3. Measurement Units

It is a MUST to measure the number of ingress and egress ports. It is a MUST to have a non-null percentage of synchronization, which MUST be specified.
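As an illustration of the percentage of synchronization, the following minimal sketch (Python; the 10 GE port speed and timestamps are hypothetical, chosen to reproduce the 6.4-byte / 10% example above) computes the overlap between two frames arriving on two ingress ports:

   # Hypothetical values: two 64-byte frames arriving on two 10 GE ingress ports.
   link_bps = 10_000_000_000
   frame_bytes = 64
   frame_time = frame_bytes * 8 / link_bps

   l1 = (0.0, frame_time)                   # (start, end) of frame L1 on ingress port 1
   l2_start = l1[1] - 6.4 * 8 / link_bps    # L2 begins while the last 6.4 bytes of L1 arrive
   l2 = (l2_start, l2_start + frame_time)   # (start, end) of frame L2 on ingress port 2

   overlap_s = max(0.0, min(l1[1], l2[1]) - max(l1[0], l2[0]))  # synchronous arrival window
   overlap_bytes = overlap_s * link_bps / 8
   pct_sync = overlap_bytes / frame_bytes * 100   # non-null => the frames form an Incast

   print(f"overlap = {overlap_bytes:.1f} bytes -> {pct_sync:.0f}% synchronization")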
7. Application Throughput: Data Center Goodput

7.1. Definition

In data center networking, a balanced network is a function of maximal throughput and minimal loss at any given time. This is captured by the Goodput [TCP-INCAST]. Goodput is the application-level throughput. For standard TCP applications, a very small loss can have a dramatic effect on application throughput. [RFC2647] provides a definition of Goodput; the definition in this document is a variant of that definition.

Goodput is the number of bits per unit of time forwarded to the correct destination interface of the DUT, minus any bits retransmitted.

7.2. Discussion

In data center benchmarking, the goodput is a value that SHOULD be measured. It provides a realistic idea of the usage of the available bandwidth. A goal in data center environments is to maximize the goodput while minimizing loss.

7.3. Measurement Units

The Goodput, G, is then measured by the following formula:

   G = (S/F) x V bytes per second

- S represents the payload bytes, not including packet or TCP headers.
- F is the frame size.
- V is the speed of the media in bytes per second.

Example: A TCP file transfer over HTTP on a 10 GB/s media. The file cannot be transferred over Ethernet as a single continuous stream. It must be broken down into individual frames of 1500 B when the standard MTU (Maximum Transmission Unit) is used. Each packet requires 20 B of IP header information and 20 B of TCP header information; therefore, 1460 B are available per packet for the file transfer. Linux-based systems are further limited to 1448 B, as they also carry a 12 B timestamp. Finally, in this example the data is transmitted over Ethernet, which adds 26 B of overhead per packet to 1500 B, increasing it to 1526 B.

G = 1460/1526 x 10 Gbit/s, which is 9.567 Gbit/s or 1.196 GB/s.

Please note: This example does not take into consideration the additional Ethernet overhead, such as the interframe gap (a minimum of 96 bit times), nor does it account for collisions (which have a variable impact, depending on the network load).

When conducting Goodput measurements, please document, in addition to the items listed in Section 4.1, the following information:

- The TCP stack used.
- OS versions.
- Network Interface Card (NIC) firmware version and model.

For example, Windows TCP stacks and different Linux versions can influence TCP-based test results.
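As a minimal sketch (Python), the formula G = (S/F) x V can be applied to the worked example above:

   # Worked example from Section 7.3: TCP file transfer over HTTP on 10 GB/s media,
   # 1500 B MTU, 20 B IP + 20 B TCP headers, 26 B of Ethernet overhead per packet.
   media_v = 10e9 / 8           # V: media speed, in bytes per second
   payload_s = 1500 - 20 - 20   # S: payload bytes per packet (1448 B on Linux systems)
   frame_f = 1500 + 26          # F: frame size on the wire

   g = (payload_s / frame_f) * media_v   # G = (S/F) x V, in bytes per second
   print(f"G = {g * 8 / 1e9:.3f} Gbit/s = {g / 1e9:.3f} GB/s")   # ~9.567 Gbit/s, ~1.196 GB/s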
8. Security Considerations

Benchmarking activities as described in this memo are limited to technology characterization using controlled stimuli in a laboratory environment, with dedicated address space and the constraints specified in the sections above.

The benchmarking network topology will be an independent test setup and MUST NOT be connected to devices that may forward the test traffic into a production network or misroute traffic to the test management network.

Further, benchmarking is performed on a "black-box" basis, relying solely on measurements observable external to the DUT.

Special capabilities SHOULD NOT exist in the DUT specifically for benchmarking purposes. Any implications for network security arising from the DUT SHOULD be identical in the lab and in production networks.

9. IANA Considerations

This document does not require any IANA actions.

10. References

10.1. Normative References

[RFC1242] Bradner, S., "Benchmarking Terminology for Network Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, July 1991, <https://www.rfc-editor.org/info/rfc1242>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, DOI 10.17487/RFC2544, March 1999, <https://www.rfc-editor.org/info/rfc2544>.

[RFC5481] Morton, A. and B. Claise, "Packet Delay Variation Applicability Statement", RFC 5481, DOI 10.17487/RFC5481, March 2009, <https://www.rfc-editor.org/info/rfc5481>.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[RFC8239] Avramov, L. and J. Rapp, "Data Center Benchmarking Methodology", RFC 8239, DOI 10.17487/RFC8239, August 2017, <https://www.rfc-editor.org/info/rfc8239>.

10.2. Informative References

[RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", RFC 2432, DOI 10.17487/RFC2432, October 1998, <https://www.rfc-editor.org/info/rfc2432>.
[RFC2647] Newman, D., "Benchmarking Terminology for Firewall Performance", RFC 2647, DOI 10.17487/RFC2647, August 1999, <https://www.rfc-editor.org/info/rfc2647>.

[RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889, DOI 10.17487/RFC2889, August 2000, <https://www.rfc-editor.org/info/rfc2889>.

[RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October 2004, <https://www.rfc-editor.org/info/rfc3918>.

[TCP-INCAST] Chen, Y., Griffith, R., Zats, D., Joseph, A., and R. Katz, "Understanding TCP Incast and Its Implications for Big Data Workloads", April 2012, <http://yanpeichen.com/professional/usenixLoginIncastReady.pdf>.

Acknowledgments

The authors would like to thank Al Morton, Scott Bradner, Ian Cox, and Tim Stevenson for their reviews and feedback.

Authors' Addresses

Lucien Avramov
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States of America

Email: lucien.avramov@gmail.com

Jacob Rapp
VMware
3401 Hillview Ave.
Palo Alto, CA 94304
United States of America

Email: jhrapp@gmail.com