Internet Research Task Force (IRTF)                             J. Nobre
Request for Comments: 8316          University of Vale do Rio dos Sinos
Category: Informational                                     L. Granville
ISSN: 2070-1721                  Federal University of Rio Grande do Sul
                                                                A. Clemm
                                                                  Huawei
                                                      A. Gonzalez Prieto
                                                                  VMware
                                                           February 2018

       Autonomic Networking Use Case for Distributed Detection of
                Service Level Agreement (SLA) Violations

Abstract

This document describes an experimental use case that employs autonomic networking for the monitoring of Service Level Agreements (SLAs). The use case is for detecting violations of SLAs in a distributed fashion. It strives to optimize and dynamically adapt the autonomic deployment of active measurement probes in a way that maximizes the likelihood of detecting service-level violations with a given resource budget to perform active measurements. This optimization and adaptation should be done without any outside guidance or intervention.

This document is a product of the IRTF Network Management Research Group (NMRG). It is published for informational purposes.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Research Task Force (IRTF). The IRTF publishes the results of Internet-related research and development activities. These results might not be suitable for deployment. This RFC represents the consensus of the Network Management Research Group of the Internet Research Task Force (IRTF). Documents approved for publication by the IRSG are not a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8316.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Table of Contents

1. Introduction
2. Definitions and Acronyms
3. Current Approaches
4. Use Case Description
5. A Distributed Autonomic Solution
6. Intended User Experience
7. Implementation Considerations
   7.1. Device-Based Self-Knowledge and Decisions
   7.2. Interaction with Other Devices
8. Comparison with Current Solutions
9. Related IETF Work
10. IANA Considerations
11. Security Considerations
12. Informative References
Acknowledgements
Authors' Addresses

1. Introduction

The Internet has been growing dramatically in terms of size, capacity, and accessibility in recent years. Communication requirements of distributed services and applications running on top of the Internet have become increasingly demanding. Some examples are real-time interactive video or financial trading. Providing such services involves stringent requirements in terms of acceptable latency, loss, and jitter.

Performance requirements lead to the articulation of Service Level Objectives (SLOs) that must be met. Those SLOs are part of Service Level Agreements (SLAs) that define a contract between the provider and the consumer of a service. SLOs, in effect, constitute a service-level guarantee that the consumer of the service can expect to receive (and often has to pay for). Likewise, the provider of a service needs to ensure that the service-level guarantee and associated SLOs are met. Some examples of clauses that relate to SLOs can be found in [RFC7297].

Violations of SLOs can be associated with significant financial loss, which can be divided into two categories. First, there is the loss that can be incurred by the user of a service when the agreed service levels are not provided. For example, a financial brokerage's stock orders might suffer losses when it is unable to execute stock transactions in a timely manner. An electronic retailer may lose customers when its online presence is perceived by customers as sluggish.
An online gaming provider may not be able to provide fair access to online players, resulting in frustrated players who are lost as customers. In each case, the failure of a service provider to meet promised service-level guarantees can have a substantial financial impact on users of the service.

Second, there is the loss that is incurred by the provider of a service who is unable to meet promised SLOs. Those losses can take several forms, such as penalties for violating the service level agreement and even loss of future revenue due to reduced customer satisfaction (which, in many cases, is more serious). Hence, SLOs are a key concern for the service provider. In order to ensure that SLOs are not being violated, service levels need to be continuously monitored at the network infrastructure layer in order to know, for example, when mitigating actions need to be taken. To that end, service-level measurements must take place.

Network measurements can be performed using active or passive measurement techniques. In passive measurements, production traffic is observed, and no monitoring traffic is created by the measurement process itself. That is, network conditions are checked in a non-intrusive way. In the context of IP Flow Information Export (IPFIX), several documents were produced that define how to export data associated with flow records, i.e., data that is collected as part of passive measurement mechanisms, generally applied against flows of production traffic (e.g., [RFC7011]). In addition, it is possible to collect real data traffic (not just summarized flow records) with time-stamped packets, possibly sampled (e.g., per [RFC5474]), as a means of measuring and inferring service levels.
Active measurements, on the other hand, are more intrusive to the network in the sense that they involve injecting synthetic test traffic into the network to measure network service levels, as opposed to simply observing production traffic. The IP Performance Metrics (IPPM) Working Group produced documents that describe active measurement mechanisms such as the One-Way Active Measurement Protocol (OWAMP) [RFC4656], the Two-Way Active Measurement Protocol (TWAMP) [RFC5357], and the Cisco Service-Level Assurance Protocol [RFC6812]. In addition, there are some mechanisms that do not cleanly fit into either active or passive categories, such as Performance and Diagnostic Metrics (PDM) Destination Option techniques [RFC8250].

Active measurement mechanisms offer a high level of control over what and how to measure. They do not require inspecting production traffic. Because of this, active measurements usually offer better accuracy and privacy than passive measurement mechanisms. Traffic encryption and regulations that limit the amount of payload inspection that can occur are non-issues. Furthermore, active measurement mechanisms are able to detect end-to-end network performance problems in a fine-grained way (e.g., simulating the traffic that must be handled considering specific SLOs). As a result, active measurements are often preferred over passive measurement for SLA monitoring. Measurement probes must be hosted in network devices, and measurement sessions must be activated to compute the current network metrics (for example, metrics such as the ones described in [RFC4148], although note that [RFC4148] was obsoleted by [RFC6248]). This activation should be dynamic in order to follow changes in network conditions, such as those related to routes being added or new customer demands.
While offering many advantages, active measurements are expensive in terms of network resource consumption. Active measurements generally involve measurement probes that generate synthetic test traffic that is directed at a responder. The responder needs to timestamp test traffic it receives and reflect it back to the originating measurement probe. The measurement probe subsequently processes the returned packets along with time-stamping information in order to compute service levels. Accordingly, active measurements consume substantial CPU cycles as well as memory of network devices to generate and process test traffic. In addition, synthetic traffic increases network load. Thus, active measurements compete for resources with other functions, including routing and switching.

The resources required and traffic generated by the active measurement sessions are, in a large part, a function of the number of measured network destinations. (In addition, the amount of traffic generated for each measurement plays a role that, in turn, influences the accuracy of the measurement.) When more destinations are measured, a greater number of resources are consumed and more traffic is needed to perform the measurements. Thus, to have better monitoring coverage, it is necessary to deploy more sessions, which consequently increases consumed resources. Otherwise, enabling the observation of just a small subset of all network flows can lead to insufficient coverage.

Furthermore, while some end-to-end service levels can be determined by adding up the service levels observed across different path segments, the same is not true for all service levels. For example, the end-to-end delay or packet loss from a node A to a node C routed via a node B can often be computed simply by adding delays (or loss) from A to B and from B to C.
This allows the decomposition of a large set of end-to-end measurements into a much smaller set of segment measurements. However, end-to-end jitter and mean opinion scores cannot be decomposed as easily and, for higher accuracy, must be measured end-to-end.

Hence, the decision about how to place measurement probes becomes an important management activity. The goal is to obtain the maximum benefits of service-level monitoring with a limited amount of measurement overhead. Specifically, the goal is to maximize the number of service-level violations that are detected with a limited number of resources.

The use case and the solution approach described in this document address an important practical issue. They are intended to provide a basis for further experimentation to lead to solutions for wider deployment.

This document represents the consensus of the IRTF's Network Management Research Group (NMRG). It was discussed extensively and received three separate in-depth reviews.

2. Definitions and Acronyms

Active Measurements: Techniques to measure service levels that involve generating and observing synthetic test traffic

Passive Measurements: Techniques used to measure service levels based on observation of production traffic

Autonomic Network: A network containing exclusively autonomic nodes, requiring no configuration, and deriving all required information through self-knowledge, discovery, or intent
Autonomic Service Agent (ASA): An agent implemented on an autonomic node that implements an autonomic function, either in part (in the case of a distributed function, as in the context of this document) or in whole

Measurement Session: A communications association between a probe and a responder used to send and reflect synthetic test traffic for active measurements

Probe: The source of synthetic test traffic in an active measurement

Responder: The destination for synthetic test traffic in an active measurement

SLA: Service Level Agreement

SLO: Service Level Objective

P2P: Peer-to-Peer

(Note: The definitions for "Autonomic Network" and "Autonomic Service Agent" are borrowed from [RFC7575].)

3. Current Approaches

For practical deployments of active measurement solutions, the current best practice for distributing the available measurement sessions across the network consists of relying entirely on the human administrator's expertise to infer the best locations at which to activate such sessions. This is done through several steps. First, it is necessary to collect traffic information in order to grasp the traffic matrix. Then, the administrator uses this information to infer the best destinations for measurement sessions. After that, the administrator activates sessions on the chosen subset of destinations, taking the available resources into account. This practice, however, does not scale well because it is still labor-intensive and error-prone for the administrator to determine which sessions should be activated given the set of critical flows that need to be measured. Even worse, this practice completely fails in networks where the most critical flows change rapidly, resulting in dynamic changes to what would be the most important destinations.
For example, this can be the case in modern cloud environments. This is because fast reactions are necessary to reconfigure the sessions, and administrators are just not quick enough in computing and activating the new set of required sessions every time the network traffic pattern changes. Finally, the current practice for active measurements usually covers only a fraction of the network flows that should be observed, which invariably leads to the damaging consequence of undetected SLA violations.

4. Use Case Description

The use case involves a service-level provider that needs to monitor the network to detect service-level violations using active service-level measurements and wants to be able to do so with minimal human intervention. The goal is to conduct the measurements in an effective manner to maximize the percentage of detected service-level violations. The service-level provider has a bounded resource budget with regard to measurements that can be performed, specifically the number of measurements that can be conducted concurrently from any one network device and possibly the total amount of measurement traffic on the network. However, while at any one point in time the number of measurements conducted is limited, it is possible for a device to change which destinations to measure over time. This can be exploited to achieve a balance of eventually covering all possible destinations using a reasonable amount of "sampling" where measurement coverage of a destination cannot be continuous. The solution needs to be dynamic and able to cope with network conditions that may change over time. The solution should also be embeddable inside network devices that control the deployment of active measurement mechanisms.
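The idea of changing which destinations a device measures over time, so that a fixed concurrency budget still eventually covers every destination, can be illustrated with a short sketch. It assumes, purely for illustration, that a device knows its list of candidate destinations and a fixed per-round budget, and it simply walks that list round-robin. The function name `rotate_targets` and its parameters are hypothetical and are not taken from any implementation described in this document.

```python
from itertools import cycle, islice
from typing import Iterator, List, Sequence


def rotate_targets(destinations: Sequence[str], budget: int,
                   rounds: int) -> Iterator[List[str]]:
    """Yield the destinations to measure in each of `rounds` rounds.

    Hypothetical helper: at most `budget` destinations are measured per
    round (the per-device concurrency limit). Walking the list round-robin
    means every destination is measured at least once in any window of
    ceil(len(destinations) / budget) consecutive rounds.
    """
    if budget <= 0 or not destinations:
        raise ValueError("need a positive budget and at least one destination")
    # Never schedule the same destination twice within a single round.
    per_round = min(budget, len(destinations))
    ring = cycle(destinations)
    for _ in range(rounds):
        yield list(islice(ring, per_round))


# Five destinations, but only two concurrent measurements allowed per round.
for n, targets in enumerate(rotate_targets(["A", "B", "C", "D", "E"], 2, 3)):
    print(f"round {n}: {targets}")
```

With five destinations and a budget of two, the three rounds measure ['A', 'B'], ['C', 'D'], and ['E', 'A'], so every destination is covered within three rounds even though no more than two are ever measured at once. A pure rotation treats all destinations alike; it is only a starting point before any knowledge about which destinations are at risk is available.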
The goal is to conduct the measurements in a smart manner that ensures that the network is broadly covered and that the likelihood of detecting service-level violations is maximized. In order to maximize that likelihood, it is reasonable to focus measurement resources on destinations that are more likely to incur a violation, while spending fewer resources on destinations that are more likely to be in compliance. In order to do this, there are various aspects that can be exploited, including past measurements (destinations close to a service-level threshold requiring more focus than destinations farther from it), complementation with passive measurements such as flow data (to identify network destinations that are currently popular and critical), and observations from other parts of the network. In addition, measurements can be coordinated among different network devices to avoid hitting the same destination at the same time and to share results that may be useful in future probe placement.

Clearly, static solutions will have severe limitations. At the same time, human administrators cannot be in the loop for continuous dynamic reconfigurations of measurement probes. Thus, an automated solution, or ideally an autonomic solution, is needed so that network measurements are automatically orchestrated and dynamically reconfigured from within the network. This can be accomplished using an autonomic solution that is distributed, using ASAs that are implemented on nodes in the network.

5. A Distributed Autonomic Solution

The use of Autonomic Networking (AN) [RFC7575] can help such detection through an efficient activation of measurement sessions. Such an approach, along with a detailed assessment confirming its viability, is described in [P2PBNM-Nobre-2012].
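As an illustrative sketch only (it is not the algorithm of [P2PBNM-Nobre-2012]), two of the prioritization criteria from Section 4 can be combined in a few lines: destinations are ranked by how close their last observed delay is to an SLO delay threshold, and destinations that a peer has already announced it will measure are skipped. The risk score (last value divided by the threshold), the treatment of never-measured destinations, and all names below are assumptions made for this example.

```python
from typing import AbstractSet, Dict, List, Optional


def risk(last_value: Optional[float], threshold: float) -> float:
    """Hypothetical risk score: last observed value divided by the threshold.

    Values near 1.0 are close to violating the objective; values above 1.0
    already violate it. A destination that has never been measured is given
    a risk of 1.0 so that it is not starved of measurements.
    """
    if last_value is None:
        return 1.0
    return last_value / threshold


def select_targets(last_delay_ms: Dict[str, Optional[float]],
                   slo_delay_ms: float,
                   budget: int,
                   claimed_by_peers: AbstractSet[str] = frozenset()) -> List[str]:
    """Pick up to `budget` destinations for the next round, most at-risk first."""
    if slo_delay_ms <= 0 or budget < 0:
        raise ValueError("threshold must be positive and budget non-negative")
    # Skip destinations that a peer has announced it will measure this round,
    # so that nodes do not all probe the same targets.
    candidates = [d for d in last_delay_ms if d not in claimed_by_peers]
    # Highest risk first; the sort is stable, so ties keep their input order.
    candidates.sort(key=lambda d: risk(last_delay_ms[d], slo_delay_ms),
                    reverse=True)
    return candidates[:budget]


last = {"A": 45.0, "B": 10.0, "C": None, "D": 49.0, "E": 30.0}
print(select_targets(last, 50.0, 2))                          # ['C', 'D']
print(select_targets(last, 50.0, 2, claimed_by_peers={"D"}))  # ['C', 'A']
```

A purely greedy ranking like this would never revisit comfortably compliant destinations such as B. In practice, it would need to be combined with periodic rotation or random sampling so that compliant destinations are still revisited, in line with the broad-coverage goal stated in Section 4.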
The problem to be solved by AN in the present use case is how to steer the process of measurement session activation by a complete solution that sets all necessary parameters for this activation to operate efficiently, reliably, and securely, with no required human intervention other than setting overall policy.

When a node first comes online, it has no information about which measurements are more critical than others. In the absence of information about past measurements and information from measurement peers, it may start with an initial set of measurement sessions, possibly randomly seeding a set of starter measurements and perhaps taking a round-robin approach for subsequent measurement rounds. However, as measurements are collected, a node will gain an increasing amount of information that it can utilize to refine its strategy of selecting measurement targets going forward. First, it may take note of which targets returned measurement results very close to service-level thresholds; these targets may require closer scrutiny compared to others. Second, it may utilize observations that are made by its measurement peers in order to conclude which measurement targets may be more critical than others and to ensure that proper overall measurement coverage is obtained (so that not every node incidentally measures the same targets, while other targets are not measured at all).

We advocate for embedding P2P technology in network devices in order to use autonomic control loops to make decisions about measurement sessions.
Specifically, we advocate for network devices to implement an autonomic function that monitors service levels for violations of SLOs and that determines which measurement sessions to set up at any given point in time based on current and past observations of the node and of other peer nodes. By performing these functions locally and autonomically on the device itself, the device can quickly modify which measurements it conducts based on local observations while taking local resource availability into account. This allows a solution to be more robust and react more dynamically to rapidly changing service levels than a solution that has to rely on central coordination.

However, in order to optimize decisions about which measurements to conduct, a node will need to communicate with other nodes. This allows a node to take into account other nodes' observations in addition to its own in its decisions. For example, remote destinations whose observed service levels are on the verge of violating stated objectives may require closer monitoring than remote destinations that are comfortably within a range of tolerance. A distributed autonomic solution also allows nodes to coordinate their probing decisions to collectively achieve the best possible measurement coverage.

Because the number of resources available for monitoring, exchanging measurement data, and coordinating with other nodes is limited, a node may be interested in identifying other nodes whose observations are similar to and correlated with its own. This helps a node prioritize and decide which other nodes to coordinate and exchange data with. All of this requires the use of a P2P overlay.
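Identifying other nodes whose observations are similar to and correlated with a node's own could, for example, be approached by comparing time-aligned series of the same metric. The sketch below assumes that each peer shares such a series over the P2P overlay and uses the Pearson correlation coefficient as the similarity measure. Both the choice of measure and the 0.5 cutoff are illustrative assumptions; the function names are hypothetical, and this is not the correlated-peers method of [P2PBNM-Nobre-2012].

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, time-aligned series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def most_correlated_peers(own: Sequence[float],
                          peers: Dict[str, Sequence[float]],
                          k: int, min_corr: float = 0.5) -> List[str]:
    """Return up to k peer ids whose series correlate at least `min_corr`
    with this node's own series, strongest correlation first."""
    scores = {p: pearson(own, series) for p, series in peers.items()}
    ranked = sorted((p for p in scores if scores[p] >= min_corr),
                    key=scores.get, reverse=True)
    return ranked[:k]


own = [10.0, 12.0, 15.0, 11.0, 20.0]        # this node's delay samples (ms)
peers = {
    "r1": [11.0, 13.0, 16.0, 12.0, 21.0],   # moves in lockstep with own
    "r2": [20.0, 15.0, 11.0, 16.0, 9.0],    # moves opposite to own
    "r3": [5.0, 5.0, 5.0, 5.0, 5.0],        # constant: no information
}
print(most_correlated_peers(own, peers, 2))  # ['r1']
```

Only r1 passes the cutoff here, so the node would favor r1 when deciding whom to exchange measurement data with. Lowering `min_corr` trades a smaller, more relevant peer set for broader coordination.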
A P2P overlay is essential for several reasons:

o  It makes it possible for nodes (or more specifically, the ASAs that are deployed on those nodes) in the network to autonomically set up measurement sessions without having to rely on a central management system or controller to perform configuration operations associated with configuring measurement probes and responders.

o  It facilitates the exchange of data between different nodes to share measurement results so that each node can refine its measurement strategy based not just on its own observations, but also on observations from its peers.

o  It allows nodes to coordinate their measurements to obtain the best possible test coverage and avoid measurements that have a very low likelihood of detecting service-level violations.

The provisioning of the P2P overlay should be transparent to the network administrator. An Autonomic Control Plane such as defined in [ACP] provides an ideal candidate for the P2P overlay to run on.

An autonomic solution for the distributed detection of SLA violations provides several benefits. First, it provides efficiency; this solution should optimize the resource consumption and avoid resource starvation on the network devices. A device that is "self-aware" of its available resources will be able to adjust measurement activities rapidly as needed, without requiring a separate control loop involving resource monitoring by an external system. Second, placing logic about where to conduct measurements into the node enables rapid control loops that allow devices to react instantly to observations and adjust their measurement strategy. For example, a device could decide to adjust the amount of synthetic test traffic being sent during the measurement itself depending on results observed so far on this and other concurrent measurement sessions.
As a result, the solution could decrease the time necessary to detect SLA violations. Adaptivity features of an autonomic loop could capture the network dynamics faster than a human administrator or even a central controller. Finally, the solution could help to reduce the workload of human administrators.

In practice, these factors combine to maximize the likelihood of SLA violations being detected while operating within a given resource budget. They allow a continuous measurement strategy to be conducted that takes into account past measurement results, observations of other measures such as link utilization or flow data, measurement results shared between network devices, and future measurement activities coordinated among nodes. Combined, this can result in efficient measurement decisions that strike a balance between offering broad network coverage and homing in on service-level "hot spots".

6. Intended User Experience

The autonomic solution should not require any human intervention in the distributed detection of SLA violations. By virtue of the solution being autonomic, human users will not have to plan which measurements to conduct in a network, which is often a very labor-intensive task that requires detailed analysis of traffic matrices and network topologies and does not lend itself to easy dynamic adjustment. Likewise, they will not have to configure measurement probes and responders.

There are some ways in which a human administrator may still interact with the solution. First, the human administrator will, of course, be notified and obtain reports about service-level violations that are observed. Second, a human administrator may set policies regarding how closely to monitor the network for service-level violations and how many resources to spend.
For example, an administrator may set a resource budget that is assigned to network devices for measurement operations. Within that budget, the solution will aim to maximize the number of SLO violations that are detected. Alternatively, an administrator may set a target for the percentage of SLO violations that must be detected, i.e., a target for the ratio between the number of detected SLO violations and the number of total SLO violations that are actually occurring (some of which might go undetected). In that case, the solution will aim to minimize the resources (i.e., the amount of test traffic and number of measurement sessions) required to achieve that target.

7. Implementation Considerations

The active measurement model assumes that a typical infrastructure will have multiple network segments, multiple Autonomous Systems (ASes), and a reasonably large number of routers. It also considers that multiple SLOs can be in place at a given time. Since interoperability in a heterogeneous network is a goal, features found on different active measurement mechanisms (e.g., OWAMP, TWAMP, and the Cisco Service-Level Assurance Protocol) and device programmability interfaces (such as Juniper's Junos API or Cisco's Embedded Event Manager) could be used for the implementation. The autonomic solution should include and/or reference specific algorithms, protocols, metrics, and technologies for the implementation of distributed detection of SLA violations as a whole.

Finally, it should be noted that there are multiple deployment scenarios, including deployment scenarios that involve physical devices hosting autonomic functions or virtualized infrastructure hosting the same. Co-deployment in conjunction with Virtual Network Functions (VNFs) is a possibility for further study.
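Because multiple SLOs can be in place at a given time, a device needs to check each measurement result against all objectives that apply to it. The following sketch assumes, for illustration only, that every SLO is an upper bound on a single named metric. The `Slo` type, the metric names, and the relative-headroom figure returned by the hypothetical `evaluate` function are assumptions and are not part of any protocol mentioned in this document.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Slo:
    """Hypothetical representation: an upper bound on one named metric."""
    metric: str        # e.g. "delay_ms", "loss_pct", "jitter_ms" (illustrative)
    max_value: float   # the objective is met while the measured value <= max_value

    def __post_init__(self) -> None:
        if self.max_value <= 0:
            raise ValueError("max_value must be positive")


def evaluate(slos: Iterable[Slo],
             measurement: Dict[str, float]) -> Tuple[List[str], Optional[float]]:
    """Check one measurement result against every applicable SLO.

    Returns the metrics whose objective is violated, plus the smallest
    relative headroom (max_value - value) / max_value over all evaluated
    SLOs: 0.0 means exactly at the limit and negative means violated.
    The headroom is None if no SLO matched any reported metric.
    """
    violated: List[str] = []
    headrooms: List[float] = []
    for slo in slos:
        if slo.metric not in measurement:
            continue  # this measurement session did not report that metric
        value = measurement[slo.metric]
        if value > slo.max_value:
            violated.append(slo.metric)
        headrooms.append((slo.max_value - value) / slo.max_value)
    return violated, (min(headrooms) if headrooms else None)


slos = [Slo("delay_ms", 50.0), Slo("loss_pct", 1.0), Slo("jitter_ms", 10.0)]
print(evaluate(slos, {"delay_ms": 48.0, "loss_pct": 1.5, "jitter_ms": 2.0}))
# (['loss_pct'], -0.5)
```

Here the loss objective is violated, which the minimum headroom of -0.5 reflects; the delay headroom of 0.04 alone would already mark this destination as a candidate for closer scrutiny. Stored over time, such results are one possible form of the historical measurement data and SLO self-knowledge discussed in Section 7.1.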
7.1. Device-Based Self-Knowledge and Decisions

Each device has self-knowledge about the local SLA monitoring. This could be in the form of historical measurement data and SLOs. Besides that, the devices would have algorithms that could decide which probes should be activated at a given time. The choice of which algorithm is better for a specific situation would also be autonomic.

7.2. Interaction with Other Devices

Network devices should share information about service-level measurement results. This information can speed up the detection of SLA violations and increase the number of detected SLA violations. For example, if one device detects that a remote destination is in danger of violating an SLO, other devices may conduct additional measurements to the same destination or other destinations in its proximity.

For any given network device, the exchange of data may be more important with some devices (for example, devices in the same network neighborhood or devices that are "correlated" by some other means) than with others. Defining the network devices that exchange measurement data (i.e., management peers) creates a new topology. Different approaches could be used to define this topology (e.g., correlated peers [P2PBNM-Nobre-2012]). To bootstrap peer selection, each device should use its known neighbors (e.g., FIB and RIB tables) as initial seeds to identify possible peers.

It should be noted that a solution will benefit if topology information and network discovery functions are provided by the underlying autonomic framework. A solution will need to be able to discover measurement peers as well as measurement targets, specifically measurement targets that support active measurement responders and that will be able to respond to measurement requests and reflect measurement traffic as needed.

8. Comparison with Current Solutions

There is no standardized solution for distributed autonomic detection of SLA violations. Current solutions are restricted to ad hoc scripts running in a per-node fashion to automate some administrator actions. There are some proposals for passive probe activation (e.g., DECON [DECON] and CSAMP [CSAMP]), but these do not focus on autonomic features.

9. Related IETF Work

This section discusses related IETF work and is provided for reference. This section is not exhaustive; rather, it provides an overview of the various initiatives and how they relate to autonomic distributed detection of SLA violations.

1. LMAP: The Large-Scale Measurement of Broadband Performance Working Group standardizes the LMAP measurement system for performance management of broadband access devices. The autonomic solution could be relevant to LMAP because LMAP deploys measurement probes, and the solution could be used for screening for SLA violations. Besides that, a solution to decrease the workload of human administrators in service providers is probably highly desirable.

2. IPFIX: The IP Flow Information Export (IPFIX) Working Group (now concluded) aimed to standardize IP flows (i.e., netflows). IPFIX uses measurement probes (i.e., metering exporters) to gather flow data. In this context, the autonomic solution for the activation of active measurement probes could possibly be extended to also address passive measurement probes. Besides that, flow information could be used in decisions regarding probe activation.
3. ALTO: The Application-Layer Traffic Optimization Working Group aims to provide topological information at a higher abstraction layer, which can be based upon network policy, and with application-relevant service functions located in it. Its work could be leveraged to define the topology for network devices that exchange measurement data.

10. IANA Considerations

This document has no IANA actions.

11. Security Considerations

The security of this solution hinges on the security of the network underlay, i.e., the Autonomic Control Plane. If the Autonomic Control Plane were to be compromised, an attacker could undermine the effectiveness of measurement coordination by reporting fraudulent measurement results to peers. This would cause measurement probes to be deployed in an ineffective manner that would increase the likelihood that violations of SLOs go undetected.

Likewise, the security of the solution hinges on the security of the deployment mechanism for autonomic functions (in this case, the autonomic function that conducts the service-level measurements). If an attacker were able to hijack an autonomic function, it could try to exhaust or exceed the resources that should be spent on autonomic measurements in order to deplete network resources, including network bandwidth due to higher-than-necessary volumes of synthetic test traffic generated by measurement probes. Again, it could also lead to reporting of misleading results; among other things, this could result in non-optimal selection of measurement targets and, in turn, an increase in the likelihood that service-level violations go undetected.

12. Informative References

[draft-anima-boot] Pritikin, M., Richardson, M., Behringer, M., Bjarnason, S., and K.
Watsen, "Bootstrapping Remote Secure Key Infrastructures (BRSKI)", Work in Progress, draft-ietf-anima-bootstrapping-keyinfra-08, October 2017.

[ACP] Eckert, T., Ed., Behringer, M., Ed., and S. Bjarnason, "An Autonomic Control Plane (ACP)", Work in Progress, draft-ietf-anima-autonomic-control-plane-13, December 2017.

[CSAMP] Sekar, V., Reiter, M., Willinger, W., Zhang, H., Kompella, R., and D. Andersen, "CSAMP: A System for Network-Wide Flow Monitoring", USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2008.

[DECON] di Pietro, A., Huici, F., Costantini, D., and S. Niccolini, "DECON: Decentralized Coordination for Large-Scale Flow Monitoring", IEEE INFOCOM Workshops, DOI 10.1109/INFCOMW.2010.5466642, March 2010.

[P2PBNM-Nobre-2012] Nobre, J., Granville, L., Clemm, A., and A. Gonzalez Prieto, "Decentralized Detection of SLA Violations Using P2P Technology", 8th International Conference on Network and Service Management (CNSM), 2012, <http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6379997>.

[RFC4148] Stephan, E., "IP Performance Metrics (IPPM) Metrics Registry", BCP 108, RFC 4148, DOI 10.17487/RFC4148, August 2005, <https://www.rfc-editor.org/info/rfc4148>.

[RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. Zekauskas, "A One-way Active Measurement Protocol (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006, <https://www.rfc-editor.org/info/rfc4656>.

[RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, DOI 10.17487/RFC5357, October 2008, <https://www.rfc-editor.org/info/rfc5357>.

[RFC5474] Duffield, N., Ed., Chiou, D., Claise, B., Greenberg, A., Grossglauser, M., and J.
Rexford, "A Framework for Packet Selection and Reporting", RFC 5474, DOI 10.17487/RFC5474, March 2009, <https://www.rfc-editor.org/info/rfc5474>.

[RFC6248] Morton, A., "RFC 4148 and the IP Performance Metrics (IPPM) Registry of Metrics Are Obsolete", RFC 6248, DOI 10.17487/RFC6248, April 2011, <https://www.rfc-editor.org/info/rfc6248>.

[RFC6812] Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare, S., and E. Yedavalli, "Cisco Service-Level Assurance Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013, <https://www.rfc-editor.org/info/rfc6812>.

[RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken, "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, September 2013, <https://www.rfc-editor.org/info/rfc7011>.

[RFC7297] Boucadair, M., Jacquenet, C., and N. Wang, "IP Connectivity Provisioning Profile (CPP)", RFC 7297, DOI 10.17487/RFC7297, July 2014, <https://www.rfc-editor.org/info/rfc7297>.

[RFC7575] Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A., Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic Networking: Definitions and Design Goals", RFC 7575, DOI 10.17487/RFC7575, June 2015, <https://www.rfc-editor.org/info/rfc7575>.

[RFC8250] Elkins, N., Hamilton, R., and M. Ackermann, "IPv6 Performance and Diagnostic Metrics (PDM) Destination Option", RFC 8250, DOI 10.17487/RFC8250, September 2017, <https://www.rfc-editor.org/info/rfc8250>.

Acknowledgements

We wish to acknowledge the helpful contributions, comments, and suggestions that were received from Mohamed Boucadair, Brian Carpenter, Hanlin Fang, Bruno Klauser, Diego Lopez, Vincent Roca, and Eric Voit. In addition, we thank Diego Lopez, Vincent Roca, and Brian Carpenter for their detailed reviews.
Authors' Addresses

Jeferson Campos Nobre
University of Vale do Rio dos Sinos
Porto Alegre
Brazil
Email: jcnobre@unisinos.br

Lisandro Zambenedetti Granville
Federal University of Rio Grande do Sul
Porto Alegre
Brazil
Email: granville@inf.ufrgs.br

Alexander Clemm
Huawei USA - Futurewei Technologies Inc.
Santa Clara, California
United States of America
Email: ludwig@clemm.org, alexander.clemm@huawei.com

Alberto Gonzalez Prieto
VMware
Palo Alto, California
United States of America
Email: agonzalezpri@vmware.com