rfc9544.original | rfc9544.txt | |||
---|---|---|---|---|
Network Working Group G. Mirsky | Internet Engineering Task Force (IETF) G. Mirsky | |||
Internet-Draft J. Halpern | Request for Comments: 9544 J. Halpern | |||
Intended status: Informational Ericsson | Category: Informational Ericsson | |||
Expires: 3 June 2024 X. Min | ISSN: 2070-1721 X. Min | |||
ZTE Corp. | ZTE Corp. | |||
A. Clemm | A. Clemm | |||
J. Strassner | J. Strassner | |||
Futurewei | Futurewei | |||
J. Francois | J. Francois | |||
Inria and University of Luxembourg | Inria and University of Luxembourg | |||
1 December 2023 | February 2024 | |||
Precision Availability Metrics for Services Governed by Service Level | Precision Availability Metrics (PAMs) for Services Governed by Service | |||
Objectives (SLOs) | Level Objectives (SLOs) | |||
draft-ietf-ippm-pam-09 | ||||
Abstract | Abstract | |||
This document defines a set of metrics for networking services with | This document defines a set of metrics for networking services with | |||
performance requirements expressed as Service Level Objectives (SLO). | performance requirements expressed as Service Level Objectives | |||
These metrics, referred to as Precision Availability Metrics (PAM), | (SLOs). These metrics, referred to as "Precision Availability | |||
are useful for defining and monitoring SLOs. For example, PAM can be | Metrics (PAMs)", are useful for defining and monitoring SLOs. For | |||
used by providers and/or customers of an RFC XXXX Network Slice | example, PAMs can be used by providers and/or customers of an RFC | |||
Service to assess whether the service is provided in compliance with | 9543 Network Slice Service to assess whether the service is provided | |||
its defined SLOs. | in compliance with its defined SLOs. | |||
Note to the RFC Editor: Please update "RFC XXXX Network Slice" with | ||||
the RFC number assigned to draft-ietf-teas-ietf-network-slices. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
approved by the IESG are candidates for any level of Internet | ||||
Standard; see Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 3 June 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9544. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
2. Conventions and Terminology | 2. Conventions | |||
2.1. Terminology | 2.1. Terminology | |||
2.2. Acronyms | 2.2. Acronyms | |||
3. Precision Availability Metrics | 3. Precision Availability Metrics | |||
3.1. Introducing Violated Intervals | 3.1. Introducing Violated Intervals | |||
3.2. Derived Precision Availability Metrics | 3.2. Derived Precision Availability Metrics | |||
3.3. PAM Configuration Settings and Service Availability | 3.3. PAM Configuration Settings and Service Availability | |||
4. Statistical SLO | 4. Statistical SLO | |||
5. Other Expected PAM Benefits | 5. Other Expected PAM Benefits | |||
6. Extensions and Future Work | 6. Extensions and Future Work | |||
7. IANA Considerations | 7. IANA Considerations | |||
8. Security Considerations | 8. Security Considerations | |||
9. Acknowledgments | 9. Informative References | |||
10. References | Acknowledgments | |||
10.1. Informative References | Contributors | |||
Contributors' Addresses | Authors' Addresses | |||
Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
Service providers and users often need to assess the quality with | Service providers and users often need to assess the quality with | |||
which network services are being delivered. In particular, in cases | which network services are being delivered. In particular, in cases | |||
where service level guarantees are documented (including their | where service-level guarantees are documented (including their | |||
companion metrology) as part of a contract established between the | companion metrology) as part of a contract established between the | |||
customer and the service provider, and Service Level Objectives | customer and the service provider, and Service Level Objectives | |||
(SLOs) are defined, it is essential to provide means to verify that | (SLOs) are defined, it is essential to provide means to verify that | |||
what has been delivered complies with what has been possibly | what has been delivered complies with what has been possibly | |||
negotiated and (contractually) defined between the customer and the | negotiated and (contractually) defined between the customer and the | |||
service provider. Examples of SLOs would be target values for the | service provider. Examples of SLOs would be target values for the | |||
maximum packet delay (one-way and/or round-trip) or maximum packet | maximum packet delay (one-way and/or round-trip) or maximum packet | |||
loss ratio that would be deemed acceptable. | loss ratio that would be deemed acceptable. | |||
More generally, SLOs can be used to characterize the ability of a | More generally, SLOs can be used to characterize the ability of a | |||
particular set of nodes to communicate according to certain | particular set of nodes to communicate according to certain | |||
measurable expectations. Those expectations can include but are not | measurable expectations. Those expectations can include but are not | |||
limited to aspects such as latency, delay variation, loss, capacity/ | limited to aspects such as latency, delay variation, loss, capacity/ | |||
throughput, ordering, and fragmentation. Whatever SLO parameters are | throughput, ordering, and fragmentation. Whatever SLO parameters are | |||
chosen and whichever way service level parameters are being measured, | chosen and whichever way service-level parameters are being measured, | |||
precision availability metrics indicate whether or not a given | Precision Availability Metrics indicate whether or not a given | |||
service has been available according to expectations at all times. | service has been available according to expectations at all times. | |||
Several metrics (often documented in the IANA Registry of Performance | Several metrics (often documented in the IANA "Performance Metrics" | |||
Metrics [IANA-PM-Registry] according to [RFC8911] and [RFC8912]), can | registry [IANA-PM-Registry] according to [RFC8911] and [RFC8912]) can | |||
be used to characterize the service quality, expressing the perceived | be used to characterize the service quality, expressing the perceived | |||
quality of delivered networking services versus their SLOs. Of | quality of delivered networking services versus their SLOs. Of | |||
concern is not so much the absolute service level (for example, | concern is not so much the absolute service level (for example, | |||
actual latency experienced) but whether the service is provided in | actual latency experienced) but whether the service is provided in | |||
compliance with the negotiated and eventually contracted service | compliance with the negotiated and eventually contracted service | |||
levels. For instance, this may include whether the experienced | levels. For instance, this may include whether the experienced | |||
packet delay falls within an acceptable range that has been | packet delay falls within an acceptable range that has been | |||
contracted for the service. The specific quality of service depends | contracted for the service. The specific quality of service depends | |||
on the SLO or a set thereof for a given service that is in effect. A | on the SLO or a set thereof for a given service that is in effect. | |||
non-compliance to an SLO might result in the degradation of the | Non-compliance to an SLO might result in the degradation of the | |||
quality of experience for gamers or even jeopardize the safety of a | quality of experience for gamers or even jeopardize the safety of a | |||
large geographical area. | large geographical area. | |||
The same service level may be deemed acceptable for one application, | The same service level may be deemed acceptable for one application, | |||
while unacceptable for another, depending on the needs of the | while unacceptable for another, depending on the needs of the | |||
application. Hence it is not sufficient to measure service levels | application. Hence, it is not sufficient to measure service levels | |||
per se over time, but to assess the quality of the service being | per se over time; the quality of the service being contextually | |||
contextually provided (e.g., with the applicable SLO in mind). | provided (e.g., with the applicable SLO in mind) must be also | |||
However, at this point, there are no standard metrics that can be | assessed. However, at this point, there are no standard metrics that | |||
used to account for the quality with which services are delivered | can be used to account for the quality with which services are | |||
relative to their SLOs, and whether their SLOs are being met at all | delivered relative to their SLOs or to determine whether their SLOs | |||
times. Such metrics and the instrumentation to support them are | are being met at all times. Such metrics and the instrumentation to | |||
essential for various purposes, including monitoring (to ensure that | support them are essential for various purposes, including monitoring | |||
networking services are performing according to their objectives) as | (to ensure that networking services are performing according to their | |||
well as accounting (to maintain a record of service levels delivered, | objectives) as well as accounting (to maintain a record of service | |||
which is important for the monetization of such services as well as | levels delivered, which is important for the monetization of such | |||
for the triaging of problems). | services as well as for the triaging of problems). | |||
The current state-of-the-art of metrics includes, for example, | The current state-of-the-art of metrics include, for example, | |||
interface metrics, useful to obtain statistical data on traffic | interface metrics that can be used to obtain statistical data on | |||
volume and behavior that can be observed at an interface [RFC2863] | traffic volume and behavior that can be observed at an interface | |||
and [RFC8343]. However, they are agnostic of actual service levels | [RFC2863] [RFC8343]. However, they are agnostic of actual service | |||
and not specific to distinct flows. Flow records [RFC7011] and | levels and not specific to distinct flows. Flow records [RFC7011] | |||
[RFC7012] maintain statistics about flows, including flow volume and | [RFC7012] maintain statistics about flows, including flow volume and | |||
flow duration, but again, contain very little information about | flow duration, but again, they contain very little information about | |||
service levels, let alone whether the service levels delivered meet | service levels, let alone whether the service levels delivered meet | |||
their respective targets, i.e., their associated SLOs. | their respective targets, i.e., their associated SLOs. | |||
This specification introduces a new set of metrics, Precision | This specification introduces a new set of metrics, Precision | |||
Availability Metrics (PAM), aimed at capturing service levels for a | Availability Metrics (PAMs), aimed at capturing service levels for a | |||
flow, specifically the degree to which the flow complies with the | flow, specifically the degree to which the flow complies with the | |||
SLOs that are in effect. PAM can be used to assess whether a service | SLOs that are in effect. PAMs can be used to assess whether a | |||
is provided in compliance with its defined SLOs. This information | service is provided in compliance with its defined SLOs. This | |||
can be used in multiple ways, for example, to optimize service | information can be used in multiple ways, for example, to optimize | |||
delivery, take timely counteractions in the event of service | service delivery, take timely counteractions in the event of service | |||
degradation, or account for the quality of services being delivered. | degradation, or account for the quality of services being delivered. | |||
Availability is discussed in Section 3.4 of [RFC7297]. In this | Availability is discussed in Section 3.4 of [RFC7297]. In this | |||
document, the term "availability" reflects that a service that is | document, the term "availability" reflects that a service that is | |||
characterized by its SLOs is considered unavailable whenever those | characterized by its SLOs is considered unavailable whenever those | |||
SLOs are violated, even if basic connectivity is still working. | SLOs are violated, even if basic connectivity is still working. | |||
"Precision" refers to services whose service levels are governed by | "Precision" refers to services whose service levels are governed by | |||
SLOs and must be delivered precisely according to the associated | SLOs and must be delivered precisely according to the associated | |||
quality and performance requirements. It should be noted that | quality and performance requirements. It should be noted that | |||
precision refers to what is being assessed, not the mechanism used to | precision refers to what is being assessed, not the mechanism used to | |||
measure it. In other words, it does not refer to the precision of | measure it. In other words, it does not refer to the precision of | |||
the mechanism with which actual service levels are measured. | the mechanism with which actual service levels are measured. | |||
Furthermore, the precision, with respect to the delivery of an SLO, | Furthermore, the precision, with respect to the delivery of an SLO, | |||
particularly applies when a metric value approaches the specified | particularly applies when a metric value approaches the specified | |||
threshold levels in the SLO. | threshold levels in the SLO. | |||
The specification and implementation of methods that provide for | The specification and implementation of methods that provide for | |||
accurate measurements are separate topics independent of the | accurate measurements are separate topics independent of the | |||
definition of the metrics in which the results of such measurements | definition of the metrics in which the results of such measurements | |||
would be expressed. Likewise, Service Level Expectations (SLEs), as | would be expressed. Likewise, Service Level Expectations (SLEs), as | |||
defined in Section 5.1 of [I-D.ietf-teas-ietf-network-slices], are | defined in Section 5.1 of [RFC9543], are outside the scope of this | |||
outside the scope of this document. | document. | |||
2. Conventions and Terminology | 2. Conventions | |||
2.1. Terminology | 2.1. Terminology | |||
In this document, SLA and SLO are used as defined in [RFC3198]. The | In this document, SLA and SLO are used as defined in [RFC3198]. The | |||
reader may refer to Section 5.1 of | reader may refer to Section 5.1 of [RFC9543] for an applicability | |||
[I-D.ietf-teas-ietf-network-slices] for an applicability example of | example of these concepts in the context of RFC 9543 Network Slice | |||
these concepts in the context of RFC XXXX Network Slice Services. | Services. | |||
Note to the RFC Editor: Please update "RFC XXXX Network Slice" with | ||||
the RFC number assigned to [I-D.ietf-teas-ietf-network-slices]. | ||||
2.2. Acronyms | 2.2. Acronyms | |||
PAM Precision Availability Metric | IPFIX IP Flow Information Export | |||
OAM Operations, Administration, and Maintenance | PAM Precision Availability Metric | |||
SLA Service Level Agreement | ||||
SLE Service Level Expectations | SLA Service Level Agreement | |||
SLO Service Level Objective | SLE Service Level Expectation | |||
VI Violated Interval | SLO Service Level Objective | |||
VIR Violated Interval Ratio | SVI Severely Violated Interval | |||
VPC Violated Packets Count | SVIR Severely Violated Interval Ratio | |||
SVI Severely Violated Interval | SVPC Severely Violated Packets Count | |||
SVIR Severely Violated Interval Ratio | VFI Violation-Free Interval | |||
SVPC Severely Violated Packets Count | VI Violated Interval | |||
VFI Violation-Free Interval | VIR Violated Interval Ratio | |||
VPC Violated Packets Count | ||||
3. Precision Availability Metrics | 3. Precision Availability Metrics | |||
3.1. Introducing Violated Intervals | 3.1. Introducing Violated Intervals | |||
When analyzing the availability metrics of a service between two | When analyzing the availability metrics of a service between two | |||
measurement points, a time interval as the unit of PAM needs to be | measurement points, a time interval as the unit of PAMs needs to be | |||
selected. In [ITU.G.826], a time interval of one second is used. | selected. In [ITU.G.826], a time interval of one second is used. | |||
That is reasonable, but some services may require different | That is reasonable, but some services may require different | |||
granularity (e.g., decamillisecond). For that reason, the time | granularity (e.g., decamillisecond). For that reason, the time | |||
interval in PAM is viewed as a variable parameter though constant for | interval in PAMs is viewed as a variable parameter, though constant | |||
a particular measurement session. Furthermore, for the purpose of | for a particular measurement session. Furthermore, for the purpose | |||
PAM, each time interval is classified either as Violated Interval | of PAMs, each time interval is classified as either Violated Interval | |||
(VI), Severely Violated Interval (SVI), or Violation-Free Interval | (VI), Severely Violated Interval (SVI), or Violation-Free Interval | |||
(VFI). These are defined as follows: | (VFI). These are defined as follows: | |||
* VI is a time interval during which at least one of the performance | * VI is a time interval during which at least one of the performance | |||
parameters degraded below its configurable optimal level | parameters degraded below its configurable optimal threshold. | |||
threshold. | ||||
* SVI is a time interval during which at least one of the | * SVI is a time interval during which at least one of the | |||
performance parameters degraded below its configurable critical | performance parameters degraded below its configurable critical | |||
threshold. | threshold. | |||
* Consequently, VFI is a time interval during which all performance | * Consequently, VFI is a time interval during which all performance | |||
parameters are at or better than their respective pre-defined | parameters are at or better than their respective pre-defined | |||
optimal levels. | optimal levels. | |||
The monitoring of performance parameters to determine the quality of | The monitoring of performance parameters to determine the quality of | |||
an interval is performed between the elements of the network that are | an interval is performed between the elements of the network that are | |||
referred to for the SLO corresponding to the performance parameter. | identified in the SLO corresponding to the performance parameter. | |||
Mechanisms of setting levels of a threshold of an SLO are outside the | Mechanisms for setting levels of a threshold of an SLO are outside | |||
scope of this document. | the scope of this document. | |||
From these definitions, a set of basic metrics can be defined that | From the definitions above, a set of basic metrics can be defined | |||
count the numbers of time intervals that fall into each category: | that count the number of time intervals that fall into each category: | |||
* VI count. | * VI count | |||
* SVI count. | * SVI count | |||
* VFI count. | * VFI count | |||
These count metrics are essential in calculating respective ratios | These count metrics are essential in calculating respective ratios | |||
(see Section 3.2) that can be used to assess the instability of a | (see Section 3.2) that can be used to assess the instability of a | |||
service. | service. | |||
Beyond accounting for violated intervals, it is sometimes beneficial | Beyond accounting for violated intervals, it is sometimes beneficial | |||
to maintain counts of packets for which a performance threshold is | to maintain counts of packets for which a performance threshold is | |||
violated. For example, this allows distinguishing between cases in | violated. For example, this allows for distinguishing between cases | |||
which violated intervals are caused by isolated violation occurrences | in which violated intervals are caused by isolated violation | |||
(such as, a sporadic issue that may be caused by a temporary spike in | occurrences (such as a sporadic issue that may be caused by a | |||
a queue depth along the packet's path) or by broad violations across | temporary spike in a queue depth along the packet's path) or by broad | |||
multiple packets (such as a problem with slow route convergence | violations across multiple packets (such as a problem with slow route | |||
across the network or more foundational issues such as insufficient | convergence across the network or more foundational issues such as | |||
network resources). Maintaining such counts and comparing them with | insufficient network resources). Maintaining such counts and | |||
the overall amount of traffic also facilitates assessing compliance | comparing them with the overall amount of traffic also facilitate | |||
with statistical SLOs (see Section 4). For these reasons, the | assessing compliance with statistical SLOs (see Section 4). For | |||
following additional metrics are defined: | these reasons, the following additional metrics are defined: | |||
* VPC: Violated packets count | * VPC (Violated Packets Count) | |||
* SVPC: Severely violated packets count | * SVPC (Severely Violated Packets Count) | |||
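
The interval classification and the count metrics of Section 3.1 can be illustrated with a minimal sketch. It is not part of the specification and rests on several assumptions: a single performance parameter (one-way delay, where larger is worse), integer millisecond timestamps, an interval that contains no samples treated as a VFI, VI and SVI treated as disjoint classes, and VPC counting only packets that exceed the optimal threshold without exceeding the critical one. The names `classify_intervals`, `optimal_ms`, and `critical_ms` are hypothetical.

```python
from collections import Counter


def classify_intervals(samples, interval_ms, session_ms, optimal_ms, critical_ms):
    """Classify each interval of a session as "VFI", "VI", or "SVI".

    samples: iterable of (timestamp_ms, delay_ms) pairs, timestamps
    relative to the start of the measurement session.
    Returns (classes, counts, vpc, svpc).
    """
    n_intervals = session_ms // interval_ms
    worst = [None] * n_intervals  # worst delay observed in each interval
    vpc = svpc = 0

    for ts, delay in samples:
        idx = ts // interval_ms
        if not 0 <= idx < n_intervals:
            continue  # sample falls outside the session; ignored here
        if delay > critical_ms:
            svpc += 1  # severely violated packet
        elif delay > optimal_ms:
            vpc += 1  # violated (but not severely violated) packet
        if worst[idx] is None or delay > worst[idx]:
            worst[idx] = delay

    classes = []
    for w in worst:
        if w is not None and w > critical_ms:
            classes.append("SVI")
        elif w is not None and w > optimal_ms:
            classes.append("VI")
        else:
            classes.append("VFI")  # assumption: empty intervals count as VFI

    return classes, Counter(classes), vpc, svpc


if __name__ == "__main__":
    demo = [(0, 5), (500, 12), (1200, 30), (2100, 4)]
    classes, counts, vpc, svpc = classify_intervals(
        demo, interval_ms=1000, session_ms=3000, optimal_ms=10, critical_ms=25
    )
    print(classes, dict(counts), vpc, svpc)
```

A real deployment would likely track several parameters (for example, loss and delay variation) and mark an interval as violated when any one of them crosses its threshold; the sketch keeps to one parameter for brevity.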
3.2. Derived Precision Availability Metrics | 3.2. Derived Precision Availability Metrics | |||
A set of metrics can be created based on PAM introduced in Section 3. | A set of metrics can be created based on PAMs as introduced in this | |||
In this document, these metrics are referred to as "derived PAM". | document. In this document, these metrics are referred to as | |||
Some of these metrics are modeled after Mean Time Between Failure | "derived PAMs". Some of these metrics are modeled after Mean Time | |||
(MTBF) metrics - a "failure" in this context referring to a failure | Between Failure (MTBF) metrics; a "failure" in this context refers to | |||
to deliver a service according to its SLO. | a failure to deliver a service according to its SLO. | |||
* Time since the last violated interval (e.g., since last violated | * Time since the last violated interval (e.g., since last violated | |||
ms, since last violated second). (This parameter is suitable for | ms or since last violated second). This parameter is suitable for | |||
monitoring the current compliance status of the service, e.g., for | monitoring the current compliance status of the service, e.g., for | |||
trending analysis.) | trending analysis. | |||
* Number of packets since the last violated packet. (This parameter | * Number of packets since the last violated packet. This parameter | |||
is suitable for the monitoring of the current compliance status of | is suitable for the monitoring of the current compliance status of | |||
the service.) | the service. | |||
* Mean time between VIs (e.g., between violated milliseconds, | * Mean time between VIs (e.g., between violated milliseconds or | |||
violated seconds) is the arithmetic mean of time between | between violated seconds). This parameter is the arithmetic mean | |||
consecutive VIs. | of time between consecutive VIs. | |||
* Mean packets between VIs is the arithmetic mean of the number of | * Mean packets between VIs. This parameter is the arithmetic mean | |||
SLO-compliant packets between consecutive VIs. (Another variation | of the number of SLO-compliant packets between consecutive VIs. | |||
of "MTBF" in a service setting.) | It is another variation of MTBF in a service setting. | |||
An analogous set of metrics can be produced for SVI: | An analogous set of metrics can be produced for SVI: | |||
* Time since the last SVI (e.g., since last violated ms, since last | * Time since the last SVI (e.g., since last violated ms or since | |||
violated second). (This parameter is suitable for the monitoring | last violated second). This parameter is suitable for the | |||
of the current compliance status of the service.) | monitoring of the current compliance status of the service. | |||
* Number of packets since the last severely violated packet. (This | * Number of packets since the last severely violated packet. This | |||
parameter is suitable for the monitoring of the current compliance | parameter is suitable for the monitoring of the current compliance | |||
status of the service.) | status of the service. | |||
* Mean time between SVIs (e.g., between severely violated | * Mean time between SVIs (e.g., between severely violated | |||
milliseconds, severely violated seconds) is the arithmetic mean of | milliseconds or between severely violated seconds). This | |||
time between consecutive SVIs. | parameter is the arithmetic mean of time between consecutive SVIs. | |||
* Mean packets between SVIs is the arithmetic mean of the number of | * Mean packets between SVIs. This parameter is the arithmetic mean | |||
SLO-compliant packets between consecutive SVIs. (Another | of the number of SLO-compliant packets between consecutive SVIs. | |||
variation of "MTBF" in a service setting.) | It is another variation of "MTBF" in a service setting. | |||
To indicate a historic degree of precision availability, additional | To indicate a historic degree of precision availability, additional | |||
derived PAMs can be defined as follows: | derived PAMs can be defined as follows: | |||
* Violated Interval Ratio (VIR) is the ratio of the summed numbers | * Violated Interval Ratio (VIR) is the ratio of the summed numbers | |||
of VIs and SVIs to the total number of time unit intervals in a | of VIs and SVIs to the total number of time unit intervals in a | |||
time of the availability periods during a fixed measurement | time of the availability periods during a fixed measurement | |||
session. | session. | |||
* Severely Violated Interval Ratio (SVIR) is the ratio of SVIs to | * Severely Violated Interval Ratio (SVIR) is the ratio of SVIs to | |||
the total number of time unit intervals in a time of the | the total number of time unit intervals in a time of the | |||
availability periods during a fixed measurement session. | availability periods during a fixed measurement session. | |||
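
As a hedged illustration of the derived metrics above, the following sketch computes VIR, SVIR, and a mean time between intervals of a given class from a list of per-interval labels. It makes assumptions the text does not settle: the denominator is taken to be every interval in the measurement session, and "time between consecutive VIs" is measured from the start of one VI to the start of the next. The function names are hypothetical.

```python
from statistics import mean


def interval_ratios(classes):
    """Return (VIR, SVIR) for a list of "VFI" / "VI" / "SVI" labels."""
    total = len(classes)
    if total == 0:
        raise ValueError("measurement session contains no intervals")
    vi = classes.count("VI")
    svi = classes.count("SVI")
    vir = (vi + svi) / total  # VIs and SVIs summed, per the VIR definition
    svir = svi / total
    return vir, svir


def mean_time_between(classes, label, interval_ms):
    """Arithmetic mean of the start-to-start time between consecutive
    intervals carrying the given label, or None if fewer than two exist."""
    starts = [i * interval_ms for i, c in enumerate(classes) if c == label]
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return mean(gaps) if gaps else None


if __name__ == "__main__":
    session = ["VFI", "VI", "VFI", "VFI", "VI", "SVI", "VFI", "VI", "VFI", "VFI"]
    print(interval_ratios(session))
    print(mean_time_between(session, "VI", interval_ms=1000))
```

Measuring the gap as violation-free time only (end of one VI to the start of the next) would be an equally defensible reading; the choice should be fixed per measurement session so that reported values remain comparable.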
3.3. PAM Configuration Settings and Service Availability | 3.3. PAM Configuration Settings and Service Availability | |||
It might be useful for a service provider to determine the current | It might be useful for a service provider to determine the current | |||
condition of the service for which PAMs are maintained. To | condition of the service for which PAMs are maintained. To | |||
facilitate this, it is conceivable to complement PAM with a state | facilitate this, it is conceivable to complement PAMs with a state | |||
model. Such a state model can be used to indicate whether a service | model. Such a state model can be used to indicate whether a service | |||
is currently considered as available or unavailable depending on the | is currently considered as available or unavailable depending on the | |||
network's recent ability to provide service without incurring | network's recent ability to provide service without incurring | |||
intervals during which violations occur. It is conceivable to define | intervals during which violations occur. It is conceivable to define | |||
such a state model in which transitions occur per some predefined PAM | such a state model in which transitions occur per some predefined PAM | |||
settings. | settings. | |||
While the definition of a service state model is outside the scope of | While the definition of a service state model is outside the scope of | |||
this document, the following section provides some considerations for | this document, this section provides some considerations for how such | |||
how such a state model and accompanying configuration settings could | a state model and accompanying configuration settings could be | |||
be defined. | defined. | |||
For example, a state model could be defined by a Finite State Machine | For example, a state model could be defined by a Finite State Machine | |||
featuring two states, "available" and "unavailable". The initial | featuring two states: "available" and "unavailable". The initial | |||
state could be "available". A service could subsequently be deemed | state could be "available". A service could subsequently be deemed | |||
as "unavailable" based on the number of successive interval | as "unavailable" based on the number of successive interval | |||
violations that have been experienced up to the particular | violations that have been experienced up to the particular | |||
observation time moment. To return to a state of "available", a | observation time moment. To return to a state of "available", a | |||
number of intervals without violations would need to be observed. | number of intervals without violations would need to be observed. | |||
The number of successive intervals with violations, as well as the | The number of successive intervals with violations, as well as the | |||
number of successive intervals that are free of violations, required | number of successive intervals that are free of violations, required | |||
for a state to transition to another state is defined by a | for a state to transition to another state is defined by a | |||
configuration setting. Specifically, the following configuration | configuration setting. Specifically, the following configuration | |||
parameters are defined: | parameters are defined: | |||
* Unavailability threshold: The number of successive intervals | Unavailability threshold: The number of successive intervals during | |||
during which a violation occurs to transition to an unavailable | which a violation occurs to transition to an unavailable state. | |||
state. | ||||
* Availability threshold: The number of successive intervals during | Availability threshold: The number of successive intervals during | |||
which no violations must occur to allow transition to an available | which no violations must occur to allow transition to an available | |||
state from a previously unavailable state. | state from a previously unavailable state. | |||
Additional configuration parameters could be defined to account for | Additional configuration parameters could be defined to account for | |||
the severity of violations. Likewise, it is conceivable to define | the severity of violations. Likewise, it is conceivable to define | |||
configuration settings that also take VIR and SVIR into account. | configuration settings that also take VIR and SVIR into account. | |||
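As an illustration only (the document leaves the state model out of scope), the two-state machine and its two thresholds described above could be sketched as follows. The class name, field names, and the choice to reset the streak counter on every transition are assumptions made for this sketch, not part of the specification.

```python
from dataclasses import dataclass

AVAILABLE = "available"
UNAVAILABLE = "unavailable"


@dataclass
class ServiceStateModel:
    """Hypothetical two-state model driven by per-interval violation flags."""

    unavailability_threshold: int  # successive violated intervals to leave "available"
    availability_threshold: int    # successive violation-free intervals to return
    state: str = AVAILABLE         # the initial state could be "available"
    streak: int = 0                # successive intervals counting toward a transition

    def observe(self, violated: bool) -> str:
        """Feed one interval's outcome and return the resulting state."""
        if self.state == AVAILABLE:
            # Only an unbroken run of violated intervals counts.
            self.streak = self.streak + 1 if violated else 0
            if self.streak >= self.unavailability_threshold:
                self.state, self.streak = UNAVAILABLE, 0
        else:
            # Only an unbroken run of violation-free intervals counts.
            self.streak = self.streak + 1 if not violated else 0
            if self.streak >= self.availability_threshold:
                self.state, self.streak = AVAILABLE, 0
        return self.state


model = ServiceStateModel(unavailability_threshold=3, availability_threshold=2)
intervals = [True, True, False, True, True, True, False, False]
states = [model.observe(v) for v in intervals]
```

With these assumed thresholds, the service becomes "unavailable" only after the third successive violated interval and returns to "available" after two successive violation-free intervals.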
4. Statistical SLO | 4. Statistical SLO | |||
It should be noted that certain SLAs may be statistical, requiring | It should be noted that certain SLAs may be statistical, requiring | |||
skipping to change at page 9, line 23 ¶ | skipping to change at line 384 ¶ | |||
not necessarily constitute an SLO violation. However, it is still | not necessarily constitute an SLO violation. However, it is still | |||
useful to maintain those statistics, as the number of out-of-SLO | useful to maintain those statistics, as the number of out-of-SLO | |||
packets still matters when looked at in proportion to the total | packets still matters when looked at in proportion to the total | |||
number of packets. | number of packets. | |||
Along that vein, an SLA might establish a multi-tiered SLO of, say, | Along that vein, an SLA might establish a multi-tiered SLO of, say, | |||
end-to-end latency (from the lowest to highest tier) as follows: | end-to-end latency (from the lowest to highest tier) as follows: | |||
* not to exceed 30 ms for any packet; | * not to exceed 30 ms for any packet; | |||
* to not exceed 25 ms for 99.999% of packets; | * not to exceed 25 ms for 99.999% of packets; and | |||
* to not exceed 20 ms for 99% of packets. | * not to exceed 20 ms for 99% of packets. | |||
In that case, any individual packet with a latency greater than 20 ms | In that case, any individual packet with a latency greater than 20 ms | |||
and lower than 30 ms cannot be considered an SLO violation in | and lower than 30 ms cannot be considered an SLO violation in | |||
itself, but compliance with the SLO may need to be assessed after the | itself, but compliance with the SLO may need to be assessed after the | |||
fact. | fact. | |||
To support statistical SLOs more directly requires additional | To support statistical SLOs more directly requires additional | |||
metrics, for example, metrics that represent histograms for service | metrics, for example, metrics that represent histograms for service- | |||
level parameters with buckets corresponding to individual service | level parameters with buckets corresponding to individual SLOs. | |||
level objectives. Although the definition of histogram metrics is | Although the definition of histogram metrics is outside the scope of | |||
outside the scope of this document and could be considered for future | this document and could be considered for future work (see | |||
work Section 6, for the example just given, a histogram for a | Section 6), for the example just given, a histogram for a particular | |||
particular flow could be maintained with four buckets: one containing | flow could be maintained with four buckets: one containing the count | |||
the count of packets within 20 ms, a second with a count of packets | of packets within 20 ms, a second with a count of packets between 20 | |||
between 20 and 25 ms (or simply all within 25 ms), a third with a | and 25 ms (or simply all within 25 ms), a third with a count of | |||
count of packets between 25 and 30 ms (or merely all packets within | packets between 25 and 30 ms (or merely all packets within 30 ms), | |||
30 ms, and a fourth with a count of anything beyond (or simply a | and a fourth with a count of anything beyond (or simply a total | |||
total count). Of course, the number of buckets and the boundaries | count). Of course, the number of buckets and the boundaries between | |||
between those buckets should correspond to the needs of the SLA | those buckets should correspond to the needs of the SLA associated | |||
associated with the application, i.e., to the specific guarantees and | with the application, i.e., to the specific guarantees and SLOs that | |||
SLOs that were provided. | were provided. | |||
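A minimal sketch of the four-bucket histogram from the example above, together with an after-the-fact check of the three latency tiers, might look like the following. The function names, the inclusive treatment of the bucket boundaries, and the sample data are assumptions chosen for illustration; histogram metrics themselves are not defined by this document.

```python
from bisect import bisect_left

# Upper bucket boundaries in ms, taken from the multi-tiered SLO example.
BOUNDS_MS = (20.0, 25.0, 30.0)
# Fraction of packets that must fall within each boundary (99%, 99.999%, all).
REQUIRED_FRACTION = (0.99, 0.99999, 1.0)


def build_histogram(latencies_ms):
    """Count packets per bucket: <=20, (20, 25], (25, 30], and beyond 30 ms."""
    counts = [0] * (len(BOUNDS_MS) + 1)
    for latency in latencies_ms:
        counts[bisect_left(BOUNDS_MS, latency)] += 1
    return counts


def tiers_met(counts):
    """Return one boolean per tier, computed from cumulative bucket counts."""
    total = sum(counts)
    if total == 0:
        return [True] * len(BOUNDS_MS)
    results = []
    within = 0
    for bucket, required in zip(counts, REQUIRED_FRACTION):
        within += bucket
        results.append(within >= required * total)
    return results


# Hypothetical sample: 995 packets at 10 ms, 4 at 22 ms, 1 at 27 ms.
sample = [10.0] * 995 + [22.0] * 4 + [27.0]
histogram = build_histogram(sample)
compliance = tiers_met(histogram)
```

For this sample, the 20 ms and 30 ms tiers are met, while the 25 ms tier is not, because one packet in a thousand exceeds 25 ms. No single packet is a violation in itself; compliance only emerges from the proportions.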
5. Other Expected PAM Benefits | 5. Other Expected PAM Benefits | |||
PAM provides several benefits with other, more conventional | PAMs provide several benefits with other, more conventional | |||
performance metrics. Without PAM, it would be possible to conduct | performance metrics. Without PAMs, it would be possible to conduct | |||
ongoing measurements of service levels and maintain a time-series of | ongoing measurements of service levels, maintain a time series of | |||
service level records, then assess compliance with specific SLOs | service-level records, and then assess compliance with specific SLOs | |||
after the fact. However, doing so would require the collection of | after the fact. However, doing so would require the collection of | |||
vast amounts of data that would need to be generated, exported, | vast amounts of data that would need to be generated, exported, | |||
transmitted, collected, and stored. In addition, extensive | transmitted, collected, and stored. In addition, extensive post- | |||
postprocessing would be required to compare that data against SLOs | processing would be required to compare that data against SLOs and | |||
and analyze its compliance. Being able to perform these tasks at | analyze its compliance. Being able to perform these tasks at scale | |||
scale and in real-time would present significant additional | and in real time would present significant additional challenges. | |||
challenges. | ||||
Adding PAM allows for a more compact expression of service level | Adding PAMs allows for a more compact expression of service-level | |||
compliance. In that sense, PAM does not simply represent raw data | compliance. In that sense, PAMs do not simply represent raw data but | |||
but expresses actionable information. In conjunction with proper | expresses actionable information. In conjunction with proper | |||
instrumentation, PAM can thus help avoid expensive postprocessing. | instrumentation, PAMs can thus help avoid expensive post-processing. | |||
6. Extensions and Future Work | 6. Extensions and Future Work | |||
The following is a list of items that are outside the scope of this | The following is a list of items that are outside the scope of this | |||
specification, but which will be useful extensions and opportunities | specification but will be useful extensions and opportunities for | |||
for future work: | future work: | |||
* A YANG data model will allow PAM to be incorporated into | * A YANG data model will allow PAMs to be incorporated into | |||
monitoring applications based on the YANG/NETCONF/RESTCONF | monitoring applications based on the YANG, NETCONF, and RESTCONF | |||
framework. In addition, a YANG data model will enable the | frameworks. In addition, a YANG data model will enable the | |||
configuration and retrieval of PAM-related settings. | configuration and retrieval of PAM-related settings. | |||
* A set of IPFIX Information Elements will allow PAM to be | * A set of IPFIX Information Elements will allow PAMs to be | |||
associated with flow records and exported as part of flow data, | associated with flow records and exported as part of flow data, | |||
for example, for processing by accounting applications that assess | for example, for processing by accounting applications that assess | |||
compliance of delivered services with quality guarantees. | compliance of delivered services with quality guarantees. | |||
* Additional second-order metrics, such as "longest disruption of | * Additional second-order metrics, such as "longest disruption of | |||
service time" (measuring consecutive time units with SVIs), can be | service time" (measuring consecutive time units with SVIs), can be | |||
defined and would be deemed useful by some users. At the same | defined and would be deemed useful by some users. At the same | |||
time, such metrics can be computed in a straightforward manner and | time, such metrics can be computed in a straightforward manner and | |||
will in many cases be application-specific. For this reason, | will be application specific in many cases. For this reason, such | |||
further such metrics are omitted here in order to not overburden | metrics are omitted here in order to not overburden this | |||
this specification. | specification. | |||
* The definition of the metrics that represent histograms for | * Metrics can be defined to represent histograms for service-level | |||
service level parameters with buckets corresponding to individual | parameters with buckets corresponding to individual SLOs. | |||
service level objectives, | ||||
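To illustrate why a second-order metric such as "longest disruption of service time" is considered straightforward to compute, the following sketch derives it from a sequence of per-time-unit flags. The function name and the boolean input representation (True meaning the time unit was an SVI) are assumptions made for this example.

```python
def longest_disruption(svi_flags):
    """Return the longest run of consecutive time units flagged as SVIs."""
    longest = current = 0
    for is_svi in svi_flags:
        # Extend the current run on an SVI, otherwise start over.
        current = current + 1 if is_svi else 0
        longest = max(longest, current)
    return longest


flags = [False, True, True, False, True, True, True, False]
result = longest_disruption(flags)
```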
7. IANA Considerations | 7. IANA Considerations | |||
This document has no IANA actions. | This document has no IANA actions. | |||
8. Security Considerations | 8. Security Considerations | |||
Instrumentation for metrics that are used to assess compliance with | Instrumentation for metrics that are used to assess compliance with | |||
SLOs constitute an attractive target for an attacker. By interfering | SLOs constitutes an attractive target for an attacker. By | |||
with the maintenance of such metrics, services could be falsely | interfering with the maintenance of such metrics, services could be | |||
identified as complying (when they are not) or vice-versa (i.e., | falsely identified as complying (when they are not) or vice versa | |||
flagged as being non-compliant when indeed they are). While this | (i.e., flagged as being non-compliant when indeed they are). While | |||
document does not specify how networks should be instrumented to | this document does not specify how networks should be instrumented to | |||
maintain the identified metrics, such instrumentation needs to be | maintain the identified metrics, such instrumentation needs to be | |||
adequately secured to ensure accurate measurements and prohibit | adequately secured to ensure accurate measurements and prohibit | |||
tampering with metrics being kept. | tampering with metrics being kept. | |||
Where metrics are being defined relative to an SLO, the configuration | Where metrics are being defined relative to an SLO, the configuration | |||
of those SLOs needs to be adequately secured. Likewise, where SLOs | of those SLOs needs to be adequately secured. Likewise, where SLOs | |||
can be adjusted, the correlation between any metric instance and a | can be adjusted, the correlation between any metric instance and a | |||
particular SLO must be unambiguous. The same service levels that | particular SLO must be unambiguous. The same service levels that | |||
constitute SLO violations for one flow that should be maintained as | constitute SLO violations for one flow and should be maintained as | |||
part of the "violated time units" and related metrics, may be | part of the "violated time units" and related metrics may be | |||
compliant for another flow. In cases when it is impossible to tie | compliant for another flow. In cases when it is impossible to tie | |||
together SLOs and PAM, it will be preferable to merely maintain | together SLOs and PAMs, it is preferable to merely maintain | |||
statistics about service levels delivered (for example, overall | statistics about service levels delivered (for example, overall | |||
histograms of end-to-end latency) without assessing which constitutes | histograms of end-to-end latency) without assessing which constitute | |||
violations. | violations. | |||
By the same token, where the definition of what constitutes a | By the same token, the definition of what constitutes a "severe" or a | |||
"severe" or a "significant" violation depends on configuration | "significant" violation depends on configuration settings or context. | |||
settings or context. The configuration of such settings or context | The configuration of such settings or context needs to be specially | |||
needs to be specially secured. Also, the configuration must be bound | secured. Also, the configuration must be bound to the metrics being | |||
to the metrics being maintained. Thus, it will be clear which | maintained. Thus, it will be clear which configuration setting was | |||
configuration setting was in effect when those metrics were being | in effect when those metrics were being assessed. An attacker that | |||
assessed. An attacker that can tamper with such configuration | can tamper with such configuration settings will render the | |||
settings will render the corresponding metrics useless (in the best | corresponding metrics useless (in the best case) or misleading (in | |||
case) or misleading (in the worst case). | the worst case). | |||
9. Acknowledgments | ||||
The authors greatly appreciate review and comments by Bjørn Ivar | ||||
Teigen and Christian Jacquenet. | ||||
10. References | ||||
10.1. Informative References | ||||
[I-D.ietf-teas-ietf-network-slices] | 9. Informative References | |||
Farrel, A., Drake, J., Rokui, R., Homma, S., Makhijani, | ||||
K., Contreras, L. M., and J. Tantsura, "A Framework for | ||||
Network Slices in Networks Built from IETF Technologies", | ||||
Work in Progress, Internet-Draft, draft-ietf-teas-ietf- | ||||
network-slices-25, 14 September 2023, | ||||
<https://datatracker.ietf.org/doc/html/draft-ietf-teas- | ||||
ietf-network-slices-25>. | ||||
[IANA-PM-Registry] | [IANA-PM-Registry] | |||
IANA, "IANA Registry of Performance Metrics", March 2020, | IANA, "Performance Metrics", | |||
<https://www.iana.org/assignments/performance-metrics/ | <https://www.iana.org/assignments/performance-metrics>. | |||
performance-metrics.xhtml>. | ||||
[ITU.G.826] | [ITU.G.826] | |||
ITU-T, "End-to-end error performance parameters and | ITU-T, "End-to-end error performance parameters and | |||
objectives for international, constant bit-rate digital | objectives for international, constant bit-rate digital | |||
paths and connections", ITU-T G.826, December 2002. | paths and connections", ITU-T G.826, December 2002. | |||
[RFC2863] McCloghrie, K. and F. Kastenholz, "The Interfaces Group | [RFC2863] McCloghrie, K. and F. Kastenholz, "The Interfaces Group | |||
MIB", RFC 2863, DOI 10.17487/RFC2863, June 2000, | MIB", RFC 2863, DOI 10.17487/RFC2863, June 2000, | |||
<https://www.rfc-editor.org/info/rfc2863>. | <https://www.rfc-editor.org/info/rfc2863>. | |||
skipping to change at page 13, line 15 ¶ | skipping to change at line 543 ¶ | |||
[RFC8911] Bagnulo, M., Claise, B., Eardley, P., Morton, A., and A. | [RFC8911] Bagnulo, M., Claise, B., Eardley, P., Morton, A., and A. | |||
Akhter, "Registry for Performance Metrics", RFC 8911, | Akhter, "Registry for Performance Metrics", RFC 8911, | |||
DOI 10.17487/RFC8911, November 2021, | DOI 10.17487/RFC8911, November 2021, | |||
<https://www.rfc-editor.org/info/rfc8911>. | <https://www.rfc-editor.org/info/rfc8911>. | |||
[RFC8912] Morton, A., Bagnulo, M., Eardley, P., and K. D'Souza, | [RFC8912] Morton, A., Bagnulo, M., Eardley, P., and K. D'Souza, | |||
"Initial Performance Metrics Registry Entries", RFC 8912, | "Initial Performance Metrics Registry Entries", RFC 8912, | |||
DOI 10.17487/RFC8912, November 2021, | DOI 10.17487/RFC8912, November 2021, | |||
<https://www.rfc-editor.org/info/rfc8912>. | <https://www.rfc-editor.org/info/rfc8912>. | |||
Contributors' Addresses | [RFC9543] Farrel, A., Ed., Drake, J., Ed., Rokui, R., Homma, S., | |||
Makhijani, K., Contreras, L., and J. Tantsura, "A | ||||
Framework for Network Slices in Networks Built from IETF | ||||
Technologies", RFC 9543, DOI 10.17487/RFC9543, February | ||||
2024, <https://www.rfc-editor.org/info/rfc9543>. | ||||
Acknowledgments | ||||
The authors greatly appreciate review and comments by Bjørn Ivar | ||||
Teigen and Christian Jacquenet. | ||||
Contributors | ||||
Liuyan Han | Liuyan Han | |||
China Mobile | China Mobile | |||
32 XuanWuMenXi Street | 32 XuanWuMenXi Street | |||
Beijing | Beijing | |||
100053 | 100053 | |||
China | China | |||
Email: hanliuyan@chinamobile.com | Email: hanliuyan@chinamobile.com | |||
Mohamed Boucadair | Mohamed Boucadair | |||
skipping to change at page 14, line 4 ¶ | skipping to change at line 587 ¶ | |||
Greg Mirsky | Greg Mirsky | |||
Ericsson | Ericsson | |||
Email: gregimirsky@gmail.com | Email: gregimirsky@gmail.com | |||
Joel Halpern | Joel Halpern | |||
Ericsson | Ericsson | |||
Email: joel.halpern@ericsson.com | Email: joel.halpern@ericsson.com | |||
Xiao Min | Xiao Min | |||
ZTE Corp. | ZTE Corp. | |||
Email: xiao.min2@zte.com.cn | Email: xiao.min2@zte.com.cn | |||
Alexander Clemm | Alexander Clemm | |||
Futurewei | ||||
2330 Central Expressway | ||||
Santa Clara, CA 95050 | ||||
United States of America | ||||
Email: ludwig@clemm.org | Email: ludwig@clemm.org | |||
John Strassner | John Strassner | |||
Futurewei | Futurewei | |||
2330 Central Expressway | 2330 Central Expressway | |||
Santa Clara, CA 95050 | Santa Clara, CA 95050 | |||
United States of America | United States of America | |||
Email: strazpdj@gmail.com | Email: strazpdj@gmail.com | |||
Jerome Francois | Jerome Francois | |||
Inria and University of Luxembourg | Inria and University of Luxembourg | |||
615 Rue du Jardin Botanique | 615 Rue du Jardin Botanique | |||
54600 Villers-les-Nancy | 54600 Villers-les-Nancy | |||
France | France | |||
Email: jerome.francois@inria.fr | Email: jerome.francois@inria.fr | |||
End of changes. 84 change blocks. | ||||
252 lines changed or deleted | 231 lines changed or added | |||