rfc9417v2.txt | rfc9417.txt | |||
---|---|---|---|---|
Internet Engineering Task Force (IETF) B. Claise | Internet Engineering Task Force (IETF) B. Claise | |||
Request for Comments: 9417 J. Quilbeuf | Request for Comments: 9417 J. Quilbeuf | |||
Category: Informational Huawei | Category: Informational Huawei | |||
ISSN: 2070-1721 D. Lopez | ISSN: 2070-1721 D. Lopez | |||
Telefonica I+D | Telefonica I+D | |||
D. Voyer | D. Voyer | |||
Bell Canada | Bell Canada | |||
T. Arumugam | T. Arumugam | |||
Cisco Systems, Inc. | Consultant | |||
May 2023 | June 2023 | |||
Service Assurance for Intent-Based Networking Architecture | Service Assurance for Intent-Based Networking Architecture | |||
Abstract | Abstract | |||
This document describes an architecture that provides some assurance | This document describes an architecture that provides some assurance | |||
that service instances are running as expected. As services rely | that service instances are running as expected. As services rely | |||
upon multiple subservices provided by a variety of elements, | upon multiple subservices provided by a variety of elements, | |||
including the underlying network devices and functions, getting the | including the underlying network devices and functions, getting the | |||
assurance of a healthy service is only possible with a holistic view | assurance of a healthy service is only possible with a holistic view | |||
skipping to change at line 99 ¶ | skipping to change at line 99 ¶ | |||
Service orchestrators use Network Service YANG Modules that will | Service orchestrators use Network Service YANG Modules that will | |||
infer network-wide configuration and, therefore, the invocation of | infer network-wide configuration and, therefore, the invocation of | |||
the appropriate device modules (Section 3 of [RFC8969]). Knowing | the appropriate device modules (Section 3 of [RFC8969]). Knowing | |||
that a configuration is applied doesn't imply that the provisioned | that a configuration is applied doesn't imply that the provisioned | |||
service instance is up and running as expected. For instance, the | service instance is up and running as expected. For instance, the | |||
service might be degraded because of a failure in the network, the | service might be degraded because of a failure in the network, the | |||
service quality may be degraded, or a service function may be | service quality may be degraded, or a service function may be | |||
reachable at the IP level but does not provide its intended function. | reachable at the IP level but does not provide its intended function. | |||
Thus, the network operator must monitor the service's operational | Thus, the network operator must monitor the service's operational | |||
data at the same time as the configuration (Section 3.3 of | data at the same time as the configuration (Section 3.3 of | |||
[RFC8969]). To feul that task, the industry has been standardizing | [RFC8969]). To fuel that task, the industry has been standardizing | |||
on telemetry to push network element performance information (e.g., | on telemetry to push network element performance information (e.g., | |||
[RFC9375]). | [RFC9375]). | |||
A network administrator needs to monitor its network and services as | A network administrator needs to monitor its network and services as | |||
a whole, independently of the management protocols. With different | a whole, independently of the management protocols. With different | |||
protocols come different data models and different ways to model the | protocols come different data models and different ways to model the | |||
same type of information. When network administrators deal with | same type of information. When network administrators deal with | |||
multiple management protocols, the network management entities have | multiple management protocols, the network management entities have | |||
to perform the difficult and time-consuming job of mapping data | to perform the difficult and time-consuming job of mapping data | |||
models, e.g., the model used for configuration with the model used | models, e.g., the model used for configuration with the model used | |||
for monitoring when separate models or protocols are used. This | for monitoring when separate models or protocols are used. This | |||
problem is compounded by a large, disparate set of data sources | problem is compounded by a large, disparate set of data sources | |||
(e.g., MIB modules, YANG models [RFC7950], IP Flow Information Export | (e.g., MIB modules, YANG data models [RFC7950], IP Flow Information | |||
(IPFIX) information elements [RFC7011], syslog plain text [RFC5424], | Export (IPFIX) information elements [RFC7011], syslog plain text | |||
Terminal Access Controller Access-Control System Plus (TACACS+) | [RFC5424], Terminal Access Controller Access-Control System Plus | |||
[RFC8907], RADIUS [RFC2865], etc.). In order to avoid this data | (TACACS+) [RFC8907], RADIUS [RFC2865], etc.). In order to avoid this | |||
model mapping, the industry converged on model-driven telemetry to | data model mapping, the industry converged on model-driven telemetry | |||
stream the service operational data, reusing the YANG models used for | to stream the service operational data, reusing the YANG data models | |||
configuration. Model-driven telemetry greatly facilitates the notion | used for configuration. Model-driven telemetry greatly facilitates | |||
of closed-loop automation, whereby events and updated operational | the notion of closed-loop automation, whereby events and updated | |||
states streamed from the network drive remediation change back into | operational states streamed from the network drive remediation change | |||
the network. | back into the network. | |||
However, it proves difficult for network operators to correlate the | However, it proves difficult for network operators to correlate the | |||
service degradation with the network root cause, for example, "Why | service degradation with the network root cause, for example, "Why | |||
does my layer 3 virtual private network (L3VPN) fail to connect?" or | does my layer 3 virtual private network (L3VPN) fail to connect?" or | |||
"Why is this specific service not highly responsive?" The reverse, | "Why is this specific service not highly responsive?" The reverse, | |||
i.e., which services are impacted when a network component fails or | i.e., which services are impacted when a network component fails or | |||
degrades, is also important for operators, for example, "Which | degrades, is also important for operators, for example, "Which | |||
services are impacted when this specific optic decibel milliwatt | services are impacted when this specific optic decibel milliwatt | |||
(dBm) begins to degrade?", "Which applications are impacted by an | (dBm) begins to degrade?", "Which applications are impacted by an | |||
imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that | imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that | |||
skipping to change at line 356 ¶ | skipping to change at line 356 ¶ | |||
graph and computing the health statuses in a distributed manner. The | graph and computing the health statuses in a distributed manner. The | |||
collector is in charge of collecting and displaying the current | collector is in charge of collecting and displaying the current | |||
inferred health status of the service instances and subservices. The | inferred health status of the service instances and subservices. The | |||
collector also detects changes in the assurance graph structures | collector also detects changes in the assurance graph structures | |||
(e.g., an occurrence of a switchover from primary to backup path) and | (e.g., an occurrence of a switchover from primary to backup path) and | |||
forwards the information to the orchestrator, which reconfigures the | forwards the information to the orchestrator, which reconfigures the | |||
agents. Finally, the automation loop is closed by having the SAIN | agents. Finally, the automation loop is closed by having the SAIN | |||
collector provide feedback to the network/service orchestrator. | collector provide feedback to the network/service orchestrator. | |||
In order to make agents, orchestrators, and collectors from different | In order to make agents, orchestrators, and collectors from different | |||
vendors interoperable, their interface is defined as a YANG model in | vendors interoperable, their interface is defined as a YANG module in | |||
a companion document [RFC9418]. In Figure 1, the communications that | a companion document [RFC9418]. In Figure 1, the communications that | |||
are normalized by this YANG model are tagged with a "Y". The use of | are normalized by this YANG module are tagged with a "Y". The use of | |||
this YANG module is further explained in Section 3.5. | this YANG module is further explained in Section 3.5. | |||
+-----------------+ | +-----------------+ | |||
| Service | | | Service | | |||
| Orchestrator |<----------------------+ | | Orchestrator |<----------------------+ | |||
| | | | | | | | |||
+-----------------+ | | +-----------------+ | | |||
| ^ | | | ^ | | |||
| | Network | | | | Network | | |||
| | Service | Feedback | | | Service | Feedback | |||
skipping to change at line 811 ¶ | skipping to change at line 811 ¶ | |||
account in the parent service instance or subservice instance(s) | account in the parent service instance or subservice instance(s) | |||
for informational reasons. | for informational reasons. | |||
Impacting Dependency: | Impacting Dependency: | |||
The type of dependency whose health score impacts the health score | The type of dependency whose health score impacts the health score | |||
of its parent subservice or service instance(s) in the assurance | of its parent subservice or service instance(s) in the assurance | |||
graph. The symptoms are taken into account in the parent service | graph. The symptoms are taken into account in the parent service | |||
instance or subservice instance(s) as the impacting reasons. | instance or subservice instance(s) as the impacting reasons. | |||
The set of dependency types presented here is not exhaustive. More | The set of dependency types presented here is not exhaustive. More | |||
specific dependency types can be defined by extending the YANG model. | specific dependency types can be defined by extending the YANG | |||
For instance, a connectivity subservice depending on several path | module. For instance, a connectivity subservice depending on several | |||
subservices is partially impacted if only one of these paths fails. | path subservices is partially impacted if only one of these paths | |||
Adding these new dependency types requires defining the corresponding | fails. Adding these new dependency types requires defining the | |||
operation for combining statuses of subservices. | corresponding operation for combining statuses of subservices. | |||
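The dependency types above, and the remark that each new type needs its own combining operation, can be sketched as follows. This is a minimal illustration, not the method defined by the RFC: the 0..100 score range, the use of `min` for impacting dependencies, the `"partial"` dependency type, and all names (`Dependency`, `COMBINERS`, `parent_score`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dependency:
    child_score: int  # health score of the child subservice (assumed range 0..100)
    dep_type: str     # "impacting", "informational", or a hypothetical extension


def combine_impacting(own: int, children: List[int]) -> int:
    # Assumption: an impacting dependency drags the parent down to the worst child.
    return min([own, *children])


def combine_partial(own: int, children: List[int]) -> int:
    # Hypothetical "partially impacting" type: one failed path out of several
    # only lowers the parent to the mean of the path scores.
    if not children:
        return own
    return min(own, sum(children) // len(children))


# Each dependency type that affects the score needs a combining operation.
COMBINERS: Dict[str, Callable[[int, List[int]], int]] = {
    "impacting": combine_impacting,
    "partial": combine_partial,
}


def parent_score(own_score: int, deps: List[Dependency]) -> int:
    by_type: Dict[str, List[int]] = {}
    for dep in deps:
        if dep.dep_type == "informational":
            continue  # informational dependencies never change the score
        by_type.setdefault(dep.dep_type, []).append(dep.child_score)
    score = own_score
    for dep_type, scores in by_type.items():
        score = COMBINERS[dep_type](score, scores)  # KeyError for undefined types
    return score
```

With these assumptions, a connectivity subservice with two `"partial"` path dependencies scoring 0 and 100 ends up at 50 rather than 0, whereas the same two children declared as `"impacting"` would bring it to 0.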
Subservices shall not be dependent on the protocol used to retrieve | Subservices shall not be dependent on the protocol used to retrieve | |||
the metrics. To justify this, let's consider the interface | the metrics. To justify this, let's consider the interface | |||
operational status. Depending on the device capabilities, this | operational status. Depending on the device capabilities, this | |||
status can be collected by an industry-accepted YANG module (e.g., | status can be collected by an industry-accepted YANG module (e.g., | |||
IETF or Openconfig [OpenConfig]), by a vendor-specific YANG module, | IETF or Openconfig [OpenConfig]), by a vendor-specific YANG module, | |||
or even by a MIB module. If the subservice was dependent on the | or even by a MIB module. If the subservice was dependent on the | |||
mechanism to collect the operational status, then we would need | mechanism to collect the operational status, then we would need | |||
multiple subservice definitions in order to support all different | multiple subservice definitions in order to support all different | |||
mechanisms. This also implies that, while waiting for all the | mechanisms. This also implies that, while waiting for all the | |||
metrics to be available via standard YANG modules, SAIN agents might | metrics to be available via standard YANG modules, SAIN agents might | |||
have to retrieve metric values via nonstandard YANG models, MIB | have to retrieve metric values via nonstandard YANG data models, MIB | |||
modules, the Command-Line Interface (CLI), etc., effectively | modules, the Command-Line Interface (CLI), etc., effectively | |||
implementing a normalization layer between data models and | implementing a normalization layer between data models and | |||
information models. | information models. | |||
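The normalization layer mentioned above can be illustrated with the interface operational status example. The sketch below is hypothetical: the source labels (`"if-mib"`, `"ietf-interfaces"`, `"openconfig"`) and the function name are invented for the example, and choosing the ietf-interfaces spelling as the normalized form is an assumption. The enumeration values themselves follow the IF-MIB `ifOperStatus` object and the `oper-status` leaves of the ietf-interfaces and OpenConfig interface models.

```python
# IF-MIB ifOperStatus integer values mapped to the ietf-interfaces spelling.
IF_MIB_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "not-present",
    7: "lower-layer-down",
}


def normalize_oper_status(source: str, raw) -> str:
    """Map a source-specific operational status to one normalized value.

    `source` is a hypothetical label naming the collection mechanism.
    """
    if source == "if-mib":
        # SNMP returns an integer enumeration.
        return IF_MIB_OPER_STATUS.get(int(raw), "unknown")
    if source == "ietf-interfaces":
        # Already in the normalized spelling chosen for this sketch.
        return str(raw)
    if source == "openconfig":
        # OpenConfig uses upper-case names with underscores, e.g. NOT_PRESENT.
        return str(raw).lower().replace("_", "-")
    raise ValueError(f"unsupported source: {source}")
```

A subservice definition can then test the normalized value (for example, `== "up"`) without knowing which mechanism produced it.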
In order to keep subservices independent of metric collection method | In order to keep subservices independent of metric collection method | |||
(or, expressed differently, to support multiple combinations of | (or, expressed differently, to support multiple combinations of | |||
platforms, OSes, and even vendors), the architecture introduces the | platforms, OSes, and even vendors), the architecture introduces the | |||
concept of "metric engine". The metric engine maps each device- | concept of "metric engine". The metric engine maps each device- | |||
independent metric used in the subservices to a list of device- | independent metric used in the subservices to a list of device- | |||
specific metric implementations that precisely define how to fetch | specific metric implementations that precisely define how to fetch | |||
skipping to change at line 1042 ¶ | skipping to change at line 1042 ¶ | |||
This document has no IANA actions. | This document has no IANA actions. | |||
5. Security Considerations | 5. Security Considerations | |||
The SAIN architecture helps operators to reduce the mean time to | The SAIN architecture helps operators to reduce the mean time to | |||
detect and the mean time to repair. However, the SAIN agents must be | detect and the mean time to repair. However, the SAIN agents must be | |||
secured; a compromised SAIN agent may be sending incorrect root | secured; a compromised SAIN agent may be sending incorrect root | |||
causes or symptoms to the management systems. Securing the agents | causes or symptoms to the management systems. Securing the agents | |||
falls back to ensuring the integrity and confidentiality of the | falls back to ensuring the integrity and confidentiality of the | |||
assurance graph. This can be partially achieved by correctly setting | assurance graph. This can be partially achieved by correctly setting | |||
permissions of each node in the YANG model, as described in Section 6 | permissions of each node in the YANG data model, as described in | |||
of [RFC9418]. | Section 6 of [RFC9418]. | |||
Except for the configuration of telemetry, the agents do not need | Except for the configuration of telemetry, the agents do not need | |||
"write access" to the devices they monitor. This configuration is | "write access" to the devices they monitor. This configuration is | |||
applied with a YANG module, whose protection is covered by Secure | applied with a YANG module, whose protection is covered by Secure | |||
Shell (SSH) [RFC6242] for the Network Configuration Protocol | Shell (SSH) [RFC6242] for the Network Configuration Protocol | |||
(NETCONF) or TLS [RFC8446] for RESTCONF. Devices should be | (NETCONF) or TLS [RFC8446] for RESTCONF. Devices should be | |||
configured so that agents have their own credentials with write | configured so that agents have their own credentials with write | |||
access only for the YANG nodes configuring the telemetry. | access only for the YANG nodes configuring the telemetry. | |||
The data collected by SAIN could potentially be compromising to the | The data collected by SAIN could potentially be compromising to the | |||
skipping to change at line 1095 ¶ | skipping to change at line 1095 ¶ | |||
Explained", RFC 8309, DOI 10.17487/RFC8309, January 2018, | Explained", RFC 8309, DOI 10.17487/RFC8309, January 2018, | |||
<https://www.rfc-editor.org/info/rfc8309>. | <https://www.rfc-editor.org/info/rfc8309>. | |||
[RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and | [RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and | |||
L. Geng, "A Framework for Automating Service and Network | L. Geng, "A Framework for Automating Service and Network | |||
Management with YANG", RFC 8969, DOI 10.17487/RFC8969, | Management with YANG", RFC 8969, DOI 10.17487/RFC8969, | |||
January 2021, <https://www.rfc-editor.org/info/rfc8969>. | January 2021, <https://www.rfc-editor.org/info/rfc8969>. | |||
[RFC9418] Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T. | [RFC9418] Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T. | |||
Arumugam, "YANG Modules for Service Assurance", RFC 9418, | Arumugam, "YANG Modules for Service Assurance", RFC 9418, | |||
DOI 10.17487/RFC9418, May 2023, | DOI 10.17487/RFC9418, June 2023, | |||
<https://www.rfc-editor.org/info/rfc9418>. | <https://www.rfc-editor.org/info/rfc9418>. | |||
6.2. Informative References | 6.2. Informative References | |||
[OpenConfig] | [OpenConfig] | |||
"OpenConfig", <https://openconfig.net>. | "OpenConfig", <https://openconfig.net>. | |||
[Piovesan2017] | [Piovesan2017] | |||
Piovesan, A. and E. Griffor, "7 - Reasoning About Safety | Piovesan, A. and E. Griffor, "7 - Reasoning About Safety | |||
and Security: The Logic of Assurance", | and Security: The Logic of Assurance", | |||
skipping to change at line 1217 ¶ | skipping to change at line 1217 ¶ | |||
28006 Madrid | 28006 Madrid | |||
Spain | Spain | |||
Email: diego.r.lopez@telefonica.com | Email: diego.r.lopez@telefonica.com | |||
Dan Voyer | Dan Voyer | |||
Bell Canada | Bell Canada | |||
Canada | Canada | |||
Email: daniel.voyer@bell.ca | Email: daniel.voyer@bell.ca | |||
Thangam Arumugam | Thangam Arumugam | |||
Cisco Systems, Inc. | Consultant | |||
Milpitas, California | Milpitas, California | |||
United States of America | United States of America | |||
Email: tarumuga@cisco.com | Email: thangavelu@yahoo.com | |||
End of changes. 11 change blocks. | ||||
25 lines changed or deleted | 25 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |