3. Base IETF Service Assurance YANG Module
3.1. Concepts
The "ietf-service-assurance" YANG module assumes a set of subservices to be assured independently. A subservice is a feature or a subpart of the network system that a given service instance depends on. Examples of subservice types include the following:
- device: Whether a device is healthy, and if not, what are the symptoms? Such a subservice might monitor the device resources, such as CPU, RAM, or Ternary Content-Addressable Memory (TCAM). Potential symptoms are "CPU overloaded", "Out of RAM", or "Out of TCAM".
- ip-connectivity: Given two IP addresses bound to two devices, what is the quality of the IP connectivity between them? Potential symptoms are "No route available" or "Equal-Cost Multipaths (ECMPs) imbalance".
An instance of the device subservice represents a subpart of the network system, namely a specific device. An instance of the ip-connectivity subservice represents a feature of the network, namely the connectivity between two specific IP addresses on two devices. In both cases, these subservices might depend on other subservices; for instance, the connectivity might depend on a subservice representing the routing system and on a subservice representing ECMPs.
The two example subservices presented above need different sets of parameters to fully characterize one of their instances. An instance of the device subservice is fully characterized by a single parameter that identifies the device to monitor. For the ip-connectivity subservice, at least the device and IP address for both ends of the link are needed to fully characterize an instance.
The base model presented in this section specifies a single type of subservice, which represents service instances. Such nodes play a particular role in the assurance graph because they represent the starting point, or root, for the assurance graph of the corresponding service instance. The parameters required to fully identify a service instance are the name of the service and the name of the service instance. To support other types of subservices, such as device or ip-connectivity, the "ietf-service-assurance" module is intended to be augmented.
The dependencies are modeled as a list, i.e., each subservice contains a list of references to its dependencies. That list can be empty if the subservice instance does not have any dependencies.
By specifying service instances and their dependencies in terms of subservices, one defines a global assurance graph. That assurance graph is the result of merging all the individual assurance graphs for the assured service instances. Each subservice instance is expected to appear only once in the global assurance graph even if several service instances depend on it. For example, an instance of the device subservice is a dependency of every service instance that relies on the corresponding device. The assurance graph of a specific service instance is the subgraph obtained by traversing the global assurance graph through the dependencies, starting from the specific service instance.
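As an illustration only, the traversal described above can be sketched in Python. The dictionary layout, the (type, id) tuples, the example identifiers, and the function name assurance_subgraph are assumptions made for this sketch; none of them is defined by the "ietf-service-assurance" module.

```python
from collections import deque


def assurance_subgraph(dependencies, root):
    """Return every subservice reachable from `root`, `root` included.

    `dependencies` maps a (type, id) tuple to the list of (type, id)
    tuples it depends on (hypothetical in-memory layout).
    """
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in dependencies.get(node, ()):
            if dep not in seen:  # each subservice is visited only once
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical global assurance graph with two service instances.
graph = {
    ("service-instance", "l2vpn/customer-a"): [("ip-connectivity", "a-b")],
    ("service-instance", "l2vpn/customer-b"): [("device", "pe1")],
    ("ip-connectivity", "a-b"): [("device", "pe1"), ("device", "pe2")],
    ("device", "pe1"): [],
    ("device", "pe2"): [],
}

subgraph_b = assurance_subgraph(graph, ("service-instance", "l2vpn/customer-b"))
```

In this example, the device "pe1" appears once in the global graph although both service instances depend on it, and it belongs to the assurance graph of each of them.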
An assurance agent configured with such a graph is expected to produce, for each configured subservice, a health status that indicates how healthy the subservice is. If the subservice is not healthy, the agent is expected to produce a list of symptoms explaining why the subservice is not healthy.
3.2. Tree View
The following tree diagram [RFC8340] provides an overview of the "ietf-service-assurance" module.

module: ietf-service-assurance
  +--ro assurance-graph-last-change    yang:date-and-time
  +--rw subservices
  |  +--rw subservice* [type id]
  |     +--rw type                                identityref
  |     +--rw id                                  string
  |     +--ro last-change?                        yang:date-and-time
  |     +--ro label?                              string
  |     +--rw under-maintenance!
  |     |  +--rw contact    string
  |     +--rw (parameter)
  |     |  +--:(service-instance-parameter)
  |     |     +--rw service-instance-parameter
  |     |        +--rw service          string
  |     |        +--rw instance-name    string
  |     +--ro health-score                        int8
  |     +--ro symptoms-history-start?             yang:date-and-time
  |     +--ro symptoms
  |     |  +--ro symptom* [start-date-time agent-id symptom-id]
  |     |     +--ro symptom-id             leafref
  |     |     +--ro agent-id               -> /agents/agent/id
  |     |     +--ro health-score-weight?   uint8
  |     |     +--ro start-date-time        yang:date-and-time
  |     |     +--ro stop-date-time?        yang:date-and-time
  |     +--rw dependencies
  |        +--rw dependency* [type id]
  |           +--rw type
  |           |       -> /subservices/subservice/type
  |           +--rw id                 leafref
  |           +--rw dependency-type?   identityref
  +--ro agents
  |  +--ro agent* [id]
  |     +--ro id          string
  |     +--ro symptoms* [id]
  |        +--ro id             string
  |        +--ro description    string
  +--ro assured-services
     +--ro assured-service* [service]
        +--ro service      leafref
        +--ro instances* [name]
           +--ro name           leafref
           +--ro subservices* [type id]
              +--ro type    -> /subservices/subservice/type
              +--ro id      leafref

The date of the last change in "assurance-graph-last-change" is read only. It must be updated each time the graph structure is changed by addition or deletion of subservices and dependencies or modifications of their configurable attributes, including their maintenance statuses. Such modifications correspond to a structural change in the graph. The date of the last change is useful for a client to quickly check if there is a need to update the graph structure. A change in the health score or symptoms associated with a service or subservice does not change the structure of the graph, and thus has no effect on the date of the last change.
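A client-side check based on that date could look like the following Python sketch. The function names and the idea of caching the previously seen value are assumptions of this example; how the value is actually retrieved (e.g., via NETCONF or RESTCONF) is not shown. The parsing helper also assumes a value that datetime.fromisoformat() accepts once a trailing "Z" has been rewritten, which may not hold for every fractional-second precision on older Python versions.

```python
from datetime import datetime


def parse_date_and_time(value):
    """Parse a yang:date-and-time string into a timezone-aware datetime."""
    # Older Python versions do not accept a trailing "Z" in
    # fromisoformat(), so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def graph_needs_refresh(cached_last_change, server_last_change):
    """Tell whether the cached graph structure is older than the server's.

    `cached_last_change` is the "assurance-graph-last-change" value seen
    when the graph was last fetched, or None if it was never fetched.
    """
    if cached_last_change is None:
        return True
    return (parse_date_and_time(server_last_change)
            > parse_date_and_time(cached_last_change))
```

Because health scores and symptoms do not affect "assurance-graph-last-change", such a check only decides whether the structure must be fetched again; telemetry still has to be read separately.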
The "subservices" list contains all the subservice instances currently known by the server (i.e., SAIN agent or SAIN collector). A subservice declaration MUST provide the following:
- a subservice type ("type"): a reference to an identity that inherits from "subservice-base", which is the base identity for any subservice type
- an id ("id"): a string uniquely identifying the subservice among those with the same type
The type and id uniquely identify a given subservice.
The "last-change" indicates when the dependencies or maintenance status of this particular subservice were last modified.
The "label" is a human-readable description of the subservice.
The presence of the "under-maintenance" container inhibits the emission of symptoms for the subservice and for the subservices that depend on it. In that case, a "contact" MUST be provided to indicate who or which software is responsible for the maintenance. See Section 3.6 of [RFC9417] for a more detailed discussion.
The "parameter" choice is intended to be augmented in order to describe parameters that are specific to the current subservice type. This base module defines only the subservice type representing service instances. Service instances MUST be modeled as a particular type of subservice with two parameters: "service" and "instance-name". The "service" parameter is the name of the service defined in the network orchestrator, for instance, "point-to-point-l2vpn". The "instance-name" parameter is the name assigned to the particular instance to be assured, for instance, the name of the customer using that instance.
The "health-score" contains a value normally between 0 and 100, indicating how healthy the subservice is. As mentioned in the health score definition, the special value -1 can be used to specify that no value could be computed for that health score, for instance, if some metric needed for that computation could not be collected.
The "symptoms-history-start" is the cutoff date for reporting symptoms. Symptoms that were terminated before that date are not reported anymore in the model.
The status of each subservice contains a list of symptoms. Each symptom is specified by:
- an identifier "symptom-id", which identifies the symptom locally to an agent,
- an agent identifier "agent-id", which identifies the agent raising the symptom,
- a "health-score-weight" specifying the impact to the health score incurred by this symptom,
- a "start-date-time" indicating when the symptom became active, and
- a "stop-date-time" indicating when the symptom stopped being active (this field is not present if the symptom is still active).
In order for the pair "agent-id" and "symptom-id" to uniquely identify a symptom, the following is necessary:
- "agent-id" MUST be unique among all agents of the system.
- "symptom-id" MUST be unique among all symptoms raised by the agent.
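The cutoff rule for "symptoms-history-start" can be sketched as follows in Python. Representing a symptom as a dictionary holding timezone-aware datetime objects, as well as the function name prune_symptoms and the sample symptom identifiers, are assumptions of this example; when and how an agent performs such a cleanup is an implementation choice.

```python
from datetime import datetime, timezone


def prune_symptoms(symptoms, history_start):
    """Drop symptoms that terminated before the cutoff date.

    A symptom without "stop-date-time" is still active and is always kept.
    """
    return [
        s for s in symptoms
        if s.get("stop-date-time") is None
        or s["stop-date-time"] >= history_start
    ]


cutoff = datetime(2023, 6, 2, tzinfo=timezone.utc)
symptoms = [
    # Terminated before the cutoff: no longer reported.
    {"symptom-id": "cpu-overloaded",
     "start-date-time": datetime(2023, 5, 30, tzinfo=timezone.utc),
     "stop-date-time": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    # Terminated after the cutoff: still reported.
    {"symptom-id": "out-of-ram",
     "start-date-time": datetime(2023, 6, 1, tzinfo=timezone.utc),
     "stop-date-time": datetime(2023, 6, 3, tzinfo=timezone.utc)},
    # Still active: always reported.
    {"symptom-id": "out-of-tcam",
     "start-date-time": datetime(2023, 6, 2, 12, tzinfo=timezone.utc)},
]
reported = prune_symptoms(symptoms, cutoff)
```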
Note that "agent-id" and "symptom-id" are leafrefs pointing to the objects defined later in the document. While the combination of "symptom-id" and "agent-id" would suffice as a unique list key, "start-date-time" is added as the leading key to help sort and retrieve relevant symptoms.
The "dependency" list contains the dependencies for the current subservice. Each of them is specified by a leafref to both "type" and "id" of the target dependencies. A dependency has a type indicated in the "dependency-type" field. Two types are specified in the model:
- Impacting: Such a dependency indicates an impact on the health of the dependent.
- Informational: Such a dependency might explain why the dependent has issues but does not impact its health.
To illustrate the difference between "impacting" and "informational", consider the interface subservice representing a network interface. If the device to which the network interface belongs goes down, the network interface will transition to a "down" state as well. Therefore, the dependency of the interface subservice towards the device subservice is "impacting". On the other hand, a dependency towards the ecmp-load subservice, which checks that the load between ECMPs remains stable over time, is only "informational". Indeed, services might be perfectly healthy even if the load distribution between ECMPs changed. However, such an instability might be a relevant symptom for diagnosing the root cause of a problem.
Within the container "agents", the list "agent" contains the list of symptoms per agent. The key of the list is the "id", which MUST be unique among agents of a given assurance system. For each agent, the list "symptoms" maps an "id" to its "description". The "id" MUST be unique among the symptoms raised by the agent.
Within the container "assured-services", the list "assured-service" contains the subservices indexed by assured service instances. For each service type identified by the "service" leaf, all instances of that service are listed in the "instances" list. For each instance identified by the "name" leaf, the "subservices" list contains all descendant subservices that are part of the assurance graph for that specific instance. These nested lists are a query optimization: they allow a client to get the list of subservices in that assurance graph in a single query instead of recursively querying the dependencies of each subservice, starting from the node representing the service instance.
The relation between the health score ("health-score") and the "health-score-weight" of the currently active symptoms is not explicitly defined in this document. The only requirement is that a health score that is strictly smaller than 100 (the maximal value) must be explained by at least one symptom. A way to enforce that requirement is to first detect symptoms and then compute the health score based on the "health-score-weight" of the detected symptoms. As an example, such a computation could be to sum the "health-score-weight" of the active symptoms, subtract that value from 100, and change the value to 0 if the result is negative. The relation between health score and "health-score-weight" is left to the implementor (of an agent [RFC9417]).
Keeping the history of the graph structure is out of scope for this YANG module. Only the current version of the assurance graph can be fetched. In order to keep the history of the graph structure, some time-series database (TSDB) or similar storage must be used.
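The example computation given above (sum the weights of the active symptoms, subtract from 100, floor at 0) can be written as the following Python sketch. It is one possible choice, not a required algorithm. The dictionary representation of a symptom, the function name compute_health_score, and the decision to count a missing "health-score-weight" as 0 are assumptions of this example.

```python
def compute_health_score(symptoms):
    """Derive a health score from the weights of the active symptoms.

    A symptom is considered active when it has no "stop-date-time".
    A missing "health-score-weight" counts as 0 (assumption of this sketch).
    """
    total_weight = sum(
        s.get("health-score-weight", 0)
        for s in symptoms
        if s.get("stop-date-time") is None
    )
    # Subtract from the maximal value and never go below 0.
    return max(0, 100 - total_weight)
```

With this computation, a score below 100 can only result from at least one active symptom with a non-zero weight, so the requirement stated above holds by construction.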
3.3. YANG Module
This model contains references to [RFC6991].

<CODE BEGINS> file "ietf-service-assurance@2023-06-02.yang"

module ietf-service-assurance {
  yang-version 1.1;
  namespace "urn:ietf:params:xml:ns:yang:ietf-service-assurance";
  prefix sain;

  import ietf-yang-types {
    prefix yang;
    reference
      "RFC 6991: Common YANG Data Types";
  }

  organization
    "IETF OPSAWG Working Group";
  contact
    "WG Web:   <https://datatracker.ietf.org/wg/opsawg/>
     WG List:  <mailto:opsawg@ietf.org>
     Author:   Benoit Claise  <mailto:benoit.claise@huawei.com>
     Author:   Jean Quilbeuf  <mailto:jean.quilbeuf@huawei.com>";
  description
    "This module defines objects for assuring services based on their
     decomposition into so-called subservices, according to the
     Service Assurance for Intent-based Networking (SAIN)
     architecture.

     The subservices hierarchically organized by dependencies
     constitute an assurance graph. This module should be supported
     by an assurance agent that is able to interact with the devices
     in order to produce the health status and symptoms for each
     subservice in the assurance graph.

     This module is intended for the following use cases:
     * Assurance graph configuration:
       - Subservices: Configure a set of subservices to assure by
         specifying their types and parameters.
       - Dependencies: Configure the dependencies between the
         subservices, along with their type.
     * Assurance telemetry: Export the health statuses of the
       subservices, along with the observed symptoms.

     Copyright (c) 2023 IETF Trust and the persons identified as
     authors of the code. All rights reserved.

     Redistribution and use in source and binary forms, with or
     without modification, is permitted pursuant to, and subject
     to the license terms contained in, the Revised BSD License
     set forth in Section 4.c of the IETF Trust's Legal Provisions
     Relating to IETF Documents
     (https://trustee.ietf.org/license-info).

     This version of this YANG module is part of RFC 9418; see the
     RFC itself for full legal notices.";

  revision 2023-06-02 {
    description
      "Initial version.";
    reference
      "RFC 9418: YANG Modules for Service Assurance";
  }

  identity subservice-base {
    description
      "Base identity for subservice types.";
  }

  identity service-instance-type {
    base subservice-base;
    description
      "Specific type of subservice that represents a service
       instance. Instance of this type will depend on other
       subservices to build the top of the assurance graph.";
  }

  identity dependency-type {
    description
      "Base identity for representing dependency types.";
  }

  identity informational {
    base dependency-type;
    description
      "Indicates that symptoms of the dependency might be of interest
       for the dependent, but the status of the dependency should not
       have any impact on the dependent.";
  }

  identity impacting {
    base dependency-type;
    description
      "Indicates that the status of the dependency directly impacts
       the status of the dependent.";
  }

  grouping subservice-reference {
    description
      "Reference to a specific subservice identified by its type and
       identifier. This grouping is only for internal use in this
       module.";
    leaf type {
      type leafref {
        path "/subservices/subservice/type";
      }
      description
        "The type of the subservice to refer to (e.g., device).";
    }
    leaf id {
      type leafref {
        path "/subservices/subservice[type=current()/../type]/id";
      }
      description
        "The identifier of the subservice to refer to.";
    }
  }

  grouping subservice-dependency {
    description
      "Represents a dependency to another subservice. This grouping
       is only for internal use in this module";
    uses subservice-reference;
    leaf dependency-type {
      type identityref {
        base dependency-type;
      }
      description
        "Represents the type of dependency (e.g., informational or
         impacting).";
    }
  }

  leaf assurance-graph-last-change {
    type yang:date-and-time;
    config false;
    mandatory true;
    description
      "Time and date at which the assurance graph last changed after
       any structural changes (dependencies and/or maintenance
       windows parameters) are applied to the subservice(s). The
       time and date must be the same or more recent than the most
       recent value of any changed subservices last-change time and
       date.";
  }

  container subservices {
    description
      "Root container for the subservices.";
    list subservice {
      key "type id";
      description
        "List of configured subservices.";
      leaf type {
        type identityref {
          base subservice-base;
        }
        description
          "Type of the subservice identifying the type of the part
           or functionality that is being assured by this list
           entry, for instance, interface, device, or
           ip-connectivity.";
      }
      leaf id {
        type string;
        description
          "Identifier of the subservice instance. Must be unique
           among subservices of the same type.";
      }
      leaf last-change {
        type yang:date-and-time;
        config false;
        description
          "Date and time at which the structure for this
           subservice instance last changed, i.e., dependencies
           and/or maintenance windows parameters.";
      }
      leaf label {
        type string;
        config false;
        description
          "Label of the subservice, i.e., text describing what the
           subservice is to be displayed on a human interface.
           It is not intended for random end users but for
           network/system/software engineers that are able to
           interpret it. Therefore, no mechanism for language
           tagging is needed.";
      }
      container under-maintenance {
        presence "true";
        description
          "The presence of this container indicates that the current
           subservice is under maintenance.";
        leaf contact {
          type string;
          mandatory true;
          description
            "A string used to model an administratively assigned name
             of the resource that is performing maintenance.
             It is suggested that this freeform field, which could be
             a URI, contains one or more of the following: IP
             address, management station name, network manager's
             name, location, and/or phone number. It might even
             contain the expected maintenance time.
             In some cases, the agent itself will be the owner of an
             entry. In these cases, this string shall be set to a
             string starting with 'monitor'.";
        }
      }
      choice parameter {
        mandatory true;
        description
          "Specify the required parameters per subservice type. Each
           module augmenting this module with a new subservice type
           that is a new identity based on subservice-base should
           augment this choice as well by adding a container
           available only if the current subservice type is
           the newly added identity.";
        container service-instance-parameter {
          when "derived-from-or-self(../type, 'sain:service-instance-type')";
          description
            "Specify the parameters of a service instance.";
          leaf service {
            type string;
            mandatory true;
            description
              "Name of the service.";
          }
          leaf instance-name {
            type string;
            mandatory true;
            description
              "Name of the instance for that service.";
          }
        }
        // Other modules can augment their own cases into here.
      }
      leaf health-score {
        type int8 {
          range "-1 .. 100";
        }
        config false;
        mandatory true;
        description
          "Score value of the subservice health. A value of 100
           means that the subservice is healthy. A value of 0 means
           that the subservice is broken. A value between 0 and 100
           means that the subservice is degraded. The special value
           -1 means that the health score could not be computed.";
      }
      leaf symptoms-history-start {
        type yang:date-and-time;
        config false;
        description
          "Date and time at which the symptom's history starts for
           this subservice instance, either because the subservice
           instance started at that date and time or because the
           symptoms before that were removed due to a garbage
           collection process.";
      }
      container symptoms {
        config false;
        description
          "Symptoms for the subservice.";
        list symptom {
          key "start-date-time agent-id symptom-id";
          unique "agent-id symptom-id";
          description
            "List of symptoms of the subservice. While the
             start-date-time key is not necessary per se, this would
             get the entries sorted by start-date-time for easy
             consumption.";
          leaf symptom-id {
            type leafref {
              path "/agents/agent[id=current()/../agent-id]"
                 + "/symptoms/id";
            }
            description
              "Identifier of the symptom to be interpreted according
               to the agent identified by the agent-id.";
          }
          leaf agent-id {
            type leafref {
              path "/agents/agent/id";
            }
            description
              "Identifier of the agent raising the current symptom.";
          }
          leaf health-score-weight {
            type uint8 {
              range "0 .. 100";
            }
            description
              "The weight to the health score incurred by this
               symptom. The higher the value, the more of an impact
               this symptom has. If a subservice health score is not
               100, there must be at least one symptom with a
               health-score-weight larger than 0.";
          }
          leaf start-date-time {
            type yang:date-and-time;
            description
              "Date and time at which the symptom was detected.";
          }
          leaf stop-date-time {
            type yang:date-and-time;
            description
              "Date and time at which the symptom stopped being
               detected. Must be after the start-date-time. If the
               symptom is ongoing, this field should not be
               populated.";
          }
        }
      }
      container dependencies {
        description
          "Indicates the set of dependencies of the current
           subservice, along with their types.";
        list dependency {
          key "type id";
          description
            "List of dependencies of the subservice.";
          uses subservice-dependency;
        }
      }
    }
  }

  container agents {
    config false;
    description
      "Container for the list of agents's symptoms.";
    list agent {
      key "id";
      description
        "Contains symptoms of each agent involved in computing the
         health status of the current graph. This list acts as a
         glossary for understanding the symptom ids returned by each
         agent.";
      leaf id {
        type string;
        description
          "Id of the agent for which we are defining the symptoms.
           This identifier must be unique among all agents.";
      }
      list symptoms {
        key "id";
        description
          "List of symptoms raised by the current agent that
           is identified by the symptom-id.";
        leaf id {
          type string;
          description
            "Id of the symptom for the current agent. The agent must
             guarantee the unicity of this identifier.";
        }
        leaf description {
          type string;
          mandatory true;
          description
            "Description of the symptom, i.e., text describing what
             the symptom is, is to be computer consumable and
             displayed on a human interface.
             It is not intended for random end users but for
             network/system/software engineers that are able to
             interpret it. Therefore, no mechanism for language
             tagging is needed.";
        }
      }
    }
  }

  container assured-services {
    config false;
    description
      "Container for the index of assured services.";
    list assured-service {
      key "service";
      description
        "Service instances that are currently part of the assurance
         graph. The list must contain an entry for every service
         that is currently present in the assurance graph. This list
         presents an alternate access to the graph stored in
         subservices that optimizes querying the assurance graph of
         a specific service instance.";
      leaf service {
        type leafref {
          path "/subservices/subservice/service-instance-parameter/"
             + "service";
        }
        description
          "Name of the service.";
      }
      list instances {
        key "name";
        description
          "Instances of the service. The list must contain an entry
           for every instance of the parent service.";
        leaf name {
          type leafref {
            path "/subservices/subservice/service-instance-parameter"
               + "/instance-name";
          }
          description
            "Name of the service instance. The leafref must point to
             a service-instance-parameter whose service leaf matches
             the parent service.";
        }
        list subservices {
          key "type id";
          description
            "Subservices that appear in the assurance graph of the
             current service instance.
             The list must contain the subservice corresponding to
             the service instance, i.e., the subservice that
             matches the service and instance-name keys.
             For every subservice in the list, all subservices listed
             as dependencies must also appear in the list.";
          uses subservice-reference;
        }
      }
    }
  }
}

<CODE ENDS>
3.4. Rejecting Circular Dependencies
The statuses of services and subservices depend on the statuses of their dependencies, and thus circular dependencies between them prevent the computation of statuses. Section 3.1.1 of the SAIN architecture document [RFC9417] discusses how such dependencies appear and how they could be removed. The responsibility of avoiding such dependencies falls to the SAIN orchestrator. However, we specify in this section the expected behavior when a server supporting the "ietf-service-assurance" module receives a data instance containing circular dependencies.
Enforcing the absence of circular dependencies as a YANG constraint amounts to implementing a graph traversal algorithm with XPath and checking that the current node is not reachable from its dependencies. Even with such a constraint, there is no guarantee that merging two graphs without dependency loops will result in a graph without dependency loops. Indeed, Section 3.1.1 of [RFC9417] presents an example where merging two graphs without dependency loops results in a graph with a dependency loop.
Therefore, a server implementing the "ietf-service-assurance" module MUST check that there is no dependency loop whenever the graph is modified. A modification creating a dependency loop MUST be rejected.
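One possible way for a server to perform this check is a depth-first search over the candidate graph before committing a change, as in the Python sketch below. This is an illustration under stated assumptions, not a mandated algorithm: the dictionary representation of the graph, the function names has_dependency_loop and apply_modification, and the use of ValueError to signal rejection are all hypothetical.

```python
def has_dependency_loop(dependencies):
    """Return True if the dependency graph contains a cycle.

    `dependencies` maps each subservice to the list of subservices it
    depends on. Recursive depth-first search; a very deep graph would
    need an iterative variant.
    """
    in_progress, done = 1, 2
    state = {}

    def visit(node):
        state[node] = in_progress
        for dep in dependencies.get(node, ()):
            if state.get(dep) == in_progress:
                return True  # back edge: `dep` is reachable from itself
            if dep not in state and visit(dep):
                return True
        state[node] = done
        return False

    return any(node not in state and visit(node)
               for node in list(dependencies))


def apply_modification(dependencies, subservice, new_dependencies):
    """Return an updated graph, or raise ValueError if a loop would appear."""
    candidate = dict(dependencies)
    candidate[subservice] = list(new_dependencies)
    if has_dependency_loop(candidate):
        raise ValueError("modification rejected: dependency loop")
    return candidate
```

The check is run on a copy of the graph, so a rejected modification leaves the current graph untouched.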