rfc9232xml2.original.xml | rfc9232.xml | |||
---|---|---|---|---|
<?xml version="1.0" encoding="US-ASCII"?> | <?xml version="1.0" encoding="utf-8"?> | |||
<!-- This template is for creating an Internet Draft using xml2rfc, | ||||
which is available here: http://xml.resource.org. --> | ||||
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"> | ||||
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> | ||||
<!-- used by XSLT processors --> | ||||
<!-- For a complete list and description of processing instructions (PIs), | ||||
please see http://xml.resource.org/authoring/README.html. --> | ||||
<?rfc strict="yes" ?> | ||||
<!-- give errors regarding ID-nits and DTD validation --> | ||||
<!-- control the table of contents (ToC) --> | ||||
<?rfc toc="yes"?> | ||||
<!-- generate a ToC --> | ||||
<?rfc tocdepth="3"?> | ||||
<!-- the number of levels of subsections in ToC. default: 3 --> | ||||
<!-- control references --> | ||||
<?rfc symrefs="yes"?> | ||||
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] --> | ||||
<?rfc sortrefs="yes" ?> | ||||
<!-- sort the reference entries alphabetically --> | ||||
<!-- control vertical white space | ||||
(using these PIs as follows is recommended by the RFC Editor) --> | ||||
<?rfc compact="yes" ?> | ||||
<!-- do not start each main section on a new page --> | ||||
<?rfc subcompact="no" ?> | ||||
<!-- keep one blank line between list items --> | ||||
<!-- end of list of popular I-D processing instructions --> | ||||
<rfc category="info" docName="draft-ietf-opsawg-ntf-13" ipr="trust200902"> | ||||
<front> | ||||
<title abbrev="Network Telemetry Framework">Network Telemetry Framework</title> | ||||
<author fullname="Haoyu Song" initials="H." surname="Song"> | ||||
<organization>Futurewei</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>USA</country> | ||||
</postal> | ||||
<email>haoyu.song@futurewei.com</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Fengwei Qin" initials="F." surname="Qin"> | ||||
<organization>China Mobile</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>P.R. China</country> | ||||
</postal> | ||||
<email>qinfengwei@chinamobile.com</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Pedro Martinez-Julia" initials="P." surname="Martinez-Julia"> | ||||
<organization>NICT</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>Japan</country> | ||||
</postal> | ||||
<email>pedro@nict.go.jp</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Laurent Ciavaglia" initials="L." surname="Ciavaglia"> | ||||
<organization>Rakuten Mobile</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>France</country> | ||||
</postal> | ||||
<email>laurent.ciavaglia@rakuten.com</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Aijun Wang" initials="A." surname="Wang"> | ||||
<organization>China Telecom</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>P.R. China</country> | ||||
</postal> | ||||
<email>wangaj.bri@chinatelecom.cn</email> | ||||
</address> | ||||
</author> | ||||
<date day="3" month="December" year="2021"/> | ||||
<area>Operation and Management Area</area> | ||||
<workgroup>OPSAWG</workgroup> | ||||
<!-- --> | ||||
<keyword>Telemetry, OAM</keyword> | ||||
<abstract> | ||||
<t>Network telemetry is a technology for gaining network insight and facilitatin | ||||
g efficient and automated network management. It encompasses various techniques | ||||
for remote data generation, collection, correlation, and consumption. This docum | ||||
ent describes an architectural framework for network telemetry, motivated by cha | ||||
llenges that are encountered as part of the operation of networks and by the req | ||||
uirements that ensue. This document clarifies the terminologies and classifies t | ||||
he modules and components of a network telemetry system from different perspecti | ||||
ves. The framework and taxonomy help to set a common ground for the collection o | ||||
f related work and provide guidance for related technique and standard developme | ||||
nts.</t> | ||||
</abstract> | <!DOCTYPE rfc [ | |||
</front> | <!ENTITY nbsp " "> | |||
<middle> | <!ENTITY zwsp "​"> | |||
<section title="Introduction"> | <!ENTITY nbhy "‑"> | |||
<!ENTITY wj "⁠"> | ||||
]> | ||||
<t> Network visibility is the ability of management tools to see the state and b | <rfc xmlns:xi="http://www.w3.org/2001/XInclude" docName="draft-ietf-opsawg-ntf-1 | |||
ehavior of a network, which is essential for successful network operation. Netwo | 3" number="9232" ipr="trust200902" obsoletes="" updates="" submissionType="IETF" | |||
rk Telemetry revolves around network data that can help provide insights about t | category="info" consensus="true" xml:lang="en" tocInclude="true" tocDepth="3" s | |||
he current state of the network, including network devices, forwarding, control, | ymRefs="true" sortRefs="true" version="3"> | |||
and management planes, and that can be generated and obtained through a variety | ||||
of techniques, including but not limited to network instrumentation and measure | ||||
ments, and that can be processed for purposes ranging from service assurance to | ||||
network security using a wide variety of data analytical techniques. In this doc | ||||
ument, Network Telemetry refers to both the data itself (i.e., "Network Telemetry | ||||
Data"), and the techniques and processes used to generate, export, collect, and | ||||
consume that data for use by potentially automated management applications. Net | ||||
work telemetry extends beyond the classical network Operations, Administration, | ||||
and Management (OAM) techniques and expects to support better flexibility, scala | ||||
bility, accuracy, coverage, and performance.</t> | ||||
<t> However, the term "network telemetry" lacks an unambiguous definition. The s | ||||
cope and coverage of it cause confusion and misunderstandings. It is beneficial | ||||
to clarify the concept and provide a clear architectural framework for network t | ||||
elemetry, so we can articulate the technical field, and better align the related | ||||
techniques and standard works.</t> | ||||
<t>To fulfill such an undertaking, we first discuss some key characteristics of | ||||
network telemetry which set a clear distinction from the conventional network OA | ||||
M and show that some conventional OAM technologies can be considered a subset of | ||||
the network telemetry technologies. We then provide an architectural framework | ||||
for network telemetry which includes four modules, each concerned with a differe | ||||
nt category of telemetry data and corresponding procedures. All the modules are | ||||
internally structured in the same way, including components that allow the opera | ||||
tor to configure data sources in regard to what data to generate and how to make | ||||
that available to client applications, components that instrument the underlyin | ||||
g data sources, and components that perform the actual rendering, encoding, and | ||||
exporting of the generated data. We show how the network telemetry framework can | ||||
benefit the current and future network operations. Based on the distinction of | ||||
modules and function components, we can map the existing and emerging techniques | ||||
and protocols into the framework. The framework can also simplify designing, ma | ||||
intaining, and understanding a network telemetry system. In addition, we outline | ||||
the evolution stages of the network telemetry system and discuss the potential | ||||
security concerns. </t> | ||||
<t> The purpose of the framework and taxonomy is to set a common ground for the | ||||
collection of related work and provide guidance for future technique and standar | ||||
d developments. To the best of our knowledge, this document is the first such ef | ||||
fort for network telemetry in industry standards organizations. This document do | ||||
es not define specific technologies.</t> | ||||
<!-- | ||||
<section title="Requirements Language"> | ||||
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | ||||
"OPTIONAL" in this document are to be interpreted as described in | ||||
BCP 14 <xref target="RFC2119"></xref><xref target="RFC8174"></xref> w | ||||
hen, and only when, they appear in all | ||||
capitals, as shown here.</t> | ||||
</section> | ||||
--> | ||||
<section title="Applicability Statement"> | <!-- xml2rfc v2v3 conversion 3.12.2 --> | |||
<front> | ||||
<title abbrev="Network Telemetry Framework">Network Telemetry Framework</tit | ||||
le> | ||||
<seriesInfo name="RFC" value="9232"/> | ||||
<author fullname="Haoyu Song" initials="H." surname="Song"> | ||||
<organization>Futurewei</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>United States of America</country> | ||||
</postal> | ||||
<email>haoyu.song@futurewei.com</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Fengwei Qin" initials="F." surname="Qin"> | ||||
<organization>China Mobile</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>China</country> | ||||
</postal> | ||||
<email>qinfengwei@chinamobile.com</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Pedro Martinez-Julia" initials="P." surname="Martinez-Juli | ||||
a"> | ||||
<organization>NICT</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>Japan</country> | ||||
</postal> | ||||
<email>pedro@nict.go.jp</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Laurent Ciavaglia" initials="L." surname="Ciavaglia"> | ||||
<organization>Rakuten Mobile</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>France</country> | ||||
</postal> | ||||
<email>laurent.ciavaglia@rakuten.com</email> | ||||
</address> | ||||
</author> | ||||
<author fullname="Aijun Wang" initials="A." surname="Wang"> | ||||
<organization>China Telecom</organization> | ||||
<address> | ||||
<postal> | ||||
<street/> | ||||
<city/> | ||||
<country>China</country> | ||||
</postal> | ||||
<email>wangaj3@chinatelecom.cn</email> | ||||
</address> | ||||
</author> | ||||
<date year="2022" month="May" /> | ||||
<t>Large-scale network data collection is a major threat to user privacy and may | <area>Operations and Management Area</area> | |||
be indistinguishable from pervasive monitoring <xref target="RFC7258" />. The | <workgroup>OPSAWG</workgroup> | |||
network telemetry framework presented in this document must not be applied to ge | ||||
nerating, exporting, collecting, analyzing, or retaining individual user data or | ||||
any data that can identify end users or characterize their behavior without con | ||||
sent. Based on this principle, the network telemetry framework is not applicable | ||||
to networks whose endpoints represent individual users, such as general-purpose | ||||
access networks. </t> | ||||
</section> | <keyword>Telemetry</keyword> | |||
<keyword>OAM</keyword> | ||||
<section title="Glossary"> | <abstract> | |||
<t>Before further discussion, we list some key terminology and acronyms used in | <t>Network telemetry is a technology for gaining network insight and facil | |||
this document. We make an intended differentiation between the terms of network | itating efficient and automated network management. It encompasses various techn | |||
telemetry and OAM. However, it should be understood that there is not a hard-lin | iques for remote data generation, collection, correlation, and consumption. This | |||
e distinction between the two concepts. Rather, network telemetry is considered | document describes an architectural framework for network telemetry, motivated | |||
as an extension of OAM. It covers all the existing OAM protocols but puts more e | by challenges that are encountered as part of the operation of networks and by t | |||
mphasis on the newer and emerging techniques and protocols concerning all aspect | he requirements that ensue. | |||
s of network data from acquisition to consumption.</t> | This document clarifies the terminology and classifies the modules and com | |||
<t> | ponents of a network telemetry system from different perspectives. The framework | |||
<list style="hanging"> | and taxonomy help to set a common ground for the collection of related work and | |||
<t hangText="AI:"> Artificial Intelligence. In the network domain, AI refers to | provide guidance for related technique and standard developments.</t> | |||
the machine-learning based technologies for automated network operation and othe | </abstract> | |||
r tasks.</t> | </front> | |||
<t hangText="AM:"> Alternate Marking, a flow performance measurement method, spe | <middle> | |||
cified in <xref target="RFC8321"/>. </t> | <section numbered="true" toc="default"> | |||
<t hangText="BMP:"> BGP Monitoring Protocol, specified in <xref target="RFC7854" | <name>Introduction</name> | |||
/>. </t> | <t> Network visibility is the ability of management tools to see the state | |||
<t hangText="DPI:"> Deep Packet Inspection, referring to the techniques that exa | and behavior of a network, which is essential for successful network operation. | |||
mine packets beyond the packet L3/L4 headers. </t> | Network telemetry revolves around network data that 1) can help provide insight | |||
<t hangText="gNMI:"> gRPC Network Management Interface, a network management pro | s about the current state of the network, including network devices, forwarding, | |||
tocol from OpenConfig Operator Working Group, mainly contributed by Google. See | control, and management planes; 2) can be generated and obtained through a vari | |||
<xref target="gnmi"/> for details. </t> | ety of techniques, including but not limited to network instrumentation and meas | |||
<t hangText="GPB:"> Google Protocol Buffer, an extensible mechanism for serializ | urements; and 3) can be processed for purposes ranging from service assurance to | |||
ing structured data. See <xref target="gpb" /> for details. </t> | network security using a wide variety of data analytical techniques. In this do | |||
<t hangText="gRPC:"> gRPC Remote Procedure Call, an open source high performance | cument, network telemetry refers to both the data itself (i.e., "Network Telemet | |||
RPC framework that gNMI is based on. See <xref target="grpc"/> for details. </t | ry Data") and the techniques and processes used to generate, export, collect, an | |||
> | d consume that data for use by potentially automated management applications. Ne | |||
<t hangText="IPFIX:"> IP Flow Information Export Protocol, specified in <xref ta | twork telemetry extends beyond the classical network Operations, Administration, | |||
rget="RFC7011"/>. </t> | and Management (OAM) techniques and expects to support better flexibility, scal | |||
<t hangText="IOAM:"> <xref target="I-D.ietf-ippm-ioam-data">In-situ OAM</xref>, | ability, accuracy, coverage, and performance.</t> | |||
a dataplane on-path telemetry technique. </t> | <t> However, the term "network telemetry" lacks an unambiguous definition. | |||
<t hangText="JSON:"> An open standard file format and data interchange format th | The scope and coverage of it cause confusion and misunderstandings. It is benef | |||
at uses human-readable text to store and transmit data objects, specified in <xr | icial to clarify the concept and provide a clear architectural framework for net | |||
ef target="RFC8259" />. </t> | work telemetry, so we can articulate the technical field and better align the re | |||
<t hangText="MIB:"> Management Information Base, a database used for managing th | lated techniques and standard works.</t> | |||
e entities in a network. </t> | <t>To fulfill such an undertaking, we first discuss some key characteristi | |||
<t hangText="NETCONF:"> Network Configuration Protocol, specified in <xref targe | cs of network telemetry that set a clear distinction from the conventional netwo | |||
t="RFC6241"/>. </t> | rk OAM and show that some conventional OAM technologies can be considered a subs | |||
<t hangText="NetFlow:"> A Cisco protocol for flow record collecting, described i | et of the network telemetry technologies. We then provide an architectural frame | |||
n <xref target="RFC3954"/>. </t> | work for network telemetry that includes four modules, each associated with a di | |||
<t hangText="Network Telemetry:"> The process and instrumentation for acquiring | fferent category of telemetry data and corresponding procedures. All the modules | |||
and utilizing network data remotely for network monitoring and operation. A gene | are internally structured in the same way, including components that allow the | |||
ral term for a large set of network visibility techniques and protocols, concern | operator to configure data sources in regard to what data to generate and how to | |||
ing aspects like data generation, collection, correlation, and consumption. Netw | make that available to client applications, components that instrument the unde | |||
ork telemetry addresses the current network operation issues and enables smooth | rlying data sources, and components that perform the actual rendering, encoding, | |||
evolution toward future intent-driven autonomous networks.</t> | and exporting of the generated data. We show how the network telemetry framewor | |||
<t hangText="NMS:"> Network Management System, referring to applications that al | k can benefit current and future network operations. Based on the distinction of | |||
low network administrators to manage a network. </t> | modules and function components, we can map the existing and emerging technique | |||
<t hangText="OAM:"> Operations, Administration, and Maintenance. A group of netw | s and protocols into the framework. The framework can also simplify designing, m | |||
ork management functions that provide network fault indication, fault localizati | aintaining, and understanding a network telemetry system. In addition, we outlin | |||
on, performance information, and data and diagnosis functions. Most conventional | e the evolution stages of the network telemetry system and discuss the potential | |||
network monitoring techniques and protocols belong to network OAM.</t> | security concerns. </t> | |||
<t hangText="PBT:"> Postcard-Based Telemetry, a dataplane on-path telemetry tech | ||||
nique. A representative technique is described in <xref target="I-D.ietf-ippm-io | ||||
am-direct-export"/>. </t> | ||||
<t hangText="RESTCONF:"> An HTTP-based protocol that provides a programmatic int | ||||
erface for accessing data defined in YANG, using the datastore concepts defined | ||||
in NETCONF, as specified in <xref target="RFC8040"/>. </t> | ||||
<t hangText="SMIv2:"> Structure of Management Information Version 2, defining MI | ||||
B objects, specified in <xref target="RFC2578"/>. </t> | ||||
<t hangText="SNMP:"> Simple Network Management Protocol. Versions 1, 2, and 3 are | ||||
specified in <xref target="RFC1157"/>, <xref target="RFC3416"/>, and <xref targ | ||||
et="RFC3411"/>, respectively. </t> | ||||
<t hangText="XML:"> Extensible Markup Language is a markup language for data enc | ||||
oding that is both human-readable and machine-readable, specified by W3C <xref t | ||||
arget="xml" />. </t> | ||||
<t hangText="YANG:"> YANG is a data modeling language for the definition of data | ||||
 sent over network management protocols such as NETCONF and RESTCONF. YANG i | ||||
s defined in <xref target="RFC6020"/> and <xref target="RFC7950"/>. </t> | ||||
<t hangText="YANG ECA:"> A YANG model for Event-Condition-Action policies, defin | ||||
ed in <xref target="I-D.wwx-netmod-event-yang"/>. </t> | ||||
<t hangText="YANG-Push:"> A mechanism that allows subscriber applications to req | ||||
uest a stream of updates from a YANG datastore on a network device. Details are | ||||
specified in <xref target="RFC8641"/> and <xref target="RFC8639"/>. </t> | ||||
</list> | ||||
</t> | ||||
</section> | ||||
</section> | ||||
<section title="Background"> | ||||
<t>The term "big data" is used to describe the extremely large volume of data se | ||||
ts that can be analyzed computationally to reveal patterns, trends, and associat | ||||
ions. Networks are undoubtedly a source of big data because of their scale and t | ||||
he volume of network traffic they forward. When a network's endpoints do not rep | ||||
resent individual users (e.g., in industrial, datacenter, and infrastructure cont | ||||
exts), network operations can often benefit from large-scale data collection wit | ||||
hout breaching user privacy.</t> | ||||
<t>Today one can access advanced big data analytics capability through a plethor | ||||
a of commercial and open source platforms (e.g., Apache Hadoop), tools (e.g., Ap | ||||
ache Spark), and techniques (e.g., machine learning). Thanks to the advance of c | ||||
omputing and storage technologies, network big data analytics gives network oper | ||||
ators an opportunity to gain network insights and move towards network autonomy. | ||||
Some operators start to explore the application of Artificial Intelligence (AI) | ||||
to make sense of network data. Software tools can use the network data to detec | ||||
t and react to network faults, anomalies, and policy violations, as well as pred | ||||
ict future events. In turn, the network policy updates for planning, intrusio | ||||
n prevention, optimization, and self-healing may be applied.</t> | ||||
<t>It is conceivable that an <xref target="RFC7575"> autonomic network </xref> i | ||||
s the logical next step for network evolution following Software Defined Network | ||||
ing (SDN), aiming to reduce (or even eliminate) human labor, make more efficient | ||||
use of network resources, and provide better services more aligned with custome | ||||
r requirements. The IETF ANIMA working group is dedicated to developing and main | ||||
taining protocols and procedures for automated network management and control of | ||||
professionally-managed networks. The related technique of <xref target="I-D.irt | ||||
f-nmrg-ibn-concepts-definitions">Intent-based Networking (IBN)</xref> requires n | ||||
etwork visibility and telemetry data in order to ensure that the network is beha | ||||
ving as intended. </t> | ||||
<t>However, while the data processing capability is improved and applications re | ||||
quire more data to function better, the networks lag behind in extracting and tr | ||||
anslating network data into useful and actionable information in efficient ways. | ||||
The system bottleneck is shifting from data consumption to data supply. Both th | ||||
e number of network nodes and the traffic bandwidth keep increasing at a fast pa | ||||
ce. Network configurations and policies change at shorter intervals than befor | ||||
e. More subtle events and fine-grained data through all network planes need to b | ||||
e captured and exported in real time. In a nutshell, it is a challenge to get en | ||||
ough high-quality data out of the network in a manner that is efficient, timely, | ||||
and flexible. Therefore, we need to survey the existing technologies and protoc | ||||
ols and identify any potential gaps.</t> | ||||
<t>In the remainder of this section, first we clarify the scope of network data | ||||
(i.e., telemetry data) relevant in this document. Then, we discuss several key u | ||||
se cases for today's and future network operations. Next, we show why the curren | ||||
t network OAM techniques and protocols are insufficient for these use cases. The | ||||
discussion underlines the need for new methods, techniques, and protocols, as w | ||||
ell as the extensions of existing ones, which we group under the umbrella term | ||||
"network telemetry". </t> | ||||
<section title="Telemetry Data Coverage"> | ||||
<t>Any information that can be extracted from networks (including data plane, co | ||||
ntrol plane, and management plane) and used to gain visibility or as a basis for a | ||||
ctions is considered telemetry data. It includes statistics, event records and l | ||||
ogs, snapshots of state, configuration data, etc. It also covers the outputs of | ||||
any active and passive measurements <xref target="RFC7799"/>. In some cases, raw | ||||
 data is processed in the network before being sent to a data consumer. Such process | ||||
ed data is also considered telemetry data. The value of telemetry data varies. I | ||||
n some cases, if the cost is acceptable, less but higher-quality data is prefer | ||||
red over lots of low-quality data. A classification of telemetry data is provide | ||||
d in <xref target="framework"/>. To preserve the privacy of end-users, no user p | ||||
acket content should be collected. Specifically, the data objects generated, ex | ||||
ported, and collected by a network telemetry application should not include any | ||||
packet payload from traffic associated with end-user systems. </t> | ||||
</section> | ||||
<section title="Use Cases"> | ||||
<t>The following set of use cases is essential for network operations. While the | ||||
list is by no means exhaustive, it is enough to highlight the requirements for | ||||
data velocity, variety, volume, and veracity, the attributes of big data, in net | ||||
works. </t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t> Security: Network intrusion detection and prevention systems need to monitor | ||||
network traffic and activities and act upon anomalies. Given increasingly sophi | ||||
sticated attack vectors coupled with increasingly severe consequences of securit | ||||
y breaches, new tools and techniques need to be developed, relying on wider and | ||||
deeper visibility into networks. The ultimate goal is to achieve security with n | ||||
o, or only minimal, human intervention, and without disrupting legitimate traffi | ||||
c flows. </t> | ||||
<t> Policy and Intent Compliance: Network policies are the rules that constrain | ||||
the services for network access, provide service differentiation, or enforce spe | ||||
cific treatment on the traffic. For example, a service function chain is a polic | ||||
y that requires the selected flows to pass through a set of ordered network func | ||||
tions. Intent, as defined in <xref target="I-D.irtf-nmrg-ibn-concepts-definition | ||||
s"/>, is a set of operational goals that a network should meet and outcomes that | ||||
a network is supposed to deliver, defined in a declarative manner without speci | ||||
fying how to achieve or implement them. An intent requires a complex translation | ||||
and mapping process before being applied on networks. While a policy or intent | ||||
is enforced, the compliance needs to be verified and monitored continuously by r | ||||
elying on visibility that is provided through network telemetry data. Any viola | ||||
tion must be reported immediately, potentially resulting in updates to how the p | ||||
olicy or intent is applied in the network to ensure that it remains in force, or | ||||
otherwise alerting the network administrator to the policy or intent violation. | ||||
</t> | ||||
<t> SLA Compliance: A Service-Level Agreement (SLA) is a service contract betwee | ||||
n a service provider and a client, which includes the metrics for the service mea | ||||
surement and remedy/penalty procedures when the service level misses the agreeme | ||||
nt. Users need to check if they get the service as promised and network operator | ||||
s need to evaluate how they can deliver services that can meet the SLA based on | ||||
real-time network telemetry data, including data from network measurements.</t> | ||||
<t> Root Cause Analysis: Many network failures can be the effect of a sequence of | ||||
chained events. Troubleshooting and recovery require quick identification of th | ||||
e root cause of any observable issues. However, the root cause is not always str | ||||
aightforward to identify, especially when the failure is sporadic and the number | ||||
of event messages, both related and unrelated to the same cause, is overwhelmin | ||||
g. While technologies such as machine learning can be used for root cause analys | ||||
is, it is up to the network to sense and provide the relevant diagnostic data wh | ||||
ich are either actively fed into, or passively retrieved by, the root cause anal | ||||
ysis applications.</t> | ||||
<t> Network Optimization: This covers all short-term and long-term network optim | ||||
ization techniques, including load balancing, Traffic Engineering (TE), and netw | ||||
ork planning. Network operators are motivated to optimize their network utilizat | ||||
ion and differentiate services for better Return On Investment (ROI) or lower Ca | ||||
pital Expenditures (CAPEX). The first step is to know the real-time network cond | ||||
itions before applying policies for traffic manipulation. In some cases, micro-b | ||||
ursts need to be detected in a very short time-frame so that fine-grained traffi | ||||
c control can be applied to avoid network congestion. Long-term planning of netw | ||||
ork capacity and topology requires analysis of real-world network telemetry data | ||||
that is obtained over long periods of time.</t> | ||||
<t> Event Tracking and Prediction: The visibility into traffic path and performa | ||||
nce is critical for services and applications that rely on healthy network opera | ||||
tion. Numerous related network events are of interest to network operators. For | ||||
example, network operators want to learn where and why packets are dropped for a | ||||
n application flow. They also want to be warned of issues in advance, so proacti | ||||
ve actions can be taken to avoid catastrophic consequences. </t> | ||||
</list> | ||||
</t> | ||||
</section> | ||||
<section title="Challenges"> | ||||
<t>For a long time, network operators have relied upon <xref target="RFC3416">SN | ||||
MP</xref>, Command-Line Interface (CLI), or <xref target="RFC5424">Syslog</xref> | ||||
to monitor the network. Some other OAM techniques as described in <xref target= | ||||
"RFC7276"/> are also used to facilitate network troubleshooting. These conventio | ||||
nal techniques are not sufficient to support the above use cases for the followi | ||||
ng reasons: </t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t>Most use cases need to continuously monitor the network and dynamically refin | ||||
e the data collection in real-time. Poll-based low-frequency data collection is | ||||
ill-suited for these applications. Subscription-based streaming data directly pu | ||||
shed from the data source (e.g., the forwarding chip) is preferred to provide su | ||||
fficient data quantity and precision at scale.</t> | ||||
<t>Comprehensive data is needed, ranging from packet processing engines to traff | ||||
ic manager, from line cards to main control board, from user flows to control pr | ||||
otocol packets, from device configurations to operations, and from physical laye | ||||
r to application layer. Conventional OAM only covers a narrow range of data (e.g | ||||
., SNMP only handles data from the Management Information Base (MIB)). Classical | ||||
network devices cannot provide all the necessary probes. More open and programm | ||||
able network devices are therefore needed.</t> | ||||
<t>Many application scenarios need to correlate network-wide data from multiple | ||||
sources (i.e., from distributed network devices, different components of a netwo | ||||
rk device, or different network planes). A piecemeal solution often lacks t | ||||
he capability to consolidate the data from multiple sources. The composition of | ||||
a complete solution, as partly proposed by <xref target="I-D.pedro-nmrg-anticipa | ||||
ted-adaptation">Autonomic Resource Control Architecture (ARCA)</xref>, will be em | ||||
powered and guided by a comprehensive framework. </t> | ||||
<t>Some conventional OAM techniques (e.g., CLI and Syslog) lack a formal data mo | ||||
del. The unstructured data hinder the tool automation and application extensibil | ||||
ity. Standardized data models are essential to support the programmable networks | ||||
. </t> | ||||
<t>Although some conventional OAM techniques support data push (e.g., <xref targ | ||||
et="RFC2981">SNMP Trap</xref><xref target="RFC3877"/>, Syslog, and <xref target= | ||||
"RFC3176">sFlow</xref>), the pushed data are limited to only predefined manageme | ||||
nt plane warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). Netwo | ||||
rk operators require data with arbitrary source, granularity, and precision, | ||||
which is beyond the capability of the existing techniques. </t> | ||||
<t>The conventional passive measurement techniques can either consume excessive | ||||
network resources and produce excessive redundant data, or lead to inaccurate re | ||||
sults; on the other hand, the conventional active measurement techniques can int | ||||
erfere with the user traffic and their results are indirect. Techniques that can | ||||
collect direct and on-demand data from user traffic are more favorable.</t> | ||||
</list> | ||||
</t> | ||||
<t>These challenges were addressed by newer standards and techniques (e.g., IPFI | ||||
X/NetFlow, Packet Sampling (PSAMP), IOAM, and YANG-Push) and more are emerging. | ||||
These standards and techniques need to be recognized and accommodated in a new f | ||||
ramework.</t> | ||||
</section> | ||||
<section title="Network Telemetry"> | <t> The purpose of the framework and taxonomy is to set a common ground fo | |||
<t>Network telemetry has emerged as a mainstream technical term to refer to the | r the collection of related work and provide guidance for future technique and s | |||
network data collection and consumption techniques. Several network telemetry te | tandard developments. To the best of our knowledge, this document is the first s | |||
chniques and protocols (e.g., <xref target="RFC7011">IPFIX</xref> and <xref targ | uch effort for network telemetry in industry standards organizations. This docum | |||
et="grpc">gRPC</xref>) have been widely deployed. Network telemetry allows separ | ent does not define specific technologies.</t> | |||
ate entities to acquire data from network devices so that data can be visualized | ||||
and analyzed to support network monitoring and operation. Network telemetry cov | ||||
ers the conventional network OAM and has a wider scope. For instance, it is expe | ||||
cted that network telemetry can provide the necessary network insight for autono | ||||
mous networks and address the shortcomings of conventional OAM techniques. </t> | ||||
<t>Network telemetry usually assumes machines as data consumers rather than huma | ||||
n operators. Hence, the network telemetry can directly trigger the automated net | ||||
work operation, while in contrast some conventional OAM tools were designed and | ||||
used to help human operators to monitor and diagnose the networks and guide manu | ||||
al network operations. Such a proposition leads to very different techniques. </ | ||||
t> | ||||
<t>Although new network telemetry techniques are emerging and subject to continu | ||||
ous evolution, several characteristics of network telemetry have been well accep | ||||
ted. Note that network telemetry is intended to be an umbrella term covering a w | ||||
ide spectrum of techniques, so the following characteristics are not expected to | ||||
be held by every specific technique.</t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t>Push and Streaming: Instead of polling data from network devices, telemetry c | ||||
ollectors subscribe to streaming data pushed from data sources in network device | ||||
s.</t> | ||||
<t>Volume and Velocity: The telemetry data is intended to be consumed by machine | ||||
s rather than by human beings. Therefore, the data volume can be huge and the pro | ||||
cessing is optimized for the needs of automation in real time.</t> | ||||
<t>Normalization and Unification: Telemetry aims to address the overall network | ||||
automation needs. Efforts are made to normalize the data representation and unif | ||||
y the protocols, so as to simplify data analysis and provide integrated analysis | ||||
across heterogeneous devices and data sources across a network.</t> | ||||
<t>Model-based: The telemetry data is modeled in advance which allows applicatio | ||||
ns to configure and consume data with ease. </t> | ||||
<t>Data Fusion: The data for a single application can come from multiple data so | ||||
urces (e.g., cross-domain, cross-device, and cross-layer) based on common naming | ||||
/ID and needs to be correlated to take effect.</t> | ||||
<t>Dynamic and Interactive: Since network telemetry is meant to be used in a cl | ||||
osed control loop for network automation, it needs to run continuously and adapt | ||||
to the dynamic and interactive queries from the network operation controller. < | ||||
/t> | ||||
</list> | ||||
</t> | ||||
<t>In addition, an ideal network telemetry solution may also have the following | ||||
features or properties:</t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t>In-Network Customization: The data that is generated can be customized in net | ||||
work at run-time to cater to the specific need of applications. This needs the s | ||||
upport of a programmable data plane which allows probes with custom functions to | ||||
be deployed at flexible locations. </t> | ||||
<t>In-Network Data Aggregation and Correlation: Network devices and aggregation | ||||
points can work out which events and what data need to be stored, reported, or | ||||
discarded, thus reducing the load on the central collection and processing points | ||||
while still ensuring that the right information is ready to be processed in a t | ||||
imely way.</t> | ||||
<t>In-Network Processing: Sometimes it is not necessary or feasible to gather al | ||||
l information to a central point to be processed and acted upon. It is possible | ||||
for the data processing to be done in network, allowing reactive actions to be t | ||||
aken locally.</t> | ||||
<t>Direct Data Plane Export: Data originating from the data plane forwarding | ||||
chips can be directly exported to the data consumer for efficiency, especially w | ||||
hen the data bandwidth is large and real-time processing is required. </t> | ||||
<t>In-band Data Collection: In addition to the passive and active data collectio | ||||
n approaches, the new hybrid approach allows data to be collected directly for any ta | ||||
rget flow on its entire forwarding path <xref target="I-D.song-opsawg-ifit-frame | ||||
work"/>. </t> | ||||
</list> | ||||
</t> | ||||
<t>It is worth noting that a network telemetry system should not be intrusive to | ||||
normal network operations by avoiding the pitfall of the "observer effect". Tha | ||||
t is, it should not change the network behavior and affect the forwarding perfor | ||||
mance. Moreover, high-volume telemetry traffic may cause network congestion unle | ||||
ss proper isolation or traffic engineering techniques are in place, or congestio | ||||
n control mechanisms ensure that telemetry traffic backs off if it exceeds the n | ||||
etwork capacity. <xref target="RFC8084" /> and <xref target="RFC8085" /> are rel | ||||
evant Best Current Practices (BCP) in this space.</t> | ||||
<t>Although in many cases a system for network telemetry involves a remote data | ||||
collecting and consuming entity, it is important to understand that there are no | ||||
inherent assumptions about how a system should be architected. While a network | ||||
architecture with centralized controller (e.g., SDN) seems a natural fit for net | ||||
work telemetry, network telemetry can work in distributed fashions as well. For | ||||
example, telemetry data producers and consumers can have a peer-to-peer relatio | ||||
nship, in which a network node can be the direct consumer of telemetry data from | ||||
other nodes. </t> | ||||
</section> | ||||
<section title="The Necessity of a Network Telemetry Framework"> | <section numbered="true" toc="default"> | |||
<t>Network data analytics (e.g., machine learning) is applied for network operat | <name>Applicability Statement</name> | |||
ion automation, relying on abundant and coherent data from networks. Data acquis | <t>Large-scale network data collection is a major threat to user privacy | |||
ition that is limited to a single source and static in nature will in many cases | and may be indistinguishable from pervasive monitoring <xref target="RFC7258" f | |||
not be sufficient to meet an application's telemetry data needs. As a result, m | ormat="default"/>. The network telemetry framework presented in this document m | |||
ultiple data sources, involving a variety of techniques and standards, will need | ust not be applied to generating, exporting, collecting, analyzing, or retaining | |||
to be integrated. It is desirable to have a framework that classifies and organ | individual user data or any data that can identify end users or characterize th | |||
izes different telemetry data sources and types, defines different components of | eir behavior without consent. Based on this principle, the network telemetry fra | |||
a network telemetry system and their interactions, and helps coordinate and inte | mework is not applicable to networks whose endpoints represent individual users, | |||
grate multiple telemetry approaches across layers. This allows flexible combinat | such as general-purpose access networks. </t> | |||
ions of data for different applications, while normalizing and simplifying inter | </section> | |||
faces. In detail, such a framework would benefit the development of network oper | <section numbered="true" toc="default"> | |||
ation applications for the following reasons:</t> | <name>Glossary</name> | |||
<t> | <t>Before further discussion, we list some key terminology and abbreviat | |||
<list style="symbols"> | ions used in this document. There is an intended differentiation between the ter | |||
<t>Future networks, autonomous or otherwise, depend on holistic and comprehensiv | ms of network telemetry and OAM. However, it should be understood that there is | |||
e network visibility. The use cases and applications are better supported | not a hard-line distinction between the two concepts. Rather, network telemetry | |||
uniformly and coherently using an integrated, converged mechanism and common tel | is considered an extension of OAM. It covers all the existing OAM protocols but | |||
emetry data representations wherever feasible. Therefore, the protocols and mech | puts more emphasis on the newer and emerging techniques and protocols concerning | |||
anisms should be consolidated into a minimum yet comprehensive set. A telemetry | all aspects of network data from acquisition to consumption.</t> | |||
framework can help to normalize the technique developments.</t> | <dl newline="false" spacing="normal" indent="12"> | |||
<t>Network visibility presents multiple viewpoints. For example, the device view | <dt>AI:</dt> | |||
point takes the network infrastructure as the monitoring object from which the n | <dd> Artificial Intelligence. In the network domain, AI refers to mach | |||
etwork topology and device status can be acquired; the traffic viewpoint takes t | ine-learning-based technologies for automated network operation and other tasks. | |||
he flows or packets as the monitoring object from which the traffic quality and | </dd> | |||
path can be acquired. An application may need to switch its viewpoint during ope | <dt>AM:</dt> | |||
ration. It may also need to correlate a service and its impact on user experienc | <dd> Alternate Marking. A flow performance measurement method, as spec | |||
e to acquire the comprehensive information.</t> | ified in <xref target="RFC8321" format="default"/>. </dd> | |||
<t>Applications require network telemetry to be elastic in order to make efficie | <dt>BMP:</dt> | |||
nt use of network resources and reduce the impact of processing related to netwo | <dd>BGP Monitoring Protocol. Specified in <xref target="RFC7854" forma | |||
rk telemetry on network performance. For example, routine network monitoring sho | t="default"/>. </dd> | |||
uld cover the entire network with a low data sampling rate. Only when issues ari | <dt>DPI:</dt> | |||
se or critical trends emerge should telemetry data sources be modified and telem | <dd>Deep Packet Inspection. Refers to the techniques that examine pack | |||
etry data rates boosted as needed.</t> | ets beyond packet L3/L4 headers. </dd> | |||
<t>Efficient data aggregation is critical for applications to reduce the overall | <dt>gNMI:</dt> | |||
quantity of data and improve the accuracy of analysis.</t> | <dd>gRPC Network Management Interface. A network management protocol f | |||
</list> | rom the OpenConfig Operator Working Group, mainly contributed by Google. See <xr | |||
</t> | ef target="gnmi" format="default"/> for details. </dd> | |||
<t> A telemetry framework collects together all the telemetry-related work from | <dt>GPB:</dt> | |||
different sources and working groups within the IETF. This makes it possible to ass | <dd>Google Protocol Buffer. An extensible mechanism for serializing st | |||
emble a comprehensive network telemetry system and to avoid repetitious or redun | ructured data. See <xref target="gpb" format="default"/> for details. </dd> | |||
dant work. The framework should cover the concepts and components from the stand | <dt>gRPC:</dt> | |||
ardization perspective. This document describes the modules which make up a netw | <dd>gRPC Remote Procedure Call. An open-source high-performance RPC fr | |||
ork telemetry framework and decomposes the telemetry system into a set of distin | amework that gNMI is based on. See <xref target="grpc" format="default"/> for de | |||
ct components that existing and future work can easily map to.</t> | tails. </dd> | |||
<dt>IPFIX:</dt> | ||||
<dd>IP Flow Information Export Protocol. Specified in <xref target="RF | ||||
C7011" format="default"/>. </dd> | ||||
<dt>IOAM:</dt> | ||||
<dd> | ||||
<xref target="RFC9197" format="default">In situ OAM</xref>. A data p | ||||
lane on-path telemetry technique. </dd> | ||||
<dt>JSON:</dt> | ||||
<dd>JavaScript Object Notation. An open standard file format and data | ||||
interchange format that uses human-readable text to store and transmit data obje | ||||
cts, as specified in <xref target="RFC8259" format="default"/>. </dd> | ||||
<dt>MIB:</dt> | ||||
<dd>Management Information Base. A database used for managing the enti | ||||
ties in a network. </dd> | ||||
<dt>NETCONF:</dt> | ||||
<dd>Network Configuration Protocol. Specified in <xref target="RFC6241 | ||||
" format="default"/>. </dd> | ||||
<dt>NetFlow:</dt> | ||||
<dd>A Cisco protocol used for flow record collecting, as described in | ||||
<xref target="RFC3954" format="default"/>. </dd> | ||||
<dt>Network Telemetry:</dt> | ||||
<dd>The process and instrumentation for acquiring and utilizing networ | ||||
k data remotely for network monitoring and operation. A general term for a large | ||||
set of network visibility techniques and protocols, concerning aspects like dat | ||||
a generation, collection, correlation, and consumption. Network telemetry addres | ||||
ses current network operation issues and enables smooth evolution toward future | ||||
intent-driven autonomous networks.</dd> | ||||
<dt>NMS:</dt> | ||||
<dd>Network Management System. Refers to applications that allow netwo | ||||
rk administrators to manage a network. </dd> | ||||
<dt>OAM:</dt> | ||||
<dd>Operations, Administration, and Maintenance. A group of network ma | ||||
nagement functions that provide network fault indication, fault localization, pe | ||||
rformance information, and data and diagnosis functions. Most conventional netwo | ||||
rk monitoring techniques and protocols belong to network OAM.</dd> | ||||
</section> | <dt>PBT:</dt> | |||
</section> | <dd>Postcard-Based Telemetry. A data plane on-path telemetry technique | |||
. A representative technique is described in <xref target="IPPM-IOAM-DIRECT-EXPO | ||||
RT" format="default"/>. </dd> | ||||
<dt>RESTCONF:</dt> | ||||
<dd> An HTTP-based protocol that provides a programmatic interface for | ||||
accessing data defined in YANG, using the datastore concepts defined in NETCONF | ||||
, as specified in <xref target="RFC8040" format="default"/>. </dd> | ||||
<dt>SMIv2:</dt> | ||||
<dd>Structure of Management Information Version 2. Defines MIB objects | ||||
, as specified in <xref target="RFC2578" format="default"/>. </dd> | ||||
<dt>SNMP:</dt> | ||||
<dd>Simple Network Management Protocol. Versions 1, 2, and 3 are speci | ||||
fied in <xref target="RFC1157" format="default"/>, <xref target="RFC3416" format | ||||
="default"/>, and <xref target="RFC3411" format="default"/>, respectively. </dd> | ||||
<dt>XML:</dt> | ||||
<dd>Extensible Markup Language. A markup language for data encoding th | ||||
at is both human readable and machine readable, as specified by W3C <xref target | ||||
="W3C.REC-xml-20081126" format="default"/>. </dd> | ||||
<dt>YANG:</dt> | ||||
<dd>YANG is a data modeling language for the definition of data sent o | ||||
ver network management protocols such as NETCONF and RESTCONF. YANG is defined i | ||||
n <xref target="RFC6020" format="default"/> and <xref target="RFC7950" format="d | ||||
efault"/>. </dd> | ||||
<dt>YANG ECA:</dt> | ||||
<dd>A YANG model for Event-Condition-Action policies, as defined in <x | ||||
ref target="I-D.ietf-netmod-eca-policy" format="default"/>. </dd> | ||||
<dt>YANG-Push:</dt> | ||||
<dd> A mechanism that allows subscriber applications to request a stre | ||||
am of updates from a YANG datastore on a network device. Details are specified i | ||||
n <xref target="RFC8639" format="default"/> and <xref target="RFC8641" format="d | ||||
efault"/>. </dd> | ||||
</dl> | ||||
</section> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Background</name> | ||||
<t>The term "big data" is used to describe the extremely large volume of d | ||||
ata sets that can be analyzed computationally to reveal patterns, trends, and as | ||||
sociations. Networks are undoubtedly a source of big data because of their scale | ||||
and the volume of network traffic they forward. When a network's endpoints do n | ||||
ot represent individual users (e.g., in industrial, data-center, and infrastruct | ||||
ure contexts), network operations can often benefit from large-scale data collec | ||||
tion without breaching user privacy.</t> | ||||
<t>Today, one can access advanced big data analytics capability through a | ||||
plethora of commercial and open-source platforms (e.g., Apache Hadoop), tools (e | ||||
.g., Apache Spark), and techniques (e.g., machine learning). Thanks to the advan | ||||
ce of computing and storage technologies, network big data analytics give networ | ||||
k operators an opportunity to gain network insights and move towards network aut | ||||
onomy. Some operators have started to explore the application of Artificial Intelligenc | ||||
e (AI) to make sense of network data. Software tools can use the network data to | ||||
detect and react to network faults, anomalies, and policy violations, as well a | ||||
s predict future events. In turn, the network policy updates for planning, intru | ||||
sion prevention, optimization, and self-healing may be applied.</t> | ||||
<t>It is conceivable that an <xref target="RFC7575" format="default"> auto | ||||
nomic network </xref> is the logical next step for network evolution following S | ||||
oftware-Defined Networking (SDN), which aims to reduce (or even eliminate) human | ||||
labor, make more efficient use of network resources, and provide better service | ||||
s more aligned with customer requirements. The IETF ANIMA Working Group is dedic | ||||
ated to developing and maintaining protocols and procedures for automated networ | ||||
k management and control of professionally managed networks. The related techniq | ||||
ue of <xref target="I-D.irtf-nmrg-ibn-concepts-definitions" format="default">Int | ||||
ent-Based Networking (IBN)</xref> requires network visibility and telemetry data | ||||
in order to ensure that the network is behaving as intended. </t> | ||||
<t>However, while the data processing capability is improved and applicati | ||||
ons require more data to function better, the networks lag behind in extracting | ||||
and translating network data into useful and actionable information in efficient | ||||
ways. The system bottleneck is shifting from data consumption to data supply. B | ||||
oth the number of network nodes and the traffic bandwidth keep increasing at a f | ||||
ast pace. The network configuration and policy change at smaller time slots than | ||||
before. More subtle events and fine-grained data through all network planes nee | ||||
d to be captured and exported in real time. In a nutshell, it is a challenge to | ||||
get enough high-quality data out of the network in a manner that is efficient, t | ||||
imely, and flexible. Therefore, we need to survey the existing technologies and | ||||
protocols and identify any potential gaps.</t> | ||||
<t>In the remainder of this section, we first clarify the scope of network | ||||
data (i.e., telemetry data) relevant in this document. Then, we discuss several | ||||
key use cases for network operations of today and the future. Next, we show why | ||||
the current network OAM techniques and protocols are insufficient for these use | ||||
cases. The discussion underlines the need for new methods, techniques, and prot | ||||
ocols, as well as the extensions of existing ones, which we assign under the umb | ||||
rella term "Network Telemetry". </t> | ||||
<section numbered="true" toc="default"> | ||||
<name>Telemetry Data Coverage</name> | ||||
<t>Any information that can be extracted from networks (including the da | ||||
ta plane, control plane, and management plane) and used to gain visibility or as | ||||
a basis for actions is considered telemetry data. It includes statistics, event | ||||
records and logs, snapshots of state, configuration data, etc. It also covers t | ||||
he outputs of any active and passive measurements <xref target="RFC7799" format= | ||||
"default"/>. In some cases, raw data is processed in network before being sent t | ||||
o a data consumer. Such processed data is also considered telemetry data. The va | ||||
lue of telemetry data varies. In some cases, if the cost is acceptable, less but | ||||
higher-quality data are preferred rather than a lot of low-quality data. A clas | ||||
sification of telemetry data is provided in <xref target="framework" format="def | ||||
ault"/>. To preserve the privacy of end users, no user packet content should be | ||||
collected. Specifically, the data objects generated, exported, and collected by | ||||
a network telemetry application should not include any packet payload from traf | ||||
fic associated with end-user systems. </t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Use Cases</name> | ||||
<t>The following set of use cases is essential for network operations. W | ||||
hile the list is by no means exhaustive, it is enough to highlight the requireme | ||||
nts for data velocity, variety, volume, and veracity, the attributes of big data | ||||
, in networks. </t> | ||||
<ul spacing="normal"> | ||||
<li> Security: Network intrusion detection and prevention systems need | ||||
to monitor network traffic and activities and act upon anomalies. Given increas | ||||
ingly sophisticated attack vectors coupled with increasingly severe consequences | ||||
of security breaches, new tools and techniques need to be developed, relying on | ||||
wider and deeper visibility into networks. The ultimate goal is to achieve secu | ||||
rity with no, or only minimal, human intervention and without disrupting legitim | ||||
ate traffic flows. </li> | ||||
<li>Policy and Intent Compliance: Network policies are the rules that | ||||
constrain the services for network access, provide service differentiation, or e | ||||
nforce specific treatment on the traffic. For example, a service function chain | ||||
is a policy that requires the selected flows to pass through a set of ordered ne | ||||
twork functions. Intent, as defined in <xref target="I-D.irtf-nmrg-ibn-concepts- | ||||
definitions" format="default"/>, is a set of operational goals that a network sh | ||||
ould meet and outcomes that a network is supposed to deliver, defined in a decla | ||||
rative manner without specifying how to achieve or implement them. An intent req | ||||
uires a complex translation and mapping process before being applied on networks | ||||
. While a policy or intent is enforced, the compliance needs to be verified and | ||||
monitored continuously by relying on visibility that is provided through network | ||||
telemetry data. Any violation must be reported immediately - this will alert th | ||||
e network | ||||
administrator to the policy or intent violation and will potentially | ||||
result in updates to how the policy or intent is applied in the network to | ||||
ensure that it remains in force. </li> | ||||
<li>SLA Compliance: A Service Level Agreement (SLA) is a service contr | ||||
act between a service provider and a client, which includes the metrics for the | ||||
service measurement and remedy/penalty procedures when the service level misses | ||||
the agreement. Users need to check if they get the service as promised, and netw | ||||
ork operators need to evaluate how they can deliver services that meet the SLA b | ||||
ased on real-time network telemetry data, including data from network measuremen | ||||
ts.</li> | ||||
<li>Root Cause Analysis: Many network failures can be the effect of a | ||||
sequence of chained events. Troubleshooting and recovery require quick identific | ||||
ation of the root cause of any observable issues. However, the root cause is not | ||||
always straightforward to identify, especially when the failure is sporadic and | ||||
the number of event messages, both related and unrelated to the same cause, is | ||||
overwhelming. While technologies such as machine learning can be used for root c | ||||
ause analysis, it is up to the network to sense and provide the relevant diagnos | ||||
tic data that are either actively fed into or passively retrieved by the root ca | ||||
use analysis applications.</li> | ||||
<li>Network Optimization: This covers all short-term and long-term net | ||||
work optimization techniques, including load balancing, Traffic Engineering (TE) | ||||
, and network planning. Network operators are motivated to optimize their networ | ||||
k utilization and differentiate services for better Return on Investment (ROI) o | ||||
r lower Capital Expenditure (CAPEX). The first step is to know the real-time net | ||||
work conditions before applying policies for traffic manipulation. In some cases | ||||
, microbursts need to be detected in a very short time frame so that fine-graine | ||||
d traffic control can be applied to avoid network congestion. Long-term planning | ||||
of network capacity and topology requires analysis of real-world network teleme | ||||
try data that is obtained over long periods of time.</li> | ||||
<li>Event Tracking and Prediction: The visibility into traffic path an | ||||
d performance is critical for services and applications that rely on healthy net | ||||
work operation. Numerous related network events are of interest to network opera | ||||
tors. For example, network operators want to learn where and why packets are dro | ||||
pped for an application flow. They also want to be warned of issues in advance, | ||||
so proactive actions can be taken to avoid catastrophic consequences. </li> | ||||
</ul> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Challenges</name> | ||||
<t>For a long time, network operators have relied upon <xref target="RFC | ||||
3416" format="default">SNMP</xref>, Command-Line Interface (CLI), or <xref targe | ||||
t="RFC5424" format="default">Syslog</xref> to monitor the network. Some other OA | ||||
M techniques as described in <xref target="RFC7276" format="default"/> are also | ||||
used to facilitate network troubleshooting. These conventional techniques are no | ||||
t sufficient to support the above use cases for the following reasons: </t> | ||||
<ul spacing="normal"> | ||||
<li>Most use cases need to continuously monitor the network and dynami | ||||
cally refine the data collection in real time. Poll-based low-frequency data col | ||||
lection is ill-suited for these applications. Subscription-based streaming data | ||||
directly pushed from the data source (e.g., the forwarding chip) is preferred to | ||||
provide sufficient data quantity and precision at scale.</li> | ||||
<li>Comprehensive data is needed, ranging from packet processing engin | ||||
es to traffic managers, line cards to main control boards, user flows to control | ||||
protocol packets, device configurations to operations, and physical layers to a | ||||
pplication layers. Conventional OAM only covers a narrow range of data (e.g., SN | ||||
MP only handles data from the Management Information Base (MIB)). Classical netw | ||||
ork devices cannot provide all the necessary probes. More open and programmable | ||||
network devices are therefore needed.</li> | ||||
<li>Many application scenarios need to correlate network-wide data fro | ||||
m multiple sources (i.e., from distributed network devices, different components | ||||
of a network device, or different network planes). A piecemeal solution is ofte | ||||
n lacking the capability to consolidate the data from multiple sources. The comp | ||||
osition of a complete solution, as partly proposed by <xref target="NMRG-ANTICIP | ||||
ATED-ADAPTATION" format="default">Autonomic Resource Control Architecture (ARCA) | ||||
</xref>, will be empowered and guided by a comprehensive framework. </li> | ||||
<li>Some conventional OAM techniques (e.g., CLI and Syslog) lack a for | ||||
mal data model. The unstructured data hinder the tool automation and application | ||||
extensibility. Standardized data models are essential to support the programmab | ||||
le networks. </li> | ||||
<section anchor="framework" title="Network Telemetry Framework"> | <li>Although some conventional OAM techniques support data push (e.g., | |||
<t> The top level network telemetry framework partitions the network telemetry i | <xref target="RFC2981" format="default">SNMP Trap</xref><xref target="RFC3877" f | |||
nto four modules based on the telemetry data object source and represents their | ormat="default"/>, Syslog, and <xref target="RFC3176" format="default">sFlow</xr | |||
relationship. Once the network operation applications acquire the data from thes | ef>), the pushed data are limited to only predefined management plane warnings ( | |||
e modules, they can apply data analytics and take actions. At the next level, th | e.g., SNMP Trap) or sampled user packets (e.g., sFlow). Network operators requir | |||
e framework decomposes each module into separate components. Each of the modules | e the data with arbitrary source, granularity, and precision, which is beyond th | |||
follows the same underlying structure, with one component dedicated to the conf | e capability of the existing techniques. </li> | |||
iguration of data subscriptions and data sources, a second component dedicated t | <li>Conventional passive measurement techniques can either consume exc | |||
o encoding and exporting data, and a third component instrumenting the generatio | essive network resources and produce excessive redundant data or lead to inaccur | |||
n of telemetry related to the underlying resources. Throughout the framework, th | ate results; on the other hand, conventional active measurement techniques can i | |||
e same set of abstract data acquiring mechanisms and data types (<xref target="s | nterfere with the user traffic, and their results are indirect. Techniques that | |||
ec:type"/>) are applied. The two-level architecture with the uniform data abstra | can collect direct and on-demand data from user traffic are more favorable.</li> | |||
ction helps accurately pinpoint a protocol or technique to its position in a net | </ul> | |||
work telemetry system or disaggregate a network telemetry system into manageable | <t>These challenges were addressed by newer standards and techniques (e. | |||
parts.</t> | g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push), and more are e | |||
<section title="Top Level Modules"> | merging. These standards and techniques need to be recognized and accommodated i | |||
<t> Telemetry can be applied on the forwarding plane, the control plane, and the | n a new framework.</t> | |||
management plane in a network, as well as other sources out of the network, as | </section> | |||
shown in <xref target="figure_1"/>. Therefore, we categorize the network telemet | <section numbered="true" toc="default"> | |||
ry into four distinct modules (management plane, control plane, forwarding plane | <name>Network Telemetry</name> | |||
, and external data and event telemetry) with each having its own interface to N | <t>Network telemetry has emerged as a mainstream technical term to refer | |||
etwork Operation Applications.</t> | to the network data collection and consumption techniques. Several network tele | |||
<t> | metry techniques and protocols (e.g., <xref target="RFC7011" format="default">IP | |||
<figure anchor="figure_1" title="Modules in Layer Category of NTF"> | FIX</xref> and <xref target="grpc" format="default">gRPC</xref>) have been widel | |||
<artwork><![CDATA[ | y deployed. Network telemetry allows separate entities to acquire data from netw | |||
ork devices so that data can be visualized and analyzed to support network monit | ||||
oring and operation. Network telemetry covers the conventional network OAM and h | ||||
as a wider scope. For instance, it is expected that network telemetry can provid | ||||
e the necessary network insight for autonomous networks and address the shortcom | ||||
ings of conventional OAM techniques. </t> | ||||
<t>Network telemetry usually assumes machines as data consumers rather t | ||||
han human operators. Hence, network telemetry can directly trigger the automated | ||||
network operation, while in contrast, some conventional OAM tools were designed | ||||
and used to help human operators to monitor and diagnose the networks and guide | ||||
manual network operations. Such a proposition leads to very different technique | ||||
s. </t> | ||||
<t>Although new network telemetry techniques are emerging and subject to | ||||
continuous evolution, several characteristics of network telemetry have been we | ||||
ll accepted. Note that network telemetry is intended to be an umbrella term cove | ||||
ring a wide spectrum of techniques, so the following characteristics are not exp | ||||
ected to be held by every specific technique.</t> | ||||
<ul spacing="normal"> | ||||
<li>Push and Streaming: Instead of polling data from network devices, | ||||
telemetry collectors subscribe to streaming data pushed from data sources in net | ||||
work devices.</li> | ||||
<li>Volume and Velocity: Telemetry data is intended to be consumed by | ||||
machines rather than by human beings. Therefore, the data volume can be huge, an | ||||
d the processing is optimized for the needs of automation in real time.</li> | ||||
<li>Normalization and Unification: Telemetry aims to address the overa | ||||
ll network automation needs. Efforts are made to normalize the data representati | ||||
on and unify the protocols, so as to simplify data analysis and provide integrat | ||||
ed analysis across heterogeneous devices and data sources across a network.</li> | ||||
<li>Model-Based: Telemetry data is modeled in advance, which allows ap | ||||
plications to configure and consume data with ease. </li> | ||||
<li>Data Fusion: The data for a single application can come from multi | ||||
ple data sources (e.g., cross-domain, cross-device, and cross-layer) that are ba | ||||
sed on a common name/ID and need to be correlated to take effect.</li> | ||||
<li>Dynamic and Interactive: Since network telemetry is meant to be used | ||||
in a closed control loop for network automation, it needs to run continuously | ||||
and adapt to the dynamic and interactive queries from the network operation | ||||
controller. </li> | ||||
</ul> | ||||
<t>In addition, an ideal network telemetry solution may also have the fo | ||||
llowing features or properties:</t> | ||||
<ul spacing="normal"> | ||||
<li>In-Network Customization: The data that is generated can be custom | ||||
ized in network at runtime to cater to the specific need of applications. This n | ||||
eeds the support of a programmable data plane, which allows probes with custom f | ||||
unctions to be deployed at flexible locations. </li> | ||||
<li>In-Network Data Aggregation and Correlation: Network devices and | ||||
aggregation points can work out which events and what data need to be stored, | ||||
reported, or discarded, thus reducing the load on the central collection and | ||||
processing points while still ensuring that the right information is ready to be | ||||
processed in a timely way.</li> | ||||
<li>In-Network Processing: Sometimes it is not necessary or feasible t | ||||
o gather all information to a central point to be processed and acted upon. It i | ||||
s possible for the data processing to be done in network, allowing reactive acti | ||||
ons to be taken locally.</li> | ||||
<li>Direct Data Plane Export: The data originated from data plane forw | ||||
arding chips can be directly exported to the data consumer for efficiency, espec | ||||
ially when the data bandwidth is large and real-time processing is required. </l | ||||
i> | ||||
<li>In-Band Data Collection: In addition to the passive and active data | ||||
collection approaches, the new hybrid approach allows data to be collected directly | ||||
for any target flow on its entire forwarding path | ||||
<xref target="I-D.song-opsawg-ifit-framework" format="default"/>. </li> | ||||
</ul> | ||||
<t>It is worth noting that a network telemetry system should avoid the | ||||
pitfall of the "observer effect" so that it does not intrude on normal network | ||||
operations. That is, it should neither change the network behavior nor affect | ||||
the forwarding performance. Moreover, high-volume telemetry traffic may cause | ||||
network congestion unless proper isolation or traffic engineering techniques are | ||||
in place, or congestion control mechanisms ensure that telemetry traffic backs | ||||
off if it exceeds the network capacity. <xref target="RFC8084" format="default"/> | ||||
and <xref target="RFC8085" format="default"/> are relevant Best Current Practices | ||||
(BCPs) in this space.</t> | ||||
<t>Although in many cases a system for network telemetry involves a remo | ||||
te data collecting and consuming entity, it is important to understand that ther | ||||
e are no inherent assumptions about how a system should be architected. While a | ||||
network architecture with a centralized controller (e.g., SDN) seems to be a nat | ||||
ural fit for network telemetry, network telemetry can work in distributed fashio | ||||
ns as well. For example, telemetry data producers and consumers can have a peer | ||||
-to-peer relationship, in which a network node can be the direct consumer of tel | ||||
emetry data from other nodes. </t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>The Necessity of a Network Telemetry Framework</name> | ||||
<t>Network data analytics (e.g., machine learning) is applied for networ | ||||
k operation automation, relying on abundant and coherent data from networks. Dat | ||||
a acquisition that is limited to a single source and static in nature will in ma | ||||
ny cases not be sufficient to meet an application's telemetry data needs. As a r | ||||
esult, multiple data sources, involving a variety of techniques and standards, w | ||||
ill need to be integrated. It is desirable to have a framework that classifies a | ||||
nd organizes different telemetry data sources and types, defines different compo | ||||
nents of a network telemetry system and their interactions, and helps coordinate | ||||
and integrate multiple telemetry approaches across layers. This allows flexible | ||||
combinations of data for different applications, while normalizing and simplify | ||||
ing interfaces. In detail, such a framework would benefit the development of net | ||||
work operation applications for the following reasons:</t> | ||||
<ul spacing="normal"> | ||||
<li>Future networks, autonomous or otherwise, depend on holistic and c | ||||
omprehensive network visibility. Use cases and applications are better when supp | ||||
orted uniformly and coherently using an integrated, converged mechanism and comm | ||||
on telemetry data representations wherever feasible. Therefore, the protocols an | ||||
d mechanisms should be consolidated into a minimum yet comprehensive set. A tele | ||||
metry framework can help to normalize the technique developments.</li> | ||||
<li>Network visibility presents multiple viewpoints. For example, the | ||||
device viewpoint takes the network infrastructure as the monitoring object from | ||||
which the network topology and device status can be acquired, and the traffic vi | ||||
ewpoint takes the flows or packets as the monitoring object from which the traff | ||||
ic quality and path can be acquired. An application may need to switch its viewp | ||||
oint during operation. It may also need to correlate a service and its impact on | ||||
user experience (UE) to acquire the comprehensive information.</li> | ||||
<li>Applications require network telemetry to be elastic in order to m | ||||
ake efficient use of network resources and reduce the impact of processing relat | ||||
ed to network telemetry on network performance. For example, routine network mon | ||||
itoring should cover the entire network with a low data sampling rate. Only when | ||||
issues arise or critical trends emerge should telemetry data sources be modifie | ||||
d and telemetry data rates be boosted as needed.</li> | ||||
<li>Efficient data aggregation is critical for applications to reduce | ||||
the overall quantity of data and improve the accuracy of analysis.</li> | ||||
</ul> | ||||
<t>A telemetry framework collects all the telemetry-related works from d | ||||
ifferent sources and working groups within the IETF. This makes it possible to a | ||||
ssemble a comprehensive network telemetry system and to avoid repetitious or red | ||||
undant work. The framework should cover the concepts and components from the sta | ||||
ndardization perspective. This document describes the modules that make up a net | ||||
work telemetry framework and decomposes the telemetry system into a set of disti | ||||
nct components that existing and future work can easily map to.</t> | ||||
</section> | ||||
</section> | ||||
<section anchor="framework" numbered="true" toc="default"> | ||||
<name>Network Telemetry Framework</name> | ||||
<t> The top-level network telemetry framework partitions the network telem | ||||
etry into four modules based on the telemetry data object source and represents | ||||
their relationship. Once the network operation applications acquire the data fro | ||||
m these modules, they can apply data analytics and take actions. At the next lev | ||||
el, the framework decomposes each module into separate components. Each of these | ||||
modules follows the same underlying structure, with one component dedicated to | ||||
the configuration of data subscriptions and data sources, a second component ded | ||||
icated to encoding and exporting data, and a third component instrumenting the g | ||||
eneration of telemetry related to the underlying resources. Throughout the frame | ||||
work, the same set of abstract data-acquiring mechanisms and data types (<xref t | ||||
arget="sec_type" format="default"/>) are applied. The two-level architecture wit | ||||
h the uniform data abstraction helps accurately pinpoint a protocol or technique | ||||
to its position in a network telemetry system or disaggregates a network teleme | ||||
try system into manageable parts.</t> | ||||
<section numbered="true" toc="default"> | ||||
<name>Top-Level Modules</name> | ||||
<t> Telemetry can be applied on the forwarding plane, control plane, and | ||||
management plane in a network, as well as on other sources out of the network, | ||||
as shown in <xref target="figure_1" format="default"/>. Therefore, we categorize | ||||
the network telemetry into four distinct modules (management plane, control pla | ||||
ne, forwarding plane, and external data and event telemetry) with each having it | ||||
s own interface to network operation applications.</t> | ||||
<figure anchor="figure_1"> | ||||
<name>Modules in Layer Category of the Network Telemetry Framework</na | ||||
me> | ||||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
+------------------------------+ | +------------------------------+ | |||
| | | | | | |||
| Network Operation |<-------+ | | Network Operation |<-------+ | |||
| Applications | | | | Applications | | | |||
| | | | | | | | |||
+------------------------------+ | | +------------------------------+ | | |||
^ ^ ^ | | ^ ^ ^ | | |||
| | | | | | | | | | |||
V V | V | V V | V | |||
+--------------+-----------|---+ +-----------+ | +--------------+-----------|---+ +-----------+ | |||
skipping to change at line 262 ¶ | skipping to change at line 264 ¶ | |||
| Management | ^ V | | Telemetry | | | Management | ^ V | | Telemetry | | |||
| Plane +-------|-------+ | | | | Plane +-------|-------+ | | | |||
| Telemetry | V | +-----------+ | | Telemetry | V | +-----------+ | |||
| | Forwarding | | | | Forwarding | | |||
| | Plane | | | | Plane | | |||
| <---> | | | <---> | | |||
| | Telemetry | | | | Telemetry | | |||
| | | | | | | | |||
+--------------+---------------+ | +--------------+---------------+ | |||
]]></artwork> | ]]></artwork> | |||
</figure> | </figure> | |||
</t> | <t>The rationale of this partition lies in the different telemetry data | |||
<t>The rationale of this partition lies in the different telemetry data objects | objects that result in different data sources and export locations. Such differe | |||
which result in different data source and export locations. Such differences hav | nces have profound implications on in-network data programming and processing ca | |||
e profound implications on in-network data programming and processing capability | pability, data encoding and the transport protocol, and required data bandwidth | |||
, data encoding and transport protocol, and required data bandwidth and latency. | and latency. Data can be sent directly or proxied via the control and management | |||
Data can be sent directly, or proxied via the control and management planes. Th | planes. There are advantages/disadvantages to both approaches.</t> | |||
ere are advantages/disadvantages to both approaches.</t> | <t>Note that in some cases, the network controller itself may be the sou | |||
<t>Note that in some cases the network controller itself may be the source of te | rce of telemetry data that is unique to it or derived from the telemetry data co | |||
lemetry data that is unique to it or derived from the telemetry data collected f | llected from the network elements. Some of the principles and taxonomy specific | |||
rom the network elements. Some of the principles and taxonomy specific to the co | to the control plane and management plane telemetry could also be applied to the | |||
ntrol plane and management plane telemetry could also be applied to the controll | controller when it is required to provide the telemetry data to network operati | |||
er when it is required to provide the telemetry data to Network Operation Applic | on applications hosted outside. The scope of this document is focused on the net | |||
ations hosted outside. The scope of the document is focused on the network eleme | work elements telemetry, and further details related to controllers are thus out | |||
nts telemetry and further details related to controllers are thus out of scope. | of scope. </t> | |||
</t> | <t>We summarize the major differences of the four modules in <xref targe | |||
t="table_1"/>. They are compared from six angles:</t> | ||||
<ul spacing="normal"> | ||||
<li>Data Object</li> | ||||
<li>Data Export Location</li> | ||||
<li>Data Model</li> | ||||
<li>Data Encoding</li> | ||||
<li>Telemetry Application Protocol</li> | ||||
<li>Data Transport Method</li> | ||||
</ul> | ||||
<t>Data Object is the target and source of each module. Because the data | ||||
source varies, the location where data is most conveniently exported also varies. | ||||
For example, forwarding plane data mainly originates as data exported from | ||||
the forwarding Application-Specific Integrated Circuits (ASICs), while control p | ||||
lane data mainly originates from the protocol daemons running on the control CPU | ||||
(s). For convenience and efficiency, it is preferred to export the data off the | ||||
device from locations near the source. Because the locations that can export dat | ||||
a have different capabilities, different choices of data models, encoding, and t | ||||
ransport methods are made to balance the performance and cost. For example, the | ||||
forwarding chip has high throughput but limited capacity for processing complex | ||||
data and maintaining state, while the main control CPU is capable of complex dat | ||||
a and state processing but has limited bandwidth for high throughput data. As a | ||||
result, the suitable telemetry protocol for each module can be different. Some r | ||||
epresentative techniques are shown in the corresponding table blocks to highligh | ||||
t the technical diversity of these modules. Note that the selected techniques ju | ||||
st reflect the de facto state of the art and are by no means exhaustive (e.g., I | ||||
PFIX can also be implemented over TCP and SCTP, but that is not recommended for | ||||
the forwarding plane). The key point is that one cannot expect to use a universa | ||||
l protocol to cover all the network telemetry requirements. </t> | ||||
<t>We summarize the major differences of the four modules in the following table | <table anchor="table_1"> | |||
. They are compared from six angles:</t> | <name>Comparison of Data Object Modules</name> | |||
<t> | <thead> | |||
<list style="symbols"> | <tr> | |||
<t>Data Object</t> | <th>Module</th> | |||
<t>Data Export Location</t> | <th>Management Plane</th> | |||
<t>Data Model</t> | <th>Control Plane</th> | |||
<t>Data Encoding</t> | <th>Forwarding Plane</th> | |||
<t>Telemetry Application Protocol</t> | <th>External Data</th> | |||
<t>Data Transport Method</t> | </tr> | |||
</list> | </thead> | |||
</t> | <tbody> | |||
<t>Data Object is the target and source of each module. Because the data source | <tr> | |||
varies, the location where data is mostly conveniently exported also varies. For | <td>Object</td> | |||
example, forwarding plane data mainly originates as data exported from the forw | <td>configuration and operation state</td> | |||
arding Application-Specific Integrated Circuits (ASICs), while control plane dat | <td>control protocol and signaling, RIB</td> | |||
a mainly originates from the protocol daemons running on the control CPU(s). For | <td>flow and packet QoS, traffic stat., buffer and queue stat., FIB, Acces | |||
convenience and efficiency, it is preferred to export the data off the device f | s Control List (ACL)</td> | |||
rom locations near the source. Because the locations that can export data have d | <td>terminal, social, and environmental</td> | |||
ifferent capabilities, different choices of data model, encoding, and transport | </tr> | |||
method are made to balance the performance and cost. For example, the forwarding | <tr> | |||
chip has high throughput but limited capacity for processing complex data and m | <td>Export Location</td> | |||
aintaining state, while the main control CPU is capable of complex data and stat | <td>main control CPU</td> | |||
e processing, but has limited bandwidth for high throughput data. As a result, t | <td>main control CPU, linecard CPU, or forwarding chip</td> | |||
he suitable telemetry protocol for each module can be different. Some representa | <td>forwarding chip or linecard CPU; main control CPU unlikely</td> | |||
tive techniques are shown in the corresponding table blocks to highlight the tec | <td>various</td> | |||
hnical diversity of these modules. Note that the selected techniques just reflec | </tr> | |||
t the de facto state of the art and are by no means exhaustive (e.g., IPFIX can | <tr> | |||
also be implemented over TCP and SCTP, but that is not recommended for forwardin | <td>Data Model</td> | |||
g plane). The key point is that one cannot expect to use a universal protocol to | <td>YANG, MIB, syslog</td> | |||
cover all the network telemetry requirements. </t> | <td>YANG, custom</td> | |||
<t> | <td>YANG, custom</td> | |||
<figure anchor="figure_2" title="Comparison of the Data Object Modules"> | <td>YANG, custom</td> | |||
<artwork><![CDATA[ | </tr> | |||
+-----------+-------------+-------------+--------------+----------+ | <tr> | |||
| Module |Management |Control |Forwarding |External | | <td>Data Encoding</td> | |||
| |Plane |Plane |Plane |Data | | <td>GPB, JSON, XML</td> | |||
+-----------+-------------+-------------+--------------+----------+ | <td>GPB, JSON, XML, plain text</td> | |||
|Object |config. & |control |flow & packet |terminal, | | <td>plain text</td> | |||
| |operation |protocol & |QoS, traffic |social & | | <td>GPB, JSON, XML, plain text</td> | |||
| |state |signaling, |stat., buffer |environ- | | </tr> | |||
| | |RIB |& queue stat.,|mental | | <tr> | |||
| | | |ACL, FIB | | | <td>Application Protocol</td> | |||
+-----------+-------------+-------------+--------------+----------+ | <td>gRPC, NETCONF, RESTCONF</td> | |||
|Export |main control |main control |fwding chip |various | | <td>gRPC, NETCONF, IPFIX, traffic mirroring</td> | |||
|Location |CPU |CPU, |or linecard | | | <td>IPFIX, traffic mirroring, gRPC, NETFLOW</td> | |||
| | |linecard CPU |CPU; main | | | <td>gRPC</td> | |||
| | |or forwarding|control CPU | | | </tr> | |||
| | |chip |unlikely | | | <tr> | |||
+-----------+-------------+-------------+--------------+----------+ | <td>Data Transport</td> | |||
|Data |YANG, MIB, |YANG, |YANG |YANG, | | <td>HTTP(S), TCP</td> | |||
|Model |syslog |custom |custom, |custom | | <td>HTTP(S), TCP, UDP</td> | |||
+-----------+-------------+-------------+--------------+----------+ | <td>UDP</td> | |||
|Data |GPB, JSON, |GPB, JSON, |plain text |GPB, JSON | | <td>HTTP(S), TCP, UDP</td> | |||
|Encoding |XML |XML, | |XML, plain| | </tr> | |||
| | |plain text | |text | | </tbody> | |||
+-----------+-------------+-------------+--------------+----------+ | </table> | |||
|Application|gRPC,NETCONF,|gRPC,NETCONF,|IPFIX, traffic|gRPC | | ||||
|Protocol |RESTCONF |IPFIX,traffic|mirroring, | | | ||||
| | |mirroring |gRPC, NETFLOW | | | ||||
+-----------+-------------+-------------+--------------+----------+ | ||||
|Data |HTTP(S), TCP |HTTP(S), TCP,|UDP |HTTP(S), | | ||||
|Transport | |UDP | |TCP, UDP | | ||||
+-----------+-------------+-------------+--------------+----------+ | ||||
]]> | ||||
</artwork> | ||||
</figure> | ||||
</t> | ||||
<t>Note that the interaction with the applications that consume network telemetr | ||||
y data can be indirect. Some in-device data transfer is possible. For example, i | ||||
n the management plane telemetry, the management plane will need to acquire data | ||||
from the data plane. Some operational states can only be derived from data plan | ||||
e data sources such as the interface status and statistics. As another example, | ||||
obtaining control plane telemetry data may require the ability to access the For | ||||
warding Information Base (FIB) of the data plane.</t> | ||||
<t>On the other hand, an application may involve more than one plane and interac | ||||
t with multiple planes simultaneously. For example, an SLA compliance applicatio | ||||
n may require both the data plane telemetry and the control plane telemetry.</t> | ||||
<t>The requirements and challenges for each module are summarized as follows (no | ||||
te that the requirements may pertain across all telemetry modules; however, we e | ||||
mphasize those that are most pronounced for a particular plane).</t> | ||||
<section title="Management Plane Telemetry"> | ||||
<t>The management plane of network elements interacts with the Network Managemen | ||||
t System (NMS), and provides information such as performance data, network loggi | ||||
ng data, network warning and defects data, and network statistics and state data | ||||
. The management plane includes many protocols, including the classical SNMP and | ||||
syslog. Regardless the protocol, management plane telemetry must address the fo | ||||
llowing requirements:</t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t>Convenient Data Subscription: An application should have the freedom to choos | ||||
e which data is exported (see section 4.3) and the means and frequency of how th | ||||
at data is exported (e.g., on-change or periodic subscription).</t> | ||||
<t>Structured Data: For automatic network operation, machines will replace human | ||||
for network data comprehension. Data modeling languages, such as YANG, can effi | ||||
ciently describe structured data and normalize data encoding and transformation. | ||||
</t> | ||||
<t>High Speed Data Transport: In order to keep up with the velocity of informati | ||||
on, a data source needs to be able to send large amounts of data at high frequen | ||||
cy. Compact encoding formats or data compression schemes are needed to reduce th | ||||
e quantity of data and improve the data transport efficiency. The subscription m | ||||
ode, by replacing the query mode, reduces the interactions between clients and s | ||||
ervers and helps to improve the data source's efficiency.</t> | ||||
<t>Network Congestion Avoidance: The application must protect the network from c | ||||
ongestion by congestion control mechanisms or at least circuit breakers. <xref t | ||||
arget="RFC8084" /> and <xref target="RFC8085" /> provide some solutions in this | ||||
space.</t> | ||||
</list> | ||||
</t> | ||||
</section> | ||||
<section title="Control Plane Telemetry"> | ||||
<t>The control plane telemetry refers to the health condition monitoring of diff | ||||
erent network control protocols at all layers of the protocol stack. Keeping tra | ||||
ck of the operational status of these protocols is beneficial for detecting, loc | ||||
alizing, and even predicting various network issues, as well as network optimiza | ||||
tion, in real-time and with fine granularity. Some particular challenges and iss | ||||
ues faced by the control plane telemetry are as follows: </t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t>One challenging problem for the control plane telemetry is how to correlate t | ||||
he End-to-End (E2E) Key Performance Indicators (KPI) to a specific layer's KPIs. | ||||
For example, IPTV users may describe their User Experience (UE) by the video sm | ||||
oothness and definition. Then in case of an unusually poor UE KPI or a service d | ||||
isconnection, it is non-trivial to delimit and pinpoint the issue in the respons | ||||
ible protocol layer (e.g., the Transport Layer or the Network Layer), the respon | ||||
sible protocol (e.g., ISIS or BGP at the Network Layer), and finally the respons | ||||
ible device(s) with specific reasons. </t> | ||||
<t> Conventional OAM-based approaches for control plane KPI measurement include | ||||
Ping (L3), Traceroute (L3), <xref target="y1731">Y.1731</xref> (L2), and so on. | ||||
One common issue behind these methods is that they only measure the KPIs instead | ||||
of reflecting the actual running status of these protocols, making them less ef | ||||
fective or efficient for control plane troubleshooting and network optimization. | ||||
</t> | ||||
<t> An example of the control plane telemetry is the BGP monitoring protocol (BM | ||||
P). It is currently used for monitoring the BGP routes and enables rich applicat | ||||
ions, such as BGP peer analysis, AS analysis, prefix analysis, and security anal | ||||
ysis. However, the monitoring of other layers, protocols and the cross-layer, cr | ||||
oss-protocol KPI correlations are still in their infancy (e.g., IGP monitoring i | ||||
s not as extensive as BMP), which require further research. </t> | ||||
<t> The requirement and solutions for network congestion avoidance are also appl | ||||
icable to the control plane telemetry. </t> | ||||
</list> | ||||
</t> | ||||
</section> | ||||
<section title="Forwarding Plane Telemetry"> | ||||
<t>An effective forwarding plane telemetry system relies on the data that the ne | ||||
twork device can expose. The quality, quantity, and timeliness of data must meet | ||||
some stringent requirements. This raises some challenges to the network data pl | ||||
ane devices where the first-hand data originates.</t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t>A data plane device's main function is user traffic processing and forwarding | ||||
. While supporting network visibility is important, the telemetry is just an aux | ||||
iliary function, and it should strive to not impede normal traffic processing an | ||||
d forwarding (i.e., the forwarding behavior should not be altered and the trade- | ||||
off between forwarding performance and telemetry should be well-balanced).</t> | ||||
<t>Network operation applications require end-to-end visibility across various s | ||||
ources, which can result in a huge volume of data. However, the sheer quantity o | ||||
f data must not exhaust the network bandwidth, regardless of the data delivery a | ||||
pproach (i.e., whether through in-band or out-of-band channels).</t> | ||||
<t>The data plane devices must provide timely data with the minimum possible del | ||||
ay. Long processing, transport, storage, and analysis delay can impact the effec | ||||
tiveness of the control loop and even render the data useless.</t> | ||||
<t>The data should be structured and labeled, and easy for applications to parse | ||||
and consume. At the same time, the data types needed by applications can vary s | ||||
ignificantly. The data plane devices need to provide enough flexibility and prog | ||||
rammability to support the precise data provision for applications.</t> | ||||
<t>The data plane telemetry should support incremental deployment and work even | ||||
though some devices are unaware of the system.</t> | ||||
<t>The requirement and solutions for network congestion avoidance are also appli | ||||
cable to the forwarding plane telemetry.</t> | ||||
</list> | ||||
</t> | ||||
<t>Although not specific to the forwarding plane, these challenges are more diff | ||||
icult to the forwarding plane because of the limited resource and flexibility. D | ||||
ata plane programmability is essential to support network telemetry. Newer data | ||||
plane forwarding chips are equipped with advanced telemetry features and provide | ||||
flexibility to support customized telemetry functions. </t> | ||||
<t>Technique Taxonomy: concerning about how one instruments the telemetry, there | <t>Note that the interaction with the applications that consume network | |||
can be multiple possible dimensions to classify the forwarding plane telemetry | telemetry data can be indirect. Some in-device data transfer is possible. For ex | |||
techniques.</t> | ample, in the management plane telemetry, the management plane will need to acqu | |||
<t> | ire data from the data plane. Some operational states can only be derived from d | |||
<list style="symbols"> | ata plane data sources such as the interface status and statistics. As another e | |||
<t> Active, Passive, and Hybrid: This dimension concerns about the end-to-end me | xample, obtaining control plane telemetry data may require the ability to access | |||
asurement. Active and passive methods (as well as the hybrid types) are well doc | the Forwarding Information Base (FIB) of the data plane.</t> | |||
umented in <xref target="RFC7799"/>. Passive methods include TCPDUMP, <xref targ | <t>On the other hand, an application may involve more than one plane and | |||
et="RFC7011">IPFIX</xref>, sFlow, and traffic mirroring. These methods usually h | interact with multiple planes simultaneously. For example, an SLA compliance ap | |||
ave low data coverage. The bandwidth cost is very high in order to improve the d | plication may require both the data plane telemetry and the control plane teleme | |||
ata coverage. On the other hand, active methods include Ping, <xref target="RFC4 | try.</t> | |||
656">OWAMP</xref>, <xref target="RFC5357">TWAMP</xref>, <xref target="RFC8762">S | <t>The requirements and challenges for each module are summarized as fol | |||
TAMP</xref>, and <xref target="RFC6812">Cisco's SLA Protocol</xref>. These metho | lows (note that the requirements may pertain across all telemetry modules; howev | |||
ds are intrusive and only provide indirect network measurements. Hybrid methods, | er, we emphasize those that are most pronounced for a particular plane).</t> | |||
including <xref target="I-D.ietf-ippm-ioam-data">in-situ OAM</xref>, <xref targ | <section numbered="true" toc="default"> | |||
et="RFC8321">Alternate-Marking (AM)</xref>, and <xref target="RFC8889">Multipoin | <name>Management Plane Telemetry</name> | |||
t Alternate Marking</xref>, provide a well-balanced and more flexible approach. | <t>The management plane of network elements interacts with the Network | |||
However, these methods are also more complex to implement.</t> | Management System (NMS) and provides information such as performance data, netw | |||
<t> In-Band and Out-of-Band: Telemetry data carried in user packets before being | ork logging data, network warning and defects data, and network statistics and s | |||
exported to a data collector is considered in-band (e.g., <xref target="I-D.iet | tate data. The management plane includes many protocols, including the classical | |||
f-ippm-ioam-data">in-situ OAM</xref>). Telemetry data that is directly exported | SNMP and syslog. Regardless the protocol, management plane telemetry must addre | |||
to a data collector without modifying user packets is considered out-of-band (e. | ss the following requirements:</t> | |||
g., the postcard-based approach described in <xref target="pbt" />). It is also | <ul spacing="normal"> | |||
possible to have hybrid methods, where only the telemetry instruction or partial | <li>Convenient Data Subscription: An application should have the fre | |||
data is carried by user packets (e.g., <xref target="RFC8321">AM</xref>). </t> | edom to choose which data is exported (see <xref target="sec_type" format="defau | |||
<t> End-to-End and In-Network: End-to-End methods start from, and end at, the ne | lt"/>) and the means and frequency of how that data is exported (e.g., on-change | |||
twork end hosts (e.g., Ping). In-Network methods work in networks and are transp | or periodic subscription).</li> | |||
arent to end hosts. However, if needed, In-Network methods can be easily extende | <li>Structured Data: For automatic network operation, machines will | |||
d into end hosts. </t> | replace humans for network data comprehension. Data modeling languages, such as | |||
<t> Data Subject: Depending on the telemetry objective, the methods can be flow- | YANG, can efficiently describe structured data and normalize data encoding and t | |||
based (e.g., <xref target="I-D.ietf-ippm-ioam-data">in-situ OAM</xref>), path-ba | ransformation.</li> | |||
sed (e.g., Traceroute), and node-based (e.g., <xref target="RFC7011">IPFIX</xref | <li>High-Speed Data Transport: In order to keep up with the velocity | |||
>). The various data objects can be packet, flow record, measurement, states, an | of information, a data source needs to be able to send large amounts of data at | |||
d signal.</t> | high frequency. Compact encoding formats or data compression schemes are needed | |||
</list> | to reduce the quantity of data and improve the data transport efficiency. The s | |||
</t> | ubscription mode, by replacing the query mode, reduces the interactions between | |||
</section> | clients and servers and helps to improve the data source's efficiency.</li> | |||
<section title="External Data Telemetry"> | ||||
<t>Events that occur outside the boundaries of the network system are another im | <li>Network Congestion Avoidance: The application must protect the | |||
portant source of network telemetry. Correlating both internal telemetry data an | network from congestion with congestion control mechanisms or, | |||
d external events with the requirements of network systems, as presented in <xre | at minimum, with circuit breakers. <xref target="RFC8084" format="default"/> | |||
f target="I-D.pedro-nmrg-anticipated-adaptation"/>, provides a strategic and fun | and <xref target="RFC8085" format="default"/> provide some solutions in this spa | |||
ctional advantage to management operations. </t> | ce.</li> | |||
<t>As with other sources of telemetry information, the data and events must meet | </ul> | |||
strict requirements, especially in terms of timeliness, which is essential to p | </section> | |||
roperly incorporate external event information into network management applicati | <section numbered="true" toc="default"> | |||
ons. The specific challenges are described as follows:</t> | <name>Control Plane Telemetry</name> | |||
<t> | <t>The control plane telemetry refers to the health condition monitori | |||
<list style="symbols"> | ng of different network control protocols at all layers of the protocol stack. K | |||
<t>The role of the external event detector can be played by multiple elements, i | eeping track of the operational status of these protocols is beneficial for dete | |||
ncluding hardware (e.g., physical sensors, such as seismometers) and software (e | cting, localizing, and even predicting various network issues, as well as for ne | |||
.g., Big Data sources that can analyze streams of information, such as Twitter m | twork optimization, in real time and with fine granularity. Some particular chal | |||
essages). Thus, the transmitted data must support different shapes but, at the s | lenges and issues faced by the control plane telemetry are as follows: </t> | |||
ame time, follow a common but extensible schema. </t> | ||||
<t>Since the main function of the external event detectors is to perform the not | <ul spacing="normal"> | |||
ifications, their timeliness is assumed. However, once messages have been dispat | <li>How to correlate the End-to-End (E2E) Key Performance Indicators | |||
ched, they must be quickly collected and inserted into the control plane with va | (KPIs) to a specific layer's KPIs. For example, IPTV users may describe their U | |||
riable priority, which is higher for important sources and events and lower for | E by the video smoothness and definition. Then in case of an unusually poor UE K | |||
secondary ones. </t> | PI or a service disconnection, it is non-trivial to delimit and pinpoint the iss | |||
<t>The schema used by external detectors must be easily adopted by current and f | ue in the responsible protocol layer (e.g., the transport layer or the network l | |||
uture devices and applications. Therefore, it must be easily mapped to current d | ayer), the responsible protocol (e.g., IS-IS or BGP at the network layer), and f | |||
ata models, such as in terms of YANG. </t> | inally the responsible device(s) with specific reasons. </li> | |||
<t>As the communication with external entities outside the boundary of a provide | <li> Conventional OAM-based approaches for control plane KPI measure | |||
r network may be realized over the Internet, the risk of congestion is even more | ment, which include Ping (L3), Traceroute (L3), <xref target="y1731" format="def | |||
relevant in this context and proper counter-measures must be taken. Solutions s | ault">Y.1731</xref> (L2), and so on. One common issue behind these methods is th | |||
uch as network transport circuit breakers are needed as well.</t> | at they only measure the KPIs instead of reflecting the actual running status of | |||
</list> | these protocols, making them less effective or efficient for control plane trou | |||
</t> | bleshooting and network optimization. </li> | |||
<t>Organizing both internal and external telemetry information together will be | <li> How more research is needed for the BGP monitoring protocol (BM | |||
key for the general exploitation of the management possibilities of current and | P). BMP is an example of the control plane telemetry; it is currently used for m | |||
future network systems, as reflected in the incorporation of cognitive capabilit | onitoring BGP routes and enables rich applications, such as BGP peer analysis, A | |||
ies to new hardware and software (virtual) elements. </t> | utonomous System (AS) analysis, prefix analysis, and security analysis. However, | |||
</section> | the monitoring of other layers, protocols, and the cross-layer, cross-protocol | |||
</section> | KPI correlations are still in their infancy (e.g., IGP monitoring is not as exte | |||
<section title="Second Level Function Components"> | nsive as BMP), which requires further research. </li> | |||
<t>The telemetry module at each plane can be further partitioned into five disti | </ul> | |||
nct conceptual components:</t> | <t> Note that the requirement and solutions for network congest | |||
<t> | ion avoidance are also applicable to the control plane telemetry. </t> | |||
<list style="symbols"> | </section> | |||
<t> Data Query, Analysis, and Storage: This component works at the network opera | <section numbered="true" toc="default"> | |||
tion application block in <xref target="figure_1"/>. It is normally a part of th | <name>Forwarding Plane Telemetry</name> | |||
e network management system at the receiver side. On the one hand, it is respons | <t>An effective forwarding plane telemetry system relies on the data t | |||
ible for issuing data requirements. The data of interest can be modeled data thr | hat the network device can expose. The quality, quantity, and timeliness of data | |||
ough configuration or custom data through programming. The data requirements can | must meet some stringent requirements. This raises some challenges for the netw | |||
be queries for one-shot data or subscriptions for events or streaming data. On | ork data plane devices where the first-hand data originates.</t> | |||
the other hand, it receives, stores, and processes the returned data from networ | <ul spacing="normal"> | |||
k devices. Data analysis can be interactive to initiate further data queries. Th | <li>A data plane device's main function is user traffic processing a | |||
is component can reside in either network devices or remote controllers. It can | nd forwarding. While supporting network visibility is important, the telemetry i | |||
be centralized and distributed, and involve one or more instances.</t> | s just an auxiliary function, and it should strive to not impede normal traffic | |||
<t> Data Configuration and Subscription: This component manages data queries on | processing and forwarding (i.e., the forwarding behavior should not be altered, | |||
devices. It determines the protocol and channel for applications to acquire desi | and the trade-off between forwarding performance and telemetry should be well-ba | |||
red data. This component is also responsible for configuring the desired data th | lanced).</li> | |||
at might not be directly available from data sources. The subscription data can | <li>Network operation applications require end-to-end visibility acr | |||
be described by models, templates, or programs. </t> | oss various sources, which can result in a huge volume of data. However, the she | |||
<t> Data Encoding and Export: This component determines how telemetry data is de | er quantity of data must not exhaust the network bandwidth, regardless of the da | |||
livered to the data analysis and storage component with access control. The data | ta delivery approach (i.e., whether through in-band or out-of-band channels).</l | |||
encoding and the transport protocol may vary due to the data export location.</ | i> | |||
t> | <li>The data plane devices must provide timely data with the minimum | |||
<t> Data Generation and Processing: The requested data needs to be captured, fil | possible delay. Long processing, transport, storage, and analysis delay can imp | |||
tered, processed, and formatted in network devices from raw data sources. This m | act the effectiveness of the control loop and even render the data useless.</li> | |||
ay involve in-network computing and processing on either the fast path or the sl | <li>The data should be structured, labeled, and easy for application | |||
ow path in network devices.</t> | s to parse and consume. At the same time, the data types needed by applications | |||
<t> Data Object and Source: This component determines the monitoring objects and | can vary significantly. The data plane devices need to provide enough flexibilit | |||
original data sources provisioned in the device. A data source usually just pro | y and programmability to support the precise data provision for applications.</l | |||
vides raw data which needs further processing. Each data source can be considere | i> | |||
d a probe. Some data sources can be dynamically installed, while others will be | <li>The data plane telemetry should support incremental deployment a | |||
more static.</t> | nd work even though some devices are unaware of the system.</li> | |||
</list> | <li>The requirement and solutions for network congestion avoidance a | |||
</t> | re also applicable to the forwarding plane telemetry.</li> | |||
<t> | </ul> | |||
<figure anchor="figure_3" title="Components in the Network Telemetry Framework"> | <t>Although not specific to the forwarding plane, these challenges are | |||
<artwork><![CDATA[ | more difficult for the forwarding plane because of the limited resources and fl | |||
exibility. Data plane programmability is essential to support network telemetry. | ||||
Newer data plane forwarding chips are equipped with advanced telemetry features | ||||
and provide flexibility to support customized telemetry functions. </t> | ||||
<t>Technique Taxonomy: This pertains to how one instruments the teleme | ||||
try; there can be multiple possible dimensions to classify the forwarding plane | ||||
telemetry techniques.</t> | ||||
<ul spacing="normal"> | ||||
<li> Active, Passive, and Hybrid: This dimension pertains to the end | ||||
-to-end measurement. Active and passive methods (as well as the hybrid types) ar | ||||
e well documented in <xref target="RFC7799" format="default"/>. Passive methods | ||||
include TCPDUMP, <xref target="RFC7011" format="default">IPFIX</xref>, sFlow, an | ||||
d traffic mirroring. These methods usually have low data coverage. The bandwidth | ||||
cost is very high in order to improve the data coverage. On the other hand, act | ||||
ive methods include Ping, the <xref target="RFC4656" format="default">One-Way Ac | ||||
tive Measurement Protocol (OWAMP)</xref>, the <xref target="RFC5357" format="def | ||||
ault">Two-Way Active Measurement Protocol (TWAMP)</xref>, the <xref target="RFC8 | ||||
762" format="default">Simple Two-way Active Measurement Protocol (STAMP)</xref>, | ||||
and <xref target="RFC6812" format="default">Cisco's SLA Protocol</xref>. These | ||||
methods are intrusive and only provide indirect network measurements. Hybrid met | ||||
hods, including <xref target="RFC9197" format="default">IOAM</xref>, <xref targe | ||||
t="RFC8321" format="default">Alternate Marking (AM)</xref>, and <xref target="RF | ||||
C8889" format="default">Multipoint Alternate Marking</xref>, provide a well-bala | ||||
nced and more flexible approach. However, these methods are also more complex to | ||||
implement.</li> | ||||
<li> In-Band and Out-of-Band: Telemetry data carried in user packets | ||||
before being exported to a data collector is considered in-band (e.g., <xref ta | ||||
rget="RFC9197" format="default">IOAM</xref>). Telemetry data that is directly ex | ||||
ported to a data collector without modifying user packets is considered out-of-b | ||||
and (e.g., the postcard-based approach described in <xref target="pbt" format="d | ||||
efault"/>). It is also possible to have hybrid methods, where only the telemetry | ||||
instruction or partial data is carried by user packets (e.g., <xref target="RFC | ||||
8321" format="default">AM</xref>). </li> | ||||
<li> End-to-End and In-Network: End-to-end methods start from, and e | ||||
nd at, the network end hosts (e.g., Ping). In-network methods work in networks a | ||||
nd are transparent to end hosts. However, if needed, in-network methods can be e | ||||
asily extended into end hosts. </li> | ||||
<li> Data Subject: Depending on the telemetry objective, the methods | ||||
can be flow based (e.g., <xref target="RFC9197" format="default">IOAM</xref>), | ||||
path based (e.g., Traceroute), and node based (e.g., <xref target="RFC7011" form | ||||
at="default">IPFIX</xref>). The various data objects can be packet, flow record, | ||||
measurement, states, and signal.</li> | ||||
</ul> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>External Data Telemetry</name> | ||||
<t>Events that occur outside the boundaries of the network system are | ||||
another important source of network telemetry. Correlating both internal telemet | ||||
ry data and external events with the requirements of network systems, as present | ||||
ed in <xref target="NMRG-ANTICIPATED-ADAPTATION" format="default"/>, provides a | ||||
strategic and functional advantage to management operations. </t> | ||||
<t>As with other sources of telemetry information, the data and events | ||||
must meet strict requirements, especially in terms of timeliness, which is esse | ||||
ntial to properly incorporate external event information into network management | ||||
applications. The specific challenges are described as follows:</t> | ||||
<ul spacing="normal"> | ||||
<li>The role of the external event detector can be played by multipl | ||||
e elements, including hardware (e.g., physical sensors, such as seismometers) an | ||||
d software (e.g., big data sources that can analyze streams of information, such | ||||
as Twitter messages). Thus, the transmitted data must support different shapes | ||||
but, at the same time, follow a common but extensible schema. </li> | ||||
<li>Since the main function of the external event detectors is to pe | ||||
rform the notifications, their timeliness is assumed. However, once messages hav | ||||
e been dispatched, they must be quickly collected and inserted into the control | ||||
plane with variable priority, which is higher for important sources and events a | ||||
nd lower for secondary ones. </li> | ||||
<li>The schema used by external detectors must be easily adopted by | ||||
current and future devices and applications. Therefore, it must be easily mapped | ||||
to current data models, such as in terms of YANG. </li> | ||||
<li>As the communication with external entities outside the boundary | ||||
of a provider network may be realized over the Internet, the risk of congestion | ||||
is even more relevant in this context and proper countermeasures must be taken. | ||||
Solutions such as network transport circuit breakers are needed as well.</li> | ||||
</ul> | ||||
<t>Organizing both internal and external telemetry information togethe | ||||
r will be key for the general exploitation of the management possibilities of cu | ||||
rrent and future network systems, as reflected in the incorporation of cognitive | ||||
capabilities to new hardware and software (virtual) elements. </t> | ||||
</section> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Second-Level Function Components</name> | ||||
<t>The telemetry module at each plane can be further partitioned into fi | ||||
ve distinct conceptual components:</t> | ||||
<ul spacing="normal"> | ||||
<li> Data Query, Analysis, and Storage: This component works at the ne | ||||
twork operation application block in <xref target="figure_1" format="default"/>. | ||||
It is normally a part of the network management system at the receiver side. On | ||||
one hand, it is responsible for issuing data requirements. The data of interest | ||||
can be modeled data through configuration or custom data through programming. T | ||||
he data requirements can be queries for one-shot data or subscriptions for event | ||||
s or streaming data. On the other hand, it receives, stores, and processes the r | ||||
eturned data from network devices. Data analysis can be interactive to initiate | ||||
further data queries. This component can reside in either network devices or rem | ||||
ote controllers. It can be centralized and distributed and involve one or more i | ||||
nstances.</li> | ||||
<li> Data Configuration and Subscription: This component manages data | ||||
queries on devices. It determines the protocol and channel for applications to a | ||||
cquire desired data. This component is also responsible for configuring the desi | ||||
red data that might not be directly available from data sources. The subscriptio | ||||
n data can be described by models, templates, or programs. </li> | ||||
<li> Data Encoding and Export: This component determines how telemetry | ||||
data is delivered to the data analysis and storage component with access contro | ||||
l. The data encoding and the transport protocol may vary due to the data export | ||||
location.</li> | ||||
<li> Data Generation and Processing: The requested data needs to be ca | ||||
ptured, filtered, processed, and formatted in network devices from raw data sour | ||||
ces. This may involve in-network computing and processing on either the fast pat | ||||
h or the slow path in network devices.</li> | ||||
<li> Data Object and Source: This component determines the monitoring | ||||
objects and original data sources provisioned in the device. A data source usual | ||||
ly just provides raw data that needs further processing. Each data source can be | ||||
considered a probe. Some data sources can be dynamically installed, while other | ||||
s will be more static.</li> | ||||
</ul> | ||||
<figure anchor="figure_3"> | ||||
<name>Components in the Network Telemetry Framework</name> | ||||
<artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
+----------------------------------------+ | +----------------------------------------+ | |||
+----------------------------------------+ | | +----------------------------------------+ | | |||
| | | | | | | | |||
| Data Query, Analysis, & Storage | | | | Data Query, Analysis, & Storage | | | |||
| | + | | | + | |||
+-------+++ -----------------------------+ | +-------+++ -----------------------------+ | |||
||| ^^^ | ||| ^^^ | |||
||| ||| | ||| ||| | |||
||V ||| | ||V ||| | |||
+--+V--------------------+++------------+ | +--+V--------------------+++------------+ | |||
+-----V---------------------+------------+ | | +-----V---------------------+------------+ | | |||
+---------------------+-------+----------+ | | | +---------------------+-------+----------+ | | | |||
| Data Configuration | | | | | | Data Configuration | | | | | |||
| & Subscription | Data Encoding | | | | | & Subscription | Data Encoding | | | | |||
| (model, template, | & Export | | | | | (model, template, | & Export | | | | |||
| & program) | | | | | | & program) | | | | | |||
+---------------------+------------------| | | | +---------------------+------------------| | | | |||
| | | | | | | | | | |||
| Data Generation | | | | | Data Generation | | | | |||
| & Processing | | | | | & Processing | | | | |||
| | | | | | | | | | |||
+----------------------------------------| | | | +----------------------------------------| | | | |||
| | | | | | | | | | |||
| Data Object and Source | |-+ | | Data Object and Source | |-+ | |||
| |-+ | | |-+ | |||
+----------------------------------------+ | +----------------------------------------+ | |||
]]></artwork> | ||||
]]> | </figure> | |||
</artwork> | </section> | |||
</figure> | <section anchor="sec_type" numbered="true" toc="default"> | |||
</t> | <name>Data Acquisition Mechanism and Type Abstraction</name> | |||
</section> | <t>Broadly speaking, network data can be acquired through subscription ( | |||
<section anchor="sec:type" title="Data Acquisition Mechanism and Type Abstractio | push) and query (poll). A subscription is a contract between publisher and subsc | |||
n"> | riber. After initial setup, the subscribed data is automatically delivered to re | |||
<t>Broadly speaking, network data can be acquired through subscription (push) an | gistered subscribers until the subscription expires. | |||
d query (poll). A subscription is a contract between publisher and subscriber. A | There are two variations of subscription. The subscriptions can be predef | |||
fter initial setup, the subscribed data is automatically delivered to registered | ined, or the subscribers are allowed to configure and tailor the published data | |||
subscribers until the subscription expires. | to their specific needs.</t> | |||
There are two variations of subscription. The subscriptions can be either pre-de | <t>In contrast, queries are used when a client expects immediate and one | |||
fined, or the subscribers are allowed to configure and tailor the published data | -off feedback from network devices. The queried data may be directly extracted f | |||
to their specific needs.</t> | rom some specific data source or synthesized and processed from raw data. Querie | |||
<t>In contrast, queries are used when a client expects immediate and one-off fee | s work well for interactive network telemetry applications. </t> | |||
dback from network devices. The queried data may be directly extracted from some | <t>In general, data can be pulled (i.e., queried) whenever needed, but i | |||
specific data source, or synthesized and processed from raw data. Queries work | n many cases, pushing the data (i.e., subscription) is more efficient, and it ca | |||
well for interactive network telemetry applications. </t> | n reduce the latency of a client detecting a change. From the data consumer poin | |||
<t>In general, data can be pulled (i.e., queried) whenever needed, but in many c | t of view, there are four types of data from network devices that a telemetry da | |||
ases, pushing the data (i.e., subscription) is more efficient, and can reduce th | ta consumer can subscribe or query:</t> | |||
e latency of a client detecting a change. From the data consumer point of view, | <ul spacing="normal"> | |||
there are four types of data from network devices that a telemetry data consumer | <li> Simple Data: Data that are steadily available from some datastore | |||
can subscribe or query:</t> | or static probes in network devices.</li> | |||
<t> | <li> Derived Data: Data that need to be synthesized or processed in th | |||
<list style="symbols"> | e network from raw data from one or more network devices. The data processing fu | |||
<t> Simple Data: The data that are steadily available from some datastore or sta | nction can be statically or dynamically loaded into network devices.</li> | |||
tic probes in network devices.</t> | <li> Event-triggered Data: Data that are conditionally acquired based | |||
<t> Derived Data: The data need to be synthesized or processed in network from r | on the occurrence of some events. An example of event-triggered data could be an | |||
aw data from one or more network devices. The data processing function can be st | interface changing operational state between up and down. Such data can be acti | |||
atically or dynamically loaded into network devices.</t> | vely pushed through subscription or passively polled through query. There are ma | |||
<t> Event-triggered Data: The data are conditionally acquired based on the occur | ny ways to model events, including using Finite State Machine (FSM) or <xref tar | |||
rence of some events. An example of event-triggered data could be an interface c | get="I-D.ietf-netmod-eca-policy" format="default">Event Condition Action (ECA)</ | |||
hanging operational state between up and down. Such data can be actively pushed | xref>. </li> | |||
through subscription or passively polled through query. There are many ways to m | <li> Streaming Data: Data that are continuously generated. It can be a | |||
odel events, including using Finite State Machine (FSM) or <xref target="I-D.wwx | time series or the dump of databases. For example, an interface packet counter | |||
-netmod-event-yang">Event Condition Action (ECA)</xref>. </t> | is exported every second. The streaming data reflect real-time network states an | |||
<t> Streaming Data: The data are continuously generated. It can be time series o | d metrics and require large bandwidth and processing power. The streaming data a | |||
r the dump of databases. For example, an interface packet counter is exported ev | re always actively pushed to the subscribers.</li> | |||
ery second. The streaming data reflect realtime network states and metrics and r | </ul> | |||
equire large bandwidth and processing power. The streaming data are always activ | <t>The above telemetry data types are not mutually exclusive. Rather, th | |||
ely pushed to the subscribers.</t> | ey are often composite. Derived data is composed of simple data; event-triggered | |||
</list> | data can be simple or derived; and streaming data can be based on some recurrin | |||
</t> | g event. The relationships of these data types are illustrated in <xref target=" | |||
<t>The above telemetry data types are not mutually exclusive. Rather, they are o | figure_0" format="default"/>. </t> | |||
ften composite. Derived data is composed of simple data; Event-triggered data ca | <figure anchor="figure_0"> | |||
n be simple or derived; streaming data can be based on some recurring event. The | <name>Data Type Relationship</name> | |||
relationships of these data types are illustrated in <xref target="figure_0"/>. | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
</t> | ||||
<t> | ||||
<figure anchor="figure_0" title="Data Type Relationship"> | ||||
<artwork><![CDATA[ | ||||
+----------------------+ +-----------------+ | +----------------------+ +-----------------+ | |||
| Event-triggered Data |<----+ Streaming Data | | | Event-Triggered Data |<----+ Streaming Data | | |||
+-------+---+----------+ +-----+---+-------+ | +-------+---+----------+ +-----+---+-------+ | |||
| | | | | | | | | | |||
| | | | | | | | | | |||
| | +--------------+ | | | | | +--------------+ | | | |||
| +-->| Derived Data |<--+ | | | +-->| Derived Data |<--+ | | |||
| +------+-------+ | | | +------+-------+ | | |||
| | | | | | | | |||
| V | | | V | | |||
| +--------------+ | | | +--------------+ | | |||
+------>| Simple Data |<------+ | +------>| Simple Data |<------+ | |||
+--------------+ | +--------------+ | |||
]]> | ]]></artwork> | |||
</artwork> | </figure> | |||
</figure> | <t>Subscription usually deals with event-triggered data and streaming da | |||
</t> | ta, and query usually deals with simple data and derived data. But the other way | |||
<t>Subscription usually deals with event-triggered data and streaming data, and | s are also possible. Advanced network telemetry techniques are designed mainly f | |||
query usually deals with simple data and derived data. But the other ways are al | or event-triggered or streaming data subscription and derived data query.</t> | |||
so possible. Advanced network telemetry techniques are designed mainly for event | </section> | |||
-triggered or streaming data subscription, and derived data query.</t> | <section numbered="true" toc="default"> | |||
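The composition of the four data types described above can be illustrated with a minimal Python sketch. It is not part of the framework itself: the names Sample, packet_rate, rate_event, and stream_rates, the packets-per-second metric, and the threshold value are all assumptions chosen only for illustration. The sketch treats an interface packet counter as simple data, a rate computed from two counters as derived data, a threshold crossing as event-triggered data, and a generator driven by each newly arriving sample (the recurring event) as streaming data.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Sample:
    """Simple data: one raw reading of an interface packet counter."""
    timestamp: float  # seconds (hypothetical unit)
    packets: int      # cumulative packet count


def packet_rate(prev: Sample, curr: Sample) -> float:
    """Derived data: packets per second composed from two simple samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed > 0:
        return (curr.packets - prev.packets) / elapsed
    raise ValueError("samples must be in increasing time order")


def rate_event(rate: float, threshold: float) -> Optional[str]:
    """Event-triggered data: produced only when the derived rate
    exceeds the (hypothetical) threshold; otherwise nothing is emitted."""
    if rate > threshold:
        return f"rate {rate:.1f} pps exceeds threshold {threshold:.1f} pps"
    return None


def stream_rates(samples: Iterable[Sample]) -> Iterator[float]:
    """Streaming data: each newly arriving sample is the recurring event
    that causes one more derived value to be pushed to the consumer."""
    prev: Optional[Sample] = None
    for curr in samples:
        if prev is not None:
            yield packet_rate(prev, curr)
        prev = curr


if __name__ == "__main__":
    readings = [Sample(0.0, 0), Sample(1.0, 100), Sample(2.0, 350)]
    for rate in stream_rates(readings):
        event = rate_event(rate, threshold=200.0)
        print(rate, event if event is not None else "no event")
```

In this sketch a subscriber would consume stream_rates and rate_event, while a query would call packet_rate on demand, which mirrors the usual pairing of subscription and query with the data types.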
</section> | <name>Mapping Existing Mechanisms into the Framework</name> | |||
<section title="Mapping Existing Mechanisms into the Framework"> | <t>The following table shows how the existing mechanisms (mainly publish | |||
<t>The following table shows how the existing mechanisms (mainly published in IE | ed in IETF and with the emphasis on the latest new technologies) are positioned | |||
TF and with the emphasis on the latest new technologies) are positioned in the f | in the framework. Given the vast body of existing work, we cannot provide an exh | |||
ramework. Given the vast body of existing work, we cannot provide an exhaustive | austive list, so the mechanisms in the tables should be considered as just examp | |||
list, so the mechanisms in the tables should be considered as just examples. Als | les. Also, some comprehensive protocols and techniques may cover multiple aspect | |||
o, some comprehensive protocols and techniques may cover multiple aspects or mod | s or modules of the framework, so a name in a block only emphasizes one particul | |||
ules of the framework, so a name in a block only emphasizes one particular chara | ar characteristic of it. More details about some listed mechanisms can be found | |||
cteristic of it. More details about some listed mechanisms can be found in Appen | in Appendix A.</t> | |||
dix A.</t> | ||||
<t> | ||||
<figure anchor="figure_5" title="Existing Work Mapping"> | ||||
<artwork><![CDATA[ | ||||
+-------------+-----------------+---------------+--------------+ | ||||
| | Management | Control | Forwarding | | ||||
| | Plane | Plane | Plane | | ||||
+-------------+-----------------+---------------+--------------+ | ||||
| data config.| gNMI, NETCONF, | gNMI, NETCONF,| NETCONF, | | ||||
| & subscribe | RESTCONF, SNMP, | RESTCONF, | RESTCONF, | | ||||
| | YANG-Push | YANG-Push | YANG-Push | | ||||
+-------------+-----------------+---------------+--------------+ | ||||
| data gen. & | MIB, | YANG | IOAM, PSAMP | | ||||
| process | YANG | | PBT, AM, | | ||||
+-------------+-----------------+---------------+--------------+ | ||||
| data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | ||||
| & export | | | | | ||||
+-------------+-----------------+---------------+--------------+ | ||||
]]> | <table anchor="table_2"> | |||
</artwork> | <name>Existing Work Mapping</name> | |||
</figure> | <thead> | |||
</t> | <tr> | |||
<th></th> | ||||
<th>Management Plane</th> | ||||
<th>Control Plane</th> | ||||
<th>Forwarding Plane</th> | ||||
</tr> | ||||
</thead> | ||||
<tbody> | ||||
<tr> | ||||
<td>data configuration and subscribe</td> | ||||
<td>gNMI, NETCONF, RESTCONF, SNMP, YANG-Push</td> | ||||
<td>gNMI, NETCONF, RESTCONF, YANG-Push</td> | ||||
<td>NETCONF, RESTCONF, YANG-Push</td> | ||||
</tr> | ||||
<tr> | ||||
<td>data generation and process</td> | ||||
<td>MIB, YANG</td> | ||||
<td>YANG</td> | ||||
<td>IOAM, PSAMP, PBT, AM</td> | ||||
</tr> | ||||
<tr> | ||||
<td>data encoding and export</td> | ||||
<td>gRPC, HTTP, TCP</td> | ||||
<td>BMP, TCP</td> | ||||
<td>IPFIX, UDP</td> | ||||
</tr> | ||||
</tbody> | ||||
</table> | ||||
<t>Although the framework is generally suitable for any network environm | ||||
ents, the multi-domain telemetry has some unique challenges that deserve further | ||||
architectural consideration, which is out of the scope of this document.</t> | ||||
</section> | ||||
</section> | ||||
<section anchor="level" numbered="true" toc="default"> | ||||
<name>Evolution of Network Telemetry Applications</name> | ||||
<t>Network telemetry is an evolving technical area. As the network moves t | ||||
owards the automated operation, network telemetry applications undergo several s | ||||
tages of evolution, which add a new layer of requirements to the underlying netw | ||||
ork telemetry techniques. Each stage is built upon the techniques adopted by the | ||||
previous stages plus some new requirements.</t> | ||||
<dl newline="false" spacing="normal"> | ||||
<dt>Stage 0 - Static Telemetry:</dt> | ||||
<dd> The telemetry data source and type are determined at design time. T | ||||
he network operator can only configure how to use it with limited flexibility. < | ||||
/dd> | ||||
<dt>Stage 1 - Dynamic Telemetry:</dt> | ||||
<dd> The custom telemetry data can be dynamically programmed or configur | ||||
ed at runtime without interrupting the network operation, allowing a trade-off a | ||||
mong resource, performance, flexibility, and coverage.</dd> | ||||
<dt>Stage 2 - Interactive Telemetry:</dt> | ||||
<dd> The network operator can continuously customize and fine-tune the t | |||
elemetry data in real time to reflect the network operation's visibility require | ||||
ments. Compared with Stage 1, the changes are frequent based on the real-time fe | ||||
edback. At this stage, some tasks can be automated, but human operators still ne | ||||
ed to sit in the middle to make decisions. </dd> | ||||
<dt>Stage 3 - Closed-Loop Telemetry:</dt> | ||||
<dd> The telemetry is free from the interference of human operators, exc | ||||
ept for generating the reports. The intelligent network operation engine automat | ||||
ically issues the telemetry data requests, analyzes the data, and updates the ne | ||||
twork operations in closed control loops. </dd> | ||||
</dl> | ||||
<t>Existing technologies are ready for Stages 0 and 1. Individual applicat | ||||
ions for Stages 2 and 3 are also possible now. However, the future autonomic net | ||||
works may need a comprehensive operation management system that works at Stages | ||||
2 and 3 to cover all the network operation tasks. A well-defined network telemet | ||||
ry framework is the first step towards this direction. </t> | ||||
</section> | ||||
<section anchor="Security" numbered="true" toc="default"> | ||||
<name>Security Considerations</name> | ||||
<t>The complexity of network telemetry raises significant security implica | ||||
tions. For example, telemetry data can be manipulated to exhaust various network | ||||
resources at each plane as well as the data consumer; falsified or tampered dat | ||||
a can mislead the decision-making process and paralyze networks; and wrong confi | ||||
guration and programming for telemetry is equally harmful. The telemetry data is | ||||
highly sensitive, which exposes a lot of information about the network and its | ||||
configuration. Some of that information can make designing attacks against the n | ||||
etwork much easier (e.g., exact details of what software and patches have been i | ||||
nstalled) and allows an attacker to determine whether a device may be subject to | ||||
unprotected security vulnerabilities.</t> | ||||
<t>Although the framework is generally suitable for any network environments, th | <t>Given that this document has proposed a framework for network telemetry | |||
e multi-domain telemetry has some unique challenges which deserve further archit | and the telemetry mechanisms discussed are more extensive (in both message freq | |||
ectural consideration, which is out of the scope of this document.</t> | uency and traffic amount) than the conventional network OAM concepts, we must al | |||
so anticipate that new security considerations may also arise. A number of | |||
techniques already exist for securing the forwarding plane, control plane, and m | ||||
anagement plane in a network, but it is important to consider if any new threat | ||||
vectors are now being enabled via the use of network telemetry procedures and me | ||||
chanisms. </t> | ||||
<t>This document proposes a conceptual architecture for collecting, trans | |||
porting, and analyzing a wide variety of data sources in support of network appl | ||||
ications. The protocols, data formats, and configurations chosen to implement th | ||||
is framework will dictate the specific security considerations. These considerat | ||||
ions may include:</t> | ||||
<ul spacing="normal"> | ||||
<li>Telemetry framework trust and policy models;</li> | ||||
<li>Role management and access control for enabling and disabling teleme | ||||
try capabilities;</li> | ||||
<li>Protocol transport used for telemetry data and its inherent security | ||||
capabilities;</li> | ||||
<li>Telemetry data stores, storage encryption, methods of access, and re | ||||
tention practices;</li> | ||||
<li>Tracking telemetry events and any abnormalities that might identify | ||||
malicious attacks using telemetry interfaces.</li> | ||||
<li>Authentication and integrity protection of telemetry data to make da | ||||
ta more trustworthy; and </li> | ||||
<li>Segregating the telemetry data traffic from the data traffic carried | ||||
over the network (e.g., historically management access and management data may | ||||
be carried via an independent management network).</li> | ||||
</ul> | ||||
<t>Some security considerations highlighted above may be minimized or nega | ||||
ted with policy management of network telemetry. In a network telemetry deployme | ||||
nt, it would be advantageous to separate telemetry capabilities into different c | ||||
lasses of policies, i.e., Role-Based Access Control and Event-Condition-Action p | ||||
olicies. Also, potential conflicts between network telemetry mechanisms must be | ||||
detected accurately and resolved quickly to avoid unnecessary network telemetry | ||||
traffic propagation escalating into an unintended or intended denial-of-service | ||||
attack.</t> | ||||
<t>Further study of the security issues will be required, and it is expect | ||||
ed that the security mechanisms and protocols are developed and deployed along w | ||||
ith a network telemetry system.</t> | ||||
</section> | ||||
<section anchor="IANA" numbered="true" toc="default"> | ||||
<name>IANA Considerations</name> | ||||
<t>This document has no IANA actions.</t> | ||||
</section> | ||||
</section> | </middle> | |||
</section> | <back> | |||
<section anchor="level" title="Evolution of Network Telemetry Applications"> | ||||
<t>Network telemetry is an evolving technical area. As the network moves towards | ||||
the automated operation, network telemetry applications undergo several stages | ||||
of evolution, which add a new layer of requirements to the underlying network telem | |||
etry techniques. Each stage is built upon the techniques adopted by the previous | ||||
stages plus some new requirements.</t> | ||||
<t> | ||||
<list style="hanging"> | ||||
<t hangText="Stage 0 - Static Telemetry:"> The telemetry data source and type ar | ||||
e determined at design time. The network operator can only configure how to use | ||||
it with limited flexibility. </t> | ||||
<t hangText="Stage 1 - Dynamic Telemetry:"> The custom telemetry data can be dyn | ||||
amically programmed or configured at runtime without interrupting the network op | ||||
eration, allowing a trade-off among resource, performance, flexibility, and cove | ||||
rage. </t> | ||||
<t hangText="Stage 2 - Interactive Telemetry:"> The network operator can continu | ||||
ously customize and fine-tune the telemetry data in real time to reflect the net | |||
work operation's visibility requirements. Compared with Stage 1, the changes are | ||||
frequent based on the real-time feedback. At this stage, some tasks can be auto | ||||
mated, but human operators still need to sit in the middle to make decisions. </ | ||||
t> | ||||
<t hangText="Stage 3 - Closed-loop Telemetry:"> The telemetry is free from the i | ||||
nterference of human operators, except for generating the reports. The intellige | ||||
nt network operation engine automatically issues the telemetry data requests, an | ||||
alyzes the data, and updates the network operations in closed control loops. </t | ||||
> | ||||
</list> | ||||
</t> | ||||
<t>Existing technologies are ready for stage 0 and stage 1. Individual stage 2 a | ||||
nd stage 3 applications are also possible now. However, the future autonomic net | ||||
works may need a comprehensive operation management system which works at stage | ||||
2 and stage 3 to cover all the network operation tasks. A well-defined network t | ||||
elemetry framework is the first step towards this direction. </t> | ||||
</section> | ||||
<section anchor="Security" title="Security Considerations"> | ||||
<t>The complexity of network telemetry raises significant security implications. | ||||
For example, telemetry data can be manipulated to exhaust various network resou | ||||
rces at each plane as well as the data consumer; falsified or tampered data can | ||||
mislead the decision-making and paralyze networks; wrong configuration and progr | ||||
amming for telemetry is equally harmful. The telemetry data is highly sensitive, | ||||
which exposes a lot of information about the network and its configuration. Som | ||||
e of that information can make designing attacks against the network much easier | ||||
(e.g., exact details of what software and patches have been installed), and all | ||||
ows an attacker to determine whether a device may be subject to unprotected secu | ||||
rity vulnerabilities.</t> | ||||
<t>Given that this document has proposed a framework for network telemetry and t | ||||
he telemetry mechanisms discussed are more extensive (in both message frequency | ||||
and traffic amount) than the conventional network OAM concepts, we must also ref | ||||
lect that various new security considerations may also arise. A number of techni | ||||
ques already exist for securing the forwarding plane, the control plane, and the | ||||
management plane in a network, but it is important to consider if any new threa | ||||
t vectors are now being enabled via the use of network telemetry procedures and | ||||
mechanisms. </t> | ||||
<t>This document proposes a conceptual architecture for collecting, transportin | |||
g, and analyzing a wide variety of data sources in support of network applicatio | ||||
ns. The protocols, data formats, and configurations chosen to implement this fra | ||||
mework will dictate the specific security considerations. These considerations m | ||||
ay include:</t> | ||||
<t> | ||||
<list style="symbols"> | ||||
<t>Telemetry framework trust and policy model;</t> | ||||
<t>Role management and access control for enabling and disabling telemetry capab | ||||
ilities;</t> | ||||
<t>Protocol transport used for telemetry data and its inherent security capabili | ||||
ties;</t> | ||||
<t>Telemetry data stores, storage encryption, methods of access, and retention p | ||||
ractices;</t> | ||||
<t>Tracking telemetry events and any abnormalities that might identify malicious | ||||
attacks using telemetry interfaces.</t> | ||||
<t>Authentication and integrity protection of telemetry data to make data more t | ||||
rustworthy. </t> | ||||
<t>Segregating the telemetry data traffic from the data traffic carried over the | ||||
network (e.g., historically management access and management data may be carrie | ||||
d via an independent management network).</t> | ||||
</list> | ||||
</t> | ||||
<t>Some security considerations highlighted above may be minimized or negated wi | ||||
th policy management of network telemetry. In a network telemetry deployment it | ||||
would be advantageous to separate telemetry capabilities into different classes | ||||
of policies, i.e., Role Based Access Control and Event-Condition-Action policies | ||||
. Also, potential conflicts between network telemetry mechanisms must be detecte | ||||
d accurately and resolved quickly to avoid unnecessary network telemetry traffic | ||||
propagation escalating into an unintended or intended denial of service attack. | ||||
</t> | ||||
<t>Further study of the security issues will be required, and it is expected tha | ||||
t the security mechanisms and protocols are developed and deployed along with a | ||||
network telemetry system.</t> | ||||
</section> | <displayreference target="I-D.ietf-netconf-distributed-notif" to="NETCONF-DISTRI | |||
<section anchor="IANA" title="IANA Considerations"> | B-NOTIF"/> | |||
<t>This document includes no request to IANA.</t> | <displayreference target="I-D.ietf-netconf-udp-notif" to="NETCONF-UDP-NOTIF"/> | |||
</section> | <displayreference target="I-D.song-ippm-postcard-based-telemetry" to="IPPM-POSTC | |||
<section anchor="Contributors" title="Contributors"> | ARD-BASED-TELEMETRY"/> | |||
<t> The other contributors of this document are Tianran Zhou, Zhenbin Li, Zhenqi | <displayreference target="I-D.song-opsawg-ifit-framework" to="OPSAWG-IFIT-FRAMEW | |||
ang Li, Daniel King, Adrian Farrel, and Alexander Clemm.</t> | ORK"/> | |||
</section> | <displayreference target="I-D.irtf-nmrg-ibn-concepts-definitions" to="NMRG-IBN-C | |||
<section anchor="Acknowledgments" title="Acknowledgments"> | ONCEPTS-DEFINITIONS"/> | |||
<t>We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe Clarke, Vi | <displayreference target="I-D.ietf-netmod-eca-policy" to="NETMOD-ECA-POLICY"/> | |||
ctor Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Ye | ||||
gani, Young Lee, Qin Wu, Gyan Mishra, Ben Schwartz, Alexey Melnikov, Michael Sch | ||||
arf, Dhruv Dhody, Martin Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars E | ||||
ggert, Eric Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many oth | ||||
ers who have provided helpful comments and suggestions to improve this document. | ||||
</t> | ||||
</section> | ||||
</middle> | ||||
<back> | ||||
<!-- | ||||
<references title="Normative References"> | ||||
<?rfc include='reference.RFC.2119'?> | ||||
<?rfc include='reference.RFC.8174'?> | ||||
</references> | ||||
--> | ||||
<references title="Informative References"> | ||||
<?rfc include='reference.RFC.3954'?> | ||||
<?rfc include="reference.RFC.6020"?> | ||||
<?rfc include="reference.RFC.7950"?> | ||||
<?rfc include="reference.RFC.6241"?> | ||||
<?rfc include='reference.RFC.7540'?> | ||||
<?rfc include='reference.RFC.7854'?> | ||||
<?rfc include='reference.RFC.8321'?> | ||||
<?rfc include='reference.RFC.7011'?> | ||||
<?rfc include='reference.RFC.4656'?> | ||||
<?rfc include='reference.RFC.5357'?> | ||||
<?rfc include='reference.RFC.5424'?> | ||||
<?rfc include='reference.RFC.1157'?> | ||||
<?rfc include='reference.RFC.3176'?> | ||||
<?rfc include='reference.RFC.3411'?> | ||||
<?rfc include='reference.RFC.3416'?> | ||||
<?rfc include='reference.RFC.7276'?> | ||||
<?rfc include='reference.RFC.7799'?> | ||||
<?rfc include='reference.RFC.2981'?> | ||||
<?rfc include='reference.RFC.3877'?> | ||||
<?rfc include='reference.RFC.7575'?> | ||||
<?rfc include='reference.RFC.8641'?> | ||||
<?rfc include='reference.RFC.8639'?> | ||||
<?rfc include='reference.RFC.6812'?> | ||||
<?rfc include='reference.RFC.2578'?> | ||||
<?rfc include='reference.RFC.8762'?> | ||||
<?rfc include='reference.RFC.8040'?> | ||||
<?rfc include='reference.RFC.7258'?> | ||||
<?rfc include='reference.RFC.8259'?> | ||||
<?rfc include='reference.RFC.8924'?> | ||||
<?rfc include='reference.RFC.5085'?> | ||||
<?rfc include='reference.RFC.8084'?> | ||||
<?rfc include='reference.RFC.8085'?> | ||||
<?rfc include='reference.RFC.8889'?> | ||||
<?rfc include='reference.RFC.8671'?> | ||||
<?rfc include='reference.I-D.ietf-grow-bmp-local-rib'?> | ||||
<?rfc include='reference.I-D.ietf-netconf-distributed-notif'?> | ||||
<?rfc include='reference.I-D.ietf-netconf-udp-notif'?> | ||||
<?rfc include='reference.I-D.song-opsawg-dnp4iq'?> | ||||
<?rfc include='reference.I-D.ietf-ippm-ioam-data'?> | ||||
<?rfc include='reference.I-D.ietf-ippm-ioam-direct-export'?> | ||||
<?rfc include='reference.I-D.pedro-nmrg-anticipated-adaptation'?> | ||||
<?rfc include='reference.I-D.song-ippm-postcard-based-telemetry'?> | ||||
<?rfc include='reference.I-D.song-opsawg-ifit-framework'?> | ||||
<?rfc include='reference.I-D.irtf-nmrg-ibn-concepts-definitions'?> | ||||
<?rfc include='reference.I-D.wwx-netmod-event-yang'?> | ||||
<reference anchor="gpb" target="https://developers.google.com/protocol-buffers"> | <references> | |||
<front> | <name>Informative References</name> | |||
<title>Google Protocol Buffers</title> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
<author/> | .3954.xml"/> | |||
<date/> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
</front> | .6020.xml"/> | |||
</reference> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
<reference anchor="grpc" target="https://grpc.io"> | .7950.xml"/> | |||
<front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
<title>gRPC, A high performance, open-source universal RPC framework</title> | .6241.xml"/> | |||
<author/> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
<date/> | .7540.xml"/> | |||
</front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
</reference> | .7854.xml"/> | |||
<reference anchor="gnmi" target="https://github.com/openconfig/reference/tree/ma | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
ster/rpc/gnmi"> | .8321.xml"/> | |||
<front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
<title>gNMI - gRPC Network Management Interface</title> | .7011.xml"/> | |||
<author/> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
<date/> | .4656.xml"/> | |||
</front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
.5357.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.5424.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.1157.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.3176.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.3411.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.3416.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.7276.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.7799.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.2981.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.3877.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.7575.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8641.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8639.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.6812.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.2578.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8762.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8040.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.7258.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8259.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8924.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.5085.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8084.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8085.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8889.xml"/> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.8671.xml"/> | ||||
<!-- [I-D.ietf-ippm-ioam-data] is now 9197--> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
.9197.xml"/> | ||||
<!-- [I-D.ietf-grow-bmp-local-rib] Published as RFC 9069 --> | ||||
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9069. | ||||
xml"/> | ||||
<!-- [I-D.ietf-netconf-distributed-notif] IESG state I-D Exists --> | ||||
<xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-ne | ||||
tconf-distributed-notif.xml"/> | ||||
<!-- [I-D.ietf-netconf-udp-notif] IESG state I-D Exists --> | ||||
<xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-ne | ||||
tconf-udp-notif.xml"/> | ||||
<!-- [I-D.song-opsawg-dnp4iq] IESG state Expired. Note: included the long form a | ||||
s the editor role was missing --> | ||||
<reference anchor="OPSAWG-DNP4IQ"> | ||||
<front> | ||||
<title>Requirements for Interactive Query with Dynamic Network Probes</tit | ||||
le> | ||||
<author fullname="Haoyu Song" role="editor"> | ||||
<organization>Huawei Technologies Co., Ltd</organization> | ||||
</author> | ||||
<author fullname="Jun Gong"> | ||||
<organization>Huawei Technologies Co., Ltd</organization> | ||||
</author> | ||||
<date month="June" day="19" year="2017" /> | ||||
</front> | ||||
<seriesInfo name="Internet-Draft" value="draft-song-opsawg-dnp4iq-01" /> | ||||
</reference> | </reference> | |||
<reference anchor="xml" target="https://www.w3.org/TR/2008/REC-xml-20081126/"> | ||||
<front> | <!-- [I-D.ietf-ippm-ioam-direct-export] IESG state AD Evaluation. Note: included | |||
<title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title> | the long form as the editor role was missing --> | |||
<author/> | <reference anchor="IPPM-IOAM-DIRECT-EXPORT"> | |||
<date/> | <front> | |||
</front> | <title>In-situ OAM Direct Exporting</title> | |||
<author fullname="Haoyu Song"> | ||||
<organization>Futurewei</organization> | ||||
</author> | ||||
<author fullname="Barak Gafni"> | ||||
<organization>Nvidia</organization> | ||||
</author> | ||||
<author fullname="Tianran Zhou"> | ||||
<organization>Huawei</organization> | ||||
</author> | ||||
<author fullname="Zhenbin Li"> | ||||
<organization>Huawei</organization> | ||||
</author> | ||||
<author fullname="Frank Brockners"> | ||||
<organization>Cisco</organization> | ||||
</author> | ||||
<author fullname="Shwetha Bhandari" role="editor"> | ||||
<organization>Thoughtspot</organization> | ||||
</author> | ||||
<author fullname="Ramesh Sivakolundu"> | ||||
<organization>Cisco</organization> | ||||
</author> | ||||
<author fullname="Tal Mizrahi" role="editor"> | ||||
<organization>Huawei</organization> | ||||
</author> | ||||
<date month="October" day="13" year="2021" /> | ||||
</front> | ||||
<seriesInfo name="Internet-Draft" value="draft-ietf-ippm-ioam-direct-export-0 | ||||
7" /> | ||||
</reference> | </reference> | |||
<reference anchor="y1731" target="https://www.itu.int/rec/T-REC-Y.1731/en"> | ||||
<front> | <!-- [I-D.pedro-nmrg-anticipated-adaptation] IESG state Expired. Note: in | |||
<title>ITU-T Y.1731: OAM Functions and Mechanisms for Ethernet based networks, 2 | cluded the long form as the editor role was missing --> | |||
015</title> | <reference anchor="NMRG-ANTICIPATED-ADAPTATION"> | |||
<author/> | <front> | |||
<date/> | <title>Exploiting External Event Detectors to Anticipate Resource Requirem | |||
</front> | ents for the Elastic Adaptation of SDN/NFV Systems</title> | |||
<author fullname="Pedro Martinez-Julia" role="editor"> | ||||
<organization>NICT</organization> | ||||
</author> | ||||
<date month="June" day="29" year="2018" /> | ||||
</front> | ||||
<seriesInfo name="Internet-Draft" value="draft-pedro-nmrg-anticipated-adaptat | ||||
ion-02" /> | ||||
</reference> | </reference> | |||
</references> | <!-- [I-D.song-ippm-postcard-based-telemetry] IESG state I-D Exists --> | |||
<xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.song-ip | ||||
pm-postcard-based-telemetry.xml"/> | ||||
<section title="A Survey on Existing Network Telemetry Techniques"> | <!-- [I-D.song-opsawg-ifit-framework] IESG state I-D Exists --> | |||
<t>In this non-normative appendix, we provide an overview of some existing techn | <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.song-op | |||
iques and standard proposals for each network telemetry module.</t> | sawg-ifit-framework.xml"/> | |||
<section title="Management Plane Telemetry"> | ||||
<section title="Push Extensions for NETCONF"> | <!-- [I-D.irtf-nmrg-ibn-concepts-definitions] IESG state I-D Exists --> | |||
<t><xref target="RFC6241">NETCONF</xref> is a popular network management protoco | <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.irtf-nm | |||
l recommended by IETF. Its core strength is for managing configuration, but can | rg-ibn-concepts-definitions.xml"/> | |||
also be used for data collection. <xref target="RFC8641">YANG-Push</xref> <xref | ||||
target="RFC8639"/> extends NETCONF and enables subscriber applications to reques | <!-- [I-D.wwx-netmod-event-yang] FYI: I-D.wwx-netmod-event-yang (Expired) was re | |||
t a continuous, customized stream of updates from a YANG datastore. Providing su | placed by I-D.ietf-netmod-eca-policy - IESG state Expired --> | |||
ch visibility into changes made upon YANG configuration and operational objects | <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-ne | |||
enables new capabilities based on the remote mirroring of configuration and oper | tmod-eca-policy.xml"/> | |||
ational state. Moreover, <xref target="I-D.ietf-netconf-distributed-notif">distr | ||||
ibuted data collection mechanism</xref> via <xref target="I-D.ietf-netconf-udp-n | <reference anchor="gpb" target="https://developers.google.com/protocol-buf | |||
otif">UDP based publication channel</xref> provides enhanced efficiency for the | fers"> | |||
NETCONF based telemetry.</t> | <front> | |||
</section> | <title>Protocol Buffers</title> | |||
<section title="gRPC Network Management Interface"> | <author><organization>Google Developers</organization></author> | |||
<t><xref target="gnmi">gRPC Network Management Interface (gNMI)</xref> is a netw | <date/> | |||
ork management protocol based on the <xref target="grpc">gRPC</xref> RPC (Remote | </front> | |||
Procedure Call) framework. With a single gRPC service definition, both configur | </reference> | |||
ation and telemetry can be covered. gRPC is an <xref target="RFC7540">HTTP/2</xr | ||||
ef>-based open-source micro-service communication framework. It provides a numbe | <reference anchor="grpc" target="https://grpc.io"> | |||
r of capabilities which are well-suited for network telemetry, including: </t> | <front> | |||
<t> | <title>gRPC: A high performance, open source universal RPC framework</ | |||
<list style="symbols"> | title> | |||
<t>Full-duplex streaming transport model combined with a binary encoding mechani | <author><organization>gRPC</organization></author> | |||
sm provides good telemetry efficiency.</t> | <date/> | |||
<t>gRPC provides higher-level features consistency across platforms that common | </front> | |||
HTTP/2 libraries typically do not. This characteristic is especially valuable fo | </reference> | |||
r the fact that telemetry data collectors normally reside on a large variety of | ||||
platforms.</t> | <reference anchor="gnmi" target="https://datatracker.ietf.org/meeting/98/ma | |||
<t>The built-in load-balancing and failover mechanism.</t> | terials/slides-98-rtgwg-gnmi-intro-draft-openconfig-rtgwg-gnmi-spec-00"> | |||
</list> | <front> | |||
</t> | <title>gRPC Network Management Interface</title> | |||
</section> | <author initials="R." surname="Shakir" fullname="Rob Shakir"> | |||
</section> | <organization/> | |||
<section title="Control Plane Telemetry"> | </author> | |||
<section title="BGP Monitoring Protocol"> | <author initials="A." surname="Shaikh" fullname="Anees Shaikh"> | |||
<t><xref target="RFC7854">BGP Monitoring Protocol (BMP)</xref> is used to monito | <organization/> | |||
r BGP sessions and is intended to provide a convenient interface for obtaining r | </author> | |||
oute views. </t> | <author initials="P." surname="Borman" fullname="Paul Borman"> | |||
<t>The BGP routing information is collected from the monitored device(s) to the | <organization/> | |||
BMP monitoring station by setting up the BMP TCP session. The BGP peers are moni | </author> | |||
tored by the BMP Peer Up and Peer Down Notifications. The BGP routes (including | <author initials="M." surname="Hines" fullname="Marcus Hines"> | |||
<xref target="RFC7854"> Adjacency_RIB_In </xref>, <xref target="RFC8671"> Adjace | <organization/> | |||
ncy_RIB_out</xref>, and <xref target="I-D.ietf-grow-bmp-local-rib">Local_Rib</xr | </author> | |||
ef>) are encapsulated in the BMP Route Monitoring Message and the BMP Route Mirr | <author initials="C." surname="Lebsack" fullname="Carl Lebsack"> | |||
oring Message, providing both an initial table dump and real-time route updates. | <organization/> | |||
In addition, BGP statistics are reported through the BMP Stats Report Message, | </author> | |||
which could be either timer triggered or event-driven. Future BMP extensions cou | <author initials="C." surname="Morrow" fullname="Chris Morrow"> | |||
ld further enrich BGP monitoring applications. | <organization/> | |||
</t> | </author> | |||
</section> | <date month="March" year="2017"/> | |||
</section> | </front> | |||
<section title="Data Plane Telemetry"> | <refcontent>IETF 98</refcontent> | |||
<section title="The Alternate Marking (AM) technology"> | </reference> | |||
<t>The Alternate Marking method enables efficient measurements of packet loss, d | ||||
elay, and jitter both in IP and Overlay Networks, as presented in <xref target=" | <reference anchor="W3C.REC-xml-20081126" target="https://www.w3.org/TR/2008/RE | |||
RFC8321"/> and <xref target="RFC8889"/>. </t> | C-xml-20081126"> | |||
<t>This technique can be applied to point-to-point and multipoint-to-multipoint | <front> | |||
flows. Alternate Marking creates batches of packets by alternating the value of | <title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title> | |||
1 bit (or a label) of the packet header. These batches of packets are unambiguou | <author initials="T." surname="Bray" fullname="Tim Bray"> | |||
sly recognized over the network and the comparison of packet counters for each b | <organization showOnFrontPage="true"/> | |||
atch allows the packet loss calculation. The same idea can be applied to delay m | </author> | |||
easurement by selecting ad hoc packets with a marking bit dedicated for delay me | <author initials="J." surname="Paoli" fullname="Jean Paoli"> | |||
asurements.</t> | <organization showOnFrontPage="true"/> | |||
<t>Alternate Marking method needs two counters each marking period for each flow | </author> | |||
under monitor. For instance, by considering n measurement points and m monitore | <author initials="M." surname="Sperberg-McQueen" fullname="Michael S | |||
d flows, the order of magnitude of the packet counters for each time interval is | perberg-McQueen"> | |||
n*m*2 (1 per color).</t> | <organization showOnFrontPage="true"/> | |||
<t>Since networks offer rich sets of network performance measurement data (e.g., | </author> | |||
packet counters), conventional approaches run into limitations. The bottleneck | <author initials="E." surname="Maler" fullname="Eve Maler"> | |||
is the generation and export of the data and the amount of data that can be reas | <organization showOnFrontPage="true"/> | |||
onably collected from the network. In addition, management tasks related to dete | </author> | |||
rmining and configuring which data to generate lead to significant deployment ch | <author initials="F." surname="Yergeau" fullname="Francois Yergeau"> | |||
allenges.</t> | <organization showOnFrontPage="true"/> | |||
<t>The Multipoint Alternate Marking approach, described in <xref target="RFC8889 | </author> | |||
"/>, aims to resolve this issue and make the performance monitoring more flexibl | <date month="November" year="2008"/> | |||
e in case a detailed analysis is not needed. </t> | </front> | |||
<t>An application orchestrates network performance measurements tasks across the | <refcontent>World Wide Web Consortium Recommendation REC-xml-20081126</ | |||
network to allow for optimized monitoring. The application can choose how roug | refcontent> | |||
hly or precisely to configure measurement points depending on the application's | </reference> | |||
requirements.</t> | ||||
<t>Using Alternate Marking, it is possible to monitor a Multipoint Network witho | <reference anchor="y1731" target="https://www.itu.int/rec/T-REC-Y.1731/en" | |||
ut in depth examination by using the Network Clustering (subnetworks that are po | > | |||
rtions of the entire network that preserve the same property of the entire netwo | <front> | |||
rk, called clusters). So in the case that there is packet loss or the delay is | <title>Operations, administration and maintenance (OAM) functions and | |||
too high then the specific filtering criteria could be applied to gather a more | mechanisms for Ethernet-based networks</title> | |||
detailed analysis by using a different combination of clusters up to a per-flow | <author><organization>ITU-T</organization></author> | |||
measurement as described in <xref target="RFC8321">Alternate-Marking (AM)</xref> | <date month="August" year="2015"/> | |||
. </t> | </front> | |||
<t>In summary, an application can configure end-to-end network monitoring. If th | <seriesInfo name="ITU-T Recommendation" value="G.8013/Y.1731"/> | |||
e network does not experience issues, this approximate monitoring is good enough | </reference> | |||
and is very cheap in terms of network resources. However, in case of problems, | </references> | |||
the application becomes aware of the issues from this approximate monitoring and | ||||
, in order to localize the portion of the network that has issues, configures th | <section numbered="true" toc="default"> | |||
e measurement points more extensively, allowing more detailed monitoring to be p | <name>A Survey on Existing Network Telemetry Techniques</name> | |||
erformed. After the detection and resolution of the problem, the initial approxi | <t>In this non-normative appendix, we provide an overview of some existing | |||
mate monitoring can be used again.</t> | techniques and standard proposals for each network telemetry module.</t> | |||
</section> | <section numbered="true" toc="default"> | |||
<section title="Dynamic Network Probe"> | <name>Management Plane Telemetry</name> | |||
<t>Hardware-based <xref target="I-D.song-opsawg-dnp4iq">Dynamic Network Probe (D | <section numbered="true" toc="default"> | |||
NP)</xref> proposes a programmable means to customize the data that an applicati | <name>Push Extensions for NETCONF</name> | |||
on collects from the data plane. A direct benefit of DNP is the reduction of the | <t><xref target="RFC6241" format="default">NETCONF</xref> is a popular | |||
exported data. A full DNP solution covers several components including data sou | network management protocol recommended by IETF. Its core strength is for manag | |||
rce, data subscription, and data generation. The data subscription needs to defi | ing configuration, but it can also be used for data collection. <xref target="RF | |||
ne the derived data which can be composed and derived from the raw data sources. | C8639" format="default">YANG-Push</xref> <xref target="RFC8641" format="default" | |||
The data generation takes advantage of the moderate in-network computing to pro | /> extends NETCONF and enables subscriber applications to request a continuous, | |||
duce the desired data.</t> | customized stream of updates from a YANG datastore. Providing such visibility in | |||
<t>While DNP can introduce unforeseeable flexibility to the data plane telemetry | to changes made upon YANG configuration and operational objects enables new capa | |||
, it also faces some challenges. It requires a flexible data plane that can be d | bilities based on the remote mirroring of configuration and operational state. M | |||
ynamically reprogrammed at run-time. The programming API is yet to be defined.</ | oreover, a <xref target="I-D.ietf-netconf-distributed-notif" format="default">di | |||
t> | stributed data collection mechanism</xref> via a <xref target="I-D.ietf-netconf- | |||
</section> | udp-notif" format="default">UDP-based publication channel</xref> provides enhanc | |||
<section title="IP Flow Information Export (IPFIX) Protocol"> | ed efficiency for the NETCONF-based telemetry.</t> | |||
<t>Traffic on a network can be seen as a set of flows passing through network el | </section> | |||
ements. | <section numbered="true" toc="default"> | |||
<xref target="RFC7011">IP Flow Information Export (IPFIX) </xref> | <name>gRPC Network Management Interface</name> | |||
provides a means of transmitting traffic flow information for administrative or | <t><xref target="gnmi" format="default">gRPC Network Management Interf | |||
other purposes. A typical IPFIX enabled system includes a pool of Metering Proce | ace (gNMI)</xref> is a network management protocol based on the <xref target="gr | |||
sses that collects data packets at one or more Observation Points, optionally fi | pc" format="default">gRPC</xref> Remote Procedure Call (RPC) framework. With a s | |||
lters them and aggregates information about these packets. An Exporter then gath | ingle gRPC service definition, both configuration and telemetry can be covered. | |||
ers each of the Observation Points together into an Observation Domain and sends | gRPC is an open-source micro-service communication framework based on <xref targ | |||
this information via the IPFIX protocol to a Collector.</t> | et="RFC7540" format="default">HTTP/2</xref>. It provides a number of capabilitie | |||
</section> | s that are well-suited for network telemetry, including: </t> | |||
<section title="In-Situ OAM"> | <ul spacing="normal"> | |||
<t>Classical passive and active monitoring and measurement techniques are either | <li>A full-duplex streaming transport model; when combined with a bi | |||
inaccurate or resource-consuming. It is preferable to directly acquire data ass | nary encoding mechanism, it provides good telemetry efficiency.</li> | |||
ociated with a flow's packets when the packets pass through a network. <xref tar | <li>A higher-level feature consistency across platforms that common | |||
get="I-D.ietf-ippm-ioam-data">In-situ OAM (iOAM)</xref>, a data generation techn | HTTP/2 libraries typically do not provide. This characteristic is especially val | |||
ique, embeds a new instruction header to user packets and the instruction direct | uable for the fact that telemetry data collectors normally reside on a large var | |||
s the network nodes to add the requested data to the packets. Thus, at the path | iety of platforms.</li> | |||
end, the packet's experience gained on the entire forwarding path can be collect | <li>A built-in load-balancing and failover mechanism.</li> | |||
ed. Such firsthand data is invaluable to many network OAM applications.</t> | </ul> | |||
<t>However, iOAM also faces some challenges. The issues on performance impact, s | </section> | |||
ecurity, scalability and overhead limits, encapsulation difficulties in some pro | </section> | |||
tocols, and cross-domain deployment need to be addressed.</t> | <section numbered="true" toc="default"> | |||
</section> | <name>Control Plane Telemetry</name> | |||
<section anchor="pbt" title="Postcard Based Telemetry"> | <section numbered="true" toc="default"> | |||
<t>The postcard-based telemetry, as embodied in <xref target="I-D.ietf-ippm-ioam | <name>BGP Monitoring Protocol</name> | |||
-direct-export">IOAM DEX</xref> and <xref target="I-D.song-ippm-postcard-based-t | <t><xref target="RFC7854" format="default">BMP</xref> is used to monit | |||
elemetry">IOAM Marking</xref>, is a complementary technique to the passport-base | or BGP sessions and is intended to provide a convenient interface for obtaining | |||
d IOAM. PBT directly exports data at each node through an independent packet. At | route views. </t> | |||
the cost of higher bandwidth overhead and the need for data correlation, PBT sh | <t>BGP routing information is collected from the monitored device(s) t | |||
ows several unique advantages. It can also help to identify packet drop location | o the BMP monitoring station by setting up the BMP TCP session. The BGP peers ar | |||
in case a packet is dropped on its forwarding path.</t> | e monitored by the BMP Peer Up and Peer Down notifications. The BGP routes (incl | |||
</section> | uding <xref target="RFC7854" format="default"> Adj_RIB_In </xref>, <xref target= | |||
<section title="Existing OAM for Specific Data Planes"> | "RFC8671" format="default"> Adj_RIB_out</xref>, and <xref target="RFC9069" forma | |||
<t> | t="default">local RIB</xref>) are encapsulated in the BMP Route Monitoring Messa | |||
Various data planes raise unique OAM requirements. IETF has published OAM techni | ge and the BMP Route Mirroring Message, providing both an initial table dump and | |||
que and framework documents (e.g., <xref target="RFC8924" /> and <xref target="R | real-time route updates. In addition, BGP statistics are reported through the B | |||
FC5085" />) targeting different data planes such as Multi-Protocol Label Switchi | MP Stats Report Message, which could be either timer triggered or event-driven. | |||
ng (MPLS), L2 Virtual Private Network (L2-VPN), Network Virtualization Overlays | Future BMP extensions could further enrich BGP monitoring applications. | |||
(NVO3), Virtual Extensible LAN (VXLAN), Bit Indexed Explicit Replication (BIER), | ||||
Service Function Chaining (SFC), Segment Routing (SR), and Deterministic Networ | ||||
king (DETNET). The aforementioned data plane telemetry techniques can be used to | ||||
enhance the OAM capability on such data planes. | ||||
</t> | </t> | |||
</section> | </section> | |||
</section> | </section> | |||
<section title="External Data and Event Telemetry"> | <section numbered="true" toc="default"> | |||
<section title="Sources of External Events"> | <name>Data Plane Telemetry</name> | |||
<t>To ensure that the information provided by external event detectors and used | <section numbered="true" toc="default"> | |||
by the network management solutions is meaningful for management purposes, the n | <name>Alternate-Marking (AM) Technology</name> | |||
etwork telemetry framework must ensure that such detectors (sources) are easily | <t>The Alternate-Marking method enables efficient measurements of pack | |||
connected to the management solutions (sinks). This requires the specification o | et loss, delay, and jitter both in IP and Overlay Networks, as presented in <xre | |||
f a list of potential external data sources that could be of interest in network | f target="RFC8321" format="default"/> and <xref target="RFC8889" format="default | |||
management and match it to the connectors and/or interfaces required to connect | "/>. </t> | |||
them.</t> | <t>This technique can be applied to point-to-point and multipoint-to-m | |||
<t>Categories of external event sources that may be of interest to network manag | ultipoint flows. Alternate Marking creates batches of packets by alternating the | |||
ement include:</t> | <front> | |||
<t> | unambiguously recognized over the network, and the comparison of packet counters | |||
<list style="symbols"> | for each batch allows the packet loss calculation. The same idea can be applied | |||
<t>Smart objects and sensors. With the consolidation of the Internet of Things ( | <author initials="T." surname="Bray" fullname="Tim Bray"> | |||
IoT) any network system will have many smart objects attached to its physical su | or delay measurements.</t> | |||
rroundings and logical operation environments. Most of these objects will be ess | <t>The Alternate-Marking method needs two counters each marking period | |||
entially based on sensors of many kinds (e.g., temperature, humidity, presence) | for each flow under monitor. For instance, by considering n measurement points | |||
and the information they provide can be very useful for the management of the ne | and m monitored flows, the order of magnitude of the packet counters for each ti | |||
twork, even when they are not specifically deployed for such purpose. Elements o | me interval is n*m*2 (1 per color).</t> | |||
f this source type will usually provide a specific protocol for interaction, esp | <t>Since networks offer rich sets of network performance measurement d | |||
ecially one of those protocols related to IoT, such as the Constrained Applicati | ata (e.g., packet counters), conventional approaches run into limitations. The b | |||
on Protocol (CoAP).</t> | ottleneck is the generation and export of the data and the amount of data that c | |||
<t>Online news reporters. Several online news services have the ability to provi | an be reasonably collected from the network. In addition, management tasks relat | |||
de enormous quantity of information about different events occurring in the worl | ed to determining and configuring which data to generate lead to significant dep | |||
d. Some of those events can impact on the network system managed by a specific f | loyment challenges.</t> | |||
ramework and, therefore, such information may be of interest to the management s | <t>The Multipoint Alternate-Marking approach, described in <xref targe | |||
olution. For instance, diverse security reports, such as the Common Vulnerabilit | t="RFC8889" format="default"/>, aims to resolve this issue and make the performa | |||
ies and Exposures (CVE), can be issued by the corresponding authority and used b | nce monitoring more flexible in case a detailed analysis is not needed. </t> | |||
y the management solution to update the managed system if needed. Instead of a s | <t>An application orchestrates network performance measurement tasks a | |||
pecific protocol and data format, the sources of this kind of information usuall | cross the network to allow for optimized monitoring. The application can choose | |||
y follow a relaxed but structured format. This format will be part of both the o | how roughly or precisely to configure measurement points depending on the appli | |||
ntology and information model of the telemetry framework.</t> | cation's requirements.</t> | |||
<t>Global event analyzers. The advance of Big Data analyzers provides a huge amo | <t>Using Alternate Marking, it is possible to monitor a Multipoint Net | |||
unt of information and, more interestingly, the identification of events detecte | work without in-depth examination by using Network Clustering (subnetworks that | |||
d by analyzing many data streams from different origins. In contrast with the ot | are portions of the entire network that preserve the same property of the entire | |||
her types of sources, which are focused on specific events, the detectors of thi | network, called clusters). So in the case where there is packet loss or the de | |||
s source type will detect generic events. For example, during a sport event some | lay is too high, the specific filtering criteria could be applied to gather a mo | |||
unexpected movement makes it fascinating and many people connect to sites that | re detailed analysis by using a different combination of clusters up to a per-fl | |||
are reporting on the event. The underlying networks supporting the services that | ow measurement as described in the Alternate-Marking document <xref target="RFC8 | |||
cover the event can be affected by such situation, so their management solution | 321" format="default"/>. </t> | |||
s should be aware of it. In contrast with the other source types, a new informat | <t>In summary, an application can configure end-to-end network monitor | |||
ion model, format, and reporting protocol is required to integrate the detectors | ing. If the network does not experience issues, this approximate monitoring is g | |||
of this type with the management solution.</t> | ood enough and is very cheap in terms of network resources. However, in case of | |||
</list> | problems, the application becomes aware of the issues from this approximate moni | |||
toring and, in order to localize the portion of the network that has issues, con | ||||
figures the measurement points more extensively, allowing more detailed monitori | ||||
ng to be performed. After the detection and resolution of the problem, the initi | ||||
al approximate monitoring can be used again.</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Dynamic Network Probe</name> | ||||
<t>A hardware-based <xref target="OPSAWG-DNP4IQ" format="default">Dyna | ||||
mic Network Probe (DNP)</xref> provides a programmable means to customize the da | ||||
ta that an application collects from the data plane. A direct benefit of DNP is | ||||
the reduction of the exported data. A full DNP solution covers several component | ||||
s including data source, data subscription, and data generation. The data subscr | ||||
iption needs to define the derived data that can be composed and derived from ra | ||||
w data sources. The data generation takes advantage of the moderate in-network c | ||||
omputing to produce the desired data.</t> | ||||
<t>While DNP can introduce unforeseeable flexibility to the data plane | ||||
telemetry, it also faces some challenges. It requires a flexible data plane tha | ||||
t can be dynamically reprogrammed at runtime. The programming Application Progra | ||||
mming Interface (API) is yet to be defined.</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>IP Flow Information Export (IPFIX) Protocol</name> | ||||
<t>Traffic on a network can be seen as a set of flows passing through | ||||
network elements. | ||||
<xref target="RFC7011" format="default">IPFIX </xref> | ||||
provides a means of transmitting traffic flow information for administrative or | ||||
other purposes. A typical IPFIX-enabled system includes a pool of Metering Proce | ||||
sses that collects data packets at one or more Observation Points, optionally fi | ||||
lters them, and aggregates information about these packets. An Exporter then gat | ||||
hers each of the Observation Points together into an Observation Domain and send | ||||
s this information via the IPFIX protocol to a Collector.</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>In Situ OAM</name> | ||||
<t>Classical passive and active monitoring and measurement techniques | ||||
are either inaccurate or resource consuming. It is preferable to directly acquir | ||||
e data associated with a flow's packets when the packets pass through a network. | ||||
<xref target="RFC9197" format="default">IOAM</xref>, a data generation techniqu | ||||
e, embeds a new instruction header to user packets, and the instruction directs | ||||
the network nodes to add the requested data to the packets. Thus, at the path's | ||||
end, the packet's experience gained on the entire forwarding path can be collect | ||||
ed. Such firsthand data is invaluable to many network OAM applications.</t> | ||||
<t>However, IOAM also faces some challenges. The issues on performance | ||||
impact, security, scalability and overhead limits, encapsulation difficulties i | ||||
n some protocols, and cross-domain deployment need to be addressed.</t> | ||||
</section> | ||||
<section anchor="pbt" numbered="true" toc="default"> | ||||
<name>Postcard-Based Telemetry</name> | ||||
<t>The postcard-based telemetry, as embodied in <xref target="IPPM-IOA | ||||
M-DIRECT-EXPORT" format="default">IOAM Direct Export (DEX)</xref> and <xref targ | ||||
et="I-D.song-ippm-postcard-based-telemetry" format="default">IOAM Marking</xref> | ||||
, is a complementary technique to the passport-based IOAM <xref target="RFC9197" | ||||
format="default"/>. PBT directly exports data at each node through an independe | ||||
nt packet. At the cost of higher bandwidth overhead and the need for data correl | ||||
ation, PBT shows several unique advantages. It can also help to identify packet | ||||
drop location in case a packet is dropped on its forwarding path.</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Existing OAM for Specific Data Planes</name> | ||||
<t> | ||||
Various data planes raise unique OAM requirements. IETF has published OAM techni | ||||
que and framework documents (e.g., <xref target="RFC8924" format="default"/> and | ||||
<xref target="RFC5085" format="default"/>) targeting different data planes such | ||||
as Multiprotocol Label Switching (MPLS), L2 Virtual Private Network (VPN), Netw | ||||
ork Virtualization over Layer 3 (NVO3), Virtual Extensible LAN (VXLAN), Bit Inde | ||||
x Explicit Replication (BIER), Service Function Chaining (SFC), Segment Routing | ||||
(SR), and Deterministic Networking (DETNET). The aforementioned data plane telem | ||||
etry techniques can be used to enhance the OAM capability on such data planes. | ||||
</t> | </t> | |||
<t>Additional detector types can be added to the system, but they will | </section> | |||
be generally the result of composing the properties offered by these main classe | </section> | |||
s.</t> | <section numbered="true" toc="default"> | |||
</section> | <name>External Data and Event Telemetry</name> | |||
<section title="Connectors and Interfaces"> | <section numbered="true" toc="default"> | |||
<t>For allowing external event detectors to be properly integrated with other ma | <name>Sources of External Events</name> | |||
nagement solutions, both elements must expose interfaces and protocols that are | <t>To ensure that the information provided by external event detectors | |||
subject to their particular objective. Since external event detectors will be fo | and used by the network management solutions is meaningful for management purpo | |||
cused on providing their information to their main consumers, which generally wi | ses, the network telemetry framework must ensure that such detectors (sources) a | |||
ll not be limited to the network management solutions, the framework must includ | re easily connected to the management solutions (sinks). This requires the speci | |||
e the definition of the required connectors for ensuring the interconnection bet | fication of a list of potential external data sources that could be of interest | |||
ween detectors (sources) and their consumers within the management systems (sink | in network management and matching it to the connectors and/or interfaces requir | |||
s) is effective.</t> | ed to connect them.</t> | |||
<t>In some situations, the interconnection between the external event detectors | <t>Categories of external event sources that may be of interest to net | |||
and the management system is via the management plane. For those situations ther | work management include:</t> | |||
e will be a special connector that provides the typical interfaces found in most | <ul spacing="normal"> | |||
other elements connected to the management plane. For instance, the interfaces | <li>Smart objects and sensors. With the consolidation of the Interne | |||
could accomplish this with a specific data model (YANG) and specific telemetry p | t of Things (IoT), any network system will have many smart objects attached to i | |||
rotocol, such as NETCONF, YANG-Push, or gRPC.</t> | ts physical surroundings and logical operation environments. Most of these objec | |||
</section> | ts will be essentially based on sensors of many kinds (e.g., temperature, humidi | |||
</section> | ty, and presence), and the information they provide can be very useful for the m | |||
</section> | anagement of the network, even when they are not specifically deployed for such | |||
</back> | purpose. Elements of this source type will usually provide a specific protocol f | |||
or interaction, especially one of the protocols related to IoT, such as the Cons | ||||
trained Application Protocol (CoAP).</li> | ||||
<li>Online news reporters. Several online news services have the abi | ||||
lity to provide an enormous quantity of information about different events occur | ||||
ring in the world. Some of those events can have an impact on the network system | ||||
managed by a specific framework; therefore, such information may be of interest | ||||
to the management solution. For instance, diverse security reports, such as Com | ||||
mon Vulnerabilities and Exposures (CVEs), can be issued by the corresponding aut | ||||
hority and used by the management solution to update the managed system, if need | ||||
ed. Instead of a specific protocol and data format, the sources of this kind of | ||||
information usually follow a relaxed but structured format. This format will be | ||||
part of both the ontology and information model of the telemetry framework.</li> | ||||
<li>Global event analyzers. The advance of big data analyzers provid | ||||
es a huge amount of information and, more interestingly, the identification of e | ||||
vents detected by analyzing many data streams from different origins. In contras | ||||
t with the other types of sources, which are focused on specific events, the det | ||||
ectors of this source type will detect generic events. For example, during a spo | ||||
rts event, some unexpected movement makes it fascinating, and many people connec | ||||
t to sites that are reporting on the event. The underlying networks supporting t | ||||
he services that cover the event can be affected by such situation, so their man | ||||
agement solutions should be aware of it. In contrast with the other source types | ||||
, a new information model, format, and reporting protocol is required to integra | ||||
te the detectors of this type with the management solution.</li> | ||||
</ul> | ||||
<t>Additional detector types can be added to the system, but generally | ||||
they will be the result of composing the properties offered by these main class | ||||
es.</t> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Connectors and Interfaces</name> | ||||
        <t>To allow external event detectors to be properly integrated with
        other management solutions, both elements must expose interfaces and
        protocols that suit their particular objectives. Since external event
        detectors focus on providing information to their main consumers,
        which generally are not limited to network management solutions, the
        framework must define the connectors required to ensure that the
        interconnection between detectors (sources) and their consumers within
        the management systems (sinks) is effective.</t>
        <t>In some situations, the interconnection between external event
        detectors and the management system is via the management plane. For
        those situations, there will be a special connector that provides the
        typical interfaces found in most other elements connected to the
        management plane. For instance, these interfaces could be realized
        with a specific data model (YANG) and a specific telemetry protocol,
        such as NETCONF, YANG-Push, or gRPC.</t>
      </section>
    <section anchor="Acknowledgments" numbered="false" toc="default">
      <name>Acknowledgments</name>
      <t>We would like to thank <contact fullname="Rob Wilton"/>,
      <contact fullname="Greg Mirsky"/>, <contact fullname="Randy Presuhn"/>,
      <contact fullname="Joe Clarke"/>, <contact fullname="Victor Liu"/>,
      <contact fullname="James Guichard"/>,
      <contact fullname="Uri Blumenthal"/>,
      <contact fullname="Giuseppe Fioccola"/>, <contact fullname="Yunan Gu"/>,
      <contact fullname="Parviz Yegani"/>, <contact fullname="Young Lee"/>,
      <contact fullname="Qin Wu"/>, <contact fullname="Gyan Mishra"/>,
      <contact fullname="Ben Schwartz"/>,
      <contact fullname="Alexey Melnikov"/>,
      <contact fullname="Michael Scharf"/>, <contact fullname="Dhruv Dhody"/>,
      <contact fullname="Martin Duke"/>, <contact fullname="Roman Danyliw"/>,
      <contact fullname="Warren Kumari"/>, <contact fullname="Sheng Jiang"/>,
      <contact fullname="Lars Eggert"/>, <contact fullname="Éric Vyncke"/>,
      <contact fullname="Jean-Michel Combes"/>,
      <contact fullname="Erik Kline"/>, <contact fullname="Benjamin Kaduk"/>,
      and many others who have provided helpful comments and suggestions to
      improve this document.</t>
    </section>
    <section anchor="Contributors" numbered="false" toc="default">
      <name>Contributors</name>
      <t>The other contributors to this document are
      <contact fullname="Tianran Zhou"/>, <contact fullname="Zhenbin Li"/>,
      <contact fullname="Zhenqiang Li"/>, <contact fullname="Daniel King"/>,
      <contact fullname="Adrian Farrel"/>, and
      <contact fullname="Alexander Clemm"/>.</t>
    </section>
      </section>
    </section>
  </back>
</rfc>