rfc9638.original | rfc9638.txt | |||
---|---|---|---|---|
NVO3 Working Group S. Boutros, Ed. | Internet Engineering Task Force (IETF) S. Boutros, Ed. | |||
Internet-Draft Ciena Corporation | Request for Comments: 9638 Ciena Corporation | |||
Intended status: Informational D. Eastlake, Ed. | Category: Informational D. Eastlake 3rd, Ed. | |||
Expires: 22 August 2024 Futurewei Technologies | ISSN: 2070-1721 Independent | |||
19 February 2024 | September 2024 | |||
Network Virtualization Overlays (NVO3) Encapsulation Considerations | Network Virtualization over Layer 3 (NVO3) Encapsulation Considerations | |||
draft-ietf-nvo3-encap-12 | ||||
Abstract | Abstract | |||
The IETF Network Virtualization Overlays (NVO3) Working Group | The IETF Network Virtualization Overlays (NVO3) Working Group | |||
developed considerations for a common encapsulation that addresses | developed considerations for a common encapsulation that addresses | |||
various network virtualization overlay technical concerns. This | various network virtualization overlay technical concerns. This | |||
document provides a record, for the benefit of the IETF community, of | document provides a record, for the benefit of the IETF community, of | |||
the considerations arrived at starting from the output of an NVO3 | the considerations arrived at by the NVO3 Working Group starting from | |||
encapsulation design team. These considerations may be helpful with | the output of the NVO3 encapsulation Design Team. These | |||
future deliberations by working groups over the choice of | considerations may be helpful with future deliberations by working | |||
encapsulation formats. | groups over the choice of encapsulation formats. | |||
There are implications of having different encapsulations in real | There are implications of having different encapsulations in real | |||
environments consisting of both software and hardware implementations | environments consisting of both software and hardware implementations | |||
and within and spanning multiple data centers. For example, OAM | and within and spanning multiple data centers. For example, | |||
functions such as path MTU discovery become challenging with multiple | Operations, Administration, and Maintenance (OAM) functions such as | |||
encapsulations along the data path. | path MTU discovery become challenging with multiple encapsulations | |||
along the data path. | ||||
Based on these considerations, the Working Group determined that | Based on these considerations, the NVO3 Working Group determined that | |||
Geneve with a few modifications as the common encapsulation. This | Generic Network Virtualization Encapsulation (Geneve) with a few | |||
document provides more details, particularly in Section 7. | modifications is the common encapsulation. This document provides | |||
more details, particularly in Section 7. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
approved by the IESG are candidates for any level of Internet | ||||
Standard; see Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 22 August 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9638. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
2. Design Team and Working Group Process . . . . . . . . . . . . 3 | 2. Design Team and Working Group Process | |||
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Terminology | |||
4. Abbreviations and Acronyms . . . . . . . . . . . . . . . . . 4 | 4. Abbreviations, Acronyms, and Definitions | |||
5. Encapsulation Issues and Background . . . . . . . . . . . . . 5 | 5. Encapsulation Issues and Background | |||
5.1. Geneve . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 5.1. Geneve | |||
5.2. Generic UDP Encapsulation (GUE) . . . . . . . . . . . . . 6 | 5.2. Generic UDP Encapsulation (GUE) | |||
5.3. Generic Protocol Extension (GPE) for VXLAN . . . . . . . 7 | 5.3. Generic Protocol Extension (GPE) for VXLAN | |||
6. Common Encapsulation Considerations . . . . . . . . . . . . . 8 | 6. Common Encapsulation Considerations | |||
6.1. Current Encapsulations . . . . . . . . . . . . . . . . . 9 | 6.1. Current Encapsulations | |||
6.2. Useful Extensions Use Cases . . . . . . . . . . . . . . . 9 | 6.2. Useful Extensions Use Cases | |||
6.2.1. Telemetry Extensions . . . . . . . . . . . . . . . . 9 | 6.2.1. Telemetry Extensions | |||
6.2.2. Security/Integrity Extensions . . . . . . . . . . . . 10 | 6.2.2. Security/Integrity Extensions | |||
6.2.3. Group Based Policy . . . . . . . . . . . . . . . . . 10 | 6.2.3. Group-Based Policy | |||
6.3. Hardware Considerations . . . . . . . . . . . . . . . . . 10 | 6.3. Hardware Considerations | |||
6.4. Extension Size . . . . . . . . . . . . . . . . . . . . . 11 | 6.4. Extension Size | |||
6.5. Ordering of Extension Headers . . . . . . . . . . . . . . 12 | 6.5. Ordering of Extension Headers | |||
6.6. TLV versus Bit Fields . . . . . . . . . . . . . . . . . . 12 | 6.6. TLV versus Bit Fields | |||
6.7. Control Plane Considerations . . . . . . . . . . . . . . 13 | 6.7. Control Plane Considerations | |||
6.8. Split NVE . . . . . . . . . . . . . . . . . . . . . . . . 14 | 6.8. Split NVE | |||
6.9. Larger VNI Considerations . . . . . . . . . . . . . . . . 15 | 6.9. Larger VNI Considerations | |||
7. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 15 | 7. Recommendations | |||
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18 | 8. Security Considerations | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 18 | 9. IANA Considerations | |||
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 | 10. References | |||
11. Normative References . . . . . . . . . . . . . . . . . . . . 18 | 10.1. Normative References | |||
12. Informative References . . . . . . . . . . . . . . . . . . . 18 | 10.2. Informative References | |||
Appendix A. Encapsulation Comparison . . . . . . . . . . . . . . 21 | Appendix A. Encapsulation Comparison | |||
A.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 21 | A.1. Overview | |||
A.2. Extensibility . . . . . . . . . . . . . . . . . . . . . . 21 | A.2. Extensibility | |||
A.2.1. Native Extensibility Support . . . . . . . . . . . . 21 | A.2.1. Innate Extensibility Support | |||
A.2.2. Extension Parsing . . . . . . . . . . . . . . . . . . 21 | A.2.2. Extension Parsing | |||
A.2.3. Critical Extensions . . . . . . . . . . . . . . . . . 22 | A.2.3. Critical Extensions | |||
A.2.4. Maximal Header Length . . . . . . . . . . . . . . . . 22 | A.2.4. Maximal Header Length | |||
A.3. Encapsulation Header . . . . . . . . . . . . . . . . . . 22 | A.3. Encapsulation Header | |||
A.3.1. Virtual Network Identifier (VNI) . . . . . . . . . . 22 | A.3.1. Virtual Network Identifier (VNI) | |||
A.3.2. Next Protocol . . . . . . . . . . . . . . . . . . . . 22 | A.3.2. Next Protocol | |||
A.3.3. Other Header Fields . . . . . . . . . . . . . . . . . 23 | A.3.3. Other Header Fields | |||
A.4. Comparison Summary . . . . . . . . . . . . . . . . . . . 23 | A.4. Comparison Summary | |||
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | Acknowledgements | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25 | Contributors | |||
Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
The NVO3 Working Group is chartered to gather requirements and | The NVO3 Working Group is chartered to gather requirements and | |||
develop solutions for network virtualization data planes based on | develop solutions for network virtualization data planes based on | |||
encapsulation of virtual network traffic over an IP-based underlay | encapsulation of virtual network traffic over an IP-based underlay | |||
data plane. Requirements include due consideration for OAM and | data plane. Requirements include due consideration for OAM and | |||
security. Based on these requirements the WG was to select, extend, | security. Based on these requirements, the WG was to select, extend, | |||
and/or develop one or more data plane encapsulation format(s). | and/or develop one or more data plane encapsulation formats. | |||
This led to WG drafts and an RFC describing three encapsulations as | This led to WG Internet-Drafts and an RFC describing three | |||
follows: | encapsulations as follows: | |||
* [RFC8926] Geneve: Generic Network Virtualization Encapsulation | * "Geneve: Generic Network Virtualization Encapsulation" [RFC8926] | |||
* [ietf_intarea_gue] Generic UDP Encapsulation | * "Generic UDP Encapsulation" [GUE] | |||
* [nvo3_vxlan_gpe] Generic Protocol Extension for VXLAN (VXLAN-GPE) | * "Generic Protocol Extension for VXLAN (VXLAN-GPE)" [VXLAN-GPE] | |||
Discussion on the list and in face-to-face meetings identified a | Discussion on the list and in face-to-face meetings identified a | |||
number of technical problems with each of these encapsulations. | number of technical problems with each of these encapsulations. | |||
Furthermore, there was clear consensus at the 96th IETF meeting in | Furthermore, there was a clear consensus at the 96th IETF meeting in | |||
Berlin that, to maximize interoperability, the working group should | Berlin that the working group should progress only one data plane | |||
progress only one data plane encapsulation. In order to overcome a | encapsulation, to maximize interoperability. In order to overcome a | |||
deadlock on the encapsulation decision, the WG consensus was to form | deadlock on the encapsulation decision, the WG consensus was to form | |||
a Design Team [RFC2418] to resolve this issue and provide initial | a Design Team [RFC2418] to resolve this issue and provide initial | |||
considerations. | considerations. | |||
2. Design Team and Working Group Process | 2. Design Team and Working Group Process | |||
The Design Team was to select one of the proposed encapsulations and | The Design Team was to select one of the proposed encapsulations and | |||
enhance it to address the technical concerns. The simple evolution | enhance it to address the technical concerns. The goals were simple | |||
of deployed networks as well as applicability to all locations in the | evolution of deployed networks as well as applicability to all | |||
NVO3 architecture were goals. The Design Team was to specifically | locations in the NVO3 architecture. The Design Team was to | |||
avoid selecting a design that is burdensome on hardware | specifically select a design that allows for future extensibility but | |||
implementations but should allow future extensibility. The selected | is not burdensome on hardware implementations. The selected design | |||
design also needed to operate well with ICMP and in Equal Cost Multi- | also needed to operate well with the Internet Control Message | |||
Path (ECMP) environments. If further extensibility is required, then | Protocol (ICMP) and in Equal-Cost Multipath (ECMP) environments. If | |||
it should be done in such a manner that it does not require the | further extensibility is required, then it should be done in such a | |||
consent of an entity outside of the IETF. | manner that it does not require the consent of an entity outside of | |||
the IETF. | ||||
The output of the Design Team was then prcoessed through the working | The output of the Design Team was then processed through the working | |||
group resulting in working group consensus for this document. | group, resulting in a working group consensus for this document. | |||
3. Terminology | 3. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
4. Abbreviations and Acronyms | 4. Abbreviations, Acronyms, and Definitions | |||
The following abbreviations and acronyms are used in this document: | The following abbreviations and acronyms are used in this document: | |||
ACL - Access Control List | ACL: Access Control List | |||
DT - NVO3 encapsulation Design Team | ECMP: Equal-Cost Multipath | |||
ECMP - Equal Cost Multi-Path | EVPN: Ethernet VPN [RFC8365] | |||
EVPN - Ethernet VPN [RFC8365] | Geneve: Generic Network Virtualization Encapsulation [RFC8926] | |||
Geneve - Generic Network Virtualization Encapsulation [RFC8926] | GPE: Generic Protocol Extension | |||
GPE - Generic Protocol Extension | GUE: Generic UDP Encapsulation [GUE] | |||
GUE - Generic UDP Encapsulation [ietf_intarea_gue] | HMAC: Hash-Based Message Authentication Code [RFC2104] | |||
HMAC - Hash based keyed Message Authentication Code [RFC2104] | IEEE: Institute of Electrical and Electronics Engineers | |||
(<https://www.ieee.org/>) | ||||
IEEE - Institute of Electrical and Electronics Engineers | NIC: Network Interface Card (refers to network interface hardware | |||
(www.ieee.org) | that is not necessarily a discrete "card") | |||
NIC - Network Interface Card (refers to network interface hardware | NSH: Network Service Header [RFC8300] | |||
which is not necessarily a discrete "card") | ||||
NSH - Network Service Header [RFC8300] | NVA: Network Virtualization Authority | |||
NVA - Network Virtualization Authority | NVE: Network Virtual Edge (refers to an NVE device) | |||
NVE - Network Virtual Edge (device) | NVO3: Network Virtualization over Layer 3 | |||
NVO3 - Network Virtualization Overlays over Layer 3 | OAM: Operations, Administration, and Maintenance [RFC6291] | |||
OAM - Operations, Administration, and Maintenance [RFC6291] | PWE3: Pseudowire Emulation Edge-to-Edge | |||
PWE3 - Pseudowire Emulation Edge to Edge | ||||
TCAM - Ternary Content-Addressable Memory | TCAM: Ternary Content-Addressable Memory | |||
TLV - Type, Length, and Value | TLV: Type-Length-Value | |||
Transit device - Underlay network devices between NVE(s). | Transit device: Refers to underlay network devices between NVEs. | |||
UUID - Universally Unique Identifier | UUID: Universally Unique Identifier | |||
VNI - Virtual Network Identifier | VNI: Virtual Network Identifier | |||
VXLAN - Virtual eXtensible LAN [RFC7348] | VXLAN: Virtual eXtensible Local Area Network [RFC7348] | |||
5. Encapsulation Issues and Background | 5. Encapsulation Issues and Background | |||
The following subsections describe issues with current encapsulations | The following subsections describe issues with current encapsulations | |||
as discussed by the NVO3 WG. Numerous extensions and options have | as discussed by the NVO3 WG. Numerous extensions and options have | |||
been designed for GUE and Geneve which may help resolve some of these | been designed for GUE and Geneve that may help resolve some of these | |||
issues but have not yet been validated by the WG. | issues, but these have not yet been validated by the WG. | |||
Also included are diagrams and information on the candidate | Also included are diagrams and information on the candidate | |||
encapsulations. These are mostly copied from other documents. Since | encapsulations. These are mostly copied from other documents. Since | |||
each protocol is assumed to be sent over UDP, an initial UDP Header | each protocol is assumed to be sent over UDP, an initial UDP header | |||
is shown which would be preceded by an IPv4 or IPv6 Header. | is shown that would be preceded by an IPv4 or IPv6 header. | |||
5.1. Geneve | 5.1. Geneve | |||
The Geneve packet format, taken from [RFC8926], is shown in Figure 1 | The Geneve packet format, taken from [RFC8926], is shown in Figure 1 | |||
below. | below. | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
Outer UDP Header: | Outer UDP Header: | |||
skipping to change at page 6, line 29 ¶ | skipping to change at line 250 ¶ | |||
| Virtual Network Identifier (VNI) | Reserved | | | Virtual Network Identifier (VNI) | Reserved | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
~ Variable-Length Options ~ | ~ Variable-Length Options ~ | |||
| | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 1: Geneve Header | Figure 1: Geneve Header | |||
The type of payload being carried is indicated by an Ethertype | The type of payload being carried is indicated by an Ethertype | |||
[RFC7042] in the Protocol Type field in the Geneve Header; Ethernet | [RFC9542] in the Protocol Type field in the Geneve header; Ethernet | |||
itself is represented by Ethertype 0x6558. See [RFC8926] for details | itself is represented by Ethertype 0x6558. See [RFC8926] for details | |||
concerning UDP header fields. The O bit indicates an OAM packet. | concerning UDP header fields. The O bit indicates an OAM packet. | |||
The C bit is the "Critical" bit which means that the options must be | The Geneve C bit is the "Critical" bit, which means that the options | |||
processed or the packet discarded. | must be processed or the packet discarded. | |||
Issues with Geneve [RFC8926] are as follows: | Issues with Geneve [RFC8926] are as follows: | |||
* Can't be implemented cost-effectively in all use cases because | * Geneve can't be implemented cost-effectively in all use cases | |||
variable length header and order of the TLVs makes it costly (in | because the variable-length header and order of the TLVs make it | |||
terms of number of gates) to implement in hardware. | costly (in terms of number of gates) to implement in hardware. | |||
* Header doesn't fit into largest commonly available parse buffer | * The header doesn't fit into the largest commonly available parse | |||
(256 bytes in NIC). Cannot justify doubling buffer size unless it | buffer (256 bytes in a NIC). Thus, doubling the buffer size can't | |||
is mandatory for hardware to process additional option fields. | be justified unless it is mandatory for hardware to process | |||
additional option fields. | ||||
Selection of Geneve despite these issues may be the result of the | The selection of Geneve despite these issues may be the result of the | |||
Geneve design effort assuming that the Geneve header would typically | Geneve design effort, assuming that the Geneve header would typically | |||
be delivered to a server and parsed in software. | be delivered to a server and parsed in software. | |||
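As an illustration of the fields described above, the following is a minimal sketch of decoding the fixed part of the Geneve header. It assumes the 8-byte layout given in [RFC8926] (2-bit Ver, 6-bit Opt Len counted in 4-byte words, O and C bits, 16-bit Protocol Type, 24-bit VNI, 8 reserved bits); the function name, the constants, and the sample bytes are hypothetical and chosen only for this example.

```python
import struct

GENEVE_BASE_LEN = 8  # fixed part of the Geneve header, in bytes


def parse_geneve_base(header: bytes) -> dict:
    """Decode the 8-byte fixed Geneve header (layout assumed from RFC 8926)."""
    if len(header) < GENEVE_BASE_LEN:
        raise ValueError("need at least 8 bytes of Geneve header")
    first, second, proto, vni_rsvd = struct.unpack("!BBHI", header[:8])
    return {
        "version": first >> 6,
        "opt_len_bytes": (first & 0x3F) * 4,  # Opt Len counts 4-byte words
        "oam": bool(second & 0x80),           # O bit
        "critical": bool(second & 0x40),      # C bit
        "protocol_type": proto,               # Ethertype of the payload
        "vni": vni_rsvd >> 8,                 # upper 24 bits; low 8 are reserved
    }


# Largest possible Geneve header: 8 fixed bytes plus 63 four-byte option words.
MAX_GENEVE_HEADER = GENEVE_BASE_LEN + 0x3F * 4

# Hypothetical sample: no options, C bit set, Ethernet payload, VNI 42.
sample = bytes([0x00, 0x40, 0x65, 0x58, 0x00, 0x00, 0x2A, 0x00])
fields = parse_geneve_base(sample)
```

Under these assumptions, MAX_GENEVE_HEADER evaluates to 260 bytes, which is consistent with the concern above that the header may not fit into a 256-byte parse buffer.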
5.2. Generic UDP Encapsulation (GUE) | 5.2. Generic UDP Encapsulation (GUE) | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
UDP Header: | UDP Header: | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Source port | Dest port = 6080 GUE | | | Source Port | Dest Port = 6080 GUE | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| UDP Length | Checksum | | | UDP Length | UDP Checksum | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
GUE Header: | GUE Header: | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| 0 |C| Hlen | Proto/ctype | Flags | | | 0 |C| Hlen | Proto/ctype | Flags | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
~ Extensions Fields (optional) ~ | ~ Extensions Fields (optional) ~ | |||
| | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 2: GUE Header | Figure 2: GUE Header | |||
The type of payload being carried is indicated by an IANA Internet | The type of payload being carried is indicated by an IANA protocol | |||
protocol number in the Proto/ctype field. The C bit indicates a | number in the Proto/ctype field. The GUE C bit (Control bit) | |||
Control packet. | indicates a control packet. | |||
Issues with GUE [ietf_intarea_gue] are as follows: | Issues with GUE [GUE] are as follows: | |||
* There were a significant number of objections to GUE related to | * There were a significant number of objections to GUE related to | |||
the complexity of implementation in hardware, similar to those | the complexity of its implementation in hardware, similar to those | |||
noted for Geneve above, such as the variable length and possible | noted for Geneve above, such as the variable length and possible | |||
high maximum length of the header. | high maximum length of the header. | |||
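The variable length noted above can be made concrete with a small sketch that decodes the first four bytes of the GUE header shown in Figure 2. It assumes, following the GUE draft, that Hlen counts 32-bit words of the header beyond the first four bytes; the function name, constant, and sample bytes are hypothetical.

```python
import struct


def parse_gue_base(header: bytes) -> dict:
    """Decode the first 4 bytes of a GUE header (layout as in Figure 2)."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes of GUE header")
    first, proto_ctype, flags = struct.unpack("!BBH", header[:4])
    hlen_words = first & 0x1F  # 5-bit Hlen field
    return {
        "version": first >> 6,
        "control": bool(first & 0x20),           # C bit: control packet
        "header_len_bytes": 4 + hlen_words * 4,  # assumption: Hlen excludes first 4 bytes
        "proto_ctype": proto_ctype,
        "flags": flags,
    }


# Largest header under the assumption above: 4 bytes plus 31 four-byte words.
MAX_GUE_HEADER = 4 + 0x1F * 4

# Hypothetical samples: a data packet with no extensions, and a control packet.
data_pkt = parse_gue_base(bytes([0x00, 0x04, 0x00, 0x00]))
ctrl_pkt = parse_gue_base(bytes([0x20, 0x00, 0x00, 0x00]))
```

A parser has to read Hlen before it knows where the payload starts, which is the kind of data-dependent step that the hardware objections refer to.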
5.3. Generic Protocol Extension (GPE) for VXLAN | 5.3. Generic Protocol Extension (GPE) for VXLAN | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
Outer UDP Header: | Outer UDP Header: | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Source Port | Dest Port = 4790 GPE | | | Source Port | Dest Port = 4790 GPE | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| UDP Length | UDP Checksum | | | UDP Length | UDP Checksum | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
VXLAN-GPE Header | VXLAN-GPE Header | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
|R|R|Ver|I|P|B|O| Reserved | Next Protocol | | |R|R|Ver|I|P|B|O| Reserved | Next Protocol | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| VXLAN Network Identifier (VNI) | Reserved | | | Virtual Network Identifier (VNI) | Reserved | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 3: GPE Header | Figure 3: GPE Header | |||
The type of payload being carried is indicated by the Next Protocol | The type of payload being carried is indicated by the Next Protocol | |||
field using a VXLAN-GPE-specific registry. The I bit indicates that | field using a registry specific to VXLAN-GPE. The I bit indicates | |||
the VNI is valid. The P bit indicates that the Next Protocol field | that the VNI is valid. The P bit indicates that the Next Protocol | |||
is valid. The B bit indicates the packet is an ingress replicated | field is valid. The B bit indicates that the packet is an ingress | |||
Broadcast, Unknown Unicast, or Multicast packet. The O bit indicates | replicated Broadcast, Unknown Unicast, or Multicast packet. The O | |||
an OAM packet. | bit indicates an OAM packet. | |||
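The flag bits described above can be read off the first byte of the header in Figure 3. The following minimal sketch assumes the bit positions shown in that figure (R, R, Ver, I, P, B, O from most to least significant); the function name, dictionary keys, and sample bytes are hypothetical.

```python
import struct

GPE_HEADER_LEN = 8  # VXLAN-GPE has a fixed-size header with no length field


def parse_vxlan_gpe(header: bytes) -> dict:
    """Decode the 8-byte VXLAN-GPE header laid out as in Figure 3."""
    if len(header) < GPE_HEADER_LEN:
        raise ValueError("need at least 8 bytes of VXLAN-GPE header")
    flags, _reserved, next_proto, vni_rsvd = struct.unpack("!BHBI", header[:8])
    return {
        "version": (flags >> 4) & 0x3,
        "vni_valid": bool(flags & 0x08),            # I bit
        "next_protocol_valid": bool(flags & 0x04),  # P bit
        "bum": bool(flags & 0x02),                  # B bit
        "oam": bool(flags & 0x01),                  # O bit
        "next_protocol": next_proto,
        "vni": vni_rsvd >> 8,                       # upper 24 bits; low 8 are reserved
    }


# Hypothetical sample: I and P set, Next Protocol value 3, VNI 100.
gpe = parse_vxlan_gpe(bytes([0x0C, 0x00, 0x00, 0x03, 0x00, 0x00, 0x64, 0x00]))
```

Because the header is always 8 bytes, the payload offset is a constant here, in contrast to the variable-length Geneve and GUE headers.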
Issues with VXLAN-GPE [nvo3_vxlan_gpe] are as follows: | Issues with VXLAN-GPE [VXLAN-GPE] are as follows: | |||
* GPE is not day-1 backwards compatible with VXLAN [RFC7348]. | * GPE is not day one backwards compatible with VXLAN [RFC7348]. | |||
Although the frame format is similar, it uses a different UDP | Although the frame format is similar, it uses a different UDP | |||
port, so would require changes to existing implementations even if | port, so it would require changes to existing implementations even | |||
the rest of the GPE frame were the same. | if the rest of the GPE frame were the same. | |||
* GPE is insufficiently extensible. It adds a Next Protocol field | * GPE is insufficiently extensible. It adds a Next Protocol field | |||
and some flag bits to the VXLAN header but is not otherwise | and some flag bits to the VXLAN header but is not otherwise | |||
extensible. | extensible. | |||
* Security, e.g., of the VNI, as discussed in Section 6.2.2, has not | * As discussed in Section 6.2.2, security (e.g., of the VNI) has not | |||
been addressed by GPE. Although a shim header could be added for | been addressed by GPE. Although a shim header could be added for | |||
security and to support other extensions, this has not been | security and to support other extensions, this has not been | |||
defined yet. More study would be needed to understand the | defined yet. More study would be needed to understand the | |||
implication of such a shim on offloading in NICs. | implication of such a shim on offloading in NICs. | |||
6. Common Encapsulation Considerations | 6. Common Encapsulation Considerations | |||
6.1. Current Encapsulations | 6.1. Current Encapsulations | |||
Appendix A includes a detailed comparison between the three proposed | Appendix A includes a detailed comparison between the three proposed | |||
encapsulations. The comparison indicates several common properties | encapsulations. The comparison indicates several common properties | |||
but also three major differences among the encapsulations: | but also three major differences among the encapsulations: | |||
* Extensibility: Geneve and GUE were defined with built-in | * Extensibility: Geneve and GUE were defined with built-in | |||
extensibility, while VXLAN-GPE is not inherently extensible. Note | extensibility, while VXLAN-GPE is not inherently extensible. Note | |||
that any of the three encapsulations can be extended using the | that any of the three encapsulations can be extended using the | |||
Network Service Header (NSH [RFC8300]). | Network Service Header (NSH) [RFC8300]. | |||
* Extension method: Geneve is extensible using Type/Length/Value | * Extension method: Geneve is extensible using Type-Length-Value | |||
(TLV) fields, while GUE uses a small set of possible extensions, | (TLV) fields, while GUE uses a small set of possible extensions | |||
and a set of flags that indicate which extensions are present. | and a set of flags that indicate which extensions are present. | |||
* Length field: Geneve and GUE include a Length field, indicating | * Length field: Geneve and GUE include a Length field, indicating | |||
the length of the encapsulation header, while VXLAN-GPE does not | the length of the encapsulation header, while VXLAN-GPE does not | |||
include such a field. Thus it may be harder to skip the | include such a field. Thus, it may be harder to skip the | |||
encapsulation header with VXLAN-GPE. | encapsulation header with VXLAN-GPE. | |||
6.2. Useful Extensions Use Cases | 6.2. Useful Extensions Use Cases | |||
Non-vendor specific extensions, such as TLVs, MUST follow the | Extensions that are not vendor-specific, such as TLVs, MUST follow | |||
standardization process. The following use cases for extensions show | the standardization process. The following use cases for extensions | |||
that there is a strong requirement to support variable length | show that there is a strong requirement to support variable-length | |||
extensions with possible different subtypes. | extensions with possible different subtypes. | |||
6.2.1. Telemetry Extensions | 6.2.1. Telemetry Extensions | |||
In several scenarios it is beneficial to make information about the | In several scenarios, it is beneficial to make information available | |||
path a packet took through the network or through a network device as | to the operator about the path a packet took through the network or | |||
well as associated telemetry information available to the operator. | through a network device as well as information about associated | |||
telemetry. | ||||
This includes not only tasks like debugging, troubleshooting, and | This includes not only tasks like debugging, troubleshooting, and | |||
network planning and optimization but also policy or service level | network planning and optimization but also policy or service level | |||
agreement compliance checks. | agreement compliance checks. | |||
Packet scheduling algorithms, especially for balancing traffic across | Packet scheduling algorithms, especially for balancing traffic across | |||
equal cost paths or links, often leverage information contained | equal-cost paths or links, often leverage information contained | |||
within the packet, such as protocol number, IP address, or MAC | within the packet, such as protocol number, IP address, or Media | |||
address. Probe packets would thus either need to be sent between the | Access Control (MAC) address. Thus, probe packets would need to | |||
exact same endpoints with the exact same parameters, or probe packets | be either sent between the exact same endpoints with the exact same | |||
would need to be artificially constructed as "fake" packets and | parameters or artificially constructed as "fake" packets and inserted | |||
inserted along the path. Both approaches are often not feasible from | along the path. Both approaches are often not feasible from an | |||
an operational perspective because access to the end-system is not | operational perspective because access to the end system is not | |||
feasible or the diversity of parameters and associated probe packets | feasible or the diversity of parameters and associated probe packets | |||
to be created is simply too large. An extension providing an in-band | to be created is simply too large. An extension providing an in-band | |||
telemetry mechanism [RFC9197] is an alternative in those cases. | telemetry mechanism [RFC9197] is an alternative in those cases. | |||
6.2.2. Security/Integrity Extensions | 6.2.2. Security/Integrity Extensions | |||
Since the currently proposed NVO3 encapsulations do not protect their | Since the currently proposed NVO3 encapsulations do not protect their | |||
headers, a single bit corruption in the VNI field could deliver a | headers, a single bit corruption in the VNI field could deliver a | |||
packet to the wrong tenant. Extension headers are needed to use any | packet to the wrong tenant. Extension headers are needed to use any | |||
sophisticated security. | sophisticated security. | |||
The possibility of VNI spoofing with an NVO3 protocol is exacerbated | The possibility of VNI spoofing with an NVO3 protocol is exacerbated | |||
by using UDP. Systems typically have no restrictions on applications | by using UDP. Systems typically have no restrictions on applications | |||
being able to send to any UDP port so an unprivileged application can | being able to send to any UDP port, so an unprivileged application | |||
trivially spoof VXLAN [RFC7348] packets for instance, including using | can trivially spoof VXLAN [RFC7348] packets, using arbitrary VNIs, | |||
arbitrary VNIs. | for instance. | |||
One can envision support of an HMAC-like Message Authentication Code | One can envision support of an HMAC-like Message Authentication Code | |||
(MAC) [RFC2104] in an NVO3 extension to authenticate the header and | (MAC) [RFC2104] in an NVO3 extension to authenticate the header and | |||
the outer IP addresses, thereby preventing attackers from injecting | the outer IP addresses, thereby preventing attackers from injecting | |||
packets with spoofed VNIs. | packets with spoofed VNIs. | |||
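The authentication idea above can be sketched in Python. This is a minimal illustration, not a defined NVO3 extension: the function names, the choice of HMAC-SHA256, the input ordering (source address, destination address, header), and the shared key are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress


def header_tag(key: bytes, src_ip: str, dst_ip: str, nvo3_header: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the outer IP addresses and the NVO3 header.

    The input layout (src || dst || header) is an assumption for illustration.
    """
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    return hmac.new(key, src + dst + nvo3_header, hashlib.sha256).digest()


def verify_header(key: bytes, src_ip: str, dst_ip: str,
                  nvo3_header: bytes, tag: bytes) -> bool:
    """Return True only if the tag matches; uses a constant-time comparison."""
    expected = header_tag(key, src_ip, dst_ip, nvo3_header)
    return hmac.compare_digest(expected, tag)
```

A receiver holding the same key would recompute the tag and drop the packet on mismatch, so a sender without the key cannot forge a header carrying an arbitrary VNI.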
Another aspect of security is payload security. Essentially this | Another aspect of security is payload security. Essentially, this | |||
makes packets that look like the following: | makes packets that look like the following: | |||
IP|UDP|NVO3 Encap|DTLS/IPsec-ESP Extension|payload. | IP|UDP|NVO3 Encap|DTLS/IPsec-ESP Extension|payload. | |||
This is desirable since we still have the UDP header for ECMP, the | This is desirable because: | |||
NVO3 header is in plain text so it can be read by network elements, | ||||
and different security or other payload transforms can be supported | ||||
on a single UDP port (we don't need a separate UDP port for DTLS/ | ||||
IPsec [RFC9147]/[RFC6071]). | ||||
6.2.3. Group Based Policy | * we still have the UDP header for ECMP, | |||
Another use case would be to carry the Group Based Policy (GBP) | * the NVO3 header is in plain text so it can be read by network | |||
elements, and | ||||
* different security or other payload transforms can be supported on | ||||
a single UDP port (we don't need a separate UDP port for DTLS/ | ||||
IPsec; see [RFC9147] and [RFC6071], respectively). | ||||
6.2.3. Group-Based Policy | ||||
Another use case would be to carry the Group-Based Policy (GBP) | ||||
source group information within an NVO3 header extension in a similar | source group information within an NVO3 header extension in a similar | |||
manner as has been implemented for VXLAN [VXLANgroup]. This allows | manner as has been implemented for VXLAN [VXLAN-GROUP]. This allows | |||
various forms of policy such as access control and QoS to be applied | various forms of policy such as access control and QoS to be applied | |||
between abstract groups rather than coupled to specific endpoint | between abstract groups rather than coupled to specific endpoint | |||
addresses. | addresses. | |||
6.3. Hardware Considerations | 6.3. Hardware Considerations | |||
Hardware restrictions should be taken into consideration along with | Hardware restrictions should be taken into consideration along with | |||
future hardware enhancements that may provide more flexible metadata | future hardware enhancements that may provide more flexible metadata | |||
processing. However, the set of options that need to and will be | (MD) processing. However, the set of options that need to and will | |||
implemented in hardware will be a subset of what is implemented in | be implemented in hardware will be a subset of what is implemented in | |||
software, since software NVEs are likely to grow features, and hence | software. This is because software NVEs are likely to grow features, | |||
option support, at a more rapid rate. | and hence option support, at a more rapid rate. | |||
It is hard to predict which options will be implemented in which | It is hard to predict which options will be implemented in which | |||
piece of hardware and when. That depends on whether the hardware | piece of hardware and when. That depends on whether the hardware | |||
will be in the form of | will be in the form of: | |||
* a NIC providing increasing offload capabilities to software NVEs, | * a NIC providing increasing offload capabilities to software NVEs, | |||
or | ||||
* or a switch chip being used as an NVE gateway towards non-NVO3 | * a switch chip being used as an NVE gateway towards non-NVO3 parts | |||
parts of the network, | of the network, or even | |||
* or even a transit device that participates in the NVO3 dataplane, | * a transit device that participates in the NVO3 data plane, e.g., | |||
e.g., for OAM purposes. | for OAM purposes. | |||
A result of this is that it doesn't look useful to prescribe some | A result of this is that it doesn't look useful to prescribe some | |||
order of the options so that the ones that are likely to be | order to the options so that the ones that are likely to be | |||
implemented in hardware come first; we can't decide such an order | implemented in hardware come first. We can't decide such an order | |||
when we define the options, however a control plane can enforce such | when we define the options; however, a control plane can enforce such | |||
an order for some hardware implementation. | an order for some hardware implementations. | |||
We do know that hardware needs to initially be able to efficiently | We do know that hardware initially needs to be able to efficiently | |||
skip over the NVO3 header to find the inner payload. That is needed | skip over the NVO3 header to find the inner payload. That is needed | |||
both for NICs implementing various TCP offload mechanisms and for | both for NICs implementing various TCP offload mechanisms and for | |||
transit devices and NVEs applying policy or ACLs to the inner | transit devices and NVEs applying policy or ACLs to the inner | |||
payload. | payload. | |||
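To illustrate why a total length field makes skipping cheap, the following Python sketch computes the inner payload offset. It assumes the Geneve base header layout from RFC 8926 (an 8-byte fixed header whose first byte holds a 2-bit version and a 6-bit option length counted in 4-byte units); the function and constant names are hypothetical.

```python
GENEVE_FIXED_LEN = 8  # bytes in the fixed part of the Geneve header


def inner_payload_offset(geneve: bytes) -> int:
    """Return the offset of the inner payload within a Geneve-encapsulated buffer.

    Only one field (the 6-bit Opt Len) is read; no option is parsed.
    """
    if len(geneve) < GENEVE_FIXED_LEN:
        raise ValueError("truncated Geneve header")
    opt_len_words = geneve[0] & 0x3F          # low 6 bits of the first byte
    offset = GENEVE_FIXED_LEN + 4 * opt_len_words
    if offset > len(geneve):
        raise ValueError("options extend past end of buffer")
    return offset
```

The offset comes from a single mask, multiply, and add, which is the kind of fixed-cost operation that suits NIC offload and transit-device pipelines.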
6.4. Extension Size | 6.4. Extension Size | |||
Extension header length has a significant impact on hardware and | Extension header length has a significant impact on hardware and | |||
software implementations. A maximum total header length that is too | software implementations. A maximum total header length that is too | |||
small will unnecessarily constrain software flexibility. A maximum | small will unnecessarily constrain software flexibility. A maximum | |||
total header length that is too large will place a nontrivial cost on | total header length that is too large will place a nontrivial cost on | |||
hardware implementations. Thus, the DT recommends that there be a | hardware implementations. Thus, the DT recommends that there be a | |||
minimum and maximum total available extension header length | minimum and maximum total available extension header length | |||
specified. The maximum total header length is determined by the size | specified. The maximum total header length is determined by the size | |||
of the bit field allocated for the total extension header length | of the bit field allocated for the total extension header length | |||
field. The risk with this approach is that it may be difficult to | field. The risk with this approach is that it may be difficult to | |||
extend the total header size in the future. The minimum total header | extend the total header size in the future. The minimum total header | |||
length is determined by a requirement in the specifications that all | length is determined by a requirement in the specifications that all | |||
implementations must meet. The risk with this approach is that all | implementations must meet. The risk with this approach is that all | |||
implementations will only implement support for the minimum total | implementations will only implement support for the minimum total | |||
header length which would then become the de facto maximum total | header length, which would then become the de facto maximum total | |||
header length. | header length. | |||
The recommended minimum total available header length is 64 bytes. | The recommended minimum total available header length is 64 bytes. | |||
The size of an extension header should always be 4 byte aligned. | The size of an extension header should always be 4-byte aligned. | |||
The maximum length of a single option should be large enough to meet | The maximum length of a single option should be large enough to meet | |||
the different extension use case requirements, e.g., in-band | the different extension use case requirements, e.g., for in-band | |||
telemetry and future use. | telemetry and future use. | |||
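The size and alignment guidance above can be sketched as a small sender-side check in Python. This is an illustrative sketch only: the zero-padding scheme, the helper names, and the use of the 64-byte recommended minimum as a default budget are assumptions, and each item is treated as a fully encoded extension.

```python
from typing import Iterable

MIN_TOTAL_EXT_LEN = 64  # recommended minimum total available header length, in bytes


def pad_to_4(data: bytes) -> bytes:
    """Zero-pad an encoded extension so that its size is a multiple of 4 bytes."""
    return data + b"\x00" * (-len(data) % 4)


def fits_budget(extensions: Iterable[bytes], budget: int = MIN_TOTAL_EXT_LEN) -> bool:
    """Check that the padded extensions fit in the total length a receiver supports."""
    total = sum(len(pad_to_4(ext)) for ext in extensions)
    return total <= budget
```

A sender that stays within the minimum budget can expect every compliant receiver to accept the header; exceeding it would require knowing that the receiver supports more.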
6.5. Ordering of Extension Headers | 6.5. Ordering of Extension Headers | |||
To support hardware nodes at the target NVE or at a transit device | To support hardware nodes at the target NVE or at a transit device | |||
that can process one or a few extension headers in TCAM, a control | that can process one or a few extension headers in TCAM, a control | |||
plane in such a deployment can signal a capability to ensure a | plane in such a deployment could signal a capability to ensure that a | |||
specific extension header will always appear in a specific order, for | specific extension header will always appear in a specific order, for | |||
example the first one in the packet. | example, that such a specific extension header appear first in the | |||
packet. | ||||
The order of the extension headers should be hardware friendly for | The order of the extension headers should be hardware friendly for | |||
both the sender and the receiver and possibly some transit devices | both the sender and the receiver and possibly some transit devices as | |||
also. This may requre that the extension headers and their order be | well. This may require that the extension headers and their order be | |||
dynamically determined based on the hardware of those devices. | determined dynamically based on the hardware of those devices. | |||
Transit devices don't participate in control plane communication | Transit devices don't participate in control plane communication | |||
between the end points and are not required to process the extension | between the endpoints and are not required to process the extension | |||
headers; however, if they do, they may need to process only a small | headers; however, if they do, they may need to process only a small | |||
subset of the extension headers that will be consumed by target NVEs. | subset of the extension headers that will be consumed by target NVEs. | |||
6.6. TLV versus Bit Fields | 6.6. TLV versus Bit Fields | |||
If there is a well-known initial set of options that are likely to be | If there is a well-known initial set of options that is likely to be | |||
implemented in software and in hardware, it can be efficient to use | implemented in software and in hardware, it can be efficient to use | |||
the bit fields approach to indicate the presence of extensions as in | the bit fields approach to indicate the presence of extensions as in | |||
GUE. However, as described in section 6.3, if options are added over | GUE. However, as described in Section 6.3, if options are added over | |||
time and different subsets of options are likely to be implemented in | time and different subsets of options are likely to be implemented in | |||
different pieces of hardware, then it would be hard for the IETF to | different pieces of hardware, then it would be hard for the IETF to | |||
specify which options should get the early bit fields. TLVs are a | specify which options should get the early bit fields. TLVs are a | |||
lot more flexible, which avoids the need to determine the relative | lot more flexible, which avoids the need to determine the relative | |||
importance of different options. However, general TLVs of arbitrary | importance of different options. However, general TLVs of arbitrary | |||
order, size, and repetition are difficult to implement in hardware. | order, size, and repetition are difficult to implement in hardware. | |||
A middle ground is to use TLVs with restrictions on their size and | A middle ground is to use TLVs with restrictions on their size and | |||
alignment, observing that individual TLVs can have a fixed length, | alignment, observing that individual TLVs can have a fixed length, | |||
and to support via the control plane a method such that an NVE will | and to support via the control plane a method such that an NVE will | |||
only receive options that it needs and implements. The control plane | only receive options that it needs and implements. The control plane | |||
skipping to change at page 13, line 15 ¶ | skipping to change at line 557 ¶ | |||
A benefit of TLVs from a hardware perspective is that they are self | A benefit of TLVs from a hardware perspective is that they are self | |||
describing, i.e., all the information is in the TLV. In a bit field | describing, i.e., all the information is in the TLV. In a bit field | |||
approach, the hardware needs to look up the bit to determine the | approach, the hardware needs to look up the bit to determine the | |||
length of the data associated with the bit through some separate | length of the data associated with the bit through some separate | |||
table, which would add hardware complexity. | table, which would add hardware complexity. | |||
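The self-describing property can be shown with a short Python walk over a TLV option block. The sketch assumes the Geneve option layout from RFC 8926 (16-bit option class, 8-bit type whose high-order bit marks a critical option, then 3 reserved bits and a 5-bit length in 4-byte units); the function name is hypothetical.

```python
import struct


def walk_options(options: bytes):
    """Yield (option_class, option_type, critical, data) for each TLV in the block.

    Each TLV carries its own length, so no external table is consulted.
    """
    pos = 0
    while pos < len(options):
        if pos + 4 > len(options):
            raise ValueError("truncated option header")
        opt_class, opt_type, len_byte = struct.unpack_from("!HBB", options, pos)
        data_len = 4 * (len_byte & 0x1F)      # low 5 bits, in 4-byte units
        end = pos + 4 + data_len
        if end > len(options):
            raise ValueError("option data extends past end of block")
        yield opt_class, opt_type, bool(opt_type & 0x80), options[pos + 4:end]
        pos = end
```

A parser that does not recognize an option can still step over it, because the length is read from the option itself rather than looked up by bit position.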
There are use cases where multiple modules of software are running on | There are use cases where multiple modules of software are running on | |||
an NVE. These can be modules such as a diagnostic module by one | an NVE. These can be modules such as a diagnostic module by one | |||
vendor that does packet sampling and another module from a different | vendor that does packet sampling and another module from a different | |||
vendor that implements a firewall. Using a TLV format, it is easier | vendor that implements a firewall. Using a TLV format, it is easier | |||
to have different software modules process different TLVs, which | to have different software modules process different TLVs without | |||
could be standard extensions or vendor specific extensions defined by | conflicting with each other. Such TLVs could be standard extensions | |||
the different vendors, without conflicting with each other. This can | or vendor-specific extensions. This can help with hardware | |||
help with hardware modularity as well. There are some | modularity as well. There are some implementations with options that | |||
implementations with options that allows different software modules, | allow different software modules, like MAC learning and security, to | |||
like MAC learning and security, to process different options. | process different options. | |||
6.7. Control Plane Considerations | 6.7. Control Plane Considerations | |||
Given that we want to allow considerable flexibility and | Given that we want to allow considerable flexibility and | |||
extensibility, e.g., for software NVEs, yet be able to support | extensibility (e.g., for software NVEs), yet want to be able to | |||
important extensions in less flexible contexts such as hardware NVEs, | support important extensions in less flexible contexts such as | |||
it is useful to consider the control plane. By control plane in this | hardware NVEs, it is useful to consider the control plane. By | |||
section we mean both protocols, such as EVPN [RFC8365] and others, | control plane in this section we mean protocols, such as EVPN | |||
and deployment specific configuration. | [RFC8365] and others, and deployment-specific configurations. | |||
If each NVE can express in the control plane that it only supports | If each NVE can express in the control plane that it only supports | |||
certain extensions (which could be a single extension, or a few), and | certain extensions (which could be a single extension, or a few), and | |||
the source NVEs only include supported extensions in the NVO3 | the source NVEs only include supported extensions in the NVO3 | |||
packets, then the target NVE can both use a simpler parser (e.g., a | packets, then the target NVE can use a simpler parser (e.g., a TCAM | |||
TCAM might be usable to look for a single NVO3 extension) and the | might be usable to look for a single NVO3 extension) and the depth of | |||
depth of the inner payload in the NVO3 packet will be minimized. | the inner payload in the NVO3 packet will be minimized. Furthermore, | |||
Furthermore, if the target NVE cares about a few extensions and can | if the target NVE cares about a few extensions and can express in the | |||
express in the control plane the desired order of those extensions in | control plane the desired order of those extensions in the NVO3 | |||
the NVO3 packets, then the deployment can provide useful | packets, then the deployment can provide useful functionality with | |||
functionality with simplified hardware requirements for the target | simplified hardware requirements for the target NVE. | |||
NVE. | ||||
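The control plane behavior described above can be sketched in Python as a per-target selection step at the source NVE. The extension names and the idea of an ordered advertisement list are hypothetical; no existing control plane protocol is claimed to encode capabilities this way.

```python
from typing import List, Set


def select_extensions(wanted: Set[str], advertised_order: List[str]) -> List[str]:
    """Pick the extensions to send to one target NVE.

    Only extensions the target advertised are kept, and they are emitted in
    the order the target asked for, so a simple parser can find them.
    """
    return [ext for ext in advertised_order if ext in wanted]
```

For example, a source that would like to send telemetry, GBP, and an integrity extension to a target advertising `["integrity", "telemetry"]` would send only those two, in that order, which also keeps the inner payload as shallow as possible.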
Transit devices that are not aware of the NVO3 extensions somewhat | Transit devices that are not aware of the NVO3 extensions somewhat | |||
benefit from such an approach, since the inner payload is less deep | benefit from such an approach, since the inner payload is less deep | |||
in the packet if no extraneous extension headers are included in the | in the packet if no extraneous extension headers are included in the | |||
packet. In general, a transit device is not likely to participate in | packet. In general, a transit device is not likely to participate in | |||
the NVO3 control plane. However, configuration mechanisms can take | the NVO3 control plane. However, configuration mechanisms can take | |||
into account limitations of the transit devices used in particular | into account limitations of the transit devices used in particular | |||
deployments. | deployments. | |||
Note that with this approach different NVEs could desire different | Note that with this approach, different NVEs could desire different | |||
extensions or sets of extensions, which means that the source NVE | extensions or sets of extensions, which means that the source NVE | |||
needs to be able to place different sets of extensions in different | needs to be able to place different sets of extensions in different | |||
NVO3 packets, and perhaps in different order. It also assumes that | NVO3 packets, and perhaps in a different order. It also assumes that | |||
underlay multicast or replication servers are not used together with | underlay multicast or replication servers are not used together with | |||
NVO3 extension headers. | NVO3 extension headers. | |||
There is a need to consider mandatory extensions versus optional | There is a need to consider mandatory extensions versus optional | |||
extensions. Mandatory extensions require the receiver to drop the | extensions. Mandatory extensions require the receiver to drop the | |||
packet if the extension is unknown. A control plane mechanism can | packet if the extension is unknown. A control plane mechanism can | |||
prevent the need for dropping unknown extensions, since they would | prevent the need for dropping unknown extensions, since they would | |||
not be sent to target NVEs that do not support them. | not be sent to target NVEs that do not support them. | |||
The control planes defined today need to add the ability to describe | The control planes defined today need to add the ability to describe | |||
the different encapsulations. Thus, perhaps EVPN [RFC8365] and any | the different encapsulations. Thus, perhaps EVPN [RFC8365] and any | |||
other control plane protocol that the IETF defines should have a way | other control plane protocol that the IETF defines should have a way | |||
to indicate the supported NVO3 extensions and their order, for each | to indicate the supported NVO3 extensions and their order for each of | |||
of the encapsulations supported. | the encapsulations supported. | |||
Developing a separate draft on guidance for option processing and | Developing a separate document on guidance for option processing and | |||
control plane participation should be considered. This should | control plane participation should be considered. This should | |||
provide examples/guidance on range of usage models and deployments | provide examples and guidance on the range of usage models and | |||
scenarios for specific options and ordering that are relevant for | deployment scenarios for specific options. It should also provide | |||
that specific deployment. This includes end points and middle boxes | examples of option ordering that are relevant for that specific | |||
using the options. Having the control plane negotiate the | deployment. This includes endpoints and middleboxes that are using | |||
constraints is the most appropriate and flexible way to address these | the options. Having the control plane negotiate the constraints is | |||
requirements. | the most appropriate and flexible way to address these requirements. | |||
6.8. Split NVE | 6.8. Split NVE | |||
If there is a need for hosts to send and receive options in a split | If there is a need for hosts to send and receive options in a split | |||
NVE case [RFC8394], this is possible using any of the existing | NVE case [RFC8394], this is possible using any of the existing | |||
extensible encapsulations (Geneve, GUE, GPE+NSH) by defining a way to | extensible encapsulations (GPE with NSH, GUE, or Geneve) by defining | |||
carry those over other transports. NSH can already be used over | a way to carry those over other transports. An NSH can already be | |||
different transports. | used over different transports. | |||
If this is needed with other encapsulations it can be done by | If this is needed with other encapsulations, it can be done by | |||
defining an Ethertype so that it can be carried over Ethernet and | defining an Ethertype so that it can be carried over Ethernet and | |||
[IEEE802.1Q]. | IEEE Std 802.1Q [IEEE802.1Q]. | |||
If there is a need to carry other encapsulations over MPLS, it would | If there is a need to carry other encapsulations over MPLS, it would | |||
require an EVPN control plane to signal that other encapsulation | require an EVPN control plane to signal that other encapsulation | |||
header + options will be present in front of the L2 packet. The VNI | headers and options will be present in front of the Layer 2 (L2) | |||
can be ignored in the header, and the MPLS label will be the one used | packet. The VNI can be ignored in the header, and the MPLS label | |||
to identify the EVPN L2 instance. | will be the one used to identify the EVPN L2 instance. | |||
6.9. Larger VNI Considerations | 6.9. Larger VNI Considerations | |||
Whether we should make the VNI 32-bits or larger was one of the | Whether we should make the VNI 32 bits or larger was one of the | |||
topics considered. The benefit of a 24-bit VNI would be to avoid | topics considered. The benefit of a 24-bit VNI would be to avoid | |||
unnecessary changes with existing proposals and implementations that | unnecessary changes with existing proposals and implementations that | |||
are almost all, if not all, using 24-bit VNI. If we need a larger | are almost all, if not all, using a 24-bit VNI. If we need a larger | |||
VNI, perhaps for a telemetry case, an extension can be used to | VNI, perhaps for a telemetry case, an extension can be used to | |||
support that. | support that. | |||
7. Recommendations | 7. Recommendations | |||
The Design Team (DT) reported that Geneve was most suitable as a | The Design Team reported that Geneve was most suitable as a starting | |||
starting point for a proposed standard for network virtualization, | point for a proposed standard for network virtualization, for the | |||
for the reasons given below. This conclusion was supported | reasons given below. This conclusion was supported by the | |||
by the NVO3 Working Group. | NVO3 Working Group. | |||
1. On whether VNI should be in the base header or in an extension | 1. On whether the VNI should be in the base header or in an | |||
header and whether it should be a 24-bit or 32-bit field (see | extension header and whether it should be a 24-bit or 32-bit | |||
Section 6.9), it was agreed that VNI is critical information for | field (see Section 6.9), it was agreed that the VNI is critical | |||
network virtualization and MUST be present in all packets. It | information for network virtualization and MUST be present in all | |||
was also agreed that a 24-bit VNI, which is supported by Geneve, | packets. It was also agreed that a 24-bit VNI, which is | |||
matches the existing widely used encapsulation formats, i.e., | supported by Geneve, matches the existing widely used | |||
VXLAN [RFC7348] and NVGRE [RFC7637], and hence is more suitable | encapsulation formats, i.e., VXLAN [RFC7348] and Network | |||
to use going forward. | Virtualization Using Generic Routing Encapsulation (NVGRE) | |||
[RFC7637], and hence is more suitable to use going forward. | ||||
2. The Geneve header has the total options length which allows | 2. The Geneve header has the total options length, which allows | |||
skipping over the options for NIC offload operations and will | skipping over the options for NIC offload operations and transit | |||
allow transit devices to view flow information in the inner | devices to view flow information in the inner payload. | |||
payload. | ||||
3. The option of using NSH [RFC8300] with VXLAN-GPE was considered | 3. The option of using an NSH [RFC8300] with VXLAN-GPE was | |||
but given that NSH is targeted at service chaining and contains | considered, but given that an NSH is targeted at service chaining | |||
service chaining information, it is less suitable for the network | and contains service chaining information, it is less suitable | |||
virtualization use case. The other downside for VXLAN-GPE was | for the network virtualization use case. The other downside of | |||
lack of a header length in VXLAN-GPE, which makes skipping over | VXLAN-GPE was the lack of a header length in VXLAN-GPE, which | |||
the headers to process inner payload more difficult. A Total | makes skipping over the headers to process inner payloads more | |||
Option Length is present in Geneve. It is not possible to skip | difficult. A total options length is present in Geneve. It is | |||
any options in the middle with VXLAN-GPE. In principle a split | not possible to skip any options in the middle with VXLAN-GPE. | |||
between a base header and a header with options is interesting | In principle, a split between a base header and a header with | |||
(whether that options header is NSH or some new header without | options is interesting (whether that options header is an NSH or | |||
ties to a service path). Whether it would make sense to either | some new header without ties to a service path). Whether it | |||
use NSH for this, or define a new NVO3 options header was | would make sense to either use an NSH for this or define a new | |||
explored. However, this makes it slightly harder to find the | NVO3 options header was explored. However, this makes it | |||
inner payload since the length field is not in the NVO3 header | slightly harder to find the inner payload since the Length field | |||
itself. Thus, one more field would have to be extracted to | is not in the NVO3 header itself. Thus, one more field would | |||
compute the start of the inner payload. Also, if the experience | have to be extracted to compute the start of the inner payload. | |||
with IPv6 extension headers is a guide, there would be a risk | Also, if the experience with IPv6 extension headers is a guide, | |||
that key pieces of hardware might not implement the options | there would be a risk that key pieces of hardware might not | |||
header, resulting in future calls to deprecate its use. Making | implement the options header, resulting in future calls to | |||
the options part of the base NVO3 header has fewer of those | deprecate its use. Making the options part of the base NVO3 | |||
issues. Even though the implementation of any particular option | header has fewer of those issues. Even though the implementation | |||
can not be predicted ahead of time, the option mechanism and | of any particular option can't be predicted ahead of time, the | |||
ability to skip the options is likely to be broadly implemented. | option mechanism and ability to skip the options is likely to be | |||
broadly implemented. | ||||
4. The TLV style and bit field style of extension were compared. It | 4. The TLV style and bit field style of extension mechanisms were | |||
was deemed that parsing either TLVs or bit fields is expensive | compared. It was deemed that parsing either TLVs or bit fields | |||
and, while bit fields may be simpler to parse, it is also more | is expensive, and while bit fields may be simpler to parse, they | |||
restrictive and requires guessing which extensions will be widely | are also more restrictive and require guessing which extensions | |||
implemented so they can get early bit assignments. Given that | will be widely implemented in order to get early bit assignments. | |||
half the bits are already assigned in GUE, a widely deployed | Given that half the bits are already assigned in GUE, a widely | |||
extension may appear in a flag extension, and this will require | deployed extension may appear in a flag extension, and this will | |||
extra processing, to dig the flag from the flag extension and | require extra processing to dig the flag from the flag extension | |||
then look for the extension itself. Also bit fields are not | and then look for the extension itself. Also, bit fields are not | |||
flexible enough to address the requirements from OAM, Telemetry, | flexible enough to address the requirements from OAM, telemetry, | |||
and security extensions, for variable length option and different | and security extensions for variable-length options and different | |||
subtypes of the same option. While TLVs are more flexible, a | subtypes of the same option. While TLVs are more flexible, a | |||
control plane can restrict the number of option TLVs as well as | control plane can restrict the number of option TLVs as well as | |||
the order and size of the TLVs to limit this flexibility and make | the order and size of the TLVs to limit this flexibility and make | |||
the TLVs simpler for a dataplane implementation to handle. | the TLVs simpler for a data plane implementation to handle. | |||
5. The multi-vendor NVE case was briefly discussed, as was the need | 5. The multi-vendor NVE case was briefly discussed, as was the need | |||
to allow vendors to put their own extensions in the NVE header. | to allow vendors to put their own extensions in the NVE header. | |||
This is possible with TLVs. | This is possible with TLVs. | |||
6. It was agreed that the C (Critical) bit in Geneve is helpful. | 6. It was agreed that the C bit (Critical bit) in Geneve is helpful. | |||
This bit indicates that the header includes options which must be | This bit indicates that the header includes options that must be | |||
parsed or the packet discarded. It allows a receiver NVE to | parsed, or else the packet must be discarded. The bit allows a | |||
easily decide whether to process options or not, for example a | receiver NVE to easily decide whether or not to process options | |||
UUID based packet trace, and how an optional extension such as | (such as a UUID-based packet trace) and decide how an optional | |||
that can be ignored by a receiver NVE and thus make it easy for | extension can be ignored. Thus, a Critical bit makes it easy for | |||
NVE to skip over the options. Thus, the C bit should remain as | the NVE to skip over the options not marked with such a bit. | |||
defined in Geneve. | Thus, the C bit should remain as defined in Geneve. | |||
7. There are already some extensions that are being discussed (see | 7. There are already some extensions of varying sizes that are being | |||
section 6.2) of varying sizes. By using Geneve options it is | discussed (see Section 6.2). By using Geneve options, it is | |||
possible to get in-band parameters like switch ID, ingress port, | possible to get in-band parameters like switch ID, ingress port, | |||
egress port, internal delay, and queue size using TLV extensions | egress port, internal delay, and queue size using TLV extensions | |||
for telemetry purpose from switches. It is also possible to add | for telemetry purposes from switches. It is also possible to add | |||
security extension TLVs like HMAC [RFC2104] and DTLS/IPsec | security extension TLVs like HMAC [RFC2104] and DTLS/IPsec (see | |||
[RFC9147]/[RFC6071] to authenticate the Geneve packet header and | [RFC9147] and [RFC6071], respectively) to authenticate the Geneve | |||
secure the Geneve packet payload by software or hardware tunnel | packet header and secure the Geneve packet payload by software or | |||
endpoints. A Group Based Policy extension TLV can be carried as | hardware tunnel endpoints. A Group-Based Policy extension TLV | |||
well. | can be carried as well. | |||
8. There are already implementations of Geneve options deployed in | 8. There are already implementations of Geneve options deployed in | |||
production networks. There is as well new hardware supporting | production networks. There is new hardware supporting Geneve TLV | |||
Geneve TLV parsing. In addition, an In-band Telemetry [INT] | parsing as well. In addition, an In-band Telemetry (INT) | |||
specification is being developed by P4.org that illustrates the | specification [INT] is being developed by P4.org that illustrates | |||
option of INT meta data carried over Geneve. OVN/OVS [OVN] have | the option of INT metadata carried over Geneve. Open Virtual | |||
also defined some option TLV(s) for Geneve. | Network (OVN) and Open vSwitch (OVS) [OVN] have also defined one | |||
or more option TLVs for Geneve. | ||||
9. Usage requirements (see Section 6) have been addressed while | 9. Usage requirements (see Section 6) have been addressed while also | |||
considering the requirements and implementations in general | considering requirements and implementations in general | |||
including software and hardware. | (including those for software and hardware). | |||
There seems to be interest in standardizing some well-known secure | There seems to be interest in standardizing some well-known secure | |||
option TLVs to secure the header and payload to guarantee | option TLVs to secure the header and payload to guarantee | |||
encapsulation header integrity and tenant data privacy. The working | encapsulation header integrity and tenant data privacy. The working | |||
group should consider standardizing such option(s). | group should consider standardizing such option(s). | |||
The following enhancements to Geneve are recommended to make it more | The following enhancements to Geneve are recommended to make it more | |||
suitable to hardware and yet provide flexibility for software: | suitable to hardware and yet provide flexibility for software: | |||
* The following sort of text is recommended: while TLVs are more | * The following sort of text is recommended in Geneve documents: | |||
flexible, a control plane can restrict the number of option TLVs | while TLVs are more flexible, a control plane can restrict the | |||
as well the order and size of the TLVs to make it simpler for a | number of option TLVs as well as the order and size of the TLVs to | |||
data plane implementation in software or hardware to handle. For | make it simpler for a data plane implementation in software or | |||
example, there may be some critical information such as a secure | hardware to handle. For example, there may be some critical | |||
hash that must be processed in a certain order at lowest latency. | information such as a secure hash that must be processed in a | |||
certain order at lowest latency. | ||||
* A control plane can negotiate a subset of option TLVs and certain | * A control plane can negotiate a subset of option TLVs and certain | |||
TLV ordering, as well as limiting the total number of option TLVs | TLV ordering, as well as limiting the total number of option TLVs | |||
present in the packet, for example, to allow for hardware capable | present in the packet, for example, to allow for hardware capable | |||
of processing fewer options. Hence, the control plane needs to | of processing fewer options. Hence, the control plane needs to | |||
have the ability to describe the supported subset of TLVs and their | have the ability to describe the supported subset of TLVs and their | |||
order. | order. | |||
* The Geneve documents should specify that the subset and order of | * The Geneve documents should specify that the subset and order of | |||
option TLVs SHOULD be configurable for each remote NVE in the | option TLVs SHOULD be configurable for each remote NVE in the | |||
absence of a protocol control plane. | absence of a protocol control plane. | |||
* Geneve should follow fragmentation recommendations in overlay | * Geneve should follow fragmentation recommendations in overlay | |||
services like PWE3 and the L2/L3 VPN recommendations to guarantee | services like PWE3 and the L2/L3 VPN recommendations to guarantee | |||
larger MTU for the tunnel overhead ([RFC3985] Section 5.3). | larger MTUs for the tunnel overhead ([RFC3985], Section 5.3). | |||
* Geneve should provide a recommendation for critical bit processing | * The Geneve documents should provide a recommendation for C bit | |||
- text could specify how critical bits can be used with control | (Critical bit) processing. This text could specify how critical | |||
plane specifying the critical options. | bits can be used with control planes and specify the critical | |||
options. | ||||
* Given that there is a telemetry option use case for a length of | * Given that there is a telemetry option use case for a length of | |||
256 bytes, it is recommended that Geneve increase the Single TLV | 256 bytes, it is recommended that Geneve increase the single TLV | |||
option length to 256 bytes. | option length to 256 bytes. | |||
* Geneve should address the OAM requirements for alternate | * Geneve should address the OAM requirements for alternate | |||
marking and for performance measurements that need a 2 bit field | marking and for performance measurements that need a 2-bit field | |||
in the header, and the need for the current | in the header, and the need for the current | |||
OAM bit in the Geneve Header should be clarified. | OAM bit in the Geneve header should be clarified. | |||
* The WG should work on security options for Geneve. | * The WG should work on security options for Geneve. | |||
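
The control-plane restrictions recommended above (a negotiated subset of option TLVs, a fixed ordering, and a cap on the number of options) can be illustrated with a minimal sketch. It is not taken from any Geneve document: the function name, the (option class, option type) tuple representation, and the rule that each negotiated option appears at most once are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical representation of an option TLV: (option class, option type).
OptionId = Tuple[int, int]


def conforms(options: Sequence[OptionId],
             allowed_order: Sequence[OptionId],
             max_options: int) -> bool:
    """Return True if `options` is an in-order subset of `allowed_order`
    and carries no more than `max_options` entries."""
    if len(options) > max_options:
        return False
    position = 0
    for opt in options:
        # Advance through the negotiated order until this option is found.
        while position < len(allowed_order) and allowed_order[position] != opt:
            position += 1
        if position == len(allowed_order):
            return False  # option not negotiated, repeated, or out of order
        position += 1     # assumption: each negotiated option appears at most once
    return True
```

A sender could run such a check before emitting a packet toward a remote NVE, and a hardware-constrained receiver could advertise a small `max_options` value through the control plane.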
8. Acknowledgements | 8. Security Considerations | |||
The authors would like to thank Tom Herbert for providing the | ||||
motivation for the Security/Integrity extension, and for his valuable | ||||
comments, T. Sridhar for his valuable comments and feedback, Anoop | ||||
Ghanwani for his extensive comments, and Ignas Bagdonas. | ||||
9. Security Considerations | ||||
This document does not introduce any additional security constraints; | This document does not introduce any additional security constraints; | |||
however, Section 6.2.2 discusess security/integrity extensions and | however, Section 6.2.2 discusses security/integrity extensions and | |||
this document suggests, in Section 7, that the the nvo3 WG work on | this document suggests, in Section 7, that the NVO3 WG work on | |||
security options for Geneve. | security options for Geneve. | |||
10. IANA Considerations | 9. IANA Considerations | |||
This document requires no IANA actions. | This document has no IANA actions. | |||
11. Normative References | 10. References | |||
10.1. Normative References | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
12. Informative References | 10.2. Informative References | |||
[ietf_gue_extensions] | [GUE] Herbert, T., Yong, L., and O. Zia, "Generic UDP | |||
Herbert, T., Yong, L., and F. Templin, "Extensions for | Encapsulation", Work in Progress, Internet-Draft, draft- | |||
Generic UDP Encapsulation", work in progress, 8 March | ietf-intarea-gue-09, 26 October 2019, | |||
2019, <https://datatracker.ietf.org/doc/draft-ietf- | <https://datatracker.ietf.org/doc/html/draft-ietf-intarea- | |||
intarea-gue-extensions/>. | gue-09>. | |||
[ietf_intarea_gue] | [GUE-ENCAPSULATION] | |||
Herbert, T., Yong, L., and O. Zia, "Generic UDP | Yong, L., Herbert, T., and O. Zia, "Generic UDP | |||
Encapsulation", work in progress, 26 October 2019. | Encapsulation (GUE) for Network Virtualization Overlay", | |||
Work in Progress, Internet-Draft, draft-hy-nvo3-gue-4-nvo- | ||||
04, 28 October 2016, | ||||
<https://datatracker.ietf.org/doc/html/draft-hy-nvo3-gue- | ||||
4-nvo-04>. | ||||
[GUE-EXTENSIONS] | ||||
Herbert, T., Yong, L., and F. Templin, "Extensions for | ||||
Generic UDP Encapsulation", Work in Progress, Internet- | ||||
Draft, draft-ietf-intarea-gue-extensions-06, 8 March 2019, | ||||
<https://datatracker.ietf.org/doc/html/draft-ietf-intarea- | ||||
gue-extensions-06>. | ||||
[IEEE802.1Q] | [IEEE802.1Q] | |||
802.1 WG, IEEE., "Bridges and Bridged Networks", IEEE Std | IEEE, "IEEE Standard for Local and Metropolitan Area | |||
802.1Q-2014, 3 November 2014. | Networks--Bridges and Bridged Networks", IEEE Std 802.1Q- | |||
2022, DOI 10.1109/IEEESTD.2022.10004498, December 2022, | ||||
<https://doi.org/10.1109/IEEESTD.2022.10004498>. | ||||
[INT] P4.org, "In-band Network Telemetry (INT) Dataplane | [INT] P4.org Applications Working Group, "In-band Network | |||
Specification", November 2020, | Telemetry (INT) Dataplane Specification", November 2020, | |||
<https://p4.org/p4-spec/docs/INT_v2_1.pdf>. | <https://p4.org/p4-spec/docs/INT_v2_1.pdf>. | |||
[nvo3_vxlan_gpe] | [OVN] Linux Foundation, "Open vSwitch", | |||
Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol | <https://www.openvswitch.org/>. | |||
Extension for VXLAN (VXLAN-GPE)", work in progress, 4 | ||||
November 2023, <https://datatracker.ietf.org/doc/draft- | ||||
ietf-nvo3-vxlan-gpe/>. | ||||
[OVN] Network, O. V., "", <https://www.openvswitch.org/>. | ||||
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- | [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- | |||
Hashing for Message Authentication", RFC 2104, | Hashing for Message Authentication", RFC 2104, | |||
DOI 10.17487/RFC2104, February 1997, | DOI 10.17487/RFC2104, February 1997, | |||
<https://www.rfc-editor.org/info/rfc2104>. | <https://www.rfc-editor.org/info/rfc2104>. | |||
[RFC2418] Bradner, S., "IETF Working Group Guidelines and | [RFC2418] Bradner, S., "IETF Working Group Guidelines and | |||
Procedures", BCP 25, RFC 2418, DOI 10.17487/RFC2418, | Procedures", BCP 25, RFC 2418, DOI 10.17487/RFC2418, | |||
September 1998, <https://www.rfc-editor.org/info/rfc2418>. | September 1998, <https://www.rfc-editor.org/info/rfc2418>. | |||
skipping to change at page 19, line 46 ¶ | skipping to change at line 877 ¶ | |||
Internet Key Exchange (IKE) Document Roadmap", RFC 6071, | Internet Key Exchange (IKE) Document Roadmap", RFC 6071, | |||
DOI 10.17487/RFC6071, February 2011, | DOI 10.17487/RFC6071, February 2011, | |||
<https://www.rfc-editor.org/info/rfc6071>. | <https://www.rfc-editor.org/info/rfc6071>. | |||
[RFC6291] Andersson, L., van Helvoort, H., Bonica, R., Romascanu, | [RFC6291] Andersson, L., van Helvoort, H., Bonica, R., Romascanu, | |||
D., and S. Mansfield, "Guidelines for the Use of the "OAM" | D., and S. Mansfield, "Guidelines for the Use of the "OAM" | |||
Acronym in the IETF", BCP 161, RFC 6291, | Acronym in the IETF", BCP 161, RFC 6291, | |||
DOI 10.17487/RFC6291, June 2011, | DOI 10.17487/RFC6291, June 2011, | |||
<https://www.rfc-editor.org/info/rfc6291>. | <https://www.rfc-editor.org/info/rfc6291>. | |||
[RFC7042] Eastlake 3rd, D. and J. Abley, "IANA Considerations and | ||||
IETF Protocol and Documentation Usage for IEEE 802 | ||||
Parameters", RFC 7042, DOI 10.17487/RFC7042, October 2013, | ||||
<https://www.rfc-editor.org/info/rfc7042>. | ||||
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, | |||
L., Sridhar, T., Bursell, M., and C. Wright, "Virtual | L., Sridhar, T., Bursell, M., and C. Wright, "Virtual | |||
eXtensible Local Area Network (VXLAN): A Framework for | eXtensible Local Area Network (VXLAN): A Framework for | |||
Overlaying Virtualized Layer 2 Networks over Layer 3 | Overlaying Virtualized Layer 2 Networks over Layer 3 | |||
Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, | Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, | |||
<https://www.rfc-editor.org/info/rfc7348>. | <https://www.rfc-editor.org/info/rfc7348>. | |||
[RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network | [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network | |||
Virtualization Using Generic Routing Encapsulation", | Virtualization Using Generic Routing Encapsulation", | |||
RFC 7637, DOI 10.17487/RFC7637, September 2015, | RFC 7637, DOI 10.17487/RFC7637, September 2015, | |||
skipping to change at page 21, line 5 ¶ | skipping to change at line 921 ¶ | |||
[RFC9147] Rescorla, E., Tschofenig, H., and N. Modadugu, "The | [RFC9147] Rescorla, E., Tschofenig, H., and N. Modadugu, "The | |||
Datagram Transport Layer Security (DTLS) Protocol Version | Datagram Transport Layer Security (DTLS) Protocol Version | |||
1.3", RFC 9147, DOI 10.17487/RFC9147, April 2022, | 1.3", RFC 9147, DOI 10.17487/RFC9147, April 2022, | |||
<https://www.rfc-editor.org/info/rfc9147>. | <https://www.rfc-editor.org/info/rfc9147>. | |||
[RFC9197] Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi, | [RFC9197] Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi, | |||
Ed., "Data Fields for In Situ Operations, Administration, | Ed., "Data Fields for In Situ Operations, Administration, | |||
and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197, | and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197, | |||
May 2022, <https://www.rfc-editor.org/info/rfc9197>. | May 2022, <https://www.rfc-editor.org/info/rfc9197>. | |||
[VXLANgroup] | [RFC9542] Eastlake 3rd, D., Abley, J., and Y. Li, "IANA | |||
Considerations and IETF Protocol and Documentation Usage | ||||
for IEEE 802 Parameters", BCP 141, RFC 9542, | ||||
DOI 10.17487/RFC9542, April 2024, | ||||
<https://www.rfc-editor.org/info/rfc9542>. | ||||
[VXLAN-GPE] | ||||
Maino, F., Ed., Kreeger, L., Ed., and U. Elzur, Ed., | ||||
"Generic Protocol Extension for VXLAN (VXLAN-GPE)", Work | ||||
in Progress, Internet-Draft, draft-ietf-nvo3-vxlan-gpe-13, | ||||
4 November 2023, <https://datatracker.ietf.org/doc/html/ | ||||
draft-ietf-nvo3-vxlan-gpe-13>. | ||||
[VXLAN-GROUP] | ||||
Smith, M. and L. Kreeger, "VXLAN Group Policy Option", | Smith, M. and L. Kreeger, "VXLAN Group Policy Option", | |||
work in progress, 22 October 2018, | Work in Progress, Internet-Draft, draft-smith-vxlan-group- | |||
policy-05, 22 October 2018, | ||||
<https://datatracker.ietf.org/doc/html/draft-smith-vxlan- | <https://datatracker.ietf.org/doc/html/draft-smith-vxlan- | |||
group-policy-05>. | group-policy-05>. | |||
Appendix A. Encapsulation Comparison | Appendix A. Encapsulation Comparison | |||
A.1. Overview | A.1. Overview | |||
This section presents a comparison of the three NVO3 encapsulation | This section presents a comparison of the three NVO3 encapsulation | |||
proposals, Geneve [RFC8926], GUE [ietf_intarea_gue], and VXLAN-GPE | proposals: Geneve [RFC8926], GUE [GUE], and VXLAN-GPE [VXLAN-GPE]. | |||
[nvo3_vxlan_gpe]. The three encapsulations use an outer UDP/IP | The three encapsulations use an outer UDP/IP transport. Geneve and | |||
transport. Geneve and VXLAN-GPE use an 8-octet header, while GUE | VXLAN-GPE use an 8-octet header, while GUE uses a 4-octet header. In | |||
uses a 4-octet header. In addition to the base header, optional | addition to the base header, optional extensions may be included in | |||
extensions may be included in the encapsulation, as discussed in | the encapsulation, as discussed in Appendix A.2 below. | |||
Section A.2 below. | ||||
A.2. Extensibility | A.2. Extensibility | |||
A.2.1. Native Extensibility Support | A.2.1. Innate Extensibility Support | |||
The Geneve and GUE encapsulations both enable optional headers to be | The Geneve and GUE encapsulations both enable optional headers to be | |||
incorporated at the end of the base encapsulation header. | incorporated at the end of the base encapsulation header. | |||
VXLAN-GPE does not provide native support for header extensions. | VXLAN-GPE does not provide innate support for header extensions. | |||
However, as discussed in [nvo3_vxlan_gpe], extensibility can be | However, as discussed in [VXLAN-GPE], extensibility can be attained | |||
attained to some extent if the Network Service Header (NSH) [RFC8300] | to some extent if the Network Service Header (NSH) [RFC8300] is used | |||
is used immediately following the VXLAN-GPE header. NSH supports | immediately following the VXLAN-GPE header. The NSH supports either | |||
either a fixed-size extension (MD Type 1), or a variable-size TLV- | a fixed-size extension (MD Type 1) or a variable-size TLV-based | |||
based extension (MD Type 2). Note that NSH-over-VXLAN-GPE implies an | extension (MD Type 2). Note that NSH-over-VXLAN-GPE implies an | |||
additional overhead of the 8-octets NSH header, in addition to the | additional overhead of the 8-octet NSH, in addition to the VXLAN-GPE | |||
VXLAN-GPE header. | header. | |||
A.2.2. Extension Parsing | A.2.2. Extension Parsing | |||
The Geneve Variable Length Options are defined as Type/Length/Value | The Geneve variable-length options are defined as Type-Length-Value | |||
(TLV) extensions. Similarly, VXLAN-GPE, when using NSH, can include | (TLV) extensions. Similarly, VXLAN-GPE, when using an NSH, can | |||
NSH TLV-based extensions. In contrast, GUE defines a small set of | include NSH TLV-based extensions. In contrast, GUE defines a small | |||
possible extension fields (proposed in [ietf_gue_extensions]), and a | set of possible extension fields (proposed in [GUE-EXTENSIONS] and | |||
set of flags in the GUE header that indicate for each extension type | [GUE-ENCAPSULATION]), and a set of flags in the GUE header that | |||
whether it is present or not. | indicate for each extension type whether it is present or not. | |||
TLV-based extensions, as defined in Geneve, provide the flexibility | TLV-based extensions, as defined in Geneve, provide the flexibility | |||
for a large number of possible extension types. Similar behavior can | for a large number of possible extension types. Similar behavior can | |||
be supported in NSH-over-VXLAN-GPE when using MD Type 2. The flag- | be supported in NSH-over-VXLAN-GPE when using MD Type 2. The flag- | |||
based approach taken in GUE strives to simplify implementations by | based approach taken in GUE strives to simplify implementations by | |||
defining a small number of possible extensions used in a fixed order. | defining a small number of possible extensions used in a fixed order. | |||
The Geneve and GUE headers both include a length field, defining the | The Geneve and GUE headers both include a Length field that defines | |||
total length of the encapsulation, including the optional extensions. | the total length of the encapsulation, including the optional | |||
This length field simplifies the parsing by transit devices that skip | extensions. This Length field simplifies the parsing by transit | |||
the encapsulation header without parsing its extensions. | devices that skip the encapsulation header without parsing its | |||
extensions. | ||||
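
As an illustration of TLV-based parsing, the following sketch walks a Geneve options area. It assumes the option layout of [RFC8926]: a 16-bit Option Class, an 8-bit Type whose high-order bit marks the option as critical, 3 reserved bits, and a 5-bit Length counted in 4-octet units that excludes the 4-octet option header. The names `GeneveOption` and `walk_options` are hypothetical.

```python
import struct
from typing import List, NamedTuple


class GeneveOption(NamedTuple):
    option_class: int
    option_type: int
    critical: bool
    data: bytes


def walk_options(options: bytes) -> List[GeneveOption]:
    """Parse a Geneve options area into a list of TLVs."""
    parsed = []
    offset = 0
    while offset < len(options):
        if offset + 4 > len(options):
            raise ValueError("truncated option header")
        option_class, option_type, rsvd_len = struct.unpack_from(
            "!HBB", options, offset)
        # The low 5 bits give the data length in 4-octet units.
        data_len = (rsvd_len & 0x1F) * 4
        start = offset + 4
        end = start + data_len
        if end > len(options):
            raise ValueError("option data overruns the options area")
        parsed.append(GeneveOption(option_class, option_type,
                                   bool(option_type & 0x80),
                                   options[start:end]))
        offset = end
    return parsed
```

A transit device that does not care about options would not need this loop at all; it can use the header's Length field to jump directly to the payload.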
A.2.3. Critical Extensions | A.2.3. Critical Extensions | |||
The Geneve encapsulation header includes the 'C' field, which | The Geneve encapsulation header includes the C field, which indicates | |||
indicates whether the current Geneve header includes critical | whether the current Geneve header includes critical options, that is | |||
options, that is to say, options which must be parsed by the target | to say, options which must be parsed by the target NVE. If the | |||
NVE. If the endpoint is not able to process a critical option, the | endpoint is not able to process a critical option, the packet is | |||
packet is discarded. | discarded. | |||
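
The discard rule can be sketched as follows. This is an illustrative reading of the behavior described above, not normative text: it assumes options are represented as (option class, option type) pairs, that the high-order bit of the option type marks a critical option as in [RFC8926], and that `supported` is a hypothetical set of options the endpoint implements.

```python
from typing import Iterable, Set, Tuple

OptionId = Tuple[int, int]  # hypothetical (option class, option type) pair


def should_discard(c_bit_set: bool,
                   options: Iterable[OptionId],
                   supported: Set[OptionId]) -> bool:
    """Return True if the receiving NVE has to drop the packet."""
    if not c_bit_set:
        # No critical options are present, so the options may be skipped.
        return False
    for option_class, option_type in options:
        is_critical = bool(option_type & 0x80)
        if is_critical and (option_class, option_type) not in supported:
            return True
    return False
```

The early return when the C bit is clear reflects the benefit noted in Section 7: the receiver can decide whether to look at the options at all from a single header bit.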
A.2.4. Maximal Header Length | A.2.4. Maximal Header Length | |||
The maximal header length in Geneve, including options, is 260 | The maximal header length in Geneve, including options, is 260 | |||
octets. GUE defines the maximal header to be 128 octets. VXLAN-GPE | octets. GUE defines the maximal header to be 128 octets. VXLAN-GPE | |||
uses a fixed-length header of 8 octets, unless NSH-over-VXLAN-GPE is | uses a fixed-length header of 8 octets, unless NSH-over-VXLAN-GPE is | |||
used, yielding an encapsulation header of up to 264 octets. | used, yielding an encapsulation header of up to 264 octets. | |||
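
The Geneve and GUE maxima follow from the width of each header's length field. The short calculation below assumes a 6-bit Opt Len field in Geneve and a 5-bit Hlen field in GUE, both counted in 4-octet units and both excluding the base header; these field widths come from the respective specifications rather than from this document. The 264-octet figure for NSH-over-VXLAN-GPE is not derived here.

```python
def max_header_len(base_octets: int, length_field_bits: int,
                   unit_octets: int = 4) -> int:
    """Base header plus the largest extension area the length field can express."""
    return base_octets + ((1 << length_field_bits) - 1) * unit_octets


GENEVE_MAX = max_header_len(base_octets=8, length_field_bits=6)  # 8 + 63 * 4
GUE_MAX = max_header_len(base_octets=4, length_field_bits=5)     # 4 + 31 * 4

print(GENEVE_MAX, GUE_MAX)  # prints "260 128"
```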
A.3. Encapsulation Header | A.3. Encapsulation Header | |||
A.3.1. Virtual Network Identifier (VNI) | A.3.1. Virtual Network Identifier (VNI) | |||
The Geneve and VXLAN-GPE headers both include a 24-bit VNI field. | The Geneve and VXLAN-GPE headers both include a 24-bit VNI field. | |||
GUE, on the other hand, enables the use of a 32-bit field called | GUE, on the other hand, enables the use of a 32-bit field called | |||
VNID; this field is not included in the GUE header, but was defined | VNID; this field is not included in the GUE header but was defined as | |||
as an optional extension in [ietf_gue_extensions]. | an optional extension in [GUE-ENCAPSULATION]. | |||
The VXLAN-GPE header includes the 'I' bit, indicating that the VNI | The VXLAN-GPE header includes the I bit, indicating that the VNI | |||
field is valid in the current header. A similar indicator is defined | field is valid in the current header. A similar indicator is defined | |||
as a flag in the GUE header [ietf_gue_extensions]. | as a flag in the GUE header [GUE-EXTENSIONS]. | |||
A.3.2. Next Protocol | A.3.2. Next Protocol | |||
All three encapsulation headers include a field that specifies the | All three encapsulation headers include a field that specifies the | |||
type of the next protocol header, which resides after the NVO3 | type of the next protocol header, which resides after the NVO3 | |||
encapsulation header. The Geneve header includes a 16-bit field that | encapsulation header. The Geneve header includes a 16-bit field that | |||
uses the IEEE Ethertype convention. GUE uses an 8-bit field, which | uses the IEEE Ethertype convention. GUE uses an 8-bit field, which | |||
uses the IANA Internet protocol numbering. The VXLAN-GPE header | uses the IANA protocol numbering. The VXLAN-GPE header incorporates | |||
incorporates an 8-bit Next Protocol field, using a VXLAN-GPE-specific | an 8-bit Next Protocol field, using a registry specific to VXLAN-GPE, | |||
registry, defined in [nvo3_vxlan_gpe]. | defined in [VXLAN-GPE]. | |||
The VXLAN-GPE header also includes the 'P' bit, which explicitly | The VXLAN-GPE header also includes the P bit, which explicitly | |||
indicates whether the Next Protocol field is present in the current | indicates whether the Next Protocol field is present in the current | |||
header. | header. | |||
A.3.3. Other Header Fields | A.3.3. Other Header Fields | |||
The OAM bit, which is defined in Geneve and in VXLAN-GPE, indicates | The OAM bit, which is defined in Geneve and in VXLAN-GPE, indicates | |||
whether the current packet is an OAM packet. The GUE header includes | whether the current packet is an OAM packet. The GUE header includes | |||
a similar field, but uses different terminology; the GUE 'C-bit' | a similar field but uses different terminology; the GUE C bit | |||
specifies whether the current packet is a control packet. Note that | (Control bit) specifies whether the current packet is a control | |||
the GUE control bit can potentially be used in a large set of | packet. Note that the GUE C bit can potentially be used in a large | |||
protocols that are not OAM protocols. However, the control packet | set of protocols that are not OAM protocols. However, the control | |||
examples discussed in [ietf_intarea_gue] are OAM-related. | packet examples discussed in [GUE] are related to OAM. | |||
Each of the three NVO3 encapsulation headers includes a 2-bit Version | Each of the three NVO3 encapsulation headers includes a 2-bit Version | |||
field, which is currently defined to be zero. | field, which is currently defined to be zero. | |||
The Geneve and VXLAN-GPE headers include reserved fields; 14 bits in | The Geneve and VXLAN-GPE headers include reserved fields; 14 bits in | |||
the Geneve header, and 27 bits in the VXLAN-GPE header are reserved. | the Geneve header and 27 bits in the VXLAN-GPE header are reserved. | |||
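
To make the field layout concrete, the following sketch decodes the 8-octet Geneve base header. It assumes the bit layout of [RFC8926]: Version (2 bits), Opt Len (6 bits), O bit, C bit, 6 reserved bits, Protocol Type (16 bits), VNI (24 bits), and 8 reserved bits, which accounts for the 14 reserved bits mentioned above. The function name and the dictionary keys are hypothetical.

```python
import struct


def parse_geneve_base(header: bytes) -> dict:
    """Decode the fixed 8-octet Geneve base header."""
    if len(header) < 8:
        raise ValueError("Geneve base header is 8 octets")
    first, second, protocol_type, vni_rsvd = struct.unpack_from("!BBHI", header)
    return {
        "version": first >> 6,                 # 2-bit Version field
        "opt_len_octets": (first & 0x3F) * 4,  # Opt Len is in 4-octet units
        "oam": bool(second & 0x80),            # O bit
        "critical": bool(second & 0x40),       # C bit
        "protocol_type": protocol_type,        # Ethertype of the payload
        "vni": vni_rsvd >> 8,                  # upper 24 bits; low 8 are reserved
    }
```

For example, a header carrying 8 octets of options, the C bit set, an Ethernet payload (Ethertype 0x6558), and VNI 0x001234 would decode with `opt_len_octets` equal to 8 and `vni` equal to 0x1234.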
A.4. Comparison Summary | A.4. Comparison Summary | |||
The following table summarizes the comparison between the three NVO3 | The following table summarizes the comparison between the three NVO3 | |||
encapsulations. In some cases a plus sign ("+") or minus sign ("-") | encapsulations. In some cases, a plus sign ("+") or minus sign ("-") | |||
is used to indicate that the header is stronger or weaker in an area | is used to indicate that the header is stronger or weaker in an area, | |||
respectively. | respectively. | |||
+----------------+----------------+----------------+----------------+ | +===============+=================+=============+===================+ | |||
| | Geneve | GUE | VXLAN-GPE | | | | Geneve | GUE | VXLAN-GPE | | |||
+----------------+----------------+----------------+----------------+ | +===============+=================+=============+===================+ | |||
| Outer transport| UDP/IP | UDP/IP | UDP/IP | | | Outer | UDP/IP 6081 | UDP/IP 6080 | UDP/IP 4790 | | |||
| UDP Port Number| 6081 | 6080 | 4790 | | | transport UDP | | | | | |||
+----------------+----------------+----------------+----------------+ | | Port Number | | | | | |||
| Base header | 8 octets | 4 octets | 8 octets | | +---------------+-----------------+-------------+-------------------+ | |||
| length | | | (16 octets | | | Base header | 8 octets | 4 octets | 8 octets (16 | | |||
| | | | using NSH) | | | length | | | octets using | | |||
+----------------+----------------+----------------+----------------+ | | | | | an NSH) | | |||
| Extensibility |Variable length |Extension fields| No native ext- | | +---------------+-----------------+-------------+-------------------+ | |||
| | options | | ensibility. | | | Extensibility | Variable-length | Extension | No innate | | |||
| | | | Might use NSH. | | | | options | fields | extensibility. | | |||
+----------------+----------------+----------------+----------------+ | | | | | Might use an | | |||
| Extension | TLV-based | Flag-based | TLV-based | | | | | | NSH. | | |||
| parsing method | | |(using NSH with | | +---------------+-----------------+-------------+-------------------+ | |||
| | | | MD Type 2) | | | Extension | TLV-based | Flag-based | TLV-based | | |||
+----------------+----------------+----------------+----------------+ | | parsing | | | (using an NSH | | |||
| Extension | Variable | Fixed | Variable | | | method | | | with MD Type | | |||
| order | | | (using NSH) | | | | | | 2) | | |||
+----------------+----------------+----------------+----------------+ | +---------------+-----------------+-------------+-------------------+ | |||
| Length field | + | + | - | | | Extension | Variable | Fixed | Variable | | |||
+----------------+----------------+----------------+----------------+ | | order | | | (using an NSH) | | |||
| Max Header | 260 octets | 128 octets | 8 octets | | +---------------+-----------------+-------------+-------------------+ | |||
| Length | | |(264 using NSH) | | | Length field | + | + | - | | |||
+----------------+----------------+----------------+----------------+ | +---------------+-----------------+-------------+-------------------+ | |||
| Critical exte- | + | - | - | | | Max header | 260 octets | 128 octets | 8 octets (264 | | |||
| nsion bit | | | | | | length | | | using an NSH) | | |||
+----------------+----------------+----------------+----------------+ | +---------------+-----------------+-------------+-------------------+ | |||
| VNI field size | 24 bits | 32 bits | 24 bits | | | Critical | + | - | - | | |||
| | | (extension) | | | | extension bit | | | | | |||
+----------------+----------------+----------------+----------------+ | +---------------+-----------------+-------------+-------------------+ | |||
| Next protocol | 16 bits | 8 bits | 8 bits | | | VNI field | 24 bits | 32 bits | 24 bits | | |||
| field | Ethertype | Internet prot- | New registry | | | size | | (extension) | | | |||
| | registry | ocol registry | | | +---------------+-----------------+-------------+-------------------+ | |||
+----------------+----------------+----------------+----------------+ | | Next Protocol | 16 bits | 8 bits | 8 bits New | | |||
| Next protocol | - | - | + | | | field | Ethertype | Internet | registry | | |||
| indicator | | | | | | | registry | protocol | | | |||
+----------------+----------------+----------------+----------------+ | | | | registry | | | |||
| OAM / control | OAM bit | Control bit | OAM bit | | +---------------+-----------------+-------------+-------------------+ | |||
| field | | | | | | Next protocol | - | - | + | | |||
+----------------+----------------+----------------+----------------+ | | indicator | | | | | |||
| Version field | 2 bits | 2 bits | 2 bits | | +---------------+-----------------+-------------+-------------------+ | |||
+----------------+----------------+----------------+----------------+ | | OAM / Control | OAM bit | Control bit | OAM bit | | |||
| Reserved bits | 14 bits | none | 27 bits | | | field | | | | | |||
+----------------+----------------+----------------+----------------+ | +---------------+-----------------+-------------+-------------------+ | |||
| Version field | 2 bits | 2 bits | 2 bits | | ||||
+---------------+-----------------+-------------+-------------------+ | ||||
| Reserved bits | 14 bits | none | 27 bits | | ||||
+---------------+-----------------+-------------+-------------------+ | ||||
Figure 4: NVO3 Encapsulations Comparison | Table 1: Encapsulations Comparison | |||
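The numeric rows of the comparison table can be captured in a small lookup structure, which makes the size limits easy to check programmatically. The following is a minimal sketch in Python, not something taken from either version of the document. The column labels Geneve, GUE, and VXLAN-GPE are an assumption, because the table's header row falls outside this excerpt. The class name, field names, and helper functions are hypothetical. Values are copied from the right-hand (RFC) column, "none" is recorded as 0, and the "264 using an NSH" variant is noted only in a comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncapSummary:
    """One column of the comparison table (field names are hypothetical)."""
    has_length_field: bool      # "Length field" row: + / -
    max_header_octets: int      # "Max header length" row
    has_critical_bit: bool      # "Critical extension bit" row: + / -
    vni_bits: int               # "VNI field size" row
    next_protocol_bits: int     # "Next Protocol field" row
    version_bits: int           # "Version field" row
    reserved_bits: int          # "Reserved bits" row, 0 where the table says "none"


# Column labels are assumed; the header row is not part of this excerpt.
TABLE = {
    "Geneve": EncapSummary(True, 260, True, 24, 16, 2, 14),
    "GUE": EncapSummary(True, 128, False, 32, 8, 2, 0),
    # The table lists 8 octets for this column, or 264 when an NSH is used.
    "VXLAN-GPE": EncapSummary(False, 8, False, 24, 8, 2, 27),
}


def header_fits(encap: str, header_octets: int) -> bool:
    """Return True if a header of the given size is within the table's maximum."""
    return header_octets <= TABLE[encap].max_header_octets


def vni_space(encap: str) -> int:
    """Return the number of distinct VNI values the listed field size allows."""
    return 2 ** TABLE[encap].vni_bits
```

For example, `header_fits("GUE", 200)` is false because the table gives 128 octets as the maximum for that column, while `vni_space("GUE")` is 256 times larger than `vni_space("Geneve")` because the field is 8 bits wider.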
Acknowledgements | ||||
The authors would like to thank Tom Herbert for providing the | ||||
motivation for the security/integrity extension and for his valuable | ||||
comments; T. Sridhar for his valuable comments and feedback; Anoop | ||||
Ghanwani for his extensive comments; and Ignas Bagdonas. | ||||
Contributors | Contributors | |||
The following co-authors have contributed to this document:. | The following coauthors have contributed to this document: | |||
Ilango Ganga | Ilango Ganga | |||
Intel | Intel | |||
Email: ilango.s.ganga@intel.com | Email: ilango.s.ganga@intel.com | |||
Pankaj Garg | Pankaj Garg | |||
Microsoft | Microsoft | |||
Email: pankajg@microsoft.com | Email: pankajg@microsoft.com | |||
Rajeev Manur | Rajeev Manur | |||
skipping to change at page 26, line 7 ¶ | skipping to change at line 1157 ¶ | |||
Email: aldrin.ietf@gmail.com | Email: aldrin.ietf@gmail.com | |||
Authors' Addresses | Authors' Addresses | |||
Sami Boutros (editor) | Sami Boutros (editor) | |||
Ciena Corporation | Ciena Corporation | |||
United States of America | United States of America | |||
Email: sboutros@ciena.com | Email: sboutros@ciena.com | |||
Donald E. Eastlake 3rd (editor) | Donald E. Eastlake 3rd (editor) | |||
Futurewei Technologies | Independent | |||
2386 Panoramic Circle | 2386 Panoramic Circle | |||
Apopka, Florida 32703 | Apopka, FL 32703 | |||
United States of America | United States of America | |||
Phone: +1-508-333-2270 | Phone: +1-508-333-2270 | |||
Email: d3e3e3@gmail.com | Email: d3e3e3@gmail.com | |||
End of changes. 155 change blocks. | ||||
477 lines changed or deleted | 517 lines changed or added | |||