NV03 working group L. Dunbar Internet Draft D. Eastlake Category: Informational Huawei Expires: April 4 2014 September 20, 2013 NV03 NVA Gap Analysis draft-dunbar-nvo3-nva-gap-analysis-01 Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents Dunbar, et al Expires November 2014 [Page 1] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Abstract The intent of the draft is to identify the gaps of existing solutions against NVO3's NVE <-> NVA control plane requirement. Through the gap analysis, the document provides a basis for future works that develop solutions for NVE<->NVA control plane. Table of Contents 1. Introduction ................................................ 3 2. Terminology ................................................. 3 3. Overall Requirement for NVE<->NVA Control Plane ............. 4 4. Existing Directory Components ............................... 5 4.1. Types of NVA: .......................................... 5 4.2. Key components of the information kept in the NVA ...... 6 4.3. Mapping Entries Distribution Mechanism ................. 6 4.3.1. Push Mode ......................................... 6 4.3.2. Pull Mode ......................................... 8 4.3.3. Hybrid Mode....................................... 11 5. Redundancy ................................................. 12 6. Inconsistency Processing.................................... 12 7. Gap Summary ................................................ 12 7.1. Features necessary to NVO3 but not present in TRILL ... 12 7.2. Additional detailed requirement applicable to NVO3's NVA 13 8. Security Considerations..................................... 14 9. IANA Considerations ........................................ 14 10. Acknowledgements .......................................... 14 11. References ................................................ 14 11.1. Normative References.................................. 14 11.2. Informative References................................ 15 Authors' Addresses ............................................ 15 Dunbar, et al [Page 2] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 1. Introduction The intent of the draft is to identify the gaps of existing solutions against NVO3's requirement for Network Virtualization Authority (NVA). Through the gap analysis, the document provides a basis for future works to develop solutions for (NVA). The existing solutions analyzed in draft include the LISP mapping database system and TRILL's directory mechanism. Section 4.5 of [nvo3-problem-statement] describes the back-end Network Virtualization Authority (NVA) that is responsible for distributing the mapping information for entire overlay system. [nvo3-nve-nva-cp-req] defines the requirement for the control plane between NVA and NVE. There are many similarities between LISP, TRILL [RFC6325] and NVO3, e.g. LISP using IP header to achieve overlay, TRILL using TRILL header to achieve overlay, and NV03 using L3 headers plus VNID to achieve overlay. This draft analyzes the TRILL directory mechanisms along with some LISP mapping database system that are applicable to NVO3's NVA<->NVE and summarize the gaps. 2. Terminology The following terms are used interchangeably in this document: - The terms "Subnet" and "VLAN" because it is common to map one subnet to one VLAN. - The term ''Directory'' and ''Network Virtualization Authority (NVA)'' - The term ''NVE'' and ''Edge'' Bridge: IEEE Std 802.1Q-2011 compliant device [802.1Q]. In this draft, Bridge is used interchangeably with Layer 2 switch. DA: Destination Address DC: Data Center EoR: End of Row switches in data center. Also known as aggregation switches. End Station: Guest OS running on a physical server or on a virtual machine. An end station in this document has at Dunbar, et al [Page 3] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 least one IP address and at least one MAC address, which could be in DA or SA field of a data frame. LISP: Locator/ID Separation Protocol RBridge: ''Routing Bridge'', an alternative name for a TRILL switch. NVA: Network Virtualization Authority NVE: Network Virtualization Edge SA: Source Address Station: A node, or a virtual node, with IP and/or MAC addresses, which could be in the DA or SA of a data frame. ToR: Top of Rack Switch in data center. It is also known as access switches in some data centers. TRILL: Transparent Interconnection of Lots of Links [RFC6325] TRILL switch: A device implementing the TRILL protocol [RFC6325] TS: Tenant System VM: Virtual Machines VN: Virtual Network VNID: Virtual Network Instance Identifier 3. Overall Requirement for NVE<->NVA Control Plane Section 3.1 of [nvo3-cp-req] describes the basic requirement of inner address to outer address mapping for NVO3. A NVE needs to know the mapping of the Tenant System destination (inner) address to the (outer) address (IP) on the Underlying Network of the egress NVE, in the same way as a TRILL Edge node needing to know how the inner MAC/VLAN is mapped to the egress TRILL edge. Section 3.1 of [nvo3-cp-req] states that a protocol is needed to provide this inner to outer mapping and VN Context to each NVE that requires it and keep the mapping updated in a timely manner. Dunbar, et al [Page 4] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 Timely updates are important for maintaining connectivity between Tenant Systems. TRILL's directory mechanism and LISP mapping database system are to achieve the same goal as NVO3's NVE-NVA control plane, i.e. distributing the mapping table that edge nodes use to tunnel traffic across the underlying network. Therefore it is worthwhile to examine the TRILL's directory mechanism and LISP mapping database system, and analyze the gaps. 4. Existing Directory Components For the ease of description, we match the terminologies used by TRILL/LISP to NVO3. The document will use the NVO3's terminologies as much as possible throughout the document to describe TRILL's directory assistance mechanism. NVO3 LISP TRILL ---- -------- -------------------------- NVE Edge Edge, TRILL Edge or RBridge Edge NVA MapServer Directory 4.1. Types of NVA: NVAs can be centralized or distributed with each NVA holding the mapping information for a subset of VNs. Centralized NVA could have multiple entities for redundancy purpose. A NVA could be instantiated on a server/VM attached to a NVE, very much like a TS attached to a NVE, or could be integrated with a NVE. When a NVA is a standalone server/VM attached to a NVE, it has to be reachable via the attached NVE by other NVEs. A NVA can also be instantiated on a NVE that doesn't have any TSs attached. The NVE-NVA control plane for NVA being attached to NVE will require additional functions on NVEs than NVA being instantiated on NVE. Dunbar, et al [Page 5] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 4.2. Key components of the information kept in the NVA The information held by the TRILL directories is inner-outer address mapping information as well as hosts' VLAN IDs. Same is true for NVO3's NVA. For each TS (or VM), TRILL directory has the following attributes: 1. Inner Address: TS (host) Address family (IPv4/IPv6, MAC, virtual network Identifier MPLS/VLAN, etc) 2. Outer Address: The list of locally attached edges (NVEs); normally one TS is attached to one edge, TS could also be attached to 2 edges for redundancy (dual homing). One TS is rarely attached to more than 2 edges, though it could be possible; 3. Timer for NVEs to keep the entry when pushed down to or pulled from NVEs. 4. Optionally the list of interested remote edges (NVEs). This information is for NVA to promptly update relevant edges (NVEs) when there is any change to this TS' attachment to edges (NVEs). However, this information doesn't have to be kept per TS. It can be kept per VN. NVO3's NVA will need one additional attribute: VN Context (VN ID and/or VN Name). 4.3. Mapping Entries Distribution Mechanism A directory can offer services in a Push, Pull mode, or the combination of the two. 4.3.1. Push Mode Under this mode, Directory Server(s) push the inner-outer mapping for all the entries of the VNs that are enabled on an edge node (NVE). If the destination of a data frame arriving at the Ingress Edge (NVE) can't be found in its inner-outer mapping database that are pushed down from the Directory Server(s) (or NVA), the Ingress edge could be configured with one or more of the following policies: - simply drop the data frame, Dunbar, et al [Page 6] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 - Using legacy method(s) to forward the data frames to other edges, or - start the ''pull'' process to get information from Pull Directory Server(s) (or NVA) When the edge is waiting for reply from Pull process, the edge can either drop or queue the packet. Again, the VN Context (VNID or VN name) needs to be added for NVO3. One drawback of the Push Mode is that it usually will push more mapping entries to an edge (NVE) than needed. Under the normal process of edge cache aging and unknown destination address flooding, rarely used entries would have been removed. It would be difficult for Directory Servers (NVA) to predict the communication patterns among TSs within one VN. Therefore, it is likely that the Directory Servers will push down all the entries for all the VNs that are enabled on the NVE. And with Push there really can't be any source-based policy. It's all or nothing. 4.3.1.1. Requesting Push Service In the Push Mode, it is necessary to have a way for an edge node (NVE) to request directory server(s) (NVA) to start the pushing process, e.g. when the NVE is initialized or re- started. Or it can be like a routing protocol where it just happens automatically. Push Directory servers (NVAs) advertise their availability to push mapping information for a particular virtual network to all edges who participate in the VN. There could be multiple directories (NVAs), with each having mapping information for a subset of VNs. TRILL edge uses modified Virtual Network scoped instances of the IS-IS reliable link state flooding protocol, a.k.a. the ESADI protocol mechanism, to announce all the Virtual Networks in which it is participating to directories (NVAs) who have the mapping information for the VNs. An edge subscribes to push directory information. Dunbar, et al [Page 7] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 The subscription is VN scoped, so that a directory server doesn't need to push down the entire set of mapping entries. Each Push Directory server also has a priority. For robustness, the one or two directories with the highest priority are considered as Active in pushing information for the VN to all edges who have subscribed for that VN. 4.3.1.2. Incremental Push Service Whenever there is any change in TS' association to an edge (NVE), which can be triggered by TS being added, removed, or de-commissioned, an incremental update can be sent to the edges that are impacted by the change. Therefore, sequence numbers have to be maintained by directory servers (NVA) and edges (NVEs). If the Push Directory server is configured to believe it has complete mapping information for VN X then, after it has actually transmitted all of its ESADI-LSPs for X it waits its CSNP time (see Section 6.1 of [ESADI]), and then updates its ESADI-Parameters APPsub-TLV to set the Complete Push (CP) bit to one. It then maintains the CP bit as one as long as it is Active. 4.3.2. Pull Mode Under this mode, an NVE pulls the mapping entry from the directory servers (or NVA) when its cache doesn't have the entry. The main advantage of Pull Mode is that state is stored only where it needs to be stored and only when it is required. In addition, in the Pull Mode, edge nodes (NVEs) can age out mapping entries if they haven't been used for a certain period of time. Therefore, each edge (NVE) will only keep the entries that are frequently used, so its mapping table size will be smaller than a complete table pushed down from NVA. The drawback of Pull Mode is that it might take some time for NVEs to pull the needed mapping from NVA. Before NVE gets the response from NVA, the NVE has to buffer the subsequent data frames with destination address to the same target. The buffer could overflow before the NVE gets the response from NVA. However, this scenario should not happen very often in data center environment because most likely the TSs are end systems Dunbar, et al [Page 8] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 which have to wait for (TCP) acknowledgement before sending subsequent data frames. Another option is forward, not flood, subsequent frames to a default location, i.e. forward to a re- encapsulating NVE. The practice of an edge waiting and dropping packets upon receiving an unknown DA is not new. Most deployed routers today drop packets while waiting for target addresses to be resolved. It is too expensive to queue subsequent packets while resolving target address. The routers send ARP/ND requests to the target upon receiving a packet with DA not in its ARP/ND cache and wait for an ARP or ND responses. This practice minimizes flooding when targets don't exist in the subnet. When the target doesn't exist in the subnet, routers generally re-send an ARP/ND request a few more times before dropping the packets. The holding time by routers to wait for an ARP/ND response when the target doesn't exist in the subnet can be longer than the time taken by the Pull Mode to get mapping from NVA. 4.3.2.1. Pull Requests Here are some events that can trigger the pulling process: o An edge node (NVE) receives an ingress data frame with a destination whose attached edge (NVE) is unknown, or o The edge node (NVE) receives an ingress ARP/ND request for a target whose link address (MAC) or attached edge (NVE) is unknown. Each Pull request can have queries for multiple inner-outer mapping entries. 4.3.2.2. Pull Response There are several possibilities of the Pull Response: 1. Valid inner-outer address mapping, coupled with the valid timer indicating how long the entry can be cached by the edge (NVE). The timer for cache should be short in an environment where VMs move frequently. The cache timer can also be configured. Dunbar, et al [Page 9] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 2. The target being queried is not available. The response should include the policy if requester should forward data frame in legacy way, or drop the data frame. 3. The requestor is administratively prohibited from getting an informative response. If no response is received to a Pull Directory request within a configurable timeout, the request should be re-transmitted with the same Sequence Number up to a configurable number of times that defaults to three. 4.3.2.3. Cache Consistency It is important that the cached information be kept consistent with the actual placement of VMs. Therefore, it is highly desirable to have a mechanism to prevent NVEs from using the staled mapping entries. When data at a Pull Directory changes, such as entry being deleted or new entry added, and there may be unexpired stale information at a querying edge (NVE), the Pull Directory MUST send an unsolicited message to the edge (NVE). To achieve this goal, a Pull Directory server MUST maintain one of the following, in order of increasing specificity. 1. An overall record per VN of when the last returned query data will expire at a requestor and when the last query record specific negative response will expire. 2. For each unit of data (IA APPsub-TLV Address Set) held by the server and each address about which a negative response was sent, when the last expected response with that unit or negative response will expire at a requester. Note: It is much more important to cache negative reply, because there are many invalid address queries. Study has shown that for each valid ND query, there are 100's of invalid address queries. 3. For each unit of data held by the server and each address about which a negative response was sent, a list of Edges that were sent that unit as the response or sent a negative Dunbar, et al [Page 10] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 response to the address, with the expected time to expiration at each of them. 4.3.2.4. Pull Request Errors If errors occur at the query level, they MUST be reported in a response message separate from the results of any successful queries. If multiple queries in a request have different errors, they MUST be reported in separate response messages. If multiple queries in a request have the same error, this error response MAY be reported in one response message. 4.3.2.5. Redundant Pull Directories (NVAs) There could be multiple directories (NVA) holding mapping information for a particular VN for reliability or scalability purposes. Pulling Directories (NVAs) advertise themselves by having the Pull Directory flag on in their Interested VNs sub- TLV [rfc6326bis]. A pull request can be sent to any of them that is reachable but it is RECOMMENDED that pull requests be sent to a server (NVA) that is least cost from the requesting edge (NVE). 4.3.3. Hybrid Mode For some edge nodes that have great number of VNs enabled and combined number of hosts under all those VNs are large, managing the inner-outer address mapping for hosts under all those VNs can be a challenge. This is especially true for Data Center gateway nodes, which need to communicate with a majority of VNs if not all. For those Edge nodes, a hybrid mode should be considered. That is the Push Mode being used for some VNs, and the Pull Mode being used for other VNs. It is the network operator's decision by configuration as to which VNs' mapping entries are pushed down from directories (NVA) and which VNs' mapping entries are pulled. In addition, directory can inform the Edge to use legacy way to forward if it doesn't have the mapping information, or the Dunbar, et al [Page 11] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 Edge is administratively prohibited from forwarding data frame to the requested target. 5. Redundancy For redundancy purpose, there should be more than one directory (NVAs) that hold mapping information for each VN. At any given time, only one or a small number of push directories is considered as active for a particular VN. All NVAs should announce its capability and priority to all the edges. 6. Inconsistency Processing If an edge (NVE) notices that a Push Directory server (NVA) is no longer reachable [RFCclear], it MUST ignore any Push Directory data from that server because it is no longer being updated and may be stale. There may be transient conflicts between mapping information from different Push Directory servers (NVAs) or conflicts between locally learned information and information received from a Push Directory server. TRILL associates a confidence level with address table information so, in case of such conflicts, information with a higher confidence value is preferred over information with a lower confidence. In case of equal confidence, Push Directory information is preferred to locally learned information and if information from Push Directory servers conflicts, the information from the higher priority Push Directory server is preferred. 7. Gap Summary 7.1. Features necessary to NVO3 but not present in TRILL NVO3's NVA will need one additional attribute: VN context (VN ID and/or VN Name). For data center networks that don't have IS-IS protocol enabled, other mechanism have to be considered. Dunbar, et al [Page 12] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 7.2. Additional detailed requirement applicable to NVO3's NVA Here are some of the TRILL's directory detailed requirements that should be considered by NVO3 NVA as well: - Push Mode: o For redundancy purposes, for each VN there should be multiple NVA entities holding the mapping information for the TSs in the VN. At any given time, only one or a small number of the NVAs are considered as Active for a particular VN. All NVAs should announce their capability and priority to all the edges. o If the destination of a data frame arriving at the Ingress Edge (NVE) can't be found in its inner-outer mapping table that are pushed down from the Directory Server(s) (NVA), the Ingress edge could be configured to: simply drop the data frame, flood it to all other edges that are in the same VN, or start the ''pull'' process to get information from Pull Directory Server(s) (or NVA) o If an NVE lost its connection to its NVA, it MUST ignore any Push Directory data from that server because it is no longer being updated and may be stale. o When transient conflict occurs: higher priority data take precedence. - Pull Mode: o The Pull Directory response could indicate that the address being queried is not available in NVA or that the requestor is administratively prohibited from getting an informative response. o The timer for ingress NVE caching should be short in an environment where VMs move frequently. The cache timer could be configured or could be sent along with the Pulled reply from the NVA. o Each Pull request can have multiple queries for different TSs. o It is highly desirable to have a mechanism to prevent NVEs from using the stale mapping entries pulled from NVA. Dunbar, et al [Page 13] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 o While waiting for query response from NVA, the NVE has to buffer the subsequent data frames with destination address to the same target. The buffer could overflow before the NVE gets the response from NVA. o If no response is received to a Pull Directory request within a configurable timeout, the request should be re- transmitted with the same Sequence Number up to a configurable number of times. - Hybrid Mode: o NVE can be configured to get some VN's mapping entries via push mode and other VN's mapping entries via pull mode. 8. Security Considerations Accurate mapping of inner address into outer addresses is important to the correct delivery of information. The security of specific directory assisted mechanisms will be discussed in the document or documents specifying those mechanisms. For general TRILL security considerations, see [RFC6325]. 9. IANA Considerations This document requires no IANA actions. RFC Editor: please delete this section before publication. 10. Acknowledgements Special thanks to Dino Farinacci for valuable suggestions and comments to this draft. 11. References 11.1. Normative References As an Informational document, this draft has no Normative References. [nvo3-nve-nva-cp-req] draft-ietf-nvo3-nve-nva-cp-req-00, "Network Virtualization NVE to NVA Control Protocol Requirements", Kreeger, et al. July 31, 3013. Dunbar, et al [Page 14] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 11.2. Informative References [802.1Q] IEEE Std 802.1Q-2011, "IEEE Standard for Local and metropolitan area networks - Virtual Bridged Local Area Networks", May 2011. [802.1Qbg] IEEE Std 802.1Qbg-2012, ''Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks -- Edge Virtual Bridging'', July 2012. [RFC826] Plummer, D., "An Ethernet Address Resolution Protocol", RFC 826, November 1982. [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, September 2007. [RFC6325] Perlman, et, al ''RBridge: Base Protocol Specification'', https://datatracker.ietf.org/doc/rfc6325/, July, 2011 [RFC6439] Perlman, et, al ''RBridges: Appointed Forwarders'', https://datatracker.ietf.org/doc/rfc6439/, Nov 2011 Authors' Addresses Linda Dunbar Huawei Technologies 5430 Legacy Drive, Suite #175 Plano, TX 75024, USA Phone: (469) 277 5840 Email: linda.dunbar@huawei.com Dunbar, et al [Page 15] Internet-Draft draft-dunbar-nv03-NVA-gap-analysis June 2013 Donald Eastlake Huawei Technologies 155 Beaver Street Milford, MA 01757 USA Phone: 1-508-333-2270 Email: d3e3e3@gmail.com Dunbar, et al [Page 16]