INTERNET-DRAFT                                               A. Ghanwani
Intended Status: Informational                                      Dell
Expires: May 29, 2013                                  November 30, 2012


                Multicast Issues in Networks Using NVO3
                 draft-ghanwani-nvo3-mcast-issues-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Abstract

   This memo discusses issues with supporting multicast traffic in a
   network that uses Network Virtualization using Overlays over Layer 3
   (NVO3).  It describes the mechanisms that may be used to support
   multicast in such networks and discusses some of the associated
   considerations.

Table of Contents

   1  Introduction
   2  Multicast mechanisms in networks that use NVO3
      2.1  No multicast support
      2.2  Replication at the source NVE
      2.3  Replication at a multicast service node
      2.4  IP multicast in the underlay
      2.5  Simultaneous use of more than one mechanism
   3  Summary
   4  Security Considerations
   5  IANA Considerations
   6  References
      6.1  Normative References
      6.2  Informative References
   Authors' Addresses

1  Introduction

   Network Virtualization using Overlays over Layer 3 (NVO3) is a
   technology that is used to address issues that arise in building
   large, multitenant data centers that make extensive use of server
   virtualization [PS].

   This document focuses specifically on the problem of supporting
   multicast in networks that use NVO3.  Because of the requirement for
   multi-destination delivery, multicast traffic poses some unique
   challenges.

   The reader is assumed to be familiar with the terminology defined in
   the NVO3 framework document [FW].

2  Multicast mechanisms in networks that use NVO3

   In NVO3 environments, traffic between NVEs is transported using a
   tunnel encapsulation such as VXLAN [VXLAN], NVGRE [NVGRE], or STT
   [STT].
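
   As a concrete, purely illustrative example of such an encapsulation,
   the Python sketch below builds an 8-byte VXLAN-style header,
   assuming the header layout described in [VXLAN] (an "I" flag
   indicating a valid VNI, reserved bits, and a 24-bit VNI).  The
   function name and the VNI value are assumptions made for this
   sketch, and the outer Ethernet/IP/UDP headers that carry the packet
   between NVEs are omitted.

      # Minimal sketch: prepend a VXLAN-style header (per the layout
      # described in [VXLAN]) to an inner Ethernet frame.  The outer
      # IP/UDP headers that address the destination NVE are not shown.
      import struct

      def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
          flags_word = 0x08 << 24           # "I" flag set, rest reserved
          vni_word = (vni & 0xFFFFFF) << 8  # 24-bit VNI, low byte reserved
          header = struct.pack("!II", flags_word, vni_word)
          return header + inner_frame

      # Illustrative use: encapsulate a dummy inner frame for VNI 5000.
      packet = vxlan_encapsulate(b"\x00" * 64, 5000)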
   Besides the need to support the Address Resolution Protocol (ARP)
   and Neighbor Discovery (ND), there are several applications that
   require the support of multicast and/or broadcast in data centers
   [DC-MC].  With NVO3, there are four possible ways that multicast may
   be handled in such networks:

   1. No multicast support.

   2. Replication at the source NVE.

   3. Replication at a multicast service node.

   4. IP multicast in the underlay.

   These mechanisms are briefly mentioned in the NVO3 framework
   document [FW].  This document provides more detail on the basic
   mechanism underlying each of them and discusses their issues and
   tradeoffs.

2.1  No multicast support

   In this scenario, there is no support whatsoever for multicast
   traffic when using the overlay.  This can only work if the following
   conditions are met:

   1. All of the traffic is unicast.

   2. An oracle is used at the NVE to determine the MAC address-to-NVE
      mapping and the MAC address-to-IP address bindings.  In other
      words, there is no data plane learning, and address resolution
      requests via ARP/ND that are issued by the VMs must be resolved
      by the NVE to which they are attached.

   With this approach, certain multicast/broadcast applications such as
   DHCP can be supported by the use of a helper function in the NVE.

   The main issue that needs to be addressed with this mechanism is the
   handling of hosts for which a mapping does not already exist in the
   oracle.  This can be particularly challenging if such end systems
   are reachable through more than one NVE.

2.2  Replication at the source NVE

   With this method, the overlay attempts to provide a multicast
   service without requiring any specific support from the underlay
   other than a unicast service.  A multicast or broadcast transmission
   is achieved by replicating the packet at the source NVE and sending
   one copy, over a unicast tunnel, to each destination NVE that must
   receive it (a sketch of this replication loop appears in Section
   2.3).

   For this mechanism to work, the source NVE must know, a priori, the
   IP addresses of all destination NVEs that need to receive the
   packet.  For example, in the case of an ARP broadcast or an ND
   multicast, the source NVE must know the IP addresses of all the
   remote NVEs where there are members of the tenant subnet in
   question.

   The obvious drawback of this method is that multiple copies of the
   same packet traverse any links that are common to the paths toward
   the destination NVEs.  If, for example, a tenant subnet is spread
   across 50 NVEs, the packet would have to be replicated 50 times at
   the source NVE.  This also creates an issue for the forwarding
   performance of the NVE, especially if it is implemented in software.

   Note that this method is similar to what was used in VPLS [VPLS]
   prior to extensive support for MPLS multicast [MPLS-MC].

2.3  Replication at a multicast service node

   With this method, all multicast packets are sent, using a unicast
   tunnel encapsulation, to a multicast service node.  The multicast
   service node, in turn, creates multiple copies of the packet and
   delivers a copy, using a unicast tunnel encapsulation, to each of
   the NVEs that are part of the multicast group for which the packet
   is intended.  This mechanism is similar to that used by the ATM
   Forum's LAN Emulation (LANE) specification [LANE].
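
   The replication step itself is simple; the sketch below is purely
   illustrative (the function and table names are assumptions, not part
   of any NVO3 specification) and shows the ingress replication of
   Section 2.2.  The same loop, run at a multicast service node after
   it receives the packet over a unicast tunnel from the source NVE,
   yields the behavior described in this section.

      # Illustrative sketch of per-destination replication.  The
      # mapping from virtual network (VNI) to the set of remote NVE
      # addresses is assumed to be supplied by an oracle/controller;
      # all names here are hypothetical.
      from typing import Dict, List

      def replicate_to_nves(inner_frame: bytes, vni: int,
                            members: Dict[int, List[str]],
                            send_unicast_tunnel) -> int:
          copies = 0
          for nve_ip in members.get(vni, []):
              # One unicast-encapsulated copy per destination NVE;
              # copies whose paths share underlay links are sent
              # repeatedly, which is the drawback noted in Section 2.2.
              send_unicast_tunnel(nve_ip, vni, inner_frame)
              copies += 1
          return copies

      # Hypothetical usage with a stub transmit function:
      members = {5000: ["192.0.2.1", "192.0.2.2"]}
      sent = replicate_to_nves(b"\x00" * 64, 5000, members,
                               lambda ip, vni, frame: None)

   In the multicast service node case, each copy must also preserve the
   source NVE's address when data plane learning is in use, as
   discussed later in this section.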
   Unlike the method described in Section 2.2, there is no performance
   impact at the ingress NVE, nor are there any issues with multiple
   copies of the same packet on the links between the source NVE and
   the multicast service node.  However, there remain issues with
   multiple copies of the same packet on links that are common to the
   paths from the multicast service node to each of the egress NVEs.
   Additional issues introduced by this method include the availability
   of the multicast service node, methods to scale the services offered
   by the multicast service node, and the sub-optimality of the
   delivery paths.

   Finally, if data plane learning is in use, the IP address of the
   source NVE must be preserved in the packet copies created at the
   multicast service node.  This could create problems if IP source
   address reverse path forwarding (RPF) checks are in use.

2.4  IP multicast in the underlay

   In this method, the underlay supports IP multicast, and the ingress
   NVE encapsulates the packet with the appropriate IP multicast
   address in the tunnel encapsulation header for delivery to the
   desired set of NVEs.  The protocol in the underlay could be any
   variant of Protocol Independent Multicast (PIM).

   With this method, there are none of the issues associated with the
   methods described in Sections 2.2 and 2.3.  With PIM Sparse Mode
   (PIM-SM), the number of flows required would be (n*g), where n is
   the number of source NVEs that source packets for the group and g is
   the number of groups.  Bidirectional PIM (BIDIR-PIM) would offer
   better scalability, with the number of flows required being g.

   In the absence of any additional mechanism (e.g., the use of an
   oracle for address resolution), optimal delivery would require a
   separate underlay group for each tenant, plus a separate group for
   each multicast address (used by multicast applications) within a
   tenant.  An additional consideration is that only the lower 23 bits
   of the IP address (regardless of whether IPv4 or IPv6 is in use) are
   mapped to the outer MAC address, so if there is equipment that
   prunes multicast at Layer 2, there will be some aliasing.  Finally,
   a mechanism to efficiently provision such addresses for each group
   would be required.

   There are additional optimizations that are possible, but they come
   with their own restrictions.  For example, a set of tenants may be
   restricted to some subset of NVEs, and those tenants could all share
   the same outer IP multicast group address.  This, however,
   introduces a problem of sub-optimal delivery: even if a particular
   tenant within the group has no presence on one of the NVEs where
   another tenant in the group does, the former's multicast packets
   would still be delivered to that NVE.  It also introduces an
   additional network management burden to optimize which tenants
   should be part of the same tenant group (based on the NVEs they
   share), which somewhat dilutes the value proposition of NVO3,
   namely, the complete decoupling of the overlay from the physical
   network design, allowing complete freedom of placement of VMs
   anywhere within the data center.
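
   To make the aliasing concern concrete, the sketch below (again,
   purely illustrative) applies the standard IPv4 group-to-MAC mapping,
   in which a fixed 01:00:5e prefix is combined with the low-order 23
   bits of the group address.  Distinct underlay groups that differ
   only in the remaining upper bits map to the same outer MAC address
   and cannot be distinguished by equipment that prunes multicast at
   Layer 2.

      # Illustrative sketch of the IPv4 multicast group-to-MAC mapping:
      # a fixed 01:00:5e prefix plus the low-order 23 bits of the group
      # address.
      import ipaddress

      def ipv4_group_to_mac(group: str) -> str:
          low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
          mac = bytes([0x01, 0x00, 0x5E]) + low23.to_bytes(3, "big")
          return ":".join(f"{b:02x}" for b in mac)

      # Two distinct underlay groups alias to the same MAC address:
      print(ipv4_group_to_mac("239.1.1.1"))    # 01:00:5e:01:01:01
      print(ipv4_group_to_mac("238.129.1.1"))  # 01:00:5e:01:01:01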
2.5  Simultaneous use of more than one mechanism

   While the mechanisms described in Sections 2.1 through 2.4 have been
   presented individually, it is possible for implementations to rely
   on more than one of them.

   For example, the method of Section 2.1 could be used for minimizing
   ARP/ND traffic, while multicast applications are supported by one,
   or a combination, of the other methods.  For small multicast groups,
   the methods of source NVE replication or the use of a multicast
   service node may be attractive, while for larger multicast groups,
   the use of IP multicast in the underlay may be preferable.

3  Summary

   This document has identified various mechanisms for supporting
   multicast in networks that use NVO3.  It highlights the basics of
   each mechanism and some of the issues with them.  As solutions are
   developed, the protocols will need to consider the use of these
   mechanisms, and co-existence among them may be a consideration.

4  Security Considerations

   This is an informational document and, as such, does not introduce
   any new security considerations beyond those that may be present in
   proposed solutions.

5  IANA Considerations

   This document does not have any IANA considerations.

6  References

6.1  Normative References

   [PS]      Narten, T., et al., "Problem statement: Overlays for
             network virtualization", work in progress.

   [FW]      Lasserre, M., et al., "Framework for DC network
             virtualization", work in progress.

6.2  Informative References

   [VXLAN]   Mahalingam, M., et al., "VXLAN: A framework for overlaying
             virtualized Layer 2 networks over Layer 3 networks", work
             in progress.

   [NVGRE]   Sridharan, M., et al., "NVGRE: Network virtualization
             using Generic Routing Encapsulation", work in progress.

   [STT]     Davie, B. and Gross, J., "A stateless transport tunneling
             protocol for network virtualization", work in progress.

   [DC-MC]   McBride, M. and Liu, H., "Multicast in the data center
             overview", work in progress.

   [VPLS]    Lasserre, M. and Kompella, V. (Eds.), "Virtual Private LAN
             Service (VPLS) using Label Distribution Protocol (LDP)
             signaling", RFC 4762, January 2007.

   [MPLS-MC] Aggarwal, R., et al., "Multicast in VPLS", work in
             progress.

   [LANE]    The ATM Forum, "LAN emulation over ATM", af-lane-0021.000,
             January 1995.

Authors' Addresses

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134

   Phone: +1-408-571-3228
   Email: anoop@alumni.duke.edu