Network Working Group Z. Chen Internet-Draft China Telecom Intended status: Standards Track D. Lopez Expires: April 22, 2013 Telefonica I+D T. Tsou Huawei Technologies (USA) C. Zhou Huawei Technologies October 19, 2012 IPv6 Operational Guidelines for Datacenters draft-lopez-v6ops-dc-ipv6-03 Abstract This document is intended to provide operational guidelines for datacenter operators planning to deploy IPv6 in their infrastructures. It aims to offer a reference framework for evaluating different products and architectures, and therefore it is also addressed to manufacturers and solution providers, so they can use it to gauge their solutions. We believe this will translate in a smoother and faster transition of these infrastuctures into IPv6 The document focuses on the DC infrastructure itself, its operation, and the aspects related to DC interconnection through IPv6. It does not consider the particular mechanisms for making Internet services provided by applications hosted in the DC available through IPv6 beyond the specific aspects related to how their deployment on the DC infrastructure. Apart from facilitating the transition to IPv6, the mechanisms outlined here are intended to make this transition as transparent as possible (if not completely transparent) to applications and services running on the DC infrastructure, as well as to take advantage of IPv6 features to simplify DC operations, internally and across the Internet. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Chen, et al. Expires April 22, 2013 [Page 1] Internet-Draft IPv6 Op Guidelines for DCs October 2012 Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 22, 2013. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Chen, et al. Expires April 22, 2013 [Page 2] Internet-Draft IPv6 Op Guidelines for DCs October 2012 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Transition Stages . . . . . . . . . . . . . . . . . . . . . . 6 2.1. Experimental Stage. Native IPv4 Infrastructure . . . . . . 7 2.1.1. Off-shore v6 Access . . . . . . . . . . . . . . . . . 8 2.2. Dual Stack Stage. Internal Adaptation . . . . . . . . . . 8 2.2.1. Dual-stack at the Aggregation Layer . . . . . . . . . 9 2.2.2. Dual-stack Extended OS/Hypervisor . . . . . . . . . . 11 2.3. Next Generation Stage. Pervasive IPv6 Infrastrcuture . . . 11 3. Security Considerations . . . . . . . . . . . . . . . . . . . 12 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 6. Informative References . . . . . . . . . . . . . . . . . . . . 13 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 14 Chen, et al. Expires April 22, 2013 [Page 3] Internet-Draft IPv6 Op Guidelines for DCs October 2012 1. Introduction The need for considering the aspects related to IPv4-to-IPv6 transition for all devices and services connected to the Internet has been widely mentioned elsewhere, and it is not our intention to make an additional call on it. Just let us note that many of those services are already or will soon be located in datacenters (DC), what makes considering the issues associated to DC infrastructure transition a key aspect both for these infrastructures themselves, and for providing a simpler and clear path to service transition. All issues discussed here are related to DC infrastructure transition, and are intended to be orthogonal to whatever particular mechanisms for making the services hosted in the DC available through IPv6 beyond the specific aspects related to their deployment on the infrastructure. Those general mechanisms to service transition have been discussed in depth elsewhere (see, for example [I-D.ietf-v6ops-icp-guidance] and [I-D.chkpvc-enterprise-incremental-ipv6]) and are considered to be orthogonal to the goal of this discussion. Though it is obvious that their applicability in many cases would depend on the characteristics of the supporting DC infrastructure, the transition procedures are intended to keep services as independent as possible of these processes. Furthermore, the combination of the regularity and controlled management in a DC interconnection fabric with IPv6 universal end-to- end addressing should translate in simpler and faster VM migrations, either intra- or inter-DC, and even inter-provider. The diagram in Figure 1 depicts a generalized interconnection schema in a DC. Chen, et al. Expires April 22, 2013 [Page 4] Internet-Draft IPv6 Op Guidelines for DCs October 2012 | | +-----+-----+ +-----+-----+ | Gateway | | Gateway | Internet Access +-----+-----+ +-----+-----+ | | +---+-----------+ | | +---+---+ +---+---+ | Core0 | | CoreN | Core +---+---+ +---+---+ / \ / / / \-----\ / / /---/ \ / +--------+ +--------+ +/-------+ | +/-------+ | | Aggr01 | +-----| AggrN1 | + Aggregation +---+---+/ +--------+/ / \ / \ / \ / \ +-----+ +-----+ +-----+ +-----+ | T11 |... | T1x | | T21 |... | T2y | Access +-----+ +-----+ +-----+ +-----+ | HyV | | HyV | | HyV | | HyV | Physical Servers +:::::+ +:::::+ +:::::+ +:::::+ | VMs | | VMs | | VMs | | VMs | Virtual Machines +-----+ +-----+ +-----+ +-----+ . . . . . . . . . . . . . . . . +-----+ +-----+ +-----+ +-----+ | HyV | | HyV | | HyV | | HyV | +:::::+ +:::::+ +:::::+ +:::::+ | VMs | | VMs | | VMs | | VMs | +-----+ +-----+ +-----+ +-----+ Figure 1: DC Interconnnection Schema o Hypervisors provide connection services (among others) to virtual machines running on physical servers. o Access elements provide connectivity directly to/from physical servers. The access elements are typically placed either top-of- rack (ToR) or end-of-row(EoR). o Aggregation elements group several (many) physical racks to achieve local integration and provide as much structure as possible to data paths. o Core elements connect all aggregation elements acting as the DC backbone. Chen, et al. Expires April 22, 2013 [Page 5] Internet-Draft IPv6 Op Guidelines for DCs October 2012 o One or several gateways connecting the DC to the Internet and/or other DCs through dedicated links. In many actual deployments, depending on DC size and design decisions, some of these elements may be combined (core and gateways are provider by the same routers, or hypervisors act as access elements) or virtualized to some extent, but this layered schema is the one that best accommodates the different options to use L2 or L3 at any of the different DC interconnection layers, and will help us in the discussion along the document. 2. Transition Stages The framework is structured along transition stages, associated with the degree of penetration of IPv6 into the DC communication fabric. It is worth noting we are using these stages as a classification mechanism, and they have not to be associated with any a succession of steps from a v4-only infrastructure to full-fledged v6, but to provide a framework that operators, users, and even manufacturers could use to assess their plans and products. There is no (explicit or implicit) requirement on starting at the stage describe in first place, nor to follow them in successive order. According to their needs and the available solutions, DC operators can choose to start or remain at a certain stage, and freely move from one to another as they see fit, without contravening this document. In this respect, the classification intends to support the planning in aspects such as the adaptation of the different transition stages to the evolution of traffic patterns, or risk assessment in what relates to deploying new components and incorporating change control, integration and testing in highly- complex multi-vendor infrastructures. Three main transition stages can be considered when analyzing IPv6 deployment in the DC infrastructure, all compatible with the availability of services running in the DC through IPv6: o Experimental. The DC keeps a native IPv4 infrastructure, with gateway routers (or even application gateways when services require so) performing the adaptation to requests arriving from the IPv6 Internet. o Dual stack. Native IPv6 and IPv4 are present in the infrastructure, up to whatever the layer in the interconnection scheme where L3 is applied to packet forwarding. Chen, et al. Expires April 22, 2013 [Page 6] Internet-Draft IPv6 Op Guidelines for DCs October 2012 o Next generation. The DC has a fully pervasive IPv6 infrastructure, including full IPv6 hypervisors, which perform the appropriate tunneling or NAT if required by internal applications running IPv4. 2.1. Experimental Stage. Native IPv4 Infrastructure This transition stage corresponds to the first step that many datacenters may take (or have taken) in order to make their external services initially accessible from the IPv6 Internet and/or to evaluate the possiblities around it, and corresponds to IPv6 traffic patterns totally originated out of the DC or their tenants, being a small percentage of the total external requests. At this stage, DC network scheme and addressing do not require any important change, if any. It is important to remark that in no case this can be considered a permanent stage in the transition, or even a long-term solution for incorporating IPv6 into the DC infrastructure. This stage is only recommended for exprimentation or early evalution purposes. The translation of IPv6 requests into the internal infrastructure format occurs at the outmost level of the DC Internet connection. This can be typically achieved at the DC gateway routers, that support the appropriate address translation mechanisms for those services required to be accessed through native IPv6 requests. The policies for applying adaptation can range from performing it only to a limited set of specified services to providing a general translation service for all public services. Finer mechanisms, based on address ranges or more sophisticated dynamic policies are also possible, as they are applied by a limited set of control elements. This provides an additional level of control to the usage of IPv6 routable addresses in the DC environment, which can be especially significant in the experimentation or early deployment phases this stage is applicable to. Even at this stage, some implicit advantages of IPv6 application come into play, even if they can only be applied at the ingress elements: o Flow labels can be applied to enhance load-balancing, as described in [I-D.carpenter-flow-label-balancing]. Incoming IPv6 requests can take advantage of them, and the gateway systems use them as a hint for applying load-balancing mechanisms at the IPv4 internal accesses. o During VM migration (intra- or even inter-DC), Mobile IP mechanisms can be applied to keep service availability during the transient state. Chen, et al. Expires April 22, 2013 [Page 7] Internet-Draft IPv6 Op Guidelines for DCs October 2012 2.1.1. Off-shore v6 Access This model is also suitable to be applied in an "off-shore" mode by the service provider connecting the DC infrastructure to the Internet, as described in [I-D.sunq-v6ops-contents-transition]. When this off-shore mode is applied, the original source address will be hidden to the DC infrastructure, and therefore identification techniques based on it, such as geolocation or reputation evaluation, will be hampered. Unless there is a specific trust link between the DC operator and the ISP, and the DC operator is able to access equivalent identification interfaces provided by the ISP as an additional service, the off-shore experimental stage cannot be considered applicable when source address identification is required. 2.2. Dual Stack Stage. Internal Adaptation This stage requires dual-stack elements in some internal parts of the DC infrastructure. This brings some degree of partition in the infrastructure, either in a horizontal (when data paths or management interfaces are migrated or left in IPv4 while the rest migrate) or a vertical (per tenant or service group), or even both. Although it may seem an artificial case, situations requiring this stage can arise from differen requirements from the user base, or the need for technology changes at different points of the infrastructure, or even the goal of having the possibility of experimenting new solutions in a controled real-operations environment, at the price of the additional complexity of dealing with a double protocol stack, as noted in [I-D.ietf-v6ops-icp-guidance] and elsewhere. This transition stage can accommodate different traffic patterns, both internal and external, though it better fits to scenarios of a clear differentiation of different types of traffic (external vs internal, data vs management...), and/or a more or less even distribution of external requests. A common scenario would include native dual stack servers for certain services combined with single stack ones for others (web server in dual stack and database servers only supporting v4, for example). At this stage, the advantages outlined above on load balancing based on flow labels and Mobile IP mechanisms are applicable to any L3- based mechanism (intra- as well as inter-DC). They will translate into enhanced VM mobility, more effective load balancing, and higher service availability. Furthermore, the simpler integration provided by IPv6 to and from the L2 flat space to the structured L3 one can be applied to achieve simpler deployments, as well as alleviating Chen, et al. Expires April 22, 2013 [Page 8] Internet-Draft IPv6 Op Guidelines for DCs October 2012 encapsulation and fragmentation issues when traversing between L2 and L3 spaces. With an appropriate prefix management, automatic address assignment, discovery, and renumbering can be applied not only to public service interfaces, but most notably to data and management paths. Other potential advantages include the application of multicast scopes to limit broadcast floods, and the usage of specific security headers to enhance tenant differentiation. On the other hand, this stage requires a much more careful planning of addressing schemas and access control, according to security levels. While the experimental stage implies relatively few global routable addresses, this one brings the advantages and risks of using different kinds of addresses at each point of the IPv6-aware infrastructure. 2.2.1. Dual-stack at the Aggregation Layer +---------------------+ | Internet | +---------+-----------+ | +-----+----+ | Gateway | +-----+----+ . . Core Level . +--+--+ | FW | +--+--+ | Aggregation Level +--+--+ | LB | +--+--+ _ / \_ / \ +--+--+ +--+--+ | Web | ... | Web | +--+--+ +--+--+ | \ __ _ _/ | | / \ | +--+--+ +--+--+ |Cache| | DB | +-----+ +-----+ Chen, et al. Expires April 22, 2013 [Page 9] Internet-Draft IPv6 Op Guidelines for DCs October 2012 Figure 2: Data Center Application Scheme An initial approach corresponding to this transition stage relies on taking advantage of specific elements at the aggregation layer described in Figure 1, and make them able to provide dual-stack gatewaying to the IPv4-based servers and data infrastructure. Typically, firewalls (FW) are deployed as the security edge of the whole service domain and provides safe access control of this service domain from other function domains. In addition, some application optimization based on devices and security devices (e.g.,Load Balancers, SSL VPN, IPS and etc.) may be deployed in the aggregation level to alleviate the burden of the server and to guarantee deep security, as shown in Figure 2. The load balancer (LB) or some other boxes could be upgraded to support the data transmission. There may be two ways to achieve this at the edge of the DC: Encapsulation and NAT. In the encapsulation case, the LB function carries the IPv6 traffic over IPv4 using an encapsulation (IPv6-in-IPv4). In the NAT case, there are already some technologies to solve this problem. For example, DNS and NAT device could be concatenated for IPv4/IPv6 translation if IPv6 host needs to visit IPv4 servers. However, this may require the concatenation of multiple network devices, which means the NAT tables needs to be synchronized at different devices. As described below, a simplified IPv4/IPv6 translation model can be applied, which could be implemented in the LB device. The mapping information of IPv4 and IPv6 will be generated automatically based on the information of the LB. The host IP address will be translated without port translation. +----------+------------------------------+ |Dual Stack| IPv4-only +----------+ | | | +----|Web Server| | | +------|------+ / +----------+ | +--------+ +-------+ | | | | / | |Internet|--|Gateway|---|---+Load-Balancer+-- \ | | | | | | | | | \ +----------+ | +--------+ +-------+ | +------|------+ +----|Web Server| | | | +----------+ | +----------+------------------------------+ Figure 3: Dual Stack LB mechanism As shown in Figure 3,the LB can be considered divided into two parts: The dual-stack part facing the external border, and the IPv4-only part which contains the traditional LB functions. The IPv4 DC is allocated an IPv6 prefix which is for the VSIPv6 (Virtual Service Chen, et al. Expires April 22, 2013 [Page 10] Internet-Draft IPv6 Op Guidelines for DCs October 2012 IPv6 Address). We suggest that the IPv6 prefix is not the well-known prefix in order to avoid the IPv4 routings of the services in different DCs spread to the IPv6 network. The VSIPv4 (Virtual Service IPv4 Address) is embedded in VSIPv6 using the allocated IPv6 prefix. In this way, the LB has the stateless IP address mapping between VSIPv6 and VSIPv4, and synchronization is not required between LB and DNS64 server. The dual-stack part of the LB has a private IPv4 address pool. When IPv6 packets arrive, the dual-stack part does the one-on-one SIP (source IP address) mapping (as defined in [I-D.sunq-v6ops-contents-transition]) between IPv4 private address and IPv6 SIP. Because there will be too many UDP/TCP sessions between the DC and Internet, the IP addresses binding tables between IPv6 and IPv4 are not session-based, but SIP-based. Thus, the dual- stack part of LB builds IP binding stateful tables for the host IPv6 address and private IPv4 address of the pool. When the following IPv6 packets of the host come from Internet to the LB, the dual stack part does the IP address translation for the packets. Thus, the IPv6 packets were translated to IPv4 packets and sent to the IPv4 only part of the LB. 2.2.2. Dual-stack Extended OS/Hypervisor Another option for deploying a infrastructure at the dual-stack stage would bring dual-stack much closer to the application servers, by requiring hypervisors, VMs and applications in the v6-capable zone of the DC to be able to operate in dual stack. This way, incoming connections would be dealt in a seamless manner, while for outgoing ones an OS-specific replacement for system calls like gethostbyname() and getaddrinfo() would accept a character string (an IPv4 literal, an IPv6 literal, or a domain name) and would return a connected socket or an error message, having executed a happy eyeballs algorithm ([RFC6555]). If these hypothetical system call replacements were smart enough, they would allow the transparent interoperation of DCs with different levels of v6 penetration, either horizontal (internal data paths are not migrated, for example) or vertical (per tenant or service group). This approach requires, on the other hand, all the involved DC infrastructure to become dual-stack, as well as some degree of explicit application adaptation. 2.3. Next Generation Stage. Pervasive IPv6 Infrastrcuture We can consider a DC infrastructure at the next generation stage when all network layer elements, including hypervisors, are IPv6-aware and apply it by default. Conversely with the experimental stage, access Chen, et al. Expires April 22, 2013 [Page 11] Internet-Draft IPv6 Op Guidelines for DCs October 2012 from the IPv4 Internet is achieved, when required, by protocol translation performed at the edge infrastructure elements, or even supplied by the service provider as an additional network service. This level can be of interest for new deployments willing to apply a fresh start aligned with future IPv6 widespread usage, when a relevant amount of requests are expected to be using IPv6, or to take advantage of any of the potential benefits that an IPv6 support infrastructure can provide. The potential advantages mentioned for the previous levels (load balancing based on flow labels, mobility mechanisms for transient states in VM or data migration, controled multicast, and better mapping of L2 flat space on L3 constructs) can be applied at any layer, even especially tailored for individual services. Obviously, the need for a careful planning of address space is even stronger here, though the centralized protocol translation services should reduce the risk of translation errors causing disruptions or security breaches. [V6DCS] proposes an approach to a next generation DC deployment, already demonstrated in practice, and claims the advantages of materializing the stage from the beginning, providing some rationale for it based on simplifying the transition process. It relies on stateless NAT64 ([RFC6052], [RFC6145]) to enable access from the IPv4 Internet. 3. Security Considerations A thorough collection of operational security aspects for IPv6 network is made in [I-D.vyncke-opsec-v6]. Most of them, with the probable exception of those specific to residential users, are applicable in the environment we consider in this document. Let us just emphasize the relevance of addressing plan issues (and the limits to public routable addresses for the whole infrastructure), and the need for careful configuration of access control rules at the translation points. This latter one is specially sensitive in infrastructures at the dual-stack stage, as the translation points are potentially distributed, and when protocol translation is offered as an external service, since there can be operational mismatches. 4. IANA Considerations None. Chen, et al. Expires April 22, 2013 [Page 12] Internet-Draft IPv6 Op Guidelines for DCs October 2012 5. Acknowledgements We would like to thank Tore Anderson, Ray Hunter, Arturo Servin, Joel Jaeggli, Fred Baker, Lorenzo Colitti, and Dan York for their questions, suggestions and comments. 6. Informative References [I-D.carpenter-flow-label-balancing] Carpenter, B., Jiang, S., and W. Tarreau, "Using the IPv6 Flow Label for Server Load Balancing", draft-carpenter-flow-label-balancing-01 (work in progress), June 2012. [I-D.chkpvc-enterprise-incremental-ipv6] Chittimaneni, K., Chown, T., Howard, L., Kuarsingh, V., Pouffary, Y., and E. Vyncke, "Enterprise Incremental IPv6", draft-chkpvc-enterprise-incremental-ipv6-01 (work in progress), July 2012. [I-D.ietf-v6ops-icp-guidance] Carpenter, B. and S. Jiang, "IPv6 Guidance for Internet Content and Application Service Providers", draft-ietf-v6ops-icp-guidance-04 (work in progress), September 2012. [I-D.sunq-v6ops-contents-transition] Sun, Q., Liu, D., Zhao, Q., Liu, Q., Xie, C., Li, X., and J. Qin, "Rapid Transition of IPv4 contents to be IPv6- accessible", draft-sunq-v6ops-contents-transition-03 (work in progress), March 2012. [I-D.vyncke-opsec-v6] Chittimaneni, K., Kaeo, M., and E. Vyncke, "Operational Security Considerations for IPv6 Networks", draft-vyncke-opsec-v6-01 (work in progress), July 2012. [RFC6052] Bao, C., Huitema, C., Bagnulo, M., Boucadair, M., and X. Li, "IPv6 Addressing of IPv4/IPv6 Translators", RFC 6052, October 2010. [RFC6145] Li, X., Bao, C., and F. Baker, "IP/ICMP Translation Algorithm", RFC 6145, April 2011. [RFC6555] Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with Dual-Stack Hosts", RFC 6555, April 2012. Chen, et al. Expires April 22, 2013 [Page 13] Internet-Draft IPv6 Op Guidelines for DCs October 2012 [V6DCS] "The case for IPv6-only data centres", . Authors' Addresses Zhonghua Chen China Telecom P.R.China Phone: Email: 18918588897@189.cn Diego R. Lopez Telefonica I+D Don Ramon de la Cruz, 84 Madrid 28006 Spain Phone: +34 913 129 041 Email: diego@tid.es Tina Tsou Huawei Technologies (USA) 2330 Central Expressway Santa Clara, CA 95050 USA Phone: +1 408 330 4424 Email: Tina.Tsou.Zouting@huawei.com Cathy Zhou Huawei Technologies Bantian, Longgang District Shenzhen 518129 P.R. China Phone: Email: cathy.zhou@huawei.com Chen, et al. Expires April 22, 2013 [Page 14]