Operations and Management Area Working P. Fan Group L. Li Internet-Draft China Mobile Intended status: Standards Track February 25, 2013 Expires: August 29, 2013 Requirements for IP/MPLS network transmission interruption duration draft-fan-opsawg-transmission-interruption-02 Abstract The transmission performance of IP/MPLS network affects upper layer services and networks, but there is no consensus in the industry on transmission interruption for IP/MPLS network up to now. This memo studies requirements for the interruption duration criteria in several service scenarios. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 29, 2013. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as Fan & Li Expires August 29, 2013 [Page 1] Internet-Draft IP/MPLS transmission interruption February 2013 described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Services and Performance Criteria . . . . . . . . . . . . . . 4 2.1. Softswitch . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2. SS7 transport . . . . . . . . . . . . . . . . . . . . . . 7 2.3. LTE Backhaul . . . . . . . . . . . . . . . . . . . . . . . 8 2.4. Ethernet VPN . . . . . . . . . . . . . . . . . . . . . . . 8 2.5. IPTV . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3. Other considerations . . . . . . . . . . . . . . . . . . . . . 9 4. Security Considerations . . . . . . . . . . . . . . . . . . . 9 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 7. Informative References . . . . . . . . . . . . . . . . . . . . 10 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 Fan & Li Expires August 29, 2013 [Page 2] Internet-Draft IP/MPLS transmission interruption February 2013 1. Introduction Today's IP/MPLS network is widely used as a bearer network to carry diversified packet switched services. The transmission qualities of these services are closely related to the performance of bearer layers, as network failure, delay, congestion and other abnormities will inevitably bring about service interruption and user perception degradation. However, there is no consensus in the industry on transmission interruption for IP/MPLS network up to now. This memo studies relationships between service performance and transmission interruption duration in several scenarios, and is intended to reach a list of requirements for these interruption duration criteria. For a long time the industry has been aspiring for the so-called golden standard for network resilience, that is the 50-millisecond recovery threshold. [HeavyReading] gives us a basic introduction to the origin of this fast protection legacy which can date back to 1980s. The 50ms threshold was established informally in the early 1980s, and then formally through standardization of [G.841] recommendation on SDH network protection architects. The specific requirement shows a maximum threshold for detecting and restoring a fault of 60ms, which adds up fault detection duration of less than 10ms and protection switching time of less than 50ms. The report also mentions original concerns that the threshold results from. The voice channel banks deployed in early 1980s had limited fault tolerance. Failures that lasted longer than 200ms would generate a Carrier Group Alarm (CGA) which caused the channel bank to terminate all connections over that given TDM line. So an outage budget was developed by carriers and the 50ms standard was employed to protect voice services. However newer channel banks at that time had started to implement a CGA timer of 2s, so the 50ms protection was adopted to protect a small and diminishing fraction of digital network. Historically this 50ms fast protection speed has been achieved by SDH network. Using various fast convergence technics, IP/MPLS is also able to react within 50ms. As for network applications that are carried by optical or packet core, changes have been made through the past decades, accompanied by the continuing questions about needs for 50ms protection. Here we list three basic considerations about services and their requirement for IP/MPLS: for services like TDM over IP/MPLS, the traditional 50ms guarantee should be kept and met; for current IP services (e.g. voice, internet), experiences or experiments are to be provided for guidance; for services in future, we are supposed to propose requirement early and give consideration to IP/MPLS. Fan & Li Expires August 29, 2013 [Page 3] Internet-Draft IP/MPLS transmission interruption February 2013 2. Services and Performance Criteria Services delivered by IP/MPLS network have different transmission quality requirements, thus introduce different performance criteria for the bearing IP/MPLS network. We believe there are two principles that need to be considered during network and service design, configuration and operation. The IP/MPLS bearer should satisfy quality requirements of upper level services and applications, while services and applications should also take into account the intrinsic IP capabilities. In this section we will describe concerns on IP/ MPLS and service mutual adaptation from aspects of several kinds of service scenarios. 2.1. Softswitch From the softswitch point of view, the IP carrying nature imposes certain influence to the service quality. Especially when speech is delivered by IP, the communication quality of voice is impaired, and in turn makes higher requirements for the transmission performance of IP. This part will mainly focus on three communication quality criteria and their influence factors to give requirements for softswitch and IP bearer networks. 1) Call Loss Call loss is used to describe the circumstance where a phone call fails to establish after initiated by a subscriber due to network faults. In the practical network, the call loss rate is mainly associated by the factors as follows: (1) Interfaces, including Nc, Mc and interface between MSS and SG. (2) State machine message timer. If a timeout takes place, the state machine releases signaling messages, producing a call loss. Typical value of BICC timer is 10~15 seconds and value of DTAP timer about 15 seconds. (3) Interface association timer. Associations breaks off at the expiration of timer. (4) Bearer network convergence time. If the configured timer duration of a state machine is shorter than the timer duration of interface association, then although interface association may not be broken off, call loss is still possible to occur due to message timer expiration. If the association timer duration is shorter than IP routing convergence time, the association is considered broken off by SCTP, hence message loss at interface Fan & Li Expires August 29, 2013 [Page 4] Internet-Draft IP/MPLS transmission interruption February 2013 between MSS and SG as well as interface Nc results in massive call loss, and new calling request cannot be satisfied because of interface Mc breakoff. In this case, the call loss rate can be calculated as Call Loss Rate = ( IP Convergence Time + Association Restoration Time ) * CAPS / BHCA. However, if the association timer duration is longer than IP routing convergence time, then the association is considered normal by SCTP, and data will be retransmitted. Although this may cause buffer overflow leading to call loss, the call loss rate is possible to achieve approximately zero if buffer is big enough. From the analysis above and practical operation experience, the requirements for softswitch and IP bearer are as follows: the duration of SCTP interface association timer should be shorter than that of the state machine message timer, and this duration is further recommended to be no longer than 6 seconds in order to maintain detection sensitivity; the interruption duration of IP bearer network should be as short as possible to avoid call loss during the IP layer interruption period, and this duration is further recommended to be no longer than 5 seconds. 2) Call Cut-off Call cut-off is referred to the abnormal release during a phone call due to reasons other than intentional release by any of the parties involved in the call. The call cut-off rate is related with: (1) Interfaces, including Nc and interface between MSS and SG. (2) Interface association timer. (3) Bearer network convergence time. If the association timer duration is shorter than IP routing convergence time, established phone calls will be released once interruption of interface Nc or interface connecting MSS and SG is detected. In the case of association breakoff, call cut-off rate can be calculated as Call Cut-off Rate = ( CAPS * Call Duration ) * Busy Hour Association Breakoffs / BHCA. While if the association is not interrupted, the call cut-off rate can be approximately zero. Fan & Li Expires August 29, 2013 [Page 5] Internet-Draft IP/MPLS transmission interruption February 2013 In conclusion, the SCTP association should be guaranteed during IP layer interruption to avoid interface breakoff alert. The requirements for softswitch and IP bearer are the same as those related to call loss. 3) Connection Delay The connection delay from a call initiation by a calling party to PLMN should be no longer than 4 seconds. This delay is affected by factors below: (1) RRC connection setup delay (irrelevant to whether service is carried by IP or not). (2) Core network signaling interaction delay. The message number at interface Nc/Nb is 6, and is 8 (calling side) or 16 (called side, in case of IP-IP) at interface Mc. Each message is with a delay of no longer than 50 milliseconds. Calling message delay at interface Nc is no longer than 300 milliseconds. If long distance call is made though CMN, the message delay is to be increased by transmission delay of 5 msec/km and CMN process delay. So the message delay is likely to be 400 milliseconds. (3) IP bearer network QoS and load. The connection delay is influenced by the delay criterion defined in the IP bearer network QoS, and is raised by delay, jitter, packet loss caused by network overload. In addition, if the configured timer duration of interface association is too long, the SCTP sensitivity to the retransmitted messages after packet loss will be decreased, which increases connection delay. Connection delay is generally expressed as Connection Delay = (IP convergence time + RRC connection setup delay + Signaling Interaction Delay), and is no longer than 4 seconds. So the IP network in normal working state should be constrained within a certain range of load to ensure that delay is shorter than 50 milliseconds, while in interruption state the IP convergence time should be no longer than 3 seconds to ensure that connection delay is shorter than 4 seconds. From the analysis of IP/MPLS performance according to the three criteria above, we suggest the transmission interruption duration of IP/MPLS network for softswitch service should be no longer than 3 seconds. Fan & Li Expires August 29, 2013 [Page 6] Internet-Draft IP/MPLS transmission interruption February 2013 2.2. SS7 transport The Signaling System No. 7 (SS7/C7) network is one of the examples of the principle that services should take into account the ability of IP. The bearer of SS7 protocol stack has been experiencing evolution from TDM to IP. Traditionally the user parts of SS7 (including MAP, CAP, BSSAP+, ISUP, etc.) are carried by MTP layers, but the bearer has gradually been evolved into a packetized form with SIGTRAN (including M2PA, M2UA, M3UA, etc.) using SCTP associations over IP. The change requires transport layer to take mechanisms to meet demand of SCN signaling, and more importantly it requires protocols to make adaption to the "best effort" fact of IP. The SIGTRAN uses an architecture that can be described as standard IP plus unified transport plus diversified adaption units. It introduces SCTP to realize reliable signaling transport over IP. The SCTP itself provides reliable transmission mechanisms, such as path selection and monitoring, validation and acknowledgment mechanisms, and retransmission timing management. The unreliable nature of IP makes it necessary for the upper-level protocols to be more tolerable to the possible instability of bearer. Once a service request from a UE is accepted, the system allocates resources and establishes paths for the user. A breakoff caused by IP will result in signaling disconnection or rerouting. Signaling transmission path may also be switched back after IP layer restores. Frequent switchovers and disconnections lead to unnecessary system cost and service interruption, so parameters should be configured a little bit "insensitive" to try to sustain connections on control plane. One of the examples of parameter configuration is the timer value. The following gives two cases about SCTP on transport layer and M2PA on adaption layer. The values should not be set very small to prevent unnecessary disconnection caused by IP instability. However, because upper services of SS7 may also have timeout rules, values should not be set very large too to avoid violating the rules. 1) SCTP SCTP uses RTO to manage timeout duration for retransmission in case of feedback missing. The RTO is given an initial, a max and a min value, and is calculated instantaneously with a set of management rules. Many other parameters are used for fault detection in SCTP. Association.Max.Retrans is used to indicate the upper limit of number of possible retransmission without considering endpoint down. Path.Max.Retrans is a similar value to detect path failure. The parameters together characterize the ability of SCTP to tolerate Fan & Li Expires August 29, 2013 [Page 7] Internet-Draft IP/MPLS transmission interruption February 2013 bearer downwards and provide reliable SS7 transport upwards. The typical values of the parameters are RTO.Initial = 0.5 sec, RTO.MIN = 0.5 sec, RTO.MAX = 1.5 sec, Path.Max.Retrans = 5, Assoc.Max.Retrans = 10. 2) M2PA Although protocols like H.248 and BICC can be carried directly upon SCTP, the user part protocols of SS7 usually have to be carried by SCTP/IP with the help of different adaption layers. In this case, the attributes of adaption layers, e.g. M2PA used between STPs, are more important to SS7. M2PA uses a T7 timer to indicate the maximum delay of acknowledgement and start T7 at the time of data transmission. If no message is acknowledged after the maximum waiting time, T7 expires and M2PA sends a message of out of service to the peer end. Because propagation delays in IP networks are more variable than in traditional SS7 networks, the value of T7 should be set considering IP propagation delays, as well as acknowledgement time, SCTP slow-start algorithms, upper service timers and other factors. Typical value of T7 is 7~10 sec. Parameter configuration induced tolerance to bearer may have some influence on service, but it avoids service cut-off or severe user perception degradation. For services like SMS or route lookup, possible latency may be introduced, but operations can still be completed after short delay. Because SMS has no strict requirement for instantaneity, impact on service is limited. If route lookup takes more time due to IP interruption and convergence, user may experience longer setup delay when dialing. For service of location update, even if operation fails because bearer is interrupted for too long, UE has the mechanism to initiate request again. 2.3. LTE Backhaul To be further analyzed. 2.4. Ethernet VPN Ethernet VPNs (e.g. VPLS) are used to provide transparent Ethernet type layer 2 connections for customers. Ethernet frames are treated as service payload and encapsulated and transported in providers MPLS network. The interruption criteria of IP/MPLS bearer should guarantee continuity of Ethernet service, and IP/MPLS failover is not supposed to generate outage of Ethernet service. [Y.1731] and [IEEE802.1ag] describe in detail OAM functions and mechanisms for Ethernet, with specific recommendation on connectivity fault management. Ethernet uses continuity check function to detect Fan & Li Expires August 29, 2013 [Page 8] Internet-Draft IP/MPLS transmission interruption February 2013 loss of continuity between any pair of MEPs in a MEG, and this function is realized by sending CCMs (connectivity check messages) between peer MEPs. When a MEP does not receive CCM from a peer MEP within a certain interval, it detects loss of continuity to that peer MEP. The threshold interval is specified as 3.5 times the CCM transmission period, which corresponds to a loss of three consecutive CCMs from the peer MEP, and the CCM transmission period is recommended to be the default value of 1 second. So the interruption duration of IP/MPLS for Ethernet VPN services should be less than 3 seconds. 2.5. IPTV To be further analyzed. 3. Other considerations So far this document has focused on use cases and their requirement for IP/MPLS, and other practical issues are not included in this version. For example, an IP/MPLS packet core is expected to carry a variety of services, so the requirement for IP/MPLS may have to include additional concerns on this multi-service co-existence scenario. A simple and straight-forward way may be to satisfy the most critical need for protection time required by the services. Another issue is related to service awareness. Whether service type is or can be known by IP/MPLS would influence the ability of IP/MPLS to provide reliability guarantee accordingly. It seems to be easier to perform service identification on edge devices than network core. We believe these kinds of issues need to be taken into account, and currently we will just leave them to be updated in future revisions. 4. Security Considerations TBD 5. IANA Considerations This memo includes no request to IANA. 6. Acknowledgements We would like to thank Kai Li, Xu Chen and Chris Donley for their kind help to the document. Fan & Li Expires August 29, 2013 [Page 9] Internet-Draft IP/MPLS transmission interruption February 2013 7. Informative References [G.841] ITU-T Recommendation G.841, "Types and characteristics of SDH network protection architectures", October 1998. [HeavyReading] Bennett, G., "Resilience Reliability and OAM in Converged Network", Heavy Reading, Vol. 2, No. 6, February 2004. [IEEE802.1ag] IEEE Std 802.1ag-2007, "IEEE Standard for Local and metropolitan area networks, Virtual Bridged Local Area Networks, Amendment 5: Connectivity Fault Management", December 2007. [Y.1731] ITU-T Recommendation Y.1731, "OAM Functions and Mechanisms for Ethernet based Networks", July 2011. Authors' Addresses Peng Fan China Mobile 32 Xuanwumen West Street, Xicheng District Beijing 100053 P.R. China Email: fanpeng@chinamobile.com Lianyuan Li China Mobile 32 Xuanwumen West Street, Xicheng District Beijing 100053 P.R. China Email: lilianyuan@chinamobile.com Fan & Li Expires August 29, 2013 [Page 10]