Internet Research Task Force (IRTF)                            J. Buford
Request for Comments: 7019                           Avaya Labs Research
Category: Experimental                                   M. Kolberg, Ed.
ISSN: 2070-1721                                   University of Stirling
                                                          September 2013


                Application-Layer Multicast Extensions to
                REsource LOcation And Discovery (RELOAD)

Abstract

   We define a REsource LOcation And Discovery (RELOAD) Usage for
   Application-Layer Multicast (ALM) as well as a mapping to the RELOAD
   experimental message type to support ALM.  The ALM Usage is intended
   to support a variety of ALM control algorithms in an
   overlay-independent way.  Two example algorithms are defined, based
   on Scribe and P2PCast.

   This document is a product of the Scalable Adaptive Multicast
   Research Group (SAM RG).

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for examination, experimental implementation, and
   evaluation.

   This document defines an Experimental Protocol for the Internet
   community.  This document is a product of the Internet Research Task
   Force (IRTF).  The IRTF publishes the results of Internet-related
   research and development activities.  These results might not be
   suitable for deployment.  This RFC represents the consensus of the
   Scalable Adaptive Multicast Research Group of the Internet Research
   Task Force (IRTF).  Documents approved for publication by the IRSG
   are not a candidate for any level of Internet Standard; see Section 2
   of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7019.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. Definitions
      2.1. Overlay Network
      2.2. Overlay Multicast
      2.3. Source-Specific Multicast (SSM)
      2.4. Any-Source Multicast (ASM)
      2.5. Peer
   3. Assumptions
      3.1. Overlay
      3.2. Overlay Multicast
      3.3. RELOAD
      3.4. NAT
      3.5. Tree Topology
   4. Architecture Extensions to RELOAD
   5. RELOAD ALM Usage
   6. ALM Tree Control Signaling
   7. ALM Messages Mapped to RELOAD
      7.1. Introduction
      7.2. Tree Lifecycle Messages
           7.2.1. CreateALMTree
           7.2.2. CreateALMTreeResponse
           7.2.3. Join
           7.2.4. JoinAccept (Join Response)
           7.2.5. JoinReject (Join Response)
           7.2.6. JoinConfirm
           7.2.7. JoinConfirmResponse
           7.2.8. JoinDecline
           7.2.9. JoinDeclineResponse
           7.2.10. Leave
           7.2.11. LeaveResponse
           7.2.12. Reform or Optimize Tree
           7.2.13. ReformResponse
           7.2.14. Heartbeat
           7.2.15. Heartbeat Response
           7.2.16. NodeQuery
           7.2.17. NodeQueryResponse
           7.2.18. Push
           7.2.19. PushResponse
   8. Scribe Algorithm
      8.1. Overview
      8.2. Create
      8.3. Join
      8.4. Leave
      8.5. JoinConfirm
      8.6. JoinDecline
      8.7. Multicast
   9. P2PCast Algorithm
      9.1. Overview
      9.2. Message Mapping
      9.3. Create
      9.4. Join
      9.5. Leave
      9.6. JoinConfirm
      9.7. Multicast
   10. Message Format
      10.1. ALMHeader Definition
      10.2. ALMMessageContents Definition
      10.3. Response Codes
   11. Examples
      11.1. Create Tree
      11.2. Join Tree
      11.3. Leave Tree
      11.4. Push Data
   12. Kind Definitions
      12.1. ALMTree Kind Definition
   13. RELOAD Configuration File Extensions
   14. IANA Considerations
      14.1. ALM Algorithm Types
      14.2. Message Code Registration
      14.3. Error Code Registration
   15. Security Considerations
   16. Acknowledgements
   17. References
      17.1. Normative Reference
      17.2. Informative References
   Authors' Addresses

1.  Introduction

   The concept of scalable adaptive multicast includes both scaling
   properties and adaptability properties.
   Scalability is intended to cover:

   o  large group size
   o  large numbers of small groups
   o  rate of group membership change
   o  admission control for QoS
   o  use with network-layer QoS mechanisms
   o  varying degrees of reliability
   o  trees connecting nodes over the global Internet

   Adaptability includes:

   o  use of different control mechanisms for different multicast trees
      depending on initial application parameters or application classes
   o  changing multicast tree structure depending on changes in
      application requirements, network conditions, and membership

   Application-Layer Multicast (ALM) has been demonstrated to be a
   viable multicast technology where native multicast isn't available.
   Many ALM designs have been proposed.  This ALM Usage focuses on:

   o  ALM implemented in RELOAD-based overlays
   o  Support for a variety of ALM control algorithms
   o  Providing a basis for defining a separate hybrid ALM RELOAD Usage

   RELOAD [RELOAD] has an application extension mechanism in which a new
   type of application defines a Usage.  A RELOAD Usage defines a set of
   data types and rules for their use.  In addition, this document
   describes additional message types and a new ALM algorithm plugin
   architectural component.

   This document represents the consensus of the SAM RG.  It was
   repeatedly discussed within the research group, as well as with other
   Application-Layer Multicast experts.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Definitions

   We adopt the terminology defined in Section 3 of [RELOAD],
   specifically the distinction between "node", "peer", and "client".

2.1.  Overlay Network

   Overlay network: An application-layer virtual or logical network with
   addressable end points that provides connectivity, routing, and
   messaging between end points.  Overlay networks are frequently used
   as a substrate for deploying new network services or for providing a
   routing topology not available from the underlying physical network.
   Many peer-to-peer systems are overlay networks that run on top of the
   Internet.  In Figure 1, "P" indicates overlay peers, and peers are
   connected in a logical address space.  The links shown in the figure
   represent predecessor/successor links.  Depending on the overlay
   routing model, additional or different links may be present.

                      P    P    P   P     P
                    ..+....+....+...+.....+...
                   .                          +P
                 P+                            .
                   .                          +P
                    ..+....+....+...+.....+...
                      P    P    P   P     P

                  Figure 1: Overlay Network Example

2.2.  Overlay Multicast

   Overlay Multicast (OM): Hosts participating in a multicast session
   form an overlay network and utilize unicast connections among pairs
   of hosts for data dissemination [BUFORD2009] [KOLBERG2010]
   [BUFORD2008].  The hosts in overlay multicast exclusively handle
   group management, routing, and tree construction, without any support
   from Internet routers.  This is also commonly known as
   Application-Layer Multicast (ALM) or End-System Multicast (ESM).  We
   call systems that use proxies connected in an overlay multicast
   backbone "proxied overlay multicast" or POM.

2.3.  Source-Specific Multicast (SSM)

   SSM tree: The creator of the tree is the source.  It sends data
   messages to the tree root that are forwarded down the tree.

2.4.  Any-Source Multicast (ASM)

   ASM tree: A node sending a data message sends the message to its
   parent and its children.  Each node receiving a data message from one
   edge forwards it to the remaining tree edges to which it is
   connected.

2.5.  Peer

   Peer: An autonomous end system that is connected to the physical
   network and participates in and contributes resources to overlay
   construction, routing, and maintenance.  Some peers may also perform
   additional roles such as connection relays, super nodes, NAT
   traversal assistance, and data storage.

3.  Assumptions

3.1.  Overlay

   Peers connect in a large-scale overlay, which may be used for a
   variety of peer-to-peer applications in addition to multicast
   sessions.  Peers may assume additional roles in the overlay beyond
   participation in the overlay and in multicast trees.  We assume a
   single structured overlay routing algorithm is used.  Any of a
   variety of multi-hop, one-hop, or variable-hop overlay algorithms
   could be used.

   Castro, et al. [CASTRO2003] compared multi-hop overlays and found
   that tree-based construction in a single overlay outperformed using
   separate overlays for each multicast session.  We use a single
   overlay rather than separate overlays per multicast session.

   An overlay multicast algorithm may leverage the overlay's mechanism
   for maintaining overlay state in the face of churn.  For example, a
   peer may store a number of DHT (Distributed Hash Table) entries.
   When the peer gracefully leaves the overlay, it transfers those
   entries to the nearest peer.  When another peer joins that is closer
   to some of the entries than the current peer that holds those
   entries, then those entries are migrated.  Overlay churn affects
   multicast trees as well; remedies include automatic migration of the
   tree state and automatic rejoin operations for dislocated child
   nodes.

3.2.  Overlay Multicast

   The overlay supports concurrent multiple multicast trees.  The limit
   on the number of concurrent trees depends on peer and network
   resources and is not an intrinsic property of the overlay.

3.3.  RELOAD

   We use RELOAD [RELOAD] as the peer-to-peer (P2P) overlay for data
   storage and the mechanism by which the peers interconnect and route
   messages.  RELOAD is a generic P2P overlay, and application support
   is defined by profiles called Usages.

3.4.  NAT

   Some nodes in the overlay may be in a private address space and
   behind firewalls.  We use the RELOAD mechanisms for NAT traversal.
   We permit clients to be leaf nodes in an ALM tree.

3.5.  Tree Topology

   All tree control messages are routed in the overlay.  Two types of
   data or media topologies are envisioned: 1) tree edges are paths in
   the overlay, and 2) tree edges are direct connections between a
   parent and child peer in the tree, formed using the RELOAD AppAttach
   method.

4.  Architecture Extensions to RELOAD

   There are two changes as depicted in Figure 2.  New ALM messages are
   mapped to RELOAD Message Transport using the RELOAD experimental
   message type.  A plugin for ALM algorithms handles the ALM state and
   control.  The ALM algorithm is under control of the application via
   the Group API [COMMON-API].

                           +---------+
                           |Group API|
                           +---------+
                                |
         ------------------- Application ------------------------
             +-------+          |
             | ALM   |          |
             | Usage |          |
             +-------+          |
         -------------- Messaging Service Boundary --------------
                                |
      +--------+      +-----------+---------+     +---------+
      | Storage|<---> | RELOAD    | ALM     |<-->| ALM Alg |
      +--------+      | Message   | Messages|     +---------+
               ^      | Transport |         |
               |      +-----------+---------+
               v         |              |
             +-------------+            |
             | Topology    |            |
             | Plugin      |            |
             +-------------+            |
                    ^                   |
                    v                   v
                 +-------------------+
                 | Forwarding &      |
                 | Link Management   |
                 +-------------------+

         ---------- Overlay Link Service Boundary --------------

                Figure 2: RELOAD Architecture Extensions

   The ALM components interact with RELOAD as follows:

   o  ALM uses the RELOAD data storage functionality to store an ALMTree
      instance when a new ALM tree is created in the overlay and to
      retrieve ALMTree instance(s) for existing ALM trees.

   o  ALM applications and management tools may use the RELOAD data
      storage functionality to store diagnostic information about the
      operation of trees, including average number of trees, delay from
      source to leaf nodes, bandwidth use, and packet loss rate.  In
      addition, diagnostic information may include statistics specific
      to the tree root or to any node in the tree.

5.  RELOAD ALM Usage

   Applications of RELOAD are restricted in the data types that can be
   stored in the DHT.  The profile of accepted data types for an
   application is referred to as a Usage.  RELOAD is designed so that
   new applications can easily define new Usages.  New RELOAD Usages are
   needed for multicast applications since the data types in base RELOAD
   and existing Usages are not sufficient.

   We define an ALM Usage in RELOAD.  This ALM Usage is sufficient for
   applications that require ALM functionality in the overlay.  Figure 2
   shows the internal structure of the ALM Usage.  This contains the
   Group API ([COMMON-API]), an ALM algorithm plugin (e.g., Scribe), and
   the ALM messages that are then sent out to the RELOAD network.

   A RELOAD Usage is required [RELOAD] to define the following:

   o  Kind-ID and code points
   o  data structures for each Kind
   o  access control rules for each Kind
   o  the Resource Name used to hash to the Resource ID that determines
      where the Kind is stored
   o  address restoration after recovery from a network partition (to
      form a single coherent network)
   o  the types of connections that can be initiated using AppConnect

   An ALM group_id is a RELOAD node_id.  The owner of an ALM group
   creates a RELOAD node_id as specified in [RELOAD].  This means that a
   group_id is used as a RELOAD Destination for overlay routing
   purposes.

6.  ALM Tree Control Signaling

   Peers use the overlay to support ALM operations such as:

   o  CreateALMTree
   o  Join
   o  Leave
   o  Reform or optimize tree

   There are a variety of algorithms for peers to form multicast trees
   in the overlay.  The approach presented here permits multiple such
   algorithms to be supported in the overlay since different algorithms
   may be more suitable for certain application requirements; the
   approach also supports experimentation.  Therefore, overlay messaging
   corresponding to the set of overlay multicast operations MUST carry
   algorithm identification information.

   For example, for small groups, the join point might be directly
   assigned by the rendezvous point, while for large trees the Join
   request might be propagated down the tree with candidate parents
   forwarding their position directly to the new node.

   Here is a simplistic notation for forming a multicast tree in the
   overlay.  Its main advantage is the use of the overlay for routing
   both control and data messages.  The group creator does not have to
   be the root of the tree or even in the tree.
   It does not consider per-node load, admission control, or
   alternative paths.  After the creation of a tree, the group_id is
   expected to be advertised or distributed out of band, perhaps by
   publishing in the DHT.  Similarly, joining peers will discover the
   group_id out of band, perhaps by a lookup in the tree.

   As stated earlier, multiple algorithms will coexist in the overlay.

   1.  Peer that initiates multicast group:

          group_id = create();  // Allocate a unique group_id.
                                // The root is the nearest
                                // peer in the overlay.

   2.  Any joining peer:

          joinTree(group_id);  // sends "join group_id" message

       The overlay routes the Join request using the overlay routing
       mechanism toward the peer with the nearest ID to the group_id.
       This peer is the root.  Peers on the path to the root join the
       tree as forwarding points.

   3.  Leave Tree:

          leaveTree(group_id);  // removes this node from the tree

       Propagates a Leave request to each child node and to the parent
       node.  If the parent node is a forwarding node and this is its
       last child, then it propagates a Leave request to its parent.  A
       child node receiving a Leave request from a parent sends a Join
       request to the group_id.

   4.  Message forwarding:

          multicastMsg(group_id, msg);

       For message forwarding, both Any-Source Multicast (ASM) and
       Source-Specific Multicast (SSM) approaches may be used.

7.  ALM Messages Mapped to RELOAD

7.1.  Introduction

   In this document, we define messages for overlay multicast tree
   creation, using an existing protocol (RELOAD) in the P2P-SIP WG
   [RELOAD] for a universal structured peer-to-peer overlay protocol.
   RELOAD provides the mechanism to support a number of overlay
   topologies.  Hence, the overlay multicast framework defined in this
   document can be used with P2P-SIP and makes the Scalable Adaptive
   Multicast (SAM) framework overlay agnostic.

   As discussed in the SAM requirements document [SAM-GENERIC], there
   are a variety of ALM tree formation and tree maintenance algorithms.
   The intent of this specification is to be algorithm agnostic, similar
   to how RELOAD is overlay algorithm agnostic.  We assume that all
   control messages are propagated using overlay routed messages.

   The message types needed for ALM behavior are divided into the
   following categories:

   o  Tree lifecycle (Create, Join, Leave, Reform, Heartbeat)
   o  Peer region and multicast properties

   The message codes are defined in Section 14.2 of this document.
   Messages are mapped to the RELOAD experimental message type.

   In the following sections, the protocol messages as mapped to RELOAD
   are discussed.  Detailed example message flows are provided in
   Section 11.

   In the following descriptions, we use the datatype Dictionary, which
   is a set of opaque values indexed by an opaque key with one value for
   each key.  A single dictionary entry is represented by a
   DictionaryEntry as defined in Section 7.2.3 of the RELOAD document
   [RELOAD].  The Dictionary datatype is defined as follows:

      struct {
        DictionaryEntry elements<0..2^16-1>;
      } Dictionary;

7.2.  Tree Lifecycle Messages

   Peers use the overlay to transmit ALM operations defined in this
   section.

7.2.1.  CreateALMTree

   A new ALM tree is created in the overlay with the identity specified
   by group_id.  The common interpretation in a DHT-based overlay of
   group_id is that the peer with a peer_id closest to and less than the
   group_id is the root of the tree.
   However, other overlay types are supported.  The tree has no
   children at the time it is created.

   The group_id is generated from a well-known session key to be used by
   other peers to address the multicast tree in the overlay.  The
   generation of the group_id from the session_key MUST be done using
   the overlay's ID-generation mechanism.

      struct {
        node_id peer_id;
        opaque session_key<0..2^32-1>;
        node_id group_id;
        Dictionary options;
      } ALMTree;

   peer_id: overlay address of the peer that creates the multicast tree.

   session_key: a well-known string that when hashed using the overlay's
      ID-generation algorithm produces the group_id.

   group_id: overlay address of the root of the tree.

   options: name-value list of properties to be associated with the
      tree, such as the maximum size of the tree, restrictions on peers
      joining the tree, latency constraints, preference for distributed
      or centralized tree formation and maintenance, and Heartbeat
      interval.

   Tree creation is subject to access control since it involves a Store
   operation.  The NODE-MATCH access policy defined in Section 7.3.2 of
   [RELOAD] is used.

   A successful CreateALMTree causes an ALMTree structure to be stored
   in the overlay at the node G responsible for the group_id.  This node
   G performs the RELOAD-defined StoreReq operation as a side effect of
   performing the CreateALMTree.  If the StoreReq fails, the
   CreateALMTree fails too.

   After a successful CreateALMTree, peers can use the RELOAD Fetch
   method to retrieve the ALMTree struct at address group_id.  The
   ALMTree Kind is defined in Section 12.1.

7.2.2.  CreateALMTreeResponse

   After receiving a CreateALMTree message from node S, the peer sends a
   CreateALMTreeResponse to node S.

      struct {
        Dictionary options;
      } CreateALMTreeResponse;

   options: A node may provide algorithm-dependent parameters about the
      created tree to the requesting node.

7.2.3.  Join

   Join causes the distributed algorithm for peer join of a specific ALM
   group to be invoked.  The definition of the Join request is shown
   below.  If successful, the joining peer is notified of one or more
   candidate parent peers in one or more JoinAccept messages.  The
   particular ALM join algorithm is not specified in this protocol.

      struct {
        node_id peer_id;
        node_id group_id;
        Dictionary options;
      } Join;

   peer_id: overlay address of joining/leaving peer

   group_id: overlay address of the root of the tree

   options: name-value list of options proposed by joining peer

   RELOAD is a request-response protocol.  Consequently, the messages
   JoinAccept and JoinReject (defined below) are matching responses for
   Join.  If JoinReject is received, then no further action on this
   request is carried out.  If JoinAccept is received, then either a
   JoinConfirm or a JoinDecline message (see below) is sent.  The
   matching response for JoinConfirm is JoinConfirmResponse.  The
   matching response for JoinDecline is JoinDeclineResponse.

   The following list shows the matching request-responses according to
   the request-response mechanism defined in [RELOAD].

   Join -- JoinAccept: Node C sends a Join request to node P.  If node P
      accepts, it responds with JoinAccept.

   Join -- JoinReject: Node C sends a Join request to node P.  If node P
      does not accept the Join request, it responds with JoinReject.

   JoinConfirm -- JoinConfirmResponse: If node P sent node C a
      JoinAccept and node C confirms with a JoinConfirm request, then
      node P responds with a JoinConfirmResponse.

   JoinDecline -- JoinDeclineResponse: If node P sent node C a
      JoinAccept and node C declines with a JoinDecline request, then
      node P responds with a JoinDeclineResponse.

   Thus, Join, JoinConfirm, and JoinDecline are treated as requests as
   defined in RELOAD, are mapped to the RELOAD exp_a_req message, and
   are therefore retransmitted until either a retry limit is reached or
   a matching response is received.  JoinAccept, JoinReject,
   JoinConfirmResponse, and JoinDeclineResponse are treated as message
   responses as defined above and are mapped to the RELOAD exp_a_ans
   message.

   The Join behavior can be described as follows:

      if (checkAccept(msg)) {
        recvJoins.add(msg.source, msg.group_id)
        SEND(JoinAccept(node_id, msg.source, msg.group_id))
      }

7.2.4.  JoinAccept (Join Response)

   JoinAccept tells the requesting joining peer that the indicated peer
   is available to act as its parent in the ALM tree specified by
   group_id, with the corresponding options specified.  A peer MAY
   receive more than one JoinAccept from different candidate parent
   peers in the group_id tree.  The peer accepts a peer as parent using
   a JoinConfirm message.  A JoinAccept that receives neither a
   JoinConfirm nor JoinDecline message MUST expire.  RELOAD
   implementations are able to read a local configuration file for
   settings.  It is assumed that this file contains the timeout value to
   be used.

      struct {
        node_id parent_peer_id;
        node_id child_peer_id;
        node_id group_id;
        Dictionary options;
      } JoinAccept;

   parent_peer_id: overlay address of a peer that accepts the joining
      peer

   child_peer_id: overlay address of joining peer

   group_id: overlay address of the root of the tree

   options: name-value list of options accepted by parent peer

7.2.5.  JoinReject (Join Response)

   A peer receiving a Join request responds with a JoinReject response
   to indicate the request is rejected.
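
   The following is a minimal, non-normative sketch of how a candidate
   parent might track the JoinAccept offers it has issued and let them
   expire when neither a JoinConfirm nor a JoinDecline arrives.  It is
   written in Python purely for illustration.  The class JoinOffers, its
   method names, the string return values, and the injectable clock are
   all hypothetical and are not part of this specification or of RELOAD;
   the timeout is assumed to come from the local configuration file.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class JoinOffers:
    """Pending JoinAccept offers held by a candidate parent.

    Hypothetical helper: it mirrors the recvJoins set used in the
    pseudocode and adds the expiry required for unanswered JoinAccepts.
    """
    timeout: float                                # seconds; assumed config value
    clock: Callable[[], float] = time.monotonic   # injectable for testing
    pending: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def on_join(self, source: str, group_id: str, accept: bool) -> str:
        """Handle a Join request and name the matching response."""
        if not accept:
            return "JoinReject"
        # Remember the offer together with the time at which it expires.
        self.pending[(source, group_id)] = self.clock() + self.timeout
        return "JoinAccept"

    def expire(self) -> None:
        """Drop every offer whose deadline has passed."""
        now = self.clock()
        stale = [key for key, deadline in self.pending.items() if deadline <= now]
        for key in stale:
            del self.pending[key]

    def resolve(self, source: str, group_id: str) -> bool:
        """Handle JoinConfirm or JoinDecline.

        Returns True only if the corresponding offer was still pending.
        """
        self.expire()
        return self.pending.pop((source, group_id), None) is not None
```

   A caller would invoke on_join() when a Join request arrives and
   resolve() when the matching JoinConfirm or JoinDecline arrives; a
   False result means the offer had already expired or was never made,
   so the message can be ignored.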

7.2.6.  JoinConfirm

   A peer receiving a JoinAccept message that it wishes to accept MUST
   explicitly accept it using a JoinConfirm message before the
   expiration of a timer for the JoinAccept message.  The joining peer
   MUST include only those options from the JoinAccept that it also
   accepts, completing the negotiation of options between the two peers.

      struct {
        node_id child_peer_id;
        node_id parent_peer_id;
        node_id group_id;
        Dictionary options;
      } JoinConfirm;

   child_peer_id: overlay address of joining peer that is a child of the
      parent peer

   parent_peer_id: overlay address of the peer that is the parent of the
      joining peer

   group_id: overlay address of the root of the tree

   options: name-value list of options accepted by both peers

   The JoinConfirm message behavior is described below:

      if (recvJoins.contains(msg.source, msg.group_id)) {
        if (!groups.contains(msg.group_id)) {
          groups.add(msg.group_id)
          SEND(msg, msg.group_id)
        }
        groups[msg.group_id].children.add(msg.source)
        recvJoins.del(msg.source, msg.group_id)
      }

7.2.7.  JoinConfirmResponse

   A peer receiving a JoinConfirm message responds with a
   JoinConfirmResponse message.

7.2.8.  JoinDecline

   A peer receiving a JoinAccept message that it does not wish to accept
   MAY explicitly decline it using a JoinDecline message.

      struct {
        node_id peer_id;
        node_id parent_peer_id;
        node_id group_id;
      } JoinDecline;

   peer_id: overlay address of joining peer that declines the JoinAccept

   parent_peer_id: overlay address of the peer that issued a JoinAccept
      to this peer

   group_id: overlay address of the root of the tree

   The behavior of the JoinDecline message is described as follows:

      if (recvJoins.contains(msg.source, msg.group_id))
        recvJoins.del(msg.source, msg.group_id)

7.2.9.  JoinDeclineResponse

   A peer receiving a JoinDecline message responds with a
   JoinDeclineResponse message.

7.2.10.  Leave

   A peer that is part of an ALM tree identified by group_id and that
   intends to detach from either a child or parent peer SHOULD send a
   Leave request to the peer from which it wishes to detach.  A peer
   receiving a Leave request from a peer that is neither in its parent
   nor child lists SHOULD ignore the message.

      struct {
        node_id peer_id;
        node_id group_id;
        Dictionary options;
      } Leave;

   peer_id: overlay address of leaving peer

   group_id: overlay address of the root of the tree

   options: name-value list of options

   The behavior of the Leave request can be described as:

      groups[msg.group_id].children.remove(msg.source)
      if (groups[msg.group_id].children.size() == 0)
        SEND(msg, groups[msg.group_id].parent)

7.2.11.  LeaveResponse

   A peer receiving a Leave request responds with a LeaveResponse
   message.

7.2.12.  Reform or Optimize Tree

   This triggers a reorganization of either the entire tree or only a
   subtree.  It MAY include hints to specific peers of recommended
   parent or child peers to which to reconnect.  A peer receiving this
   message MAY ignore it, MAY propagate it to other peers in its
   subtree, and MAY invoke local algorithms for selecting preferred
   parent and/or child peers.

      struct {
        node_id group_id;
        node_id peer_id;
        Dictionary options;
      } Reform;

   group_id: overlay address of the root of the tree

   peer_id: if omitted, then the tree is reorganized starting from the
      root; otherwise, it is reorganized only at the subtree identified
      by peer_id.

   options: name-value list of options

7.2.13.  ReformResponse

   A peer receiving a Reform message responds with a ReformResponse.

      struct {
        Dictionary options;
      } ReformResponse;

   options: algorithm-dependent information about the results of the
      Reform operation

7.2.14.  Heartbeat

   A child node signals to its adjacent parent nodes in the tree that it
   is alive.
   If a parent node does not receive a Heartbeat message within N
   Heartbeat time intervals, it MUST treat this as an explicit Leave
   request from the unresponsive peer.  N is configurable.  RELOAD
   implementations are able to read a local configuration file for
   settings.  It is assumed that this file contains the value for N to
   be used.

      struct {
        node_id peer_id_src;
        node_id peer_id_dst;
        node_id group_id;
        Dictionary options;
      } Heartbeat;

   peer_id_src: source of Heartbeat
   peer_id_dst: destination of Heartbeat
   group_id: overlay address of the root of the tree
   options: an algorithm may use the Heartbeat message to provide state
      information to adjacent nodes in the tree

7.2.15.  Heartbeat Response

   A parent node responds with a HeartbeatResponse to a Heartbeat from
   a child node indicating that it has received the Heartbeat message.

7.2.16.  NodeQuery

   The NodeQuery message is used to obtain information about the state
   and performance of the tree on a per-node basis.  A set of nodes
   could be queried to construct a centralized view of the multicast
   trees, similar to a web crawler.

      struct {
        node_id peer_id_src;
        node_id peer_id_dst;
      } NodeQuery;

   peer_id_src: source of query
   peer_id_dst: destination of query

7.2.17.  NodeQueryResponse

   The response to a NodeQuery message contains a NodeStatistics
   instance for this node.
      public struct {
        uint32 node_lifetime;
        uint32 total_number_trees;
        uint16 number_algorithms_supported;
        uint8 algorithms_supported[32];
        TreeData max_tree_data;
        uint16 number_active_trees;
        TreeData tree_data<0..2^8-1>;
        ImplementationInfo impl_info;
      } NodeStatistics;

   node_lifetime: time the node has been alive in seconds since last
      restart
   total_number_trees: total number of trees this node has been part of
      during the node lifetime
   number_algorithms_supported: value between 0..2^16-1 corresponding
      to the number of algorithms supported
   algorithms_supported: list of algorithms, each byte encoded using
      the corresponding algorithm code
   max_tree_data: data about tree with largest number of nodes that
      this node was part of.  NodeQuery can be used to crawl all the
      nodes in an ALM tree to fill this field.  This is intended to
      support monitoring, algorithm design, and general experimentation
      with ALM in RELOAD.
   number_active_trees: current number of trees that the node is part
      of
   tree_data: details of each active tree; the number of such is
      specified by number_active_trees
   impl_info: information about the implementation of this Usage

      public struct {
        uint32 tree_id;
        uint8 algorithm;
        node_id tree_root;
        uint8 number_parents;
        node_id parent<0..2^8-1>;
        uint16 number_child_nodes;
        node_id children<0..2^16-1>;
        uint32 path_length_to_root;
        uint32 path_delay_to_root;
        uint32 path_delay_to_child;
      } TreeData;

   tree_id: the ID of the tree
   algorithm: code identifying the multicast algorithm used by this
      tree
   tree_root: node_id of tree root, or 0 if unknown
   number_parents: 0..2^8-1 indicates number of parent nodes for this
      node
   parent: the RELOAD node_id of each parent node
   number_child_nodes: 0..2^16-1 indicates number of children
   children: the RELOAD node_id of each child node
   path_length_to_root: number of overlay hops to the root of the tree
   path_delay_to_root: RTT in milliseconds to root node
   path_delay_to_child: last measured RTT in milliseconds to child node
      with largest RTT

      public struct {
        uint32 join_confirm_timeout;
        uint32 heartbeat_interval;
        uint32 heartbeat_response_timeout;
        uint16 info_length;
        uint8 info<0..2^16-1>;
      } ImplementationInfo;

   join_confirm_timeout: The default time for JoinConfirm/JoinDecline,
      intended to provide sufficient time for a Join request to receive
      all responses and confirm the best choice.  Default value is 5000
      msec.  An implementation can change this value.
   heartbeat_interval: The default Heartbeat interval is 2000 msec.
      Different interoperating implementations could use different
      intervals.
   heartbeat_response_timeout: The default Heartbeat timeout is 5000
      msec and is the max time between Heartbeat reports from an
      adjacent node in the tree at which point the Heartbeat is missed.
   info_length: length of the info field
   info: implementation-specific information, such as name of
      implementation, build version, and implementation-specific
      features

7.2.18.  Push

   A peer sends arbitrary multicast data to other peers in the tree.
   Nodes in the tree forward this message to adjacent nodes in the tree
   in an algorithm-dependent way.

      struct {
        node_id group_id;
        uint8 priority;
        uint32 length;
        uint8 data<0..2^32-1>;
      } Push;

   group_id: overlay address of root of the ALM tree
   priority: the relative priority of the message; highest priority is
      255.
      A node may ignore this field.
   length: length of the data field in bytes
   data: the data

   In pseudocode, the behavior of Push can be described as:

      foreach (groups[msg.group_id].children as node_id)
         SEND(msg, node_id)
      if memberOf(msg.group_id)
         invokeMessageHandler(msg.group_id, msg)

7.2.19.  PushResponse

   After receiving a Push message from node S, the receiving peer sends
   a PushResponse to node S.

      struct {
        Dictionary options;
      } PushResponse;

   options: A node may provide feedback to the sender about previous
      Push messages in some window, for example, the last N Push
      messages.  The feedback could include, for each Push message
      received, the number of adjacent nodes that were forwarded the
      Push message and the number of adjacent nodes from which a
      PushResponse was received.

8.  Scribe Algorithm

8.1.  Overview

   Figure 3 shows a mapping between RELOAD ALM messages (as defined in
   Section 5 of this document) and Scribe messages as defined in
   [CASTRO2002].
   +---------+-------------------+-----------------+
   | Section |RELOAD ALM Message | Scribe Message  |
   +---------+-------------------+-----------------+
   | 7.2.1   | CreateALMTree     | Create          |
   +---------+-------------------+-----------------+
   | 7.2.3   | Join              | Join            |
   +---------+-------------------+-----------------+
   | 7.2.4   | JoinAccept        |                 |
   +---------+-------------------+-----------------+
   | 7.2.6   | JoinConfirm       |                 |
   +---------+-------------------+-----------------+
   | 7.2.8   | JoinDecline       |                 |
   +---------+-------------------+-----------------+
   | 7.2.10  | Leave             | Leave           |
   +---------+-------------------+-----------------+
   | 7.2.12  | Reform            |                 |
   +---------+-------------------+-----------------+
   | 7.2.14  | Heartbeat         |                 |
   +---------+-------------------+-----------------+
   | 7.2.16  | NodeQuery         |                 |
   +---------+-------------------+-----------------+
   | 7.2.18  | Push              | Multicast       |
   +---------+-------------------+-----------------+
   |         | Note 1            | deliver         |
   +---------+-------------------+-----------------+
   |         | Note 1            | forward         |
   +---------+-------------------+-----------------+
   |         | Note 1            | route           |
   +---------+-------------------+-----------------+
   |         | Note 1            | send            |
   +---------+-------------------+-----------------+

                Figure 3: Mapping to Scribe Messages

   Note 1: These Scribe messages are handled by RELOAD messages.

   The following sections describe the Scribe algorithm in more detail.

8.2.  Create

   This message will create a group with group_id.  This message MUST
   be delivered to the node whose node_id is closest to the group_id.
   This node becomes the rendezvous point and root for the new
   multicast tree.  Groups MAY have multiple sources of multicast
   messages.

8.3.  Join

   To join a multicast tree, a node SHOULD send a Join request with the
   group_id as the key.  This message gets routed by the overlay to the
   rendezvous point of the tree.
   If an intermediate node is already a forwarder for this tree, it
   SHOULD add the joining node as a child.  Otherwise, the node SHOULD
   create a child table for the group and add the joining node.  It
   SHOULD then send the Join request towards the rendezvous point
   terminating the Join request from the child.

   To adapt the Scribe algorithm to the ALM Usage proposed here, after
   a Join request is accepted, a JoinAccept message MUST be returned to
   the joining node.

8.4.  Leave

   When leaving a multicast group, a node SHOULD change its local state
   to indicate that it left the group.  If the node has no children in
   its table, it MUST send a Leave request to its parent, from where it
   SHOULD travel up the multicast tree and stop at a node that still
   has children remaining after removing the leaving node.

8.5.  JoinConfirm

   This message is not part of the Scribe protocol but is required by
   the basic protocol proposed in this document.  Thus, the Usage MUST
   send this message to confirm a joining node accepting its parent
   node.

8.6.  JoinDecline

   Like JoinConfirm, this message is not part of the Scribe protocol.
   Thus, the Usage MUST send this message if a peer receiving a
   JoinAccept message wishes to decline it.

8.7.  Multicast

   A message to be multicast to a group MUST be sent to the rendezvous
   node from where it is forwarded down the tree.  If a node is a
   member of the tree rather than just a forwarder, it SHOULD pass the
   multicast data up to the application.

9.  P2PCast Algorithm

9.1.  Overview

   P2PCast [P2PCAST] creates a forest of related trees to increase load
   balancing.  P2PCast is independent of the underlying P2P substrate.
   Its goals and approach are similar to SplitStream [SPLITSTREAM]
   (which assumes Pastry as the P2P overlay).  In P2PCast, the content
   provider splits the stream of data into f stripes.
   Each tree in the forest of multicast trees is an (almost) full tree
   of arity f.  These trees are conceptually separate: every node of
   the system appears once in each tree, with the content provider
   being the source in all of them.  To ensure that each peer
   contributes as much bandwidth as it receives, every node is a leaf
   in all the trees except for one, in which the node will serve as an
   internal node (proper tree of this node).  To reduce the complexity
   of the discussion that follows, the remainder of this section will
   assume that f = 2.  However, the algorithm scales for any number f.

   P2PCast distinguishes the following types of nodes:

   o  Incomplete Node: A node with fewer than f children in its proper
      stripe

   o  Only-Child Node: A node whose parent (in any multicast tree) is
      an incomplete node

   o  Complete Node: A node with exactly f children in its proper
      stripe

   o  Special Node: A single node that is a leaf in all multicast trees
      of the forest

9.2.  Message Mapping

   Figure 4 shows a mapping between RELOAD ALM messages (as defined in
   Section 5 of this document) and P2PCast messages as defined in
   [P2PCAST].
   +---------+-------------------+-----------------+
   | Section |RELOAD ALM Message | P2PCast Message |
   +---------+-------------------+-----------------+
   | 7.2.1   | CreateALMTree     | Create          |
   +---------+-------------------+-----------------+
   | 7.2.3   | Join              | Join            |
   +---------+-------------------+-----------------+
   | 7.2.4   | JoinAccept        |                 |
   +---------+-------------------+-----------------+
   | 7.2.6   | JoinConfirm       |                 |
   +---------+-------------------+-----------------+
   | 7.2.8   | JoinDecline       |                 |
   +---------+-------------------+-----------------+
   | 7.2.10  | Leave             | Leave           |
   +---------+-------------------+-----------------+
   | 7.2.12  | Reform            | Takeon          |
   |         |                   | Substitute      |
   |         |                   | Search          |
   |         |                   | Replace         |
   |         |                   | Direct          |
   |         |                   | Update          |
   +---------+-------------------+-----------------+
   | 7.2.14  | Heartbeat         |                 |
   +---------+-------------------+-----------------+
   | 7.2.16  | NodeQuery         |                 |
   +---------+-------------------+-----------------+
   | 7.2.18  | Push              | Multicast       |
   +---------+-------------------+-----------------+

                Figure 4: Mapping to P2PCast Messages

   The following sections describe the mapping of the P2PCast messages
   in more detail.

9.3.  Create

   This message will create a group with group_id.  This message MUST
   be delivered to the node whose node_id is closest to the group_id.
   This node becomes the rendezvous point and root for the new
   multicast tree.  The rendezvous point will maintain f subtrees.

9.4.  Join

   To join a multicast tree, a joining node N MUST send a Join request
   to a random node A already part of the tree.  Depending on the type
   of A, the joining algorithm continues as follows:

   o  Incomplete Node: Node A will arbitrarily select for which tree it
      wants to serve as an internal node and adopt N in that tree.  In
      the other tree, node N will adopt node A as a child (taking node
      A's place in the tree), thus becoming an internal node in the
      stripe that node A didn't choose.
   o  Only-Child Node: As this node has a parent that is an incomplete
      node, the joining node will be redirected to the parent node and
      will handle the request as detailed above.

   o  Complete Node: The contacted node A must be a leaf in the other
      tree.  If node A is a leaf node in Stripe 1, node N will become
      an internal node in Stripe 1, taking the place of node A and
      adopting it at the same time.  To find a place for itself in the
      other stripe, node N starts a random walk down the subtree rooted
      at the sibling of node A (if node A is the root and thus does not
      have siblings, node N is sent directly to a leaf in that tree),
      which ends as soon as node N finds an incomplete node or a leaf.
      In this case, node N is adopted by the incomplete node.

   o  Special Node: As this node is a leaf in all subtrees, the joining
      node MAY adopt the node in one tree and become a child in the
      other.

   P2PCast uses defined messages for communication between nodes during
   reorganization.  To use P2PCast in this context, these messages are
   encapsulated by the message type Reform.  In doing so, the P2PCast
   message is to be included in the options parameter of Reform.  The
   following reorganization messages are defined by P2PCast:

   Takeon: To take another peer as a child

   Substitute: To take the place of a child of some peer

   Search: To obtain the child of a node in a particular stripe

   Replace: Different from Substitute in that the calling node that
      makes a node its child sheds off a random child

   Direct: To direct a node to its would-be parent

   Update: A node sends its updated state to its children

   To adapt the P2PCast algorithm to the ALM Usage proposed here, after
   a Join request is accepted, a JoinAccept message MUST be returned to
   the joining node (one for every subtree).

9.5.  Leave

   When leaving a multicast group, a node will change its local state
   to indicate that it left the group.  Disregarding the case where the
   leaving node is the root of the tree, the leaving node must be
   complete or incomplete in its proper tree.  In the other trees, the
   node is a leaf and can just disappear by notifying its parent.  For
   the proper tree, if the node is incomplete, it is replaced by its
   child.  However, if the node is complete, a gap is created that is
   filled by a random child.  If this child is incomplete, it can
   simply fill the gap.  However, if it is complete, it needs to shed a
   random child.  This child is directed to its sibling, which sheds a
   random child.  This process ripples down the tree until the
   next-to-last level is reached.  The shed node is then taken as a
   child by the parent of the deleted node in the other stripe.

   Again, for the reorganization of the tree, the Reform message type
   is used as defined in the previous section.

9.6.  JoinConfirm

   This message is not part of the P2PCast protocol but is required by
   the basic protocol defined in this document.  Thus, the Usage MUST
   send this message to confirm a joining node accepting its parent
   node.  As with Join and JoinAccept, this MUST be carried out for
   every subtree.

9.7.  Multicast

   A message to be multicast to a group MUST be sent to the rendezvous
   node from where it is forwarded down the tree by being split into k
   stripes.  Each stripe is then sent via a subtree.  If a receiving
   node is a member of the tree rather than just a forwarder, it MAY
   pass the multicast data up to the application.

10.  Message Format

   All messages are mapped to the RELOAD experimental message type.
   The mapping is shown in Figure 5.  The message codes are listed in
   Section 14.2.  The format of the body of a message is provided in
   [RELOAD].
   +-------------------------+------------------+
   | Message                 |RELOAD Code Point |
   +-------------------------+------------------+
   | CreateALMTree           | exp_a_req        |
   +-------------------------+------------------+
   | CreateALMTreeResponse   | exp_a_ans        |
   +-------------------------+------------------+
   | Join                    | exp_a_req        |
   +-------------------------+------------------+
   | JoinAccept              | exp_a_ans        |
   +-------------------------+------------------+
   | JoinReject              | exp_a_ans        |
   +-------------------------+------------------+
   | JoinConfirm             | exp_a_req        |
   +-------------------------+------------------+
   | JoinConfirmResponse     | exp_a_ans        |
   +-------------------------+------------------+
   | JoinDecline             | exp_a_req        |
   +-------------------------+------------------+
   | JoinDeclineResponse     | exp_a_ans        |
   +-------------------------+------------------+
   | Leave                   | exp_a_req        |
   +-------------------------+------------------+
   | LeaveResponse           | exp_a_ans        |
   +-------------------------+------------------+
   | Reform                  | exp_a_req        |
   +-------------------------+------------------+
   | ReformResponse          | exp_a_ans        |
   +-------------------------+------------------+
   | Heartbeat               | exp_a_req        |
   +-------------------------+------------------+
   | HeartbeatResponse       | exp_a_ans        |
   +-------------------------+------------------+
   | NodeQuery               | exp_a_req        |
   +-------------------------+------------------+
   | NodeQueryResponse       | exp_a_ans        |
   +-------------------------+------------------+
   | Push                    | exp_a_req        |
   +-------------------------+------------------+
   | PushResponse            | exp_a_ans        |
   +-------------------------+------------------+

               Figure 5: RELOAD Message Code Mapping

   For Data Kind-IDs, the RELOAD specification [RELOAD] states: "Code
   points in the range 0xF0000001 to 0xFFFFFFFE are reserved for
   private use".  ALM Usage Kind-IDs are defined in the private use
   range.

   All ALM Usage messages map to the RELOAD Message Extension
   mechanism.
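   The mapping in Figure 5 can be sketched as a simple lookup.  This
   is an illustrative sketch only: the function name reload_code_point
   and the two tuples are hypothetical, while the message names and
   the exp_a_req/exp_a_ans assignment are taken from Figure 5.

```python
# ALM messages treated as RELOAD requests (mapped to exp_a_req).
ALM_REQUESTS = (
    "CreateALMTree", "Join", "JoinConfirm", "JoinDecline", "Leave",
    "Reform", "Heartbeat", "NodeQuery", "Push",
)

# ALM messages treated as RELOAD responses (mapped to exp_a_ans).
ALM_RESPONSES = (
    "CreateALMTreeResponse", "JoinAccept", "JoinReject",
    "JoinConfirmResponse", "JoinDeclineResponse", "LeaveResponse",
    "ReformResponse", "HeartbeatResponse", "NodeQueryResponse",
    "PushResponse",
)


def reload_code_point(alm_message: str) -> str:
    """Return the RELOAD experimental code point for an ALM message name."""
    if alm_message in ALM_REQUESTS:
        return "exp_a_req"
    if alm_message in ALM_RESPONSES:
        return "exp_a_ans"
    raise ValueError(f"not an ALM message: {alm_message!r}")
```

   Note that JoinAccept and JoinReject are both responses to Join, so
   the request/response split does not follow the "Response" suffix
   alone.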
   Code points for the Kinds defined in this document MUST NOT conflict
   with any defined code points for RELOAD.  RELOAD defines exp_a_req
   and exp_a_ans for experimental purposes.  This specification uses
   only these message types for all ALM messages.  RELOAD defines the
   MessageContents data structure.  The ALM mapping uses the fields as
   follows:

   o  message_code: exp_a_req for requests and exp_a_ans for responses

   o  message_body: contains one instance of ALMHeader followed by one
      instance of ALMMessageContents

   o  extensions: unused

10.1.  ALMHeader Definition

      struct {
        uint32 sam_token;
        uint16 alm_algorithm_id;
        uint8 version;
      } ALMHeader;

   The fields in ALMHeader are used as follows:

   sam_token: The first four bytes identify this message as an ALM
      message.  This field MUST contain the value 0xD3414D42 (the
      string "SAMB" with the high bit of the first byte set).

   alm_algorithm_id: The ALM Algorithm ID of the ALM algorithm being
      used.  Each multicast tree uses only one algorithm.  Trees with
      different ALM algorithms can coexist and can share the same
      nodes.  ALM Algorithm ID codes are defined in Section 14.1.

   version: The version of the ALM protocol being used.  This is a
      fixed-point integer between 0.1 and 25.4.  This document
      describes version 1.0 with a value of 0xA.

10.2.  ALMMessageContents Definition

      struct {
        uint16 alm_message_code;
        opaque alm_message_body;
      } ALMMessageContents;

   The fields in ALMMessageContents are used as follows:

   alm_message_code: This indicates the message being sent.  The
      message codes are listed in Section 14.2.

   alm_message_body: The message body itself, represented as a
      variable-length string of bytes.  The bytes themselves are
      dependent on the code value.  See Sections 8 and 9, which
      describe the various ALM methods for the definitions of the
      payload contents.

10.3.  Response Codes

   Response codes are defined in Section 6.3.3.1 of [RELOAD].
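   As an illustration of the ALMHeader layout from Section 10.1, the
   following sketch packs and parses the seven header bytes.  It
   assumes network byte order, as used for RELOAD structures
   generally; the function names pack_alm_header and unpack_alm_header
   are hypothetical and not defined by this document.

```python
import struct

SAM_TOKEN = 0xD3414D42   # "SAMB" with the high bit of the first byte set
ALM_VERSION_1_0 = 0x0A   # fixed-point encoding of version 1.0
_HEADER_FMT = ">IHB"     # uint32 sam_token, uint16 alm_algorithm_id, uint8 version


def pack_alm_header(alm_algorithm_id: int, version: int = ALM_VERSION_1_0) -> bytes:
    """Serialize an ALMHeader (assumed big-endian, no padding)."""
    return struct.pack(_HEADER_FMT, SAM_TOKEN, alm_algorithm_id, version)


def unpack_alm_header(data: bytes) -> tuple:
    """Parse an ALMHeader; return (alm_algorithm_id, version)."""
    token, algorithm_id, version = struct.unpack_from(_HEADER_FMT, data)
    if token != SAM_TOKEN:
        raise ValueError("not an ALM message: bad sam_token")
    return algorithm_id, version
```

   For example, pack_alm_header(0x0001) yields the header for a
   SCRIBE-SAM tree at version 1.0.  How alm_message_body is delimited
   is not sketched here, since the structure above gives no length
   bound for it.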
   This specification maps to RELOAD ErrorResponse as follows:

      ErrorResponse.error_code = Error_Exp_A;
      Error_info contains an ALMErrorResponse instance.

      public struct {
        uint16 alm_error_code;
        opaque alm_error_info<0..2^16-1>;
      } ALMErrorResponse;

   alm_error_code: The following error code values are defined.
      Numeric values for these are defined in Section 14.3.

      Error_Unknown_Algorithm: The multicast algorithm is not known or
         not supported.

      Error_Child_Limit_Reached: The maximum number of child nodes has
         been reached for this node.

      Error_Node_Bandwidth_Reached: The overall data bandwidth limit
         through this node has been reached.

      Error_Node_Conn_Limit_Reached: The total number of connections to
         this node has been reached.

      Error_Link_Cap_Limit_Reached: The capacity of a link has been
         reached.

      Error_Node_Mem_Limit_Reached: An internal memory capacity of the
         node has been reached.

      Error_Node_CPU_Cap_Limit_Reached: An internal processing capacity
         of the node has been reached.

      Error_Path_Limit_Reached: The maximum path length in hop count
         over the multicast tree has been reached.

      Error_Path_Delay_Limit_Reached: The maximum path length in
         message delay over the multicast tree has been reached.

      Error_Tree_Fanout_Limit_Reached: The maximum fanout of a
         multicast tree has been reached.

      Error_Tree_Depth_Limit_Reached: The maximum height of a multicast
         tree has been reached.

      Error_Other: A human-readable description is placed in the
         alm_error_info field.

11.  Examples

   All peers in the examples are assumed to have completed
   bootstrapping.  "Pn" refers to peer N.  "group_id" refers to a peer
   responsible for storing the ALMTree instance with group_id.

11.1.  Create Tree

   A node with "NODE-MATCH" rights sends a CreateALMTree request to the
   group_id node, which also has NODE-MATCH rights for its own address.
   The group_id node determines whether to create the new tree and, if
   so, performs a local StoreReq.  If the CreateALMTree succeeds, the
   ALMTree instance can be retrieved using Fetch.  An example message
   flow for creating a tree is depicted in Figure 6.

      P1      P2      P3      P4   group_id
      |       |       |       |       |
      |       |       |       |       |
      |       |       |       |       |
      | CreateALMTree |       |       |
      |------------------------------>|
      |       |       |       |       |
      |       |       |       |       | StoreReq
      |       |       |       |       |--+
      |       |       |       |       |  |
      |       |       |       |       |  |
      |       |       |       |       |<-+
      |       |       |       |       | StoreResponse
      |       |       |       |       |--+
      |       |       |       |       |  |
      |       |       |       |       |  |
      |       |       |       |       |<-+
      |       |       |       |       |
      |       |       |       |       |
      |       | CreateALMTreeResponse |
      |<------------------------------|
      |       |       |       |       |
      |       |       |       |       |
      | Fetch |       |       |       |
      |------------------------------>|
      |       |       |       |       |
      |       |       |       |       |
      |       | FetchResponse |       |
      |<------------------------------|
      |       |       |       |       |

         Figure 6: Message Flow Example for CreateALMTree

11.2.  Join Tree

   P1 joins node group_id as child node.  P2 joins the tree as a child
   of P1.  P4 joins the tree as a child of P1.  The corresponding
   message flow is shown in Figure 7.

      P1      P2      P3      P4   group_id
      |       |       |       |       |
      |       |       |       |       |
      | Join  |       |       |       |
      |------------------------------>|
      |       |       |       |       |
      | JoinAccept    |       |       |
      |<------------------------------|
      |       |       |       |       |
      |       |       |       |       |
      |       | Join  |       |       |
      |       |---------------------->|
      |       |       |       |       |
      |       |       |       |  Join |
      |<------------------------------|
      |       |       |       |       |
      | JoinAccept    |       |       |
      |------>|       |       |       |
      |       |       |       |       |
      | JoinConfirm   |       |       |
      |<------|       |       |       |
      |       |       |       |       |
      |       |       |       |       |
      |       |       |       | Join  |
      |       |       |       |------>|
      |       |       |       |       |
      |       |       |       |  Join |
      |<------------------------------|
      |       |       |       |       |
      | Join  |       |       |       |
      |------>|       |       |       |
      |       |       |       |       |
      | JoinAccept    |       |       |
      |---------------------->|       |
      |       |       |       |       |
      |       | JoinAccept    |       |
      |       |-------------->|       |
      |       |       |       |       |
      |       |       |       |       |
      |       | JoinConfirm   |       |
      |<----------------------|       |
      |       |       |       |       |
      |       | JoinDecline   |       |
      |       |<--------------|       |
      |       |       |       |       |

           Figure 7: Message Flow Example for Tree Join

11.3.  Leave Tree

      P1      P2      P3      P4   group_id
      |       |       |       |       |
      |       |       |       |       |
      |       |       | Leave |       |
      |<----------------------|       |
      |       |       |       |       |
      | LeaveResponse |       |       |
      |---------------------->|       |
      |       |       |       |       |
      |       |       |       |       |

           Figure 8: Message Flow Example for Leave Tree

11.4.  Push Data

   The multicast data is pushed recursively P1 => group_id => P1 => P2,
   P4 following the tree topology created in the Join example above.
   An example message flow is shown in Figure 9.

      P1      P2      P3      P4   group_id
      |       |       |       |       |
      | Push  |       |       |       |
      |------------------------------>|
      |       |       |       |       |
      |       |       | PushResponse  |
      |<------------------------------|
      |       |       |       |       |
      |       |       |       |  Join |
      |       |       |       |       |
      |       |       |       |  Push |
      |<------------------------------|
      |       |       |       |       |
      | PushResponse  |       |       |
      |------------------------------>|
      |       |       |       |       |
      | Push  |       |       |       |
      |------>|       |       |       |
      |       |       |       |       |
      | PushResponse  |       |       |
      |<------|       |       |       |
      |       |       |       |       |
      | Push  |       |       |       |
      |---------------------->|       |
      |       |       |       |       |
      | PushResponse  |       |       |
      |<----------------------|       |
      |       |       |       |       |

          Figure 9: Message Flow Example for Pushing Data

12.  Kind Definitions

12.1.  ALMTree Kind Definition

   This section defines the ALMTree Kind per Section 7.4.5 of [RELOAD].
   An instance of the ALMTree Kind is stored in the overlay for each
   ALM tree instance.  It is stored at the address group_id.

   Kind-ID: 0xF0000001.  (This is a private-use code point per Section
      14.6 of [RELOAD].)  The Resource Name for the ALMTree Kind-ID is
      the session_key used to identify the ALM tree.

   Data Model: The data model is the ALMTree structure.

   Access Control: NODE-MATCH.  The node performing the store operation
      is required to have NODE-MATCH access.

   Meaning: The meaning of the fields is given in Section 7.2.1.

      struct {
        node_id peer_id;
        opaque session_key<0..2^32-1>;
        node_id group_id;
        Dictionary options;
      } ALMTree;

13.  RELOAD Configuration File Extensions

   There are no ALM parameters defined for the RELOAD configuration
   file.
14.  IANA Considerations

   This section contains the new code points registered by this
   document.

14.1.  ALM Algorithm Types

   IANA has created the "SAM ALM Algorithm IDs" registry.  Entries in
   this registry are 16-bit integers denoting Application-Layer
   Multicast algorithms as described in Section 10.1 of this document.
   Code points in the range 0x0003 to 0x7FFF SHALL be registered via
   RFC 5226 [RFC5226] Expert Review.  Code points in the range 0x8000
   to 0xFFFF are reserved for private use.
   The initial contents of this registry are:

   +----------------+-------------------+-----------+
   | Algorithm Name | ALM Algorithm ID  | RFC       |
   +----------------+-------------------+-----------+
   | INVALID-ALG    | 0x0000            | RFC 7019  |
   | SCRIBE-SAM     | 0x0001            | RFC 7019  |
   | P2PCAST-SAM    | 0x0002            | RFC 7019  |
   | Reserved       | 0x8000-0xFFFF     | RFC 7019  |
   +----------------+-------------------+-----------+

       Figure 10: "SAM ALM Algorithm IDs" Registry Allocations

   These values have been made available for the purposes of
   experimentation.  These values are not meant for vendor-specific use
   of any sort and MUST NOT be used for operational deployments.

14.2.  Message Code Registration

   IANA has created the "SAM ALM Message Codes" registry.  Entries in
   this registry are 16-bit integers denoting message codes as
   described in Section 10.2 of this document.  Code points in the
   range 0x0014 to 0x7FFF SHALL be registered via RFC 5226 [RFC5226]
   Expert Review.  Code points in the range 0x8000 to 0xFFFF are
   reserved for private use.
   The initial contents of this registry are:

   +-------------------------+----------------------+-----------+
   | Message Code Name       | Message Code Value   | RFC       |
   +-------------------------+----------------------+-----------+
   | InvalidMessageCode      | 0x0000               | RFC 7019  |
   | CreateALMTree           | 0x0001               | RFC 7019  |
   | CreateALMTreeResponse   | 0x0002               | RFC 7019  |
   | Join                    | 0x0003               | RFC 7019  |
   | JoinAccept              | 0x0004               | RFC 7019  |
   | JoinReject              | 0x0005               | RFC 7019  |
   | JoinConfirm             | 0x0006               | RFC 7019  |
   | JoinConfirmResponse     | 0x0007               | RFC 7019  |
   | JoinDecline             | 0x0008               | RFC 7019  |
   | JoinDeclineResponse     | 0x0009               | RFC 7019  |
   | Leave                   | 0x000A               | RFC 7019  |
   | LeaveResponse           | 0x000B               | RFC 7019  |
   | Reform                  | 0x000C               | RFC 7019  |
   | ReformResponse          | 0x000D               | RFC 7019  |
   | Heartbeat               | 0x000E               | RFC 7019  |
   | HeartbeatResponse       | 0x000F               | RFC 7019  |
   | NodeQuery               | 0x0010               | RFC 7019  |
   | NodeQueryResponse       | 0x0011               | RFC 7019  |
   | Push                    | 0x0012               | RFC 7019  |
   | PushResponse            | 0x0013               | RFC 7019  |
   | Reserved                | 0x8000-0xFFFF        | RFC 7019  |
   +-------------------------+----------------------+-----------+

       Figure 11: "SAM ALM Message Codes" Registry Allocations

   These values have been made available for the purposes of
   experimentation.  These values are not meant for vendor-specific use
   of any sort and MUST NOT be used for operational deployments.

14.3.  Error Code Registration

   IANA has created the "SAM ALM Error Codes" registry.  Entries in
   this registry are 16-bit integers denoting error codes as described
   in Section 10.3 of this document.  Code points in the range 0x000D
   to 0x7FFF SHALL be registered via RFC 5226 [RFC5226] Expert Review.
   Code points in the range 0x8000 to 0xFFFF are reserved for private
   use.
The initial contents of this registry are:

+----------------------------------+---------------+-----------+
| Error Code Name                  | Code Value    | RFC       |
+----------------------------------+---------------+-----------+
| InvalidErrorCode                 | 0x0000        | RFC 7019  |
| Error_Unknown_Algorithm          | 0x0001        | RFC 7019  |
| Error_Child_Limit_Reached        | 0x0002        | RFC 7019  |
| Error_Node_Bandwidth_Reached     | 0x0003        | RFC 7019  |
| Error_Node_Conn_Limit_Reached    | 0x0004        | RFC 7019  |
| Error_Link_Cap_Limit_Reached     | 0x0005        | RFC 7019  |
| Error_Node_Mem_Limit_Reached     | 0x0006        | RFC 7019  |
| Error_Node_CPU_Cap_Limit_Reached | 0x0007        | RFC 7019  |
| Error_Path_Limit_Reached         | 0x0008        | RFC 7019  |
| Error_Path_Delay_Limit_Reached   | 0x0009        | RFC 7019  |
| Error_Tree_Fanout_Limit_Reached  | 0x000A        | RFC 7019  |
| Error_Tree_Depth_Limit_Reached   | 0x000B        | RFC 7019  |
| Error_Other                      | 0x000C        | RFC 7019  |
| Reserved                         | 0x8000-0xFFFF | RFC 7019  |
+----------------------------------+---------------+-----------+

Figure 12: "SAM ALM Error Codes" Registry Allocations

These values have been made available for the purposes of
experimentation. These values are not meant for vendor-specific use
of any sort and MUST NOT be used for operational deployments.

15. Security Considerations

Overlays are vulnerable to DoS and collusion attacks. We are not
solving overlay security issues. We assume that the node
authentication model as defined in [RELOAD] will be used.
Security issues specific to ALM Usage include the following:

o  The right to create group_id at some node_id

o  The right to store Tree info at some location in the DHT

o  A limit on the number of messages per second and bandwidth use

o  The right to join an ALM tree

16. Acknowledgements

Marc Petit-Huguenin, Michael Welzl, Joerg Ott, and Lars Eggert
provided important comments on earlier versions of this document.

17. References

17.1. Normative Reference

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC 2119, March 1997.

17.2. Informative References

[BUFORD2008]  Buford, J. and H. Yu, "P2P: Overlay Multicast",
   Encyclopedia of Wireless and Mobile Communications, 2008,
   <http://www.tandfonline.com/doi/abs/10.1081/E-EWMC-120043583>.

[BUFORD2009]  Buford, J., Yu, H., and E. Lua, "P2P Networking and
   Applications (Chapter 9)", Morgan Kaufman, 2009,
   <http://www.sciencedirect.com/science/book/9780123742148>.

[CASTRO2002]  Castro, M., Druschel, P., Kermarrec, A., and A.
   Rowstron, "SCRIBE: A large-scale and decentralized
   application-level multicast infrastructure", IEEE Journal on
   Selected Areas in Communications, Vol. 20, No. 8, October 2002,
   <http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1038579>.

[CASTRO2003]  Castro, M., Jones, M., Kermarrec, A., Rowstron, A.,
   Theimer, M., Wang, H., and A. Wolman, "An Evaluation of Scalable
   Application-level Multicast Built Using Peer-to-peer Overlays",
   Proceedings of IEEE INFOCOM 2003, April 2003,
   <http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1208986>.

[COMMON-API]  Waehlisch, M., Schmidt, T., and S. Venaas, "A Common
   API for Transparent Hybrid Multicast", Work in Progress,
   April 2013.

[KOLBERG2010]  Kolberg, M., "Employing Multicast in P2P Overlay
   Networks", Handbook of Peer-to-Peer Networking, 2010,
   <http://link.springer.com/content/pdf/10.1007%2F978-0-387-09751-0_30.pdf>.

[P2PCAST]  Nicolosi, A. and S. Annapureddy, "P2PCast: A Peer-to-Peer
   Multicast Scheme for Streaming Data", Stanford Secure Computer
   Systems Group Report, May 2003,
   <http://www.scs.stanford.edu/~reddy/research/p2pcast/report.pdf>.

[RELOAD]  Jennings, C., Lowekamp, B., Ed., Rescorla, E., Baset, S.,
   and H. Schulzrinne, "REsource LOcation And Discovery (RELOAD)
   Base Protocol", Work in Progress, February 2013.

[RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
   IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

[SAM-GENERIC]  Muramoto, E., Imai, Y., and N. Kawaguchi,
   "Requirements for Scalable Adaptive Multicast Framework in
   Non-GIG Networks", Work in Progress, November 2006.
[SPLITSTREAM]  Castro, M., Druschel, P., Nandi, A., Kermarrec, A.,
   Rowstron, A., and A. Singh, "SplitStream: High-Bandwidth
   Multicast in a Cooperative Environment", SOSP '03, Lake Bolton,
   New York, October 2003,
   <http://dl.acm.org/citation.cfm?id=945474>.

Authors' Addresses

John Buford
Avaya Labs Research
211 Mt. Airy Rd.
Basking Ridge, New Jersey 07920
USA

Phone: +1 908 848 5675
EMail: buford@avaya.com

Mario Kolberg (editor)
University of Stirling
Dept. of Computing Science and Mathematics
Stirling FK9 4LA
UK

Phone: +44 1786 46 7440
EMail: mkolberg@ieee.org
URI: http://www.cs.stir.ac.uk/~mko