<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent"> <rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="std" consensus="true" docName="draft-ietf-pim-drlb-15" number="8775" ipr="trust200902" obsoletes="" updates="" submissionType="IETF" xml:lang="en" tocInclude="true" tocDepth="6" symRefs="true" sortRefs="true" version="3"> <!-- xml2rfc v2v3 conversion 2.39.0 --> <!-- ***** FRONT MATTER ***** --> <front> <title abbrev="PIM Designated Router Load Balancing">PIM Designated Router Load Balancing</title> <seriesInfo name="RFC" value="8775"/> <author fullname="Yiqun Cai" initials="Y" surname="Cai"> <organization>Alibaba Group</organization> <address> <postal> <street>520 Almanor Avenue</street> <city>Sunnyvale</city> <region>CA</region> <code>94085</code> <country>United States of America</country> </postal> <email>yiqun.cai@alibaba-inc.com</email> </address> </author> <author initials="H" surname="Ou" fullname="Heidi Ou"> <organization>Alibaba Group</organization> <address> <postal> <street>520 Almanor Avenue</street> <city>Sunnyvale</city> <region>CA</region> <code>94085</code> <country>United States of America</country> </postal> <email>heidi.ou@alibaba-inc.com</email> </address> </author> <author initials="S" surname="Vallepalli" fullname="Sri Vallepalli"> <organization>Cisco Systems, Inc.</organization> <address> <postal> <street>3625 Cisco Way</street> <city>San Jose</city> <code>CA 95134</code> <country>USA</country> </postal> <email>svallepa@cisco.com</email> <email>vallepal@yahoo.com</email> </address> </author> <author initials="M" surname="Mishra" fullname="Mankamana Mishra"> <organization>Cisco Systems, Inc.</organization> <address> <postal> <street>821 Alder Drive</street> <city>Milpitas</city> <region>CA</region> <code>95035</code> <country>United States of America</country> </postal> <email>mankamis@cisco.com</email> </address> </author>
<author initials="S" surname="Venaas" fullname="Stig Venaas"> <organization>Cisco Systems, Inc.</organization> <address> <postal> <street>Tasman Drive</street> <city>San Jose</city> <region>CA</region> <code>95134</code> <country>United States of America</country> </postal> <email>stig@cisco.com</email> </address> </author> <author initials="A" surname="Green" fullname="Andy Green"> <organization>British Telecom</organization> <address> <postal> <street>Adastral Park</street> <city>Ipswich</city> <code>IP5 2RE</code> <country>United Kingdom</country> </postal> <email>andy.da.green@bt.com</email> </address> </author> <date year="2020" month="April"/> <area>Routing</area> <keyword>Multicast</keyword> <abstract> <t>On a multi-access network, one of the PIM-SM (PIM Sparse Mode) routers is elected as a Designated Router. One of the responsibilities of the Designated Router is to track local multicast listeners and forward data to these listeners if the group is operating in PIM-SM. This document specifies a modification to the PIM-SM protocol that allows more than one of the PIM-SM routers to take on this responsibility so that the forwarding load can be distributed among multiple routers. </t> </abstract> </front> <!-- ***** MIDDLE MATTER ***** --> <middle> <section numbered="true" toc="default"> <name>Introduction</name> <t>On a multi-access LAN (such as an Ethernet) with one or more PIM-SM (PIM Sparse Mode) <xref target="RFC7761" format="default"/> routers, one of the PIM-SM routers is elected as a Designated Router (DR). The PIM DR has two responsibilities in the PIM-SM protocol. 
For any active sources on a LAN, the PIM DR is responsible for registering with the Rendezvous Point (RP) if the group is operating in PIM-SM. Also, the PIM DR is responsible for tracking local multicast listeners and forwarding data to these listeners if the group is operating in PIM-SM. </t> <t>Consider the LAN in <xref target="LAN-REC" format="default"/>:</t> <figure anchor="LAN-REC"> <name>LAN with Receivers</name> <artwork name="" type="" align="left" alt=""><![CDATA[
         (core networks)
           |     |     |
           |     |     |
          R1    R2    R3
           |     |     |
        ----(LAN)----
              |
              |
       (many receivers)
]]></artwork> </figure> <t>Assume R1 is elected as the DR. According to the PIM-SM protocol, R1 will be responsible for forwarding traffic to that LAN on behalf of all local members. In addition to keeping track of membership reports, R1 is also responsible for initiating the creation of source and/or shared trees towards the senders or the RPs. The membership reports would be IGMP or Multicast Listener Discovery (MLD) messages. This applies to any versions of the IGMP and MLD protocols. The most recent versions are IGMPv3 <xref target="RFC3376" format="default"/> and MLDv2 <xref target="RFC3810" format="default"/>. </t> <t>Having a single router acting as DR and being responsible for data-plane forwarding leads to several issues. One of the issues is that the aggregated bandwidth will be limited to what R1 can handle with regard to capacity of incoming links, the interface on the LAN, and total forwarding capacity. It is very common that a LAN consists of switches that run IGMP/MLD or PIM snooping <xref target="RFC4541" format="default"/>. This allows the forwarding of multicast packets to be restricted only to segments leading to receivers that have indicated their interest in multicast groups using either IGMP or MLD. 
The emergence of the switched Ethernet allows the aggregated bandwidth to exceed, sometimes by a large margin, that of a single link. For example, let us modify <xref target="LAN-REC" format="default"/> and introduce an Ethernet switch in <xref target="LAN-SWITCH" format="default"/>. </t> <figure anchor="LAN-SWITCH"> <name>LAN with Ethernet Switch</name> <artwork name="" type="" align="left" alt=""><![CDATA[
         (core networks)
          |     |     |
          |     |     |
         R1    R2    R3
          |     |     |
       +=gi1===gi2===gi3=+
       +                 +
       +      switch     +
       +                 +
       +=gi4===gi5===gi6=+
          |     |     |
         H1    H2    H3
]]></artwork> </figure> <t>Let us assume that each individual link is a Gigabit Ethernet. Each router (R1, R2, and R3) and the switch have enough forwarding capacity to handle hundreds of gigabits of data. </t> <t>Let us further assume that each of the hosts requests 500 Mbps of unique multicast data. This totals to 1.5 Gbps of data, which is less than what each switch or the combined uplink bandwidth across the routers can handle, even under failure of a single router. </t> <t> On the other hand, the link between R1 and the switch, via port gi1, can only handle a throughput of 1 Gbps. And if R1 is the only DR (the PIM DR elected using the procedure defined by <xref target="RFC7761" format="default"/>), at least 500 Mbps worth of data will be lost because the only link that can be used to draw the traffic from the routers to the switch is via gi1. In other words, the entire network's throughput is limited by the single connection between the PIM DR and the switch (or LAN, as in <xref target="LAN-REC" format="default"/>). </t> <t>Another important issue is related to failover. If R1 is the only forwarder on a shared LAN, when R1 goes out of service, multicast forwarding for the entire LAN has to be rebuilt by the newly elected PIM DR. 
However, if there were a way that allowed multiple routers to forward to the LAN for different groups, failure of one of the routers would only lead to disruption to a subset of the flows, therefore improving the overall resilience of the network. </t> <t>This document specifies a modification to the PIM-SM protocol that allows more than one of these routers, called Group Designated Routers (GDRs), to be selected so that the forwarding load can be distributed among a number of routers. </t> </section> <section numbered="true" toc="default"> <name>Terminology</name> <t> The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here. </t> <t>With respect to PIM-SM, this document follows the terminology that has been defined in <xref target="RFC7761" format="default"/>. </t> <t> This document also introduces the following new acronyms: </t> <dl newline="false" spacing="normal"> <dt>GDR: Group Designated Router.</dt> <dd>For each multicast flow, either a (*,G) for Any-Source Multicast (ASM) or an (S,G) for Source-Specific Multicast (SSM) <xref target="RFC4607" format="default"/>, a hash algorithm (described below) is used to select one of the routers as a GDR. 
The GDR is responsible for initiating the forwarding tree building process for the corresponding multicast flow.</dd> <dt>GDR Candidate:</dt> <dd>a router that has the potential to become a GDR. There might be multiple GDR Candidates on a LAN, but only one can become the GDR for a specific multicast flow.</dd> </dl> </section> <section numbered="true" toc="default"> <name>Applicability</name> <t>The extension specified in this document applies to PIM-SM routers acting as last-hop routers (that is, routers with directly connected receivers). It does not alter the behavior of a PIM DR or any other routers on the first-hop network (with directly connected sources). This is because the source tree is built using the IP address of the sender, not the IP address of the PIM DR that sends PIM registers towards the RP. The load balancing between first-hop routers can be achieved naturally if an IGP provides equal-cost multiple paths (which it usually does in practice). Also, distributing the load to do source registration does not justify the additional complexity required to support it. </t> </section> <section numbered="true" toc="default"> <name>Functional Overview</name> <t>In the PIM DR election as defined in <xref target="RFC7761" format="default"/>, when multiple routers are connected to a multi-access LAN (for example, an Ethernet), one of them is elected to act as PIM DR. The PIM DR is responsible for sending local Join/Prune messages towards the RP or source. In order to elect the PIM DR, each PIM router on the LAN examines the received PIM Hello messages and compares its own DR priority and IP address with those of its neighbors. The router with the highest DR priority is the PIM DR. 
If there are multiple such routers, their IP addresses are used as the tiebreaker, as described in <xref target="RFC7761" format="default"/>. </t> <t> In order to share forwarding load among last-hop routers, besides the normal PIM DR election, one or more GDRs are elected on the multi-access LAN. There is only one PIM DR on the multi-access LAN, but there might be multiple GDR Candidates. </t> <t>For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a hash algorithm (<xref target="maskalgo" format="default"/>) is used to select one of the routers to be the GDR. The new DR Load-Balancing Capability (DRLB-Cap) PIM Hello Option is used to announce the Capability, as well as the hash algorithm type. Routers with the new DRLB-Cap Option advertised in their PIM Hello, using the same GDR election hash algorithm and the same DR priority as the PIM DR, are considered as GDR Candidates. </t> <t>Hash masks are defined for Source, Group, and RP, separately, in order to handle PIM ASM/SSM. The masks, as well as a sorted list of GDR Candidate addresses, are announced by the DR in a new DR Load-Balancing List (DRLB-List) PIM Hello Option. </t> <t>A hash algorithm based on the announced Source, Group, or RP masks allows one GDR to be assigned to a corresponding multicast state. That GDR is responsible for initiating the creation of the multicast forwarding tree for multicast traffic. </t> <section numbered="true" toc="default"> <name>GDR Candidates</name> <t>GDR is the new concept introduced by this specification. GDR Candidates are routers eligible for GDR election on the LAN. To become a GDR Candidate, a router must have the same DR priority and run the same GDR election hash algorithm as the DR on the LAN. 
</t> <t>For example, assume there are 4 routers on the LAN: R1, R2, R3, and R4, each announcing a DRLB-Cap Option. R1, R2, and R3 have the same DR priority, while R4's DR priority is less preferred. In this example, R4 will not be eligible for GDR election, because R4 will not become a PIM DR unless all of R1, R2, and R3 go out of service. </t> <t>Furthermore, assume that router R1 wins the PIM DR election and that R1 and R2 advertise the same hash algorithm for GDR election, while R3 advertises a different one. In this case, only R1 and R2 will be eligible for GDR election, while R3 will not. </t> <t>As a DR, R1 will include its own Load-Balancing Hash Masks and the identity of R1 and R2 (the GDR Candidates) in its DRLB-List Hello Option. </t> </section> </section> <section numbered="true" toc="default"> <name>Protocol Specification</name> <section anchor="maskalgo" numbered="true" toc="default"> <name>Hash Mask and Hash Algorithm</name> <t>A hash mask is used to extract a number of bits from the corresponding IP address field (32 for IPv4, 128 for IPv6) and calculate a hash value. A hash value is used to select a GDR from GDR Candidates advertised by the PIM DR. Hash masks allow for certain flows to always be forwarded by the same GDR, by ignoring certain bits in the hash value calculation, so that the hash values are the same. For example, 0.0.255.0 defines a hash mask for an IPv4 address that masks out the first, second, and fourth octets, which means that only the third octet will influence the hash value computed. Note that the masks need not be a contiguous set of bits. For example, for IPv4, 15.15.15.15 would be a valid mask. </t> <t> In the text below, a hash mask is, in some places, said to be zero. A hash mask is zero if no bits are set, that is, 0.0.0.0 for IPv4 and :: for IPv6. 
Also, a hash mask is said to be an all-bits-set mask if it is 255.255.255.255 for IPv4 or ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff for IPv6. </t> <t>There are three hash masks defined: </t> <ul spacing="normal"> <li>RP Hash Mask</li> <li>Source Hash Mask</li> <li>Group Hash Mask</li> </ul> <t>The hash masks need to be configured on the PIM routers that can potentially become a PIM DR, unless the implementation provides default hash mask values. An implementation <bcp14>SHOULD</bcp14> have default hash mask values as follows. The default RP Hash Mask <bcp14>SHOULD</bcp14> be zero (no bits set). The default Source and Group Hash Masks <bcp14>SHOULD</bcp14> both be all-bits-set masks. These default values are likely acceptable for most deployments and simplify configuration. There is only a need to use other masks if one needs to ensure that certain flows are forwarded by the same GDR. </t> <t> The DRLB-List Hello Option contains a list of GDR Candidates. The first one listed has ordinal number 0, the second listed has ordinal number 1, and the last one has ordinal number N - 1 if there are N candidates listed. The hash value computed will be the ordinal number of the GDR Candidate that is acting as GDR for the flow in question. 
</t> <t>The input to be hashed is determined as follows:</t> <ul spacing="normal"> <li>If the group is in ASM mode and the RP Hash Mask announced by the PIM DR is not zero (at least one bit is set), calculate the value of hashvalue_RP (<xref target="algorithm" format="default"/>) to determine the GDR.</li> <li>If the group is in ASM mode and the RP Hash Mask announced by the PIM DR is zero (no bits are set), calculate the value of hashvalue_Group (<xref target="algorithm" format="default"/>) to determine the GDR.</li> <li>If the group is in SSM mode, use hashvalue_SG (<xref target="algorithm" format="default"/>) to determine the GDR.</li> </ul> <t> A simple modulo hash algorithm is defined in this document. However, to allow another hash algorithm to be used, a 1-octet "Hash Algorithm" field is included in the DRLB-Cap Hello Option to specify the hash algorithm used by the router. </t> <t>If different hash algorithms are advertised among the routers on a LAN, only the routers advertising the same hash algorithm as the DR (as well as having the same DR priority as the DR) are eligible for GDR election. </t> </section> <section anchor="algorithm" numbered="true" toc="default"> <name>Modulo Hash Algorithm</name> <t> As part of computing the hash, the notation LSZC(hash_mask) is used to denote the number of consecutive zero bits counted from the least significant bit of a hash mask hash_mask. As an example, LSZC(255.255.128.0) is 15 and LSZC(ffff:8000::) is 111. If all bits are set, LSZC will be 0. If the mask is zero, then LSZC will be 32 for IPv4 and 128 for IPv6. </t> <t> The number of GDR Candidates is denoted as GDRC. 
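</t> <t>The LSZC notation defined above can be sketched in Python as follows. This is an illustrative sketch only and not part of the specification: the function name lszc and the use of the standard ipaddress module to parse textual masks are assumptions made for this example.</t>
<sourcecode type="python"><![CDATA[
import ipaddress


def lszc(mask: str) -> int:
    """Count the zero bits from the least significant bit of a hash mask."""
    addr = ipaddress.ip_address(mask)
    width = addr.max_prefixlen        # 32 for IPv4, 128 for IPv6
    value = int(addr)
    if value == 0:
        return width                  # zero mask: all bits count as zeroes
    count = 0
    while (value & 1) == 0:           # stop at the first set bit
        value >>= 1
        count += 1
    return count
]]></sourcecode>
<t>Under these assumptions, lszc("0.0.255.0") evaluates to 8 and lszc("::") evaluates to 128, in line with the definition above. 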
</t> <t> The idea behind the modulo hash algorithm is, in simple terms, that the corresponding mask is applied to a value, then the result is shifted right LSZC(mask) bits so that the least significant bits that were masked out are not considered. Then, this result is masked by 0xffffffff, keeping only the last 32 bits of the result (this only makes a difference for IPv6). Finally, the hash value is this result modulo the number of GDR Candidates (GDRC). </t> <t> The modulo hash algorithm, for computing the values hashvalue_RP, hashvalue_Group, and hashvalue_SG, is defined as follows. </t> <t> hashvalue_RP is calculated as:</t> <artwork><![CDATA[
(((RP_address & RP_mask) >> LSZC(RP_mask)) & 0xffffffff) % GDRC
]]></artwork> <ul empty="true"> <li>RP_address is the address of the RP defined for the group, and RP_mask is the RP Hash Mask.</li> </ul> <t> hashvalue_Group is calculated as:</t> <artwork><![CDATA[
(((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xffffffff) % GDRC
]]></artwork> <ul empty="true"> <li>Group_address is the group address, and Group_mask is the Group Hash Mask.</li> </ul> <t> hashvalue_SG is calculated as:</t> <artwork><![CDATA[
((((Source_address & Source_mask) >> LSZC(Source_mask)) & 0xffffffff)
 ^ (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xffffffff)) % GDRC
]]></artwork> <ul empty="true"> <li>Source_address is the source address, and Source_mask is the Source Hash Mask. Group_address is the group address, and Group_mask is the Group Hash Mask.</li> </ul> <section numbered="true" toc="default"> <name>Modulo Hash Algorithm Examples</name> <t>To help illustrate the algorithm, consider this example. 
Router X with IPv4 address 203.0.113.1 receives a DRLB-List Hello Option from the DR that announces RP Hash Mask 0.0.255.0 and a list of GDR Candidates, sorted by IP addresses from high to low: 203.0.113.3, 203.0.113.2, and 203.0.113.1. The ordinal number assigned to those addresses would be: </t> <t> 0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1 (Router X).</t> <t>Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2 198.51.100.2 for Group2. Following the modulo hash algorithm: </t> <ul spacing="normal"> <li>LSZC(0.0.255.0) is 8, and GDRC is 3. The hashvalue_RP for Group1 with RP RP1 is:</li> </ul> <ul empty="true"> <li> <artwork><![CDATA[
(((192.0.2.1 & 0.0.255.0) >> 8) & 0xffffffff) % 3 = 2 % 3 = 2
]]></artwork> </li> <li>This matches the ordinal number assigned to Router X. Router X will be the GDR for Group1.</li> </ul> <ul spacing="normal"> <li>The hashvalue_RP for Group2 with RP RP2 is:</li> </ul> <ul empty="true"> <li> <artwork><![CDATA[
(((198.51.100.2 & 0.0.255.0) >> 8) & 0xffffffff) % 3 = 100 % 3 = 1
]]></artwork> </li> <li>This is different from the ordinal number of Router X (2). Hence, Router X will not be GDR for Group2.</li> </ul> <t>For IPv6, consider this example, similar to the above. Router X with IPv6 address fe80::1 receives a DRLB-List Hello Option from the DR that announces RP Hash Mask ::ffff:ffff:ffff:0 and a list of GDR Candidates, sorted by IP addresses from high to low: fe80::3, fe80::2, and fe80::1. The ordinal number assigned to those addresses would be: </t> <ul empty="true"> <li>0 for fe80::3; 1 for fe80::2; 2 for fe80::1 (Router X).</li> </ul> <t>Assume there are 2 RPs: RP1 2001:db8::1:0:5678:1 for Group1 and RP2 2001:db8::1:0:1234:2 for Group2. 
Following the modulo hash algorithm: </t> <ul spacing="normal"> <li>LSZC(::ffff:ffff:ffff:0) is 16, and GDRC is 3. The hashvalue_RP for Group1 with RP RP1 is:</li> </ul> <ul empty="true"> <li> <artwork><![CDATA[
(((2001:db8::1:0:5678:1 & ::ffff:ffff:ffff:0) >> 16) & 0xffffffff) % 3
 = ((::1:0:5678:0 >> 16) & 0xffffffff) % 3
 = (::1:0:5678 & 0xffffffff) % 3
 = ::5678 % 3 = 2
]]></artwork> </li> <li>This matches the ordinal number assigned to Router X. Router X will be the GDR for Group1.</li> </ul> <ul spacing="normal"> <li>The hashvalue_RP for Group2 with RP RP2 is:</li> </ul> <ul empty="true"> <li> <artwork><![CDATA[
(((2001:db8::1:0:1234:2 & ::ffff:ffff:ffff:0) >> 16) & 0xffffffff) % 3
 = ((::1:0:1234:0 >> 16) & 0xffffffff) % 3
 = (::1:0:1234 & 0xffffffff) % 3
 = ::1234 % 3 = 1
]]></artwork> </li> <li>This is different from the ordinal number of Router X (2). Hence, Router X will not be GDR for Group2.</li> </ul> </section> <section numbered="true" toc="default"> <name>Limitations</name> <t> The modulo hash algorithm has poor failover characteristics when a shared LAN has more than two GDRs. In the case of more than two GDRs on a LAN, when one GDR fails, the hash value is computed modulo a different GDRC, so any of the groups may be reassigned to a different GDR, even if they were not assigned to the failed GDR. However, many deployments use only two routers on a shared LAN for redundancy purposes. Future work may define new hash algorithms where only groups assigned to the failed GDR get reassigned. </t> <t>The modulo hash algorithm will use, at most, 32 consecutive bits of the input addresses for its computation. Exactly which bits of the source, group, or RP addresses are used depends on the respective masks. 
This limitation may be an issue for IPv6 deployments, since not all bits of the IPv6 addresses are considered. If this causes operational issues, a new hash algorithm would need to be defined. </t> </section> </section> <section numbered="true" toc="default"> <name>PIM Hello Options</name> <t>PIM routers include a new option, called "Load-Balancing Capability (DRLB-Cap)", in their PIM Hello messages. </t> <t>Besides this DRLB-Cap Hello Option, the elected PIM DR also includes a new "DR Load-Balancing List (DRLB-List) Hello Option". The DRLB-List Hello Option consists of three hash masks, as defined above, and also a list of GDR Candidate addresses on the LAN. It is recommended that the GDR Candidate addresses are sorted in descending order. This ensures that, when using algorithms such as the modulo hash algorithm in this document, it is predictable which GDR is responsible for which groups, regardless of the order in which the DR learned about the candidates. 
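</t> <t>The effect of sorting can be illustrated with a short Python sketch that reuses the IPv4 values from the modulo hash algorithm example (RP Hash Mask 0.0.255.0 and candidates 203.0.113.1 through 203.0.113.3). It is a sketch under stated assumptions rather than a reference implementation: the function names lszc, modulo_hash, and gdr_for_rp are hypothetical, and only the RP-based hash is shown.</t>
<sourcecode type="python"><![CDATA[
import ipaddress


def lszc(mask: int, width: int) -> int:
    """Zero bits counted from the least significant bit of the mask."""
    if mask == 0:
        return width
    return (mask & -mask).bit_length() - 1   # index of the lowest set bit


def modulo_hash(address: str, mask: str, gdrc: int) -> int:
    """(((address & mask) >> LSZC(mask)) & 0xffffffff) % GDRC"""
    addr = ipaddress.ip_address(address)
    m = int(ipaddress.ip_address(mask))
    shifted = (int(addr) & m) >> lszc(m, addr.max_prefixlen)
    return (shifted & 0xFFFFFFFF) % gdrc


def gdr_for_rp(learned_candidates, rp_address, rp_mask):
    """Sort candidates high to low, then pick the one whose ordinal
    number equals the hash value (hypothetical helper)."""
    ordered = sorted(learned_candidates,
                     key=ipaddress.ip_address, reverse=True)
    return ordered[modulo_hash(rp_address, rp_mask, len(ordered))]


# The DR may learn about the candidates in any order.
order_a = ["203.0.113.1", "203.0.113.2", "203.0.113.3"]
order_b = ["203.0.113.2", "203.0.113.3", "203.0.113.1"]
gdr_a = gdr_for_rp(order_a, "192.0.2.1", "0.0.255.0")
gdr_b = gdr_for_rp(order_b, "192.0.2.1", "0.0.255.0")
]]></sourcecode>
<t>In this sketch, both learning orders select 203.0.113.1 as the GDR for the group whose RP is 192.0.2.1, which matches the earlier worked example. 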
</t> <section numbered="true" toc="default"> <name>PIM DR Load-Balancing Capability (DRLB-Cap) Hello Option</name> <figure anchor="PIM-CAP"> <name>PIM DR Load-Balancing Capability Hello Option</name> <artwork align="center" name="" type="" alt=""><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Type = 34           |         Length = 4            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Reserved                    |Hash Algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork> </figure> <dl newline="false" spacing="normal"> <dt>Type:</dt> <dd>34</dd> <dt>Length:</dt> <dd>4</dd> <dt>Reserved:</dt> <dd>Transmitted as zero, ignored on receipt.</dd> <dt>Hash Algorithm:</dt> <dd>Hash algorithm type. A value listed in the IANA "PIM Designated Router Load-Balancing Hash Algorithms" registry. 0 is used for the modulo hash algorithm defined in this document.</dd> </dl> <t>This DRLB-Cap Hello Option <bcp14>MUST</bcp14> be advertised by routers on all interfaces where DR Load Balancing is enabled. Note that the option is included, at most, once. 
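</t> <t>As an illustration of the layout above, the following Python sketch encodes and decodes the DRLB-Cap Hello Option with the standard struct module. The function and constant names are assumptions made for this example; the field sizes are read from the figure (16-bit Type, 16-bit Length, 24 reserved bits, and an 8-bit Hash Algorithm).</t>
<sourcecode type="python"><![CDATA[
import struct

DRLB_CAP_TYPE = 34       # option Type from the figure
DRLB_CAP_LENGTH = 4      # length of the option value in bytes
HASH_ALG_MODULO = 0      # modulo hash algorithm defined in this document

# Network byte order: Type, Length, 3 reserved pad bytes, Hash Algorithm.
_DRLB_CAP_FORMAT = "!HH3xB"


def encode_drlb_cap(hash_algorithm: int = HASH_ALG_MODULO) -> bytes:
    """Build the option; the pad bytes are written as zero by struct."""
    return struct.pack(_DRLB_CAP_FORMAT,
                       DRLB_CAP_TYPE, DRLB_CAP_LENGTH, hash_algorithm)


def decode_drlb_cap(option: bytes) -> int:
    """Return the Hash Algorithm; the Reserved bits are skipped (ignored)."""
    opt_type, length, hash_algorithm = struct.unpack(_DRLB_CAP_FORMAT,
                                                     option)
    if opt_type != DRLB_CAP_TYPE or length != DRLB_CAP_LENGTH:
        raise ValueError("not a DRLB-Cap Hello Option")
    return hash_algorithm
]]></sourcecode>
<t>In this sketch, encode_drlb_cap() yields the 8 bytes 00 22 00 04 00 00 00 00, and decode_drlb_cap() accepts non-zero Reserved bits without error, reflecting "ignored on receipt". 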
</t> </section> <section numbered="true" toc="default"> <name>PIM DR Load-Balancing List (DRLB-List) Hello Option</name> <figure anchor="PIM-LIST"> <name>PIM DR Load-Balancing List Hello Option</name> <artwork align="center" name="" type="" alt=""><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Type = 35           |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Group Mask                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Source Mask                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            RP Mask                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   GDR Candidate Address(es)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork> </figure> <dl newline="false" spacing="normal"> <dt>Type:</dt> <dd>35</dd> <dt>Length:</dt> <dd>(3 + n) x (4 or 16) bytes, where n is the number of GDR Candidates.</dd> <dt>Group Mask (32/128 bits):</dt> <dd>Mask applied to group addresses as part of hash computation.</dd> <dt>Source Mask (32/128 bits):</dt> <dd>Mask applied to source addresses as part of hash computation.</dd> <dt>RP Mask (32/128 bits):</dt> <dd>Mask applied to RP addresses as part of hash computation.</dd> </dl> <t>All masks <bcp14>MUST</bcp14> have the same number of bits as the IP source address in the PIM Hello IP header. 
</t> <dl newline="false" spacing="normal"> <dt>GDR Candidate Address(es) (32/128 bits):</dt> <dd><t>List of GDR Candidate(s)</t> <t>All addresses <bcp14>MUST</bcp14> be in the same address family as the PIM Hello IP header. It is recommended that the addresses are sorted in descending order. </t> <t>If the "Interface ID" option, as specified in <xref target="RFC6395" format="default"/>, is present in a GDR Candidate's PIM Hello message and the "Router Identifier" portion is non-zero:</t> <ul spacing="normal"> <li>For IPv4, the "GDR Candidate Address" will be set directly to the "Router Identifier".</li> <li>For IPv6, the "GDR Candidate Address" will be 96 bits of zeroes, followed by the 32-bit Router Identifier.</li> </ul> <t>If the "Interface ID" option is not present in a GDR Candidate's PIM Hello message or if the "Interface ID" option is present but the "Router Identifier" field is zero, the "GDR Candidate Address" will be the IPv4 or IPv6 source address of the PIM Hello message. </t> <t>This DRLB-List Hello Option <bcp14>MUST</bcp14> only be advertised by the elected PIM DR. It <bcp14>MUST</bcp14> be ignored if received from a non-DR. The option <bcp14>MUST</bcp14> also be ignored if the hash masks are not the correct number of bits or GDR Candidate addresses are in the wrong address family. </t> </dd></dl> </section> </section> <section numbered="true" toc="default"> <name>PIM DR Operation</name> <t>The DR election process is still the same as defined in <xref target="RFC7761" format="default"/>. The DR advertises the new DRLB-List Hello Option, which contains mask values from user configuration (or default values), followed by a list of GDR Candidate addresses. 
Note that if a router included the "Interface ID" option in the hello message and the Router ID is non-zero, the Router ID will be used to form the GDR Candidate address of the router, as discussed in the previous section. It is recommended that the list be sorted from the highest value to the lowest value. The reason for sorting the list is to make the behavior deterministic, regardless of the order in which the DR learns of new candidates. Note that, as for non-DR routers, the DR also advertises the DRLB-Cap Hello Option to indicate its ability to support the new functionality and the type of GDR election hash algorithm it uses. </t> <t>If a PIM DR receives a neighbor DRLB-Cap Hello Option that contains the same hash algorithm as the DR and the neighbor has the same DR priority as the DR, the PIM DR <bcp14>SHOULD</bcp14> consider the neighbor as a GDR Candidate and insert the GDR Candidate's address into the list of the DRLB-List Option. However, the DR may have policies limiting which GDR Candidates, or the number of GDR Candidates, to include. Likewise, the DR <bcp14>SHOULD</bcp14> include itself in the list of GDR Candidates, but it is permissible not to do so, for instance, if there is some policy restricting the candidate set. </t> <t>If a PIM neighbor included in the list expires, stops announcing the DRLB-Cap Hello Option, changes DR priority, changes hash algorithm, or otherwise becomes ineligible as a candidate, the DR <bcp14>SHOULD</bcp14> immediately send a triggered hello with a new list in the DRLB-List option, excluding the neighbor. </t> <t>If a new router becomes eligible as a candidate, there is no urgency in sending out an updated list. An updated list <bcp14>SHOULD</bcp14> be included in the next hello. 
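</t> <t>The candidate selection described in this section can be sketched in Python as follows. This is a simplified illustration under several assumptions: the Neighbor class and build_gdr_candidate_list function are hypothetical names, the addresses and the hash algorithm value 1 are made up for the example, no local policy is applied, and the "Interface ID" (Router ID) handling is omitted.</t>
<sourcecode type="python"><![CDATA[
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    address: str                   # source address of the PIM Hello
    dr_priority: int               # DR priority advertised in the Hello
    hash_algorithm: Optional[int]  # DRLB-Cap value, None if not advertised


def build_gdr_candidate_list(dr: Neighbor,
                             neighbors: List[Neighbor]) -> List[str]:
    """Addresses the DR would announce in DRLB-List, highest first."""
    candidates = [dr.address]      # the DR SHOULD include itself
    for n in neighbors:
        if n.hash_algorithm is None:
            continue               # no DRLB-Cap Option advertised
        if n.hash_algorithm != dr.hash_algorithm:
            continue               # different GDR election hash algorithm
        if n.dr_priority != dr.dr_priority:
            continue               # different DR priority
        candidates.append(n.address)
    return sorted(candidates, key=ipaddress.ip_address, reverse=True)


# Hypothetical values mirroring the R1..R4 example in the overview.
r1 = Neighbor("203.0.113.3", dr_priority=1, hash_algorithm=0)  # the DR
r2 = Neighbor("203.0.113.2", dr_priority=1, hash_algorithm=0)
r3 = Neighbor("203.0.113.1", dr_priority=1, hash_algorithm=1)
r4 = Neighbor("203.0.113.4", dr_priority=0, hash_algorithm=0)
announced = build_gdr_candidate_list(r1, [r4, r3, r2])
]]></sourcecode>
<t>With these made-up values, only R1 and R2 end up in the announced list, since R3 uses a different hash algorithm and R4 has a less preferred DR priority. 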
</t> </section> <section numbered="true" toc="default"> <name>PIM GDR Candidate Operation</name> <t>When an IGMP/MLD report is received, a hash algorithm is used by the GDR Candidates to determine which router is going to be responsible for building forwarding trees on behalf of the host. </t> <t>The router <bcp14>MUST</bcp14> include the DRLB-Cap Hello Option in all PIM Hello messages sent on the interface. Note that the presence of the DRLB-Cap Option in the PIM Hello does not guarantee that the router will be considered as a GDR Candidate. Once the DR election is done, the DRLB-List Hello Option is received from the current PIM DR containing a list of the selected GDR Candidates. </t> <t>A router only acts as a GDR Candidate if it is included in the GDR Candidate list of the DRLB-List Hello Option. See the next section for details. </t> </section> <section numbered="true" toc="default"> <name>DRLB-List Hello Option Processing</name> <t> This section discusses processing of the DRLB-List Hello Option, including the case where it was received in the previous hello but not in the current hello. All routers <bcp14>MUST</bcp14> ignore the DRLB-List Hello Option if it is received from a PIM router that is not the DR. The option <bcp14>MUST</bcp14> only be processed by routers that are announcing the DRLB-Cap Option and only if the hash algorithm announced by the DR is the same as the local announcement. All GDR Candidates <bcp14>MUST</bcp14> use the hash masks advertised in the Option, even if they differ from those the candidate was configured with. The DR <bcp14>MUST</bcp14> also process its own DRLB-List Hello Option. </t> <t>A router stores the latest option contents that were announced, if any, and deletes the previous contents. 
The router <bcp14>MUST</bcp14> also compare the new contents with any previous contents and, if there are any changes, continue processing as below. Note that if the option does not pass the above checks, the below processing <bcp14>MUST</bcp14> be done as if the option was not announced. </t> <t> If the contents of the DRLB-List Option, the masks, or the candidate list differ from the previously saved copy, it is received for the first time, or it is no longer being received or accepted, the option <bcp14>MUST</bcp14> be processed as below.</t> <ol spacing="normal" type="1"> <li> <t>If the local router is included in the "GDR Candidate Address(es)" field (the router looks for its own address or, if it announces a non-zero Router ID, for its own Router ID), then for each group, or each source and group pair if the group is in SSM mode, with local receiver interest, the router <bcp14>MUST</bcp14> run the hash algorithm to determine whether it is the GDR for that group or pair. </t> <ul spacing="normal"> <li>If there is no change in the GDR status, then no further action is required.</li> <li>If the router becomes the new GDR, then a multicast forwarding tree <bcp14>MUST</bcp14> be built <xref target="RFC7761" format="default"/>. </li> <li> If the router is no longer the GDR, then it uses an Assert as explained in <xref target="assert" format="default"/>. 
</li> </ul> </li> <li> <t>If one of the following occurs:</t> <ul> <li>the local router is not included in the "GDR Candidate Address(es)" field,</li> <li>the DRLB-List Hello Option is no longer included in the DR's Hello, or</li> <li>the DR's Neighbor Liveness Timer expires <xref target="RFC7761" format="default"/>,</li> </ul> <t> then for each group (or each source and group pair if the group is in SSM mode) with local receiver interest, for which the router is the GDR, the router uses an Assert as explained in <xref target="assert" format="default"/>. </t> </li> </ol> </section> <section anchor="assert" numbered="true" toc="default"> <name>PIM Assert Modification</name> <t>GDR changes may occur due to configuration change, GDR Candidates going down, and also new routers coming up and becoming GDR Candidates. This may occur while flows are being forwarded. If the GDR for an active flow changes, there is likely to be some disruption, such as packet loss or duplicates. By using asserts, packet loss is minimized while allowing a small number of duplicates. </t> <t>When a router stops acting as the GDR for a group, or source and group pair if SSM, it <bcp14>MUST</bcp14> set the Assert metric preference to maximum (0x7fffffff) and the Assert metric to one less than maximum (0xfffffffe). That is, whenever it sends or receives an Assert for the group, it must use these values as the metric preference and metric rather than the values provided by the unicast routing protocol. </t> <t>The rest of this section is just for illustration purposes and not part of the protocol definition. </t> <t>To illustrate the behavior when there is a GDR change, consider the following scenario where there are two flows: G1 and G2. R1 is the GDR for G1, and R2 is the GDR for G2. 
When R3 comes up, it is possible that R3 becomes GDR for both G1 and G2; hence, R3 starts to build the forwarding tree for G1 and G2. If R1 and R2 stop forwarding before R3 completes the process, packet loss might occur. On the other hand, if R1 and R2 continue forwarding while R3 is building the forwarding trees, duplicates might occur. </t> <t>When the role of GDR changes as above, instead of immediately stopping forwarding, R1 and R2 continue forwarding to G1 and G2 respectively, while, at the same time, R3 builds forwarding trees for G1 and G2. This will lead to PIM Asserts. </t> <t>For G1, using the functionality described in this document, R1 and R3 determine the new GDR, which is R3. With the modified Assert behavior, R1 sets its Assert metric to the near-maximum value, as discussed above. That will make R3, which has a normal metric in its Assert, the Assert winner. </t> </section> <section numbered="true" toc="default"> <name>Backward Compatibility</name> <t>In the case of a hybrid Ethernet shared LAN (where some PIM routers support the functionality defined in this document and some do not): </t> <ul spacing="normal"> <li>If the DR does not support the new functionality, then there will be no load balancing. </li> <li>If non-DR routers do not support the new functionality, they will not be considered as GDR Candidates and will not take part in load balancing. 
Load balancing may still happen on the link.</li> </ul> </section> </section> <section numbered="true" toc="default"> <name>Operational Considerations</name> <t> An administrator needs to consider what the total bandwidth requirements are and find a set of routers that together have enough available capacity while making sure that each of the routers can handle its part, assuming that the traffic is distributed roughly equally among the routers. Ideally, one should also have enough bandwidth to handle the case where at least one router fails. All routers should be able to reach the sources, and the RPs if applicable, by a path that is not via the LAN. </t> <t>Care must be taken when choosing what hash masks to configure. One would typically configure the same masks on all the routers so that they are the same, regardless of which router is elected as DR. The default masks are likely suitable for most deployments. The RP Hash Mask must be configured (the default is no bits set) if one wishes to hash based on the RP address rather than the group address for ASM. The default masks will use the entire group addresses, and source addresses if SSM, as part of the hash. An administrator may set other masks that mask out part of the addresses to ensure that certain flows always get hashed to the same router. How this is achieved depends on how the group addresses are allocated. </t> <t> Only the routers announcing the same hash algorithm as the DR would be considered as GDR Candidates. Network administrators need to make sure that the desired set of routers announce the same algorithm. Migration between different algorithms is not considered in this document. 
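</t> <t>The effect of masking out part of the group address can be illustrated as follows. This is a simplified, non-normative sketch for IPv4 ASM groups only. The function gdr_index, the example masks, and the example group addresses are hypothetical. The computation shown (mask the group address, shift out the trailing zero bits of the mask, and take the result modulo the number of GDR Candidates) is an assumed simplification in the spirit of a modulo hash and is not a restatement of the algorithm defined in this document:</t>

<sourcecode type="python"><![CDATA[
```python
from ipaddress import IPv4Address


def gdr_index(group: str, group_mask: int, num_candidates: int) -> int:
    """Map an IPv4 group to an index into the GDR Candidate list.

    Illustrative only: a simplified modulo hash, assumed for this example.
    """
    if group_mask == 0 or num_candidates == 0:
        return 0
    # Position of the lowest set bit, i.e. the number of trailing zeros.
    trailing_zeros = (group_mask & -group_mask).bit_length() - 1
    masked = int(IPv4Address(group)) & group_mask
    return (masked >> trailing_zeros) % num_candidates


# With a full mask, adjacent groups are spread over the candidates.
full = 0xFFFFFFFF
spread = {gdr_index(g, full, 3) for g in ("239.1.1.1", "239.1.1.2")}

# With the low 8 bits masked out, every group in 239.1.1.0/24 maps to
# the same candidate.
coarse = 0xFFFFFF00
pinned = {gdr_index(g, coarse, 3) for g in ("239.1.1.1", "239.1.1.2")}
```
]]></sourcecode>

<t>In this sketch, the set named spread contains two different indexes, while the set named pinned contains a single index. Whether such a mask is useful in practice depends, as noted above, on how the group addresses are allocated.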
</t> </section> <section numbered="true" toc="default"> <name>IANA Considerations</name> <t>IANA has made these assignments in the "PIM-Hello Options" registry: value 34 for the PIM DR Load-Balancing Capability (DRLB-Cap) Hello Option (with Length of 4), and value 35 for the PIM DR Load-Balancing List (DRLB-List) Hello Option (with variable Length). </t> <t> Per this document, IANA has created a registry called "PIM Designated Router Load-Balancing Hash Algorithms" in the "Protocol Independent Multicast (PIM)" branch of the registry tree. The registry lists hash algorithms for use by PIM Designated Router Load Balancing. </t> <section numbered="true" toc="default"> <name>Initial Registry</name> <t> The initial content of the registry is as follows. </t> <table anchor="initial-reg" align="center"> <thead> <tr> <th>Type</th> <th>Name</th> <th>Reference</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>Modulo</td> <td>RFC 8775</td> </tr> <tr> <td>1-255</td> <td>Unassigned</td> <td></td> </tr> </tbody> </table> </section> <section numbered="true" toc="default"> <name>Assignment of New Hash Algorithms</name> <t>Assignment of new hash algorithms is done according to the "IETF Review" procedure; see <xref target="RFC8126" format="default"/>. 
</t> </section> </section> <section numbered="true" toc="default"> <name>Security Considerations</name> <t>Security of the new DR Load-Balancing PIM Hello Options is only guaranteed by the security of PIM Hello messages, so the security considerations for PIM Hello messages, as described in PIM-SM <xref target="RFC7761" format="default"/>, apply here. </t> <t>If the DR is subverted, it could omit or add certain GDRs or announce an unsupported algorithm. If another router is subverted, it could be made DR and cause similar issues. While these issues are specific to this specification, they are not that different from existing attacks, such as subverting a DR and lowering the DR priority, causing a different router to become the DR. </t> <t>If, for any reason, the DR includes a GDR in the announced list that announces a different algorithm from what the DR announces, the GDR is required to ignore the announcement, and there will be no router acting as the DR for the flows that hash to that GDR. </t> <t>If a GDR is subverted, it could potentially be made to stop forwarding all the traffic it is expected to forward. This is similar to what can already happen today if a DR is subverted. 
</t> <t>An administrator may be able to achieve the desired load balancing of known flows, but an attacker may send a single high-rate flow that is served by a single GDR or send multiple flows that are expected to be hashed to the same GDR.</t> </section> </middle> <!-- *****BACK MATTER ***** --> <back> <references> <name>References</name> <references> <name>Normative References</name> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6395.xml"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7761.xml"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8126.xml"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/> </references> <references> <name>Informative References</name> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3376.xml"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3810.xml"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4541.xml"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4607.xml"/> </references> </references> <section numbered="false" toc="default"> <name>Acknowledgements</name> <t> The authors would like to thank <contact fullname="Steve Simlo"/> and <contact fullname="Taki Millonis"/> for helping with the original idea; <contact fullname="Alia Atlas"/>, <contact fullname="Bill Atwood"/>, <contact fullname="Joe Clarke"/>, <contact fullname="Alissa 
Cooper"/>, <contact fullname="Jake Holland"/>, <contact fullname="Bharat Joshi"/>, <contact fullname="Anish Kachinthaya"/>, <contact fullname="Anvitha Kachinthaya"/>, <contact fullname="Benjamin Kaduk"/>, <contact fullname="Mirja Kühlewind"/>, <contact fullname="Barry Leiba"/>, <contact fullname="Ben Niven-Jenkins"/>, <contact fullname="Alvaro Retana"/>, <contact fullname="Adam Roach"/>, <contact fullname="Michael Scharf"/>, <contact fullname="Éric Vyncke"/>, and <contact fullname="Carl Wallace"/> for reviews and comments; and <contact fullname="Toerless Eckert"/> and <contact fullname="Rishabh Parekh"/> for helpful conversation on the document. </t> </section> </back> </rfc>