RankSlicing: A Decentralized Protocol for Supernode Selection


RankSlicing: A Decentralized Protocol for Supernode Selection

Authors: Giovanni Simoni, Roberto Roverso and Alberto Montresor

 

INTRODUCTION
In this work, we tackle the supernode selection problem, that is the decision of which peers, among all nodes in the overlay, should become supernodes. This effort is motivated by the needs of a commercial peer-assisted live streaming platform called Hive Streaming, which utilizes supernodes to efficiently deliver content to the viewers. Peer-assisted streaming applications strive to provide the same quality of user experience as CDNs in terms of throughput and latency, while keeping the load on the source of the stream to a minimum. There are two main requirements for supernode selection. The first is that the supernodes must be the peers with the highest upload capacity. It has been shown that this can significantly decrease the average number of hops the content has to traverse and therefore also lower latency, significantly improving the quality of user experience. The second requirement is that the size of the supernode set has to be kept – as much as possible – fixed over time and equal to a design parameter K. The value K is a system parameter that depends on the particular application; e.g., in P2P live streaming applications, K can be derived from the number of peers and their upload bandwidth distribution. In such case, the rationale behind limiting the amount of supernodes is to provide bounds on the load and distribution costs of the streaming source, while keeping a good level of the quality of user experience.

In the literature, two types of distributed slicing algorithms can be found: absolute slicing and ordered slicing. However, none of the two approaches address our requirements of finding the best K peers in the overlay network, but rather either choose any K peers among the potential supernodes (absolute slicing) or the best X percentage of nodes among all nodes in the system (ordered slicing).The contribution of this paper is a practical solution to the problem of supernode selection called RANKSLICING. Beside allowing the identification of the best K nodes in the system under realistic deployments characterized by the presence of NATs, RANKSLICING has been designed while keeping the following additional set of informal requirements in mind:

• The stability of the supernode set is of paramount importance. The motivation for providing a supernode selection algorithm, in the first place, is that the application needs to delegate a certain role to the supernodes. Frequent and/or abrupt changes of the supernode set tend to disrupt the application. Therefore, it is important that the supernode set remains stable over time if no node better than any of the existing supernodes joins the network or one of the current supernodes leaves the overlay.

• All nodes should be aware of which peer nodes belong to the supernode set; e.g., in live streaming, this is to easily locate supernodes and request content from them.

• Each node should have an estimate about the stability of its supernode set. This is to avoid executing expensive operations if a supernode set if not stable.

Download white paper

RELATED WORK

Absolute slicing, mentioned in the introduction, constitutes in our opinion the most closely related work to RANKSLICING since it aims at identifying a slice of the overlay network of size K. Absolute slicing achieves supernode selection using a three-layered overlay network. The outermost layer is a random overlay that contains all peers and is built using Newscast. The second overlay is constructed in the same way but only eligible nodes are allowed to participate. Eligibility is determined by a threshold on an application-specific metric. The last layer, called slice, is composed of K peers on average and it is constructed by having peers probabilistically decide if to join the layer according to the number of peers already in the layer and the value of K. The approach is particularly suited for churn as it is based on gossip and it can maintain an average number of nodes in the third layer that is equal to K over time. However, the set of nodes in the slice changes continuously even without churn, which is unreasonable for our requirement of stability of the supernode set. Besides that, absolute slicing does not promote the best K peers in the overlay to the slice layer but rather any K number of the eligible nodes.

Sacha et al. proposes a distributed ranking method called Gradient, similar to Gossip-based aggregation in large dynamic networks. The approach is based on having nodes periodically measure their utility value through a utility metric and gossip about the measured value with their neighbors. Each peer then maintains a set of similar neighbors, with respect to the utility value, along with a set of randomly selected ones. The main application of the distributed ranking in this case is search. Peers can issue queries for identifying which peers have higher or same rank as specified in the query.

This can be used to find supernodes with high utility values, however queries cannot be instrumented to return the top-K peers according to those values. Raychoudhury et al. proposes a strategy for identifying the top K nodes in the context of wireless ad-hoc networks. Each peer locally finds the directly reachable nodes and then all peers proceed with the election of a leader supernode. The leader supernode later in turn selects the remaining K − 1 super-peers. The protocol however does not tackle the problem of continuously adapting the supernode set according to the changes in the overlay network. The only way to address churn is for the application to expressly restart the supernode selection process.

A recent paper from Liu et al. proposes an algorithm for supernode selection based on gossip. Nodes start as regular peers, to later build the set of candidate supernodes by evaluating the utility values of their neighbors. After that, peers make a local decision, based on an application-defined threshold on available resources, if to act as superpeers or not. Although churn is expressly addressed as a problem, the number of selected supernodes in the system cannot be specified by the application in the proposed solution.

Finally, supernode selection can be achieved with structured overlay networks rather than gossip. The SPiDeR framework implements Web discovery services through a structured overlay in which only the most powerful nodes join a structured overlay network based on Chord. In this case, peers become supernodes if their identifiers occupy specific positions in the ring. The same supernode selection technique is used by a distributed filesystem designed by Kovendhan el al. In our case, we concentrate on an unstructured overlay approach given that it provides better resiliency to churn compared to structured overlay approaches unstructured overlays.

PROBLEM DEFINTION
In this paper, we aim at solving the problem of choosing a supernode set that includes the best nodes in the overlay based on a utility function provided by the application and has fixed size K. As in absolute slicing, we limit the number of peers that can become supernodes by setting a threshold on the utility values of peers. Nodes having an utility value greater than the threshold are considered eligible to become supernodes. This is in line with our application of the algorithm, that is as part of a live streaming system, where we set a threshold on, for instance, computation power in order to avoid weak peers to serve as supernodes. We design our approach considering that the view of the supernode set at a peer should adapt as quickly as possible to changes in the overlay, namely churn, such that peers have, at all times, the best view of the top K peers. Due to dynamism in the network and latency however, the view of the supernode set at each peer might slightly differ. We have designed our live streaming application to take into account this limitation. However, the performance of the system as a whole is directly proportional to how correct is the supernode set estimation at peers. We now define our problem in a more formal manner. Let Πt = { p1, . . . , pn } be the set of nodes in the network at any given time t. Each node pi is characterized by a timevarying tuple Ct(pi) ∈ Rm that is our utility value. Ct (pi) contains m numerical values that measure the capabilities of pi, in terms of software and hardware resources, at time t. A node can obtain an up-to-date version of such tuple at any time, by calling the capability function cf(). The eligibility predicate ep : Rm → Boolean returns true if a given set of capabilities is sufficient to become a supernode. A node pi is eligible at time t if ep(Ct (pi)) is true.

Both the values that are reported by the capability function and the definition of the eligibility predicate are application dependent. For example, nodes may be associated with a triple (x1, x2, x3), identifying machines with x1 CPUs, x2 GB of available storage space and x3 KB/s of available bandwidth. We could for instance consider eligible any node having at least 16 GB of available storage and 2 CPUs.

We denote the set of eligible nodes at time t as Et ⊆ Πt . Both the set of nodes and the set of eligible nodes may vary over time: Πt may differ from Πt+1 because of churn, i.e. nodes joining and/or leaving the system at any time. Et, besides being affected by churn, may also change because of variations in the system state: e.g., in the case of storage media, eligibility may change as storage is allocated or deallocated. We do not consider byzantine failures related to neither nodes nor communication: security aspects are assumed to be covered transparently by the framework underneath.

Under the aforementioned assumptions, the desired output at any node pi at time t is the supernode set L&fracti; such that |Lt/i| = min{K,|Et|}. We call such set the supernode set of pi at time t. Ideally, such sets should be characterized by the following properties:

  • Consistency: If no variations occur in the eligible set for a sufficiently long time, the supernode sets of any pair of node must eventually converge to the same set. ∃t0, ∀t ≥ t0 : Et = Et+1 ⇒ ∃t1 ≥ t0, ∀t ≥ t1, ∀p, q ∈ Πt : Lt/p; = Lt/q;
  • Adaptiveness: If no variations occur in the eligible set for a sufficiently long time, each of the supernode sets must eventually be contained in E<supt. In other words, nodes that lose their eligible status must eventually leave the supernodes set. ∃t0, ∀t ≥ t0 : Et = Et+1 ⇒ ∃t1 ≥ t0, ∀t ≥ t1, ∀p ∈ Πt : Lt/p; ⊆ Et

ALGORITHIM

The design of RANKSLICING is inspired by the following observation: given that the eligibility and selection as supernode of a node depends strictly on its capabilities, such values can be used to reduce supernode selection to a ranking problem. So, we first rank eligible nodes based on their utility value and solving the tie-breaks, if any; second, we select the first K items in such ranking to become supernodes.

In our approach, we assume that nodes are already connected through a random overlay topology provided by another service, such as a NAT-resilient peer sampling service. Besides that, in our protocol we strictly avoid creating new connections in order to prevent running into connectivity limitations caused by the presence of NATs. That is because, in gossip algorithms those limitations have been shown to cause significant biasing towards nodes that are easily reachable.

For our implementation, we choose WPSS to provide a NAT-resilient overlay. In WPSS, every peer maintains a slowly changing and fixed set of connections with its neighbors. NATed nodes are connected only to public nodes and public nodes are connected among themselves. Using this type of overlay has a number of advantages. First, the system does not incur in the cost of executing expensive NAT-to-NAT connection establishments. Second, the constructed overlay is random and is not subjected to biasing introduced by the presence of NATs. Finally, WPSS is resilient to churn, that means that failed overlay connections are replaced with new ones such that every peer maintains a constant number of connections to neighbours. Layering our protocol over WPSS allows us to design a protocol that is oblivious to connectivity limitations caused by firewalls and NATs and where churn-related overlay maintenance is greatly simplified.

A. Gossip-based Algorithm

The distributed selection of the supernodes set is achieved through a push-pull gossip protocol. Each node pi maintains a set called view, which is the local approximation of the supernode set, containing up to K node descriptors. Each descriptor (i, lc, age, cap) is associated to the identifier i of the node pi that created it, and contains a monotonically increasing logical clock lc, an age parameter age, and the evaluation of node capabilities cap as reported by pi itself when the descriptor has been created. Two descriptors contained in a view cannot be associated to the same process.

Periodically, each node pi obtains the identifier of a random peer pj from the underlying peer sampling protocol and sends a random sample S of the local view to it, adding a freshly emitted process descriptor for itself pi. Upon reception of the sample S, the node pj merges it together with the local view as follows: 1) A merge list M is built by concatenating S and the view Vj ; 2) Duplicated descriptors (i.e., descriptors referring to the same process) are removed, by keeping the one with the freshest logical clock. 3) The merge list M is sorted in descending order of capabilities, according to a pre-defined total ordering; 4) M gets truncated to the first K elements; 5) The local view of j gets replaced with the set of descriptors in M.

It should be noticed that this is not a peer sampling protocol: the fact that a descriptor of pk is contained in the view of pi does not imply a connected neighbourhood relation between pi and pk. Consequently, each process is allowed to know the identity of the K supernodes, without creating a new topology (as in absolute slicing) and without affecting the existing one.

On every gossip exchange, the duplication removal phase discards superseded descriptors, that is those that are older than the ones which have been received in the last exchange (for the same nodes). As a result of the sorting and truncation phases, all but the best K descriptors get discarded, while the descriptors of the strongest nodes keep being propagated in the system. In the absence of churn, the eligible set becomes static and all the nodes will eventually share the same set of descriptors, thus satisfying the Consistency requirement.

B. Adaptiveness and churn handling

The effects of churn on the overlay structure are assumed to be handled by the underlying topology management service. However, we still need to remove faulty nodes from the supernode set, as they are no longer available for providing their services. Descriptors referring to faulty nodes must be removed from the views of every process. In the same way, we need to address fluctuations of the supernodes set due to evolutions in the node capabilities.

The descriptor field age is used to purge stale pieces of information from views. Intuitively, this field reports the cumulative time spent by a descriptor inside the view of each process. When a fresh descriptor (pi , lc, age, cap) is emitted by process pi, its age field is 0. As the process pj accepts a descriptor into its view, a time-stamp ta is associated to it. When a copy of the descriptor is re-propagated by pj, at time tg, its age gets incremented by δ = tg − ta. Finally, a global parameter of the algorithm named propagation age limit (PAL) defines an age threshold after which descriptors get discarded from the views of the nodes.

The age of a descriptor does not take into account the network delay time. This fact might affect the system performance if the network delay is not negligible with respect to δ, which in turn depends on the gossip period. Intuitively, a small PAL ensures therefore quick reaction to churn. On the other hand, it slows down convergence by reducing the lifetime of descriptors, regardless if they refer to faulty or alive processes.

This behavior can be partially mitigated through a sensible choice among equivalent descriptors during the merging phase. Equivalent descriptors refer to the same node and have the same logical clock, but since the age is monotonically increasing, a descriptor with higher age contains a more up-to-date approximation of the real utility value of the node. Naturally, this consideration holds under the assumption of a reasonable drift between clocks of different nodes.

Changes in the utility value of nodes in the eligible set E can be regarded as a mitigated form of churn: when a node becomes eligible, it starts emitting descriptors, while a node leaving E stops doing that. From the RANKSLICING protocol perspective, these events correspond to nodes appearing and disappearing respectively, although the underlying overlay is not affected.

If a node pi leaves the E set, all the previously emitted descriptors (if any) are no longer replaced by fresh ones. If pi was a supernode, the descriptors stored in the views will be eventually removed, as their age exceed the PAL threshold. Descriptors from the top eligible nodes will take their place during the following merge operations. This behavior matches the Adaptiveness requirement.

C. Algorithm improvements

Once received the first transmission from pi, node pj partially knows the content of the view of pi. The answer sample can give priority to the locally stored descriptors known to be more up to date with respect to the corresponding ones owned by pi . All the remaining empty slots of the answer sample are filled by taking random elements from the view, yet avoiding the items which are known to be owned by the initiator. The gossip session is terminated with the process pi merging the answer as pj did before. This is more effective than doubling the gossip frequency, as it takes advantage of the context provided by the inbound sample.

D. Algorithmic definition
We will refer to pi as the local node (executing the algorithm). The internal state of pi is defined by two dictionaries: view represents the view of the node pi mapping nodes to their descriptors, while tstamps maps each key of view into the timestamp registered when the view entry was created. For this purpose we define a set of methods for accessing a dictionary: Put(), Get(), Del(), KeysOf() and ValuesOf(). With the function EmitDescriptor() we represent the action of creating a fresh descriptor, characterized by an incremental logical clock value and age equal to 0. IsEligible() is a shorthand for ep(Ct (pi)), while the Now() function yields the current time-stamp. Methods Age() and UpdateAge() read and write the age field of a descriptor, respectively, while method ID() reads its identifier.

Each gossip session is characterized by the following steps:
1) The initiator queries the underlying topology management system for a random sample from the local neighborhood;

2) A sample of the view of size size is prepared by calling function PrepareSample(∅,size) (Algorithm 1) The output of the function is sent to the selected neighbor.

3) The neighbor receives a sample which is first passed as parameter to the Merge() function (Algorithm 2), then it generates an answer by passing the received sample S as the actual parameter for the PrepareSample(S,size) function.
4) Finally the initiator receives the answer sample, and runs the Merge() function on it.

Algorithm 1 shows the PrepareSample() function. The first part of the algorithm (lines 1 to 12) manages the emission of a local descriptor: eligibility gets verified and a descriptor is possibly inserted into the outbound sample. The view is maintained consistent accordingly, either updating or removing the local identifier stored in it, if any. The central part (lines 13 to 26) is executed only if a non-empty sample from the initiator is passed as parameter: prioritized elements of the local view, according to the logic described in Subsection IV-A are added to the outbound sample. Finally, the last part of the algorithm (lines 27 to 38) fills the remaining available space in the outbound sample with randomly taken elements. Only descriptors that have not expired are inserted into the sample.

Algorithm 2 shows the Merge() function. The mergeMap dictionary is filled with a fresh local descriptor (lines 4- 5) and with all elements of the view (lines 6-11). All the elements from the inbound sample to be merged are added through the PutFresh() procedure (line 14 and Algorithm 3), which ensures that the most up-to-date descriptor is maintained for each referenced node. The descriptors are sorted with the SortByRanking() function (line 16), which implements a sorting of a set of descriptors based on the defined total ordering. Finally the state of the local node is rebuilt (lines 17-24).

E. Pragmatic Aspects

During the experimentation phase, a behavioral difference between nodes in the open Internet and NATed nodes emerged from the dataset. We noticed that the former group show a much quicker convergence with respect to the latter one.

This phenomenon is associated to how WPSS, that is the peer sampling system we used for both the simulations and the real deployment, constructs the NAT-resilient overlay network. Specifically, in WPSS public nodes have higher number of neighbors with respect to NATed ones. As consequence, a NATed node has fewer possible candidates for gossiping.

This is not a structural problem, as private nodes are still able to converge to the correct supernode set. However, we provide a modification of the algorithm to let private peers converge as quick as the open Internet ones, we call that overriding procedure.

This procedure works in the following manner: once the perceived quality measure of an open Internet node reaches a certain threshold value, reasonably close to 1, an override request containing its view and perceived quality is sent to the subset of neighbors behind a NAT. A node receiving an override request executes a merge operation as for a normal gossip session, although no answer is generated. This strategy introduces a new parameter, called OQT (Overriding Quality Threshold). F. Supernode set estimation quality The view maintained by each node corresponds to an approximation of the supernode set that the application can access locally both on the supernodes and on the other peers. We define a quality measure q of the approximated supernode set with values in the [0, 1] interval. A value of q close to 1 means that the view of a node is equal to the optimal supernode set of size K. We define then the actual quality of the system, which represents a general measure of the quality of the supernode set estimation in the whole system. The actual quality is an average computed by retrieving the view Vi of each node pi, and comparing it with an ideal supernode set L in the following way:

q = 1/N X pi∈Π Vi ∩ L/k

Such computation can be easily achieved when having global knowledge of the system, such as in simulation, by accessing the view of the nodes and comparing it with the optimal set of supernodes L. This is however not feasible in a real deployment. For that reason, we emulate global knowledge by letting peers report their current state to a central server and then we compute the quality there.

We define also another measure of quality of the supernode set that is estimated in a distributed manner: the perceived quality. This measure of quality is made available to the application using our protocol, to determine how good is the approximation of the supernode set in order to avoid executing expensive operations whenever the quality is low.

The estimation of the perceived quality is based on the assumption that the local view of each node increases in accuracy at every gossip session and it is executed each time a node pi obtains a new view V 0 i from merging an incoming sample with its current view Vi . The first step of the calculation by comparing the current view with the view obtained by merging the received sample: qi,0 = |Vi ∩ V 0 i | K

The resulting quality value qi,0 is subject to quick fluctuations, as it changes at every gossip round according to how similar the local view is to the view of the neighbor. A second quality evaluation qi,1 is therefore estimated through an iterative moving average: q (0) i,1 = 0 q (n) i,1 = α · q (n−1) i,1 + (1 − α) · qi,0 α ∈ [0, 1]

The base of the iterative computation q (0) i,1 is zero because initially each view is empty, hence the intersection with the ideal supernodes set would be empty. Intuitively, the perceived quality is an optimistic measure, as the quality values are based on pieces of information which come from the direct neighborhood.

V. EXPERIMENTAL RESULTS

In this section, we delve into the experimental evaluation of RANKSLICING, at first in a simulated environment, then on an actual deployment.

A. Experimental setup

We implemented our protocol in a production-quality development framework used also for our commercial products. That is an event-driven and component-based framework which enables code to be executed both in simulation and in a real deployment. The tool allows to emulate a number ofnetwork characteristics in simulation such as delays, connectivity limitations given by the presence of NATs, and bandwidth allocation dynamics. Every experiment in simulation was run at least 20 times for statistical significance, and in four different churn scenarios:

C00 No churn;

C03 Churn 0.003 (meaning that 0.3% of the nodes leave the system and get replaced within 10s);

C05 Churn 0.005 (0.5% every 10s);

C10 Churn 0.010 (1.0% every 10s);

In real deployment, we deploy our protocol on around 6500 of consumer machines. Those are provided by users of our commercial live streaming application who allowed us to run experiments on their computers. These hosts are located mostly in Sweden (80%) but also in Europe (12%) and in the US (7%). The ratio of open-Internet nodes is 20% while the rest of the hosts are behind NAT.

All experiments were conducted using an agent installed in our volunteers’ machine which can be instrumented remotely to run parameters study. Peer-to-peer communication in this setting was provided by the framework using UDP-based network library which features the same reliability, congestion and flow control as TCP, plus it offers NAT traversal capabilities. As NAT-aware topology management of choice, we use WPSS both in simulation and in the real world scenario.

B. Methodology

We first study the system behavior in simulation, where we strive to identify the best set of values for the aforementioned parameters by recreating similar conditions to the ones of our real deployment. For that, we configure the simulator to emulate the same estimated delay, bandwidth and connectivity success probability observed in our consumer test network.

After that, in deployment, we configure our protocol with the best set of values obtained in simulation, always for the parameters described above, and assert the correct functioning of the protocol.

In all our experiments, both in simulation and deployment, we used an overlay size of N = 1, 000 peers and a gossip session period T of 1 second. Regarding the ranking of peers, we chose to let the utility function assign each node a random value in the [0, 1] range. That value is kept constant for the whole execution of the test. We do not explore variations of the utility value of a peer over time but instead we concentrate on churn given that it produces the same kind of phenomena on our algorithm.

In simulation, we use both perceived and actual quality as main performance metrics for our protocol, as we defined in Subsection IV-F. In order to compute the actual quality, we compare the supernode set of each node with a “perfect” supernode set built from the global view of the system. After that, we average the actual quality values of all nodes for obtaining a single measure of actual quality.

In deployment, we also provide both perceived and actual quality. The former is reported by the peers to a snapshot server over time. The second instead is calculated by having peers send the totality of their supernode set to the snapshot server and then comparing that set with the ideal set of nodes derived from a central ranking of nodes constructed at the snapshot server.

C. Evaluation in simulation

In this section, we evaluate the performance of our protocol in simulation.

1) Sampling size: We start by studying the behavior of our protocol with different values of the sampling size, identified by the parameter H. In the same set of experiments, we assert whether the behavior of the protocol remains consistent for different values of K. The H parameter defines the maximum number of elements transmitted for each sample. Values valid for H are [1..K]. In general, we expect that for values of H that are closer to K to obtain faster convergence since most or all of the supernode set will be transferred between peers. As we will see, that will come at a higher network bandwidth usage. In order to assess the performance of the protocol, we observe the evolution of the absolute quality in 20 different experiments from the following parameter space:

• K ∈ { 50, 10 }

• H/K ∈ { 0.1, 0.2, 0.3, 0.4, 0.5, 1 }

• PAL = 9500ms (chosen arbitrarily)

Besides the actual quality, we monitor the evolution of the perceived quality in presence of churn and for all churn classes mentioned earlier. Table I reports the analysis of the time required to converge to the 90% of the steady state quality. This yields a first estimation of the convergence time. We choose 90% because, according to our experience in P2P live streaming, it is a reasonable performance level for our system to behave correctly.

We can observe that the best performance on both actual and perceived quality is achieved when H/K = 1, namely when the initiator of a gossip session shares all its view. From the analysis we can also see that we are still able to reach convergence in a reasonable amount of time for H/K in the [0.3, 1) range. However, convergence time significantly increases as values of H/K get closer to zero. This can be seen graphically in Figure 1, which shows the reported result for the C03 churn class.

In Table II, we report an analysis of the bandwidth utilized by our algorithm, in bytes per second, in the same set of experiments described above. Considering that the H parameter defines the size of the sample, we obtain a linear decrease of bandwidth utilization when the H/K ratio also decreases. From the table we can also see that bandwidth utilization is also partially influenced by both the churn and the parameter K.

2) Refinement of the perceived quality: As mentioned in Section IV-F, the perceived quality metric yields a less accurate estimation than the actual quality metric. The α parameter represents the smoothing factor for the perceived quality computation. The valid range for it is in the [0, 1) interval: as the value of α is set closer to 1 we obtain a higher hysteresis. By increasing α, we obtain that nodes perceive a slower convergence to the optimal value 1, which corresponds to the actual quality of the system. All peers start in a situation where peers have perceived quality q = 0 then the quality increases until it converges to a stable value, this in case of absence of churn. However, in the presence of churn, high values of α reduces the capacity of the protocol to detect the changes in the supernode set introduced by the churn. In this set of experiments, shown in Figure 2, we strive to find the value of the α parameter which yields the best estimation of the actual quality. For each experiment, we compute the difference between the actual quality and perceived quality measures in time, and integrate the results to observe the speed of the changes: smaller integral corresponds to a smaller difference area between the two measures, hence a minimization of the integral gives the best value for the α parameter.

We set the values of K = 50, H/K = 0.5 which are the best values identified in the previous experiments and PAL = 9500ms, which will see is also the best value for that metric. We ran 20 tests for α ∈ [0, 1), analyzing the behavior in the four churn classes. After a preliminary study we realized that the perceived quality measure works better for values in the [0.95, 0.98] range, while it abruptly worsens as values get closer to 1.

3) Identification of Propagation Age Limit (PAL): The Propagation Age Limit defines the maximum age threshold for descriptors. Intuitively, the value of PAL should be greater than T, otherwise node descriptors would not be propagated at all. Small values translate into a faster elimination of descriptors of potentially failed nodes, but also in a slower convergence time, since good descriptors are also quickly removed from the system. Here we show the results obtained from the analysis of this trade-off. We ran two sets of experiments, the first consisted in running the algorithm with the various churn classes, while in the second set of experiments the scenario scheduled a catastrophic churn effect, with the 20% of the nodes leaving the network simultaneously, after the supernodes set is built.

In both settings, we started from a base configuration of K = 50, H/K = 0.5, α = 0.95, varying the PAL parameter in the [9000, 15000] (milliseconds) range. In the first experiment, we identify how the PAL parameter influences the convergence time. The analysis involved two steps: the isolation of a target quality threshold for each churn class (achieved by measuring the minimum quality value when the system has reached a stable quality) and the aggregation by churn class of the convergence times to the defined threshold (average and standard deviation).

The second set of experiments aims at measuring the effects of the PAL parameter on the recovery from churn effects. The analysis has been achieved in the same way as the first set of experiments, but measuring the time required to recover to a stable quality value after a catastrophic churn event. This is due to the fact that, for those values, descriptors referring to working supernode get removed from the views before they can be refreshed.

D. Evaluation in deployment

The experiments executed on the real deployment involved a 1 000 nodes sample out of around 6 500 available hosts. Note that the network was subject to natural churn, that is the number of nodes was decreasing in time during any given experiment because of users turning off their machines. Besides that, the experiment could not be started on around 8% of the hosts due to firewall limitations and congestion in the network.

For deployment experiments, we set H = K and K = 10. As a consequence, the actual quality can only assume 10 values in the range [0,1]. Based on the simulation results, we select a value for α of 0.95 and a value for the PAL of 12s. The overriding quality threshold, OQT, was set to 0.975. We let all peers join uniformly at random in the first minute of the test, the algorithm is started on the node 30 seconds after the join such to allow time for the underlying topology, provided by WPSS, to be set up. Figure 5 shows the evolution of the average perceived quality in deployment. The same experiment is reported in Figure 6 from the point of view of quality distribution among nodes over time. In the Figure, the upper line represents the total number of peers. The colored areas beneath that line indicate how many of those peers experience a specific value of actual capacity. As we can see, almost all nodes achieve an actual quality value close or above 90% after 20 seconds that all peers have joined. We record also the time required to reach the steady state of the perceived quality values of 90% and 97.5%. The measure was taken for each node, relative to the time that it first reported to the snapshot server. As we can see, thanks to the variant of the algorithm described in Section IV-E, both set of nodes reach the same quality values over time.

VI. CONCLUSIONS

In this paper, we proposed RANKSLICING, a decentralized algorithm for supernode selection that allows to identify the best K nodes in the overlay. Our approach consists of an epidemic protocol that is highly resilient to churn and does not require new connections to be established, but rather relies on established connections, such as the ones provided by a NATresilient peer sampling framework. Thorough experimental analysis both in simulation and on a deployment of thousands of consumer machines shows that the solution is practical and meets the requirements of consistency and stability imposed by our use-case, that is a P2P live streaming application.

As future work, we will be incorporating RANKSLICING into our application and study its behavior together with the rest of the system.

VII. ACKNOWLEDGEMENTS

This work was partially supported by the European Union, specifically by the FP7 EU project iSocial (FP7-ITN-316808).