Snow: Self-organizing Broadcast Protocol for Cloud
Chengkai Tong
TL;DR
Snow introduces a self-organizing, application-level broadcast protocol for cloud data centers that avoids IP multicast and fixed-root trees. It builds a dynamic, multi-way balanced topology from a ring-ordered membership view, enabling $O(\log n)$ dissemination hops. It integrates reliable messages for critical deliveries and membership maintenance, plus an optional node-coloring optimization to boost convergence and mitigate stragglers. Experimental results show higher reliability and lower overhead under churn compared with Gossip and Plumtree, with an open-source implementation provided for further research.
Abstract
In large-scale distributed applications, efficient and reliable broadcast protocols are essential for communication between nodes. Gossip-based broadcast incurs high bandwidth overhead and provides only probabilistic delivery guarantees. Tree-based broadcast lacks flexibility and may suffer performance degradation or even broadcast failure when cluster membership changes: when an internal node leaves, its child nodes must reconnect to a new parent, a process that may introduce instability, message duplication, and increased transmission latency. Because node departures and arrivals are common in cloud environments, tree-based broadcast degrades frequently there. This paper introduces Snow, a self-organizing broadcast protocol designed for cloud environments. Rather than maintaining a fixed tree, each node dynamically sends or forwards messages based on its own membership view, ultimately forming a broadcast structure resembling a multi-way balanced tree (the heights of leaf nodes differ by at most 1). Our experimental results show that Snow maintains message delivery reliability and latency guarantees under node churn while keeping overhead low by not sending redundant messages.
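To make the idea of forwarding from a membership view concrete, the following is a minimal sketch of one way such a scheme could work. It is not Snow's actual algorithm. It assumes every node holds the same sorted view, and it uses a generic range-partitioning technique: the sender splits its ring successors into at most `fanout` contiguous ranges, sends the message to the first node of each range, and delegates the rest of that range to it. The names `split_range`, `broadcast`, and `fanout` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


def split_range(nodes: List[str], fanout: int) -> List[List[str]]:
    """Split nodes into at most `fanout` contiguous chunks whose sizes differ by at most 1."""
    k = min(fanout, len(nodes))
    if k == 0:
        return []
    base, extra = divmod(len(nodes), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(nodes[start:start + size])
        start += size
    return chunks


def broadcast(view: List[str], sender: str, fanout: int = 2) -> Dict[str, int]:
    """Simulate one broadcast and return the hop count at which each node is reached.

    Assumption (not from the paper): all nodes share an identical view, so the
    whole dissemination can be simulated locally from the sender's view.
    """
    ring = sorted(view)
    i = ring.index(sender)
    # Rotate the ring so the sender's successors come first.
    rest = ring[i + 1:] + ring[:i]
    hops = {sender: 0}
    # Each work item is (range a forwarder is responsible for, forwarder's hop count).
    pending = [(rest, 0)]
    while pending:
        nodes, hop = pending.pop()
        for chunk in split_range(nodes, fanout):
            head, tail = chunk[0], chunk[1:]
            hops[head] = hop + 1          # head receives the message directly
            if tail:
                pending.append((tail, hop + 1))  # head forwards to the rest of its range
    return hops


if __name__ == "__main__":
    members = [f"n{i:04d}" for i in range(1000)]
    result = broadcast(members, "n0000", fanout=4)
    print(len(result), max(result.values()))
```

In this sketch each range shrinks by roughly a factor of `fanout` per hop, so the number of hops grows logarithmically with the view size, and each node receives the message exactly once when views agree. Handling divergent views, churn, and retransmission is outside what this sketch covers.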
