Hierarchical Gradient Coding: From Optimal Design to Privacy at Intermediate Nodes
Ali Gholami, Tayyebeh Jahani-Nezhad, Kai Wan, Giuseppe Caire
TL;DR
The paper addresses the server-bandwidth bottleneck of gradient coding by introducing a hierarchical architecture in which relays sit between a central server and the workers. It develops a two-layer polynomial coding scheme, consisting of intra-cluster and cluster-to-server encodings, that achieves the optimal communication loads in the presence of stragglers and adversaries, and it handles privacy through randomness shared among workers. The key contributions are explicit characterizations of the optimal loads, $C_1^{*}=d/m_1$ for the relay-to-server link and ${C_2^{(n)}}^{*}=d/(m_1 m_2^{(n)})$ for the worker-to-relay links in cluster $n$, where $d$ is the dimension of the gradient vector; two encoding schemes (with and without privacy); and proofs of optimality under linear encoding. The privacy extension guarantees zero information leakage to relays without increasing the total communication load over a broad parameter range. Compared with non-hierarchical gradient coding, the approach reduces the load received at the server and achieves its privacy guarantee with polynomial-coded randomness, which makes it relevant to scalable distributed learning systems with hierarchical topologies.
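The load $C_1^{*}=d/m_1$ has the form one obtains when a $d$-dimensional vector is split into $m_1$ blocks and MDS-coded. The Python sketch below illustrates only that ingredient, under strong simplifying assumptions: every relay is assumed to already know the full gradient (the paper's scheme does not require this), $m$ is treated as a free input rather than being derived from the system parameters as in the paper, and arithmetic is done exactly over a prime field. The names `encode`, `decode`, `P`, and `alphas` are hypothetical. It requires Python 3.8+ for `pow(x, -1, P)`.

```python
"""Toy sketch (assumption-laden, not the paper's scheme): split a d-dimensional
gradient into m blocks, encode them as the coefficients of a vector-valued
polynomial, and let each relay send one evaluation of d/m entries.
Assumes, unlike the paper, that every relay already knows the full gradient.
Arithmetic is exact, over the prime field GF(P)."""
import random

P = 2_147_483_647  # prime 2^31 - 1; an illustrative choice of field


def encode(g, m, alphas):
    """Return f(a) for each evaluation point a, where f(x) = sum_j B_j x^j
    and B_0, ..., B_{m-1} are consecutive blocks of g (each of length d/m)."""
    d = len(g)
    assert d % m == 0, "this toy requires m to divide d"
    L = d // m
    blocks = [g[j * L:(j + 1) * L] for j in range(m)]
    msgs = []
    for a in alphas:
        msg = [0] * L
        for j, block in enumerate(blocks):
            aj = pow(a, j, P)
            for t in range(L):
                msg[t] = (msg[t] + block[t] * aj) % P
        msgs.append(msg)
    return msgs


def decode(received, m):
    """Recover g from any m pairs (a, f(a)) with distinct nonzero a by solving
    the m x m Vandermonde system with Gauss-Jordan elimination mod P."""
    pts = received[:m]
    L = len(pts[0][1])
    # Augmented rows: [1, a, a^2, ..., a^(m-1) | f(a)]
    rows = [[pow(a, j, P) for j in range(m)] + list(v) for a, v in pts]
    for col in range(m):
        piv = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], -1, P)
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                c = rows[r][col]
                rows[r] = [(x - c * y) % P for x, y in zip(rows[r], rows[col])]
    # Row j now holds block B_j in its last L entries.
    return [x for j in range(m) for x in rows[j][m:m + L]]


if __name__ == "__main__":
    random.seed(0)
    d, m, n_relays = 12, 3, 5
    g = [random.randrange(P) for _ in range(d)]
    alphas = list(range(1, n_relays + 1))  # distinct nonzero evaluation points
    msgs = encode(g, m, alphas)
    # Relays at indices 1 and 3 straggle; any m = 3 of the others suffice.
    survivors = [(alphas[i], msgs[i]) for i in (0, 2, 4)]
    assert decode(survivors, m) == g
    print(len(msgs[0]), "entries per message instead of", d)
    # prints: 4 entries per message instead of 12
```

In this toy each message carries $d/m$ field elements and any $m$ of the $n$ relays suffice, so up to $n-m$ straggling relays are tolerated; tolerating adversarial relays would require additional redundancy that the sketch omits.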
Abstract
Gradient coding is a distributed computing technique for computing gradient vectors over large datasets by outsourcing partial computations to multiple workers, which are typically connected directly to the server. In this work, we investigate gradient coding in a hierarchical setting, where intermediate nodes sit between the server and the workers. This structure reduces the communication load received at the server, which is a bottleneck in conventional gradient coding systems. The intermediate nodes, referred to as \textit{relays}, process the data received from the workers and send the results to the server for the final gradient computation. Our main contribution is the derivation of the optimal communication-computation trade-off, achieved by a linear coding scheme that also accounts for straggling and adversarial nodes among both relays and workers. The proposed scheme simultaneously achieves the optimal relay-to-server and the optimal worker-to-relay communication loads. We further extend the setting to incorporate privacy by requiring that relays learn no information about the computed partial gradients from the messages they receive. This is achieved by introducing shared randomness among workers, which allows each worker to encode its partial gradients so that the randomness cannot be canceled out at the relay. Meanwhile, the server can decode the global gradient by eliminating this randomness after receiving the computations of the non-straggling relays. Importantly, this privacy guarantee comes with no increase in the overall communication load.
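The privacy mechanism described above (randomness shared among workers that no single relay can cancel, but that the server removes) can be illustrated with a much simpler stand-in. The Python sketch below swaps in zero-sum one-time-pad masking, a standard secure-aggregation idea, in place of the paper's polynomial-coded randomness, and assumes no stragglers, no adversaries, and no data redundancy. The functions `zero_sum_masks` and `run`, the modulus `P`, and the nested-list data layout are hypothetical illustrations.

```python
"""Toy sketch of relay privacy via shared randomness among workers.
Simplifying assumptions (not the paper's construction): no stragglers,
no adversaries, no data redundancy, and zero-sum one-time-pad masks in
place of polynomial-coded randomness. Arithmetic is over GF(P)."""
import random

P = 2_147_483_647  # prime modulus; an illustrative choice


def zero_sum_masks(num_workers, d, rng):
    """Shared randomness: num_workers uniform vectors conditioned on summing
    to the zero vector mod P. Any num_workers - 1 of them are i.i.d. uniform."""
    masks = [[rng.randrange(P) for _ in range(d)] for _ in range(num_workers - 1)]
    last = [(-sum(col)) % P for col in zip(*masks)]
    return masks + [last]


def run(clusters, d, rng):
    """clusters[n][k] is the length-d partial gradient of worker k in cluster n.
    Returns the server's decoded sum and what each relay observed."""
    assert len(clusters) >= 2 and all(clusters), "need >= 2 non-empty clusters"
    num_workers = sum(len(c) for c in clusters)
    masks = iter(zero_sum_masks(num_workers, d, rng))
    relay_views, relay_msgs = [], []
    for cluster in clusters:
        # Each worker sends its gradient plus its own mask. Because other
        # clusters hold at least one mask, the masks this relay sees are
        # i.i.d. uniform, so each received vector is a one-time pad.
        received = [[(x + r) % P for x, r in zip(g, next(masks))] for g in cluster]
        relay_views.append(received)
        # The relay forwards the (still masked) sum of what it received.
        relay_msgs.append([sum(col) % P for col in zip(*received)])
    # Server: adding every relay's message cancels all masks.
    total = [sum(col) % P for col in zip(*relay_msgs)]
    return total, relay_views


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    # Two clusters with three workers each; small integer "gradients".
    clusters = [[[rng.randrange(100) for _ in range(d)] for _ in range(3)]
                for _ in range(2)]
    total, views = run(clusters, d, rng)
    expected = [sum(g[t] for c in clusters for g in c) % P for t in range(d)]
    assert total == expected
    print("server recovered", total)
```

Here the server cancels the masks only because it receives every relay's message; if a relay straggled, its masks would no longer cancel. Handling that case without extra communication is precisely what the paper's polynomial-coded randomness is designed for, and this toy does not attempt it.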
