Reconciling Communication Compression and Byzantine-Robustness in Distributed Learning
Diksha Gupta, Antonio Honsell, Chuan Xu, Nirupam Gupta, Giovanni Neglia
TL;DR
This work tackles the dual challenges of Byzantine robustness and communication efficiency in distributed learning by introducing RoSDHB, a lightweight algorithm that applies classical Polyak momentum on the server after coordinated gradient sparsification with a global mask. RoSDHB achieves convergence guarantees comparable to those of the state-of-the-art Byz-DASHA-PAGE under standard $(G,B)$-gradient dissimilarity, without requiring bounded global Hessian variance, while reducing worker memory and uplink communication. The theoretical analysis employs a novel Lyapunov function to handle sparsification noise that scales with the gradient norm, and reveals a tight coupling between compression and robustness when data are heterogeneous. Empirically, on MNIST and CIFAR-10 under the FOE (Fall of Empires) and ALIE (A Little Is Enough) attacks, RoSDHB demonstrates stronger robustness and substantial communication savings, outperforming Byz-DASHA-PAGE by up to several-fold in speed and communication efficiency, which supports its practicality for scalable, secure distributed learning.
Abstract
Distributed learning enables scalable model training over decentralized data, but remains hindered by Byzantine faults and high communication costs. While both challenges have been studied extensively in isolation, their interplay has received limited attention. Prior work has shown that naively combining communication compression with Byzantine-robust aggregation can severely weaken resilience to faulty nodes. The current state-of-the-art, Byz-DASHA-PAGE, leverages a momentum-based variance reduction scheme to counteract the negative effect of compression noise on Byzantine robustness. In this work, we introduce RoSDHB, a new algorithm that integrates classical Polyak momentum with a coordinated compression strategy. Theoretically, RoSDHB matches the convergence guarantees of Byz-DASHA-PAGE under the standard $(G,B)$-gradient dissimilarity model, while relying on milder assumptions and requiring less memory and communication per client. Empirically, RoSDHB demonstrates stronger robustness while achieving substantial communication savings compared to Byz-DASHA-PAGE.
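To make the ingredients named above concrete, the following is a minimal sketch of one possible reading of them, not the authors' implementation. It assumes a random-k global mask shared by all workers, a d/k rescaling of the received coordinates, one momentum buffer per worker kept on the server, and a coordinate-wise median as a stand-in for the robust aggregator; none of these details are specified in the abstract. All function names, the toy quadratic objective, and the constant-vector Byzantine behavior are hypothetical.

```python
import random
from statistics import median


def sample_global_mask(d, k, rng):
    """Pick k of d coordinates; the SAME index set is used by every worker."""
    return sorted(rng.sample(range(d), k))


def sparsify(grad, mask):
    """Worker side: send only the masked coordinates (k floats instead of d)."""
    return [grad[j] for j in mask]


def server_step(x, momenta, payloads, mask, beta, lr):
    """Server side: per-worker Polyak momentum, then robust aggregation."""
    d, k = len(x), len(mask)
    for m_i, payload in zip(momenta, payloads):
        g_hat = [0.0] * d
        for j, value in zip(mask, payload):
            g_hat[j] = (d / k) * value  # rescaling is an assumption of this sketch
        for j in range(d):
            m_i[j] = beta * m_i[j] + (1.0 - beta) * g_hat[j]
    # Coordinate-wise median: a stand-in aggregator, assumed for illustration.
    direction = [median(m_i[j] for m_i in momenta) for j in range(d)]
    return [x[j] - lr * direction[j] for j in range(d)]


def run_demo(steps=2000, d=20, k=5, n_honest=7, n_byz=3,
             beta=0.9, lr=0.05, seed=0):
    """Toy run: honest workers share f(x) = 0.5 * ||x - target||^2."""
    rng = random.Random(seed)
    target = [1.0] * d
    x = [0.0] * d
    momenta = [[0.0] * d for _ in range(n_honest + n_byz)]
    for _ in range(steps):
        mask = sample_global_mask(d, k, rng)
        grad = [x[j] - target[j] for j in range(d)]
        honest = [sparsify(grad, mask) for _ in range(n_honest)]
        byzantine = [[1e3] * k for _ in range(n_byz)]  # hypothetical attack
        x = server_step(x, momenta, honest + byzantine, mask, beta, lr)
    return x, target
```

Calling `run_demo()` returns the final iterate and the honest minimizer. In this toy setting the honest workers are identical, so the median ignores the three faulty momenta entirely; it does not exercise the heterogeneous regime, where the abstract's $(G,B)$-gradient dissimilarity assumption matters.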
