Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees
Richeng Jin, Yufan Huang, Xiaofan He, Huaiyu Dai, Tianfu Wu
TL;DR
The paper addresses federated learning under practical constraints by introducing Stochastic-Sign SGD, which uses stochastic sign-based gradient compressors to achieve convergence despite data heterogeneity, and which supports differential privacy through the dp-sign compressor. It provides theoretical guarantees of convergence to a neighborhood of the optimum, quantifies Byzantine resilience, and proposes enhancements such as weighted voting and Top-K sparsification to improve robustness and the privacy-accuracy trade-off. An error-feedback variant further improves learning performance by compensating for compression-induced errors, with results extended to SGD and privacy-preserving settings. Experiments on MNIST and CIFAR-10 show competitive accuracy under communication constraints and in Byzantine-resilience scenarios, indicating that the method is practical for large-scale, heterogeneous, and potentially adversarial FL deployments.
Abstract
Federated learning (FL) has emerged as a prominent distributed learning paradigm. FL creates a pressing need for novel parameter estimation approaches that come with theoretical convergence guarantees and are also communication-efficient, differentially private, and Byzantine-resilient under heterogeneous data distributions. Quantization-based SGD solvers have been widely adopted in FL, and the recently proposed SIGNSGD with majority vote shows a promising direction. However, no existing method enjoys all of the aforementioned properties. In this paper, we propose an intuitively simple yet theoretically sound method based on SIGNSGD to bridge the gap. We present Stochastic-Sign SGD, which utilizes novel stochastic-sign-based gradient compressors that enable the aforementioned properties in a unified framework. We also present an error-feedback variant of the proposed Stochastic-Sign SGD, which further improves the learning performance in FL. We test the proposed method with extensive experiments using deep neural networks on the MNIST and CIFAR-10 datasets. The experimental results corroborate the effectiveness of the proposed method.
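
To make the idea of a stochastic sign compressor combined with majority vote concrete, the following is a minimal sketch and not the paper's implementation. It assumes one common form of stochastic sign, in which a coordinate g is mapped to +1 with probability (b + g) / (2b) and to -1 otherwise, so that the expected output equals g / b. The bound b, the clipping step, the tie-breaking rule, and the function names stochastic_sign, majority_vote, and server_step are all illustrative assumptions.

```python
import random


def stochastic_sign(grad, bound, rng):
    """Compress each coordinate to +1 or -1 (one bit per coordinate).

    Assumed rule: output +1 with probability (bound + g) / (2 * bound),
    otherwise -1, so the expectation of the output is g / bound.
    """
    out = []
    for g in grad:
        # Clip so the probability below stays inside [0, 1] (assumption).
        g = max(-bound, min(bound, g))
        p_plus = (bound + g) / (2.0 * bound)
        out.append(1 if rng.random() < p_plus else -1)
    return out


def majority_vote(sign_vectors):
    """Coordinate-wise sign of the summed votes; an exact tie yields 0."""
    aggregated = []
    for votes in zip(*sign_vectors):
        total = sum(votes)
        aggregated.append((total > 0) - (total < 0))
    return aggregated


def server_step(weights, worker_grads, lr, bound, rng):
    """One round: workers send compressed signs, the server applies the vote."""
    signs = [stochastic_sign(g, bound, rng) for g in worker_grads]
    direction = majority_vote(signs)
    return [w - lr * d for w, d in zip(weights, direction)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical workers holding different local gradients.
    grads = [[0.8, -0.2], [0.5, -0.9], [-0.1, -0.4]]
    print(server_step([0.0, 0.0], grads, lr=0.1, bound=1.0, rng=rng))
```

Because each worker's output is random, a coordinate with a small gradient is flipped more often than one near the bound. This is what lets the aggregate vote reflect gradient magnitudes rather than only the count of positive and negative workers.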
