AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression

Sai Aparna Aketi; Abolfazl Hashemi; Kaushik Roy

TL;DR

AdaGossip targets the communication bottleneck in decentralized learning with compressed communication by introducing adaptive, per-parameter consensus step-sizes driven by the observed gossip-error. The method maintains a second-moment estimate $u_i^t$ of the gossip-error at agent $i$ and uses it to set $\gamma_i^t=\dfrac{\gamma}{\sqrt{u_i^t}+\epsilon}$, so that each parameter is averaged with its neighbors at its own rate. Extending this to AdaG-SGD, the authors report improvements in test accuracy of up to about 2% over CHOCO-SGD across datasets (CIFAR-{10,100}, ImageNet), architectures (ResNet, LeNet-5, MobileNet-V2), and topologies (ring, Dyck, torus) under various compression regimes. The results are most relevant to edge-device training, where communication is costly and a fixed consensus step-size must otherwise be tuned to the compression ratio.
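The step-size rule above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adaptive_gossip_step`, the default values, and in particular the exponential-moving-average form of the second-moment estimate (with decay `beta`) are assumptions made for illustration. Only the rule $\gamma_i^t=\gamma/(\sqrt{u_i^t}+\epsilon)$ is taken from the summary. The gossip-error is assumed to be already computed from the compressed neighbor differences and is passed in as a plain list.

```python
import math


def adaptive_gossip_step(params, gossip_error, second_moment,
                         gamma=0.1, beta=0.999, eps=1e-8):
    """One per-parameter consensus update for a single agent (illustrative).

    params        -- current parameter values of the agent
    gossip_error  -- gossip-error per parameter (assumed precomputed)
    second_moment -- running second-moment estimate u per parameter
    Returns the updated parameters and the updated second-moment estimate.
    """
    new_params, new_moment = [], []
    for x, g, u in zip(params, gossip_error, second_moment):
        # Assumption: u is an exponential moving average of the squared
        # gossip-error; the source only says "second-moment estimate".
        u_next = beta * u + (1.0 - beta) * g * g
        # Per-parameter consensus step-size: gamma / (sqrt(u) + eps).
        step = gamma / (math.sqrt(u_next) + eps)
        # Move the parameter along its gossip-error with that step-size.
        new_params.append(x + step * g)
        new_moment.append(u_next)
    return new_params, new_moment


if __name__ == "__main__":
    params, moment = adaptive_gossip_step(
        params=[1.0, 2.0],
        gossip_error=[0.0, 0.5],
        second_moment=[0.0, 0.0],
    )
    print(params, moment)
```

Under these assumptions, a parameter whose gossip-error has been persistently large receives a smaller effective step-size, while a parameter with zero gossip-error is left unchanged.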

Abstract

Decentralized learning is crucial in supporting on-device learning over large distributed datasets, eliminating the need for a central server. However, the communication overhead remains a major bottleneck for the practical realization of such decentralized setups. To tackle this issue, several algorithms for decentralized training with compressed communication have been proposed in the literature. Most of these algorithms introduce an additional hyper-parameter, referred to as the consensus step-size, which is tuned based on the compression ratio at the beginning of training. In this work, we propose AdaGossip, a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. We demonstrate the effectiveness of the proposed method through an exhaustive set of experiments on various computer vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance ($0$–$2\%$ improvement in test accuracy) compared to the current state-of-the-art method for decentralized learning with communication compression.

Paper Structure (14 sections, 4 equations, 1 figure, 8 tables, 3 algorithms)

Introduction
Contributions
Background
AdaGossip
Experiments
Experimental Setup
Decentralized Deep Learning Results
Ablation Study
Limitations
Conclusion
Decentralized Learning Setup
Datasets
Network Architecture
Consensus Rate

Figures (1)

Figure 1: Ablation study on the hyper-parameter $\beta$, number of agents $n$ and model size. The test accuracy is reported for the CIFAR-10 dataset trained on ResNet architecture over ring topology.
