Accelerated Distributed Optimization with Compression and Error Feedback

Yuan Gao; Anton Rodomanov; Jeremy Rack; Sebastian U. Stich

TL;DR

This work addresses the communication bottleneck in distributed stochastic optimization by combining Nesterov acceleration with contractive compression and error feedback, closing a longstanding gap in the theory of accelerated methods under contractive compression in the general convex regime. The authors introduce ADEF, an algorithm that uses gradient-difference compression and enhanced error feedback to compensate for compression errors, and they provide a general descent framework for accelerated methods with inexact updates. The main theoretical result is the first accelerated convergence rate for stochastic distributed optimization with contractive compression in the general convex setting: $T = O\left(\frac{R_0^2\sigma^2}{n\varepsilon^2} + \frac{\sqrt{L}\,R_0^2\sigma}{\delta^2\varepsilon^{3/2}} + \frac{\sqrt{\ell R_0^2}}{\delta^2\sqrt{\varepsilon}}\right)$ iterations suffice to achieve $F_T \le \varepsilon$. In the deterministic case ($\sigma^2=0$) only the last term remains, giving an accelerated $O(1/\sqrt{\varepsilon})$ rate with a $1/\delta^2$ dependence. Experiments on synthetic and MNIST tasks are consistent with the theory, showing reduced communication and competitive convergence relative to existing methods. The work thus advances communication-efficient training of large models by combining compression with acceleration in a principled framework.

Abstract

Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a central server, has emerged as a key technique to mitigate communication bottlenecks. However, the theoretical understanding of stochastic distributed optimization with contractive compression remains limited, particularly in conjunction with Nesterov acceleration -- a cornerstone for achieving faster convergence in optimization. In this paper, we propose a novel algorithm, ADEF (Accelerated Distributed Error Feedback), which integrates Nesterov acceleration, contractive compression, error feedback, and gradient difference compression. We prove that ADEF achieves the first accelerated convergence rate for stochastic distributed optimization with contractive compression in the general convex regime. Numerical experiments validate our theoretical findings and demonstrate the practical efficacy of ADEF in reducing communication costs while maintaining fast convergence.
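To make the two building blocks named above concrete, here is a minimal sketch of a contractive compressor and a classical error-feedback step on a single client. It is not ADEF itself: it omits acceleration, gradient-difference compression, and the server side. Top-K is used as one common example of a contractive compressor; with $k$ of $d$ coordinates kept it satisfies $\|C(x) - x\|^2 \le (1 - \delta)\|x\|^2$ with $\delta = k/d$. The names `top_k` and `ef_step`, the step size `lr`, and the random vectors standing in for stochastic gradients are assumptions made for illustration.

```python
import numpy as np

def top_k(x, k):
    """Top-K compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

def ef_step(grad, error, k, lr):
    """One classical error-feedback step (illustrative, not the ADEF update).

    The client adds the error carried over from earlier rounds to the scaled
    gradient, transmits only the compressed vector, and stores what was cut.
    """
    corrected = lr * grad + error
    message = top_k(corrected, k)      # what is actually communicated
    new_error = corrected - message    # residual kept locally for later rounds
    return message, new_error

rng = np.random.default_rng(0)
d, k, lr = 100, 10, 0.05
delta = k / d

# Contraction property of Top-K on a random vector.
x = rng.standard_normal(d)
compression_error = np.sum((top_k(x, k) - x) ** 2)
contraction_bound = (1 - delta) * np.sum(x ** 2)

# Run a few rounds with random vectors standing in for stochastic gradients.
error = np.zeros(d)
sent_total = np.zeros(d)
grad_total = np.zeros(d)
for _ in range(20):
    grad = rng.standard_normal(d)
    message, error = ef_step(grad, error, k, lr)
    sent_total += message
    grad_total += lr * grad

# Telescoping identity: everything not yet transmitted sits in `error`.
gap = np.max(np.abs(sent_total + error - grad_total))
```

The final check illustrates why error feedback compensates for compression: the transmitted messages plus the stored residual always add up to the uncompressed updates, so compression delays information rather than discarding it.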
