Tight analyses of first-order methods with error feedback

Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut

TL;DR

This work addresses the communication bottleneck in distributed optimization by analyzing first-order methods with gradient compression and error feedback. Using the Performance Estimation framework, it derives tight Lyapunov-based convergence guarantees for EF and EF^21 in the single-agent, smooth μ-strongly convex setting, and compares them apples-to-apples against CGD. The authors show that EF and EF^21 have identical optimal contraction rates under deterministic compression, while CGD achieves faster rates across the tested regimes, and they provide analytically optimal step sizes. The methodology combines SDP-based worst-case analysis with symbolic regression and CAS to produce simple, provably tight Lyapunov functions, offering a transferable toolkit for assessing compressed optimization methods.

Abstract

Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged, thereby reducing communication overhead. To counteract the degradation in convergence associated with compressed communication, error feedback schemes -- most notably $\mathrm{EF}$ and $\mathrm{EF}^{21}$ -- were introduced. In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method -- with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent. Our analysis is carried out in the simplified single-agent setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.
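To make the three methods being compared concrete, here is a minimal single-agent sketch in Python. It is not taken from the paper: the update rules follow the forms commonly given for compressed gradient descent, $\mathrm{EF}$, and $\mathrm{EF}^{21}$, and the paper's own conventions (for instance, where the step size enters the $\mathrm{EF}$ update) may differ. The names `top_k` and `run`, the diagonal quadratic test problem, and the choice of top-k as the deterministic compressor are all assumptions made for illustration. Top-k satisfies the contraction bound $\|C(v) - v\|^2 \le \epsilon \|v\|^2$ with $\epsilon = 1 - k/d$, which is assumed here to match the compression assumption used in the analysis.

```python
import numpy as np


def top_k(v, k):
    """Deterministic top-k sparsifier: keep the k largest-magnitude entries of v."""
    if not 1 <= k <= v.size:
        raise ValueError("k must satisfy 1 <= k <= len(v)")
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]  # indices of the k largest |v_i|
    out[keep] = v[keep]
    return out


def run(method, grad, x0, eta, compress, steps):
    """Run `steps` iterations of 'cgd', 'ef', or 'ef21'; return the last iterate."""
    if method not in ("cgd", "ef", "ef21"):
        raise ValueError(f"unknown method: {method}")
    x = np.array(x0, dtype=float)
    e = np.zeros_like(x)        # EF: memory of what compression has dropped so far
    g_hat = compress(grad(x))   # EF21: running estimate of the gradient
    for _ in range(steps):
        g = grad(x)
        if method == "cgd":
            # Compressed gradient descent: step along the compressed gradient.
            x = x - eta * compress(g)
        elif method == "ef":
            # EF: compress (memory + scaled gradient), store the residual.
            p = compress(e + eta * g)
            e = e + eta * g - p
            x = x - p
        else:
            # EF21: step with the estimate, then apply a compressed correction.
            x = x - eta * g_hat
            g_hat = g_hat + compress(grad(x) - g_hat)
    return x


if __name__ == "__main__":
    # Hypothetical test problem: f(x) = 0.5 * sum(h * x**2), so mu = 1 and L = 4.
    d, k, eta = 10, 5, 0.05
    h = np.linspace(1.0, 4.0, d)
    x0 = np.ones(d)
    for method in ("cgd", "ef", "ef21"):
        x = run(method, lambda z: h * z, x0, eta, lambda v: top_k(v, k), steps=500)
        print(method, np.linalg.norm(x))  # distance to the minimizer x* = 0
```

With the identity compressor ($\epsilon = 0$), all three branches reduce to plain gradient descent, which gives a quick sanity check of the sketch. Sweeping `eta` and `k` on such a toy problem only gives an empirical feel for the trade-off; the tight worst-case rates are what the paper's Lyapunov analysis provides.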

Paper Structure

This paper contains 40 sections, 2 theorems, 98 equations, 12 figures, 4 tables, 3 algorithms.

Key Result

Lemma 1

Consider running alg:ef with a deterministic compression operator satisfying as:compression for some $\epsilon \in [0, 1]$ on any function satisfying as:smooth and as:strong_cvx. There exists a nonzero candidate Lyapunov function $\mathcal{V}_{(P,p)}$ of the form defined in eq:lyapunov_shape_EF, satisfying [...] where matrices $(M_{ij})_{i,j \in \{0, 1, \star\}}$ are defined as in eq:Mijdef, [...]

Figures (12)

  • Figure 1: Single row of contour plots showing performance of $\mathrm{CGD}$, $\mathrm{EF}$, and $\mathrm{EF^{21}}$ as a function of step size $\eta$ and compression parameter $\epsilon$, with regions of non-convergence marked in red. The regions of non-convergence were computed using PEPit by finding cycles of length 2.
  • Figure 2: Line plot comparing the convergence rate of this paper (blue) with the rate in eq:rho_richtarik (red) for various $\kappa$.
  • Figure 3: Line plot comparing the complexity of eq:rho_richtarik with the rates of this paper for various $\kappa$.
  • Figure 4: Line plot comparing the convergence rate of $\mathrm{CGD}$ (blue) with $\mathrm{EF}$/$\mathrm{EF^{21}}$ (red) for various $\kappa$.
  • Figure 5: Line plot comparing the complexity of $\mathrm{EF}$/$\mathrm{EF^{21}}$ with $\mathrm{CGD}$ for various $\kappa$.
  • ...and 7 more figures

Theorems & Definitions (6)

  • Definition 1: Candidate Lyapunov function
  • Remark 1
  • Lemma 1: $\mathrm{EF}$ feasibility problem
  • Lemma 2: $\mathrm{EF^{21}}$ feasibility problem
  • proof
  • proof