Composite Optimization with Error Feedback: the Dual Averaging Approach

Yuan Gao, Anton Rodomanov, Jeremy Rack, Sebastian Stich

TL;DR

This paper tackles the challenge of applying error feedback (EF) to distributed composite optimization, where the objective includes a smooth term $f$ and a non-smooth or constraint term $\psi$. It shows that classical EF analyses fail in this setting because of distortions induced by the proximal step, and introduces a combination of Dual Averaging with EControl that restores a cumulative gradient structure and enables rigorous convergence guarantees. The main contributions are the first strong convergence analysis for composite EF via inexact dual averaging, a practical sampling template for virtual iterates, and experimental validation demonstrating linear speedup and superior performance over proximal EF variants. The results advance communication-efficient distributed optimization by enabling reliable handling of non-smooth regularizers and constraints while preserving the favorable properties of EF.

Abstract

Communication efficiency is a central challenge in distributed machine learning training, and message compression is a widely used solution. However, standard Error Feedback (EF) methods (Seide et al., 2014), though effective for smooth unconstrained optimization with compression (Karimireddy et al., 2019), fail in the broader and practically important setting of composite optimization, which captures, e.g., objectives consisting of a smooth loss combined with a non-smooth regularizer or constraints. The theoretical foundation and behavior of EF in the general composite setting remain largely unexplored. In this work, we consider composite optimization with EF. We point out that the basic EF mechanism and its analysis no longer hold when a composite part is involved, and we argue that this is due to a fundamental limitation of the method and its analysis technique. We propose a novel method that combines Dual Averaging with EControl (Gao et al., 2024), a state-of-the-art variant of the EF mechanism, and achieves for the first time a strong convergence analysis for composite optimization with error feedback. Along with our new algorithm, we also provide a novel analysis template for the inexact dual averaging method, which might be of independent interest. We also provide experimental results to complement our theoretical findings.
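
To make the "cumulative gradient structure" concrete, the following is a minimal sketch of one generic dual averaging step for the special case $\psi(x) = \lambda\|x\|_1$ with a quadratic regularizer centered at the origin. It is not the algorithm proposed in the paper: the function names, the parameters `weight_sum`, `lam`, and `beta`, and the choice of starting point are assumptions made for illustration. The point it shows is that the new iterate depends only on the running sum of gradients, not on the previous iterate.

```python
import numpy as np


def soft_threshold(v, tau):
    """Coordinate-wise soft-thresholding: shrink each entry of v toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def dual_averaging_l1_step(grad_sum, weight_sum, lam, beta):
    """Closed-form solution of

        argmin_x  <grad_sum, x> + weight_sum * lam * ||x||_1 + (beta / 2) * ||x||^2

    grad_sum   : running (weighted) sum of all gradients seen so far
    weight_sum : running sum of the weights attached to those gradients
    lam        : l1 regularization strength (hypothetical name)
    beta       : strength of the quadratic term (hypothetical name)
    """
    return -soft_threshold(grad_sum, weight_sum * lam) / beta


# Illustrative values only.
grad_sum = np.array([3.0, -0.5, -2.0])
x_next = dual_averaging_l1_step(grad_sum, weight_sum=2.0, lam=0.5, beta=4.0)
print(x_next)
```

Coordinates whose accumulated gradient stays below the accumulated threshold `weight_sum * lam` are set exactly to zero, which is how this step produces sparse iterates.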

Paper Structure

This paper contains 20 sections, 19 theorems, 156 equations, 2 figures, 4 algorithms.

Key Result

Lemma 3.1

For any $t\geq 0$, we have:

Figures (2)

  • Figure 1: Synthetic regularized softmax objective
  • Figure 2: Superior performance. Comparison of EControl with Dual Averaging, proximal EF, and proximal EF21 on the FashionMNIST classification problem with $\ell_1$ regularization. We use Top-$K$ compression with $\delta=0.1$. EControl with Dual Averaging significantly outperforms the other methods.
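
As a hedged illustration of the Top-$K$ compressor mentioned in the Figure 2 caption, the sketch below keeps the $K$ largest-magnitude coordinates of a vector and checks the standard contraction bound $\|C(x) - x\|^2 \le (1 - K/d)\,\|x\|^2$. Reading $\delta = 0.1$ as $K/d = 0.1$ is an assumption here, as are the dimension `d`, the random test vector, and the function name `top_k`.

```python
import numpy as np


def top_k(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    out = np.zeros_like(x)
    # argpartition places the indices of the k largest |x_i| in the last k slots.
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


rng = np.random.default_rng(0)
d = 1000
k = 100  # k / d = 0.1, assumed to match delta = 0.1 in the caption
x = rng.standard_normal(d)

cx = top_k(x, k)
compression_error = np.sum((cx - x) ** 2)
contraction_bound = (1.0 - k / d) * np.sum(x ** 2)
print(compression_error <= contraction_bound)
```

The bound holds for any input because the discarded coordinates are the $d - K$ smallest in magnitude, so their squared mass is at most a $(1 - K/d)$ fraction of the total.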

Theorems & Definitions (39)

  • Definition 2.2
  • Remark 2.6
  • Lemma 3.1
  • Theorem 3.2
  • Lemma 3.3
  • Lemma 4.0
  • Remark 4.1
  • Lemma 4.1
  • Theorem 4.2
  • Remark 4.3
  • ...and 29 more