MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes

Igor Sokolov, Peter Richtárik

TL;DR

This work addresses distributed non-smooth federated optimization under server-to-worker compression. It extends EF21-P to the distributed non-smooth setting and extends MARINA-P to non-smooth convex objectives, establishing optimal convergence rates of ${\mathcal O}(1/\sqrt{T})$ under constant and Polyak stepsizes, and ${\mathcal O}(\log T/\sqrt{T})$ under decreasing stepsizes. A key finding is that MARINA-P with correlated compressors achieves superior practical performance together with theoretical guarantees that are independent of the number of workers $n$, extending the benefits of correlated compression to the non-smooth regime. Empirical results on synthetic non-smooth objectives show MARINA-P with correlated compressors outperforming EF21-P, highlighting the importance of server-side downlink compression and adaptive stepsizes in distributed non-smooth federated optimization.

Abstract

Non-smooth communication-efficient federated optimization is crucial for many machine learning applications, yet remains largely unexplored theoretically. Recent advancements have primarily focused on smooth convex and non-convex regimes, leaving a significant gap in understanding the non-smooth convex setting. Additionally, existing literature often overlooks efficient server-to-worker communication (downlink), focusing primarily on worker-to-server communication (uplink). We consider a setup where uplink costs are negligible and focus on optimizing downlink communication by improving state-of-the-art schemes like EF21-P (arXiv:2209.15218) and MARINA-P (arXiv:2402.06412) in the non-smooth convex setting. We extend the non-smooth convex theory of EF21-P [Anonymous, 2024], originally developed for single-node scenarios, to the distributed setting, and extend MARINA-P to the non-smooth convex setting. For both algorithms, we prove an optimal $O(1/\sqrt{T})$ convergence rate and establish communication complexity bounds matching classical subgradient methods. We provide theoretical guarantees under constant, decreasing, and adaptive (Polyak-type) stepsizes. Our experiments demonstrate that MARINA-P with correlated compressors outperforms other methods in both smooth non-convex and non-smooth convex settings. This work presents the first theoretical results for distributed non-smooth optimization with server-to-worker compression, along with comprehensive analysis for various stepsize schemes.
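To make the "Polyak-type" stepsize mentioned above concrete, here is a minimal single-node sketch of the classical subgradient method with the classical Polyak stepsize $\gamma_t = (f(x^t) - f^*)/\|g^t\|^2$. It is not the paper's distributed algorithm and uses no compression. The test objective $f(x) = \|x - a\|_1$, the function names, and the assumption that $f^*$ is known are all illustrative choices, not taken from the paper.

```python
def f(x, a):
    # Illustrative non-smooth convex objective: f(x) = ||x - a||_1, minimized at x = a.
    return sum(abs(xi - ai) for xi, ai in zip(x, a))


def subgrad(x, a):
    # One valid subgradient of ||x - a||_1: sign(x_i - a_i), taking 0 at the kink.
    return [(xi > ai) - (xi < ai) for xi, ai in zip(x, a)]


def polyak_subgradient(x0, a, f_star=0.0, steps=200):
    """Classical subgradient method with the classical Polyak stepsize.

    Assumes the optimal value f_star is known (here it is 0).
    Returns the last iterate and the best objective value seen.
    """
    x = list(x0)
    best = f(x, a)
    for _ in range(steps):
        g = subgrad(x, a)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0:
            break  # zero subgradient: x is already a minimizer
        gamma = (f(x, a) - f_star) / g_norm_sq  # Polyak stepsize
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
        best = min(best, f(x, a))
    return x, best


x_last, best_value = polyak_subgradient([0.0, 0.0, 0.0], [1.0, -2.0, 0.5])
```

The Polyak rule needs no tuning of a constant stepsize, but it does require $f^*$ (or an estimate of it), which is why it is usually discussed separately from constant and decreasing schedules.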

Paper Structure

This paper contains 33 sections, 13 theorems, 147 equations, 7 figures, 3 tables, 5 algorithms.

Key Result

Theorem 1

Let the assumptions on the existence of a minimizer, convexity of $f_i$, and Lipschitzness of $f_i$ hold. Define a Lyapunov function $V^t := \left\| x^{t} - x^* \right\|^{2}_2 + \frac{1}{\lambda_* \theta} \left\| w^t - x^{t} \right\|^{2}_2$, where $\lambda_* := \frac{\sqrt{1-\alpha}}{1 - \sqrt{1-\alpha}}$ and $\theta$ … If, moreover, the optimal $\gamma$ is chosen, … 2. Polyak stepsize. If $\gamma_t$ is chosen …
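As a small numerical illustration of the Lyapunov function in the theorem statement, the sketch below evaluates $\lambda_*$ and $V^t$ for given vectors. The definition of $\theta$ is cut off in the text above, so here it is simply a positive number supplied by the caller. The function names and the restriction $0 < \alpha < 1$ (which keeps $\lambda_*$ finite and non-zero) are assumptions made for this example.

```python
import math


def lambda_star(alpha):
    # lambda_* = sqrt(1 - alpha) / (1 - sqrt(1 - alpha)), as in the theorem statement.
    # Restricting to 0 < alpha < 1 is an assumption of this sketch.
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    root = math.sqrt(1.0 - alpha)
    return root / (1.0 - root)


def sq_dist(u, v):
    # Squared Euclidean distance ||u - v||_2^2.
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def lyapunov(x, w, x_star, alpha, theta):
    # V = ||x - x*||^2 + ||w - x||^2 / (lambda_* * theta).
    # theta is treated as a user-supplied positive parameter (hypothetical here).
    if theta <= 0:
        raise ValueError("theta must be positive")
    return sq_dist(x, x_star) + sq_dist(w, x) / (lambda_star(alpha) * theta)


# With alpha = 0.75: sqrt(1 - alpha) = 0.5, so lambda_* = 0.5 / 0.5 = 1.
value = lyapunov(x=[1.0, 0.0], w=[1.0, 2.0], x_star=[0.0, 0.0], alpha=0.75, theta=2.0)
```

In this example the first term is $1$ and the second is $4/(1 \cdot 2) = 2$, so `value` equals $3$.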

Figures (7)

  • Figure 1: Performance comparison of EF21-P with Top$K$ and MARINA-P with sameRand$K$, indRand$K$, and Perm$K$ compressors ($K = d/n$). The left column of the legend corresponds to experiments with constant stepsizes, while the right column shows results with Polyak stepsizes. All stepsizes were set to the largest theoretically acceptable value multiplied by an individually tuned constant factor, selected from the set $\{2^{-9}, 2^{-8}, \dots, 2^{7}\}$.
  • Figure 2: Constant stepsize; $n = 10$.
  • Figure 3: Constant stepsize; $n = 100$.
  • Figure 4: Polyak stepsize; $n = 10$.
  • Figure 5: Polyak stepsize; $n = 100$.
  • ...and 2 more figures
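
The compressors named in the Figure 1 caption can be sketched as follows. This is a minimal illustration under assumptions: Top$K$ and Rand$K$ follow their usual textbook definitions, and Perm$K$ follows the permutation-compressor construction as commonly defined for the case where $n$ divides $d$ (so $K = d/n$). The paper's own definitions are not reproduced in this excerpt. Function names and the plain-list representation are hypothetical.

```python
import random


def top_k(x, k):
    # TopK: keep the k entries of largest magnitude, zero out the rest (biased).
    keep = sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:k]
    out = [0.0] * len(x)
    for j in keep:
        out[j] = x[j]
    return out


def rand_k(x, k, rng):
    # RandK: keep k uniformly random entries, scaled by d/k so the output is unbiased.
    d = len(x)
    out = [0.0] * d
    for j in rng.sample(range(d), k):
        out[j] = x[j] * d / k
    return out


def perm_k(x, n, rng):
    # PermK (assumed form, n divides d): one shared random permutation splits the
    # coordinates into n disjoint blocks of size k = d/n; worker i receives its
    # block scaled by n. The n messages are correlated, not independent.
    d = len(x)
    if d % n != 0:
        raise ValueError("this sketch assumes n divides d")
    k = d // n
    perm = list(range(d))
    rng.shuffle(perm)
    messages = []
    for i in range(n):
        out = [0.0] * d
        for j in perm[i * k:(i + 1) * k]:
            out[j] = n * x[j]
        messages.append(out)
    return messages


x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
msgs = perm_k(x, n=3, rng=random.Random(0))
average = [sum(m[j] for m in msgs) / len(msgs) for j in range(len(x))]
```

Because the Perm$K$ blocks are disjoint and cover every coordinate, averaging the $n$ messages recovers `x` exactly, whereas averaging $n$ independent Rand$K$ outputs only matches `x` in expectation. This is the kind of correlation between workers' compressors that the TL;DR refers to.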

Theorems & Definitions (29)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4: Expected density
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Remark 1
  • Corollary 2
  • Definition 5: Perm$K$
  • ...and 19 more