Distributed Event-Based Learning via ADMM

Guner Dilsad Er, Sebastian Trimpe, Michael Muehlebach

TL;DR

This work tackles distributed learning over networks with non-i.i.d. data and limited communication. It introduces an event-triggered, over-relaxed ADMM scheme that exchanges information only when local updates exceed a threshold $\Delta$, and analyzes the resulting algorithm as a dynamical system to establish convergence. Under strong convexity, the method achieves linear (and accelerated) convergence; in nonconvex settings, it yields sublinear guarantees, and a reset mechanism provides robustness to communication failures. Empirical results on MNIST and CIFAR-10 demonstrate substantial communication savings (≥35%) and superior accuracy-communication trade-offs over baselines like FedAvg, FedProx, SCAFFOLD, and FedADMM, highlighting its practical impact for large-scale, heterogeneous distributed learning.

Abstract

We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents. We therefore guarantee convergence even if the local data-distributions of the agents are arbitrarily distinct. We analyze the convergence rate of the algorithm both in convex and nonconvex settings and derive accelerated convergence rates for the convex case. We also characterize the effect of communication failures and demonstrate that our algorithm is robust to these. The article concludes by presenting numerical results from distributed learning tasks on the MNIST and CIFAR-10 datasets. The experiments underline communication savings of 35% or more due to the event-based communication strategy, show resilience towards heterogeneous data-distributions, and highlight that our approach outperforms common baselines such as FedAvg, FedProx, SCAFFOLD and FedADMM.
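
The idea of communicating only when necessary can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes scalar quadratic local objectives $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, uses the textbook scaled-form over-relaxed consensus ADMM updates, and adds a simple send-on-delta rule. The function name `event_based_consensus_admm`, the parameters `rho`, `alpha`, `delta`, and the choice of triggering on the change in $\hat{x}_i + u_i$ are all assumptions made for illustration.

```python
def event_based_consensus_admm(a, rho=1.0, alpha=1.5, delta=1e-3, num_iters=100):
    """Toy over-relaxed consensus ADMM with send-on-delta communication.

    Assumption: agent i holds f_i(x) = 0.5 * (x - a[i])**2, so the local
    minimization has a closed form. Returns the server's consensus value
    and the number of uplink and downlink messages actually sent.
    """
    n = len(a)
    u = [0.0] * n          # scaled dual variable of each agent
    z_local = 0.0          # last consensus value received by the agents
    z = 0.0                # consensus value held by the server
    v_server = [0.0] * n   # last x_hat_i + u_i received by the server
    uplink = 0
    downlink = 0

    for _ in range(num_iters):
        x_hat = []
        for i in range(n):
            # Local step: argmin_x f_i(x) + (rho / 2) * (x - z_local + u_i)^2
            x_i = (a[i] + rho * (z_local - u[i])) / (1.0 + rho)
            # Over-relaxation with parameter alpha
            x_hat_i = alpha * x_i + (1.0 - alpha) * z_local
            x_hat.append(x_hat_i)
            # Event trigger: transmit only if the value moved by more than delta
            v_i = x_hat_i + u[i]
            if abs(v_i - v_server[i]) > delta:
                v_server[i] = v_i
                uplink += 1

        # Server step: average the most recently received values
        z = sum(v_server) / n
        # Event trigger for the broadcast back to the agents
        if abs(z - z_local) > delta:
            z_local = z
            downlink += 1

        # Dual update uses the agents' (possibly stale) copy of z
        for i in range(n):
            u[i] += x_hat[i] - z_local

    return z, uplink, downlink


z, uplink, downlink = event_based_consensus_admm([1.0, 2.0, 3.0, 6.0])
print(f"consensus value: {z:.4f}, uplink: {uplink}, downlink: {downlink}")
```

In this toy setting the minimizer of the global objective is the mean of the $a_i$. Once the iterates move by less than `delta` per step, the triggers stop firing, so the message counts stay well below the `num_iters * n` uplink messages that communicating at every iteration would require.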

Paper Structure (18 sections, 14 theorems, 120 equations, 12 figures, 8 tables, 3 algorithms)

Key Result

Proposition 2.1

The error $\hat{\zeta}_k - \zeta_k$ at iteration $k$ is bounded by $|\hat{\zeta}_k - \zeta_k| \leq \Delta^d + T \bar{\chi}^d$, where $T$ denotes the reset period (see Alg. \ref{alg:over_relaxed_consensus}) and $\bar{\chi}^d$ is a bound on the disturbance $\chi_k^{di}$.
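
The structure of this bound can be checked numerically under a simplified model. The sketch below is an illustration and not the paper's construction: it assumes a scalar quantity, a sender that transmits increments whenever its value has drifted by more than `delta` since the last transmission, an additive disturbance of magnitude at most `chi_bar` that corrupts the receiver's copy at every non-reset step, and a full resynchronization every `reset_period` steps. The function `simulate` and all of its parameters are hypothetical names.

```python
import random


def simulate(num_steps=500, delta=0.1, chi_bar=0.01, reset_period=20, seed=0):
    """Track the worst-case gap between a sender's value and the receiver's copy.

    Assumed model (for illustration only): send-on-delta increments, a bounded
    additive disturbance on the receiver side, and a periodic exact reset.
    """
    rng = random.Random(seed)
    zeta = 0.0       # true quantity at the sender
    last_sent = 0.0  # value the sender believes the receiver holds
    zeta_hat = 0.0   # receiver's estimate, corrupted by disturbances
    max_err = 0.0
    num_sent = 0

    for k in range(num_steps):
        # Stand-in for a local update: a bounded random step
        zeta += rng.uniform(-0.05, 0.05)

        if k % reset_period == 0:
            # Periodic reset: sender and receiver resynchronize exactly
            last_sent = zeta
            zeta_hat = zeta
            num_sent += 1
        else:
            # Event trigger: send the increment only if the drift exceeds delta
            if abs(zeta - last_sent) > delta:
                zeta_hat += zeta - last_sent
                last_sent = zeta
                num_sent += 1
            # Bounded disturbance accumulates on the receiver's copy
            zeta_hat += rng.uniform(-chi_bar, chi_bar)

        max_err = max(max_err, abs(zeta_hat - zeta))

    return max_err, num_sent


max_err, num_sent = simulate()
bound = 0.1 + 20 * 0.01  # delta + T * chi_bar for the default parameters
print(f"max error: {max_err:.4f}, bound: {bound:.4f}, messages: {num_sent}")
```

In this model the gap splits into two parts: the untransmitted drift, which the trigger keeps at or below `delta`, and the accumulated disturbance, which grows by at most `chi_bar` per step and is cleared at every reset. A shorter reset period therefore tightens the bound at the cost of more frequent full transmissions.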

Figures (12)

  • Figure 1: The figure illustrates the distributed learning setup. Agents $1$-$4$ store $x^i$, $u^i$ and perform updates based on the information received from Agent $5$, according to Alg. \ref{alg:over_relaxed_consensus}. Agent $5$ stores $z$ and performs updates based on the information received from Agents $1$-$4$. This architecture is common in distributed learning, where a single server aggregates updates from multiple distributed clients to collaboratively train a model.
  • Figure 2: The figure visualizes the event-based communication structure of Alg. \ref{alg:relaxed_event} at the top and, at the bottom, a discrete-time dynamical system that represents the sequence generated by the event-based ADMM algorithm. The function $\phi$ is nonlinear and represents the evaluation of (sub)gradients.
  • Figure 3: Validation accuracy (top) and communication load percentage (bottom) over 150 communication rounds for training a CIFAR-10 classifier. The results indicate that Alg. \ref{alg:over_relaxed_consensus} achieves top accuracy at a lower communication rate. The plots compare the performance of various algorithms, including Alg. \ref{alg:over_relaxed_consensus} with different parameter settings (vanilla and randomized), FedAvg, FedProx, FedADMM, and SCAFFOLD. Notably, ADMM-based methods (Alg. \ref{alg:over_relaxed_consensus}, Alg. \ref{alg:over_relaxed_consensus}-Rand and FedADMM) demonstrate better convergence by reaching up to $78\%$ test accuracy, compared to FedAvg, FedProx and SCAFFOLD, which reach only $70\%$ accuracy. Among ADMM-based methods, Alg. \ref{alg:over_relaxed_consensus} and Alg. \ref{alg:over_relaxed_consensus}-Rand achieve the same accuracy with over $20\%$ less communication load. Communication load curves are smoothed using a window length of three for visualization purposes.
  • Figure 4: The communication structure that arises from Alg. \ref{alg:relaxed_event}, where $s:=Bz$, $r:=Ax$, and $u$ denotes the dual variable.
  • Figure 5: The diagram visualizes the communication structure for the sharing problem for $N=4$ agents.
  • ...and 7 more figures

Theorems & Definitions (27)

  • Proposition 2.1
  • Corollary 2.2
  • Theorem 2.3
  • Theorem 4.1
  • proof
  • Definition 3.1
  • Definition 3.2
  • Proposition 3.3
  • proof
  • Lemma 4.1
  • ...and 17 more