Federated ADMM from Bayesian Duality

Thomas Möllenhoff, Siddharth Swaroop, Finale Doshi-Velez, Mohammad Emtiyaz Khan

TL;DR

The paper introduces Bayesian Duality as a unifying lens to derive and extend federated ADMM. By reformulating federated learning with variational Bayes over exponential-family posteriors, it recovers ADMM for isotropic Gaussians and naturally yields Newton-like and Adam-like variants (e.g., IVON-ADMM) when richer posteriors are used. The BayesADMM framework leverages dual updates in natural-parameter space, enabling uncertainty-aware updates that improve performance under client heterogeneity while preserving communication efficiency. Empirical results across diverse federated benchmarks show IVON-ADMM often outperforming strong baselines, sometimes achieving near-one-round convergence in convex settings and robust performance in deep learning tasks. Overall, the work opens a path to extending primal-dual methods with Bayesian principles, suggesting broad applicability to other distributed optimization techniques.

Abstract

We propose a new Bayesian approach to derive and extend the federated Alternating Direction Method of Multipliers (ADMM). We show that the solutions of variational-Bayesian objectives are associated with a duality structure that not only resembles ADMM but also extends it. For example, ADMM-like updates are recovered when the objective is optimized over the isotropic-Gaussian family, and new non-trivial extensions are obtained for other more flexible exponential families. Examples include a Newton-like variant that converges in one step on quadratics and an Adam-like variant called IVON-ADMM that has the same cost as Adam but yields up to 7% accuracy boosts in heterogeneous deep learning. Our work opens a new direction to use Bayes to extend ADMM and other primal-dual methods.
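The one-step claim on quadratics can be motivated with a small numerical sketch. This is not the paper's Newton-like variant. It is an illustration under stated assumptions: each client loss is a hypothetical quadratic $\ell_k(\boldsymbol{\theta}) = \tfrac{1}{2}\boldsymbol{\theta}^\top A_k \boldsymbol{\theta} - b_k^\top \boldsymbol{\theta}$, and the names `A`, `b`, `total_grad`, `theta_natural`, and `theta_average` are made up for this example. Summing the Gaussian natural parameters (precision times mean, and precision) of all clients recovers the global minimizer in a single aggregation, whereas averaging local minimizers without curvature generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 3  # hypothetical number of clients and parameter dimension

# Hypothetical quadratic client losses:
#   l_k(theta) = 0.5 * theta^T A_k theta - b_k^T theta
A, b = [], []
for _ in range(K):
    M = rng.standard_normal((d, d))
    A.append(M @ M.T / d + np.eye(d))  # symmetric positive definite curvature
    b.append(rng.standard_normal(d))


def total_grad(theta):
    """Gradient of the summed loss sum_k l_k evaluated at theta."""
    return sum(A_k @ theta - b_k for A_k, b_k in zip(A, b))


# Gaussian natural parameters of exp(-l_k) are (precision * mean, precision) = (b_k, A_k).
# One aggregation: add the natural parameters of all clients, then read off the mean.
theta_natural = np.linalg.solve(sum(A), sum(b))

# For contrast: average the local minimizers, ignoring curvature.
theta_average = np.mean(
    [np.linalg.solve(A_k, b_k) for A_k, b_k in zip(A, b)], axis=0
)

print("gradient norm, natural-parameter sum:", np.linalg.norm(total_grad(theta_natural)))
print("gradient norm, plain averaging:      ", np.linalg.norm(total_grad(theta_average)))
```

The first gradient norm is zero up to floating-point error because the summed precision and precision-weighted mean describe the exact minimizer of the summed quadratic. The second is generally nonzero when the client curvatures $A_k$ differ.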

Paper Structure

This paper contains 36 sections, 4 theorems, 43 equations, 6 figures, 7 tables, 2 algorithms.

Key Result

Proposition 3.1

If client steps 1 and 2 of the algorithm in fig:algs_b (see Figure 2) are iterated until convergence for a given $\bar{\boldsymbol{\lambda}}$, then the server step that follows in line 3 is equivalent to the Bayesian learning rule (BLR).

Figures (6)

  • Figure 1: The left figure shows the core structure of federated ADMM to update the parameters of the server $\bar{\boldsymbol{\theta}}$ and clients $\boldsymbol{\theta}_k$. The server $\bar{\boldsymbol{\theta}}$ is updated using the dual vectors $\mathbf{v}_k = \nabla \ell_k$ for each local loss $\ell_k$, which are aggregated into $\bar{\mathbf{v}}$. The loop is derived from the optimality condition and drawn analogously to the closed circuit by Ro67. The right figure shows our new Bayesian Duality structure derived by using the optimality condition of a lifted VB objective over the space of distributions $q(\boldsymbol{\theta})$. Analogous to the dual vectors $\mathbf{v}_k$ as gradients, we now have new dual vectors $\widehat{\boldsymbol{\lambda}}_k = \nabla_{\boldsymbol{\mu}} \mathbb{E}_{q_{\boldsymbol{\mu}_k}}[\ell_k]$ as natural gradients (defined in the main text). We show that the ADMM structure is a special case of the Bayesian Duality structure.
  • Figure 2: The FederatedADMM algorithm iteratively updates the triplet $(\boldsymbol{\theta}_{1:K}, \mathbf{v}_{1:K}, \bar{\boldsymbol{\theta}})$ as shown in lines 1-3. Our new BayesADMM algorithm optimizes a federated VB reformulation aiming for a global exponential-family (EF) posterior $\bar{q}$. The algorithm exploits the dual pair $(\bar{\boldsymbol{\mu}}, \bar{\boldsymbol{\lambda}})$, consisting of the expectation and natural parameters, respectively, associated with $\bar{q}$. The three lines update the triplet $(q_{1:K}, \widehat{\boldsymbol{\lambda}}_{1:K}, \bar{q})$, analogously to ADMM, where $q_k$ is the local posterior with expectation parameter $\boldsymbol{\mu}_k$, and $\widehat{\boldsymbol{\lambda}}_k$ is the dual vector defined in the natural parameter space.
  • Figure 3: (a) PVI can diverge on logistic regression (MNIST). The damping used by AsBu22 improves this, but is slower than BayesADMM. (b) BayesADMM converges in a single round for certain loss functions, whereas federated ADMM needs many steps. In both (a) and (b), BayesADMM clearly improves over BregmanADMM (WaBa14). The details of these experiments are in the appendix (app:logreg).
  • Figure 4: A single outlier that slows down ADMM (top row) poses no issues for our new Bayesian version, BayesADMM (bottom row). With ADMM, the server (left column) takes 5 iterations, while with BayesADMM it needs only 2 (decision boundaries are labeled with their iteration number). Client 1 (middle column) is the source of the issue: with ADMM it takes 5 iterations to ignore the outlier, while with BayesADMM it is much faster due to the use of uncertainty (gray lines). The rightmost column shows client 2, where there are no outliers.
  • Figure 5: IVON-ADMM converges significantly faster than IVON-PVI, both in terms of test NLL and test accuracy, here on the ResNet-20 experiment on CIFAR-100. Both methods perform much better than the non-Bayesian methods (shown as the horizontal dashed line), which do not exploit the posterior covariance. This shows the benefit of using uncertainty in heterogeneous learning, and the benefit of IVON-ADMM over IVON-PVI.
  • ...and 1 more figure
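The three-line federated ADMM loop described in the Figure 2 caption can be sketched as follows. This is a minimal sketch of standard consensus ADMM on hypothetical quadratic client losses, not the paper's exact listing: the function name `federated_admm`, the penalty `rho`, the step ordering (client, server, dual), and the sign convention of the dual vectors are assumptions of this example and may differ from the paper's algorithm.

```python
import numpy as np


def federated_admm(A, b, rho=1.0, rounds=1000):
    """Consensus ADMM for client losses l_k(x) = 0.5 x^T A_k x - b_k^T x (assumed form)."""
    K, d = len(A), b[0].shape[0]
    z = np.zeros(d)                       # server parameter
    y = [np.zeros(d) for _ in range(K)]   # one dual vector per client
    x = [np.zeros(d) for _ in range(K)]   # client parameters
    for _ in range(rounds):
        # Client step: x_k = argmin_x l_k(x) + y_k^T (x - z) + (rho / 2) ||x - z||^2,
        # which has a closed form for quadratic losses.
        x = [np.linalg.solve(A[k] + rho * np.eye(d), b[k] - y[k] + rho * z)
             for k in range(K)]
        # Server step: average the dual-corrected client parameters.
        z = np.mean([x[k] + y[k] / rho for k in range(K)], axis=0)
        # Dual step: accumulate each client's disagreement with the server.
        y = [y[k] + rho * (x[k] - z) for k in range(K)]
    return z, x, y


# Hypothetical data: 4 clients with different curvature (heterogeneous losses).
rng = np.random.default_rng(0)
K, d = 4, 3
A, b = [], []
for _ in range(K):
    M = rng.standard_normal((d, d))
    A.append(M @ M.T / d + np.eye(d))
    b.append(rng.standard_normal(d))

z, x, y = federated_admm(A, b)
theta_star = np.linalg.solve(sum(A), sum(b))  # minimizer of the summed loss
print("distance to global minimizer:", np.linalg.norm(z - theta_star))
```

In this sign convention each dual vector converges to the negative local gradient at the consensus point, $y_k \to -\nabla \ell_k(z)$, which mirrors the caption's identification of dual vectors with local gradients up to sign. The loop needs many rounds here because the only information exchanged is first-order; the penalty `rho` acts as a single shared curvature estimate.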

Theorems & Definitions (6)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition H.1
  • proof
  • Proposition H.1
  • proof