Federated Learning of a Mixture of Global and Local Models

Filip Hanzely, Peter Richtárik

TL;DR

This work reframes federated learning with a mixture objective that jointly learns a global model and personalized local models by optimizing $F(x)=f(x)+\lambda\psi(x)$ where $f(x)=\frac{1}{n}\sum_i f_i(x_i)$ and $\psi(x)=\frac{1}{2n}\sum_i\|x_i-\bar{x}\|^2$. It introduces L2GD, a loopless, non-uniform SGD algorithm that alternates between local GD and averaging, and proves that local steps can improve communication in heterogeneous data regimes by effectively solving a personalized FL objective. The paper further develops variance-reduced variants (L2SGD+ and L2SGD++) that achieve linear convergence to the global optimum with favorable communication complexity, and provides theoretical convergence rates and optimal participation probabilities. Empirical results on LibSVM datasets corroborate the theory, showing variance reduction accelerates convergence and that personalization does not harm performance under data heterogeneity. Overall, the framework clarifies the role of local updates in FL, links personalization to reduced communication, and offers scalable, provably efficient algorithms for mixed global-local training.
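The mixture objective and the alternation between local GD and averaging can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the stacked `(n, d)` layout of the local models, and the toy quadratic losses in the demo are assumptions. The step scalings `1/(n(1-p))` and `lam/(n p)` assume the usual unbiased estimator that takes $\nabla f(x)/(1-p)$ with probability $1-p$ and $\lambda\nabla\psi(x)/p$ with probability $p$; the block gradients $\frac{1}{n}\nabla f_i(x_i)$ and $\frac{1}{n}(x_i-\bar{x})$ follow from the definitions of $f$ and $\psi$ above.

```python
import numpy as np

def mixture_objective(X, local_losses, lam):
    """F(x) = f(x) + lam * psi(x) for local models stacked as rows of X (n, d)."""
    n = X.shape[0]
    x_bar = X.mean(axis=0)
    f = sum(loss(X[i]) for i, loss in enumerate(local_losses)) / n
    psi = np.sum((X - x_bar) ** 2) / (2 * n)
    return f + lam * psi

def l2gd(X0, local_grads, lam, p, alpha, num_iters, seed=0):
    """Sketch of an L2GD-style loop (assumed form, see the lead-in).

    With probability p all devices move toward the current average
    (this is the step that needs communication); otherwise every device
    takes one gradient step on its own loss.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    rng = np.random.default_rng(seed)
    X = np.array(X0, dtype=float)  # copy so the caller's array is untouched
    n = X.shape[0]
    for _ in range(num_iters):
        if rng.random() < p:
            # Aggregation step: gradient of lam * psi, rescaled by 1/p.
            x_bar = X.mean(axis=0)
            X -= alpha * lam / (n * p) * (X - x_bar)
        else:
            # Local step: gradient of f, rescaled by 1/(1 - p).
            G = np.stack([grad(X[i]) for i, grad in enumerate(local_grads)])
            X -= alpha / (n * (1 - p)) * G
    return X

# Toy demo with hypothetical quadratic losses f_i(x) = 0.5 * ||x - c_i||^2.
C_demo = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
demo_losses = [lambda x, c=c: 0.5 * float(np.sum((x - c) ** 2)) for c in C_demo]
demo_grads = [lambda x, c=c: x - c for c in C_demo]
X_start = np.full((4, 2), 10.0)
X_end = l2gd(X_start, demo_grads, lam=1.0, p=0.5, alpha=0.2, num_iters=2000)
print("F at start:", mixture_objective(X_start, demo_losses, 1.0))
print("F at end:  ", mixture_objective(X_end, demo_losses, 1.0))
```

With a constant stepsize this plain variant should only be expected to reach a neighborhood of the minimizer, because the two gradient pieces do not vanish individually at the optimum. The variance-reduced variants mentioned in the TL;DR are what target exact linear convergence.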

Abstract

We propose a new optimization formulation for training federated learning models. The standard formulation has the form of an empirical risk minimization problem constructed to find a single global model trained from the private data stored across all participating devices. In contrast, our formulation seeks an explicit trade-off between this traditional global model and the local models, which can be learned by each device from its own private data without any communication. Further, we develop several efficient variants of SGD (with and without partial participation and with and without variance reduction) for solving the new formulation and prove communication complexity guarantees. Notably, our methods are similar but not identical to federated averaging / local SGD, thus shedding some light on the role of local steps in federated learning. In particular, we are the first to i) show that local steps can improve communication for problems with heterogeneous data, and ii) point out that personalization yields reduced communication complexity.

Paper Structure

This paper contains 40 sections, 15 theorems, 88 equations, 6 figures, 1 table, 8 algorithms.

Key Result

theorem 1

The function $\lambda \to \psi(x(\lambda))$ is non-increasing, and for all $\lambda>0$ we have $\psi(x(\lambda)) \leq \frac{f(x(\infty)) - f(x(0))}{\lambda}$. Moreover, the function $\lambda \to f(x(\lambda))$ is non-decreasing, and for all $\lambda \geq 0$ we have $f(x(\lambda)) \leq f(x(\infty))$.
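The two monotonicity statements can be checked numerically on a toy problem. The sketch below assumes hypothetical quadratic local losses $f_i(x_i)=\tfrac{1}{2}\|x_i-c_i\|^2$, which are not taken from the paper. For these, setting the gradient of $F$ to zero gives $\bar{x}(\lambda)=\bar{c}$ and the closed form $x_i(\lambda)=\frac{c_i+\lambda\bar{c}}{1+\lambda}$, so no solver is needed. All names in the code are illustrative.

```python
import numpy as np

def solution(C, lam):
    """Closed-form minimizer x(lam) for the toy losses 0.5 * ||x_i - c_i||^2."""
    c_bar = C.mean(axis=0)
    return (C + lam * c_bar) / (1.0 + lam)

def f_value(X, C):
    """f(x) = (1/n) * sum_i 0.5 * ||x_i - c_i||^2."""
    return 0.5 * np.sum((X - C) ** 2) / C.shape[0]

def psi_value(X):
    """psi(x) = (1/(2n)) * sum_i ||x_i - x_bar||^2."""
    return np.sum((X - X.mean(axis=0)) ** 2) / (2 * X.shape[0])

rng = np.random.default_rng(0)
C = rng.normal(size=(5, 3))          # one hypothetical local optimum per device
lams = np.linspace(0.0, 20.0, 201)   # grid of penalty parameters

psi_path = np.array([psi_value(solution(C, lam)) for lam in lams])
f_path = np.array([f_value(solution(C, lam), C) for lam in lams])

# psi should never go up and f should never go down as lam grows.
print("psi non-increasing:", bool(np.all(np.diff(psi_path) <= 1e-12)))
print("f non-decreasing:  ", bool(np.all(np.diff(f_path) >= -1e-12)))
```

At $\lambda=0$ each device sits at its own optimum $c_i$, and as $\lambda$ grows the models are pulled toward the common average $\bar{c}$, which is the trade-off between local and global models that the theorem quantifies.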

Figures (6)

  • Figure 1: Distance of the solution $x(\lambda)$ of the mixture problem to the pure local solution $x(0)$ and the global solution $x(\infty)$ as a function of $\lambda$. Logistic regression on the a1a dataset. See the Appendix for the setup.
  • Figure 2: Communication rounds to get $\tfrac{F(x^k)-F(x^*)}{F(x^0)-F(x^*)}\leq 10^{-5}$ as a function of $p$ with $p^* \approx 0.09$ (for L2SGD+). Logistic regression on a1a dataset with $\lambda = 0.1$.
  • Figure 3: L2SGD+ vs. L2SGD vs. L2SGD2 with identical stepsize (details in the Appendix).
  • Figure 4: Variance reduced local SGD, shifted local SGD and local SGD applied to LibSVM problems for both homogeneous and heterogeneous splits of the data. The stepsize for each non-variance-reduced method was chosen to match that of the analogous variance reduced method.
  • Figure 5: Effect of the aggregation probability $p$ (shown in the plot legends) on the convergence rate of variance reduced local SGD. The choice $p = p^{\star}$ corresponds to the red dotted line with triangle markers. The parameter $\lambda$ was chosen in each case as indicated in Table 1.
  • ...and 1 more figure

Theorems & Definitions (23)

  • theorem 1
  • theorem 2
  • theorem 3
  • remark 4
  • example 5
  • lemma 6
  • remark 7
  • theorem 8
  • corollary 9
  • theorem 10
  • ...and 13 more