Think Locally, Act Globally: Federated Learning with Local and Global Representations

Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency

TL;DR

LG-FedAvg introduces a local-global federated learning framework that trains compact local representations on edge devices and a smaller global model operating on these representations, reducing communication while preserving accuracy. The authors provide a theory-backed bias-variance analysis showing the ensemble can mitigate both data variance and device variance, and they validate the approach across image, multimodal, and mobile sensing tasks, including fairness-aware variants. Empirical results demonstrate improved communication efficiency, robustness to non-iid data, and personalization capability, with applications to mood prediction and online data shifts. The work offers a versatile, scalable framework for private, heterogeneous, and fair federated learning with broad potential impact on real-world deployment.

Abstract

Federated learning is a method of training models on private data distributed over multiple devices. To keep device data private, the global model is trained by only communicating parameters and updates, which poses scalability challenges for large models. To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices. As a result, the global model can be smaller since it only operates on local representations, reducing the number of communicated parameters. Theoretically, we provide a generalization analysis which shows that a combination of local and global models reduces both variance in the data and variance across device distributions. Empirically, we demonstrate that local models enable communication-efficient training while retaining performance. We also evaluate on the task of personalized mood prediction from real-world mobile data where privacy is key. Finally, local models handle heterogeneous data from new devices, and learn fair representations that obfuscate protected attributes such as race, age, and gender.
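The abstract describes the local/global split at a high level. Below is a minimal sketch of one LG-FedAvg-style round. It assumes, hypothetically, a linear local model `W_local` that maps raw inputs to a small representation, a linear global model `W_global` on top of it, squared loss, and a single gradient step per device per round. The function names are illustrative and are not taken from the authors' code. The sketch shows that only the global parameters are averaged and communicated, while each device's local model stays on the device.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_H = 20, 4  # raw input size and local-representation size (illustrative)


def init_device(n):
    """Hypothetical device holding private data (X, y) and a private local model ell_m."""
    X = rng.normal(size=(n, D_IN))
    y = 2.0 * X[:, :1] + rng.normal(scale=0.1, size=(n, 1))
    return {"X": X, "y": y, "W_local": rng.normal(scale=0.1, size=(D_IN, D_H))}


def local_step(dev, W_global, lr=0.01):
    """One gradient step on the mean squared error of g(ell_m(x)) = (x @ W_local) @ W_global.

    W_local is updated in place and never leaves the device; only the updated
    copy of the global parameters is returned for upload to the server.
    """
    X, y, W_local = dev["X"], dev["y"], dev["W_local"]
    H = X @ W_local                  # local representations h = ell_m(x)
    err = H @ W_global - y           # residuals of the global model g(h)
    scale = 2.0 / len(X)
    grad_global = scale * (H.T @ err)
    grad_local = scale * (X.T @ (err @ W_global.T))
    dev["W_local"] = W_local - lr * grad_local
    return W_global - lr * grad_global


def lg_fedavg_round(devices, W_global):
    """Server side: FedAvg-style weighted average of the global parameters only."""
    sizes = np.array([len(d["X"]) for d in devices], dtype=float)
    weights = sizes / sizes.sum()
    updates = [local_step(d, W_global) for d in devices]
    return sum(w * u for w, u in zip(weights, updates))


devices = [init_device(n) for n in (50, 80, 120)]
W_global = rng.normal(scale=0.1, size=(D_H, 1))
for _ in range(5):
    W_global = lg_fedavg_round(devices, W_global)

# Per-device upload each round: LG-FedAvg sends only the global model,
# while plain FedAvg would send the local and global parameters together.
lg_params = W_global.size
fedavg_params = devices[0]["W_local"].size + W_global.size
print(lg_params, fedavg_params)  # 4 84
```

In this toy configuration the representation layer dominates the parameter count, so keeping it local shrinks each upload from 84 to 4 numbers. The actual savings depend entirely on how a real architecture is split between local and global layers.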

Paper Structure

This paper contains 31 sections, 14 theorems, 33 equations, 7 figures, 11 tables, 1 algorithm.

Key Result

Theorem 1

The generalization loss for federated learning decomposes into a variance term and a squared-bias term, with variance $\text{Var}[\hat{f}] = \mathbb{E}_{\mathbf{x}, \mathbf{r}_m} \left[\text{Var}_{\epsilon}[\hat{f}| \mathbf{x}, \mathbf{r}_m] \right]$ and bias $b^2 = \mathbb{E}\left[\left( f_{\mathbf{u}_m} - \mathbb{E}_\epsilon f(\hat{\mathbf{v}}, \hat{\mathbf{u}}_{m})\right)^2\right]$.
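The variance and bias terms above follow the familiar bias-variance pattern. As a minimal numerical illustration, assume the loss in question is the expected squared error between the target $f_{\mathbf{u}_m}$ and the estimate $\hat{f}$ at one fixed $(\mathbf{x}, \mathbf{r}_m)$. The check below then confirms that this error equals the variance plus the squared bias for a hypothetical estimator whose randomness stands in for $\epsilon$. It illustrates the standard identity and does not reproduce the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(1)

f_true = 1.0  # hypothetical target value f_{u_m}(x, r_m) at one fixed input
# Hypothetical estimator: systematically off by 0.3, plus noise from the random sample epsilon
f_hat = f_true + 0.3 + rng.normal(scale=0.5, size=100_000)

mse = np.mean((f_true - f_hat) ** 2)      # Monte Carlo squared error at this input
variance = np.var(f_hat)                  # Var_eps[f_hat | x, r_m], population form (ddof=0)
bias_sq = (f_true - np.mean(f_hat)) ** 2  # (f_{u_m} - E_eps f_hat)^2

print(mse, variance + bias_sq)  # equal up to floating-point error
```

With `ddof=0` the identity holds exactly for the sample moments, so the two printed values agree to rounding error. The variance and squared bias land near 0.25 and 0.09 respectively.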

Figures (7)

  • Figure 1: Local Global Federated Averaging (LG-FedAvg) allows for efficient global parameter updates (smaller number of global parameters $\theta^g$), flexibility in design across local and global models, the ability to handle heterogeneous data, and fair representation learning. (a) through (c) show various approaches to training local models, including supervised, unsupervised, and self-supervised learning (e.g. jigsaw solving; Noroozi and Favaro, 2016). (d) shows adversarial training against protected attributes $\mathbf{P}_m$. Blue represents the global server and purple represents the local devices. $(\mathbf{X}_m, \mathbf{Y}_m)$ represents data on device $m$, $\mathbf{H}_m$ are learned local representations via local models $\ell_m(\ \cdot \ ;\theta_m^\ell) : \mathbf{x} \rightarrow \mathbf{h}$ and (optionally) auxiliary models $a_m(\ \cdot \ ;\theta_m^a) : \mathbf{h} \rightarrow \mathbf{z}$. $g(\ \cdot \ ;\theta^{g}) : \mathbf{h} \rightarrow \mathbf{y}$ is the global model. Agg is an aggregation function over local updates to the global model (e.g. FedAvg).
  • Figure 2: Test error (with shaded std dev) on synthetic data when local models perform better (plot (a): $\sigma=1.5,\rho=0.1$) and when global models perform better (plot (b): $\sigma=1.5,\rho=0.06$). In both settings, an $\alpha$-interpolation of local and global models performs better than either extreme. (c): We also verify these theoretical findings under increasing device variance when splitting CIFAR-10 (fewer classes per device), where LG-FedAvg consistently outperforms local-only training and FedAvg. (d): On predicting personalized moods from real-world private mobile data, an $\alpha$-split across local and global models outperforms either extreme.
  • Figure 3: A closer look at the inference paths involved in adversarial training. The local models $\ell_m$, the (local copy of the) global model $g$, and the adversarial model $a_m$ are trained jointly on the global prediction objective and the adversarial objective. Refer to the paper's dual optimization objective for the formulation over the local and global model parameters and the adversary parameters, respectively.
  • Figure 4: Average test error under four settings: 1) when local models perform close to optimal (far left, $\sigma=1.5,\rho=0.5$), 2) when local models perform better (middle left, $\sigma=1.5,\rho=0.1$), 3) when global models perform better (middle right, $\sigma=1.5,\rho=0.06$), and 4) when global models perform close to optimal (far right, $\sigma=1.5,\rho=0.02$). In all settings, an $\alpha$-interpolation of local and global models performs either close to the optimal extreme (cases 1 and 4) or better than either extreme (cases 2 and 3).
  • Figure 5: Test accuracy on VQA across $20$ rounds (the dotted green line marks the goal accuracy of $40\%$ used in the VQA results table). LG-FedAvg reaches an accuracy of $41.30\%$ compared to $40.22\%$ for FedAvg while using only $9.53\%$ of the parameters.
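Figures 2 and 4 report that an $\alpha$-interpolation of local and global models can outperform either extreme. The toy sketch below reproduces that effect qualitatively under stated assumptions:

- Each "model" is just a mean estimate.
- The local model sees only its own device's `n` samples, so it has high variance.
- The global model pools all devices, so it has low variance but is biased toward the population mean.
- The interpolated prediction is $\alpha \cdot \text{local} + (1-\alpha) \cdot \text{global}$.

The parameters `noise_std` and `device_std` are illustrative assumptions and are not claimed to match the paper's $\sigma$ and $\rho$.

```python
import numpy as np

rng = np.random.default_rng(2)

M, n = 2000, 10                  # number of devices, samples per device (illustrative)
noise_std, device_std = 1.5, 0.45

# True per-device targets differ around a shared population mean of 0
device_means = rng.normal(loc=0.0, scale=device_std, size=M)
data = device_means[:, None] + rng.normal(scale=noise_std, size=(M, n))

local_est = data.mean(axis=1)    # "local model": fit on the device's own data only
global_est = data.mean()         # "global model": fit on data pooled across devices

alphas = np.linspace(0.0, 1.0, 21)
mse = np.array([
    np.mean((a * local_est + (1 - a) * global_est - device_means) ** 2)
    for a in alphas
])
best = alphas[np.argmin(mse)]
print(f"alpha=0 (global): {mse[0]:.3f}  alpha=1 (local): {mse[-1]:.3f}  "
      f"best alpha={best:.2f}: {mse.min():.3f}")
```

With these settings the pure global error is roughly `device_std**2` and the pure local error roughly `noise_std**2 / n`. Because the two are comparable, an intermediate $\alpha$ trades them off and achieves a lower error than either endpoint.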

Theorems & Definitions (22)

  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 2
  • Corollary 1
  • Theorem 1
  • Proof
  • Proposition 0
  • Proof