Robust Aggregation for Federated Learning

Krishna Pillutla, Sham M. Kakade, Zaid Harchaoui

TL;DR

This work tackles robustness in federated learning by replacing the non-robust arithmetic-mean aggregator with a geometric-median-based approach, implemented via an iterative secure-aggregation protocol to preserve privacy. The resulting Robust Federated Aggregation (RFA) algorithm converges for least-squares additive models and tolerates corruption up to a breakdown point of $\tfrac{1}{2}$, even under data heterogeneity captured by a width parameter $\Omega$. The paper provides convergence analysis, introduces extensions (one-step RFA and on-device personalization), and demonstrates superior robustness to corrupted updates across vision and language tasks, while maintaining competitive performance in light corruption and staying privacy-preserving. The practical impact is a scalable, privacy-conscious FL framework with strong robustness guarantees and flexible variants to address heterogeneity and communication constraints. Open-source implementations in TensorFlow Federated further enable deployment in real-world, privacy-sensitive distributed learning settings.

Abstract

Federated learning is the centralized training of statistical models from decentralized data on mobile devices while preserving the privacy of each device. We present a robust aggregation approach to make federated learning robust to settings when a fraction of the devices may be sending corrupted updates to the server. The approach relies on a robust aggregation oracle based on the geometric median, which returns a robust aggregate using a constant number of iterations of a regular non-robust averaging oracle. The robust aggregation oracle is privacy-preserving, similar to the non-robust secure average oracle it builds upon. We establish its convergence for least squares estimation of additive models. We provide experimental results with linear models and deep networks for three tasks in computer vision and natural language processing. The robust aggregation approach is agnostic to the level of corruption; it outperforms the classical aggregation approach in terms of robustness when the level of corruption is high, while being competitive in the regime of low corruption. Two variants, a faster one with one-step robust aggregation and another one with on-device personalization, round off the paper.
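The idea of obtaining a robust aggregate from a few calls to an ordinary averaging routine can be illustrated with a smoothed Weiszfeld-style iteration for the geometric median. The following is a minimal sketch, not the authors' implementation: the function names, the smoothing constant `nu`, the iteration count, and the toy data are all assumptions made for illustration, and a plain weighted mean stands in for the secure average oracle.

```python
import math


def weighted_average(points, weights):
    """Hypothetical stand-in for the averaging oracle: a plain weighted mean."""
    total = sum(weights)
    dim = len(points[0])
    return [
        sum(w * p[k] for p, w in zip(points, weights)) / total
        for k in range(dim)
    ]


def smoothed_weiszfeld(points, alphas, nu=1e-6, num_iters=3):
    """Approximate the geometric median by repeated re-weighted averaging."""
    # Start from the ordinary weighted mean of the points.
    v = weighted_average(points, alphas)
    for _ in range(num_iters):
        # Points far from the current iterate get smaller weights;
        # nu keeps the denominator away from zero.
        betas = [a / max(nu, math.dist(v, p)) for p, a in zip(points, alphas)]
        # Each iteration makes exactly one call to the averaging routine.
        v = weighted_average(points, betas)
    return v


# Toy example (assumed data): four similar updates and one corrupted update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [100.0, -100.0]]
weights = [1.0] * len(updates)

mean = weighted_average(updates, weights)
median = smoothed_weiszfeld(updates, weights, num_iters=10)
```

In this toy setting the single corrupted point drags the arithmetic mean far from the cluster of honest updates, while the re-weighted aggregate stays close to it. Because each iteration only needs a weighted average, the same loop could in principle be run on top of any averaging primitive.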

Paper Structure

This paper contains 97 sections, 10 theorems, 71 equations, 12 figures, 3 tables, 5 algorithms.

Key Result

Proposition 2

The iterate $v^{(R)}$ of the smoothed Weiszfeld algorithm (algo:rfa:weiszfeld) with input $v^{(0)} \in \operatorname*{conv}\{w_1, \cdots, w_m\}$ and $\nu > 0$ satisfies where $v^\star = \operatorname*{arg\,min} g$ and $\overline \nu = \min_{r \in [R], i\in[m]} \nu \lor \|v^{(r-1)} - w_i\| \ge \nu$. Furthermore, if $0 < \nu \le \min_{i=1,\cdots, m} \|v^\star - w_i\|$, then it holds that $g(v^{(R)}) - g(v^\star) \le {2 \norms

Figures (12)

  • Figure 1: Left two: Convergence of the smoothed Weiszfeld algorithm. Right two: Visualization of the re-weighting $\beta_i / \alpha_i$, where $\beta_i$ is the weight of $w_i$ in $\mathrm{GM}((w_i), (\alpha_i)) = \sum_i \beta_i w_i$. See Appendix \ref{sec:a:expt:gm_algos} for details.
  • Figure 2: Comparison of robustness of RFA and FedAvg under data corruption (top) and update corruption (bottom). The left three plots for update corruption show omniscient corruption while the rightmost one shows Gaussian corruption. The shaded area denotes minimum and maximum over 5 random seeds.
  • Figure 3: Comparison of RFA with other robust aggregation algorithms on Sent140 with data corruption.
  • Figure 4: Comparison of methods plotted against number of calls to the secure average oracle for different corruption settings. For the case of omniscient corruption, FedAvg and SGD are not shown in the plot if they diverge. The shaded area denotes the maximum and minimum over 5 random seeds.
  • Figure 5: Robustness of one-step RFA.
  • ...and 7 more figures

Theorems & Definitions (31)

  • Definition 1
  • Proposition 2
  • Proposition 3
  • proof
  • Theorem 4
  • Remark 5
  • Theorem 6: jain2017parallelizing, jain2017markov
  • proof: Proof of \ref{thm:rfa:convergence}
  • Claim 8
  • proof
  • ...and 21 more