Distributionally Robust Federated Learning: An ADMM Algorithm

Wen Bai, Yi Wong, Xiao Qiao, Chin Pang Ho

TL;DR

This work tackles distributional heterogeneity and ambiguity in federated learning by introducing Distributionally Robust Federated Learning (DRFL) with per-client Wasserstein ambiguity sets. It derives a tractable reformulation that converts the inner worst-case expectation into a finite-dimensional optimization problem and embeds it in a min-max-min structure amenable to ADMM-based optimization (AD-LPMM). The proposed algorithm comes with convergence guarantees in the convex setting and converges in practice to critical points otherwise, while letting each client update its worst-case loss locally in a distributed fashion. Empirical results on SVM and Huber regression tasks across several datasets show that DRFL consistently outperforms AFL, WAFL, and other robust baselines under data heterogeneity and distributional ambiguity, highlighting its potential for robust FL in real-world heterogeneous environments.
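
To make the phrase "converts the inner worst-case expectation into a finite-dimensional optimization" concrete, the display below sketches the standard strong-duality result for a type-1 Wasserstein ball. It is not a restatement of the paper's Proposition 4.2. The notation is an assumption made for illustration: $\hat{\mathbb{P}}_s$ is client $s$'s empirical distribution on samples $\hat{\xi}_{s,1},\dots,\hat{\xi}_{s,N_s}$, $\rho_s$ is the ball radius, $d$ is the transport cost on the support $\Xi$, and $\ell(\bm{w};\xi)$ is the loss. Under mild regularity conditions on $\ell$:

$$
\sup_{\mathbb{Q}\,:\,W(\mathbb{Q},\hat{\mathbb{P}}_s)\le\rho_s} \mathbb{E}_{\mathbb{Q}}\big[\ell(\bm{w};\xi)\big]
\;=\;
\inf_{\lambda\ge 0}\;\Big\{\lambda\rho_s + \frac{1}{N_s}\sum_{i=1}^{N_s}\,\sup_{\xi\in\Xi}\big(\ell(\bm{w};\xi)-\lambda\, d(\xi,\hat{\xi}_{s,i})\big)\Big\}
$$

The left-hand side optimizes over distributions, while the right-hand side has one scalar multiplier $\lambda$ plus one inner supremum per sample. Introducing an epigraph variable for each inner supremum yields a problem over $(\lambda, t_1,\dots,t_{N_s})\in\mathbb{R}_+\times\mathbb{R}^{N_s}$.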

Abstract

Federated learning (FL) aims to train machine learning (ML) models collaboratively using decentralized data, bypassing the need for centralized data aggregation. Standard FL models often assume that all data come from the same unknown distribution. However, in practical situations, decentralized data frequently exhibit heterogeneity. We propose a novel FL model, Distributionally Robust Federated Learning (DRFL), that applies distributionally robust optimization to overcome the challenges posed by data heterogeneity and distributional ambiguity. We derive a tractable reformulation for DRFL and develop a novel solution method based on the alternating direction method of multipliers (ADMM) algorithm to solve this problem. Our experimental results demonstrate that DRFL outperforms standard FL models under data heterogeneity and ambiguity.
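
As a rough illustration of how ADMM splits a federated objective into client-local updates and a server aggregation step, the sketch below runs textbook global-consensus ADMM (scaled dual form) on a least-squares problem. It is not the paper's AD-LPMM algorithm and contains no distributional robustness. The function names, the synthetic clients, and the choice of `rho` are all assumptions made for the example.

```python
import numpy as np

def consensus_admm_least_squares(A_list, b_list, rho=1.0, n_iters=300):
    """Generic consensus ADMM for min_w sum_s 0.5 * ||A_s w - b_s||^2.

    Illustrative only: each client s keeps a local copy x[s] and a scaled
    dual variable u[s]; the server keeps the consensus variable z.
    """
    n_clients = len(A_list)
    d = A_list[0].shape[1]
    x = np.zeros((n_clients, d))
    u = np.zeros((n_clients, d))
    z = np.zeros(d)
    # Each client can precompute its own normal-equation pieces once.
    lhs = [A.T @ A + rho * np.eye(d) for A in A_list]
    rhs = [A.T @ b for A, b in zip(A_list, b_list)]
    for _ in range(n_iters):
        # Client-local step: closed-form minimizer of
        # 0.5 * ||A_s x - b_s||^2 + (rho / 2) * ||x - z + u_s||^2.
        for s in range(n_clients):
            x[s] = np.linalg.solve(lhs[s], rhs[s] + rho * (z - u[s]))
        # Server step: average the clients' (primal + scaled dual) variables.
        z = (x + u).mean(axis=0)
        # Dual step: accumulate each client's disagreement with the consensus.
        u += x - z
    return z

def make_clients(seed=0, n_clients=3, n_samples=40, d=3):
    """Hypothetical heterogeneous clients: feature means differ per client."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0, 0.5])[:d]
    A_list, b_list = [], []
    for s in range(n_clients):
        A = rng.normal(loc=0.5 * s, scale=1.0, size=(n_samples, d))
        b = A @ w_true + 0.1 * rng.normal(size=n_samples)
        A_list.append(A)
        b_list.append(b)
    return A_list, b_list

if __name__ == "__main__":
    A_list, b_list = make_clients()
    w_admm = consensus_admm_least_squares(A_list, b_list, rho=40.0)
    w_pooled, *_ = np.linalg.lstsq(
        np.vstack(A_list), np.concatenate(b_list), rcond=None
    )
    print("ADMM consensus:", w_admm)
    print("Pooled solve:  ", w_pooled)
```

In this sketch only `x[s] + u[s]` leaves a client, never the raw data `A_s, b_s`, which is the property that makes ADMM-style splitting attractive for federated settings. The consensus iterate should agree with the solution obtained by pooling all client data, since both minimize the same sum of local objectives.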

Paper Structure

This paper contains 30 sections, 13 theorems, 88 equations, 5 figures, 1 algorithm.

Key Result

Proposition 4.2

For every $s\in\mathcal{S}$ and any fixed $\bm{w}\in\mathcal{W}$, where $\Omega_s \subseteq \mathbb{R}_+\times \mathbb{R}^{N_s}\times \mathcal{W}$ is defined as the following set. In particular, problem (eq:origin_problem) is equivalent to

Figures (5)

  • Figure 1: $P (\mathbb{P}^\star \subseteq \mathbb{P})$ at different radii for the two models (left), and the normalized distribution volume inside the ambiguity sets of the two models at different guarantee levels of containing $\mathbb{P}^\star$ (right).
  • Figure 2: (Heart and Abalone datasets) Comparing noise resilience among our proposed DRFL, DRFA deng2020distributionally, AFL mohri2019agnostic, WAFL nguyen2022generalization and Standard FL (baseline). Gaussian noise with increasing mean and fixed standard deviation (SD) (left). Gaussian noise with increasing SD and fixed mean (middle). Gaussian noise with mean = constant $*$ SD (right). Remark: DRFA and standard HR exhibit equally suboptimal performance, and their results overlap on the Abalone dataset.
  • Figure 3: (Breast-cancer dataset) Comparing noise resilience among our proposed DRFL, DRFA deng2020distributionally, AFL mohri2019agnostic, WAFL nguyen2022generalization and Standard SVM (baseline). Gaussian noise with increasing mean and fixed SD = 2 (left). Gaussian noise with increasing SD and fixed mean = 0 (middle). Gaussian noise with mean = 2 * SD (right).
  • Figure 4: (Breast-cancer dataset) Comparing noise resilience in the imbalanced-class scenario among our proposed DRFL, DRFA deng2020distributionally, AFL mohri2019agnostic, WAFL nguyen2022generalization and Standard SVM (baseline). Gaussian noise with increasing mean and fixed SD = 0.4 (left). Gaussian noise with increasing SD and fixed mean = 0.1 (middle). Gaussian noise with mean = 2 * SD (right).
  • Figure 5: (Heart dataset) Comparing noise resilience in the imbalanced-class scenario among our proposed DRFL, DRFA deng2020distributionally, AFL mohri2019agnostic, WAFL nguyen2022generalization and Standard SVM (baseline). Gaussian noise with increasing mean and fixed SD = 0.1 (left). Gaussian noise with increasing SD and fixed mean = 0.5 (middle). Gaussian noise with mean = 0.5 * SD (right).
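
The three noise regimes named in the captions (increasing mean with fixed SD, increasing SD with fixed mean, and mean proportional to SD) can be sketched as parameter sweeps. The snippet below is a minimal illustration, not the authors' experimental code. It assumes the noise is added to the feature matrix, which the captions do not state, and the helper names and sweep levels are hypothetical.

```python
import numpy as np

def add_gaussian_noise(X, mean, sd, seed=0):
    """Return X plus i.i.d. Gaussian noise N(mean, sd^2) on every entry.

    Assumption: noise perturbs features; the captions do not specify this.
    """
    rng = np.random.default_rng(seed)
    return X + rng.normal(loc=mean, scale=sd, size=X.shape)

def noise_sweeps(levels, fixed_sd, fixed_mean, ratio):
    """Build (mean, sd) pairs for the three panel types in the captions."""
    return {
        # Left panels: mean grows, SD held fixed.
        "increasing_mean": [(m, fixed_sd) for m in levels],
        # Middle panels: SD grows, mean held fixed.
        "increasing_sd": [(fixed_mean, s) for s in levels],
        # Right panels: mean = ratio * SD.
        "coupled": [(ratio * s, s) for s in levels],
    }

if __name__ == "__main__":
    # Fixed values taken from the Figure 3 caption; the levels are made up.
    sweeps = noise_sweeps([0.5, 1.0, 2.0], fixed_sd=2.0, fixed_mean=0.0, ratio=2.0)
    X = np.zeros((5, 3))  # placeholder features
    for name, settings in sweeps.items():
        for mean, sd in settings:
            X_noisy = add_gaussian_noise(X, mean, sd)
            print(name, mean, sd, X_noisy.shape)
```

Each `(mean, sd)` pair would correspond to one point on the horizontal axis of a panel, with every method evaluated on the same perturbed data.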

Theorems & Definitions (14)

  • Example 4.1
  • Proposition 4.2
  • Theorem 4.3
  • Corollary 6.1
  • Corollary 6.2
  • Proposition 6.3
  • Proposition 6.4
  • Proposition A.1
  • Corollary B.1
  • Corollary B.2
  • ...and 4 more