Agnostic Federated Learning

Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh

TL;DR

Agnostic Federated Learning (AFL) addresses the mismatch between training and test distributions in federated settings by optimizing a single central model for any mixture of client distributions. The authors develop data-dependent learning bounds using a weighted Rademacher complexity with a skewness term, and they derive a convex minimax optimization solved by a scalable stochastic algorithm (Stochastic-AFL) with convergence guarantees. Empirical results on Adult, Fashion-MNIST, and language-model tasks show AFL improves worst-domain performance compared to standard FL and domain-specific baselines, and extensions to domain clustering, priors over mixture weights, and personalization are explored. Overall, AFL provides a principled, robust framework for learning under distributional shift across multiple clients, with practical applicability to cloud services and domain adaptation scenarios.

Abstract

A key learning scenario in large-scale applications is that of federated learning, where a centralized model is trained based on data originating from a large number of clients. We argue that, with the existing training and inference, federated models can be biased towards different clients. Instead, we propose a new framework of agnostic federated learning, where the centralized model is optimized for any target distribution formed by a mixture of the client distributions. We further show that this framework naturally yields a notion of fairness. We present data-dependent Rademacher complexity guarantees for learning with this objective, which guide the definition of an algorithm for agnostic federated learning. We also give a fast stochastic optimization algorithm for solving the corresponding optimization problem, for which we prove convergence bounds, assuming a convex loss function and hypothesis set. We further empirically demonstrate the benefits of our approach in several datasets. Beyond federated learning, our framework and algorithm can be of interest to other learning scenarios such as cloud computing, domain adaptation, drifting, and other contexts where the training and test distributions do not coincide.
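
As a rough illustration of the objective described above, the sketch below minimizes the worst-case mixture loss, $\min_w \max_{\lambda \in \Lambda} \sum_k \lambda_k L_k(w)$, on a toy problem. It is not the paper's Stochastic-AFL algorithm: it uses full (non-stochastic) gradients, takes $\Lambda$ to be the whole simplex, and assumes hypothetical one-dimensional quadratic losses $L_k(w) = (w - c_k)^2$. The function names, step sizes, and the `centers` values are all assumptions made for this example.

```python
def project_to_simplex(v):
    """Euclidean projection of the list v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # the last index satisfying the condition sets the threshold
    return [max(x - theta, 0.0) for x in v]


def agnostic_minimax(centers, steps=5000, lr_w=0.01, lr_lam=0.001):
    """Projected gradient descent on w and ascent on lambda (toy setting).

    Assumes domain k has loss L_k(w) = (w - centers[k])**2.
    Returns the averaged iterate of w and the final mixture weights.
    """
    p = len(centers)
    w = 0.0
    lam = [1.0 / p] * p
    w_sum = 0.0
    for _ in range(steps):
        losses = [(w - c) ** 2 for c in centers]
        # Gradient of the lambda-weighted loss with respect to w.
        grad_w = sum(l * 2.0 * (w - c) for l, c in zip(lam, centers))
        # The gradient with respect to lambda_k is simply the loss of domain k.
        lam = project_to_simplex(
            [l + lr_lam * loss for l, loss in zip(lam, losses)]
        )
        w = w - lr_w * grad_w
        w_sum += w
    return w_sum / steps, lam


def worst_domain_loss(w, centers):
    """Largest per-domain loss at w, i.e. the max over mixtures."""
    return max((w - c) ** 2 for c in centers)


if __name__ == "__main__":
    centers = [0.0, 1.0, 10.0]               # hypothetical per-domain optima
    w_uniform = sum(centers) / len(centers)   # minimizer of the uniform mixture
    w_agnostic, lam = agnostic_minimax(centers)
    print("uniform :", w_uniform, worst_domain_loss(w_uniform, centers))
    print("agnostic:", w_agnostic, worst_domain_loss(w_agnostic, centers))
    print("lambda  :", lam)
```

In this toy setting the uniform-mixture minimizer sits at the mean of the centers, while the minimax solution moves toward the midpoint of the two most distant domains, which lowers the largest per-domain loss at the cost of a higher loss on the other domains.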

Paper Structure

This paper contains 23 sections, 9 theorems, 60 equations, 4 figures, 3 tables.

Key Result

Proposition 1

Let $\ell$ be the cross-entropy loss. Then, there exist $\Lambda$, ${\mathscr H}$, and ${\mathscr D}_k$, $k \in [p]$, such that the following inequality holds:

Figures (4)

  • Figure 1: Illustration of the agnostic federated learning scenario.
  • Figure 2: Illustration of the positions in $\Lambda$ of $\lambda^*$, $\lambda_{\overline {\mathscr U}}$, the mixture weight corresponding to the distribution $\overline {\mathscr U}$, and an arbitrary $\lambda$. $\lambda^*$ defines the least risky distribution $\overline {\mathscr D}_{\lambda^*}$ for which to optimize the expected loss.
  • Figure 3: Pseudocodes of the Stochastic-AFL and Optimistic Stochastic-AFL algorithms.
  • Figure 4: Definition of the stochastic gradients with respect to $\lambda$ and $w$.

Theorems & Definitions (17)

  • Proposition 1
  • Proof
  • Theorem 1
  • Proof
  • Lemma 1
  • Proof
  • Corollary 1
  • Theorem 2
  • Proof
  • Lemma 2
  • ...and 7 more