Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training?

Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su

TL;DR

This work analyzes personalization in federated learning through a minimax lens, focusing on how data heterogeneity across clients shapes optimal strategies. It introduces a dichotomy between FedAvg and pure local training: FedAvg is minimax-rate optimal under small heterogeneity, while pure local training is optimal under large heterogeneity, with a sharp threshold at $R^2 \asymp m/N$. A new concept, federated stability, enables precise upper bounds on individualized and weighted excess risks, and the authors show that a simple dichotomous strategy between the two baselines attains minimax optimality across problem instances. The paper also proves minimax lower bounds via logistic-regression constructions and demonstrates that FedAvg followed by local fine-tuning can be minimax-optimal under certain regularity conditions, while providing a detailed stability-based analysis of FedProx. These results illuminate the fundamental trade-offs in personalized FL and offer practical guidance for algorithm selection and hybrid strategies in heterogeneous data settings.
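The dichotomous strategy summarized above can be sketched as a simple selection rule. This is only an illustration under stated assumptions: the function name `choose_baseline`, the constant `c`, and the reading of $N$ as the total sample size across the $m$ clients are not taken from the summary. The threshold $R^2 \asymp m/N$ is a rate statement, so the constant is unspecified, and the heterogeneity level $R^2$ is typically unknown in practice and would have to be estimated or supplied.

```python
def choose_baseline(r_squared: float, m: int, total_n: int, c: float = 1.0) -> str:
    """Pick one of the two baselines by comparing R^2 with c * m / N.

    r_squared: assumed (or estimated) heterogeneity level R^2.
    m:         number of clients.
    total_n:   total number of samples N over all clients (assumed meaning of N).
    c:         hypothetical constant; the threshold is only known up to a constant.
    """
    if m <= 0 or total_n <= 0:
        raise ValueError("m and total_n must be positive")
    if r_squared < 0:
        raise ValueError("r_squared must be non-negative")
    threshold = c * m / total_n
    # Small heterogeneity: pool information with FedAvg.
    # Large heterogeneity: each client trains on its own data.
    return "fedavg" if r_squared <= threshold else "pure_local_training"
```

For example, with 10 clients and 1000 samples in total the threshold is `0.01` when `c = 1.0`, so a heterogeneity level of `0.001` selects FedAvg and `0.5` selects pure local training.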

Abstract

A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is therefore necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning using a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FedAvg algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an approximate alternative between these two baseline algorithms for federated learning: the former algorithm is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant. As an implication, our results show that from a worst-case point of view, a dichotomous strategy that makes a choice between the two baseline algorithms is rate-optimal. Another implication is that the popular strategy of FedAvg followed by local fine-tuning is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.
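The trade-off between the two baselines can be seen in a toy simulation. The sketch below is not the paper's setting or experiment. It assumes scalar mean estimation with the squared loss $\tfrac{1}{2}(w - z)^2$, which is smooth and strongly convex, unit-variance Gaussian noise, equal local sample sizes, and client optima drawn uniformly within distance `R` of a common center. FedAvg is idealized here as the pooled empirical risk minimizer rather than run as an iterative algorithm, and the function name `simulate` is hypothetical.

```python
import random


def simulate(m: int, n: int, R: float, seed: int = 0) -> tuple[float, float]:
    """Return (pooled_error, local_error), averaged over the m clients.

    Each error is the mean squared distance between an estimate and the
    client's true optimum; under the squared loss this is proportional to
    the excess risk.
    """
    rng = random.Random(seed)
    # Assumption: client optima lie within distance R of a common center 0.
    thetas = [rng.uniform(-R, R) for _ in range(m)]

    # Pure local training: each client's ERM is its own sample mean.
    local_means = []
    for theta in thetas:
        samples = [rng.gauss(theta, 1.0) for _ in range(n)]
        local_means.append(sum(samples) / n)

    # Idealized FedAvg: with equal local sample sizes, the pooled ERM is
    # the plain average of the local sample means.
    pooled = sum(local_means) / m

    pooled_error = sum((pooled - t) ** 2 for t in thetas) / m
    local_error = sum((lm - t) ** 2 for lm, t in zip(local_means, thetas)) / m
    return pooled_error, local_error
```

With `m = 50` and `n = 20`, calling `simulate(50, 20, R=0.0)` models identical clients, where pooling all 1000 samples should beat 20-sample local means, while `simulate(50, 20, R=5.0)` models strongly heterogeneous clients, where the single pooled estimate is far from most client optima and local training should win.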

Paper Structure

This paper contains 38 sections, 26 theorems, 167 equations, 1 figure, 2 algorithms.

Key Result

Theorem 4

Let Assumption assump:regularity(b) hold and assume $n_i \geq 4\beta/\mu ~\forall i\in[m]$. Then the algorithm ${\mathcal{A}}_\textnormal{PLT}$ which outputs the minimizer of eq:pure_local_training satisfies for all $i = 1, \ldots, m$.

Figures (1)

  • Figure 1: Average classification accuracy of FedAvg, PureLocalTraining, and FedAvg followed by fine-tuning (left panel), as well as FedProx with different choices of $\lambda$ (right panel).

Theorems & Definitions (31)

  • Definition 1: Individualized excess risk
  • Definition 2: $\mathbf{p}$-average excess risk
  • Definition 3: Uniform stability
  • Theorem 4: Performance of PureLocalTraining
  • Definition 5: Federated stability
  • Theorem 6: Performance of FedAvg
  • Lemma 7: Logistic regressions are valid problem instances
  • Theorem 8: Minimax lower bounds for estimation errors
  • Corollary 9: Minimax lower bounds for excess errors
  • Proposition 10: Implications of federated stability restricted to FedProx
  • ...and 21 more