Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training?
Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su
TL;DR
This work analyzes personalization in federated learning through a minimax lens, focusing on how data heterogeneity across clients shapes optimal strategies. It introduces a dichotomy between FedAvg and pure local training: FedAvg is minimax-rate optimal under small heterogeneity, while pure local training is optimal under large heterogeneity, with a sharp threshold at $R^2 \asymp m/N$. A new concept, federated stability, enables precise upper bounds on individualized and weighted excess risks, and the authors show that a simple dichotomous strategy between the two baselines attains minimax optimality across problem instances. The paper also proves minimax lower bounds via logistic-regression constructions and demonstrates that FedAvg followed by local fine-tuning can be minimax-optimal under certain regularity conditions, while providing a detailed stability-based analysis of FedProx. These results illuminate the fundamental trade-offs in personalized FL and offer practical guidance for algorithm selection and hybrid strategies in heterogeneous data settings.
Abstract
A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is therefore necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning using a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FedAvg algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an approximate alternative between these two baseline algorithms for federated learning: the former algorithm is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant. As an implication, our results show that from a worst-case point of view, a dichotomous strategy that makes a choice between the two baseline algorithms is rate-optimal. Another implication is that the popular strategy of FedAvg followed by local fine-tuning is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.
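To make the dichotomous strategy concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. It assumes scalar mean estimation under squared loss (a smooth, strongly convex loss), where the pooled sample mean stands in for the FedAvg solution and each client's own sample mean stands in for pure local training. It also assumes that in the threshold $R^2 \asymp m/N$, $R$ measures heterogeneity, $m$ is the number of clients, and $N$ is the total sample count; these readings, the function names, and the constant `1.0` in the threshold are illustrative assumptions. In practice $R$ is unknown and would have to be estimated or bounded.

```python
from statistics import fmean


def local_estimates(client_data):
    """Pure local training: each client uses only its own sample mean."""
    return [fmean(samples) for samples in client_data]


def fedavg_estimate(client_data):
    """Toy stand-in for FedAvg: the pooled mean over all clients' samples."""
    total_samples = sum(len(samples) for samples in client_data)
    return sum(sum(samples) for samples in client_data) / total_samples


def choose_baseline(r_squared, num_clients, total_samples, constant=1.0):
    """Pick a baseline by comparing R^2 with constant * m / N.

    The constant is hypothetical; the result is stated only up to constants.
    """
    threshold = constant * num_clients / total_samples
    return "fedavg" if r_squared <= threshold else "local"


def personalized_estimates(client_data, r_squared, constant=1.0):
    """Return one estimate per client using the dichotomous strategy."""
    num_clients = len(client_data)
    total_samples = sum(len(samples) for samples in client_data)
    choice = choose_baseline(r_squared, num_clients, total_samples, constant)
    if choice == "fedavg":
        # Small heterogeneity: every client shares the pooled estimate.
        return [fedavg_estimate(client_data)] * num_clients
    # Large heterogeneity: every client keeps its own local estimate.
    return local_estimates(client_data)


if __name__ == "__main__":
    data = [[1.0, 3.0], [5.0]]  # two clients, three samples in total
    print(personalized_estimates(data, r_squared=0.1))
    print(personalized_estimates(data, r_squared=4.0))
```

With two clients and three samples, the assumed threshold is $2/3$, so the first call falls on the FedAvg side and the second on the local-training side. The sketch only illustrates the selection rule; it says nothing about the rates themselves.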
