Few-for-Many Personalized Federated Learning

Ping Guo, Tiantian Zhang, Xi Lin, Xiang Li, Zhi-Ri Tang, Qingfu Zhang

TL;DR

The few-for-many framework, which maintains only $K$ shared server models to serve $M$ clients ($K \ll M$), is proved to achieve near-optimal personalization: the approximation error diminishes as $K$ increases, and each client's model converges to that client's optimum as data grows.

Abstract

Personalized Federated Learning (PFL) aims to train customized models for clients with highly heterogeneous data distributions while preserving data privacy. Existing approaches often rely on heuristics like clustering or model interpolation, which lack principled mechanisms for balancing heterogeneous client objectives. Serving $M$ clients with distinct data distributions is inherently a multi-objective optimization problem, where achieving optimal personalization ideally requires $M$ distinct models on the Pareto front. However, maintaining $M$ separate models poses significant scalability challenges in federated settings with hundreds or thousands of clients. To address this challenge, we reformulate PFL as a few-for-many optimization problem that maintains only $K$ shared server models ($K \ll M$) to collectively serve all $M$ clients. We prove that this framework achieves near-optimal personalization: the approximation error diminishes as $K$ increases and each client's model converges to each client's optimum as data grows. Building on this reformulation, we propose FedFew, a practical algorithm that jointly optimizes the $K$ server models through efficient gradient-based updates. Unlike clustering-based approaches that require manual client partitioning or interpolation-based methods that demand careful hyperparameter tuning, FedFew automatically discovers the optimal model diversity through its optimization process. Experiments across vision, NLP, and real-world medical imaging datasets demonstrate that FedFew, with just 3 models, consistently outperforms other state-of-the-art approaches. Code is available at https://github.com/pgg3/FedFew.

Paper Structure (37 sections, 6 theorems, 32 equations, 8 figures, 6 tables, 1 algorithm)

Key Result

Theorem 3.1

Let $\Theta^{(K)} = \{\theta_1, \ldots, \theta_K\}$ be the optimal solution with $K$ models for $M$ clients. Define $\Delta_{het} = \max_{i,j \in [M]} [L_i(\theta_j^*) - L_i(\theta_i^*)]$ as the maximum pairwise heterogeneity. Then the average error across clients is bounded by: where $\theta_i^* = \arg\min_\theta L_i(\theta)$ is client $i$'s optimal personalized model, $d$ is the model complexit
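The heterogeneity term $\Delta_{het}$ defined above can be illustrated with a small numeric sketch. It assumes a hypothetical matrix `cross_losses` whose entry `[i][j]` holds $L_i(\theta_j^*)$, the loss of client $i$ under client $j$'s optimal model; the numbers are invented for illustration and do not come from the paper.

```python
def max_pairwise_heterogeneity(cross_losses):
    """Return max over i, j of L_i(theta_j^*) - L_i(theta_i^*).

    cross_losses[i][j] is assumed to hold L_i(theta_j^*), so the diagonal
    entry cross_losses[i][i] is client i's own optimal loss.
    """
    return max(
        row[j] - row[i]
        for i, row in enumerate(cross_losses)
        for j in range(len(row))
    )


# Hypothetical 3-client example (values are made up).
cross_losses = [
    [0.10, 0.50, 0.30],
    [0.40, 0.20, 0.60],
    [0.35, 0.45, 0.15],
]
delta_het = max_pairwise_heterogeneity(cross_losses)
print(f"Delta_het is approximately {delta_het:.2f}")
```

In this toy matrix the largest gap arises when a client is served by another client's optimum, which is the quantity the bound in Theorem 3.1 depends on.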

Figures (8)

  • Figure 1: Paradigms of Personalized Federated Learning. Left: Centralized methods maintain a single global model for all $M$ clients, failing to capture client heterogeneity. Center: Per-client methods train $M$ independent models, sacrificing collaborative learning benefits and suffering from data scarcity. Right: Our proposed few-for-many approach maintains $K$ server models ($K \ll M$) that collectively serve all clients. Each client selects the best-fitting model, achieving strong personalization while preserving collaboration.
  • Figure 2: Sensitivity Studies. (a) Test accuracy vs K on CIFAR-10. FedAvg baselines (dashed) shown for comparison. (b) Evolution over training rounds (log scale) for different K values.
  • Figure 3: Mean client accuracy comparison across communication configurations. All configurations achieve comparable mean client accuracy (87.8–88.3%), demonstrating that our method is robust to different communication-computation trade-offs.
  • Figure 4: Communication-computation trade-off. Convergence of $g^{\text{STCH-Set}}$ vs total local updates for different (local epochs, communication rounds) configurations. Local epochs (LE) $\in \{1, 2, 4, 8, 16\}$ with corresponding communication rounds (GR) to maintain 2000 total updates.
  • Figure 5: Fairness and weight diversity analysis. (a) Performance metrics across different $\mu$ values show that mean accuracy is relatively stable, but worst-case (minimum) accuracy drops significantly at $\mu=0.1$, suggesting a phase transition region. (b) The relationship between fairness (accuracy standard deviation, red) and outer weight diversity (coefficient of variation of $\alpha_i$, blue) reveals that both extreme values ($\mu \to 0$ and $\mu \to \infty$) achieve better fairness than intermediate values, with outer weight diversity decreasing monotonically as $\mu$ increases.
  • ...and 3 more figures
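
The selection step described in the Figure 1 caption, where each client picks the best-fitting of the $K$ server models, can be sketched as follows. This is a minimal illustration, not the FedFew training procedure (which optimizes the smoothed $g^{\text{STCH-Set}}$ objective); the function name `select_best_models` and the loss values are hypothetical.

```python
def select_best_models(losses):
    """Assign each client to the server model with the lowest loss.

    losses[i][k] is assumed to be the loss of client i under server model k.
    Returns (assignments, best_losses), one entry per client.
    """
    assignments = [min(range(len(row)), key=row.__getitem__) for row in losses]
    best_losses = [row[k] for row, k in zip(losses, assignments)]
    return assignments, best_losses


# Hypothetical setting: M = 4 clients, K = 2 server models (values are made up).
losses = [
    [0.20, 0.90],
    [0.80, 0.30],
    [0.25, 0.70],
    [0.60, 0.40],
]
assignments, best_losses = select_best_models(losses)
mean_loss = sum(best_losses) / len(best_losses)  # average across clients
worst_loss = max(best_losses)                    # worst-served client
print(assignments, mean_loss, worst_loss)
```

Reporting both the mean and the worst-case loss mirrors the mean versus minimum accuracy comparison in Figure 5, although the exact metrics used in the paper may differ.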

Theorems & Definitions (17)

  • Definition 3.1: Pareto Optimality [miettinen1999nonlinear]
  • Theorem 3.1: Convergence of K-for-M Framework
  • Remark 3.1: Convergence to Optimal Solution
  • Theorem 4.1: Uniform Smooth Approximation [lin2025few]
  • Theorem 4.2: Pareto Properties of STCH-Set [lin2025few]
  • Definition A.1: Maximum Heterogeneity
  • Lemma A.1: Pareto Optimality of K-for-M Solution
  • proof
  • Lemma A.2: Pareto Coverage Bound
  • proof
  • ...and 7 more