Few-for-Many Personalized Federated Learning

Ping Guo; Tiantian Zhang; Xi Lin; Xiang Li; Zhi-Ri Tang; Qingfu Zhang

TL;DR

The paper proves that this few-for-many framework achieves near-optimal personalization: the approximation error diminishes as $K$ increases, and each client's model converges to that client's optimum as data grows.

Abstract

Personalized Federated Learning (PFL) aims to train customized models for clients with highly heterogeneous data distributions while preserving data privacy. Existing approaches often rely on heuristics such as clustering or model interpolation, which lack principled mechanisms for balancing heterogeneous client objectives. Serving $M$ clients with distinct data distributions is inherently a multi-objective optimization problem, where achieving optimal personalization ideally requires $M$ distinct models on the Pareto front. However, maintaining $M$ separate models poses significant scalability challenges in federated settings with hundreds or thousands of clients. To address this challenge, we reformulate PFL as a few-for-many optimization problem that maintains only $K$ shared server models ($K \ll M$) to collectively serve all $M$ clients. We prove that this framework achieves near-optimal personalization: the approximation error diminishes as $K$ increases, and each client's model converges to that client's optimum as data grows. Building on this reformulation, we propose FedFew, a practical algorithm that jointly optimizes the $K$ server models through efficient gradient-based updates. Unlike clustering-based approaches that require manual client partitioning, or interpolation-based methods that demand careful hyperparameter tuning, FedFew automatically discovers the optimal model diversity through its optimization process. Experiments across vision, NLP, and real-world medical imaging datasets demonstrate that FedFew, with just 3 models, consistently outperforms other state-of-the-art approaches. Code is available at https://github.com/pgg3/FedFew.
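
To make the few-for-many reformulation concrete, the sketch below evaluates a fixed set of $K$ server models for $M$ clients from a precomputed loss matrix, with each client selecting its best-fitting model as described for Figure 1. It is a minimal sketch under stated assumptions, not FedFew itself: the matrix `losses[i][k]` standing for $L_i(\theta_k)$ and the function name `few_for_many_objective` are hypothetical, and the aggregate used here is the plain mean of each client's best loss, whereas FedFew optimizes a smooth Tchebycheff set scalarization.

```python
from typing import List, Tuple


def few_for_many_objective(losses: List[List[float]]) -> Tuple[float, List[int]]:
    """Evaluate K server models for M clients (hypothetical helper).

    losses[i][k] is assumed to hold L_i(theta_k), the loss of client i under
    server model k. Each client picks the model with the lowest loss; the
    function returns the mean of those per-client best losses together with
    the chosen model index for every client (ties go to the lower index).
    """
    assignment = [min(range(len(row)), key=lambda k: row[k]) for row in losses]
    mean_best = sum(row[k] for row, k in zip(losses, assignment)) / len(losses)
    return mean_best, assignment


# Toy example: M = 4 clients, K = 3 server models (made-up loss values).
losses = [
    [0.2, 0.9, 0.5],
    [0.8, 0.1, 0.6],
    [0.4, 0.7, 0.3],
    [0.3, 0.95, 0.6],
]
value, assignment = few_for_many_objective(losses)
print(assignment)       # [0, 1, 2, 0]
print(round(value, 3))  # 0.225
```

Adding a server model can only lower (never raise) each client's best loss, which gives one intuition for why the approximation error shrinks as $K$ grows.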

Paper Structure

This paper contains 37 sections, 6 theorems, 32 equations, 8 figures, 6 tables, and 1 algorithm.

Introduction
Related Work
Standard and Personalized Federated Learning
Multi-Objective Optimization in FL
PFL as Multi-Objective Optimization
Problem Setup and Client Objectives
Set-based Optimization: K-for-M Framework
FedFew Algorithm
Smooth Tchebycheff Set Scalarization
Decomposed Gradient Computation
Federated Implementation
Convergence Guarantees
Experiments
Experimental Setup
Main Results
...and 22 more sections

Key Result

Theorem 3.1

Let $\Theta^{(K)} = \{\theta_1, \ldots, \theta_K\}$ be the optimal solution with $K$ models for $M$ clients. Define $\Delta_{het} = \max_{i,j \in [M]} [L_i(\theta_j^*) - L_i(\theta_i^*)]$ as the maximum pairwise heterogeneity. Then the average error across clients is bounded by: where $\theta_i^* = \arg\min_\theta L_i(\theta)$ is client $i$'s optimal personalized model, $d$ is the model complexity…
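
The heterogeneity constant $\Delta_{het}$ can be computed directly from a cross-loss matrix. The sketch below does this and also illustrates, on made-up numbers, how an average excess loss behaves as $K$ grows. Assumptions: `cross_loss[i][j]` stands for $L_i(\theta_j^*)$; the average error is taken to be $\frac{1}{M}\sum_i [\min_k L_i(\theta_k) - L_i(\theta_i^*)]$; and, purely for illustration, the $K$ server models are restricted to a subset of the per-client optima and chosen by brute force. The helper names are hypothetical, and this is not the construction used in the proof.

```python
from itertools import combinations
from typing import List


def max_heterogeneity(cross_loss: List[List[float]]) -> float:
    """Delta_het = max_{i,j} [L_i(theta_j^*) - L_i(theta_i^*)].

    cross_loss[i][j] is assumed to hold L_i(theta_j^*), so the diagonal entry
    cross_loss[i][i] is client i's optimal loss L_i(theta_i^*).
    """
    m = len(cross_loss)
    return max(cross_loss[i][j] - cross_loss[i][i] for i in range(m) for j in range(m))


def best_subset_error(cross_loss: List[List[float]], k: int) -> float:
    """Smallest mean excess loss when the k server models must be chosen from
    the per-client optima {theta_1^*, ..., theta_M^*}. This restriction is an
    illustrative assumption; exhaustive search only suits tiny M."""
    m = len(cross_loss)
    best = float("inf")
    for subset in combinations(range(m), k):
        # Each client uses its best model in the subset; excess is measured
        # against the client's own optimum on the diagonal.
        excess = sum(
            min(cross_loss[i][j] for j in subset) - cross_loss[i][i]
            for i in range(m)
        )
        best = min(best, excess / m)
    return best


cross_loss = [
    [0.1, 0.5, 0.9],
    [0.4, 0.2, 0.8],
    [0.7, 0.6, 0.3],
]
print(round(max_heterogeneity(cross_loss), 3))  # 0.8
for k in range(1, 4):
    # prints "1 0.2", then "2 0.067", then "3 0.0"
    print(k, round(best_subset_error(cross_loss, k), 3))
```

With $K = M$ and the candidate pool equal to the per-client optima, the error reaches zero; every average error here is also at most $\Delta_{het}$, since each client's excess loss under any $\theta_j^*$ is bounded by it.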

Figures (8)

Figure 1: Paradigms of Personalized Federated Learning. Left: Centralized methods maintain a single global model for all $M$ clients, failing to capture client heterogeneity. Center: Per-client methods train $M$ independent models, sacrificing collaborative learning benefits and suffering from data scarcity. Right: Our proposed few-for-many approach maintains $K$ server models ($K \ll M$) that collectively serve all clients. Each client selects the best-fitting model, achieving strong personalization while preserving collaboration.
Figure 2: Sensitivity Studies. (a) Test accuracy vs. $K$ on CIFAR-10, with FedAvg baselines (dashed) shown for comparison. (b) Evolution over training rounds (log scale) for different $K$ values.
Figure 3: Mean client accuracy comparison across communication configurations. All configurations achieve comparable mean client accuracy (87.8–88.3%), demonstrating that our method is robust to different communication-computation trade-offs.
Figure 4: Communication-computation trade-off. Convergence of $g^{\text{STCH-Set}}$ vs. total local updates for different (local epochs, communication rounds) configurations. Local epochs (LE) $\in \{1, 2, 4, 8, 16\}$, with the number of communication rounds (GR) chosen so that the total stays at 2000 local updates.
Figure 5: Fairness and weight diversity analysis. (a) Performance metrics across different $\mu$ values show that mean accuracy is relatively stable, but worst-case (minimum) accuracy drops significantly at $\mu=0.1$, suggesting a phase transition region. (b) The relationship between fairness (accuracy standard deviation, red) and outer weight diversity (coefficient of variation of $\alpha_i$, blue) reveals that both extreme values ($\mu \to 0$ and $\mu \to \infty$) achieve better fairness than intermediate values, with outer weight diversity decreasing monotonically as $\mu$ increases.
...and 3 more figures
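
Figures 4 and 5 refer to the objective $g^{\text{STCH-Set}}$, its smoothing parameter $\mu$, and the outer weights $\alpha_i$. The sketch below shows one way to compute these from a loss matrix, assuming the smooth Tchebycheff set form of lin2025few with unit preference weights and a zero ideal point: an inner log-sum-exp smooth minimum over the $K$ models for each client, followed by an outer log-sum-exp smooth maximum over the $M$ clients. Reading $\alpha_i$ as the softmax weights of the outer smooth maximum is our assumption, as are the function names `logsumexp` and `stch_set`; the paper's exact weighting may differ.

```python
import math
import statistics
from typing import List, Tuple


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def stch_set(losses: List[List[float]], mu: float) -> Tuple[float, List[float], List[List[float]]]:
    """Smooth Tchebycheff set scalarization (sketch: unit weights, zero ideal point).

    losses[i][k] = L_i(theta_k). Returns the objective g, the outer weights
    alpha_i (softmax over clients) and the inner weights beta_ik (softmax over
    models, one row per client).
    """
    # Inner smooth minimum over the K server models, one value per client.
    inner = [-mu * logsumexp([-l / mu for l in row]) for row in losses]
    # Outer smooth maximum over the M clients.
    g = mu * logsumexp([v / mu for v in inner])
    # dg/d(inner_i): softmax weights over clients, summing to 1.
    alpha = [math.exp((v - g) / mu) for v in inner]
    # d(inner_i)/dL_i(theta_k): softmax over models for each client.
    beta = [[math.exp((v - l) / mu) for l in row] for v, row in zip(inner, losses)]
    return g, alpha, beta


# Made-up losses: M = 4 clients, K = 3 server models.
losses = [
    [0.2, 0.9, 0.5],
    [0.8, 0.1, 0.6],
    [0.4, 0.7, 0.3],
    [0.3, 0.95, 0.6],
]
for mu in (0.01, 0.1, 1.0):
    g, alpha, _ = stch_set(losses, mu)
    # Coefficient of variation of the outer weights ("outer weight diversity").
    cv = statistics.pstdev(alpha) / statistics.mean(alpha)
    print(f"mu={mu}: g={g:.4f}, CV(alpha)={cv:.3f}")
```

By the chain rule, the gradient of $g$ with respect to server model $\theta_k$ is $\sum_i \alpha_i \beta_{ik} \nabla L_i(\theta_k)$, so each client's contribution to model $k$ is scaled by the product of its outer and inner weights. As $\mu \to 0$ the weights concentrate on the worst-off client and its best model; as $\mu$ grows they spread out.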

Theorems & Definitions (17)

Definition 3.1: Pareto Optimality (miettinen1999nonlinear)
Theorem 3.1: Convergence of K-for-M Framework
Remark 3.1: Convergence to Optimal Solution
Theorem 4.1: Uniform Smooth Approximation (lin2025few)
Theorem 4.2: Pareto Properties of STCH-Set (lin2025few)
Definition A.1: Maximum Heterogeneity
Lemma A.1: Pareto Optimality of K-for-M Solution
Proof
Lemma A.2: Pareto Coverage Bound
Proof
...and 7 more
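
Theorem 4.1 is cited from lin2025few, and its statement is not reproduced in this excerpt. For orientation, the elementary log-sum-exp bounds below show why a smooth Tchebycheff set objective of the assumed form (unit weights, zero ideal point, a single smoothing parameter $\mu$) approximates its non-smooth counterpart uniformly, with an error that vanishes as $\mu \to 0$; the exact constants in the theorem may differ.

```latex
% Assumed notation: a_{ik} = L_i(\theta_k) for client i \in [M], model k \in [K]
\begin{align*}
s_i &= -\mu \log \sum_{k=1}^{K} e^{-a_{ik}/\mu}, &
g^{\text{STCH-Set}}_{\mu}(\Theta) &= \mu \log \sum_{i=1}^{M} e^{s_i/\mu}, \\
\min_{k} a_{ik} - \mu \log K &\le s_i \le \min_{k} a_{ik}, &
\max_{i} s_i &\le g^{\text{STCH-Set}}_{\mu}(\Theta) \le \max_{i} s_i + \mu \log M.
\end{align*}
% Chaining the two rows of bounds:
\[
g^{\text{TCH-Set}}(\Theta) - \mu \log K \;\le\; g^{\text{STCH-Set}}_{\mu}(\Theta) \;\le\; g^{\text{TCH-Set}}(\Theta) + \mu \log M,
\qquad g^{\text{TCH-Set}}(\Theta) = \max_{i \in [M]} \min_{k \in [K]} a_{ik}.
\]
```

Both error terms scale linearly in $\mu$, so the smooth objective can be made as close to the Tchebycheff set objective as desired, at the cost of sharper, less smooth weights.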
