Federated Ensemble Learning with Progressive Model Personalization

Ala Emrani, Amir Najafi, Abolfazl Motahari

TL;DR

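This work proposes a boosting-inspired framework that enables smooth control of a fundamental tradeoff in personalized federated learning (PFL): expressive shared components hinder personalization, whereas large local heads overfit limited per-client data. It proves that the complexity of the shared layers is effectively suppressed, while the dependence on the boosting horizon $T$ is controlled through parameter reduction.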

Abstract

Federated Learning provides a privacy-preserving paradigm for distributed learning, but suffers from statistical heterogeneity across clients. Personalized Federated Learning (PFL) mitigates this issue by learning client-specific models. A widely adopted approach in PFL decomposes neural networks into a shared feature extractor and client-specific heads. While effective, this design induces a fundamental tradeoff: deep or expressive shared components hinder personalization, whereas large local heads exacerbate overfitting under limited per-client data. Most existing methods rely on rigid, shallow heads, and therefore fail to navigate this tradeoff in a principled manner. In this work, we propose a boosting-inspired framework that enables smooth control of this tradeoff. Instead of training a single personalized model, we construct an ensemble of $T$ models for each client. Across boosting iterations, the depth of the personalized component is progressively increased, while its effective complexity is systematically controlled via low-rank factorization or width shrinkage. This design simultaneously limits overfitting and substantially reduces per-client bias by allowing increasingly expressive personalization. We provide a theoretical analysis that establishes generalization bounds with favorable dependence on the average local sample size and the total number of clients. Specifically, we prove that the complexity of the shared layers is effectively suppressed, while the dependence on the boosting horizon $T$ is controlled through parameter reduction. Notably, we provide a novel nonlinear generalization guarantee for decoupled PFL models. Extensive experiments on benchmark and real-world datasets (e.g., EMNIST, CIFAR-10/100, and Sent140) demonstrate that the proposed framework consistently outperforms state-of-the-art PFL methods under heterogeneous data distributions.
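
To make the parameter-reduction idea in the abstract concrete, the following minimal sketch counts the weights a client would hold in its personalized layers at each stage, with and without low-rank factorization. It is not the authors' implementation: the layer widths, the rank schedule, the convention that stage $t$ personalizes the top $t-1$ layers (so stage 1 is fully global), and the helper names `dense_params`, `low_rank_params`, and `personalized_params` are all illustrative assumptions. The only fact relied on is that factoring a $d_{out}\times d_{in}$ weight matrix into two rank-$r$ factors costs $r(d_{in}+d_{out})$ weights instead of $d_{in}d_{out}$.

```python
# Hypothetical sketch: per-client personalized weight counts across stages.
# Widths, ranks, and the "stage t personalizes t-1 layers" rule are assumptions.

def dense_params(d_in, d_out):
    """Weights of one fully connected layer (biases ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weights when W (d_out x d_in) is stored as U (d_out x rank) times V (rank x d_in)."""
    return rank * (d_in + d_out)


def personalized_params(widths, stage, rank=None):
    """Count weights in the top `stage - 1` layers, which are assumed personalized.

    widths: layer widths [N_0, ..., N_L]. If `rank` is given, every
    personalized layer is stored in low-rank form; otherwise it is dense.
    """
    n_layers = len(widths) - 1
    k = min(stage - 1, n_layers)  # number of personalized layers at this stage
    total = 0
    for l in range(n_layers - k, n_layers):
        d_in, d_out = widths[l], widths[l + 1]
        if rank is None:
            total += dense_params(d_in, d_out)
        else:
            total += low_rank_params(d_in, d_out, min(rank, d_in, d_out))
    return total


widths = [784, 256, 256, 256, 10]      # hypothetical MLP
ranks = {1: None, 2: 8, 3: 4, 4: 2}    # hypothetical: rank shrinks as depth grows

for t in range(1, 5):
    dense = personalized_params(widths, t)
    reduced = personalized_params(widths, t, rank=ranks[t])
    print(f"stage {t}: dense={dense}, low-rank={reduced}")
```

Under these assumed numbers the dense personalized part grows quickly with depth, while the low-rank count stays in the low thousands, which is the qualitative behavior the abstract describes. Width shrinkage, the alternative named in the abstract, is not modeled here.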

Paper Structure (40 sections, 11 theorems, 93 equations, 4 figures, 6 tables, 1 algorithm)

Key Result

Theorem 1

Assume $T$ deep neural network architectures, each of depth $L>T$ and with widths $N^{(t)}_l$. Suppose that all network weights are uniformly bounded and that the activation function is Lipschitz continuous. Then, for any $\zeta\in(0,1)$, with probability at least $1-\zeta$, the generalization gap defined [...] The $\widetilde{\mathcal{O}}(\cdot)$ notation hides logarithmic dependencies on $T$, $L$, the weigh [...]

Figures (4)

  • Figure 1: Progressive personalization procedure, transitioning from a fully global model to increasingly personalized models across successive stages (for $t=1,2,3$ and $4$). As can be seen, we either shrink the personalized layer widths, or lower their effective parameter size via low-rank decomposition.
  • Figure 2: Overview of the proposed PPFE method. The number of personalized layers is progressively increased across stages. At each stage, samples are reweighted using feedback from the model of the previous stage, and the final model is an ensemble of the models from all stages.
  • Figure 3: Performance of PPFE on synthetic data under heterogeneous settings with varying numbers of clients and personalization ratios.
  • Figure 4: Effect of data size and data heterogeneity on the performance.
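
The reweight-then-ensemble loop described in the Figure 2 caption can be sketched as follows. The caption does not give the update rule, so this sketch swaps in the standard AdaBoost-style exponential reweighting for binary labels in $\{-1,+1\}$ purely as an illustration; the function names `reweight` and `ensemble_predict`, the `step` parameter, and the uniform combination weights are assumptions, not the paper's method.

```python
import math

# Hypothetical sketch: AdaBoost-style reweighting stands in for the paper's
# (unspecified here) feedback rule; binary labels in {-1, +1} are assumed.


def reweight(weights, margins, step=1.0):
    """Return normalized weights w_i * exp(-step * margin_i).

    margins[i] = y_i * f(x_i), where f is the previous stage's model;
    a negative margin means sample i was misclassified, so its weight grows.
    """
    raw = [w * math.exp(-step * m) for w, m in zip(weights, margins)]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_predict(stage_scores, alphas=None):
    """Combine the per-stage scores for one input by a weighted sum, then take the sign."""
    if alphas is None:
        alphas = [1.0] * len(stage_scores)  # assumed uniform combination
    score = sum(a * s for a, s in zip(alphas, stage_scores))
    return 1 if score >= 0 else -1


weights = [0.25, 0.25, 0.25, 0.25]
margins = [1.0, 1.0, -1.0, 1.0]   # third sample was misclassified
new_weights = reweight(weights, margins)
print(new_weights)
print(ensemble_predict([0.5, -0.2, 0.4]))
```

In this toy run the misclassified third sample receives the largest weight, so the next, more personalized stage would focus on it; the final prediction aggregates the scores from all stages.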

Theorems & Definitions (22)

  • Theorem 1: Main Result (informal), see Theorem \ref{thm:main}
  • Corollary 1: Special DNN Architectures (informal), see Corollary \ref{corl:specialSetting}
  • Remark 1: Width Shrinkage vs. Low-Rank Decomposition
  • Remark 2: Non-Vacuous Bounds for DNNs
  • Remark 3: Theoretical Analysis of FedRep \cite{collins2021exploiting}
  • Theorem 2: Theorem 6.1 of \cite{mohri2018foundations}
  • Definition 1: $\varepsilon$-Cover and Covering Number
  • Lemma 3: From \cite{mohri2018foundations}
  • Lemma 4: Covering Number of Fully-Connected DNNs
  • proof
  • ...and 12 more