PLayer-FL: A Principled Approach to Personalized Layer-wise Cross-Silo Federated Learning

Ahmed Elhussein, Gamze Gürsoy

TL;DR

This work tackles non-IID data in Federated Learning by proposing Principled Layer-wise-FL (PLayer-FL), a partial-FL method driven by a federation sensitivity metric computed after the first training epoch to identify which layers should be federated. The federation sensitivity metric correlates with established generalization measures across architectures and reveals a transition point where federation benefits plateau, enabling a principled layer-wise federation policy. Empirically, PLayer-FL outperforms standard FL and existing partial-FL baselines across diverse non-IID tasks, while also delivering fairer outcomes and stronger participation incentives. The approach is computationally efficient and broadly applicable to CNNs, Transformers, and FCNs in cross-silo settings, though it currently lacks formal theoretical guarantees and is primarily demonstrated in cross-silo rather than cross-device contexts.

Abstract

Non-identically distributed data is a major challenge in Federated Learning (FL). Personalized FL tackles this by balancing local model adaptation with global model consistency. One variant, partial FL, federates only the early layers, leveraging the observation that early layers learn more transferable features. However, current partial FL approaches use predetermined, architecture-specific rules to select layers, limiting their applicability. We introduce Principled Layer-wise-FL (PLayer-FL), which uses a novel federation sensitivity metric to identify layers that benefit from federation. This metric, inspired by model pruning, quantifies each layer's contribution to cross-client generalization after the first training epoch, identifying a transition point in the network where the benefits of federation diminish. We first demonstrate that our federation sensitivity metric correlates strongly with established generalization measures across diverse architectures. Next, we show that PLayer-FL outperforms existing FL algorithms on a range of tasks while also achieving more uniform performance improvements across clients.
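The partial-FL idea described above (average only the early layers, keep the rest local) can be illustrated with a minimal sketch. This is not the authors' implementation: the `partial_fedavg` function, the dictionary-of-lists model representation, the layer names, and the fixed `cutoff` index are all hypothetical choices made for illustration. In PLayer-FL the cutoff would come from the federation sensitivity metric, which is not reproduced here.

```python
from typing import Dict, List

# Hypothetical representation: each client model maps a layer name
# to a flat list of weights. Real models would hold tensors.
Model = Dict[str, List[float]]


def partial_fedavg(clients: List[Model], layer_order: List[str], cutoff: int) -> List[Model]:
    """Average the first `cutoff` layers across clients; keep later layers local."""
    if not clients:
        raise ValueError("need at least one client")
    if not 0 <= cutoff <= len(layer_order):
        raise ValueError("cutoff out of range")

    federated = layer_order[:cutoff]
    n = len(clients)

    # Unweighted element-wise mean per federated layer (an assumption;
    # FedAvg usually weights clients by their sample counts).
    averaged = {
        name: [sum(ws) / n for ws in zip(*(c[name] for c in clients))]
        for name in federated
    }

    updated = []
    for c in clients:
        new = {name: list(w) for name, w in c.items()}  # copy local weights
        for name in federated:
            new[name] = list(averaged[name])  # overwrite with shared weights
        updated.append(new)
    return updated


# Toy example with two clients and two layers; only "conv1" is federated.
clients = [
    {"conv1": [1.0, 3.0], "fc": [10.0]},
    {"conv1": [3.0, 5.0], "fc": [20.0]},
]
result = partial_fedavg(clients, ["conv1", "fc"], cutoff=1)
```

In this toy run both clients end up with the same `conv1` weights, while each keeps its own `fc` weights, which is the personalization behaviour that partial FL relies on.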

Paper Structure

This paper contains 38 sections, 9 equations, 14 figures, 14 tables, 3 algorithms.

Figures (14)

  • Figure 1: Layer gradient variance after one epoch. All models identically initialized and independently trained on non-IID data.
  • Figure 2: Hessian eigenvalue sum after one epoch. Each model was identically initialized and trained on respective non-IID datasets.
  • Figure 3: Model representation similarity by layer. Models identically initialized and independently trained on non-IID data.
  • Figure 4: Federation sensitivity after one epoch. All models identically initialized and trained on non-IID data subsets. Values expressed as a percentage of the first-layer value.
  • Figure A.1: Layer gradient variance after one epoch. All models identically initialized and independently trained on non-IID data.
  • ...and 9 more figures