Optimizing Personalized Federated Learning through Adaptive Layer-Wise Learning

Weihang Chen, Cheng Yang, Jie Ren, Zhiqiang Li, Zheng Wang

TL;DR

FLAYER tackles non-IID data in federated learning with a layer-wise personalized framework that adaptively combines global and local information across network layers. It integrates three components: performance-guided local aggregation, a layer-specific adaptive learning rate, and a layer-wise masking strategy that controls which updates flow back to the global model. Across CV and NLP tasks, the method outperforms six personalized FL baselines, improving inference accuracy by 5.40% on average (up to 14.29%) while reducing training cost, and it remains robust to data heterogeneity and scales to larger client counts. By integrating global knowledge on demand while preserving previously acquired global information, FLAYER offers practical benefits for privacy-preserving distributed learning and can be generalized to improve other FL frameworks.

Abstract

Real-life deployment of federated learning (FL) often faces non-IID data, which leads to poor accuracy and slow convergence. Personalized FL (pFL) tackles these issues by tailoring local models to individual data sources and using weighted aggregation methods for client-specific learning. However, existing pFL methods often fail to provide each local model with global knowledge on demand while maintaining low computational overhead. Additionally, local models tend to over-personalize to their own data during training, potentially discarding previously acquired global information. We propose FLAYER, a novel layer-wise learning method for pFL that optimizes local model personalization performance. FLAYER considers the different roles and learning abilities of the neural network layers in each local model. It incorporates global information for each local model as needed to initialize the local model cost-effectively. It then dynamically adjusts the learning rate of each layer during local training, optimizing the personalized learning process for each local model while preserving global knowledge. Additionally, to enhance global representation in pFL, FLAYER selectively uploads parameters for global aggregation in a layer-wise manner. We evaluate FLAYER on four representative datasets in the computer vision and natural language processing domains. Compared to six state-of-the-art pFL methods, FLAYER improves inference accuracy by 5.40% on average (up to 14.29%).
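To make the idea of layer-specific learning rates concrete, the following is a minimal NumPy sketch of one local SGD step in which each layer uses its own rate. The abstract does not state how FLAYER computes these rates, so `depth_scaled_lrs` is a hypothetical placeholder rule (smaller steps for early layers, larger steps for later ones), not the authors' method; the function names and the layer names `conv` and `head` are also assumptions made for illustration.

```python
import numpy as np


def depth_scaled_lrs(layer_names, base_lr, scale):
    """Hypothetical placeholder rule, NOT FLAYER's actual schedule.

    Layer i (in the given order) receives base_lr * scale**i, so with
    scale > 1 later layers take larger steps than earlier ones.
    """
    return {name: base_lr * scale**i for i, name in enumerate(layer_names)}


def layerwise_sgd_step(params, grads, layer_lrs):
    """Apply one SGD step where every layer uses its own learning rate."""
    return {
        name: params[name] - layer_lrs[name] * grads[name]
        for name in params
    }


# Toy model with two layers; all gradients are 1 for readability.
params = {"conv": np.ones(2), "head": np.ones(2)}
grads = {"conv": np.ones(2), "head": np.ones(2)}

lrs = depth_scaled_lrs(["conv", "head"], base_lr=0.1, scale=2.0)
new_params = layerwise_sgd_step(params, grads, lrs)
```

In this toy run the `conv` layer moves with rate 0.1 and the `head` layer with rate 0.2, so the earlier layer stays closer to its initial (globally informed) values. In a real training loop the rates would be recomputed by whatever adaptive rule the method defines.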

Paper Structure

This paper contains 24 sections, 9 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: The local learning process of FLAYER on the $k$-th client during the $t$-th iteration. In the local initialization stage, FLAYER aggregates the local and global head layers based on the local model's performance in the previous iteration. The initialized local model is then trained on the local data, using an adaptive learning rate for each layer. Based on the parameter changes before and after local training, FLAYER constructs a masking matrix that identifies and selects essential parameters, in different proportions per layer, for updating the global model.
  • Figure 2: The average CKA similarities of the same layers in different local models with CIFAR-10 under $Dir(0.1)$.
  • Figure 3: The ablation study with CIFAR-100, conducted under a $Dir(0.1)$ distribution.
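The masking step described in the Figure 1 caption can be sketched as follows. This is only an illustration under stated assumptions: the caption says the mask is built from parameter changes before and after local training with a different proportion per layer, but it does not give the selection rule. The sketch assumes "essential" means the entries with the largest absolute change, and the function name `layerwise_masks`, the `keep_ratios` argument, and the layer names are hypothetical.

```python
import numpy as np


def layerwise_masks(before, after, keep_ratios):
    """Build one boolean mask per layer from parameter changes.

    Assumption (not confirmed by the source): a parameter is treated as
    essential when its absolute change during local training is among
    the largest in its layer. keep_ratios[name] is the fraction of
    entries kept for that layer.
    """
    masks = {}
    for name, old in before.items():
        change = np.abs(after[name] - old)
        # Number of entries to keep, clamped to [1, layer size].
        k = int(round(keep_ratios[name] * change.size))
        k = min(change.size, max(1, k))
        # Indices of the k largest changes in the flattened layer.
        top = np.argpartition(change.ravel(), -k)[-k:]
        mask = np.zeros(change.size, dtype=bool)
        mask[top] = True
        masks[name] = mask.reshape(change.shape)
    return masks


before = {"conv": np.zeros(4), "head": np.zeros(4)}
after = {
    "conv": np.array([0.1, -0.5, 0.2, 0.0]),
    "head": np.array([1.0, 0.1, -2.0, 0.3]),
}
# Keep 25% of the conv layer and 50% of the head layer.
masks = layerwise_masks(before, after, {"conv": 0.25, "head": 0.5})
```

Here the `conv` mask selects only the entry that changed by 0.5, while the `head` mask selects the two entries that changed by 1.0 and 2.0. Only the selected entries would then be sent for global aggregation.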