Towards Layer-Wise Personalized Federated Learning: Adaptive Layer Disentanglement via Conflicting Gradients

Minh Duong Nguyen; Khanh Le; Khoi Do; Nguyen H. Tran; Duc Nguyen; Chien Trinh; Zhaohui Yang

TL;DR

This work introduces a new approach to personalized federated learning (pFL) design, Federated Learning with Layer-wise Aggregation via Gradient Analysis (FedLAG). FedLAG uses gradient conflicts at the layer level to decide which layers to personalize, and it achieves better convergence behavior than other baselines.

Abstract

In personalized Federated Learning (pFL), high data heterogeneity can cause significant gradient divergence across devices, which harms the learning process. This divergence can negate progress and severely degrade weight and gradient updates, especially when gradients from different users form an obtuse angle during aggregation. To address this issue, we introduce a new approach to pFL design, Federated Learning with Layer-wise Aggregation via Gradient Analysis (FedLAG), which applies the concept of gradient conflict at the layer level. When the layer-wise gradients of different clients form acute angles, they point in similar directions, so updates from different clients jointly move toward client-invariant features. Conversely, when layer-wise gradient pairs form obtuse angles, the layers tend to focus on client-specific tasks. Accordingly, FedLAG assigns layers for personalization based on the extent of their layer-wise gradient conflicts: layers with gradient conflicts are excluded from the global aggregation process. Our theoretical analysis shows that FedLAG improves pFL performance when integrated into other pFL baselines and that the proposed method achieves better convergence behavior than other baselines. Extensive experiments show that FedLAG outperforms several state-of-the-art methods and can be easily incorporated into many existing methods to further enhance performance.
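
Whether two layer-wise gradients conflict reduces to the sign of their inner product. An obtuse angle is the same as a negative dot product. The minimal NumPy sketch below illustrates this for one layer on two hypothetical clients, with made-up numbers. It shows that an obtuse pair shrinks the naive average update. It also shows that a PCGrad-style projection avoids this shrinkage: the projection is the "ideal gradient" notion from Figure 1, taken from 2020-MTL-PCGrad rather than from FedLAG itself, and it removes each conflicting component before averaging. All function names are illustrative assumptions.

```python
import numpy as np

def angle_deg(g1: np.ndarray, g2: np.ndarray) -> float:
    """Angle between two flattened layer gradients, in degrees."""
    cos = np.dot(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def conflicts(g1: np.ndarray, g2: np.ndarray) -> bool:
    """An angle above 90 degrees is equivalent to a negative inner product."""
    return float(np.dot(g1, g2)) < 0.0

def project_if_conflicting(g_i: np.ndarray, g_j: np.ndarray) -> np.ndarray:
    """PCGrad-style projection (illustrative, not FedLAG's rule): if g_i conflicts
    with g_j, remove the component of g_i along g_j so the angle becomes 90 degrees."""
    dot = float(np.dot(g_i, g_j))
    if dot < 0.0:
        return g_i - dot / float(np.dot(g_j, g_j)) * g_j
    return g_i

# Hypothetical gradients of the same layer on two clients
aligned = (np.array([1.0, 0.5]), np.array([0.8, 0.9]))   # acute angle
opposed = (np.array([1.0, 0.2]), np.array([-0.9, 0.3]))  # obtuse angle

for name, (a, b) in {"aligned": aligned, "opposed": opposed}.items():
    naive = (a + b) / 2
    ideal = (project_if_conflicting(a, b) + project_if_conflicting(b, a)) / 2
    print(f"{name}: angle={angle_deg(a, b):.1f} deg, conflict={conflicts(a, b)}, "
          f"|naive avg|={np.linalg.norm(naive):.3f}, |ideal avg|={np.linalg.norm(ideal):.3f}")
```

For the aligned pair the projection is a no-op. For the opposed pair, the naive average is much shorter than either client's gradient, which is the "negated progress" described above.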

Paper Structure (54 sections, 12 theorems, 61 equations, 17 figures, 8 tables, 2 algorithms)

Introduction
Problem Formulation & Preliminaries
Notations
Problem Setup
Local Updates
Negative Transfer and Gradient Conflicts in Multi-task Learning
Validation of Layer-wise gradient conflicts in FL
Methodology
Gradient Divergence Analysis
Layer-wise Personalized Model Aggregation
Theoretical Analysis
Layer-wise Loss Improvement
Convergence Analysis
Experiment Setup
Datasets
...and 39 more sections

Key Result

Lemma 5.1

Each user $u$ achieves an improvement in loss when using FedLAG instead of the vanilla FL approach, as follows:

Figures (17)

Figure 1: The issue of gradient conflicts. We define the ideal gradient as the aggregated gradient obtained when the component gradients do not conflict. In other words, any conflicting gradient that forms an angle greater than $\pi/2$ is projected so that the angle becomes $\pi/2$, as in 2020-MTL-PCGrad.
Figure 2: Illustration of gradient conflicts on 2D toy dataset with 1-layer 3-parameter network (left), and layer-wise gradient conflict of FedAvg during the training (right).
Figure 3: The FedLAG architecture. First, compute the previous-round layer-wise gradient $h^{(r-1)}_{u}$ from the received and stored models. Second, measure the angles between pairs of gradient vectors, treating angles above $90$ degrees as conflicts. Third, increase the $GC_\epsilon (l)$ score (Definition 4.1) of layer $l$ by $1$ for each conflicting pair. Finally, assign the $k$ layers with the highest $GC_\epsilon (l)$ to the personalized layers (PL) and the remaining layers to the globally aggregated layers (GAL).
Figure 4: Comparison of FedLAG with fixed layer-disentanglement schemes. First-$K$ fixes the first $K$ layers for disentanglement, Last-$K$ fixes the last $K$ layers, and Middle-$K$ fixes $K$ layers in the middle of the network.
Figure 5: Convergence of FedLAG when integrated into other pFL methods.
...and 12 more figures
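
The steps in the Figure 3 caption and the fixed-layer baselines of Figure 4 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation:

- The layer-wise gradients $h^{(r-1)}_{u}$ are taken as already computed and flattened.
- A pair counts as a conflict whenever its angle exceeds 90 degrees; the $\epsilon$ parameter of $GC_\epsilon(l)$ is not modeled.
- The GAL layers are combined by a plain unweighted mean.
- The Middle-$K$ centering rule is a guess.

All function names (`gc_scores`, `split_layers`, `fixed_split`, `aggregate`) and the example gradients are hypothetical.

```python
import itertools
import numpy as np

def gc_scores(grads):
    """grads[u][l]: flattened gradient of layer l on client u.
    Per layer, count client pairs whose gradients form an angle above 90 degrees.
    (Assumption: the epsilon of GC_eps(l) is ignored; a negative dot product is a conflict.)"""
    num_layers = len(grads[0])
    scores = [0] * num_layers
    for u, v in itertools.combinations(range(len(grads)), 2):
        for l in range(num_layers):
            if float(np.dot(grads[u][l], grads[v][l])) < 0.0:
                scores[l] += 1
    return scores

def split_layers(scores, k):
    """Top-k most conflicting layers -> personalized (PL); the rest -> aggregated (GAL)."""
    ranked = sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)
    pl = sorted(ranked[:k])
    gal = [l for l in range(len(scores)) if l not in pl]
    return pl, gal

def fixed_split(num_layers, k, mode):
    """Fixed First-K / Last-K / Middle-K baselines (Middle-K centering is an assumption)."""
    start = {"first": 0, "last": num_layers - k, "middle": (num_layers - k) // 2}[mode]
    pl = list(range(start, start + k))
    return pl, [l for l in range(num_layers) if l not in pl]

def aggregate(models, gal):
    """Average GAL layers across clients (unweighted mean, an assumption);
    each client keeps its own PL layers. Inputs are not modified."""
    out = [[layer.copy() for layer in model] for model in models]
    for l in gal:
        mean = np.mean([model[l] for model in models], axis=0)
        for model in out:
            model[l] = mean.copy()
    return out

# Hypothetical example: 3 clients, 3 layers; layer 1 pulls the clients apart.
grads = [
    [np.array([1.0, 1.0]), np.array([1.0, 0.0]), np.array([0.5, 0.5])],
    [np.array([0.9, 1.1]), np.array([-1.0, 0.1]), np.array([0.4, 0.6])],
    [np.array([1.1, 0.8]), np.array([0.2, -1.0]), np.array([0.6, 0.3])],
]
scores = gc_scores(grads)
pl, gal = split_layers(scores, k=1)
# Client c holds weights equal to c in every layer, so the effect of aggregation is easy to see.
models = [[np.full(2, float(c)) for _ in range(3)] for c in range(3)]
new_models = aggregate(models, gal)
print(scores, pl, gal)
```

In this toy run only layer 1 collects conflicting pairs. With $k=1$, layer 1 therefore stays personalized, while layers 0 and 2 are averaged across clients. Replacing `split_layers` with `fixed_split` gives the fixed First-, Last- and Middle-$K$ alternatives that Figure 4 compares against.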

Theorems & Definitions (20)

Definition 2.1: Conflicting gradients 2020-MTL-GradientSurgery
Definition 2.2: Layer-wise Conflicting Gradients 2023-MTL-recon
Definition 4.1: $GC_{\xi}(l)$ score
Lemma 5.1: Personalization Improvement
Lemma 5.2: Generalization Improvement
Theorem 5.3
Remark 5.4
Remark 5.5
Remark 5.6
Remark 5.7
...and 10 more
