Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data

Rongyu Zhang; Yun Chen; Chenrui Wu; Fangxin Wang; Bo Li

TL;DR

This work tackles non-i.i.d. and long-tailed data in federated learning for autonomous driving by introducing MuPFL, a three-level personalized framework. At the local level, Biased Activation Value Dropout (BAVD) reduces overfitting and speeds up training. At the intermediate level, Adaptive Cluster-based Model Update (ACMU) dynamically clusters similar models and pre-updates them. At the central server, Prior Knowledge-assisted Classifier Fine-tuning (PKCF) injects global knowledge into local classifiers to improve tail-class performance. Experiments on image classification and Cityscapes semantic segmentation show consistent accuracy improvements over strong baselines and substantially faster convergence, and ablations confirm the contribution of each module. Overall, MuPFL offers a scalable, efficient path to robust personalized learning under extreme data heterogeneity, supporting real-world deployment in autonomous systems.
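
The TL;DR describes ACMU only as clustering similar models and pre-updating them. As a rough illustration of that idea, the sketch below groups clients whose flattened update vectors have cosine similarity of at least a threshold `tau`, then replaces each client's model with the element-wise mean of its cluster. The greedy assignment rule, the threshold, and the function names (`cosine`, `greedy_cluster`, `cluster_pre_update`) are hypothetical choices made for this example; the paper's actual ACMU clustering criterion and update rule may differ.

```python
import math
from typing import List

Vector = List[float]  # a flattened model or model update


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two flattened vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(updates: List[Vector], tau: float) -> List[List[int]]:
    """Hypothetical rule: join the first cluster whose seed update has cosine >= tau."""
    clusters: List[List[int]] = []
    for i, update in enumerate(updates):
        for members in clusters:
            if cosine(updates[members[0]], update) >= tau:
                members.append(i)
                break
        else:  # no sufficiently similar cluster: start a new one seeded by client i
            clusters.append([i])
    return clusters


def cluster_pre_update(models: List[Vector], updates: List[Vector], tau: float = 0.5) -> List[Vector]:
    """Replace each client's model by the element-wise mean of the models in its cluster."""
    result = [list(m) for m in models]
    for members in greedy_cluster(updates, tau):
        dim = len(models[members[0]])
        mean = [sum(models[j][k] for j in members) / len(members) for k in range(dim)]
        for j in members:
            result[j] = list(mean)
    return result


if __name__ == "__main__":
    updates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # clients 0 and 1 point the same way
    models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(greedy_cluster(updates, tau=0.8))              # [[0, 1], [2]]
    print(cluster_pre_update(models, updates, tau=0.8))  # [[2.0, 3.0], [2.0, 3.0], [5.0, 6.0]]
```

Comparing each client only against a cluster's first member keeps the sketch short but makes it order-dependent; a fuller implementation might instead run agglomerative or spectral clustering on the complete pairwise similarity matrix.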

Abstract

Federated learning (FL) offers a privacy-centric distributed learning framework that trains models on individual clients and aggregates them centrally without requiring data exchange. Nonetheless, FL deployments in mobile applications, e.g., autonomous vehicles, often face non-i.i.d. and long-tailed class distributions across clients, which cause models to overfit because local training may converge to sub-optimal solutions. In our study, we explore the impact of data heterogeneity on model bias and introduce an innovative personalized FL framework, Multi-level Personalized Federated Learning (MuPFL), which leverages the hierarchical architecture of FL to fully harness computational resources at various levels. This framework integrates three pivotal modules: Biased Activation Value Dropout (BAVD) to mitigate overfitting and accelerate training; Adaptive Cluster-based Model Update (ACMU) to refine local models and ensure coherent global aggregation; and Prior Knowledge-assisted Classifier Fine-tuning (PKCF) to strengthen classification and personalize models to skewed local data using shared knowledge. Extensive experiments on diverse real-world datasets for image classification and semantic segmentation validate that MuPFL consistently outperforms state-of-the-art baselines, even under extreme non-i.i.d. and long-tail conditions, improving accuracy by up to 7.39% and accelerating training by up to 80%, marking significant advancements in both efficiency and effectiveness.
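
Non-i.i.d. and long-tailed client data are commonly simulated in FL experiments by first imposing an exponential long-tail profile on the class counts and then splitting each class across clients with Dirichlet-distributed proportions. The sketch below shows that common recipe in plain Python; it is an assumption about a typical setup, not the paper's documented partitioning, and the parameter names `imbalance_ratio` (head-to-tail count ratio) and `alpha` (Dirichlet concentration), like the function names, are illustrative.

```python
import random
from typing import Dict, List


def long_tailed_counts(num_classes: int, n_max: int, imbalance_ratio: float) -> List[int]:
    """Exponential long-tail profile: class c keeps n_max * (1/rho)^(c/(C-1)) samples."""
    if num_classes < 2:
        raise ValueError("need at least two classes for a long-tail profile")
    return [
        int(n_max * (1.0 / imbalance_ratio) ** (c / (num_classes - 1)))
        for c in range(num_classes)
    ]


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet(alpha) sample obtained by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition(counts: List[int], num_clients: int, alpha: float, seed: int = 0) -> Dict[int, List[int]]:
    """Split each class's count across clients with Dirichlet(alpha) proportions.

    Returns a mapping client -> per-class sample counts; smaller alpha means more skew.
    """
    rng = random.Random(seed)
    per_client = {i: [0] * len(counts) for i in range(num_clients)}
    for c, n_c in enumerate(counts):
        props = dirichlet(alpha, num_clients, rng)
        shares = [int(p * n_c) for p in props]
        # Flooring loses a few samples; hand the remainder to the largest-proportion client.
        largest = max(range(num_clients), key=lambda i: props[i])
        shares[largest] += n_c - sum(shares)
        for i, share in enumerate(shares):
            per_client[i][c] = share
    return per_client


if __name__ == "__main__":
    counts = long_tailed_counts(num_classes=10, n_max=5000, imbalance_ratio=100.0)
    print(counts)  # head class first, tail class last
    for client, class_counts in partition(counts, num_clients=5, alpha=0.5).items():
        print(client, class_counts)
```

The two knobs act separately: a larger `imbalance_ratio` makes tail classes rarer globally, while a smaller `alpha` concentrates each class on fewer clients.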

Paper Structure

This paper contains 20 sections, 1 theorem, 27 equations, 10 figures, and 6 tables.

Introduction
Related works
  Federated learning
  Long-tailed learning
Methodology: MuPFL
  Motivations
  Preliminary
  Biased Activation Value Dropout
  Adaptive Cluster-based Model Update
  Prior Knowledge-assisted Classifier Fine-tuning
  Computational complexity analysis
  Convergence analysis
Experiments
  Experimental setup
  Quantitative analysis
...and 5 more sections

Key Result

Theorem 1

Combining the above assumptions, we derive the convergence bound of the proposed MuPFL for client $i$ after $E$ local training epochs at global communication round $t+1$ as:

Figures (10)

Figure 1: Exhibition of the long-tailed training-set distribution of the iNaturalist 2018 dataset, which contains over 8,000 species and is therefore challenging for classification. Some species, such as sharks, have many images, while others have very few.
Figure 2: The illustration of long-tailed data in personalized federated learning scenarios for autonomous driving.
Figure 3: Sub-figure (a) shows the optimization path of federated learning with two clients (in cyan and red) when the clients' update weights diverge (left) and when they are similar (right). For FedAvg, the two clients' update weights diverge, which intuitively indicates that forcing model aggregation can easily trap the model in a local optimum. For MuPFL, there is a region (marked in gray) where both clients' risk functions are minimized (e.g., ${w}^{*}$), leading to a global optimum. Sub-figure (b) shows the motivation experiment conducted on CIFAR-10.
Figure 4: Visualization analysis with Class Activation Maps (CAM). We use CAM to compare how well the baseline and our proposed MuPFL with BAVD focus on the main objects in the images.
Figure 5: We propose MuPFL, a personalized federated learning framework for non-i.i.d. and long-tailed data. It forms a training pipeline with three stages: 1) BAVD: local models use biased activation value dropout to tackle the overfitting problem; 2) PKCF: clients use the global federated feature $h_{glo}$ to pre-train local classifiers; 3) BAVD: clients fine-tune their feature extractors $g_{i}$.
...and 5 more figures
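
Figure 4 relies on Class Activation Maps. In the standard CAM formulation for a network that ends in global average pooling followed by a linear classifier, the map for class c is a weighted sum of the last convolutional feature maps, using that class's classifier weights: CAM_c(x, y) = sum_k w_{c,k} f_k(x, y). The sketch below computes this in plain Python on nested lists and min-max normalizes the result for display. The function name and the [channel][row][col] layout are assumptions, and the paper's exact visualization pipeline (backbone, upsampling, color map) is not described in this summary.

```python
from typing import List

FeatureMaps = List[List[List[float]]]  # layout: [channel][row][col]


def class_activation_map(features: FeatureMaps, class_weights: List[float]) -> List[List[float]]:
    """CAM_c(x, y) = sum_k w_{c,k} * f_k(x, y), min-max normalized to [0, 1]."""
    if len(features) != len(class_weights):
        raise ValueError("need one classifier weight per feature channel")
    rows, cols = len(features[0]), len(features[0][0])
    # Weighted sum over channels at every spatial location.
    cam = [
        [sum(w * fmap[r][c] for w, fmap in zip(class_weights, features)) for c in range(cols)]
        for r in range(rows)
    ]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:  # flat map: nothing to highlight
        return [[0.0] * cols for _ in range(rows)]
    return [[(v - lo) / span for v in row] for row in cam]


if __name__ == "__main__":
    # Two 2x2 channels; the class weights favor channel 0.
    features = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    print(class_activation_map(features, [2.0, 1.0]))  # [[1.0, 0.0], [0.0, 0.5]]
```

In practice the low-resolution map is upsampled to the input size and overlaid on the image, which is how attention comparisons like the one in Figure 4 are usually rendered.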

Theorems & Definitions (1)

Theorem 1
