Personalized Federated Learning via Sequential Layer Expansion in Representation Learning

Jaewon Jang; Bonjun Choi

TL;DR

This paper tackles data and class heterogeneity in federated learning by using representation learning to decouple each client's model into a shared base and a local head, written $\theta_i=(\theta_{i,b},\theta_{i,h})$. It partitions the base more densely into $K$ sublayers and introduces two layer-scheduling strategies: Vanilla (progressing from shallow to deep layers) and Anti (progressing from deep to shallow layers). Heads stay frozen during the training rounds and are fine-tuned only at the end. The results show that Vanilla Scheduling significantly reduces computational cost in early rounds while maintaining accuracy, whereas Anti Scheduling delivers the best accuracy under high data and class heterogeneity, particularly on CIFAR-100 and Tiny-ImageNet. The approach offers a practical, communication- and computation-efficient path to personalized federated learning with robust cross-client performance.
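The layer scheduling described above can be illustrated with a minimal sketch. It assumes that the training rounds are split into $K$ equal stages and that one additional base block becomes trainable at each stage; this equal-stage rule, the function name `unfrozen_blocks`, and its parameters are illustrative assumptions, not the paper's exact schedule. The head is not listed because it stays frozen until the final fine-tuning step.

```python
from typing import List


def unfrozen_blocks(round_idx: int, total_rounds: int, num_blocks: int,
                    mode: str = "vanilla") -> List[int]:
    """Return the indices of the base blocks that are trainable in a round.

    Block 0 is the shallowest block and block num_blocks - 1 the deepest.
    Assumption (not from the paper): rounds are divided into num_blocks
    equal stages and one more block is unfrozen at the start of each stage.
    """
    if mode not in ("vanilla", "anti"):
        raise ValueError("mode must be 'vanilla' or 'anti'")
    stage_len = max(1, total_rounds // num_blocks)
    n_active = min(num_blocks, round_idx // stage_len + 1)
    if mode == "vanilla":
        # Shallow to deep: blocks 0 .. n_active - 1 are trainable.
        return list(range(n_active))
    # Deep to shallow: the last n_active blocks are trainable.
    return list(range(num_blocks - n_active, num_blocks))


if __name__ == "__main__":
    for r in (0, 25, 50, 99):
        print(r, unfrozen_blocks(r, 100, 4, "vanilla"),
              unfrozen_blocks(r, 100, 4, "anti"))
```

In a deep learning framework, the returned indices would typically be used to enable gradients only for the listed blocks, while every other block and the head keep their current weights.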

Abstract

Federated learning protects client privacy by training on individual client devices and sharing only the model weights with a central server. However, in real-world scenarios, the heterogeneity of data among clients necessitates appropriate personalization methods. In this paper, we aim to address this heterogeneity using a form of parameter decoupling known as representation learning. Representation learning divides a deep learning model into 'base' and 'head' components. The base component, which captures features common to all clients, is shared with the server, while the head component, which captures features specific to an individual client, remains local. We propose a new representation learning-based approach that decouples the entire deep learning model into more densely divided parts and applies suitable scheduling methods, which helps with not only data heterogeneity but also class heterogeneity. We compare and analyze two layer scheduling approaches, namely forward (Vanilla) and backward (Anti), in the context of data and class heterogeneity among clients. Our experimental results show that, compared to existing personalized federated learning algorithms, the proposed algorithm achieves higher accuracy, especially under challenging conditions, while reducing computation costs.
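The base/head split described in the abstract can be sketched as a server-side aggregation step that averages only the base parameters and never touches the heads. The sketch below is a simplified illustration rather than the authors' implementation: the `head.` key prefix, the flat list-of-floats parameter format, and the FedAvg-style weighting by client dataset size are all assumptions made for the example.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def aggregate_base(client_params: List[Params], client_sizes: List[int],
                   head_prefix: str = "head.") -> Params:
    """Average base parameters across clients, weighted by dataset size.

    Entries whose name starts with head_prefix (a hypothetical naming
    convention) are treated as the local head and are left out.
    """
    total = sum(client_sizes)
    base_keys = [k for k in client_params[0] if not k.startswith(head_prefix)]
    aggregated: Params = {}
    for key in base_keys:
        dim = len(client_params[0][key])
        aggregated[key] = [
            sum(p[key][j] * n for p, n in zip(client_params, client_sizes)) / total
            for j in range(dim)
        ]
    return aggregated


if __name__ == "__main__":
    clients = [
        {"base.w": [1.0, 2.0], "head.w": [9.0]},
        {"base.w": [3.0, 4.0], "head.w": [-9.0]},
    ]
    # The second client holds three times as much data as the first.
    print(aggregate_base(clients, client_sizes=[1, 3]))
```

Each client would then overwrite its own base with the aggregated values and keep its head unchanged.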

Paper Structure (18 sections, 6 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Federated Learning
Personalized Federated Learning
Meta Learning
Transfer Learning
Multi-task Learning
Representation Learning
Proposed Algorithm
Method 1: Vanilla Scheduling
Method 2: Anti Scheduling
Experiments
Ablation Study
Comparison of Client-specific Accuracy
Estimation of Computational Cost for Each Algorithm
...and 3 more sections

Figures (7)

Figure 1: Illustration of the proposed Vanilla and Anti Scheduling algorithms. The upper panel shows Vanilla Scheduling, which starts with the shallowest layer and advances to deeper ones. The lower panel shows Anti Scheduling, which starts with the deepest layer and proceeds in the reverse direction.
Figure 2: Results of sampling the CIFAR-10 dataset among 10 clients from a Dirichlet distribution. The Dirichlet parameter $\alpha$ is 0.1, generating highly heterogeneous data. However, the nature of the CIFAR-10 dataset does not offer much class heterogeneity.
Figure 3: Average client accuracy on the CIFAR-100 dataset. In the early rounds, the accuracy of our scheduling algorithm is lower than that of FedAvg and FedBABU. This is because, in those rounds, not all of the base and head participate in training, which uses only the portion of the base layers that has been unfrozen.
Figure 4: Average client accuracy on the Tiny-ImageNet dataset. The early-round accuracy of our scheduling algorithm is again lower than that of FedAvg and FedBABU, which is consistent with the algorithm's characteristics.
Figure 5: Comparison of client-specific accuracy on the CIFAR-100 dataset. Clients are shown in ascending order of accuracy.
...and 2 more figures
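
The Dirichlet sampling mentioned in the caption of Figure 2 can be sketched as follows. This is one common way to build such a split and is not taken from the paper: for each class, a proportion vector over clients is drawn from a Dirichlet distribution with parameter $\alpha$, and the samples of that class are divided accordingly. The function name `dirichlet_partition`, the seed, and the rounding rule are illustrative assumptions. The Dirichlet draw is obtained by normalizing independent Gamma($\alpha$, 1) samples, so only the Python standard library is needed.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_partition(labels: Sequence[int], num_clients: int = 10,
                        alpha: float = 0.1, seed: int = 0) -> Dict[int, List[int]]:
    """Assign sample indices to clients with per-class Dirichlet proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Dirichlet(alpha) sample via normalized Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        if total > 0:
            props = [d / total for d in draws]
        else:
            # Guard against numerical underflow of every draw.
            props = [1.0 / num_clients] * num_clients
        # Turn cumulative proportions into cut points over this class.
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            if c == num_clients - 1:
                end = len(idxs)  # last client takes the remainder
            else:
                end = min(len(idxs), int(round(cum * len(idxs))))
            end = max(end, start)
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # 10 balanced toy classes
    parts = dirichlet_partition(toy_labels, num_clients=10, alpha=0.1)
    print({c: len(v) for c, v in parts.items()})
```

A small $\alpha$ such as 0.1 concentrates most of each class on a few clients, which yields the highly heterogeneous split the caption describes; larger values of $\alpha$ give more even splits.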
