Personalized Federated Learning via Backbone Self-Distillation

Pengju Wang, Bochao Liu, Dan Zeng, Chenggang Yan, Shiming Ge

TL;DR

This work tackles personalization in federated learning (FL) under data heterogeneity by introducing FedBSD, which splits each client model into a shared backbone and a private head. Only backbone weights are uploaded and aggregated into a global backbone. Each client then performs backbone self-distillation, using the global backbone as the teacher to improve its local backbone representation, while the private head stays local and personalized. The approach is evaluated on multiple simulated and real-world datasets, where it outperforms 12 baselines while reducing communication, since only the backbone is transmitted. The method needs no external data and can be extended to other FL paradigms; future work will explore integration with model compression. The results indicate that backbone self-distillation mitigates client drift while preserving local personalization.

Abstract

In practical scenarios, federated learning frequently necessitates training personalized models for each client using heterogeneous data. This paper proposes a backbone self-distillation approach to facilitate personalized federated learning. In this approach, each client trains its local model and only sends the backbone weights to the server. These weights are then aggregated to create a global backbone, which is returned to each client for updating. However, the client's local backbone lacks personalization because of the common representation. To solve this problem, each client further performs backbone self-distillation by using the global backbone as a teacher and transferring knowledge to update the local backbone. This process involves learning two components: the shared backbone for common representation and the private head for local personalization, which enables effective global knowledge transfer. Extensive experiments and comparisons with 12 state-of-the-art approaches demonstrate the effectiveness of our approach.
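
The communication step described in the abstract (clients upload only backbone weights, the server aggregates them into a global backbone) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `head.` name prefix, the helper names `split_model` and `aggregate_backbones`, and the sample-size-weighted (FedAvg-style) averaging are all assumptions, since the abstract only says the weights are "aggregated".

```python
from typing import Dict, List, Tuple

# A model is represented here as a flat mapping: parameter name -> list of floats.
Weights = Dict[str, List[float]]


def split_model(model: Weights, head_prefix: str = "head.") -> Tuple[Weights, Weights]:
    """Split a parameter dict into (backbone, head) by a hypothetical name prefix."""
    backbone = {k: v for k, v in model.items() if not k.startswith(head_prefix)}
    head = {k: v for k, v in model.items() if k.startswith(head_prefix)}
    return backbone, head


def aggregate_backbones(backbones: List[Weights], num_samples: List[int]) -> Weights:
    """Average client backbones, weighted by local sample count (assumed weighting)."""
    total = sum(num_samples)
    global_backbone: Weights = {}
    for name in backbones[0]:
        size = len(backbones[0][name])
        global_backbone[name] = [
            sum(n * b[name][i] for b, n in zip(backbones, num_samples)) / total
            for i in range(size)
        ]
    return global_backbone


# Two toy clients; only the backbone part ever leaves the client.
client_a = {"conv.weight": [1.0, 2.0], "head.weight": [10.0]}
client_b = {"conv.weight": [3.0, 6.0], "head.weight": [-10.0]}

backbone_a, head_a = split_model(client_a)
backbone_b, head_b = split_model(client_b)

# Client A holds 1 sample, client B holds 3 samples.
global_backbone = aggregate_backbones([backbone_a, backbone_b], [1, 3])
print(global_backbone)
```

Because the heads never enter `aggregate_backbones`, each client's personalized classifier is unaffected by the aggregation, and the upload size shrinks by the size of the head.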

Paper Structure

This paper contains 11 sections, 9 equations, 4 figures, 1 table, and 1 algorithm.

Figures (4)

  • Figure 1: To facilitate federated learning on heterogeneous data in practice, our proposed approach divides each client model into a shared backbone and a personalized head. Only the shared backbone is communicated between the client and the server. Furthermore, each client updates its model by employing self-distillation to mitigate accuracy degradation.
  • Figure 2: The framework of backbone self-distillation. First, each client $C_k$ divides its local model $\bm{w}_k$ into a shared backbone $\bm{w}_{k,b}$ and a private head $\bm{w}_{k,h}$, and communicates only the shared backbone $\bm{w}_{k,b}$ with the server $S$. Second, the server aggregates the shared backbones $\{\bm{w}_{k,b}\}_{k=1}^{n}$ to form a global backbone $\bm{w}_g$ and sends it back to each client. Finally, to mitigate accuracy degradation due to partial knowledge sharing, each client $C_k$ performs self-distillation on its local data $\mathbb{D}_k$ by transferring knowledge from the teacher $\bm{w}_{g}$ to the student $\bm{w}_{k,b}$, and trains the private head $\bm{w}_{k,h}$ to personalize the local model.
  • Figure 3: Test accuracy of various approaches on DomainNet (left) and Digits (right).
  • Figure 4: Ablation studies for data heterogeneity, training epochs, and communication rounds on CIFAR10 (left) and CIFAR100 (right).
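
The client-side step in the Figure 2 caption (global backbone as teacher, local backbone as student, private head trained locally) can be illustrated with a toy loss computation. The exact objective is not given in this summary, so everything below is an assumption: a cross-entropy task loss on the head output plus a mean-squared-error term between student and teacher features, combined with a hypothetical weight `lam`. Backbones and head are modelled as plain linear maps for brevity.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def linear(weights: Matrix, x: List[float]) -> List[float]:
    """Apply a bias-free linear layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cross_entropy(logits: List[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def self_distillation_loss(
    student_backbone: Matrix,
    teacher_backbone: Matrix,
    head: Matrix,
    x: List[float],
    label: int,
    lam: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (total, task, distill) for one example.

    Assumed form: total = CE(head(student(x)), y) + lam * MSE(student(x), teacher(x)).
    The teacher (global backbone) is only evaluated, never updated.
    """
    student_feat = linear(student_backbone, x)
    teacher_feat = linear(teacher_backbone, x)
    task = cross_entropy(linear(head, student_feat), label)
    distill = mse(student_feat, teacher_feat)
    return task + lam * distill, task, distill


# Toy example: 2-d input, 2-d features, 2 classes.
local_backbone = [[1.0, 0.0], [0.0, 1.0]]   # student, w_{k,b}
global_backbone = [[2.0, 0.0], [0.0, 1.0]]  # teacher, w_g
private_head = [[1.0, 0.0], [0.0, 1.0]]     # w_{k,h}, never uploaded

total, task, distill = self_distillation_loss(
    local_backbone, global_backbone, private_head, x=[1.0, 1.0], label=0
)
print(total, task, distill)
```

In a real training loop, gradients of the total loss would update the local backbone and the private head while the teacher stays fixed. When the student already matches the teacher, the distillation term vanishes and only the task loss remains.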