Personalized Federated Learning via Backbone Self-Distillation
Pengju Wang, Bochao Liu, Dan Zeng, Chenggang Yan, Shiming Ge
TL;DR
This work addresses personalization in federated learning (FL) under data heterogeneity. The proposed method, FedBSD, splits each client model into a shared backbone and a private head. Clients upload only their backbone weights, which the server aggregates into a global backbone. Each client then performs backbone self-distillation, using the global backbone as a teacher to improve its local backbone representation, while the private head stays local and personalized. On multiple simulated and real-world datasets, FedBSD outperforms 12 baselines and achieves strong personalization with reduced communication. The method is data-free (it requires no external data) and can be extended to other FL paradigms; integration with model compression is left to future work. The results indicate that backbone self-distillation mitigates client drift while preserving local personalization.
Abstract
In practical scenarios, federated learning frequently requires training a personalized model for each client on heterogeneous data. This paper proposes a backbone self-distillation approach to facilitate personalized federated learning. In this approach, each client trains its local model and sends only the backbone weights to the server. The server aggregates these weights into a global backbone, which is returned to each client for updating. However, a local backbone updated in this way lacks personalization, because it carries only the common representation. To solve this problem, each client further performs backbone self-distillation, using the global backbone as a teacher and transferring its knowledge to update the local backbone. The process therefore learns two components: a shared backbone for common representation and a private head for local personalization, which together enable effective global knowledge transfer. Extensive experiments and comparisons with 12 state-of-the-art approaches demonstrate the effectiveness of our approach.
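The round structure described above (split the model, upload only the backbone, aggregate, then distill from the global backbone) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The names `split_model`, `aggregate_backbones`, and `distillation_loss`, the `head.` key prefix, the sample-count weighting, and the mean-squared-error feature loss are all assumptions made for illustration; the abstract does not specify the aggregation weights or the distillation objective.

```python
from typing import Dict, List, Tuple

# A model is represented as a flat mapping from parameter name to values.
Weights = Dict[str, List[float]]


def split_model(model: Weights, head_prefix: str = "head.") -> Tuple[Weights, Weights]:
    """Split a model into (shared backbone, private head) by key prefix.

    The "head." prefix is a hypothetical naming convention.
    """
    backbone = {k: v for k, v in model.items() if not k.startswith(head_prefix)}
    head = {k: v for k, v in model.items() if k.startswith(head_prefix)}
    return backbone, head


def aggregate_backbones(backbones: List[Weights], sizes: List[int]) -> Weights:
    """Average client backbones, weighted by local sample count.

    The sample-count weighting is an assumption (FedAvg-style).
    """
    total = sum(sizes)
    names = backbones[0].keys()
    return {
        name: [
            sum(b[name][i] * n for b, n in zip(backbones, sizes)) / total
            for i in range(len(backbones[0][name]))
        ]
        for name in names
    }


def distillation_loss(student_feats: List[float], teacher_feats: List[float]) -> float:
    """Mean squared error between local (student) and global (teacher)
    backbone features. The choice of MSE is an assumption.
    """
    diffs = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(diffs) / len(diffs)


# Two toy clients; only the backbone entries leave the client.
clients = [
    {"backbone.w": [1.0, 2.0], "head.w": [5.0]},
    {"backbone.w": [3.0, 4.0], "head.w": [9.0]},
]
uploaded = [split_model(c)[0] for c in clients]
global_backbone = aggregate_backbones(uploaded, sizes=[1, 3])
```

In this sketch the head parameters never appear in `uploaded`, which mirrors the communication saving the abstract attributes to sending only the backbone. In a real system each client would add a term such as `distillation_loss` to its local training objective, with the global backbone frozen as the teacher.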
