Spectral Co-Distillation for Personalized Federated Learning
Zihan Chen, Howard H. Yang, Tony Q. S. Quek, Kai Fong Ernest Chong
TL;DR
This work tackles data heterogeneity in personalized federated learning by introducing spectral co-distillation, which uses the Fourier spectrum of model parameters to couple the generic model (GM) and the personalized models (PMs) in a bi-directional distillation framework. It defines spectrum-based regularizers with both full and truncated spectra and couples GM and PM training through two alternating distillation losses, complemented by a wait-free local training protocol that eliminates idle time during communication rounds. Empirical results on CIFAR-10/100 and iNaturalist-2017 show state-of-the-art GM and PM performance under non-IID settings, as well as substantial reductions in total training time due to the wait-free protocol. The proposed approach provides a principled, spectrum-driven method to improve personalization while maintaining strong global generalization, with practical benefits for real-world heterogeneous data deployments.
Abstract
Personalized federated learning (PFL) has been widely investigated to address the challenge of data heterogeneity, especially when a single generic model is inadequate for satisfying the diverse performance requirements of local clients simultaneously. Existing PFL methods are inherently based on the idea that the relations between the generic global and personalized local models are captured by the similarity of model weights. Such a similarity is primarily based on either partitioning the model architecture into generic versus personalized components, or modeling client relationships via model weights. To better capture similar (yet distinct) generic versus personalized model representations, we propose spectral distillation, a novel distillation method based on model spectrum information. Building upon spectral distillation, we also introduce a co-distillation framework that establishes a two-way bridge between generic and personalized model training. Moreover, to utilize the local idle time in conventional PFL, we propose a wait-free local training protocol. Through extensive experiments on multiple datasets over diverse heterogeneous data settings, we demonstrate the superior performance and efficacy of our proposed spectral co-distillation method, as well as of our wait-free training protocol.
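To make the idea of comparing models through their spectra concrete, the following is a minimal illustrative sketch, not the paper's formulation. It assumes that a model's "spectrum" can be approximated by the magnitude of a 1-D real FFT over its flattened weights, that truncation keeps the first k coefficients, and that the regularizer is a mean squared difference between two spectra. The function names weight_spectrum and spectral_distance, the parameter k, and the toy GM/PM weights are all hypothetical.

```python
import numpy as np

def weight_spectrum(weights, k=None):
    """Magnitude spectrum of the flattened weights (assumed definition).

    weights: list of arrays, one per layer.
    k: if given, keep only the first k coefficients (truncated spectrum).
    """
    flat = np.concatenate([np.ravel(w) for w in weights])
    spec = np.abs(np.fft.rfft(flat))
    return spec if k is None else spec[:k]

def spectral_distance(weights_a, weights_b, k=None):
    """Mean squared difference between two weight spectra (hypothetical regularizer)."""
    spec_a = weight_spectrum(weights_a, k)
    spec_b = weight_spectrum(weights_b, k)
    return float(np.mean((spec_a - spec_b) ** 2))

# Toy generic model (GM) and a slightly perturbed personalized model (PM).
rng = np.random.default_rng(0)
gm = [rng.normal(size=(4, 3)), rng.normal(size=3)]
pm = [w + 0.1 * rng.normal(size=w.shape) for w in gm]

full_reg = spectral_distance(gm, pm)        # full-spectrum variant
trunc_reg = spectral_distance(gm, pm, k=4)  # truncated-spectrum variant
```

In a co-distillation setting, a term like spectral_distance could in principle be added to the PM loss with the GM as the reference, and to the GM loss with the PMs as the reference, which is one way to read the two-way bridge described above. How the spectrum is actually defined and weighted is specified in the full paper rather than in this abstract.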
