Spectral Co-Distillation for Personalized Federated Learning

Zihan Chen, Howard H. Yang, Tony Q. S. Quek, Kai Fong Ernest Chong

TL;DR

This work tackles data heterogeneity in personalized federated learning by introducing spectral co-distillation, which uses the Fourier spectrum of model parameters to couple the generic model (GM) and the personalized models (PMs) in a bi-directional distillation framework. It defines spectrum-based regularizers over both full and truncated spectra and couples GM and PM training through two alternating distillation losses, complemented by a wait-free local training protocol that eliminates idle time during communication rounds. Empirical results on CIFAR-10/100 and iNaturalist-2017 show state-of-the-art GM and PM performance under non-IID settings, as well as substantial reductions in total training time due to the wait-free protocol. The proposed approach provides a principled, spectrum-driven method to improve personalization while maintaining strong global generalization, with practical benefits for real-world heterogeneous data deployments.
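
To make the idea of a spectrum-based regularizer concrete, here is a minimal sketch in Python with NumPy. It is not the paper's definition, which is not given in this summary. The sketch assumes that the "spectrum" is the magnitude of a 1-D discrete Fourier transform of the flattened weights, that the truncated variant keeps only the leading `keep` components, and that the penalty is a squared L2 distance. The names `weight_spectrum` and `spectral_penalty` are hypothetical.

```python
import numpy as np


def weight_spectrum(weights, keep=None):
    """Magnitude of the 1-D DFT of the flattened, concatenated weights.

    Assumption: `keep` retains only the first `keep` frequency components,
    standing in for a "truncated spectrum".
    """
    flat = np.concatenate(
        [np.asarray(w, dtype=np.float64).ravel() for w in weights]
    )
    spectrum = np.abs(np.fft.fft(flat))
    return spectrum if keep is None else spectrum[:keep]


def spectral_penalty(student_weights, teacher_weights, keep=None):
    """Squared L2 distance between two magnitude spectra (assumed form)."""
    diff = weight_spectrum(student_weights, keep) - weight_spectrum(
        teacher_weights, keep
    )
    return float(np.sum(diff ** 2))


# Toy "models": a 4x3 weight matrix plus a bias vector (15 parameters total).
rng = np.random.default_rng(0)
generic = [rng.normal(size=(4, 3)), rng.normal(size=3)]
personalized = [w + 0.1 * rng.normal(size=w.shape) for w in generic]

full = spectral_penalty(personalized, generic)              # full spectrum
truncated = spectral_penalty(personalized, generic, keep=5)  # truncated spectrum
```

In a co-distillation setup of the kind described above, a term like this would presumably be added to the local training loss twice: once with the GM as teacher when training a PM, and once with the PM as teacher when updating the GM. Because the truncated penalty sums a subset of the same non-negative terms, it can never exceed the full-spectrum penalty.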

Abstract

Personalized federated learning (PFL) has been widely investigated to address the challenge of data heterogeneity, especially when a single generic model is inadequate in satisfying the diverse performance requirements of local clients simultaneously. Existing PFL methods are inherently based on the idea that the relations between the generic global and personalized local models are captured by the similarity of model weights. Such a similarity is primarily based on either partitioning the model architecture into generic versus personalized components, or modeling client relationships via model weights. To better capture similar (yet distinct) generic versus personalized model representations, we propose spectral distillation, a novel distillation method based on model spectrum information. Building upon spectral distillation, we also introduce a co-distillation framework that establishes a two-way bridge between generic and personalized model training. Moreover, to utilize the local idle time in conventional PFL, we propose a wait-free local training protocol. Through extensive experiments on multiple datasets over diverse heterogeneous data settings, we demonstrate the superior performance and efficacy of our proposed spectral co-distillation method, as well as our wait-free training protocol.

Paper Structure

This paper contains 13 sections, 5 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Spectral co-distillation framework with wait-free local training for PFL, in which the generic model (GM) training and the personalized model (PM) training are carried out via spectral distillation in two different stages.
  • Figure 2: A comparison of the (a) conventional compute-and-wait protocol with the (b) proposed wait-free training protocol.
  • Figure 3: Performance comparison for generalizability on new clients of various methods.

Theorems & Definitions (1)

  • Remark 4.1