Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data

Yuting He, Yiqiang Chen, XiaoDong Yang, Hanchao Yu, Yi-Hua Huang, Yang Gu

TL;DR

This work addresses data heterogeneity in federated learning by tackling catastrophic forgetting of global knowledge during local updates. It introduces FedSSD, a selective self-distillation method that uses a credibility-aware distillation loss, with sample-level and class-level weighting derived from a server-side credibility matrix. The authors provide convergence analysis under standard assumptions and demonstrate through extensive experiments on CIFAR10/100 and TinyImageNet that FedSSD achieves faster convergence and better generalization than state-of-the-art FL methods. The approach preserves global knowledge while enabling local models to learn from local data, offering a practical and scalable enhancement for non-IID FL scenarios.

Abstract

Federated learning (FL) enables multiple clients to collaboratively train a global model while keeping local data decentralized. Data heterogeneity (non-IID data) across clients poses significant challenges to FL: local models re-optimize towards their own local optima and forget the global knowledge, which degrades performance and slows convergence. Many existing works address the non-IID issue by adding an extra global-model-based regularization term to local training, but without an adaptation scheme, which is not efficient enough to achieve high performance with deep learning models. In this paper, we propose a Selective Self-Distillation method for Federated learning (FedSSD), which imposes adaptive constraints on the local updates by self-distilling the global model's knowledge and selectively weighting it by evaluating its credibility at both the class and sample levels. The convergence guarantee of FedSSD is theoretically analyzed, and extensive experiments on three public benchmark datasets demonstrate that FedSSD achieves better generalization and robustness in fewer communication rounds than other state-of-the-art FL methods.
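To make the idea of credibility-weighted self-distillation concrete, the following is a minimal NumPy sketch of what a local training loss of this kind might look like. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `selective_distill_loss`, the per-class vector `class_cred` (assumed here to be derived from the server-side credibility matrix), the binary sample-level weight, the product used to combine the two weights, and the parameters `lam` and `T` are all hypothetical choices.

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with temperature T (numerically stabilized)."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def selective_distill_loss(local_logits, global_logits, labels,
                           class_cred, lam=1.0, T=2.0):
    """Cross-entropy plus a credibility-weighted distillation term.

    Assumptions (not taken from the paper):
      * class_cred[c] in [0, 1] says how much the global model is
        trusted on class c;
      * a sample is trusted only if the global model classifies it
        correctly (binary sample-level weight);
      * the two weights are combined by multiplication.
    """
    n = labels.shape[0]
    eps = 1e-12

    # Supervised term: cross-entropy of the local model on local labels.
    p_local = softmax(local_logits)
    ce = -np.log(p_local[np.arange(n), labels] + eps)

    # Distillation term: KL(global || local) on temperature-softened outputs.
    q_global = softmax(global_logits, T)
    q_local = softmax(local_logits, T)
    kl = np.sum(q_global * (np.log(q_global + eps) - np.log(q_local + eps)),
                axis=1)

    # Selective weighting: sample-level times class-level credibility.
    sample_cred = (global_logits.argmax(axis=1) == labels).astype(float)
    weights = sample_cred * class_cred[labels]

    return float(np.mean(ce + lam * weights * kl))

# Toy usage: two samples, three classes.
local = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
glob = np.array([[3.0, 0.1, 0.1], [1.0, 0.2, 0.1]])
labels = np.array([0, 1])
class_cred = np.array([0.9, 0.8, 0.7])
loss = selective_distill_loss(local, glob, labels, class_cred)
```

In this toy example the global model misclassifies the second sample, so its distillation weight is zero and only the first sample is pulled towards the global prediction. That is the intended effect of selective weighting: the local model is constrained only where the global knowledge is judged credible.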

Paper Structure

This paper contains 19 sections, 14 equations, 9 figures, 3 tables, and 1 algorithm.

Figures (9)

  • Figure 1: The framework of FedAvg.
  • Figure 2: The catastrophic forgetting issue on non-IID CIFAR10. Here, $Acc_G$ and $Acc_L$ denote the global test accuracy and the average local test accuracy, respectively.
  • Figure 3: The class-wise test accuracy of the global model and the local model on CIFAR10.
  • Figure 4: Examples of confusion matrices for training on local biased data on CIFAR10.
  • Figure 5: An overview of FedSSD in the heterogeneous setting.
  • ...and 4 more figures