Federated Learning on Non-iid Data via Local and Global Distillation

Xiaolin Zheng; Senci Ying; Fei Zheng; Jianwei Yin; Longfei Zheng; Chaochao Chen; Fengqin Dong

TL;DR

This paper addresses non-iid data challenges in federated learning by proposing FedSND, which combines client-side self-distillation with server-side noise distillation to mitigate local overfitting and global weights-shift. The method generates adaptive noisy pseudo-samples and distills knowledge across clients, enabling more robust global aggregation without relying on shared data. Experiments across vision and NLP tasks show FedSND achieves higher accuracy and better communication efficiency than state-of-the-art baselines, with ablation studies confirming the complementary benefits of both distillation modules. The work offers a practical, data-agnostic approach to improve federated learning under realistic data heterogeneity.

Abstract

Most existing federated learning algorithms are based on the vanilla FedAvg scheme. However, as data complexity and the number of model parameters grow, the communication traffic and the number of training rounds these algorithms require increase significantly. The problem is most pronounced in non-independent and identically distributed (non-iid) scenarios, where they do not achieve satisfactory performance. In this work, we propose FedND: federated learning with noise distillation. The main idea is to use knowledge distillation to optimize the model training process. On the client, we propose a self-distillation method to train the local model. On the server, we generate noisy samples for each client and use them to distill the other clients' models. Finally, the global model is obtained by aggregating the local models. Experimental results show that the algorithm outperforms state-of-the-art methods in accuracy and is more communication-efficient.
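The final aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes FedAvg-style weighting by each client's sample count, which the abstract does not spell out; the function name `aggregate` and the dictionary-of-lists model representation are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def aggregate(local_models: List[Weights], num_samples: List[int]) -> Weights:
    """Average client parameters, weighted by local sample counts.

    Assumption: FedAvg-style weighting. The source only states that the
    global model is obtained by aggregating the local models.
    """
    if not local_models or len(local_models) != len(num_samples):
        raise ValueError("need one sample count per local model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_model: Weights = {}
    for name in local_models[0]:
        size = len(local_models[0][name])
        # Weighted mean of each parameter entry across clients.
        global_model[name] = [
            sum(m[name][i] * n for m, n in zip(local_models, num_samples)) / total
            for i in range(size)
        ]
    return global_model


clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
global_weights = aggregate(clients, num_samples=[1, 3])
```

In this toy call the second client holds three times as many samples as the first, so its parameters dominate the average.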

Paper Structure (19 sections, 10 equations, 10 figures, 1 table, 2 algorithms)

Introduction
Related Work
Federated Learning with Noise Distillation
Problem Statement
Federated Averaging (FedAvg)
The Non-iid Problem
Framework Overview
Self Distillation
Noisy Sample Generation
Noise Distillation
Summary
Experiments
Experimental Setup
Accuracy Comparison
Communication Efficiency
...and 4 more sections

Figures (10)

Figure 1: Multiple animal protection organizations need to jointly build machine learning models for animal identification. However, due to environmental and human factors, the animal data collected by different local organizations varies significantly.
Figure 2: FedSND framework overview: Our algorithm runs in two parts. On the client side, several clients train the same local model on different datasets and optimize training through the proposed self-distillation module. After several rounds of training, each client uploads its model to the server. On the server side, the received models are distilled and aggregated into a global model. Finally, each client downloads the latest global model and uses it as its local model for the next round of training.
Figure 3: Self Distillation: The client model is composed of three sub-models with the same structure. Two of them differ in the parameters of their dropout layers, while the third is the model trained in the previous epoch. Each sample is passed through all three sub-models simultaneously, and the sub-models are distilled with a KL loss.
Figure 4: Noise Generation: First, noise is sampled from a random distribution and passed through the client model as a training sample. Then, the constructed loss function $L_e$ is used to update the noise in the backward pass, yielding a noisy sample that approximates a real sample.
Figure 5: Noise Distillation: The server first randomly selects the client models that participate in distillation and generates noisy samples for each selected model. The noisy samples are then used as training data to distill the other client models in a crosswise manner. Each noisy sample carries information about its own client model and can help normalize the models' parameters.
...and 5 more figures
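The KL-based self-distillation described in the Figure 3 caption can be sketched roughly as follows. The caption does not say how the KL terms between the three sub-models are combined, so averaging four pairwise terms is an assumption made here for illustration; the names `softmax`, `kl_divergence`, and `self_distillation_loss` are hypothetical, and plain logit lists stand in for real sub-model outputs.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(
    logits_a: List[float],     # sub-model with one dropout setting
    logits_b: List[float],     # sub-model with a different dropout setting
    logits_prev: List[float],  # model trained in the previous epoch
) -> float:
    """Average pairwise KL terms between the three sub-model outputs.

    Assumption: the two dropout variants are matched to each other and to
    the previous-epoch model; the source does not give the exact weighting.
    """
    p_a, p_b, p_prev = softmax(logits_a), softmax(logits_b), softmax(logits_prev)
    pairs = [(p_a, p_b), (p_b, p_a), (p_prev, p_a), (p_prev, p_b)]
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


loss = self_distillation_loss([2.0, 0.5, 0.1], [1.5, 0.7, 0.2], [1.8, 0.4, 0.3])
```

In an actual training loop this term would typically be added to the supervised loss on the labelled sample; that combination is likewise not specified in the text shown here.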
