Federated Learning on Non-iid Data via Local and Global Distillation
Xiaolin Zheng, Senci Ying, Fei Zheng, Jianwei Yin, Longfei Zheng, Chaochao Chen, Fengqin Dong
TL;DR
This paper addresses non-iid data challenges in federated learning by proposing FedND, which combines client-side self-distillation with server-side noise distillation to mitigate local overfitting and global weight shift. The method generates adaptive noisy pseudo-samples and distills knowledge across clients, enabling more robust global aggregation without relying on shared data. Experiments across vision and NLP tasks show FedND achieves higher accuracy and better communication efficiency than state-of-the-art baselines, with ablation studies confirming the complementary benefits of both distillation modules. The work offers a practical, data-agnostic approach to improve federated learning under realistic data heterogeneity.
Abstract
Most existing federated learning algorithms are based on the vanilla FedAvg scheme. However, as data complexity and the number of model parameters grow, the communication traffic and the number of training rounds these algorithms require increase significantly, especially in non-independent and identically distributed (non-iid) scenarios, where they fail to achieve satisfactory performance. In this work, we propose FedND: federated learning with noise distillation. The main idea is to use knowledge distillation to optimize the model training process. On the client, we propose a self-distillation method to train the local model. On the server, we generate noisy samples for each client and use them to distill the other clients. Finally, the global model is obtained by aggregating the local models. Experimental results show that the algorithm achieves the best performance among the compared methods and is more communication-efficient than state-of-the-art methods.
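The two generic building blocks named in the abstract, a knowledge-distillation loss and the aggregation of local models, can be illustrated with a minimal sketch. This is not the paper's implementation: the temperature-softened KL divergence is the standard distillation loss and is assumed here, the aggregation is assumed to be FedAvg-style averaging weighted by local dataset size, and the function names `softmax`, `distillation_loss`, and `aggregate` are hypothetical. The generation of noisy samples on the server is not shown, because the abstract does not specify it.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs.

    Assumption: this is the generic distillation loss, not necessarily
    the exact loss used by FedND.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def aggregate(client_weights, client_sizes):
    """Average flat parameter vectors, weighted by local dataset size.

    Assumption: FedAvg-style weighting; the abstract only says the global
    model is obtained by aggregating the local models.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

In this sketch, self-distillation on a client would pass the logits of an earlier copy of the local model as `teacher_logits`, while server-side distillation would use the logits that another client's model produces on the noisy samples. Both pairings are illustrative assumptions.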
