FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning
Jiayu Liu, Yong Wang, Nianbin Wang, Jing Yang, Xiaohui Tao
TL;DR
FedDW addresses non-IID data in federated learning by enforcing the class structure expected under IID training: a consistency regularizer aligns the class relations (CR) derived from the classifier weights with a global soft-label (SL) matrix. The method defines DLE data, builds a global SL matrix, and penalizes the Frobenius distance between the local classifier's CR matrix and the SL matrix, so that global aggregation and local updates jointly guide training under heterogeneity. The authors provide a convergence analysis, discuss the regularizer's properties and convex approximations, and report extensive experiments on MNIST, CIFAR-10/100, and IMDB in which FedDW improves accuracy with negligible additional computation and communication while remaining compatible with existing FL methods. The results suggest that FedDW scales across client counts, rounds, and model architectures, making it a practical option for large-scale heterogeneous FL. Overall, FedDW offers a principled, efficient mechanism for preserving global class relationships in distributed learning and improving generalization under non-IID data distributions.
Abstract
Federated Learning (FL) is a distributed machine learning paradigm that enables neural network training across devices without centralizing data. While this addresses issues of information sharing and data privacy, data heterogeneity across clients and increasing network scale degrade model performance and training efficiency. Previous research shows that in IID environments the parameter structure of the model is expected to follow certain consistency principles, so identifying and regularizing these consistencies can mitigate the issues caused by heterogeneous data. We found that both the soft labels derived from knowledge distillation and the classifier head parameter matrix, when multiplied by its own transpose, capture the intrinsic relationships between data classes. These shared relationships suggest an inherent consistency between the two. This paper identifies that consistency and uses it to regularize training, which forms the basis of our proposed FedDW framework. Experimental results show that FedDW outperforms 10 state-of-the-art FL methods, improving accuracy by an average of 3% in highly heterogeneous settings. Additionally, we provide a theoretical proof that FedDW offers higher efficiency, with the additional computational load from backpropagation being negligible. The code is available at https://github.com/liuvvvvv1/FedDW.
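To make the consistency idea concrete, the following NumPy sketch computes a class-relation matrix from a classifier head multiplied by its own transpose, builds a soft-label matrix by averaging softened predictions per class, and measures their squared Frobenius distance. It is an illustrative sketch, not the authors' implementation: the function names, the row-wise softmax normalization of W W^T, the temperature parameter, and the uniform fallback for classes with no samples are all assumptions made here. See the linked repository for the actual method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def class_relation(W):
    """Class-relation (CR) matrix from a classifier head.

    W has shape (num_classes, feature_dim). The row-wise softmax is an
    assumption of this sketch; the source only states that the weight
    matrix is multiplied by its own transpose.
    """
    return softmax(W @ W.T, axis=1)


def soft_label_matrix(logits, labels, num_classes, temperature=1.0):
    """Soft-label (SL) matrix: mean softened prediction per true class.

    Row c averages the softmax outputs of all samples labeled c.
    Classes with no samples get a uniform row (assumption).
    """
    probs = softmax(logits / temperature, axis=1)
    sl = np.full((num_classes, num_classes), 1.0 / num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            sl[c] = probs[mask].mean(axis=0)
    return sl


def consistency_penalty(W, sl_global):
    """Squared Frobenius distance between CR(W) and the global SL matrix."""
    diff = class_relation(W) - sl_global
    return float(np.sum(diff ** 2))
```

In a local training step, a penalty of this kind would typically be added to the task loss with a weighting coefficient, so that each client's classifier head is pulled toward the class relationships captured by the global SL matrix.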
