
HFedCKD: Toward Robust Heterogeneous Federated Learning via Data-free Knowledge Distillation and Two-way Contrast

Yiting Zheng, Bohan Lin, Jinqian Chen, Jihua Zhu

TL;DR

The paper tackles heterogeneous federated learning under a limited communication budget by introducing HFedCKD, a framework that combines data-free knowledge distillation with inverse probability weighting and hierarchical bidirectional contrastive learning. It stabilizes global knowledge transfer when client participation is sparse and client models are diverse. It does so by dynamically reweighting client contributions (IPWD) and by aligning feature spaces while preserving personalization through Encode-Global Alignment and Decode-History Alignment. Key contributions include a data-free KD scheme that mitigates generator bias, a two-way contrastive learning mechanism across multiple layers, and extensive experiments on image and IoT datasets demonstrating robustness to low participation and non-IID data. The work is relevant to deploying robust, privacy-preserving FL in real-world, resource-constrained environments with heterogeneous devices and variable participation.
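The two alignments named above can be illustrated with a small sketch. The summary does not give the loss formulas, so this assumes a MOON-style InfoNCE loss with one positive and one negative view. For the encoder (Encode-Global Alignment), the positive is the global model's feature and the negative is the historical local model's feature. For the decoder (Decode-History Alignment), the roles are reversed so the classifier stays close to the client's own history. The function names, the temperature `tau`, and the use of single vectors instead of batches are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(anchor, positive, negative, tau: float = 0.5) -> float:
    """InfoNCE with one positive and one negative view (MOON-style, assumed)."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = math.exp(cosine(anchor, negative) / tau)
    return -math.log(pos / (pos + neg))


def two_way_contrast(z_local, z_global, z_hist,
                     h_local, h_global, h_hist,
                     tau: float = 0.5) -> Tuple[float, float]:
    # Encode-Global Alignment (assumed form): pull the local encoder feature
    # toward the global model's feature, push it away from the historical model.
    enc_loss = info_nce(z_local, z_global, z_hist, tau)
    # Decode-History Alignment (assumed form): the reversed direction for the
    # classifier output, keeping it close to the client's previous local model.
    dec_loss = info_nce(h_local, h_hist, h_global, tau)
    return enc_loss, dec_loss


# Encoder already matches the global view (low loss); decoder matches the
# global view instead of its history (high loss).
enc, dec = two_way_contrast([1.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                            [0.0, 1.0], [0.0, 1.0], [1.0, 0.0])
```

In a real training loop these terms would be computed over batches of tensors and added to the cross-entropy loss with tunable weights. How HFedCKD applies them "across multiple layers" is not specified here.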

Abstract

Most current federated learning frameworks are modeled as static processes and ignore the dynamic characteristics of the learning system. Under the central server's limited communication budget, knowledge transfer among a large number of clients with flexible model architectures forces a low participation rate. Active clients contribute unevenly, and the sheer number of clients seriously degrades FL performance. We consider a more general and practical federated scenario and propose a system-heterogeneous federated learning method based on data-free knowledge distillation and two-way contrast (HFedCKD). We apply an Inverse Probability Weighted Distillation (IPWD) strategy within the data-free knowledge transfer framework, in which a generator supplements the data features of non-participating clients. IPWD dynamically evaluates the prediction contribution of each client under different data distributions. By applying anti-bias weighting to each client's prediction loss, it adjusts the client weight distribution so that the knowledge of participating clients is integrated fairly. In addition, each local model is split into a feature extractor and a classifier. Through differentiated contrastive learning, the feature extractor is aligned with the global model in feature space, while the classifier retains its personalized decision-making capability. HFedCKD effectively alleviates the knowledge shift caused by a low participation rate under data-free knowledge distillation and improves the performance and stability of the model. We conduct extensive experiments on image and IoT datasets to evaluate the generalization and robustness of the proposed HFedCKD framework.
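The abstract describes IPWD only qualitatively, so the following is a minimal sketch of the general inverse-probability-weighting idea applied to an ensemble distillation teacher. It is not the authors' formula. It assumes each client's propensity is its empirical participation rate (`counts[k] / rounds`), so that rarely sampled clients receive larger weights. These weights then combine the clients' predictions on a generated sample into a teacher distribution that the global model is distilled toward via KL divergence. The paper may instead derive the weights from per-client prediction losses. All names here (`ipw_weights`, `teacher_probs`, `kd_loss`) are hypothetical.

```python
import math
from typing import Dict, List


def ipw_weights(counts: Dict[str, int], rounds: int,
                active: List[str]) -> Dict[str, float]:
    """Normalized inverse-propensity weights for this round's active clients.

    Assumption: propensity p_k = counts[k] / rounds, so the raw weight is 1 / p_k.
    """
    raw = {k: rounds / max(counts[k], 1) for k in active}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def teacher_probs(client_logits: Dict[str, List[float]],
                  weights: Dict[str, float]) -> List[float]:
    """IPW-weighted ensemble of client predictions on one generated sample."""
    n = len(next(iter(client_logits.values())))
    out = [0.0] * n
    for k, logits in client_logits.items():
        for i, p in enumerate(softmax(logits)):
            out[i] += weights[k] * p
    return out


def kd_loss(teacher: List[float], student_logits: List[float]) -> float:
    """KL(teacher || student): the distillation loss minimized by the global model."""
    student = softmax(student_logits)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


# Client "b" participated in 2 of 10 rounds, "a" in 8, so "b" is up-weighted.
w = ipw_weights({"a": 8, "b": 2}, rounds=10, active=["a", "b"])
t = teacher_probs({"a": [2.0, 0.0], "b": [0.0, 2.0]}, w)
```

The design intent is that a client seen rarely stands in for the unobserved rounds in which it would have contributed, which counteracts the participation bias illustrated in Figure 2.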

Paper Structure

This paper contains 19 sections, 18 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The comparison of performance between homogeneous and heterogeneous models on the CIFAR-100 dataset with 200 clients (left) and 500 clients (right).
  • Figure 2: In Fig. 2a and 2d, we project the feature representations onto a 2D plane using t-SNE. The goal of the global model is to learn the knowledge of the clients. When updated in each round, the generated model should effectively capture the characteristics of every client. In practice, however, feature distributions deviate substantially across clients. A low participation rate causes each global update to reflect only the feature distributions of the participating clients, biasing the global optimization objective. In Fig. 2c, a low participation rate prevents the generator from obtaining information about the features of non-participating clients. As a result, the quality of the generated pseudo-samples decreases and feature diversity is lost, so the global model's update deviates from the direction of global optimization.
  • Figure 3: The HFedCKD framework, which performs the following steps in each communication round: 1. initialize the global model $\theta_{g}^{r}$; 2. distribute $\theta_{gd}^{r}$ to all participating clients via knowledge transfer; 3. initialize the local model $\theta_{init}^{r}$; 4. decompose $\theta_{init}^{r}$ into a feature encoder and a decoder, and perform reversed contrastive learning with $\theta_{gd}^{r}$ and the historical local model $\theta_{k}^{r-1}$, obtaining the updated local model $\theta_{k}^r$; 5. perform unsupervised knowledge distillation based on IPWD to integrate $\theta_{i}^{r}, \theta_{j}^{r}, \theta_{k}^{r}$ into $\theta_{g}^{r}$, obtaining the updated global model $\theta_{g}^{r+1}$.
  • Figure 4: Performance of limited-model and unlimited-model heterogeneous FL methods, shown both intuitively and comprehensively.
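The per-round procedure in the Figure 3 caption can be outlined as an orchestration skeleton. This is a sketch under stated assumptions, not the authors' code. Models are represented by a hypothetical `Model` dictionary, only `budget` clients are sampled per round to mimic a low participation rate, and steps 2 to 4 and step 5 are delegated to caller-supplied callables (`local_update` and `distill`, both hypothetical). The function also records participation counts, which an IPW-style distillation step could consume.

```python
import random
from typing import Callable, Dict, List

Model = Dict[str, float]  # hypothetical stand-in: parameter name -> value


def hfedckd_round(
    global_model: Model,
    clients: List[str],
    budget: int,
    history: Dict[str, Model],
    counts: Dict[str, int],
    local_update: Callable[[str, Model, Model], Model],
    distill: Callable[[Model, Dict[str, Model], Dict[str, int]], Model],
    rng: random.Random,
) -> Model:
    # Limited communication budget: only `budget` clients take part this round.
    active = rng.sample(clients, budget)
    local_models: Dict[str, Model] = {}
    for k in active:
        counts[k] += 1  # participation statistics, usable by an IPW weighting
        # Historical local model theta_k^{r-1}; fall back to the global model
        # for a client that has never participated before.
        prev = history.get(k, global_model)
        # Steps 2-4: receive global knowledge, split encoder/decoder, and run
        # the two-way contrastive update (delegated to `local_update`).
        local_models[k] = local_update(k, global_model, prev)
        history[k] = local_models[k]
    # Step 5: data-free, IPWD-based distillation into the next global model.
    return distill(global_model, local_models, counts)
```

Keeping the local update and the distillation step as injected callables separates the communication protocol from the specific losses, so the two sketches of the contrastive and IPWD terms could be plugged in without changing the round logic.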