SemiDFL: A Semi-Supervised Paradigm for Decentralized Federated Learning

Xinyang Liu, Pengchao Han, Xuan Li, Bo Liu

TL;DR

SemiDFL tackles semi-supervised learning in decentralized federated learning, where clients hold varying mixes of labeled and unlabeled data under highly non-IID conditions. It builds consensus in both the model and data spaces by combining neighborhood-based pseudo-labeling, a consensus-based diffusion model for data synthesis, and adaptive aggregation that fuses neighbor models according to their accuracy on synthesized data. The diffusion-generated samples yield a shared consensus data space, and adaptive neighbor weighting yields a consensus model space; together they improve classifier training without sharing raw data. Extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10 show that SemiDFL consistently outperforms existing SSL baselines in DFL settings and closely approaches the centralized upper bound, highlighting its practical potential for privacy-preserving, scalable SSL in distributed systems.
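
The summary says SemiDFL uses neighborhood information to improve pseudo-labeling but does not state the rule. One common way to do this, assumed here purely for illustration, is to average the class probabilities that the client's own model and its neighbors' models predict for an unlabeled sample, and keep the pseudo-label only if the averaged confidence clears a threshold. The function name `neighborhood_pseudo_label` and its `threshold` parameter are hypothetical; this is a sketch, not the paper's procedure.

```python
from typing import Optional, Sequence


def neighborhood_pseudo_label(
    prob_vectors: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> Optional[int]:
    """Pseudo-label one unlabeled sample from several models' predictions.

    prob_vectors holds the class-probability vector predicted for the same
    sample by the client's own model and by each neighbor's model.
    Returns the argmax class of the averaged vector if its averaged
    probability reaches `threshold`; otherwise None (sample stays unlabeled).
    """
    if not prob_vectors:
        raise ValueError("need at least one probability vector")
    num_classes = len(prob_vectors[0])
    if any(len(p) != num_classes for p in prob_vectors):
        raise ValueError("all probability vectors must have the same length")
    # Hypothetical rule: uniform average over own + neighbor predictions.
    avg = [sum(p[c] for p in prob_vectors) / len(prob_vectors)
           for c in range(num_classes)]
    best = max(range(num_classes), key=lambda c: avg[c])
    return best if avg[best] >= threshold else None


if __name__ == "__main__":
    own = [0.6, 0.3, 0.1]  # the local model alone is not confident
    neighbors = [[0.9, 0.05, 0.05], [0.95, 0.03, 0.02]]
    print(neighborhood_pseudo_label([own] + neighbors, threshold=0.8))  # 0
    print(neighborhood_pseudo_label([own], threshold=0.8))              # None
```

In the example, confident neighbors lift the averaged probability of class 0 above the threshold, so a sample the local model alone would discard receives a pseudo-label.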

Abstract

Decentralized federated learning (DFL) enables cooperative model training among connected clients without a central server, thereby mitigating communication bottlenecks and eliminating the single point of failure present in centralized federated learning (CFL). Most existing work on DFL focuses on supervised learning and assumes that each client has sufficient labeled data for local training. In real-world applications, however, much of the data is unlabeled. We address this by considering a challenging yet practical semi-supervised learning (SSL) scenario in DFL, where clients may hold different kinds of data: some have a few labeled samples, some have only unlabeled data, and others have both. In this work, we propose SemiDFL, the first semi-supervised DFL method, which improves DFL performance in SSL scenarios by establishing consensus in both the data and model spaces. Specifically, we use neighborhood information to improve the quality of pseudo-labels, which is crucial for effectively leveraging unlabeled data. We then design a consensus-based diffusion model to generate synthesized data, which is combined with pseudo-labeled data to form mixed datasets. Additionally, we develop an adaptive aggregation method that uses the accuracy of models on the synthesized data to further improve SemiDFL performance. Extensive experiments demonstrate that SemiDFL outperforms existing CFL and DFL schemes in both IID and non-IID SSL scenarios.
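
The abstract describes adaptive aggregation that weights neighbor models by their accuracy on synthesized data, without stating the exact weighting. The sketch below assumes a simple proportional rule (each model's weight is its synthesized-data accuracy divided by the sum of all such accuracies) applied to flattened parameter vectors. `adaptive_aggregate` and its arguments are hypothetical names, and the paper's actual rule may differ.

```python
from typing import Dict, List


def adaptive_aggregate(
    params: Dict[str, List[float]],
    synth_accuracy: Dict[str, float],
) -> List[float]:
    """Weighted average of flattened parameter vectors (own + neighbors).

    Each model's weight is its accuracy on synthesized data divided by the
    sum of all accuracies (a hypothetical proportional rule).
    """
    if not params:
        raise ValueError("need at least one model")
    if set(params) != set(synth_accuracy):
        raise ValueError("params and accuracies must cover the same clients")
    total = sum(synth_accuracy.values())
    if total > 0:
        weights = {k: synth_accuracy[k] / total for k in params}
    else:
        # Fall back to plain averaging if every accuracy is zero.
        weights = {k: 1.0 / len(params) for k in params}
    dim = len(next(iter(params.values())))
    fused = [0.0] * dim
    for k, vec in params.items():
        if len(vec) != dim:
            raise ValueError("parameter vectors must have equal length")
        for i, v in enumerate(vec):
            fused[i] += weights[k] * v
    return fused


if __name__ == "__main__":
    params = {"self": [1.0, 0.0], "n1": [0.0, 1.0], "n2": [0.0, 0.0]}
    acc = {"self": 0.5, "n1": 0.25, "n2": 0.25}
    print(adaptive_aggregate(params, acc))  # [0.5, 0.25]
```

Proportional weighting keeps the fused model a convex combination of its inputs, so a neighbor whose model performs poorly on the synthesized data still contributes, only with a smaller weight.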

Paper Structure

This paper contains 28 sections, 10 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Framework of SemiDFL. (a) An example of a decentralized communication topology used in our experiments; (b) the overall process of the proposed SemiDFL, which consists of six numbered steps. Steps 2, 4, and 5 are generic components, while steps 1, 3, and 6 are the main contributions of this work. The detailed workflows are shown on the right of (b).
  • Figure 2: Accuracy versus non-IID degree.
  • Figure 3: Accuracy versus labeled data ratio ($\alpha=100$).
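
The captions mention a non-IID degree and a parameter α without defining them here. A common convention in federated learning benchmarks, assumed for this sketch and not confirmed by the captions, is that α is the concentration parameter of a per-class Dirichlet split: smaller α gives more skewed label distributions across clients, and large values such as α = 100 approach an IID split. The function `dirichlet_partition` and its `seed` argument are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_partition(
    labels: Sequence[int], num_clients: int, alpha: float, seed: int = 0
) -> List[List[int]]:
    """Split sample indices across clients with a per-class Dirichlet draw.

    Assumed convention: for each class, client shares are sampled from
    Dirichlet(alpha, ..., alpha); smaller alpha means more non-IID splits.
    """
    if num_clients <= 0 or alpha <= 0:
        raise ValueError("num_clients and alpha must be positive")
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    parts: List[List[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        if total > 0:
            shares = [x / total for x in g]
        else:  # guard against underflow for tiny alpha
            shares = [1.0 / num_clients] * num_clients
        # Turn cumulative shares into contiguous cut points over this class.
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += shares[c]
            if c == num_clients - 1:
                end = len(idxs)
            else:
                end = min(len(idxs), round(cum * len(idxs)))
            parts[c].extend(idxs[start:end])
            start = end
    return parts


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]
    # Client sizes; with alpha=100 they should be close to equal.
    print([len(p) for p in dirichlet_partition(labels, 5, alpha=100.0)])
    print([len(p) for p in dirichlet_partition(labels, 5, alpha=0.1)])
```

Every index is assigned to exactly one client because the cut points are nondecreasing and the last client always receives the remainder of each class.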