Robust Training of Federated Models with Extremely Label Deficiency

Yonggang Zhang, Zhiqin Yang, Xinmei Tian, Nannan Wang, Tongliang Liu, Bo Han

TL;DR

This work addresses the gradient conflict problem in federated semi-supervised learning under label deficiency by introducing Twin-sight, a twin-model paradigm that separately trains a supervised model and an unsupervised model to leverage labeled and unlabeled data without conflicting objectives. The interaction between the two models is enforced through a neighborhood-preserving loss that aligns their feature neighborhoods, enabling mutual guidance across perspectives. Extensive experiments on four datasets with varying degrees of label availability demonstrate that Twin-sight outperforms state-of-the-art baselines and exhibits robustness to data heterogeneity and client participation scenarios. The approach offers practical implications for scalable, privacy-preserving learning where labeling is scarce and data distributions are highly non-IID.

Abstract

Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency. Advanced FSSL methods predominantly focus on training a single model on each client. However, this approach could lead to a discrepancy between the objective functions of labeled and unlabeled data, resulting in gradient conflicts. To alleviate gradient conflict, we propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data. In particular, Twin-sight concurrently trains a supervised model with a supervised objective function while training an unsupervised model using an unsupervised objective function. To enhance the synergy between these two models, Twin-sight introduces a neighbourhood-preserving constraint, which encourages the preservation of the neighbourhood relationship among data features extracted by both models. Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings, demonstrating the efficacy of the proposed Twin-sight.
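
The gradient conflict described in the abstract can be made concrete with a small check. The following is a minimal sketch, not the authors' code: it assumes the gradients of the labeled-data and unlabeled-data objectives have already been flattened into plain lists of floats, and it assumes cosine similarity as the measure, reading a negative value as a conflict. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine similarity between two flattened gradient vectors."""
    if len(g_a) != len(g_b):
        raise ValueError("gradients must have the same length")
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero gradient conflicts with nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def is_conflicting(g_labeled, g_unlabeled):
    """Treat a negative cosine similarity as a gradient conflict (assumption)."""
    return cosine_similarity(g_labeled, g_unlabeled) < 0.0


# Toy gradients, made up for illustration only.
g_sup = [1.0, 0.0, -2.0]
g_unsup = [-1.0, 0.5, 1.0]
similarity = cosine_similarity(g_sup, g_unsup)
conflict = is_conflicting(g_sup, g_unsup)
```

In a single-model setup, both gradients update the same parameters, so a persistently negative similarity means the two objectives pull those parameters in opposing directions. Training two separate models, as Twin-sight does, removes that shared-parameter contention.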

Paper Structure (24 sections, 11 equations, 8 figures, 10 tables, 1 algorithm)

Figures (8)

  • Figure 1: Overview of Twin-sight. The framework illustrates the process for both fully-labeled and fully-unlabeled clients. Each client incorporates a supervised model and an unsupervised model. The supervised model undergoes supervised learning using either ground-truth labels or pseudo labels, while the unsupervised model performs self-supervised learning. This approach enables the generation of twin sights for each sample, capturing both supervised and unsupervised perspectives. Subsequently, these two models are aligned, leveraging the complementary information.
  • Figure 2: (a) The gradient similarity between the two objective functions, i.e., those defined on labeled and unlabeled data, throughout the training process. The figure demonstrates the gradient conflict. (b) Data heterogeneity under $Dir(\gamma=0.1)$. Each bubble indicates the number of samples of the $y$-th class at client $k$.
  • Figure 3: The data distribution of different clients on SVHN.
  • Figure 4: The blue and black areas in the figure correspond to the total amount of labeled and unlabeled data, respectively. The other two areas in the figure show the class distribution of a labeled client and an unlabeled client. This figure reveals that the challenges faced by FSSL are not limited to the problem of label scarcity, but also include the impact of data heterogeneity.
  • Figure 5: Different self-supervised models combined with our method under the setting of $\gamma = 0.1, E = 1, K = 10$ on CIFAR-10.
  • ...and 3 more figures
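
The label-skewed partition written as $Dir(\gamma=0.1)$ in Figure 2(b) is commonly simulated by drawing, for each class, a vector of client proportions from a symmetric Dirichlet distribution. The sketch below shows one common way to do this using only the Python standard library. It is an illustration under assumptions, not the paper's partitioning code: `sample_dirichlet`, `dirichlet_partition`, their parameters, and the toy labels are all hypothetical.

```python
import random


def sample_dirichlet(rng, num_parts, gamma):
    """Draw one sample from a symmetric Dirichlet(gamma) via normalized Gamma draws."""
    draws = [rng.gammavariate(gamma, 1.0) for _ in range(num_parts)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow when gamma is very small.
        return [1.0 / num_parts] * num_parts
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients, gamma, seed=0):
    """Assign sample indices to clients, class by class, using Dirichlet proportions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        props = sample_dirichlet(rng, num_clients, gamma)
        # Convert cumulative proportions into cut points over this class's indices.
        start = 0
        cumulative = 0.0
        for k in range(num_clients):
            cumulative += props[k]
            if k == num_clients - 1:
                end = len(idx)  # the last client takes whatever remains
            else:
                end = round(cumulative * len(idx))
            end = max(start, min(end, len(idx)))
            clients[k].extend(idx[start:end])
            start = end
    return clients


# Toy example: 10 classes with 100 samples each, split across K = 10 clients.
labels = [c for c in range(10) for _ in range(100)]
parts = dirichlet_partition(labels, num_clients=10, gamma=0.1)
```

With a small $\gamma$ such as 0.1, most of each class's mass tends to land on a few clients, which matches the uneven bubble sizes described for Figure 2(b). Larger values of $\gamma$ spread each class more evenly.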