
Incremental Self-training for Semi-supervised Learning

Jifeng Guo, Zhulin Liu, Tong Zhang, C. L. Philip Chen

TL;DR

Incremental Self-training (IST) tackles two challenges of semi-supervised learning (SSL), noisy pseudo-labels and long training time, by using unlabeled data incrementally. It clusters unlabeled samples to estimate their certainty, then feeds high-certainty samples to the learner in batches via a sequential query list $\mathbf{Q}_{a}(t)$, so that early learning focuses on easier examples and boundary cases are reserved for later refinement. The approach has three stages (Initialization, Auxiliary Training Data Acquisition, and Classifier Updating) and supports both iterative and non-iterative backbones, achieving faster convergence and improved accuracy across multiple image datasets. Empirical results show IST outperforming state-of-the-art self-training methods on three tasks while reducing computational overhead, and ablations confirm that both the choice of clustering method and the use of batch processing affect performance. Overall, IST offers a practical, generalizable enhancement to SSL by controlling the order in which unlabeled data are used and learning progressively from easy to hard samples.
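The idea of ordering unlabeled samples by certainty and splitting them into query batches can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes k-means as the clustering method and uses, as a hypothetical certainty score, the gap between a sample's distances to its nearest and second-nearest centroids (a small gap suggests the sample lies near a boundary). The names `kmeans`, `certainty`, and `build_query_list` are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0) -> List[Point]:
    """Lloyd's k-means with randomly sampled initial centroids; returns k centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute means; an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coords) / len(g) for coords in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


def certainty(p: Point, centroids: Sequence[Point]) -> float:
    """Hypothetical score: gap between the two nearest centroid distances (needs k >= 2)."""
    d = sorted(math.dist(p, c) for c in centroids)
    return d[1] - d[0]


def build_query_list(points: Sequence[Point], k: int, batch_size: int,
                     seed: int = 0) -> List[List[int]]:
    """Return index batches Q_a(1), Q_a(2), ..., most certain samples first."""
    centroids = kmeans(points, k, seed=seed)
    order = sorted(range(len(points)),
                   key=lambda i: certainty(points[i], centroids),
                   reverse=True)
    return [order[t:t + batch_size] for t in range(0, len(order), batch_size)]


if __name__ == "__main__":
    # Two tight blobs plus one ambiguous point, (5, 5), at index 6.
    unlabeled = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
    print(build_query_list(unlabeled, k=2, batch_size=3))
```

In this toy run the ambiguous point at index 6 ends up in the final batch, mirroring the idea of deferring boundary cases until the model has stabilized.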

Abstract

Semi-supervised learning provides a way to reduce the dependence of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention, and several advances have emerged to address the challenges associated with noisy pseudo-labels. Previous work on self-training acknowledges the importance of unlabeled data but has not examined how to use them efficiently, nor has it addressed the high time cost of iterative learning. This paper proposes Incremental Self-training (IST) for semi-supervised learning to fill these gaps. Unlike ST, which processes all data indiscriminately, IST processes data in batches and assigns pseudo-labels to high-certainty unlabeled samples first. It then processes the data around the decision boundary once the model has stabilized, enhancing classifier performance. IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbones, improving both recognition accuracy and learning speed. Notably, it outperforms state-of-the-art competitors on three challenging image classification tasks.
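The three stages named in the summary (Initialization, Auxiliary Training Data Acquisition, Classifier Updating) can be illustrated with a toy loop. This is a hedged sketch under stated assumptions, not the paper's algorithm: a hypothetical nearest-centroid classifier stands in for the real backbone (e.g. FlexMatch or BLS), and the query list of index batches is written by hand here, whereas in IST it would come from the clustering-based certainty ordering. The names `NearestCentroid` and `incremental_self_training` are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


class NearestCentroid:
    """Toy stand-in backbone: predicts the label of the closest class mean."""

    def fit(self, X: Sequence[Point], y: Sequence[Hashable]) -> "NearestCentroid":
        sums: Dict[Hashable, List[float]] = {}
        counts: Dict[Hashable, int] = defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {lab: tuple(s / counts[lab] for s in acc)
                          for lab, acc in sums.items()}
        return self

    def predict(self, x: Point) -> Hashable:
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


def incremental_self_training(X_l, y_l, X_u, query_list):
    """Hypothetical IST-style loop over a precomputed list of index batches."""
    # Stage 1 (Initialization): fit on the labeled data only.
    clf = NearestCentroid().fit(X_l, y_l)
    X_train, y_train = list(X_l), list(y_l)
    history = []
    for batch in query_list:
        # Stage 2 (Auxiliary Training Data Acquisition): pseudo-label only the
        # next batch Q_a(t); less certain batches wait for a better model.
        labels = [clf.predict(X_u[i]) for i in batch]
        X_train += [X_u[i] for i in batch]
        y_train += labels
        # Stage 3 (Classifier Updating): refit on labeled + pseudo-labeled data so far.
        clf = NearestCentroid().fit(X_train, y_train)
        history.append(dict(zip(batch, labels)))
    return clf, history


if __name__ == "__main__":
    clf, history = incremental_self_training(
        X_l=[(0.0, 0.0), (10.0, 10.0)], y_l=["a", "b"],
        X_u=[(1, 0), (9, 10), (0, 1), (10, 9), (5, 6)],
        query_list=[[0, 1], [2, 3], [4]],  # boundary sample (5, 6) is queried last
    )
    print(history)
```

Refitting after every batch means the boundary sample is pseudo-labeled by a model that has already absorbed the easier samples, which is the intuition behind processing boundary data only after the model stabilizes.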

Paper Structure (9 sections, 6 figures, 2 tables)

This paper contains 9 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Performance comparison between ST and IST with two types of backbones. The iterative backbone is based on FlexMatch [zhang2021flexmatch] on CIFAR-100, and the non-iterative backbone is based on BLS [Chen2018Broad].
  • Figure 2: An illustration of the unlabeled data utilization process. Colored sample points are those that have been added to the unlabeled data pool for utilization. The cluster distributions visualize the query lists to help explain the principle.
  • Figure 3: Comparison of ST and IST with iterative backbone.
  • Figure 4: Comparison of ST and IST with non-iterative backbone.
  • Figure 5: Comparison of accuracy and time across different clustering methods and datasets. Orange indicates a decrease in accuracy.
  • ...and 1 more figure