Incremental Self-training for Semi-supervised Learning
Jifeng Guo, Zhulin Liu, Tong Zhang, C. L. Philip Chen
TL;DR
Incremental Self-training (IST) tackles two problems in self-training-based semi-supervised learning (SSL): noisy pseudo-labels and high training time. Rather than processing all unlabeled data at once, it clusters the unlabeled samples to estimate certainty, then feeds high-certainty samples in batches via a sequential query list $\mathbf{Q}_{a}(t)$, focusing early learning on easier examples and reserving samples near the decision boundary for later refinement. The approach has three stages (Initialization, Auxiliary Training Data Acquisition, and Classifier Updating) and supports both iterative and non-iterative backbones. Experiments on five datasets show improved accuracy and faster learning, and IST outperforms state-of-the-art self-training competitors on three challenging image classification tasks while reducing computational overhead. Ablations confirm that both the choice of clustering method and the batch processing affect performance. IST is simple and fits existing self-training-based SSL methods.
Abstract
Semi-supervised learning reduces the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention, and several advancements have emerged to address the challenges associated with noisy pseudo-labels. Previous works on self-training acknowledge the importance of unlabeled data but have not explored how to use it efficiently, nor have they addressed the high time consumption caused by iterative learning. This paper proposes Incremental Self-training (IST) for semi-supervised learning to fill these gaps. Unlike ST, which processes all data indiscriminately, IST processes data in batches and preferentially assigns pseudo-labels to unlabeled samples with high certainty. It then processes the data around the decision boundary once the model has stabilized, which enhances classifier performance. IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify IST on five datasets and two types of backbones, where it improves both recognition accuracy and learning speed. Notably, it outperforms state-of-the-art competitors on three challenging image classification tasks.
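The batch-wise, certainty-first idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The nearest-centroid classifier, the small k-means routine, the distance-ratio certainty score, and all function names (`fit`, `predict`, `certainty`, `kmeans`, `incremental_self_training`) are assumptions chosen only to keep the example self-contained.

```python
import math

def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit(samples, labels):
    """Toy nearest-centroid classifier (stand-in backbone): one centroid per class."""
    classes = sorted(set(labels))
    return {c: centroid([s for s, y in zip(samples, labels) if y == c])
            for c in classes}

def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: math.dist(x, model[c]))

def kmeans(points, centers, iters=10):
    """Plain k-means on the unlabeled points, started from the given centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers

def certainty(centers, x):
    """Assumed certainty score: 1 - (nearest distance / second-nearest distance).
    Values near 1 are deep inside a cluster; values near 0 are near the boundary."""
    d = sorted(math.dist(x, c) for c in centers)
    return 0.0 if d[1] == 0 else 1.0 - d[0] / d[1]

def incremental_self_training(x_lab, y_lab, x_unlab, batch_size=2):
    # Stage 1 (Initialization): train on the labeled data only.
    model = fit(x_lab, y_lab)
    # Stage 2 (Auxiliary Training Data Acquisition): cluster the unlabeled
    # data and order it from most to least certain (a simple query list).
    centers = kmeans(x_unlab, list(model.values()))
    queue = sorted(x_unlab, key=lambda x: certainty(centers, x), reverse=True)
    # Stage 3 (Classifier Updating): pseudo-label one batch at a time,
    # so boundary samples are handled last by a more stable model.
    train_x, train_y = list(x_lab), list(y_lab)
    for start in range(0, len(queue), batch_size):
        batch = queue[start:start + batch_size]
        train_x += batch
        train_y += [predict(model, x) for x in batch]
        model = fit(train_x, train_y)
    return model, queue

# Illustrative toy data: two labeled points and six unlabeled ones.
x_lab = [(0.0, 0.0), (4.0, 0.0)]
y_lab = [0, 1]
x_unlab = [(0.5, 0.0), (3.5, 0.0), (1.8, 0.0), (2.3, 0.0), (-0.5, 0.0), (4.5, 0.0)]
model, queue = incremental_self_training(x_lab, y_lab, x_unlab)
print(queue)                       # boundary points (1.8, 2.3) come last
print(predict(model, (0.2, 0.0)))  # class of a point near the first cluster
```

In this toy setting the two points closest to the midpoint between the clusters end up at the back of the queue, so they are pseudo-labeled only after the centroids have been updated with the easier samples.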
