AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning
Zhiyu Wu, Jinshi Cui
TL;DR
AllMatch tackles the underutilization of unlabeled data in semi-supervised learning by introducing two mechanisms: class-specific adaptive thresholds (CAT) that couple global learning status with per-class weight norms, and binary classification consistency (BCC) that leverages top-k candidate classes to supervise all unlabeled samples. CAT provides a global threshold and class-wise adjustments, enabling more accurate pseudo-label selection for hard classes, while BCC ensures consistent candidate–negative divisions across augmented views. Together, these strategies yield improved pseudo-label quality and full utilization of the unlabeled set, achieving state-of-the-art results on multiple balanced and imbalanced benchmarks. The approach demonstrates strong practical impact by robustly exploiting unlabeled data in diverse data regimes and architectures.
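As a rough illustration of how a global threshold might be combined with per-class weight norms, the sketch below is one possible reading of CAT, not the authors' implementation. It makes three assumptions: the global learning status is tracked as an exponential moving average of the mean top-1 confidence on unlabeled data; each class threshold is the global threshold scaled by that class's classifier weight norm divided by the largest norm; and a smaller norm signals a less-learned class. The function names and the `momentum` parameter are hypothetical.

```python
import numpy as np


def class_specific_thresholds(probs_weak, classifier_weights, prev_global, momentum=0.999):
    """Return (global_threshold, per_class_thresholds).

    probs_weak:         (N, C) softmax outputs on weakly augmented unlabeled samples
    classifier_weights: (C, D) weight matrix of the final linear layer
    prev_global:        global threshold carried over from the previous step
    """
    # Assumed estimator of global learning status: EMA of the mean top-1 confidence.
    batch_confidence = probs_weak.max(axis=1).mean()
    global_thr = momentum * prev_global + (1.0 - momentum) * batch_confidence

    # Assumed class-wise adjustment: scale by the L2 norm of each class's weight
    # vector, normalized so the class with the largest norm keeps the global threshold.
    norms = np.linalg.norm(classifier_weights, axis=1)
    class_thr = global_thr * norms / norms.max()
    return global_thr, class_thr


def select_pseudo_labels(probs_weak, class_thr):
    """Keep a pseudo-label when its confidence reaches the threshold of its own class."""
    labels = probs_weak.argmax(axis=1)
    confidence = probs_weak.max(axis=1)
    mask = confidence >= class_thr[labels]
    return labels, mask
```

Under these assumptions, a class with a small weight norm receives a lower threshold, so its predictions are admitted as pseudo-labels at lower confidence than those of well-learned classes.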
Abstract
Existing semi-supervised learning algorithms adopt pseudo-labeling and consistency regularization techniques to introduce supervision signals for unlabeled samples. To overcome the inherent limitation of threshold-based pseudo-labeling, prior studies have attempted to align the confidence threshold with the evolving learning status of the model, which is estimated through the predictions made on the unlabeled data. In this paper, we further reveal that classifier weights can reflect the differentiated learning status across categories and consequently propose a class-specific adaptive threshold mechanism. Additionally, considering that even the optimal threshold scheme cannot resolve the problem of discarding unlabeled samples, a binary classification consistency regularization approach is designed to distinguish candidate classes from negative options for all unlabeled samples. By combining the above strategies, we present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data. We extensively evaluate our approach on multiple benchmarks, encompassing both balanced and imbalanced settings. The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
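The binary classification consistency idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact loss: the candidate set is taken as the top-k classes of the weakly augmented view with a fixed, hypothetical `k`; the probability mass on candidates versus negatives forms a two-way distribution; and the weak-view distribution serves as the target for the strong view through a binary cross-entropy. The function name `bcc_loss` is hypothetical.

```python
import numpy as np


def bcc_loss(probs_weak, probs_strong, k=2, eps=1e-8):
    """Binary candidate-vs-negative consistency between two augmented views.

    probs_weak, probs_strong: (N, C) softmax outputs for the weak and strong views.
    No confidence mask is applied, so every unlabeled sample contributes.
    """
    n = probs_weak.shape[0]

    # Candidate classes: the top-k predictions of the weak view (k is an assumption).
    topk = np.argsort(-probs_weak, axis=1)[:, :k]
    candidate_mask = np.zeros(probs_weak.shape, dtype=bool)
    candidate_mask[np.arange(n)[:, None], topk] = True

    # Probability mass each view assigns to the candidate set.
    cand_weak = (probs_weak * candidate_mask).sum(axis=1)
    cand_strong = (probs_strong * candidate_mask).sum(axis=1)

    # Binary cross-entropy with the weak view as the target distribution.
    per_sample = -(
        cand_weak * np.log(cand_strong + eps)
        + (1.0 - cand_weak) * np.log(1.0 - cand_strong + eps)
    )
    return per_sample.mean()
```

Because the loss is averaged over all samples without thresholding, it supplies a supervision signal even for samples whose top-1 confidence is too low to yield a pseudo-label, which is how a 100% utilization ratio could arise in a scheme of this kind.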
