SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning
Zerun Wang, Liuyu Xiang, Lang Huang, Jiafeng Mao, Ling Xiao, Toshihiko Yamasaki
TL;DR
The paper tackles the overtrusting of scarce labeled in-distribution (ID) data in open-set semi-supervised learning (OSSL) by reframing OSSL as a $K+1$-class problem, in which out-of-distribution (OOD) samples are treated as an additional labeled class. It introduces an OOD memory queue that curates reliable OOD examples and a simultaneous close-set/open-set self-training (SCO) approach that integrates the two tasks on a single $K+1$-class head. Empirical results across five OSSL benchmarks show substantial improvements over state-of-the-art methods, and ablations validate the effectiveness of the memory queue and SCO training. The approach makes more effective use of open-set unlabeled data without extra manual labeling, improving both ID classification and OOD detection performance.
Abstract
Open-set semi-supervised learning (OSSL) leverages practical open-set unlabeled data, comprising both in-distribution (ID) samples from seen classes and out-of-distribution (OOD) samples from unseen classes, for semi-supervised learning (SSL). Prior OSSL methods first learn the decision boundary between ID and OOD from labeled ID data and then employ self-training to refine this boundary. These methods, however, tend to overtrust the labeled ID data: the scarcity of labeled data causes a distribution bias between the labeled samples and the entire ID data, which leads the decision boundary to overfit. The subsequent self-training process, which starts from the overfitted result, fails to rectify this problem. In this paper, we address the overtrusting issue by treating OOD samples as an additional class, forming a new SSL process. Specifically, we propose SCOMatch, a novel OSSL method that 1) selects reliable OOD samples as new labeled data with an OOD memory queue and a corresponding update strategy and 2) integrates the new SSL process into the original task through our Simultaneous Close-set and Open-set self-training. SCOMatch refines the decision boundary between ID and OOD classes across the entire dataset, thereby leading to improved results. Extensive experimental results show that SCOMatch significantly outperforms state-of-the-art methods on various benchmarks. Its effectiveness is further verified through ablation studies and visualization.
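To make the idea of an OOD memory queue concrete, the following is a minimal sketch and not the authors' implementation. It assumes a $K+1$-way classifier whose last index (`ood_index = K`) is the OOD class, and it assumes a simple update rule: keep the unlabeled samples most confidently predicted as OOD, up to a fixed capacity. The class name `OODMemoryQueue`, its methods, and the update rule are hypothetical; the abstract does not specify the actual update strategy.

```python
from typing import Dict, List, Sequence, Tuple


class OODMemoryQueue:
    """Hypothetical fixed-size buffer of unlabeled samples treated as labeled OOD data."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.conf: Dict[int, float] = {}  # sample_id -> latest OOD confidence

    def update(
        self,
        sample_ids: Sequence[int],
        probs: Sequence[Sequence[float]],
        ood_index: int,
    ) -> None:
        """Refresh the queue from one batch of (K+1)-way class probabilities."""
        for sid, p in zip(sample_ids, probs):
            predicted = max(range(len(p)), key=p.__getitem__)
            if predicted == ood_index:
                self.conf[sid] = p[ood_index]
            else:
                # Assumed rule: drop samples the model no longer predicts as OOD.
                self.conf.pop(sid, None)
        # Keep only the `capacity` most confident entries.
        ranked = sorted(self.conf.items(), key=lambda kv: kv[1], reverse=True)
        self.conf = dict(ranked[: self.capacity])

    def as_labeled(self, ood_index: int) -> List[Tuple[int, int]]:
        """Return (sample_id, label) pairs, with every sample labeled as class K."""
        return [(sid, ood_index) for sid in self.conf]


if __name__ == "__main__":
    K = 2  # two ID classes, so index 2 is the extra OOD class
    queue = OODMemoryQueue(capacity=2)
    queue.update(
        sample_ids=[10, 11, 12, 13],
        probs=[
            [0.10, 0.10, 0.80],
            [0.70, 0.20, 0.10],
            [0.05, 0.05, 0.90],
            [0.20, 0.20, 0.60],
        ],
        ood_index=K,
    )
    print(queue.as_labeled(K))
```

In a training loop, the pairs returned by `as_labeled` would be mixed with the labeled ID data so that the $K+1$-class head receives supervised signal for the OOD class as well; how the paper weights or schedules this is not stated in the abstract.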
