SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Zerun Wang, Liuyu Xiang, Lang Huang, Jiafeng Mao, Ling Xiao, Toshihiko Yamasaki

TL;DR

The paper tackles the overtrusting problem in open-set semi-supervised learning (OSSL) by reframing OSSL as a $(K+1)$-class problem in which out-of-distribution (OOD) samples are treated as an additional labeled class. It introduces an OOD memory queue that selects reliable OOD examples, and Simultaneous Close-set and Open-set self-training (SCO), which integrates the two tasks on a single $(K+1)$-class head. Results on five OSSL benchmarks show substantial improvements over state-of-the-art methods, and ablations validate both the memory queue and SCO training. The approach uses open-set unlabeled data more effectively without extra manual labeling, improving both in-distribution (ID) classification and OOD detection.

Abstract

Open-set semi-supervised learning (OSSL) leverages practical open-set unlabeled data, comprising both in-distribution (ID) samples from seen classes and out-of-distribution (OOD) samples from unseen classes, for semi-supervised learning (SSL). Prior OSSL methods first learn the decision boundary between ID and OOD with labeled ID data and then employ self-training to refine this boundary. These methods, however, tend to overtrust the labeled ID data: the scarcity of labeled data causes a distribution bias between the labeled samples and the entire ID data, which misleads the decision boundary into overfitting. The subsequent self-training process, based on the overfitted result, fails to rectify this problem. In this paper, we address the overtrusting issue by treating OOD samples as an additional class, forming a new SSL process. Specifically, we propose SCOMatch, a novel OSSL method that 1) selects reliable OOD samples as new labeled data with an OOD memory queue and a corresponding update strategy and 2) integrates the new SSL process into the original task through our Simultaneous Close-set and Open-set self-training. SCOMatch refines the decision boundary of ID and OOD classes across the entire dataset, thereby leading to improved results. Extensive experimental results show that SCOMatch significantly outperforms the state-of-the-art methods on various benchmarks. The effectiveness is further verified through ablation studies and visualization.
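
The OOD memory queue described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the update strategy, so the rule below is an assumption made for illustration only: keep the unlabeled samples whose predicted probability for the extra $(K+1)$-th class is highest, and overwrite stale scores as the model changes. The class name `OODMemoryQueue`, the `capacity` parameter, and the method names are hypothetical and do not come from the paper.

```python
import heapq
from typing import Dict, Iterable, List


class OODMemoryQueue:
    """Fixed-size buffer of unlabeled sample ids with the highest OOD score.

    Assumption (not from the paper): "reliable" OOD samples are those with
    the highest predicted probability for the extra (K+1)-th class.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._scores: Dict[int, float] = {}  # sample id -> latest OOD score

    def update(self, sample_ids: Iterable[int], ood_scores: Iterable[float]) -> None:
        """Record the newest scores, then keep only the top `capacity` samples."""
        for sample_id, score in zip(sample_ids, ood_scores):
            self._scores[sample_id] = score  # newest score replaces a stale one
        if len(self._scores) > self.capacity:
            top = heapq.nlargest(
                self.capacity, self._scores.items(), key=lambda kv: kv[1]
            )
            self._scores = dict(top)

    def samples(self) -> List[int]:
        """Return stored sample ids, most confident OOD first."""
        ranked = sorted(self._scores.items(), key=lambda kv: kv[1], reverse=True)
        return [sample_id for sample_id, _ in ranked]


queue = OODMemoryQueue(capacity=2)
queue.update([0, 1, 2], [0.1, 0.9, 0.5])
first = queue.samples()
queue.update([0], [0.95])
second = queue.samples()
```

In a training loop, the ids returned by `samples()` would be treated as labeled examples of the $(K+1)$-th class for the next batches; how many to keep and how often to refresh them are choices this sketch leaves open.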

Paper Structure

This paper contains 15 sections, 9 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Comparison between prior OSSL methods and SCOMatch on CIFAR-10 with six ID classes. (a): Prior methods overtrust labeled ID data, leading to overfitted decision boundaries that self-training cannot rectify. This produces more false-positive and false-negative OOD predictions, marked by the red circles in the confusion matrix. (b): SCOMatch selects reliable OOD samples for $(K+1)$-class SSL, achieving higher accuracy for both ID and OOD classes.
  • Figure 2: The training process of SCOMatch. The $(K+1)$-class classification head is the only head in our model (the backbone is omitted for simplicity). (a) OOD sample selection by our proposed OOD memory queue and its update strategy. (b) Integration of the original $K$-class task and the new $(K+1)$-class SSL by our simultaneous close-set and open-set self-training. These two processes run concurrently on the same model, but we separate them for clarity. Here, the animal classes are ID and the others are OOD.
  • Figure 3: (a): The quality of OOD samples in the OOD memory queue during training. The grey dashed line represents the actual ratio of OOD samples in the unlabeled data. (b): Number of correct and incorrect pseudo-labels for four methods during training on CIFAR-10 with 50 labeled images.
  • Figure 4: t-SNE visualization of 100 randomly selected samples from the CIFAR-10 test data. Black dots denote the features of OOD samples. Other colors denote ID samples.
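
The Figure 2 caption notes that a single $(K+1)$-class head serves both the original $K$-class task and the new $(K+1)$-class SSL. One way to picture this is sketched below. It assumes, purely for illustration, that the close-set view is obtained by dropping the OOD logit and renormalizing over the $K$ ID classes; the captions do not state how the two views are derived, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def close_and_open_set_probs(
    logits: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Derive both views from one (K+1)-way logit vector.

    Assumption (not from the paper): the last logit is the OOD class, and the
    close-set view simply ignores it and renormalizes over the K ID classes.
    """
    if len(logits) < 2:
        raise ValueError("need at least one ID logit plus the OOD logit")
    open_probs = softmax(logits)        # K+1 probabilities, last one is OOD
    close_probs = softmax(logits[:-1])  # K probabilities over ID classes only
    return close_probs, open_probs


# Example with K = 2 ID classes plus one OOD class.
close_probs, open_probs = close_and_open_set_probs([2.0, 0.0, 1.0])
ood_probability = open_probs[-1]
```

Under this reading, `close_probs` would drive pseudo-labels for the $K$-class task while `open_probs` would drive pseudo-labels for the $(K+1)$-class task, with both losses backpropagating through the same head.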