
Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan

TL;DR

The paper addresses the realistic open-set semi-supervised learning (OSSL) setting, in which unlabeled data include classes unseen in the labeled set, and demonstrates that indiscriminately using all open-set data can harm in-distribution (ID) generalization. It proposes WiseOpen, a gradient-variance-based data-selection framework that leverages friendly open-set samples and discards unfriendly ones. Two practical variants, WiseOpen-E and WiseOpen-L, trade accuracy gains against computation. Theoretical analysis links gradient variance to generalization, and experiments on CIFAR-10/100 and Tiny-ImageNet show consistent ID accuracy improvements and competitive OOD detection. The approach is designed as a plug-in module that can enhance existing OSSL methods such as OpenMatch and IOMatch.

Abstract

Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which can degrade the performance of conventional SSL models. To handle this issue, some existing OSSL approaches employ an extra OOD detection module, in addition to the traditional in-distribution (ID) classifier, to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically use the entire set of open-set data during training, and this set may contain data that are unfriendly to the OSSL task and harm model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Guided by a theoretical analysis from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen that adopt low-frequency update and loss-based selection, respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.
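
The abstract does not spell out the selection rule, so the following is only a minimal sketch of one way a gradient-based selection step could look. It assumes that per-sample gradients are already available as plain vectors, that a sample is scored by its squared distance to the mean labeled gradient, and that samples scoring at or below a fixed threshold count as friendly. The function names, the toy gradients, and the threshold value are all hypothetical; the paper's actual criterion may differ.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def deviation_scores(
    sample_grads: Sequence[Sequence[float]],
    reference_grad: Sequence[float],
) -> List[float]:
    """Squared L2 distance of each per-sample gradient from the reference.

    Assumption: a small distance marks a "friendly" sample.
    """
    return [
        sum((g - r) ** 2 for g, r in zip(grad, reference_grad))
        for grad in sample_grads
    ]


def select_friendly(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of samples whose score does not exceed the threshold."""
    return [i for i, s in enumerate(scores) if s <= threshold]


# Toy gradients (hypothetical values, two parameters per gradient).
labeled_grads = [[1.0, 0.0], [0.0, 1.0]]
open_set_grads = [[0.5, 0.5], [0.6, 0.4], [-2.0, 3.0]]

reference = mean_vector(labeled_grads)
scores = deviation_scores(open_set_grads, reference)
friendly = select_friendly(scores, threshold=0.1)
```

In this toy setup the third open-set gradient points far away from the labeled mean, so it is the one left out of the friendly subset.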

Paper Structure (15 sections, 1 theorem, 28 equations, 4 figures, 9 tables, 1 algorithm)

Key Result

Theorem 1

Under assumptions ass:grad, ass:grad:Fb, ass:smooth, ass:PL, we have the following ERB in expectation: (a) when all data are used: by setting $\eta \le \frac{1}{(1-\lambda)(\tau\epsilon + (1-\tau)\nu) L}$, we have (b) when only labeled data are used: by setting $\eta = \frac{2}{n\mu} \log\left( \frac{n\mu^2(\mathcal{L}(\theta_0)-\mathcal{L}(\theta_*))}{\sigma^2L}\right)$, we have (c) when labele

Figures (4)

  • Figure 1: ID test accuracy of models trained with different strategies for using the open-set data (OS data), illustrating the effectiveness of selectively leveraging OS data during training. Experiments are conducted on Tiny-ImageNet with 120 seen classes and 50 labels per class. We compare the following methods: (1) Labeled Only (w/o OS data), an SL method trained only with labeled data; (2) OpenMatch (Saito et al., 2021) (w/ all OS data), an OSSL method trained with all OS data; and (3) WiseOpen-L on top of OpenMatch (w/ selected OS data), an OSSL method trained with selected OS data.
  • Figure 2: Performance of original OpenMatch and our proposed WiseOpen on top of OpenMatch. Experiments are conducted on CIFAR-100 with 100 labels per class. $\dagger$ means using Top-k threshold while $\ddagger$ means using Otsu threshold.
  • Figure 3: ID classification performance of WiseOpen-E $\ddagger$ with different losses used in GV-SM on top of OpenMatch and IOMatch. Models are trained with CIFAR-100 at 50 labels.
  • Figure 4: Visualization of the confusion matrices on the unlabeled training set of CIFAR-10.
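
Figure 2 distinguishes a Top-k threshold from an Otsu threshold. As a rough illustration of how the two differ, the sketch below applies both to a list of per-sample scores. It assumes that lower scores are the ones kept and that Otsu's method is run directly on the raw score values rather than on a histogram; the function names and the toy scores are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def otsu_threshold(scores: Sequence[float]) -> float:
    """Split point that maximizes the between-class variance of two groups."""
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    best_t, best_var = values[0], -1.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        if values[i] == values[i - 1]:
            continue  # no valid split between identical values
        w0 = i / n                      # weight of the lower group
        w1 = 1.0 - w0                   # weight of the upper group
        m0 = left_sum / i               # mean of the lower group
        m1 = (total - left_sum) / (n - i)  # mean of the upper group
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_var = between_var
            best_t = (values[i - 1] + values[i]) / 2
    return best_t


def top_k_threshold(scores: Sequence[float], k: int) -> float:
    """Threshold that keeps the k smallest scores (ties may keep more)."""
    return sorted(scores)[k - 1]


# Toy scores (hypothetical): three small values and three large ones.
scores = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]

t_otsu = otsu_threshold(scores)
t_topk = top_k_threshold(scores, k=2)

kept_otsu: List[float] = [s for s in scores if s <= t_otsu]
kept_topk: List[float] = [s for s in scores if s <= t_topk]
```

The design difference is that Top-k fixes the number of kept samples in advance, whereas Otsu's method adapts the cut to the gap in the score distribution; on these toy scores it keeps all three small values while Top-k with k=2 keeps only two.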

Theorems & Definitions (1)

  • Theorem 1