
Revisiting Unknowns: Towards Effective and Efficient Open-Set Active Learning

Chen-Chen Zong, Yu-Qi Chi, Xie-Yang Wang, Yan Cui, Sheng-Jun Huang

TL;DR

This paper proposes E$^2$OAL (Effective and Efficient Open-set Active Learning), a unified and detector-free framework that fully exploits labeled unknowns for both stronger supervision and more reliable querying.

Abstract

Open-set active learning (OSAL) aims to identify informative samples for annotation when unlabeled data may contain previously unseen classes, a common challenge in safety-critical and open-world scenarios. Existing approaches typically rely on separately trained open-set detectors, which introduces substantial training overhead and overlooks the supervisory value of labeled unknowns for improving known-class learning. In this paper, we propose E$^2$OAL (Effective and Efficient Open-set Active Learning), a unified and detector-free framework that fully exploits labeled unknowns for both stronger supervision and more reliable querying. E$^2$OAL first uncovers the latent class structure of unknowns through label-guided clustering in a frozen contrastively pre-trained feature space, optimized by a structure-aware F1-product objective. To leverage labeled unknowns, it employs a Dirichlet-calibrated auxiliary head that jointly models known and unknown categories, improving both confidence calibration and known-class discrimination. Building on this, a logit-margin purity score estimates the likelihood that a sample belongs to a known class and is used to construct a high-purity candidate pool, while an OSAL-specific informativeness metric prioritizes partially ambiguous yet reliable samples. Together, these components form a flexible two-stage query strategy with adaptive precision control and minimal hyperparameter sensitivity. Extensive experiments across multiple OSAL benchmarks demonstrate that E$^2$OAL consistently surpasses state-of-the-art methods in accuracy, efficiency, and query precision, highlighting its effectiveness and practicality for real-world applications. The code is available at github.com/chenchenzong/E2OAL.
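The two-stage query described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the purity score is taken as the gap between the largest known-class logit and the largest unknown-class logit, the informativeness metric is replaced by plain softmax entropy over known classes as a stand-in, and `pool_size` is a hypothetical fixed parameter where the paper uses adaptive precision control. The function names are illustrative and are not taken from the E$^2$OAL code.

```python
import math


def purity_score(logits, num_known):
    """Assumed purity: best known-class logit minus best unknown-class logit."""
    return max(logits[:num_known]) - max(logits[num_known:])


def known_entropy(logits, num_known):
    """Stand-in informativeness: entropy of the softmax over known classes only."""
    known = logits[:num_known]
    shift = max(known)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in known]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def two_stage_query(pool_logits, num_known, pool_size, budget):
    """Return indices of samples to query.

    Stage 1 keeps the `pool_size` samples with the highest purity score.
    Stage 2 picks the `budget` most informative samples from that pool.
    """
    by_purity = sorted(
        range(len(pool_logits)),
        key=lambda i: purity_score(pool_logits[i], num_known),
        reverse=True,
    )
    candidates = by_purity[:pool_size]
    by_info = sorted(
        candidates,
        key=lambda i: known_entropy(pool_logits[i], num_known),
        reverse=True,
    )
    return by_info[:budget]


# Toy pool: 2 known classes followed by 1 unknown-class logit (hypothetical values).
toy_pool = [
    [5.0, 0.0, -2.0],  # confidently known, low entropy
    [2.0, 1.9, -1.0],  # likely known, ambiguous between known classes
    [0.0, 0.1, 4.0],   # likely unknown, filtered out in stage 1
    [3.0, 1.0, 0.5],   # likely known, moderately ambiguous
]
selected = two_stage_query(toy_pool, num_known=2, pool_size=3, budget=1)
```

In this toy pool, stage 1 drops the sample whose unknown-class logit dominates, and stage 2 then prefers the sample that is ambiguous between known classes, which matches the "partially ambiguous yet reliable" intent stated in the abstract.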

Paper Structure (19 sections, 11 equations, 10 figures, 7 tables, 1 algorithm)


Figures (10)

  • Figure 1: Per-round and mean test accuracy on CIFAR-100 (40 known / 60 unknown) using ResNet-18. $C_{known}$ excludes labeled unknowns, $C_{known{+}1}$ collapses them into a single "unknown" class, and $C_{all}$ leverages their true labels. $C_{all}$ consistently performs best, suggesting that preserving the latent structure of unknown classes benefits known-class learning.
  • Figure 2: Overview of the proposed E$^2$OAL framework. Each AL round consists of two stages: (1) Adaptive class estimation and calibration-aware training, where latent unknown classes are discovered via label-guided clustering and incorporated into model learning through Dirichlet-based auxiliary supervision; (2) Flexible two-stage query selection, where a high-purity candidate pool is first constructed using a purity score guided by a target query precision, followed by informativeness-driven sample selection.
  • Figure 3: Test accuracy across AL rounds under varying mismatch ratios on CIFAR-10/100 and Tiny-ImageNet.
  • Figure 4: Mean query precision vs. mean test accuracy across rounds under varying mismatch ratios on CIFAR-10/100 and Tiny-ImageNet.
  • Figure 5: Total training time (hours) on CIFAR-100 under a 40% mismatch ratio. Bars indicate actual training time with the average query precision annotated; the dashed line shows the approximate projection assuming a linear relationship between query precision and time, aligned to the precision level of "Uncertainty".
  • ...and 5 more figures
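The three label settings compared in Figure 1 can be made concrete with a small sketch. It only illustrates how the training targets differ between $C_{known}$, $C_{known{+}1}$, and $C_{all}$; the function name `build_targets`, the mode strings, and the index ordering (known classes first, unknown classes after) are assumptions for illustration, not the authors' implementation.

```python
def build_targets(labels, known_classes, mode):
    """Map raw class labels to training targets under one of three settings.

    Returns a list of (sample_index, target) pairs.
    mode="known":   drop samples from unknown classes          (C_known)
    mode="known+1": merge all unknown classes into one target  (C_known+1)
    mode="all":     keep a separate target per unknown class   (C_all)
    """
    known = sorted(known_classes)
    known_index = {c: i for i, c in enumerate(known)}

    if mode == "known":
        return [(i, known_index[y]) for i, y in enumerate(labels) if y in known_index]

    if mode == "known+1":
        unknown_target = len(known)  # single shared "unknown" class
        return [(i, known_index.get(y, unknown_target)) for i, y in enumerate(labels)]

    if mode == "all":
        full_index = dict(known_index)
        for c in sorted(set(labels) - set(known)):
            full_index[c] = len(full_index)  # unknown classes get indices after known ones
        return [(i, full_index[y]) for i, y in enumerate(labels)]

    raise ValueError(f"unknown mode: {mode}")


# Hypothetical labeled set: classes 1 and 3 are known, 7 and 9 are unknown.
raw_labels = [3, 7, 3, 9, 1]
targets_known = build_targets(raw_labels, {1, 3}, "known")
targets_plus1 = build_targets(raw_labels, {1, 3}, "known+1")
targets_all = build_targets(raw_labels, {1, 3}, "all")
```

Under `"known"` the two unknown samples disappear from training, under `"known+1"` they share one target, and under `"all"` they keep distinct targets, which is the setting the caption reports as performing best.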