Bidirectional Uncertainty-Based Active Learning for Open Set Annotation

Chen-Chen Zong, Ye-Wen Wang, Kun-Peng Ning, Hai-Bo Ye, Sheng-Jun Huang

TL;DR

This paper aims to query examples that are both likely to come from known classes and highly informative. It proposes the Bidirectional Uncertainty-based Active Learning (BUAL) framework, whose Bidirectional Uncertainty sampling strategy jointly estimates the uncertainty posed by positive and negative learning to perform consistent and stable sampling.

Abstract

Active learning (AL) in open set scenarios presents a novel challenge: identifying the most valuable examples in an unlabeled data pool that contains data from both known and unknown classes. Traditional methods prioritize informative examples with low confidence, at the risk of mistakenly selecting unknown-class examples with similarly low confidence. Recent methods favor the most probable known-class examples, at the risk of picking simple, already-mastered examples. In this paper, we attempt to query examples that are both likely from known classes and highly informative, and propose a Bidirectional Uncertainty-based Active Learning (BUAL) framework. Specifically, we first push unknown-class examples toward regions with high-confidence predictions using the proposed Random Label Negative Learning method. Then, we propose a Bidirectional Uncertainty sampling strategy that jointly estimates the uncertainty posed by both positive and negative learning to perform consistent and stable sampling. BUAL successfully extends existing uncertainty-based AL methods to complex open-set scenarios. Extensive experiments on multiple datasets with varying openness demonstrate that BUAL achieves state-of-the-art performance. The code is available at https://github.com/chenchenzong/BUAL.
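
To make the negative-learning idea in the abstract more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard negative-learning loss, $-\log(1 - p_k)$ for a label $k$ that the example is told it does not belong to, and draws one random label per example per call, as the Figure 3 caption describes. The function names `softmax` and `random_label_negative_loss` and the clipping constant `eps` are hypothetical choices made for illustration; the exact loss used in BUAL may differ.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def random_label_negative_loss(logits, rng, eps=1e-12):
    """Assumed negative-learning loss with one random label per example.

    logits: array of shape (n_examples, K) from a K-class classifier.
    rng:    a numpy.random.Generator used to draw the random labels.
    Returns the mean loss and the labels that were drawn.
    """
    probs = softmax(logits)
    n, k = probs.shape
    # Draw one random "this is NOT the class" label for every example.
    random_labels = rng.integers(0, k, size=n)
    p_neg = probs[np.arange(n), random_labels]
    # Standard negative-learning term: penalize probability mass on the drawn label.
    per_example = -np.log(np.clip(1.0 - p_neg, eps, 1.0))
    return per_example.mean(), random_labels


rng = np.random.default_rng(0)
example_logits = rng.normal(size=(4, 3))  # 4 unlabeled examples, K = 3
loss, labels = random_label_negative_loss(example_logits, rng)
print(loss, labels)
```

Minimizing this term lowers the probability of whichever label was drawn. In an actual training loop the same quantity would be computed in an automatic-differentiation framework and combined with the supervised loss on labeled data, which this sketch omits.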

Paper Structure

This paper contains 10 sections, 6 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: The statistics of prediction confidence before and after fine-tuning the model. In the zoomed-in area of panel (b), we swapped the display order of the two to prevent occlusion, allowing for a more intuitive view of how the distribution has changed.
  • Figure 2: The framework of BUAL. A two-stage $K$-class classifier is maintained: the first stage is trained in the normal manner and saved as $f_p(\cdot)$, and the second stage is trained using the proposed random label negative learning method and denoted $f_n(\cdot)$. An auxiliary $(K+1)$-class classifier $f_{aux}(\cdot)$ is trained in parallel. By collecting the predicted uncertainty from $f_p(\cdot)$ and $f_n(\cdot)$ on each candidate example, along with the global and local balancing factors, the proposed bidirectional sampling strategy can accurately estimate the potential utility of each example and sample effectively under complex open-set scenarios.
  • Figure 3: Using all labels per iteration for negative learning (left) vs. using one random label per iteration for negative learning (right).
  • Figure 4: A possible RLNL update scenario for unlabeled unknown-class data under batch-based deep learning. The green "$\longrightarrow$" is the batch update gradient produced by the example itself, and the purple "$\longrightarrow$" is the update gradient produced by labeled data. Initially, the decision boundary is close to the left-hand category.
  • Figure 5: The t-SNE feature visualization of labeled data, unlabeled known class data, and unlabeled unknown class data on CIFAR-10 with an openness ratio of 0.5 before and after performing RLNL. For a more intuitive visualization, we only show a single known class. More visualization results are shown in the supplementary file.
  • ...and 3 more figures