Inconsistency-Based Data-Centric Active Open-Set Annotation

Ruiyu Mao, Ouyang Xu, Yunhui Guo

TL;DR

NEAT introduces a data-centric approach to active open-set annotation by identifying known-class samples through label clusterability and selecting informative instances via inconsistency with local feature distributions. It leverages CLIP-based features to avoid training a separate detector, achieving higher accuracy, precision, and recall than learning-based baselines while offering significant computational savings. Theoretical analysis provides bounds on known-class detection error, and extensive experiments across CIFAR-10/100 and Tiny-ImageNet demonstrate strong, robust performance with efficient query cycles. The approach offers practical impact for open-world labeling where unknown classes are present and labeling budgets are constrained.

Abstract

Active learning is a commonly used approach that reduces the labeling effort required to train deep neural networks. However, the effectiveness of current active learning methods is limited by their closed-world assumption that all data in the unlabeled pool come from a set of predefined known classes. This assumption often fails in practice, since the unlabeled data may contain unknown classes, which gives rise to the active open-set annotation problem. Unknown classes can significantly degrade the performance of existing active learning methods because of the uncertainty they introduce. To address this issue, we propose NEAT, a novel data-centric active learning method that actively annotates open-set data. NEAT is designed to label known-class data from an unlabeled pool containing both known and unknown classes. It uses the clusterability of labels to identify known-class samples in the unlabeled pool and selects informative samples among them using a consistency criterion that measures the inconsistency between model predictions and the local feature distribution. Unlike the recently proposed learning-centric method for the same problem, NEAT is data-centric and much more computationally efficient. Our experiments demonstrate that NEAT achieves significantly better performance than state-of-the-art active learning methods for active open-set annotation.
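The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the majority-vote rule over K labeled neighbors, and the cross-entropy form of the inconsistency score are all assumptions made for illustration, and the 2-D toy vectors stand in for features from a pretrained encoder such as CLIP. Labeled samples annotated as unknown carry the label -1.

```python
import numpy as np

def knn_indices(query, reference, k):
    """Indices of the k nearest reference rows (Euclidean) for each query row."""
    d2 = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]

def detect_known(unlabeled_feats, labeled_feats, labeled_labels, k):
    """Assumed rule: a sample is 'known' if most of its k labeled neighbors are."""
    nn = knn_indices(unlabeled_feats, labeled_feats, k)
    neighbor_labels = labeled_labels[nn]            # shape (n_unlabeled, k)
    known_votes = (neighbor_labels >= 0).sum(axis=1)
    return known_votes > k / 2, neighbor_labels

def inconsistency_scores(model_probs, neighbor_labels, num_classes, eps=1e-12):
    """Assumed score: cross-entropy between neighbor label histogram and model output."""
    scores = np.zeros(model_probs.shape[0])
    for i, labels in enumerate(neighbor_labels):
        labels = labels[labels >= 0]                # ignore unknown-class neighbors
        if labels.size == 0:
            continue                                # no known neighbors: score stays 0
        hist = np.bincount(labels, minlength=num_classes) / labels.size
        scores[i] = -(hist * np.log(model_probs[i] + eps)).sum()
    return scores

def select_queries(is_known, scores, budget):
    """Pick the most inconsistent samples among those flagged as known."""
    candidates = np.flatnonzero(is_known)
    return candidates[np.argsort(-scores[candidates])][:budget]

# Toy labeled set: two known classes (0, 1) and one unknown cluster (-1).
labeled_feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                          [5.0, 0.0], [5.1, 0.0], [5.0, 0.1],
                          [0.0, 5.0], [0.1, 5.0], [0.0, 5.1]])
labeled_labels = np.array([0, 0, 0, 1, 1, 1, -1, -1, -1])

# Three unlabeled samples, one near each cluster.
unlabeled_feats = np.array([[0.05, 0.05], [5.05, 0.05], [0.05, 5.05]])
# Hypothetical classifier outputs; the second disagrees with its neighborhood.
model_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]])

is_known, neighbor_labels = detect_known(unlabeled_feats, labeled_feats,
                                         labeled_labels, k=3)
scores = inconsistency_scores(model_probs, neighbor_labels, num_classes=2)
picked = select_queries(is_known, scores, budget=1)
print(is_known, scores, picked)
```

In this toy run the third sample is filtered out because all of its neighbors are labeled unknown, and the second sample is queried first because the classifier favors class 0 while its neighbors all belong to class 1. No detector is trained at any point, which is the source of the computational savings the abstract refers to.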

Paper Structure (24 sections, 2 theorems, 9 equations, 9 figures, 2 tables, 1 algorithm)

Key Result

Theorem 4.1

Given Assumptions 0.1 and 0.2 and the number of neighbors $K$, the probability of making a detection error is upper-bounded as,

Figures (9)

  • Figure 1: The dataset consists of color images for the known dog class and gray-scale images for the unknown wolf class. Prior work using a learning-based approach may identify some unknown-class samples as known classes. Our work, which focuses on the local feature distribution, can find known classes more accurately.
  • Figure 2: NEAT achieves higher precision, recall, and accuracy than existing active learning methods for active open-set annotation. We evaluate NEAT and the baseline active learning methods on CIFAR10, CIFAR100, and Tiny-ImageNet based on accuracy, precision, and recall.
  • Figure 3: NEAT is effective compared with other active learning methods for deep neural networks.
  • Figure 4: NEAT can accurately identify known classes from the unlabeled pool.
  • Figure 5: NEAT achieves higher precision, recall, and accuracy than existing active learning methods for active open-set annotation. We evaluate NEAT and the baseline active learning methods on CIFAR10, CIFAR100, and Tiny-ImageNet based on accuracy, precision, and recall.
  • ...and 4 more figures

Theorems & Definitions (3)

  • Definition 4.1
  • Theorem 4.1
  • Theorem 8.1