
Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model

Jaehyuk Heo, Pilsung Kang

TL;DR

VLPure-AL, a novel query strategy for open-set active learning, minimizes cost loss (annotation cost wasted on OOD samples) while reducing dependence on OOD samples, and achieves the lowest cost loss and highest performance across all evaluated scenarios.

Abstract

Active learning (AL) aims to enhance model performance by selectively collecting highly informative data, thereby minimizing annotation costs. However, in practical scenarios, unlabeled data may contain out-of-distribution (OOD) samples, which are not used for training; selecting them wastes annotation costs. Therefore, to make active learning feasible in real-world applications, it is crucial to consider not only the informativeness of unlabeled samples but also their purity, that is, whether they belong to the in-distribution (ID). Recent studies have applied AL under these assumptions, but challenges remain due to the trade-off between informativeness and purity, as well as a heavy dependence on OOD samples. These issues lead to the collection of OOD samples, resulting in a significant waste of annotation costs. To address these challenges, we propose a novel query strategy, VLPure-AL, which minimizes cost loss (annotation cost wasted on OOD samples) while reducing dependence on OOD samples. VLPure-AL evaluates the purity and informativeness of data sequentially. First, it uses a pre-trained vision-language model to detect and exclude OOD data with high accuracy by leveraging linguistic and visual information of ID data. Second, it selects highly informative data from the remaining ID data, and the selected samples are then annotated by human experts. Experimental results on datasets with various open-set conditions demonstrate that VLPure-AL achieves the lowest cost loss and highest performance across all scenarios. Code is available at https://github.com/DSBA-Lab/OpenAL.
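The two-stage query described in the abstract (purity first, then informativeness) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository): each unlabeled sample is assumed to already have an ID probability from some purity detector and a class-probability vector from the current classifier; predictive entropy stands in for the informativeness measure; and the purity threshold of 0.5 is an arbitrary choice. The helper `wasted_cost_ratio` computes $C_\text{OOD} / (C_\text{OOD} + C_\text{ID})$, the share of annotation spend that goes to OOD samples, assuming every sample costs the same to label. All function names and the toy data are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class-probability vector; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sequential_query(
    id_probs: Sequence[float],
    class_probs: Sequence[Sequence[float]],
    budget: int,
    purity_threshold: float = 0.5,
) -> list[int]:
    """Hypothetical purity-then-informativeness query (illustration only).

    Step 1 (purity): keep samples whose ID probability >= purity_threshold.
    Step 2 (informativeness): rank survivors by entropy, return the top `budget`.
    """
    pure = [i for i, p in enumerate(id_probs) if p >= purity_threshold]
    # Python's sort is stable even with reverse=True, so ties keep index order
    pure.sort(key=lambda i: entropy(class_probs[i]), reverse=True)
    return pure[:budget]


def wasted_cost_ratio(
    selected: Sequence[int], is_ood: Sequence[bool], unit_cost: float = 1.0
) -> float:
    """C_OOD / (C_OOD + C_ID), assuming a uniform per-sample annotation cost."""
    c_ood = sum(unit_cost for i in selected if is_ood[i])
    c_id = sum(unit_cost for i in selected if not is_ood[i])
    total = c_ood + c_id
    return c_ood / total if total else 0.0


# Toy pool of five unlabeled samples (made-up numbers).
id_probs = [0.9, 0.2, 0.8, 0.95, 0.1]
class_probs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
is_ood = [False, True, False, False, True]  # ground truth, unknown to the query

picked = sequential_query(id_probs, class_probs, budget=2)
print(picked, wasted_cost_ratio(picked, is_ood))  # [0, 3] 0.0
```

Filtering first means the entropy ranking never sees samples 1 and 4, which are OOD but maximally uncertain. With the filter disabled (`purity_threshold=0.0`) they compete on entropy alone, the query returns `[0, 1]`, and half of the budget is spent on OOD data.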

Paper Structure

This paper contains 16 sections, 9 equations, 12 figures, 8 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Comparison of data selection criteria across active learning scenarios: (a) standard AL uses informativeness as its selection criterion, while (b) and Ours use both informativeness and purity as selection criteria for open-set AL. Unlike (b), Ours evaluates purity and informativeness sequentially.
  • Figure 2: Open-set AL process.
  • Figure 3: Comparison of wasted annotation costs relative to total annotation costs when various AL methods are applied to CIFAR10 and CIFAR100 configured as open-set data. $C_\text{OOD}$ represents the cost incurred when selected data is OOD data, and $C_\text{ID}$ represents the cost for ID data. Standard AL methods such as CONF, LL, and CORESET incur higher cost losses than RANDOM because they select more OOD data. Existing open-set AL methods such as MQNet, EOAL, and LfOSA incur less cost loss than RANDOM but still waste a significant share of the total annotation costs.
  • Figure 4: Overview of evaluating the purity of an image in VLPure-AL. We use CLIPN to perform zero-shot OOD detection. Following CLIPN, two text encoders and an image encoder compute an image's in-distribution (ID) probability. We additionally use visual similarity weights to update the ID probability using labeled ID samples $S^{\text{ID}}$. Self-temperature tuning is used to find the optimal temperature parameter for improving zero-shot OOD detection performance.
  • Figure 5: Example of the data purity assessment process in VLPure-AL.
  • ...and 7 more figures
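Figure 4 describes purity scoring built on CLIPN. The sketch below shows one way such a score could be assembled from precomputed embedding similarities; it is not the paper's code, and several parts are assumptions. The aggregation follows CLIPN's agreeing-to-differ (ATD) rule as we understand it, $p_\text{ID} = \sum_k p_k (1 - p_k^{\text{no}})$, where $p_k$ is the softmax class probability from the standard text encoder and $p_k^{\text{no}}$ is the probability that the "no" prompt for class $k$ wins. The temperature is fixed at 0.01 (matching CLIP's usual logit scale of 100) rather than found by self-temperature tuning. `visual_weight`, which rescales the score by similarity to labeled ID samples $S^{\text{ID}}$, is a hypothetical stand-in because the captions do not specify the paper's exact weighting. Embeddings are assumed to be computed and L2-comparable elsewhere.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipn_id_probability(
    sim_yes: Sequence[float], sim_no: Sequence[float], tau: float = 0.01
) -> float:
    """ATD-style ID probability (our reading of CLIPN, not its official code).

    sim_yes[k]: cosine similarity between the image embedding and the standard
                text embedding of "a photo of a <class k>".
    sim_no[k]:  cosine similarity between the image embedding and the "no"
                text encoder's embedding of the negated prompt for class k.
    tau:        fixed temperature (assumption; the paper tunes it automatically).
    """
    p_cls = softmax([s / tau for s in sim_yes])  # which ID class does it look like?
    # per class: probability that the "no" prompt beats the "yes" prompt
    p_no = [softmax([y / tau, n / tau])[1] for y, n in zip(sim_yes, sim_no)]
    return sum(pc * (1.0 - pn) for pc, pn in zip(p_cls, p_no))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def visual_weight(
    img_feat: Sequence[float], labeled_id_feats: Sequence[Sequence[float]]
) -> float:
    """HYPOTHETICAL weight: best cosine similarity to any labeled ID image, mapped to [0, 1]."""
    best = max(cosine(img_feat, f) for f in labeled_id_feats)
    return (best + 1.0) / 2.0


def purity_score(
    sim_yes: Sequence[float],
    sim_no: Sequence[float],
    img_feat: Sequence[float],
    labeled_id_feats: Sequence[Sequence[float]],
    tau: float = 0.01,
) -> float:
    """Text-based ID probability rescaled by visual similarity to S^ID (hypothetical combination)."""
    return clipn_id_probability(sim_yes, sim_no, tau) * visual_weight(img_feat, labeled_id_feats)
```

Because the sketch multiplies the two factors, an image scores high only if it both looks ID to the text encoders and resembles some labeled ID image; the paper may combine these signals differently.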