A Unified Approach Towards Active Learning and Out-of-Distribution Detection

Sebastian Schmidt, Leonard Schenk, Leo Schwinn, Stephan Günnemann

TL;DR

The paper tackles an open-world challenge: models need labeled data for training, yet must operate on unlabeled and potentially unfamiliar inputs. It introduces SISOM, a unified framework that uses enriched feature-space distances, gradient-weighted feature representations, and a distance-ratio-based sampling strategy to address both AL and OOD detection in a single module. A latent-space analysis and the self-balancing variant SISOMe further integrate uncertainty and diversity, achieving top ranks on the OpenOOD benchmarks and strong AL performance across multiple datasets. The unified approach reduces deployment overhead, offers post-training latent-space refinement, and provides practical insights into the ambiguity between unlabeled and near-OOD data. Open-set AL and batch diversification are named as promising directions for future work.
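
The distance-ratio idea mentioned above can be illustrated with a small sketch. This is not the paper's formulation: the function name `distance_ratio_scores`, the use of plain Euclidean distance, the nearest-neighbor lookup, and the normalization `d_in / (d_in + d_out)` are all assumptions made for illustration, and the gradient-weighted feature enrichment is omitted entirely. The sketch also assumes that every predicted class has at least one labeled sample.

```python
import numpy as np

def distance_ratio_scores(features, predicted, labeled_features, labeled_classes):
    """Illustrative distance-ratio score (hypothetical helper, not from the paper).

    For each query feature, compare the distance to the nearest labeled sample
    of the predicted class (d_in) with the distance to the nearest labeled
    sample of any other class (d_out). Scores near 0 mean the sample sits deep
    inside its predicted class; scores near 1 mean it is closer to other classes.
    """
    features = np.asarray(features, dtype=float)
    labeled_features = np.asarray(labeled_features, dtype=float)
    predicted = np.asarray(predicted)
    labeled_classes = np.asarray(labeled_classes)

    # Pairwise Euclidean distances, shape (n_query, n_labeled).
    diffs = features[:, None, :] - labeled_features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Mask marking labeled samples that share the query's predicted class.
    same = labeled_classes[None, :] == predicted[:, None]

    d_in = np.where(same, dists, np.inf).min(axis=1)
    d_out = np.where(~same, dists, np.inf).min(axis=1)

    # Small constant avoids division by zero when both distances are zero.
    return d_in / (d_in + d_out + 1e-12)
```

Under these assumptions, the same score could serve both tasks: in AL, the unlabeled samples with the highest scores would be candidates for labeling, and at operation time a threshold on the score would flag potential OOD inputs.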

Abstract

When applying deep learning models in open-world scenarios, active learning (AL) strategies are crucial for identifying label candidates from a nearly infinite amount of unlabeled data. In this context, robust out-of-distribution (OOD) detection mechanisms are essential for handling data outside the target distribution of the application. However, current works investigate both problems separately. In this work, we introduce SISOM as the first unified solution for both AL and OOD detection. By leveraging feature-space distance metrics, SISOM combines the strengths of the currently independent tasks to solve both effectively. We conduct extensive experiments showing the problems that arise when migrating between both tasks. In these evaluations, SISOM underlines its effectiveness by achieving first place in two of the widely used OpenOOD benchmarks and second place in the remaining one. In AL, SISOM outperforms others and delivers top-1 performance in three benchmarks.

Paper Structure (19 sections, 14 equations, 13 figures, 10 tables)

Figures (13)

  • Figure 1: CIFAR-10 UMAP plot of unlabeled, near-OOD, and far-OOD data compared to labeled data. For details, see the latent-space visualization appendix.
  • Figure 2: Real-world application life cycle comprising active learning in the training phase (left) and out-of-distribution detection in the operation phase (right).
  • Figure 3: SISOM framework for OOD detection and AL combined.
  • Figure 4: Density plots for SISOM with energy, Optimal Sigmoid Steepness (OS) and Reduced Subset Selection (RS) on CIFAR-100 with near-OOD (nOOD) and far-OOD (fOOD) as defined in OpenOOD.
  • Figure 5: t-SNE feature space comparison of Loss Learning, CoreSet, and SISOM for SVHN in cycle 1. SISOM effectively targets the areas in between the clusters.
  • ...and 8 more figures