AUTO: Adaptive Outlier Optimization for Test-Time OOD Detection

Puning Yang, Jian Liang, Jie Cao, Ran He

TL;DR

This work formalizes Test-Time OOD Detection (TTOD) and proposes Adaptive Outlier Optimization (AUTO), a framework that dynamically updates an OOD detector using unlabeled test streams. AUTO combines an in-out-aware filter, a dynamic ID memory bank, and a prediction-alignment objective: it annotates test samples as pseudo-ID or pseudo-OOD and updates the model while mitigating catastrophic forgetting. In experiments on CV and NLP benchmarks, AUTO consistently improves OOD detection performance while preserving ID accuracy, including in challenging multi-OOD and time-series OOD scenarios, and it remains effective with parameter-efficient online updates. Because it needs no auxiliary outliers collected in advance, the framework is practical to deploy and generalizes across modalities and deployment contexts.

Abstract

Out-of-distribution (OOD) detection aims to detect test samples that do not fall into any training in-distribution (ID) class. Prior efforts focus on regularizing models with ID data only and largely underperform counterparts that use auxiliary outliers. However, data safety and privacy concerns make it infeasible to collect task-specific outliers in advance for different scenarios, and using task-irrelevant outliers leads to inferior OOD detection performance. To address these issues, we present a new setup called test-time OOD detection, which allows the deployed model to utilize real OOD data from the unlabeled data stream during testing. We propose Adaptive Outlier Optimization (AUTO), which allows for continuous adaptation of the OOD detector. Specifically, AUTO consists of three key components: 1) an in-out-aware filter that selectively annotates test samples as pseudo-ID or pseudo-OOD and triggers a model update whenever a pseudo-OOD sample is encountered; 2) a dynamically updated memory that counters the catastrophic forgetting caused by frequent parameter updates; 3) a prediction-aligning objective that calibrates the rough OOD objective during testing. Extensive experiments show that AUTO significantly improves OOD detection performance over state-of-the-art methods. Evaluations on complicated scenarios (e.g., multi-OOD, time-series OOD) also demonstrate the superiority of AUTO.
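
To make the in-out-aware filter more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the filter scores each sample by its maximum softmax probability (MSP) and compares that score against an ID-aware margin and an OOD-aware margin, as the figure captions in this summary suggest. The function names, the margin values 0.9 and 0.5, and the "uncertain" label are hypothetical choices made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability (MSP) of one sample."""
    return max(softmax(logits))


def in_out_filter(logits, id_margin=0.9, ood_margin=0.5):
    """Label a test sample from its MSP score.

    id_margin and ood_margin are hypothetical values; this sketch keeps
    both fixed, whereas the source says the OOD-aware margin is updated
    during testing.
    """
    score = msp_score(logits)
    if score >= id_margin:
        return "pseudo-ID"
    if score <= ood_margin:
        return "pseudo-OOD"
    return "uncertain"  # assumed: samples between the margins are skipped


if __name__ == "__main__":
    for logits in ([8.0, 0.5, 0.1], [0.2, 0.1, 0.3], [2.0, 1.0, 0.0]):
        print(logits, in_out_filter(logits))
```

In this sketch a confident prediction is treated as pseudo-ID and a near-uniform prediction as pseudo-OOD; how the OOD-aware margin is actually updated is not specified in this summary and is left out.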

Paper Structure (21 sections, 12 equations, 8 figures, 14 tables)

This paper contains 21 sections, 12 equations, 8 figures, 14 tables.

Figures (8)

  • Figure 1: Problem Formulation. In contrast to naive OOD detection, we consider more practical and challenging scenarios, resulting in more comprehensive evaluations.
  • Figure 2: Illustration of the Adaptive oUTlier Optimization (AUTO) framework. The key components include an in-out-aware filter, a dynamic ID memory bank, and a prediction-aligning objective. Different colors denote different operations at test time. Each sample is assigned an MSP score and judged by the filter, and the judgment determines which operations are activated. For instance, if a sample is recognized as pseudo-ID, the blue lines are activated: the sample replaces the stored sample with the same label in the ID memory bank.
  • Figure 3: The distribution of MSP statistics changes during testing, so we update the OOD-aware margin and keep the ID-aware margin fixed.
  • Figure 4: (a) Constant updating causes catastrophic forgetting in models; we mitigate the ID degradation with an ID memory. (b) We calibrate the model's objective, further improving ID and OOD performance.
  • Figure 5: Calibration of the OOD objective. Starting from a uniform vector, we additionally account for background information, yielding an objective that better aligns with the model's intuition.
  • ...and 3 more figures
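
The memory-bank behaviour described in the Figure 2 caption (a pseudo-ID sample replaces the stored sample with the same label) can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the class name `IDMemoryBank`, the one-slot-per-label layout, and the `replay_batch` helper are all hypothetical, and how the bank is initialized is not stated in this summary.

```python
class IDMemoryBank:
    """Hypothetical ID memory: one stored sample per class label."""

    def __init__(self):
        self._slots = {}  # label -> most recent pseudo-ID sample

    def update(self, predicted_label, sample):
        """Replace the stored sample that shares the predicted label."""
        self._slots[predicted_label] = sample

    def replay_batch(self):
        """Return (label, sample) pairs to replay during a model update."""
        return sorted(self._slots.items())


if __name__ == "__main__":
    bank = IDMemoryBank()
    bank.update(3, "sample_a")
    bank.update(1, "sample_b")
    bank.update(3, "sample_c")  # overwrites the earlier label-3 sample
    print(bank.replay_batch())
```

Keeping a bounded set of ID samples and replaying them alongside pseudo-OOD updates is one plausible way such a memory could limit ID degradation; the actual replay schedule is not given here.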