AUTO: Adaptive Outlier Optimization for Test-Time OOD Detection
Puning Yang, Jian Liang, Jie Cao, Ran He
TL;DR
This work formalizes Test-Time OOD Detection (TTOD) and proposes Adaptive Outlier Optimization (AUTO), a framework that dynamically updates an OOD detector using unlabeled test streams. AUTO combines an in-out-aware filter, a dynamic ID memory bank, and a prediction-alignment objective to annotate test samples as pseudo-ID or pseudo-OOD and update the model while mitigating catastrophic forgetting. Through extensive experiments on CV and NLP benchmarks, AUTO consistently improves OOD detection performance while preserving ID accuracy, including challenging multi-OOD and time-series OOD scenarios, and remains effective with parameter-efficient online updates. The framework offers practical, training-free deployment advantages and demonstrates strong generalization across modalities and deployment contexts.
Abstract
Out-of-distribution (OOD) detection aims to identify test samples that do not belong to any in-distribution (ID) class seen during training. Prior efforts focus on regularizing models with ID data only, and largely underperform counterparts that utilize auxiliary outliers. However, data safety and privacy concerns make it infeasible to collect task-specific outliers in advance for different scenarios, and using task-irrelevant outliers leads to inferior OOD detection performance. To address these issues, we present a new setup called test-time OOD detection, which allows the deployed model to utilize real OOD data from the unlabeled data stream during testing. We propose Adaptive Outlier Optimization (AUTO), which allows for continuous adaptation of the OOD detector. Specifically, AUTO consists of three key components: 1) an in-out-aware filter that selectively annotates test samples as pseudo-ID or pseudo-OOD and triggers a model update whenever a pseudo-OOD sample is encountered; 2) a dynamically updated memory that mitigates the catastrophic forgetting caused by frequent parameter updates; 3) a prediction-aligning objective that calibrates the rough OOD objective during testing. Extensive experiments show that AUTO significantly improves OOD detection performance over state-of-the-art methods. Evaluations on more complicated scenarios (e.g., multi-OOD and time-series OOD) further demonstrate the superiority of AUTO.
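
The control flow of the three components can be illustrated with a minimal Python sketch. It is not the authors' implementation: the max-softmax confidence score, the two thresholds `low` and `high`, the per-class FIFO memory, and the names `in_out_aware_filter`, `MemoryBank`, and `run_stream` are all assumptions made for illustration. The parameter update itself, including the prediction-aligning objective, is left to a user-supplied `update` callback because the abstract does not specify its form.

```python
import math
from collections import deque


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def in_out_aware_filter(logits, low=0.5, high=0.9):
    """Tag a sample as pseudo-ID, pseudo-OOD, or uncertain.

    Assumption: max softmax probability is the confidence score and
    `low`/`high` are fixed thresholds; the paper may choose them differently.
    """
    probs = softmax(logits)
    conf = max(probs)
    pred = probs.index(conf)
    if conf >= high:
        return "pseudo-id", pred
    if conf <= low:
        return "pseudo-ood", pred
    return "uncertain", pred


class MemoryBank:
    """Hypothetical per-class FIFO buffer of confident pseudo-ID samples."""

    def __init__(self, num_classes, per_class=2):
        self.slots = [deque(maxlen=per_class) for _ in range(num_classes)]

    def add(self, x, label):
        self.slots[label].append(x)

    def samples(self):
        return [(x, c) for c, slot in enumerate(self.slots) for x in slot]


def run_stream(stream, predict, update, num_classes):
    """Process an unlabeled test stream one sample at a time.

    `predict(x)` returns logits; `update(ood_sample, id_replay)` is where the
    OOD loss, ID replay loss, and any alignment term would be applied.
    """
    memory = MemoryBank(num_classes)
    tags = []
    for x in stream:
        tag, pred = in_out_aware_filter(predict(x))
        tags.append(tag)
        if tag == "pseudo-id":
            memory.add(x, pred)  # refresh the ID memory
        elif tag == "pseudo-ood":
            # Each pseudo-OOD sample triggers one update, replayed with ID memory.
            update(ood_sample=x, id_replay=memory.samples())
        # Uncertain samples are skipped: they neither update nor enter memory.
    return tags, memory
```

In this sketch, replaying stored pseudo-ID samples alongside each pseudo-OOD sample is what stands in for the forgetting-mitigation role of the memory; uncertain samples are deliberately ignored so that noisy pseudo-labels do not drive updates.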
