OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary

Yifeng Yang, Lin Zhu, Zewen Sun, Hengyu Liu, Qinying Gu, Nanyang Ye

TL;DR

OODD is a test-time OOD detection framework that builds a dynamic dictionary of latent OOD features, managed by a priority queue, so that OOD scores can be calibrated without fine-tuning. It pairs an ID dictionary built from informative inliers with a dynamic OOD dictionary that tracks evolving outliers, uses cosine similarity for efficient OOD scoring, and adds a Dual OOD Stabilization mechanism to reduce early-test fluctuations. The approach achieves state-of-the-art results on the OpenOOD benchmark at both CIFAR and ImageNet scale, notably improving Far OOD detection (FPR95 and AUROC) while maintaining or improving Near OOD performance, and offers a 3x speedup over KNN-based baselines. The method is also compatible with CLIP-based encoders and post-hoc detectors and is robust to temporal drift, which makes it practical for open-world deployment where OOD distributions shift over time.
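The two metrics named above can be illustrated with a minimal sketch. It assumes one common convention, which this summary does not state: higher scores mean "more in-distribution", and ID samples are treated as the positive class. The function names `fpr_at_tpr` and `auroc` are hypothetical and not taken from the paper or from OpenOOD.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted when `tpr` of ID samples are accepted.

    Assumption: a sample is accepted as ID when its score >= threshold.
    """
    sorted_id = sorted(id_scores, reverse=True)
    # Number of ID samples that must be accepted; the small epsilon guards
    # against floating-point noise in tpr * n.
    k = max(1, math.ceil(tpr * len(sorted_id) - 1e-9))
    threshold = sorted_id[k - 1]
    false_pos = sum(1 for s in ood_scores if s >= threshold)
    return false_pos / len(ood_scores)


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample.

    Ties count as one half. This pairwise form is O(n * m) and is meant
    for illustration only.
    """
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))
```

Under this convention a lower FPR95 and a higher AUROC both indicate better separation between ID and OOD samples.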

Abstract

Out-of-distribution (OOD) detection remains challenging for deep learning models, particularly when test-time OOD samples differ significantly from training outliers. We propose OODD, a novel test-time OOD detection method that dynamically maintains and updates an OOD dictionary without fine-tuning. Our approach leverages a priority queue-based dictionary that accumulates representative OOD features during testing, combined with an informative inlier sampling strategy for in-distribution (ID) samples. To ensure stable performance during early testing, we propose a dual OOD stabilization mechanism that leverages strategically generated outliers derived from ID data. Extensive experiments on the OpenOOD benchmark demonstrate that, to the best of our knowledge, OODD significantly outperforms existing methods, achieving a 26.0% improvement in FPR95 on CIFAR-100 Far OOD detection compared to the state-of-the-art approach. Furthermore, we present an optimized variant of the KNN-based OOD detection framework that achieves a 3x speedup while maintaining detection performance.
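The priority-queue dictionary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ID score is taken as the maximum cosine similarity to an ID dictionary, the calibrated score subtracts a weighted maximum similarity to the OOD dictionary, and the names `OODDictionary`, `id_score`, `calibrated_score`, `process_batch`, and the `weight` parameter are all hypothetical. Informative inlier sampling and dual OOD stabilization are omitted.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


class OODDictionary:
    """Bounded priority queue keeping the features with the lowest ID scores."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []    # entries: (-id_score, counter, feature)
        self._counter = 0  # tie-breaker so features are never compared

    def push(self, score, feature):
        entry = (-score, self._counter, feature)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif score < -self._heap[0][0]:
            # The new sample is more OOD-like than the least OOD-like
            # entry currently stored, so it takes that entry's place.
            heapq.heapreplace(self._heap, entry)

    def features(self):
        return [f for _, _, f in self._heap]


def id_score(z, id_dict):
    """Assumed raw score: highest cosine similarity to any ID feature."""
    return max(cosine(z, f) for f in id_dict)


def calibrated_score(z, id_dict, ood_dict, weight=1.0):
    """Assumed calibration: penalize similarity to stored OOD features."""
    s_ood = max((cosine(z, f) for f in ood_dict.features()), default=0.0)
    return id_score(z, id_dict) - weight * s_ood


def process_batch(batch, id_dict, ood_dict):
    """Feed the low-score tail into the queue, then calibrate the batch."""
    for z in batch:
        ood_dict.push(id_score(z, id_dict), z)
    return [calibrated_score(z, id_dict, ood_dict) for z in batch]
```

Storing negated scores turns Python's min-heap into a max-heap over ID scores, so the entry evicted on overflow is always the one that looks most in-distribution. The queue therefore converges toward the lower tail of the score distribution as more test batches arrive.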

Paper Structure

This paper contains 21 sections, 2 theorems, 15 equations, 7 figures, 8 tables.

Key Result

Theorem 1

Given the setup above, if $\hat{p}_{\text{out}}\left(\mathbf{z}_i\right)=\hat{c}_0 \mathbf{1}\left\{\hat{p}_{\text{in}}\left(\mathbf{z}_i ;\right)<\frac{\beta \varepsilon \hat{c}_0}{(1-\beta)(1-\varepsilon)}\right\}$, $\tilde{\varepsilon}>\frac{\eta_1}{\eta_1+\eta_2}$, and $\lambda=-\sqrt[m-1]{\cdots}$ ...

Figures (7)

  • Figure 1: Before calibration, we dynamically feed the lower tail of the OOD score distribution into an OOD dictionary through a priority queue and then use this dictionary to calibrate the OOD scores.
  • Figure 2: Overview of the proposed OODD framework, which performs real-time, batch-wise updates to the OOD dictionary using a priority queue based on the left tail of the OOD score distribution. Using training ID samples, we generate multiple random crops for each sample [sun2024diversity], which are then filtered at the patch and class levels. High-confidence inliers are selected to compute the latent OOD score, while low-confidence outliers initialize the priority queue, allowing for a more robust and adaptive OOD detection process.
  • Figure 3: Results of varying $\alpha$ on ImageNet-200 ID. IIS refers to the implementation of Informative Inlier Sampling based on the naive KNN method [sun2022out]. The value of $\alpha$ is on the horizontal axis, and the FPR95 is on the vertical axis.
  • Figure 4: Comparison of AUROC detection performance using different queue sizes.
  • Figure 5: AUROC comparison using different types of outliers (C-Out, T-Out, D-Out, None) to initialize the OOD dictionary.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Lemma 1