Tracking the Discriminative Axis: Dual Prototypes for Test-Time OOD Detection Under Covariate Shift

Wooseok Lee; Jin Mo Yang; Saewoong Bahk; Hyung-Sin Kim

Abstract

Out-of-distribution (OOD) detection is indispensable for the reliable deployment of deep-learning systems. In the real world, test-time inputs often arrive as streaming mixtures of in-distribution (ID) and OOD samples under evolving covariate shifts. In this setting, OOD samples are domain-constrained and bounded by the environment, and ID and OOD samples are jointly affected by the same covariate factors. Existing methods typically assume a stationary ID distribution; when this assumption breaks down, their performance degrades severely. We empirically discover that, even under covariate shift, covariate-shifted ID (csID) and OOD (csOOD) samples remain separable along a discriminative axis in feature space. Building on this observation, we propose DART, a test-time, online OOD detection method that dynamically tracks dual prototypes -- one for ID and the other for OOD -- to recover the drifting discriminative axis, augmented with multi-layer fusion and flip correction for robustness. Extensive experiments on a wide range of challenging benchmarks, where all datasets are subjected to 15 common corruption types at severity level 5, demonstrate that our method significantly improves performance, yielding a 15.32 percentage point (pp) AUROC gain and a 49.15 pp FPR@95TPR reduction on ImageNet-C vs. Textures-C compared to established baselines. These results highlight the potential of test-time discriminative-axis tracking for dependable OOD detection in dynamically changing environments.
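The dual-prototype idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the DART algorithm itself: the nearest-prototype assignment, the exponential-moving-average update, the `momentum` value, and the synthetic drifting features are all assumptions made for illustration, and multi-layer fusion and flip correction are omitted. All function names (`update_prototypes`, `axis_score`, `simulate`) are hypothetical.

```python
import numpy as np


def update_prototypes(feats, proto_id, proto_ood, momentum=0.5):
    """Refine both prototypes from one unlabeled batch.

    Assumption: each sample is assigned to its nearest prototype and each
    prototype moves toward the mean of its assigned samples (EMA update).
    """
    d_id = np.linalg.norm(feats - proto_id, axis=1)
    d_ood = np.linalg.norm(feats - proto_ood, axis=1)
    near_id = d_id <= d_ood
    if near_id.any():
        proto_id = momentum * proto_id + (1 - momentum) * feats[near_id].mean(axis=0)
    if (~near_id).any():
        proto_ood = momentum * proto_ood + (1 - momentum) * feats[~near_id].mean(axis=0)
    return proto_id, proto_ood


def axis_score(feats, proto_id, proto_ood):
    """Project features onto the axis from the OOD prototype to the ID prototype.

    Scores are measured from the midpoint, so positive means ID-like.
    """
    axis = proto_id - proto_ood
    axis = axis / (np.linalg.norm(axis) + 1e-12)
    midpoint = 0.5 * (proto_id + proto_ood)
    return (feats - midpoint) @ axis


def simulate(num_batches=10, batch_size=200, dim=16, seed=0):
    """Synthetic stream: ID and OOD clusters drift together (joint shift)."""
    rng = np.random.default_rng(seed)
    mean_id = np.zeros(dim)
    mean_id[0] = 3.0
    mean_ood = np.zeros(dim)
    mean_ood[0] = -3.0
    # Assumption: initial prototypes start at the unshifted cluster centers.
    proto_id, proto_ood = mean_id.copy(), mean_ood.copy()
    drift = np.full(dim, 0.3)  # the same shift is applied to both clusters
    for t in range(1, num_batches + 1):
        labels = rng.integers(0, 2, size=batch_size)  # 1 = ID, 0 = OOD
        centers = np.where(labels[:, None] == 1, mean_id, mean_ood) + t * drift
        feats = centers + rng.normal(size=(batch_size, dim))
        proto_id, proto_ood = update_prototypes(feats, proto_id, proto_ood)
        scores = axis_score(feats, proto_id, proto_ood)
    return scores, labels, proto_id, proto_ood


if __name__ == "__main__":
    scores, labels, _, _ = simulate()
    accuracy = ((scores > 0) == (labels == 1)).mean()
    print(f"last-batch accuracy of sign(score): {accuracy:.3f}")
```

Labels are used only to generate and evaluate the synthetic stream; the prototype update sees features alone, which matches the test-time setting described above. In this toy setup the prototypes follow the joint drift, so the axis between them stays aligned with the direction that separates the two clusters.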

Paper Structure (84 sections, 12 equations, 20 figures, 15 tables, 1 algorithm)

Introduction
Related Work
Out-of-Distribution (OOD) Detection: Training-Driven vs. Post-hoc
Covariate Shift and Joint-Shift Evaluation
Method
Formulation Setup
Existence of the Discriminative Axis in Feature Space
Batch-wise Prototype Refinement: Tracking the Discriminative Axis
Multi-Layer Score Fusion
Experiments
Experimental Setup
Evaluation Details.
Baseline Methods.
Main OOD Detection Results
Results on Covariate Shifted Dataset.
...and 69 more sections

Figures (20)

Figure 1: AUROC comparison on both covariate-shifted and clean ImageNet-based benchmarks. Existing methods suffer under covariate shift, with train distribution–informed approaches dropping to around 0.5. In contrast, the oracle axis achieves consistently high performance regardless of shift, and our method effectively discovers this axis, attaining near-oracle results.
Figure 2: Comparison of traditional and real-world ID-OOD assumptions. (a) Traditional OOD detection assumes ID data (blue circle) exists within an unbounded OOD space (gray background). (b) In real-world scenarios, OOD data is bounded by physical and environmental constraints (observation boundary, top-left inset), limiting the space where OOD samples can occur. Furthermore, covariate shifts such as weather conditions can simultaneously affect both ID and OOD distributions (dashed regions), causing them to shift jointly in feature space.
Figure 3: Unit-wise activation analysis. The left panel shows the JSD between ID and OOD activations, with arrows marking units of large divergence. The right panel visualizes the activation distributions of these units, where ID (blue) and OOD (red) are clearly separable.
Figure 4: Distribution of ID (blue dots) and OOD (red dots) samples in feature space, projected so that the oracle discriminative axis is the horizontal axis.
Figure 5: Layer-wise RDS distributions across three covariate shift types. Each plot shows the RDS distribution of csID (blue curve, CIFAR-100) and csOOD (red curve, LSUN) samples at different network depths (low-, mid-, and high-level) over three sequential batches. The visualizations reveal how different corruption types affect feature separability at specific network layers: under Gaussian noise, separability degrades in high-level layers, whereas under defocus blur, it degrades in low-level layers.
...and 15 more figures
