Towards Optimal Feature-Shaping Methods for Out-of-Distribution Detection

Qinyu Zhao; Ming Xu; Kartik Gupta; Akshay Asthana; Liang Zheng; Stephen Gould

TL;DR

State-of-the-art feature-shaping methods for out-of-distribution (OOD) detection are fragile: they work well on some models and fail on others. This work introduces a general optimization framework for shaping penultimate-layer features and reduces it to a concrete problem with a piecewise-constant shaping function. Existing methods turn out to approximate the optimal solution of that problem. The work also derives a closed-form solution that requires only in-distribution (ID) data. Empirically, the ID-only shaping method generalizes across backbones (ConvNets, ViT, and MLP-based models such as MLP-Mixer) and datasets, outperforming many baselines and maintaining gains where previous methods fail. Because the shaping function is tuned on ID data alone and applied to features at inference time, the approach offers a practical, architecture-agnostic way to improve OOD detection in real-world systems.

Abstract

Feature shaping refers to a family of methods that exhibit state-of-the-art performance for out-of-distribution (OOD) detection. These approaches manipulate the feature representation, typically from the penultimate layer of a pre-trained deep learning model, so as to better differentiate between in-distribution (ID) and OOD samples. However, existing feature-shaping methods usually employ rules manually designed for specific model architectures and OOD datasets, which consequently limit their generalization ability. To address this gap, we first formulate an abstract optimization framework for studying feature-shaping methods. We then propose a concrete reduction of the framework with a simple piecewise constant shaping function and show that existing feature-shaping methods approximate the optimal solution to the concrete optimization problem. Further, assuming that OOD data is inaccessible, we propose a formulation that yields a closed-form solution for the piecewise constant shaping function, utilizing solely the ID data. Through extensive experiments, we show that the feature-shaping function optimized by our method improves the generalization ability of OOD detection across a large variety of datasets and model architectures.
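To make the idea of a piecewise constant shaping function concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It splits the feature value range into $K$ intervals, scales each penultimate feature by the weight of the interval its value falls in, and then scores the sample from the resulting logits. The function names, the interval boundaries `edges`, the weights `theta`, and the toy linear layer are all hypothetical values chosen for illustration. The energy (log-sum-exp) score is used as one common OOD score and is an assumption here; how the weights are optimized is not shown.

```python
import bisect
import math


def shape_features(z, edges, theta):
    """Scale each feature by the weight of the interval containing its value.

    edges: sorted interior boundaries (length K - 1) that split the value
           range into K left-closed intervals.
    theta: one scalar weight per interval (length K). Hypothetical values;
           optimizing them is outside the scope of this sketch.
    """
    if len(theta) != len(edges) + 1:
        raise ValueError("need exactly one weight per interval")
    return [theta[bisect.bisect_right(edges, v)] * v for v in z]


def logits(z, W, b):
    """Final linear layer: one logit per class (W holds one row per class)."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]


def energy_score(logit_values):
    """Numerically stable log-sum-exp of the logits.

    Assumption: a higher score is read as more in-distribution.
    """
    m = max(logit_values)
    return m + math.log(sum(math.exp(v - m) for v in logit_values))


# Toy example: three features, K = 3 intervals, two classes.
z = [0.2, 1.5, 3.0]
edges = [1.0, 2.0]        # intervals: (-inf, 1), [1, 2), [2, inf)
theta = [0.5, 1.0, 0.0]   # damp small values, keep mid-range, zero out large
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]

shaped = shape_features(z, edges, theta)
score = energy_score(logits(shaped, W, b))
```

In this sketch the shaping step is a lookup followed by an elementwise product, so it adds little cost at inference time and does not depend on the backbone that produced `z`.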

Paper Structure (25 sections, 18 equations, 5 figures, 11 tables)

Introduction
A Recap of Feature Shaping for OOD Detection
OOD detection for image classification
Feature-shaping
Optimal Feature Shaping
Value- and interval-specific feature impact
Optimizing the reshaping function
Optimizing the shaping function without OOD samples
Experiments
OOD detection benchmark results
Further analysis
Discussion
General feature-shaping functions
Feature contributions to the maximum logits
Conclusion
...and 10 more sections

Figures (5)

Figure 1: Comparing our method with existing feature-shaping methods. The dashed lines denote the performance of our method for comparison. (a) ImageNet (ID) vs. iNaturalist (OOD) with ViT-B-16; (b) ImageNet (ID) vs. iNaturalist (OOD) with MLP-Mixer-B; (c) CIFAR100 (ID) vs. CIFAR10 (OOD) with MLP-Mixer-Nano; (d) Average performance of different methods across eight OOD datasets with two ConvNets and with four transformer-based models.
Figure 2: Visualization of shaping functions. The blue lines (ours w/ OOD) derive from Eq. \ref{prob_opt_2}, while the green line (ours w/o OOD) derives from Eq. \ref{eq_btheta_opt}. Red lines represent different existing methods, while shaded regions indicate estimated standard deviations. $\theta$ has been rescaled for the best visualization.
Figure 3: Diagram to show the intuition in deriving Eq. \ref{eq_our_problem}.
Figure 4: Compatibility and sensitivity analysis. (a-b) Our method can improve other OOD scores and methods. "Base" denotes using the original OOD score or method, while "+Our" indicates combining the score or method with our feature-shaping function. (c) Our method's performance with different hyperparameter settings, i.e., numbers of intervals $K$.
Figure 5: Empirical analysis to explain a specific form of the optimal shaping function.
