AdaSCALE: Adaptive Scaling for OOD Detection

Sudarshan Regmi

TL;DR

AdaSCALE tackles out-of-distribution detection by introducing an adaptive per-sample scaling scheme that uses minor input perturbations to quantify OODness. The method computes a sample-specific percentile threshold via $p = p_{\min} + (1 - F_{Q'}(Q'))(p_{\max} - p_{\min})$ and derives a scaling factor $r$ to modulate activations or logits, with AdaSCALE-A and AdaSCALE-L as activation- and logit-based variants. Its core innovations are the activation-perturbation–based OODness metric $Q' = \lambda Q + C_o$ and the adaptive percentile mechanism, enabling state-of-the-art OOD detection across ImageNet-1k and CIFAR benchmarks with minimal reliance on ID statistics. The approach demonstrates strong generalization across architectures, robustness to corruptions and adversarial training, and practical utility for large-scale deployment with limited ID data. Collectively, AdaSCALE advances post-hoc OOD detection by marrying activation-space signals to adaptive, per-sample scaling, yielding significantly safer and more reliable predictions in real-world systems.
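
The adaptive percentile rule in the TL;DR can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes $F_{Q'}$ is an empirical CDF estimated from a set of ID reference scores, and the function names and the `id_scores` argument are hypothetical.

```python
import numpy as np


def empirical_cdf(id_scores, q):
    """Fraction of ID reference scores that are <= q (assumed form of F_{Q'})."""
    id_sorted = np.sort(np.asarray(id_scores, dtype=float))
    # side="right" counts every reference score less than or equal to q.
    return np.searchsorted(id_sorted, q, side="right") / id_sorted.size


def adaptive_percentile(q_prime, id_scores, p_min, p_max):
    """p = p_min + (1 - F_{Q'}(Q')) * (p_max - p_min)."""
    f = empirical_cdf(id_scores, q_prime)
    return p_min + (1.0 - f) * (p_max - p_min)
```

With this rule, a sample whose $Q'$ is low relative to the ID reference scores receives a percentile near $p_{\max}$, while a sample with an unusually high $Q'$ receives one near $p_{\min}$.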

Abstract

The ability of a deep learning model to recognize when a sample falls outside its learned distribution is critical for safe and reliable deployment. Recent state-of-the-art out-of-distribution (OOD) detection methods leverage activation shaping to improve the separation between in-distribution (ID) and OOD inputs. These approaches use sample-specific scaling but apply a static percentile threshold across all samples regardless of their nature, resulting in suboptimal ID-OOD separability. In this work, we propose AdaSCALE, an adaptive scaling procedure that dynamically adjusts the percentile threshold based on a sample's estimated OOD likelihood. This estimation leverages our key observation: under minor perturbation, OOD samples exhibit significantly more pronounced activation shifts at high-magnitude activations than ID samples do. AdaSCALE enables stronger scaling for likely ID samples and weaker scaling for likely OOD samples, yielding highly separable energy scores. Our approach achieves state-of-the-art OOD detection performance, outperforming the latest rival OptFS by 14.94% on near-OOD and 21.67% on far-OOD datasets in average FPR@95 on the ImageNet-1k benchmark across eight diverse architectures. The code is available at: https://github.com/sudarshanregmi/AdaSCALE/

Paper Structure

This paper contains 42 sections, 5 equations, 5 figures, 64 tables, 1 algorithm.

Figures (5)

  • Figure 1: Adaptive scaling (AdaSCALE) vs. fixed scaling (ASH [djurisic2023extremely], SCALE [xu2024scaling], LTS [djurisic2024logit]). While fixed scaling approaches use a constant percentile threshold $p$ and hence a constant $k$ (e.g., $k=3$) across all samples, AdaSCALE adjusts $k$ based on estimated OODness. AdaSCALE assigns larger $k$ values (e.g., $k=5$) to OOD-likely samples, yielding smaller scaling factors, and smaller $k$ values (e.g., $k=1$) to ID-likely samples, yielding larger scaling factors. This adaptive mechanism enhances ID-OOD separability. (See Figure 4 for the complete working mechanism.)
  • Figure 2: Activation shift comparison (with the mean denoted by a solid line and the standard deviation by a shaded region) between ID and OOD in the ResNet-50 model. The activation shift is significantly more pronounced in OOD samples compared to ID samples at high-magnitude activations (left side of the x-axis), providing a discriminative signal for OOD detection.
  • Figure 3: $Q_\text{OOD}/Q_\text{ID}$ vs. $Q'_\text{OOD}/Q'_\text{ID}$ on various OOD datasets with ResNet-50 on ImageNet-1k. $Q'_\text{OOD}/Q'_\text{ID} > Q_\text{OOD}/Q_\text{ID}$ suggests $C_o$ helps mitigate overconfident estimations.
  • Figure 4: Schematic diagram of AdaSCALE's working mechanism. AdaSCALE computes activation shifts between an original image and its slightly perturbed counterpart to estimate OODness. This estimate, in turn, determines an adaptive percentile threshold ($p$ and thereby $k$), which controls the scaling factor $r$. Since $r$ is defined as the ratio of the total activation sum to the sum of activations above the percentile threshold, samples with higher OODness receive lower scaling factors. This adaptive approach yields highly separable energy scores that enable effective OOD detection.
  • Figure 5: Perturbed activation magnitudes comparison between ID and OOD samples. ID samples consistently maintain higher average activation values in comparison to OOD samples.
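
The scaling factor described in the Figure 4 caption can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: activations are taken to be a non-negative 1-D vector (for example, post-ReLU penultimate features), and the function name `scaling_factor` is hypothetical.

```python
import numpy as np


def scaling_factor(activations, p):
    """r = (sum of all activations) / (sum of activations at or above the p-th percentile).

    Assumes non-negative activations with at least one positive entry,
    so the denominator is never zero.
    """
    a = np.asarray(activations, dtype=float)
    threshold = np.percentile(a, p)      # p is given on a 0-100 scale
    top_sum = a[a >= threshold].sum()    # mass of the retained high activations
    return a.sum() / top_sum
```

For the vector [1, 2, 3, 4], raising $p$ from 50 to 75 shrinks the above-threshold sum from 7 to 4, so $r$ grows from 10/7 to 2.5. A likely-ID sample assigned a higher percentile therefore receives a larger scaling factor, matching the behavior described in the captions for Figures 1 and 4.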