Perturb-and-Compare Approach for Detecting Out-of-Distribution Samples in Constrained Access Environments

Heeyoung Lee; Hoyoon Byun; Changdae Oh; JinYeong Bak; Kyungwoo Song

TL;DR

MixDiff addresses the need for robust OOD detection in constrained-access environments where API users cannot inspect model internals. By perturbing both the target sample and a similar in-distribution oracle sample with Mixup and comparing their outputs, MixDiff provides an extra discriminative signal that calibrates base OOD scores against overconfident predictions. The authors offer both theoretical support and extensive empirical validation across vision and text tasks, showing consistent improvements over standard, training-free, and some training-based baselines under restricted access. This approach enables more reliable OOD detection in real-world API scenarios and demonstrates versatility across modalities, including text and vision-language models like CLIP.

Abstract

Accessing machine learning models through remote APIs has been gaining prevalence following the recent trend of scaling up model parameters for increased performance. Even though these models exhibit remarkable ability, detecting out-of-distribution (OOD) samples remains a crucial safety concern for end users as these samples may induce unreliable outputs from the model. In this work, we propose an OOD detection framework, MixDiff, that is applicable even when the model's parameters or its activations are not accessible to the end user. To bypass the access restriction, MixDiff applies an identical input-level perturbation to a given target sample and a similar in-distribution (ID) sample, then compares the relative difference in the model outputs of these two samples. MixDiff is model-agnostic and compatible with existing output-based OOD detection methods. We provide theoretical analysis to illustrate MixDiff's effectiveness in discerning OOD samples that induce overconfident outputs from the model and empirically demonstrate that MixDiff consistently enhances the OOD detection performance on various datasets in vision and text domains.
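The perturb-and-compare idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `mixdiff_score`, `neg_msp`, and `toy_model`, the weight `gamma`, and the additive way the comparison term is combined with the base score are all assumptions made for this example. The model is treated as a black box that returns only class probabilities, in line with the constrained-access setting.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def neg_msp(probs):
    # Base OOD score (assumed here): negative max softmax probability,
    # so that a higher value means "more likely OOD".
    return -probs.max(axis=-1)


def mixdiff_score(model, x_t, oracles, auxiliaries, ratios,
                  base_score=neg_msp, gamma=1.0):
    """Return (calibrated score, comparison term) for one target sample.

    model:       black-box callable, (B, D) inputs -> (B, K) probabilities
    x_t:         target sample, shape (D,)
    oracles:     ID samples similar to the target, shape (M, D)
    auxiliaries: samples mixed into both sides, shape (N, D)
    ratios:      Mixup ratios lambda in (0, 1)
    gamma:       hypothetical weight on the comparison term
    """
    base = base_score(model(x_t[None]))[0]
    diffs = []
    for lam in ratios:
        for x_aux in auxiliaries:
            # Identical Mixup perturbation on the target and on the oracles.
            mixed_target = lam * x_t + (1 - lam) * x_aux
            mixed_oracles = lam * oracles + (1 - lam) * x_aux
            s_target = base_score(model(mixed_target[None]))[0]
            s_oracle = base_score(model(mixed_oracles)).mean()
            diffs.append(s_target - s_oracle)
    diff = float(np.mean(diffs))
    return float(base + gamma * diff), diff


# Toy usage: a fixed linear-softmax classifier stands in for the remote API.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # D=4 input features, K=3 classes


def toy_model(x):
    return softmax(x @ W)


x_t = rng.normal(size=4)
oracles = rng.normal(size=(2, 4))      # M=2 oracle samples
auxiliaries = rng.normal(size=(3, 4))  # N=3 auxiliary samples
score, diff = mixdiff_score(toy_model, x_t, oracles, auxiliaries, ratios=[0.5])
```

Only calls to `model` are needed, so the sketch uses no gradients, parameters, or activations. When the target behaves like its oracle under the same perturbation, the comparison term stays near zero and the base score is left essentially unchanged.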

Paper Structure (32 sections, 2 theorems, 7 equations, 5 figures, 4 tables, 1 algorithm)

Related work
Output-based OOD scoring functions
Enhancing output-based OOD scores
Utilization of deeper access for more discriminative OOD scores
Methodology
Oracle-side perturbation
Target-side perturbation
Comparison of perturbed samples' outputs
Practical implementation
Theoretical analysis
Experiments
Experimental setup
Implementation details
Baselines
Logits as model outputs
...and 17 more sections

Key Result

Proposition 1

Let the pre-trained model $f(\cdot)$ and the base OOD score function $h(\cdot)$ be twice-differentiable functions, and let $x_{i\lambda}=\lambda x_t + (1-\lambda)x_i$ be a mixed sample with ratio $\lambda \in (0,1)$. Then the OOD score function of the mixed sample, $h(f(x_{i\lambda}))$, is written as: where $\lim_{\lambda\rightarrow 1}\varphi_t(\lambda)=0$,

Figures (5)

Figure 1: (a) The class activation map of an OOD sample (train) for the predicted class (bus) exhibits a high degree of sensitivity when an auxiliary image (camel) is mixed into it. The same class activation map of an image of an actual bus is more robust to the same perturbation. (Top 2 classes are shown.) (b) Average $L_1$ distance of the class activation maps of the high-confidence class and the ground-truth class after perturbation (averaged over each OOD class).
Figure 2: Overview of MixDiff with the number of Mixup ratios $R=1$, the number of classes $K=6$, the number of auxiliary samples $N=3$, and the number of oracle instances $M=2$. We omit the Mixup ratio subscript $r$ for simplicity.
Figure 3: (a) Approximation error for the equation in Proposition 1 on synthetic data. Even without higher-order terms, the decomposed terms reasonably approximate the OOD score of the mixed sample. (b) The synthetic data distribution. Data are sampled from four independent Gaussian distributions, with two treated as ID samples for each class and the other two as OOD samples. We train a logistic regression model on this dataset. (c) The prediction results of the trained model. (d) Although the target sample is a hard OOD sample, there are auxiliary samples (blue dot) that guarantee that MixDiff is positive under some reasonable conditions introduced in Theorem 1.
Figure 4: Additional analyses on CIFAR100. (a) AUROC scores of MixDiff+Entropy with varying values of $N$ and $R$ (top). AUROC score of Entropy (bottom). We also provide processing time analysis in Appendix K. (b) Difference between the average uncertainty scores of OOD and ID samples whose MSP scores fall in a given interval. Five non-overlapping consecutive intervals are constructed from values below the threshold set by FPR95. MixDiff scores can discriminate between OOD and ID samples even when their base score values are almost identical.
Figure 5: Effect of low-confidence oracles.

Theorems & Definitions (2)

Proposition 1: OOD scores for mixed samples
Theorem 1
