Perturb-and-Compare Approach for Detecting Out-of-Distribution Samples in Constrained Access Environments
Heeyoung Lee, Hoyoon Byun, Changdae Oh, JinYeong Bak, Kyungwoo Song
TL;DR
MixDiff addresses the need for robust out-of-distribution (OOD) detection in constrained-access environments, where API users cannot inspect model internals. It perturbs both the target sample and a similar in-distribution (ID) oracle sample with Mixup, then compares the resulting outputs. This comparison provides an additional discriminative signal that calibrates base OOD scores against overconfident predictions. The authors offer both theoretical support and extensive empirical validation across vision and text tasks. Under restricted access, MixDiff shows consistent improvements over standard, training-free, and some training-based baselines. The approach enables more reliable OOD detection in real-world API scenarios and works across modalities, including text models and vision-language models such as CLIP.
Abstract
Accessing machine learning models through remote APIs has become increasingly prevalent following the recent trend of scaling up model parameters for increased performance. Even though these models exhibit remarkable capabilities, detecting out-of-distribution (OOD) samples remains a crucial safety concern for end users, as these samples may induce unreliable outputs from the model. In this work, we propose an OOD detection framework, MixDiff, that is applicable even when the model's parameters or activations are not accessible to the end user. To bypass this access restriction, MixDiff applies an identical input-level perturbation to a given target sample and a similar in-distribution (ID) sample, then compares the relative difference between the model outputs for these two samples. MixDiff is model-agnostic and compatible with existing output-based OOD detection methods. We provide a theoretical analysis illustrating MixDiff's effectiveness in discerning OOD samples that induce overconfident outputs from the model, and empirically demonstrate that MixDiff consistently enhances OOD detection performance on various datasets in the vision and text domains.
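The abstract describes the perturb-and-compare mechanism only at a high level. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation. The black-box model is a toy linear classifier that returns logits. The base score is the negative maximum softmax probability, so higher means more OOD. The oracle samples are passed in directly rather than selected. The names `mixdiff_score`, `lam`, and `gamma` are hypothetical, as is the choice to add the averaged target-minus-oracle score difference to the base score.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def msp_ood_score(logits: np.ndarray) -> np.ndarray:
    """Output-only base score: negative max softmax probability (higher = more OOD)."""
    return -softmax(logits).max(axis=-1)


def mixdiff_score(model, target, oracles, auxiliaries, lam=0.5, gamma=1.0,
                  score_fn=msp_ood_score):
    """Hypothetical perturb-and-compare OOD score for a single target sample.

    model:       black-box callable mapping a batch (n, d) to logits (n, C)
    target:      input of shape (d,)
    oracles:     ID samples assumed similar to the target, shape (K, d)
    auxiliaries: Mixup partners, shape (M, d)
    """
    base = score_fn(model(target[None, :]))[0]

    # Apply the SAME Mixup perturbation (same partner, same ratio) to the
    # target and to every oracle sample; only model outputs are used.
    target_mix = lam * target[None, :] + (1.0 - lam) * auxiliaries           # (M, d)
    oracle_mix = (lam * oracles[:, None, :]
                  + (1.0 - lam) * auxiliaries[None, :, :])                  # (K, M, d)

    target_scores = score_fn(model(target_mix))                             # (M,)
    k, m, d = oracle_mix.shape
    oracle_scores = score_fn(model(oracle_mix.reshape(k * m, d))).reshape(k, m)

    # Compare: how far the target's score moves relative to the oracle's
    # under identical perturbations, averaged over oracles and partners.
    diff = (target_scores[None, :] - oracle_scores).mean()
    return base + gamma * diff


# Usage with a toy 4-feature, 3-class linear "API" (illustrative data only).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))


def toy_model(x: np.ndarray) -> np.ndarray:
    return x @ W


x = rng.normal(size=4)
oracles = rng.normal(size=(2, 4))
aux = rng.normal(size=(5, 4))
print(mixdiff_score(toy_model, x, oracles, aux, lam=0.5, gamma=1.0))
```

Because the sketch only consumes `toy_model`'s outputs, it needs no access to parameters or activations. Each target and oracle pair shares the same Mixup partner and ratio, which is what makes the perturbation "identical." If the oracle equals the target, the comparison term vanishes and the base score is returned unchanged.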
