A View on Out-of-Distribution Identification from a Statistical Testing Theory Perspective

Alberto Caron, Chris Hicks, Vasilios Mavroudis

TL;DR

The paper reframes out-of-distribution (OOD) detection as a non-parametric statistical testing problem and establishes identifiability conditions that determine when OOD samples can be reliably detected. It introduces a test statistic based on the Wasserstein distance and proves both asymptotic uniform consistency under a separation condition and non-asymptotic power bounds, clarifying the limits of detectability. The analysis argues for the advantages of distributional-distance tests over KL/JS-based methods, especially in high-dimensional and non-overlapping regimes. Two experiments, a synthetic generative-model task and an MNIST versus Fashion-MNIST setup, illustrate the practical effectiveness of the Wasserstein OOD test for detecting distributional shifts at test time.

Abstract

We study the problem of efficiently detecting Out-of-Distribution (OOD) samples at test time in supervised and unsupervised learning contexts. While ML models are typically trained under the assumption that training and test data stem from the same distribution, this is often not the case in realistic settings, so reliably detecting distribution shifts is crucial at deployment. We re-formulate the OOD problem through the lens of statistical testing and then discuss conditions that render the OOD problem identifiable in statistical terms. Building on this framework, we study convergence guarantees of an OOD test based on the Wasserstein distance, and provide a simple empirical evaluation.

Paper Structure (13 sections, 9 theorems, 32 equations, 3 figures)

Key Result

Theorem 3.1

Let $\mathcal{D}_m$ be a test dataset. The test based on $T^{wass}_m = m^{1/2} W_p(P_{\theta}, Q)$ for the hypotheses $H_0: \mathcal{D}_m \sim P_{\theta}$ vs. $H_1: \mathcal{D}_m \sim Q \neq P_{\theta}$ is uniformly consistent as $m \rightarrow \infty$ over alternatives $Q_m$ that satisfy $m^{1/2} W_p(P_{\theta}, Q_m) \geq \Delta_m$, where $\lim_{m \rightarrow \infty} \Delta_m = \infty$.
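A statistic of this form could be computed in practice roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes one-dimensional data, $p = 1$, and equal-size samples, so that the empirical $W_1$ reduces to the mean gap between order statistics. The names `empirical_w1`, `wass_statistic` and `calibrate_threshold` are hypothetical, and the rejection threshold is calibrated by resampling in-distribution batches, which is an assumption of this sketch and is not taken from the paper.

```python
import math
import random


def empirical_w1(xs, ys):
    """W_1 between two equal-size 1-D samples: mean gap between order statistics."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def wass_statistic(reference, batch):
    """Plug-in version of m^{1/2} W_1, with P_theta replaced by a reference sample."""
    return math.sqrt(len(batch)) * empirical_w1(reference, batch)


def calibrate_threshold(pool, m, alpha=0.05, n_rounds=500, rng=None):
    """Approximate the (1 - alpha) null quantile by comparing pairs of ID batches.

    Assumption of this sketch: the null distribution is simulated by
    resampling batches of size m from a pool of in-distribution data.
    """
    rng = rng or random.Random(0)
    null_stats = sorted(
        wass_statistic(rng.sample(pool, m), rng.sample(pool, m))
        for _ in range(n_rounds)
    )
    index = min(n_rounds - 1, math.ceil((1 - alpha) * n_rounds) - 1)
    return null_stats[index]


rng = random.Random(0)
m = 200
pool = [rng.gauss(0.0, 1.0) for _ in range(2000)]      # in-distribution data
threshold = calibrate_threshold(pool, m, rng=rng)

ood_batch = [rng.gauss(3.0, 1.0) for _ in range(m)]    # mean shifted by 3 sd
t_ood = wass_statistic(rng.sample(pool, m), ood_batch)
print("threshold:", threshold, "statistic:", t_ood, "reject H0:", t_ood > threshold)
```

With a shift this large the statistic sits far above the calibrated threshold. The separation condition in the theorem concerns the harder regime where $W_p(P_{\theta}, Q_m)$ shrinks as $m$ grows.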

Figures (3)

  • Figure 1: Examples of discrete distribution shifts where the KL and JS divergences are less informative, while $W(P,Q)$ captures that the shift on the right is geometrically much further from the reference distribution than the one on the left.
  • Figure 2: The left plot depicts the true distribution of the latent factors $(Z_1, Z_2)$, the distribution learnt via FL, and the OOD one. The right plot reports the mean AUROC of each OOD test (with 90% error bands) against the number of standard deviations from the ID mean.
  • Figure 3: The left plot shows samples from the MNIST (ID) and Fashion MNIST (OOD) datasets. The centre plot shows the distribution of the two principal latent factors, computed with Truncated SVD, on MNIST and Fashion MNIST. The table on the right reports the AUROC, TPR and FPR of the four OOD tests considered.
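The point made in the Figure 1 caption can be reproduced with a small numerical sketch. The histograms below are illustrative assumptions, not the ones plotted in the paper: a reference distribution on a 10-bin grid and two shifted copies with disjoint support, one near and one far. The helper names (`kl`, `js`, `wasserstein1`, `two_bin_hist`) are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) in nats; infinite when q has no mass where p does."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def js(p, q):
    """Jensen-Shannon divergence in nats (bounded above by log 2)."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def wasserstein1(p, q, spacing=1.0):
    """W_1 on an evenly spaced 1-D grid: sum of |CDF_p - CDF_q| times spacing."""
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap) * spacing
    return total


def two_bin_hist(start, size=10):
    """Histogram with mass 0.5 on bins `start` and `start + 1`."""
    hist = [0.0] * size
    hist[start] = hist[start + 1] = 0.5
    return hist


ref = two_bin_hist(0)
near = two_bin_hist(2)   # shifted by 2 bins
far = two_bin_hist(7)    # shifted by 7 bins

for name, shifted in (("near", near), ("far", far)):
    print(name, kl(ref, shifted), js(ref, shifted), wasserstein1(ref, shifted))
```

Because the supports do not overlap, both shifts give an infinite KL divergence and the same JS divergence of $\log 2 \approx 0.693$, whereas $W_1$ equals 2 for the near shift and 7 for the far one.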

Theorems & Definitions (14)

  • Remark 2.2: OOD Test
  • Theorem 3.1: Uniform Consistency
  • Theorem 3.2: Non-Asymptotic Lower Bound
  • Theorem 3.3: Worst Case Upper Bound
  • Theorem 3.4: Intermediate Case Asymptotic Upper Bound
  • Theorem A.1: Restatement of Theorem 3.1
  • proof
  • Theorem A.2: Restatement of Theorem 3.2
  • proof
  • Theorem A.3: bolley2007quantitative
  • ...and 4 more