Statistical Test for Anomaly Detections by Variational Auto-Encoders

Daiki Miwa, Tomohiro Shiraishi, Vo Nguyen Le Duy, Teruyuki Katsuoka, Ichiro Takeuchi

TL;DR

The paper addresses the reliability of VAE-based anomaly detection in high-stakes settings by introducing the VAE-AD Test, a selective-inference–based statistical test that outputs p-values for detected anomalous regions. It treats the anomaly region as a data-driven hypothesis and derives p-values from a truncated normal distribution conditional on the region selection, leveraging the VAE’s piecewise-linear structure. The method provides finite-sample validity and demonstrates improved Type I error control and higher power relative to baselines in synthetic and brain-imaging experiments. This approach enhances trust in deep learning–based anomaly localization by quantifying statistical reliability and enabling controlled decisions in medical imaging contexts.

Abstract

In this study, we consider the reliability assessment of anomaly detection (AD) using Variational Autoencoders (VAEs). Over the last decade, VAE-based AD has been actively studied from various perspectives, ranging from method development to applied research. However, when AD results are used in high-stakes decision-making, such as medical diagnosis, it is necessary to ensure the reliability of the detected anomalies. We propose the VAE-AD Test as a method for quantifying the statistical reliability of VAE-based AD within the framework of statistical testing. Using the VAE-AD Test, the reliability of the anomaly regions detected by a VAE can be quantified in the form of p-values. This means that if an anomaly is declared only when the p-value is below a certain threshold, the probability of false detection can be controlled at a desired level. Since the VAE-AD Test is constructed based on a new statistical inference framework called selective inference, its validity is theoretically guaranteed in finite samples. To demonstrate the validity and effectiveness of the proposed VAE-AD Test, we conduct numerical experiments on artificial data and applications to brain image analysis.

Paper Structure

This paper contains 36 sections, 2 theorems, 27 equations, 27 figures, 1 algorithm.

Key Result

Theorem 4.1

Consider a random image $\bm X$ and an observed image $\bm x$. Let $M_{\bm X}$ and $M_{\bm x}$ be the hypotheses obtained by applying a piecewise-assignment function in the form of Eq. (eq:piecewise_assignment_function) to $\bm X$ and $\bm x$, respectively. Let $\bm{\eta} \in \mathbb{R}^n$ be a vector [...] Then, the conditional distribution is a truncated normal distribution $TN(\bm{\eta}^\top \bm{\mu}, \dots)$ [...]
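
To make the role of the truncated normal distribution concrete, the following minimal sketch computes a p-value from a normal null distribution restricted to a truncation region. It is not the paper's implementation: the function names, the single-interval truncation region `[lower, upper]`, the known standard deviation `sd`, and the two-sided form of the p-value are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm


def truncated_normal_cdf(t, mean, sd, lower, upper):
    """CDF at t of N(mean, sd^2) truncated to the interval [lower, upper].

    Hypothetical helper: the truncation region is assumed to be one interval.
    """
    a = norm.cdf((lower - mean) / sd)
    b = norm.cdf((upper - mean) / sd)
    c = norm.cdf((t - mean) / sd)
    return (c - a) / (b - a)


def selective_p_value(t_obs, sd, lower, upper, null_mean=0.0):
    """Two-sided p-value of t_obs under the truncated normal null (assumed form)."""
    F = truncated_normal_cdf(t_obs, null_mean, sd, lower, upper)
    return 2.0 * min(F, 1.0 - F)


# Without truncation this reduces to the ordinary two-sided z-test.
p_plain = selective_p_value(1.96, sd=1.0, lower=-np.inf, upper=np.inf)

# If the statistic could only have been observed above 1.0 (because the region
# was selected using the same data), the same value is much less surprising.
p_truncated = selective_p_value(1.96, sd=1.0, lower=1.0, upper=np.inf)

print(p_plain, p_truncated)
```

In this sketch the untruncated p-value is approximately 0.05, while the truncated one is larger, which mirrors the gap between $p_{\rm naive}$ and $p_{\rm selective}$ reported in the figures. The plain CDF differences used here can lose precision far in the tails, so a more careful numerical treatment would be needed for extreme statistics.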

Figures (27)

  • Figure 1: Image without tumor region. $p_{\rm naive}=0.000$ (false detection) and $p_{\rm selective}=0.668$ (true negative).
  • Figure 2: Image with tumor region. $p_{\rm naive}=0.000$ (true detection) and $p_{\rm selective}=0.000$ (true detection).
  • Figure 4: Type I errors (false positive detection rates) and powers (true positive detection rates) of the proposed VAE-AD Test and three baselines (Naive, OC, and Bonf) in the Independence and Correlation settings. The Naive test, which ignores the fact that abnormal regions are selected in a data-driven manner, fails to control the Type I error and therefore does not meet the requirements of a statistical test. In contrast, the proposed VAE-AD Test and the two other baselines, OC and Bonf, all control the Type I error at 0.05 in all settings. The power of the proposed VAE-AD Test is significantly higher than that of OC and Bonf in all problem settings.
  • Figure 5: $p_{\rm naive}=0.000,\; p_{\rm selective}=0.431$
  • Figure 6: $p_{\rm naive}=0.000,\; p_{\rm selective}=0.849$
  • ...and 22 more figures
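
The contrast between the naive and selective p-values in these captions can be reproduced in a toy setting. The sketch below is not the paper's experiment and involves no VAE: it assumes independent standard normal "pixels" with no anomaly anywhere, selects the largest pixel as the suspected anomaly, and tests it with a one-sided test. The function name and all parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def simulate_type1(n_pixels=20, n_trials=5000, alpha=0.05, seed=0):
    """Estimate false detection rates of three tests on pure-noise data."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_trials, n_pixels))  # null data: no anomaly
    x_sorted = np.sort(x, axis=1)
    top, second = x_sorted[:, -1], x_sorted[:, -2]

    # Naive: treat the selected (largest) pixel as if it were chosen in advance.
    p_naive = norm.sf(top)

    # Bonferroni: correct for the number of pixels that could have been selected.
    p_bonf = np.minimum(1.0, n_pixels * p_naive)

    # Selective: condition on the selection event "this pixel exceeds all
    # others", so the null is a standard normal truncated to [second, inf).
    p_selective = norm.sf(top) / norm.sf(second)

    p_values = {"naive": p_naive, "bonf": p_bonf, "selective": p_selective}
    return {name: float(np.mean(p < alpha)) for name, p in p_values.items()}


rates = simulate_type1()
print(rates)
```

Under these assumptions the naive test rejects far more often than the nominal 0.05, while the Bonferroni and selective tests stay close to it. The simulation only covers the null case, so it says nothing about the power differences between methods.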

Theorems & Definitions (5)

  • Definition 1: Piecewise-Assignment Function
  • Theorem 4.1
  • Definition 2: Piecewise-Linear Function
  • Lemma 1
  • Proof