Variational Autoencoder for Anomaly Detection: A Comparative Study

Huy Hoang Nguyen, Cuong Nhat Nguyen, Xuan Tung Dao, Quoc Trung Duong, Dzung Pham Thi Kim, Minh-Tan Pham

TL;DR

The paper addresses unsupervised anomaly detection by comparing three VAE-based architectures: the traditional VAE, VAE-GRF (Gaussian Random Field prior), and ViT-VAE (Vision Transformer-based latent representation). It benchmarks these models on MVTec AD and the MiAD dataset to assess robustness, hyperparameter sensitivity, and cross-domain generalization beyond single-dataset optimization. The findings show ViT-VAE generally delivers superior performance, especially on non-texture classes, while VAE-GRF requires careful hyperparameter tuning to realize its potential; MiAD proves to be a more challenging and robust benchmark. The work enhances evaluation practices in anomaly detection by including MiAD, and it provides open-source code for reproducibility and further research advancement.

Abstract

This paper presents a comparative analysis of contemporary Variational Autoencoder (VAE) architectures for anomaly detection, characterizing their performance and behavior on this task. The architectures considered are the original VAE baseline, the VAE with a Gaussian Random Field prior (VAE-GRF), and the VAE incorporating a vision transformer (ViT-VAE). The findings show that ViT-VAE performs strongly across various scenarios, whereas VAE-GRF may require more intricate hyperparameter tuning to reach its best performance. Additionally, to reduce over-reliance on results from the widely used MVTec dataset, this paper also benchmarks on the recently released MiAD dataset. Including MiAD lessens the influence of domain-specific models tailored exclusively to MVTec and thereby contributes to a more robust evaluation framework. Code is available at https://github.com/endtheme123/VAE-compare.git.

Paper Structure (17 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Illustration of the differences between the VAE and VAE-GRF architectures, as well as the differences in their anomaly maps (gangloff2023unsupervised).
  • Figure 2: Architecture of the ViT-VAE model (lee2022anovit).
  • Figure 3: Examples of good and anomalous samples from the MVTec and MiAD datasets.
  • Figure 4: Illustrations of anomaly maps provided by the three models for some MVTec and MiAD samples.
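
The anomaly maps mentioned in the figure captions can be pictured with a minimal sketch. It assumes a standard VAE with a diagonal Gaussian posterior and a standard normal prior, and it assumes that the anomaly map is the per-pixel squared reconstruction error; the page above does not state the scoring rule the paper actually uses, and the GRF prior of VAE-GRF is not modeled here. The function names `kl_to_standard_normal`, `anomaly_map`, and `image_score` are hypothetical and are not taken from the authors' repository.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).

    This is the regularization term of the standard VAE objective
    (assumption: diagonal Gaussian posterior, standard normal prior).
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar)
    )


def anomaly_map(image, reconstruction):
    """Per-pixel squared error between a 2-D image and its reconstruction.

    Assumption: pixels the model reconstructs poorly are treated as anomalous.
    """
    return [
        [(x - r) ** 2 for x, r in zip(row_x, row_r)]
        for row_x, row_r in zip(image, reconstruction)
    ]


def image_score(amap):
    """Image-level score, here the largest per-pixel error (an assumed choice)."""
    return max(max(row) for row in amap)


# Toy 2x2 grayscale image whose bottom-right pixel is poorly reconstructed.
image = [[0.0, 0.0], [0.0, 1.0]]
recon = [[0.0, 0.0], [0.0, 0.5]]
amap = anomaly_map(image, recon)
score = image_score(amap)
```

In this toy case only the bottom-right pixel has a non-zero error, so it alone determines the image-level score. A real pipeline would obtain `recon`, `mu`, and `logvar` from a trained encoder and decoder.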