Targeted collapse regularized autoencoder for anomaly detection: black hole at the center

Amin Ghafourian, Huanyi Shui, Devesh Upadhyay, Rajesh Gupta, Dimitar Filev, Iman Soltani Bozchalooi

TL;DR

The paper addresses anomaly detection with autoencoders, noting that reconstruction-based scores can fail when anomalies are well reconstructed. It introduces Targeted Collapse Regularized Autoencoders (Toll), which add a latent-norm penalty to the reconstruction loss to encourage compact, descriptive latent representations without adding architectural components. The authors provide a theoretical analysis of learning dynamics in the linear case and validate Toll on five datasets (MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, Arrhythmia), showing competitive or superior performance and robust ablations. Toll also demonstrates compatibility with stronger methods (e.g., FITYMI) to further boost performance. The work suggests that a simple, hyperparameter-friendly regularization can significantly improve anomaly detection across data modalities and can complement existing advanced approaches.

Abstract

Autoencoders have been extensively used in the development of recent anomaly detection techniques. The premise of their application is based on the notion that after training the autoencoder on normal training data, anomalous inputs will exhibit a significant reconstruction error. Consequently, this enables a clear differentiation between normal and anomalous samples. In practice, however, it is observed that autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some of the anomalous samples. To improve the performance, various techniques propose additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach minimizes the requirement for hyperparameter tuning and customization for new applications which, paired with its permissive data modality constraint, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that the technique matches and frequently outperforms more complex alternatives. We further demonstrate that implementing this idea in the context of state-of-the-art methods can further improve their performance. We also provide a theoretical analysis and numerical simulations that help demonstrate the underlying process that unfolds during training and how it helps with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, fail cases, and potential new directions.
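As a sketch, the regularized objective described above can be written as follows. The $\beta$ and $1-\beta$ weighting follows the Figure 2 caption. The squared $\ell_2$ norms and the symbols $f$ (encoder), $g$ (decoder), and $z$ (latent representation) are assumptions made here for illustration and may differ from the paper's notation.

$$\mathcal{L}(x) = (1-\beta)\,\lVert x - g(f(x)) \rVert_2^2 + \beta\,\lVert z \rVert_2^2, \qquad z = f(x).$$

Under this reading, the same per-sample quantity $\mathcal{L}(x)$ serves as the anomaly score at test time, and setting $\beta = 0$ recovers a plain reconstruction-error score.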

Paper Structure

This paper contains 19 sections, 3 theorems, 22 equations, 4 figures, and 8 tables.

Key Result

Lemma 1

The weight matrix $W$ under gradient flow evolves by where $S=\frac{1}{n}XX^T$ is the empirical covariance matrix.
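The definition $S=\frac{1}{n}XX^T$ implies a particular data layout: $X$ holds one sample per column, and the columns are mean-centered. A minimal NumPy sketch of that convention follows; the function name `empirical_covariance` and the synthetic data are hypothetical, and the snippet illustrates only the definition of $S$, not the evolution of $W$.

```python
import numpy as np

def empirical_covariance(X):
    """Return S = (1/n) X X^T for a (d, n) matrix with one sample per column.

    Assumes the data are already mean-centered along each feature (row).
    """
    n = X.shape[1]
    return (X @ X.T) / n

# Hypothetical data: d = 2 features, n = 500 samples stored as columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 500))
X = X - X.mean(axis=1, keepdims=True)  # center each feature

S = empirical_covariance(X)
print(S)
```

For centered data this matches `np.cov(X, bias=True)`, which also treats rows as variables and normalizes by $n$.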

Figures (4)

  • Figure 1: Overview of anomaly detection with Toll. Normal and anomalous samples and their associated latent representations and reconstructions are respectively represented by green and red outlines and vectors. During training, in addition to minimizing reconstruction error, latent representation norms are also minimized, with hyperparameter $\beta$ specifying the trade-off between the two terms. To use the trained model for anomaly detection, the overall loss associated with a sample is used as the anomaly score. It is expected that anomalous samples incur larger values of combined reconstruction error and latent representation norm as specified by $\beta$. (Figure best viewed in color.)
  • Figure 2: Contours of anomaly score output using an unregularized (a) and a norm-regularized (b) linear autoencoder in the 2D input data space. The autoencoder is trained to encode samples from a two-dimensional Gaussian distribution (bright points) to a one-dimensional latent space and decode back to the original dimensionality. To apply regularization, a $\beta$ coefficient is multiplied by the regularization term, and $1-\beta$ is multiplied by the reconstruction error term. The unregularized autoencoder encodes the first principal component of the dataset ($x_1$) and therefore reconstruction errors scale with the distance of points from the origin along $x_2$. Regularization prevents the complete vanishing of variation along $x_2$ in the bottleneck and, for a good choice of $\beta$, the norm-regularized autoencoder yields a much more accurate anomaly score profile.
  • Figure 3: Samples from MNIST (a, top left), Fashion-MNIST (a, bottom left), CIFAR-100 (a, right), and CIFAR-10 (b) datasets. Rows show different classes.
  • Figure 4: Effect of regularization intensity for digit "8" as normal class in MNIST. The AUC values shown are averaged over 3 random seeds.
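The scoring rule in the Figure 1 and Figure 2 captions can be illustrated with a small NumPy sketch on a toy problem similar to the one in Figure 2: a 2D Gaussian encoded to a 1D latent space. Several choices here are assumptions and are not taken from the paper: the autoencoder uses tied weights (the decoder is the transpose of the encoder), both terms use squared L2 norms, and the function names, learning rate, step count, and data scales are hypothetical. It is not intended to reproduce the figure.

```python
import numpy as np

def toll_score(X, W, beta):
    """Per-sample score (1 - beta) * ||x - x_hat||^2 + beta * ||z||^2.

    X is (n, d); W is a (k, d) tied-weight linear autoencoder with
    z = W x and x_hat = W^T z (tied weights are an assumption of this sketch).
    """
    Z = X @ W.T                                   # latent codes, (n, k)
    X_hat = Z @ W                                 # reconstructions, (n, d)
    recon = np.sum((X - X_hat) ** 2, axis=1)
    latent = np.sum(Z ** 2, axis=1)
    return (1.0 - beta) * recon + beta * latent

def train(X, beta, k=1, lr=0.05, steps=2000, seed=0):
    """Full-batch gradient descent on the mean of toll_score over X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.1 * rng.normal(size=(k, d))
    for _ in range(steps):
        Z = X @ W.T
        R = Z @ W - X                             # reconstruction residuals
        grad_recon = 2.0 * (W @ R.T @ X + Z.T @ R) / n
        grad_latent = 2.0 * (Z.T @ X) / n
        W -= lr * ((1.0 - beta) * grad_recon + beta * grad_latent)
    return W

# Toy "normal" data: zero-mean Gaussian, wide along x1, narrow along x2.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(2000, 2)) * np.array([1.0, 0.3])

W_plain = train(X_train, beta=0.0)                # unregularized baseline
W_toll = train(X_train, beta=0.1)                 # norm-regularized

# Probe points: far along x1, far along x2, and the origin.
probes = np.array([[3.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
scores_plain = toll_score(probes, W_plain, beta=0.0)
scores_toll = toll_score(probes, W_toll, beta=0.1)
print("unregularized:", scores_plain)
print("regularized:  ", scores_toll)
```

In this toy setting one would expect the unregularized model to give a near-zero score to the probe far along $x_1$, since it lies along the encoded direction and is reconstructed almost perfectly. The latent-norm term raises that probe's score because its latent code is far from the origin.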

Theorems & Definitions (3)

  • Lemma 1
  • Theorem 1
  • Theorem 2