Reconstruction Error-based Anomaly Detection with Few Outlying Examples

Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina

TL;DR

This work addresses reconstruction error-based anomaly detection in a semi-supervised setting, where a small set of labeled anomalies is available. It introduces AE-SAD, which trains an Autoencoder with the inverted loss $L_F(x) = (1 - y) \|x - \hat{x}\|_2^2 + \lambda y \|F(x) - \hat{x}\|_2^2$, where $\hat{x}$ is the reconstruction of $x$, $y$ is the label ($0$ for normal examples, $1$ for labeled anomalies), and $\lambda$ is a regularization parameter. The map $F$ (typically $F(x)=1-x$) gives labeled anomalies a reconstruction target outside the normal data domain, while normal reconstructions stay faithful. Experiments on tabular and image datasets show that AE-SAD outperforms standard Autoencoders and other semi-supervised methods, remains robust to anomalies unseen during training and to polluted training data, and generalizes well to novel anomaly classes. The method uses simple architectures yet delivers strong detection performance, which suggests practical value when labeled anomalies are scarce. Future work includes extending the inverted loss to VAEs and GAN-based reconstruction models.
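The inverted loss can be illustrated with a minimal NumPy sketch. It only evaluates $L_F$ for given inputs and reconstructions; it does not train an Autoencoder. The function name `ae_sad_loss`, its argument names, the default `lam=1.0`, and the assumption that inputs are scaled to $[0, 1]$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def ae_sad_loss(x, x_hat, y, lam=1.0, F=lambda v: 1.0 - v):
    """Per-example inverted loss L_F (illustrative sketch, not the authors' code).

    x, x_hat : arrays of shape (n, d); inputs and their reconstructions
               (assumed here to be scaled to [0, 1])
    y        : array of shape (n,); 0 = normal example, 1 = labeled anomaly
    lam      : regularization parameter lambda (default value is an assumption)
    F        : map applied to anomalies; F(x) = 1 - x by default
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standard squared reconstruction error, active only when y = 0.
    normal_term = np.sum((x - x_hat) ** 2, axis=1)
    # Squared distance between the reconstruction and F(x), active only when y = 1.
    anomaly_term = np.sum((F(x) - x_hat) ** 2, axis=1)
    return (1.0 - y) * normal_term + lam * y * anomaly_term

# Two examples: the first is normal, the second is a labeled anomaly.
x = np.array([[0.0, 1.0], [0.2, 0.8]])
y = np.array([0, 1])
loss_if_copied = ae_sad_loss(x, x, y)            # reconstruction equals the input
loss_if_inverted = ae_sad_loss(x, 1.0 - x, y)    # reconstruction equals F(x)
```

With a perfect copy of the input, the normal example has zero loss and the anomaly is penalized; with a reconstruction equal to $F(x)$, the anomaly has zero loss and the normal example is penalized. This is the behavior the two terms of $L_F$ are meant to encode.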

Abstract

Reconstruction error-based neural architectures are a classical deep learning approach to anomaly detection that has shown strong performance. The approach trains an Autoencoder to reconstruct a set of examples deemed to represent normality and then flags as anomalies those data that show a sufficiently large reconstruction error. Unfortunately, these architectures often learn to reconstruct the anomalies in the data well too. This phenomenon is more evident when there are anomalies in the training set. In particular, when these anomalies are labeled, a setting called semi-supervised, the best way to train Autoencoders is to ignore anomalies and minimize the reconstruction error on normal data. The goal of this work is to investigate approaches that let reconstruction error-based architectures place known anomalies outside the domain description of the normal data. Specifically, our strategy exploits a limited number of anomalous examples to increase the contrast between the reconstruction error associated with normal examples and that associated with both known and unknown anomalies, thus enhancing anomaly detection performance. The experiments show that this new procedure performs better than the standard Autoencoder approach and the main deep learning techniques for semi-supervised anomaly detection.
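The detection rule described in the abstract, scoring each example by its reconstruction error and flagging large values, can be sketched as follows. The helper names `reconstruction_scores` and `pairwise_auc`, and the toy reconstructions, are hypothetical and serve only as illustration. The AUC is computed from its pairwise definition, which may differ from the evaluation code used in the paper.

```python
import numpy as np

def reconstruction_scores(x, x_hat):
    """Anomaly score of each example: squared L2 reconstruction error."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.sum((x - x_hat) ** 2, axis=1)

def pairwise_auc(scores, labels):
    """AUC as the fraction of (anomaly, normal) pairs in which the anomaly
    receives the higher score; ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]   # scores of anomalies
    neg = scores[labels == 0]   # scores of normal examples
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Toy data: two normal examples reconstructed well, two anomalies reconstructed badly.
x = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]])
x_hat = np.array([[0.1, 0.8], [0.2, 0.8], [0.2, 0.8], [0.3, 0.7]])
labels = np.array([0, 0, 1, 1])

scores = reconstruction_scores(x, x_hat)
auc = pairwise_auc(scores, labels)
flagged = scores > 0.1   # the threshold 0.1 is an arbitrary illustrative value
```

A threshold turns scores into decisions, while the AUC summarizes how well the scores rank anomalies above normal examples without fixing a threshold.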

Paper Structure (14 sections, 3 equations, 10 figures, 10 tables)


Figures (10)

  • Figure 1: Original and reconstructed images of the training and test sets. In this example, class $8$ is normal and classes $1,3,5,9$ are the anomalous classes used in training.
  • Figure 2: In this example, class $8$ is normal and classes $1,3,5,9$ are the anomalous classes used in training.
  • Figure 3: Sensitivity to the regularization parameter $\lambda$.
  • Figure 4: Sensitivity to the number of labeled anomalous examples $s$.
  • Figure 5: Sensitivity to the number of epochs. For both the standard Autoencoder and $\textrm{AE-SAD}$, the legend reports the AUC value at the epoch in which the lowest loss value was observed.
  • ...and 5 more figures