Enhancing Anomaly Detection via Generating Diversified and Hard-to-distinguish Synthetic Anomalies

Hyuntae Kim, Changhee Lee

TL;DR

This paper tackles unsupervised anomaly detection by learning normality patterns without labeled anomalies. It introduces DHAG, a domain-agnostic framework composed of an encoder, multiple latent perturbators, and a discriminator, trained to distinguish original normal samples from synthetic, diversified, and hard-to-distinguish anomalies. The core innovations are perturbations that are small yet diverse and orthogonal in latent space, coupled with a pseudo-labeling scheme that alternates between augmenting normals and generating anomalies, guided by a joint objective that includes cross-entropy and diversity losses. Empirical results on both image (CIFAR-10, FMNIST) and tabular (e.g., Thyroid, KDD) datasets show DHAG surpasses or matches state-of-the-art benchmarks, with additional gains in semi-supervised settings when a subset of real anomalies is available. The method thus provides a practical, adaptable approach for robust anomaly detection across diverse data domains, including potential real-world deployment where domain-specific transformations are unavailable.

Abstract

Unsupervised anomaly detection is a daunting task, as it relies solely on normality patterns from the training data to identify unseen anomalies during testing. Recent approaches have focused on leveraging domain-specific transformations or perturbations to generate synthetic anomalies from normal samples. The objective is to capture normality patterns by learning to differentiate between normal samples and these crafted anomalies. However, these approaches often encounter limitations when domain-specific transformations are not well-specified, such as in tabular data, or when the crafted anomalies become trivial to distinguish from normal samples. To address these issues, we introduce a novel domain-agnostic method that employs a set of conditional perturbators and a discriminator. The perturbators are trained to generate input-dependent perturbations, which are subsequently used to construct synthetic anomalies, and the discriminator is trained to distinguish normal samples from them. We ensure that the generated anomalies are both diverse and hard to distinguish through two key strategies: i) directing perturbations to be orthogonal to each other and ii) constraining perturbations to remain in proximity to normal samples. Through experiments on real-world datasets, we demonstrate the superiority of our method over state-of-the-art benchmarks, which is evident not only in image data but also in tabular data, where domain-specific transformations are not readily available. Additionally, we empirically confirm the adaptability of our method to semi-supervised settings, demonstrating its capacity to incorporate supervised signals to enhance anomaly detection performance even further.
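
The two strategies named in the abstract, mutually orthogonal perturbations and perturbations that stay close to the normal sample, can be illustrated with a small NumPy sketch. This is not the paper's loss: the function names `orthogonality_penalty` and `proximity_penalty`, the squared-cosine form, and the `radius` threshold are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def orthogonality_penalty(perturbations):
    """Mean squared cosine similarity between distinct perturbations.

    perturbations: array of shape (L, d), one row per perturbator output
    for a single sample (L >= 2). Returns 0.0 when all rows are mutually
    orthogonal and 1.0 when all rows are parallel.
    (Hypothetical form; the paper's diversity loss may differ.)
    """
    norms = np.linalg.norm(perturbations, axis=1, keepdims=True)
    unit = perturbations / np.maximum(norms, 1e-12)  # avoid division by zero
    cos = unit @ unit.T                              # (L, L) cosine matrix
    num = cos.shape[0]
    off_diag = cos[~np.eye(num, dtype=bool)]         # drop self-similarities
    return float(np.mean(off_diag ** 2))

def proximity_penalty(perturbations, radius):
    """Squared hinge on perturbation norms that exceed `radius`.

    `radius` is an assumed hyperparameter bounding how far a synthetic
    anomaly may move away from its normal sample.
    """
    norms = np.linalg.norm(perturbations, axis=1)
    return float(np.mean(np.maximum(norms - radius, 0.0) ** 2))

# Three small, mutually orthogonal perturbations in a 4-D latent space.
orthogonal = 0.1 * np.eye(3, 4)
# Three identical perturbations: small, but not diverse.
parallel = np.tile(np.array([[0.1, 0.0, 0.0, 0.0]]), (3, 1))

print(orthogonality_penalty(orthogonal), orthogonality_penalty(parallel))
print(proximity_penalty(orthogonal, radius=0.5))
```

In this toy setting both sets of perturbations satisfy the proximity constraint, but only the orthogonal set has zero diversity penalty, which is the distinction the abstract draws between "hard to distinguish" and "diverse".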

Paper Structure

This paper contains 25 sections, 4 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Overall network architecture of our method. We present a toy example with three samples ($m=3$), three perturbators ($L=3$), and the number of augmentation perturbations, $K$, set to 1.
  • Figure 2: t-SNE visualization for the 'T-shirt' class of the FMNIST dataset. The visualization contains five sample types: perturbed counterparts of normal samples with $\tilde{y}_{\ell} = 1$ (Anomaly perturbed) and those with $\tilde{y}_{\ell} = 0$ (Normal perturbed), test anomaly samples (test anomaly), test normal samples (test normal), and train normal samples (train normal).
  • Figure 3: The average AUC for the CIFAR-10 dataset (both for each class and for overall average) with (a) varying $K$ with fixed $L$ and (b) varying $L$ with fixed $K$.
  • Figure 4: The visualization of images (i.e., FMNIST) before and after the application of feature-level perturbation generated by DHAG-variant and PLAD methods.
  • Figure 5: The visualization of test anomaly score distributions in the FMNIST dataset, where the 'sneaker' class is designated as the normal training data, with a comparative analysis between DHAG and DROCC. The figure also displays anomaly scores for some selected samples.
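
The captions for Figures 1 and 2 refer to $m$ samples, $L$ perturbators, $K$ augmentation perturbations, and pseudo-labels $\tilde{y}_{\ell} \in \{0, 1\}$. The sketch below shows how such labels could be laid out for the toy configuration of Figure 1 ($m=3$, $L=3$, $K=1$). The selection rule used here, labeling the $K$ smallest-norm perturbations of each sample as normal, is an assumption for illustration only; the captions do not state how the paper chooses them, and `pseudo_labels` is a hypothetical helper.

```python
import numpy as np

def pseudo_labels(perturbations, k):
    """Assign a pseudo-label to each perturbed counterpart.

    perturbations: array of shape (m, L, d), L perturbations per sample.
    Returns an (m, L) integer array: 0 = "normal perturbed" (augmentation),
    1 = "anomaly perturbed".
    ASSUMPTION: the k smallest-norm perturbations of each sample are
    treated as augmentations; the paper's actual rule may differ.
    """
    norms = np.linalg.norm(perturbations, axis=2)       # (m, L)
    order = np.argsort(norms, axis=1)                   # ascending by norm
    labels = np.ones_like(norms, dtype=int)             # default: anomaly
    np.put_along_axis(labels, order[:, :k], 0, axis=1)  # k smallest: normal
    return labels

# Toy configuration from Figure 1: m = 3 samples, L = 3 perturbators, K = 1.
rng = np.random.default_rng(0)
toy_perturbations = rng.normal(size=(3, 3, 8))  # latent dimension 8 is arbitrary
labels = pseudo_labels(toy_perturbations, k=1)
print(labels)
```

Each row of `labels` then contains exactly one 0 and two 1s, matching one augmentation and two synthetic anomalies per normal sample.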