AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection

Aditya Singh; Pavan Reddy

TL;DR

This work extends AnoGAN from image domains to tabular data by using CT-GAN to generate realistic tabular samples. A key contribution is reducing randomness at test time with a hard Gumbel-Softmax activation; latent vectors are then optimized by backpropagating an MSE loss, and the resulting MSE serves as the anomaly score, with thresholds selected via AUC-ROC analysis. The approach is demonstrated on the Google Smart Buildings dataset (60,425 samples, 3.2% anomalies), where it outperforms traditional baselines and remains robust as training is extended or anomaly frequency decreases. Overall, the paper shows that AnoGAN can be applied to structured data, and it offers a framework for anomaly detection in real-world tabular settings along with directions for future improvement.
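The scoring idea summarized above (optimize a latent vector so the generator's output matches a given row, then use the remaining MSE as the anomaly score) can be illustrated with a toy sketch. This is not the paper's implementation: a fixed linear map stands in for the trained CT-GAN generator, the gradient is written out by hand instead of using automatic differentiation, and the threshold is picked by maximizing TPR minus FPR (Youden's J), which is an assumed criterion, since the summary only says thresholds come from AUC-ROC analysis. All function names, `W`, `b`, and the sample rows are hypothetical.

```python
import random

def generator(z, W, b):
    """Toy linear stand-in for a trained generator: x_hat = W z + b."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def anomaly_score(x, W, b, steps=500, lr=0.1, seed=0):
    """Gradient-descend on z so generator(z) matches x; return the final MSE."""
    rng = random.Random(seed)  # fixed seed keeps the score deterministic
    z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    n = len(x)
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(generator(z, W, b), x)]
        # d(MSE)/dz_j = (2/n) * sum_i residual_i * W[i][j]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residual, W))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return mse(generator(z, W, b), x)

def pick_threshold(scores, labels):
    """Return the score threshold maximizing TPR - FPR (assumed criterion).

    Assumes labels contain at least one 0 (normal) and one 1 (anomaly).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy data: the "generator" can only produce rows of the form (z, 2z, -z).
W = [[1.0], [2.0], [-1.0]]
b = [0.0, 0.0, 0.0]
rows = [[1.0, 2.0, -1.0], [0.5, 1.0, -0.5], [1.0, 2.0, 1.0]]
labels = [0, 0, 1]  # the last row breaks the pattern

scores = [anomaly_score(x, W, b) for x in rows]
threshold, j = pick_threshold(scores, labels)
```

Rows the generator can reproduce end with a score near zero, while the third row cannot be matched by any latent value and keeps a large residual, so any threshold between the two groups separates them. With a real neural generator the same loop would typically be run through an autodiff framework rather than a hand-derived gradient.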

Abstract

Anomaly detection, a critical facet of data analysis, involves identifying patterns that deviate from expected behavior. This research addresses the complexities inherent in anomaly detection, including the need to adapt to increasingly sophisticated malicious activity. In applications spanning cybersecurity, healthcare, finance, and surveillance, anomalies often signal critical information or potential threats. Inspired by the success of the Anomaly Generative Adversarial Network (AnoGAN) in image domains, our research extends its principles to tabular data. Our contributions include adapting AnoGAN's principles to this new domain, with promising advances in detecting previously undetectable anomalies. This paper examines the multifaceted nature of anomaly detection, considering the dynamic evolution of normal behavior, context-dependent definitions of anomalies, and data-related challenges such as noise and imbalance.

Paper Structure (11 sections, 2 equations, 6 figures, 1 table)

Introduction
Challenges in Anomaly Detection
Overview of GANs
Related Work
Methodology
Dataset
Data preprocessing
CT-GAN Implementation and Randomness Handling
Optimizing Noise Vector and Anomaly Scoring
Results
Conclusion and Future work

Figures (6)

Figure 1: GANs, or Generative Adversarial Networks, are intricate deep neural network structures consisting of two networks, namely the generator and discriminator. These networks work in opposition to each other, which is why they are called "adversarial." The generator accepts random numbers as input and produces an image. This generated image is then presented to the discriminator, along with a continuous stream of images sourced from the genuine, ground-truth dataset. [article]
Figure 2: Framework for identifying anomalies. This framework involves two main stages of model training: generative adversarial training, which results in a trained generator and discriminator, and encoder training, which yields a trained encoder. Both of these training phases are executed using normal or "healthy" data. Subsequently, the framework is used for anomaly detection, where it is applied to both unseen healthy cases and anomalous data. [schlegl2017unsupervised]
Figure 3: Statistical values of the data collected from the Google campus for the Variable Air Volume devices. [sipple2020interpretable]
Figure 4: The figure provides a visual representation of the Kernel Density Estimation (KDE) for both the real and generated datasets. The KDE serves as a smoothed probability density function, offering insights into the underlying distribution of the data. Specifically, the KDE for the real data showcases the probability density across different values, illustrating the distribution's characteristics and patterns. On the other hand, the KDE for the generated data, produced by the Generator component of the Generative Adversarial Network (GAN), offers a comparative view. This comparison enables an assessment of how well the generator has learned to replicate the statistical properties of the real data. By visually inspecting the KDE curves, one can discern the fidelity of the generated data distribution in relation to the authentic dataset, providing a valuable tool for evaluating the performance and quality of the GAN's generative capabilities.
Figure 5: The figure illustrates the progression of Generator and Discriminator losses over training epochs in the Generative Adversarial Network (GAN), showcasing how the model refines generative and discriminative capabilities during learning.
...and 1 more figure
