AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection
Aditya Singh, Pavan Reddy
TL;DR
This work extends AnoGAN from image domains to tabular data by integrating CT-GAN for realistic tabular sample generation. A key contribution is mitigating randomness at test time through a hard Gumbel-Softmax activation and using backpropagated MSE loss to optimize latent vectors for an MSE-based anomaly score, with thresholds selected via AUC-ROC analysis. The approach is demonstrated on the Google Smart Buildings dataset (60,425 samples with 3.2% anomalies), where it outperforms traditional baselines and shows robustness as training extends or anomaly frequency decreases. Overall, the paper demonstrates the viability of applying AnoGAN to structured data, offering a principled framework for anomaly detection in real-world tabular settings and suggesting avenues for future enhancement.
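The scoring step summarized above (optimize a latent vector against an MSE loss, use the remaining MSE as the anomaly score, then choose a threshold from labeled scores) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy linear generator `generate` stands in for a trained CT-GAN generator, the gradient is written out analytically instead of being backpropagated, the hard Gumbel-Softmax step is omitted, and `pick_threshold` uses Youden's J statistic as one possible ROC-based rule, which may differ from the authors' procedure.

```python
def generate(W, b, z):
    """Toy linear generator G(z) = W z + b (hypothetical stand-in for CT-GAN)."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def anomaly_score(W, b, x, steps=500, lr=0.1):
    """Fit z by gradient descent on MSE(G(z), x); return the final MSE."""
    d, k = len(x), len(W[0])
    z = [0.0] * k  # fixed starting point, so the score is deterministic
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(W, b, z), x)]
        # For a linear generator the MSE gradient w.r.t. z is (2/d) * W^T r.
        grad = [2.0 / d * sum(W[i][j] * residual[i] for i in range(d))
                for j in range(k)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return mse(generate(W, b, z), x)


def pick_threshold(scores, labels):
    """Return the score cutoff maximizing TPR - FPR (Youden's J)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy data: rows the generator can reproduce exactly score near zero,
# while a row off the generator's range keeps a large residual.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
normal_x = [0.5, -0.3, 0.2]  # equals W applied to z = (0.5, -0.3)
odd_x = [1.0, 1.0, -2.0]     # not reachable by any z

print(anomaly_score(W, b, normal_x))
print(anomaly_score(W, b, odd_x))
```

With a real generator the latent update would come from automatic differentiation, and the threshold would be chosen on held-out labeled data instead of a handful of toy scores.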
Abstract
Anomaly detection, a critical facet of data analysis, involves identifying patterns that deviate from expected behavior. This research addresses the complexities inherent in anomaly detection, examining its challenges and the need to adapt to increasingly sophisticated malicious activity. With applications spanning cybersecurity, healthcare, finance, and surveillance, anomalies often signify critical information or potential threats. Inspired by the success of the Anomaly Generative Adversarial Network (AnoGAN) in image domains, our research extends its principles to tabular data. Our contributions include adapting AnoGAN's principles to this new domain, with promising advances in detecting anomalies that were previously undetectable. This paper also examines the multifaceted nature of anomaly detection, considering the dynamic evolution of normal behavior, context-dependent definitions of anomaly, and data-related challenges such as noise and class imbalance.
