Why the Counterintuitive Phenomenon of Likelihood Rarely Appears in Tabular Anomaly Detection with Deep Generative Models?
Donghwan Kim, Junghun Phee, Hyunsoo Yoon
TL;DR
This work investigates whether the counterintuitive likelihood phenomenon observed in image-domain anomaly detection, where deep generative models assign higher likelihoods to anomalous data than to normal data, extends to tabular data. By defining a domain-agnostic criterion and conducting large-scale experiments on 47 tabular datasets and 10 CV/NLP embedding datasets, it shows that the phenomenon is rare in tabular domains and that simple likelihood testing with normalizing flows (NF-SLT) often outperforms baselines. The authors provide theoretical and empirical analyses that link dimensionality and feature correlation to the robustness of likelihood-based detection, including an intrinsic-dimension perspective and entropy-based arguments. The findings support the practical viability of flow-based likelihood methods for tabular anomaly detection and highlight how strongly model reliability depends on data structure.
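Likelihood-based scoring with a normalizing flow rests on the change-of-variables formula. A bijection z = f(x) maps data onto a standard-normal base distribution, so log p(x) = log p_Z(f(x)) + log|det ∂f/∂x|, and test points with low log-likelihood are flagged as anomalous. The following is a minimal sketch that assumes NumPy and replaces a trained deep flow with a single affine (whitening) flow fitted in closed form. The function names `fit_affine_flow` and `log_likelihood` are hypothetical, and this is not the authors' NF-SLT implementation.

```python
import numpy as np


def fit_affine_flow(x_train):
    """Fit a single affine flow z = L^{-1}(x - mu) by maximum likelihood.

    Hypothetical stand-in for a trained deep normalizing flow: with a
    standard-normal base, the maximum-likelihood affine flow is given in
    closed form by the sample mean and the Cholesky factor L of the
    (biased, i.e. maximum-likelihood) sample covariance.
    """
    mu = x_train.mean(axis=0)
    d = x_train.shape[1]
    cov = np.cov(x_train, rowvar=False, bias=True).reshape(d, d)
    cov = cov + 1e-6 * np.eye(d)  # small jitter for numerical stability
    chol = np.linalg.cholesky(cov)  # cov = L @ L.T, L lower-triangular
    return mu, chol


def log_likelihood(x, mu, chol):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    d = x.shape[1]
    # z = L^{-1}(x - mu) for every row of x
    z = np.linalg.solve(chol, (x - mu).T).T
    log_base = -0.5 * np.sum(z**2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    # dz/dx = L^{-1}, so log|det dz/dx| = -sum(log diag(L))
    log_det = -np.sum(np.log(np.diag(chol)))
    return log_base + log_det


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_train = rng.normal(size=(500, 3))          # "normal" training data
    x_test = np.vstack([rng.normal(size=(5, 3)),  # in-distribution points
                        rng.normal(loc=6.0, size=(5, 3))])  # shifted points
    mu, chol = fit_affine_flow(x_train)
    scores = -log_likelihood(x_test, mu, chol)   # higher = more anomalous
    print(np.round(scores, 2))
```

A deep flow stacks many learned nonlinear bijections instead of one affine map, but the scoring step is the same: negate the log-likelihood and rank test points by it.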
Abstract
Deep generative models with tractable, analytically computable likelihoods, such as normalizing flows, offer an effective basis for anomaly detection through likelihood-based scoring. We show that, unlike in the image domain, where deep generative models frequently assign higher likelihoods to anomalous data, such counterintuitive behavior occurs far less often in tabular settings. We first introduce a domain-agnostic formulation that enables consistent detection and evaluation of the counterintuitive phenomenon, addressing the absence of a precise definition. Through extensive experiments on 47 tabular datasets and 10 CV/NLP embedding datasets in ADBench, benchmarked against 13 baseline models, we demonstrate that the phenomenon, as defined, is consistently rare in general tabular data. We further investigate this phenomenon from both theoretical and empirical perspectives, focusing on the roles of data dimensionality and differences in feature correlation. Our results suggest that likelihood-only detection with normalizing flows offers a practical and reliable approach for anomaly detection in tabular domains.
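The abstract does not state the domain-agnostic criterion itself, so this sketch uses a simpler, hypothetical proxy for "anomalies receive higher likelihoods." It scores each point by its negative log-likelihood and flags the counterintuitive case when the resulting AUROC falls below 0.5. The function names and the `threshold` parameter are illustrative assumptions, not the paper's definition. The AUROC is computed directly as the probability that a random anomaly gets a lower log-likelihood than a random normal point, with ties counted as one half.

```python
import numpy as np


def likelihood_auroc(ll_normal, ll_anomalous):
    """AUROC of the anomaly score s(x) = -log p(x).

    Equals P(log p(anomaly) < log p(normal)) over all normal/anomaly
    pairs, with ties counted as one half (Mann-Whitney formulation).
    """
    ll_normal = np.asarray(ll_normal, dtype=float)
    ll_anomalous = np.asarray(ll_anomalous, dtype=float)
    # Pairwise differences, shape (n_anomalous, n_normal); O(n*m) memory,
    # which is acceptable for a sketch but not for very large test sets.
    diff = ll_normal[None, :] - ll_anomalous[:, None]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def shows_counterintuitive_behavior(ll_normal, ll_anomalous, threshold=0.5):
    """Hypothetical proxy flag (assumption, not the authors' criterion):
    anomalies look *more* likely than normal data on this test set."""
    return likelihood_auroc(ll_normal, ll_anomalous) < threshold
```

For example, `likelihood_auroc([-3.0, -2.5, -4.0], [-10.0, -8.0])` ranks every anomaly below every normal point and yields 1.0, while swapping in anomalous log-likelihoods of `[-1.0, -0.5]` yields 0.0 and would be flagged. A threshold exactly at 0.5 only detects a reversal relative to chance; a stricter criterion would need a margin or a statistical test.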
