Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors
Oliver Hennhöfer, Christine Preisach
TL;DR
This work formalizes leave-one-out-, bootstrap-, and cross-conformal anomaly detectors by adapting conformal prediction to unsupervised one-class classifiers, enabling valid $p$-values under exchangeability and mitigating calibration-data limitations. It shows that resampling-conformal methods expand calibration information and can yield higher statistical power while controlling the batch-wise FDR at level $\alpha$ via the Benjamini-Hochberg procedure, even in low-data regimes. Through two large-scale experiments on ten benchmark datasets with three base detectors, the authors demonstrate that jackknife-, CV-, and bootstrap-based conformal anomaly detectors offer reliable FDR control and improved power relative to split-conformal, with calibration-set size effects tapering in high-data regimes. The methods are model-agnostic, integrate with common anomaly detectors, and are accompanied by public software for reproducibility, facilitating practical uncertainty quantification in real-world anomaly detection systems.
Abstract
The need for uncertainty quantification in anomaly detection systems has become increasingly important. In this context, effectively controlling Type I error rates ($\alpha$) without compromising the statistical power ($1-\beta$) of these systems can build trust and reduce costs related to false discoveries. Conformal anomaly detection is a promising approach for providing such statistical guarantees through model calibration. However, its dependency on calibration data poses practical limitations, especially in low-data regimes. In this work, we formally define and evaluate leave-one-out-, bootstrap-, and cross-conformal methods for anomaly detection, building on methods from the field of conformal prediction. Looking beyond classical inductive conformal anomaly detection, we demonstrate that the derived methods for calculating resampling-conformal $p$-values strike a practical compromise between statistical efficiency (full-conformal) and computational efficiency (split-conformal), as they make more efficient use of the available data. We validate the derived methods and quantify their improvements for a range of one-class classifiers and datasets.
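To make the idea of conformal $p$-values with batch-wise FDR control concrete, the following is a minimal sketch and not the authors' implementation or released software. It assumes that higher scores mean "more anomalous" and uses the standard conformal $p$-value $(1 + |\{i : s_i \ge s_{\text{test}}\}|)/(n+1)$. The toy detector `distance_score`, the function names, and the choice to aggregate test scores by the median over fold models are all illustrative assumptions.

```python
import numpy as np


def conformal_p_values(cal_scores, test_scores):
    """Conformal p-values, assuming higher score = more anomalous."""
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # Count calibration scores that are >= each test score.
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(p_values, alpha=0.1):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank passing its threshold
        reject[order[: k + 1]] = True
    return reject


def distance_score(train, x):
    """Toy anomaly score (hypothetical): distance to the training mean."""
    return np.linalg.norm(x - train.mean(axis=0), axis=1)


def cross_conformal_scores(train, test, k=5, seed=0):
    """Pool out-of-fold calibration scores from k fold models.

    Assumption: test scores are aggregated by the median over fold models.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(train)), k)
    cal, per_model_test = [], []
    for fold in folds:
        mask = np.ones(len(train), dtype=bool)
        mask[fold] = False
        # "Fit" on the other folds, then score the held-out fold and the test batch.
        cal.append(distance_score(train[mask], train[fold]))
        per_model_test.append(distance_score(train[mask], test))
    return np.concatenate(cal), np.median(per_model_test, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train = rng.normal(size=(100, 2))              # assumed-normal training data
    test = np.array([[0.0, 0.0], [10.0, 10.0]])    # one inlier, one clear outlier
    cal, test_scores = cross_conformal_scores(train, test)
    p = conformal_p_values(cal, test_scores)
    print(p, benjamini_hochberg(p, alpha=0.1))
```

In this sketch every training point contributes one out-of-fold calibration score, whereas a split-conformal detector would have to withhold a dedicated calibration subset from training. That is the sense in which resampling-based variants use the available data more efficiently.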
