A Radon-Nikodým Perspective on Anomaly Detection: Theory and Implications
Shlok Mehendale, Aditya Challa, Rahul Yedida, Sravan Danda, Santonu Sarkar, Snehanshu Saha
TL;DR
This work introduces RN-Loss, a principled loss design for anomaly detection that multiplies a base loss by a Radon–Nikodým derivative to rectify distributional shifts between training and evaluation. Grounded in PAC learnability, the approach yields supervised instantiations as weighted losses and unsupervised instantiations via CBLOF-like corrections, with a unified theoretical framework linking both paradigms. Empirically, RN-Loss-based RN-Net and RN-LSTM outperform state-of-the-art methods across 96 diverse datasets, notably in multivariate and time-series domains, while RN-corrected unsupervised variants maintain competitive performance. The methodology offers a flexible, efficient, and broadly applicable tool for anomaly detection with potential implications for imbalanced data and regularization perspectives through the derivative.
Abstract
Which principle underpins the design of an effective anomaly detection loss function? The answer lies in the Radon-Nikodým theorem, a fundamental result in measure theory. The key insight of this article is that multiplying the vanilla loss function by the Radon-Nikodým derivative improves performance across the board. We refer to this as RN-Loss. We prove this in the setting of PAC (Probably Approximately Correct) learnability. Depending on the context, the Radon-Nikodým derivative takes different forms. In the simplest case of supervised anomaly detection, it takes the form of a simple weighted loss. In the case of unsupervised anomaly detection (with distributional assumptions), it takes the form of the popular cluster-based local outlier factor (CBLOF). We evaluate our algorithm on 96 datasets, covering univariate and multivariate data from diverse domains such as healthcare, cybersecurity, and finance. We show that RN-Derivative algorithms outperform state-of-the-art methods on 68% of multivariate datasets (based on F1 scores) and achieve peak F1 scores on 72% of univariate time-series datasets.
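To make the supervised case concrete, the following is a minimal sketch of a loss multiplied by a Radon-Nikodým derivative. It assumes that the shift between training and evaluation is confined to the class priors, so the derivative reduces to the ratio of an assumed evaluation prior to the empirical training prior. The names `rn_weights`, `rn_bce`, and `target_prior` are hypothetical and chosen for illustration; the abstract does not specify the weights used in RN-Net or RN-LSTM, and binary cross-entropy stands in for the unspecified vanilla loss.

```python
import numpy as np

def rn_weights(y, target_prior):
    """Per-sample weight w(y) = target_prior[y] / train_prior[y].

    Assumption: only the class priors differ between training and
    evaluation, and every class appears at least once in y.
    """
    y = np.asarray(y, dtype=int)
    # Empirical class frequencies in the training labels.
    train_prior = np.bincount(y, minlength=len(target_prior)) / len(y)
    ratio = np.asarray(target_prior, dtype=float) / train_prior
    return ratio[y]

def rn_bce(y, p, target_prior, eps=1e-12):
    """Binary cross-entropy with each term multiplied by rn_weights."""
    y = np.asarray(y, dtype=float)
    # Clip predicted probabilities to keep the logarithms finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    base = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(np.mean(rn_weights(y, target_prior) * base))

# Three normal points and one anomaly, reweighted toward balanced priors.
labels = [0, 0, 0, 1]
weights = rn_weights(labels, target_prior=[0.5, 0.5])
loss = rn_bce(labels, [0.1, 0.2, 0.1, 0.7], target_prior=[0.5, 0.5])
```

In this toy example the anomaly receives weight 2 and each normal point receives weight 2/3, so the weights average to one and the rare class contributes as much to the loss as the majority class.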
