
A Radon-Nikodým Perspective on Anomaly Detection: Theory and Implications

Shlok Mehendale, Aditya Challa, Rahul Yedida, Sravan Danda, Santonu Sarkar, Snehanshu Saha

TL;DR

This work introduces RN-Loss, a principled loss design for anomaly detection that multiplies a base loss by a Radon–Nikodým derivative to rectify distributional shifts between training and evaluation. Grounded in PAC learnability, the approach yields supervised instantiations as weighted losses and unsupervised instantiations via CBLOF-like corrections, with a unified theoretical framework linking both paradigms. Empirically, RN-Loss-based RN-Net and RN-LSTM outperform state-of-the-art methods across 96 diverse datasets, notably in multivariate and time-series domains, while RN-corrected unsupervised variants maintain competitive performance. The methodology offers a flexible, efficient, and broadly applicable tool for anomaly detection with potential implications for imbalanced data and regularization perspectives through the derivative.
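The TL;DR describes the supervised instantiation of RN-Loss only as "a weighted loss". Here is a minimal NumPy sketch of what such a weighting can look like. It assumes that the training measure $\mu$ and the evaluation measure $\nu$ differ only in the anomaly prior, so that $d\nu/d\mu$ depends on the label alone, and that the evaluation prior is a user-chosen `target_anomaly_rate`. The function names, that parameter, and the choice of binary cross-entropy as the base loss are all illustrative assumptions, not the authors' RN-Net code.

```python
import numpy as np


def rn_weights(labels, target_anomaly_rate=0.5):
    """Hypothetical helper: per-sample weights f(y) = nu(Y=y) / mu(Y=y).

    Assumes training (mu) and evaluation (nu) distributions differ only in
    the anomaly prior; mu's prior is estimated from the training labels.
    """
    labels = np.asarray(labels, dtype=float)
    train_rate = labels.mean()  # empirical anomaly fraction under mu
    if train_rate <= 0.0 or train_rate >= 1.0:
        raise ValueError("training labels must contain both classes")
    w_anomaly = target_anomaly_rate / train_rate
    w_normal = (1.0 - target_anomaly_rate) / (1.0 - train_rate)
    return np.where(labels == 1.0, w_anomaly, w_normal)


def rn_bce_loss(probs, labels, target_anomaly_rate=0.5, eps=1e-12):
    """Hypothetical RN-weighted binary cross-entropy: mean of f * vanilla loss."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    # Vanilla per-sample binary cross-entropy (the base loss).
    base = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    # Multiply each sample's loss by the density ratio before averaging.
    return float(np.mean(rn_weights(labels, target_anomaly_rate) * base))
```

When `target_anomaly_rate` equals the training anomaly fraction, every weight is 1 and the loss reduces to vanilla BCE. In general the weights average to exactly 1 over the training labels, which is what a density ratio should do.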

Abstract

Which principle underpins the design of an effective anomaly detection loss function? The answer lies in the Radon-Nikodým theorem, a fundamental result in measure theory. The key insight of this article is that multiplying the vanilla loss function by the Radon-Nikodým derivative improves performance across the board. We refer to this as RN-Loss. We prove this in the setting of PAC (Probably Approximately Correct) learnability. Depending on the context, the Radon-Nikodým derivative takes different forms. In the simplest case, supervised anomaly detection, it takes the form of a simple weighted loss. In unsupervised anomaly detection (under distributional assumptions), it takes the form of the popular cluster-based local outlier factor. We evaluate our algorithm on 96 datasets, including univariate and multivariate data from diverse domains such as healthcare, cybersecurity, and finance. We show that RN-Derivative algorithms outperform state-of-the-art methods on 68% of multivariate datasets (based on F1 scores) and achieve peak F1 scores on 72% of time-series (univariate) datasets.
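The abstract states that, in the unsupervised setting and under distributional assumptions, the Radon-Nikodým derivative takes the form of the cluster-based local outlier factor (CBLOF). As a reference point, here is a minimal NumPy sketch of the classic CBLOF score. k-means clusters are split into "large" and "small" by the α/β boundary rule. Each point is then scored by its distance to its own centre (large clusters) or to the nearest large-cluster centre (small clusters), optionally multiplied by its cluster's size. This is not the paper's KMeans-CBLOF(Mod.) variant. The function names, the hand-written Lloyd k-means, and the defaults `k=3`, `alpha=0.9`, `beta=5.0` are assumptions for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative; any k-means implementation works)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        assign = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # leave centres of empty clusters unchanged
                centers[j] = X[assign == j].mean(axis=0)
    assign = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return centers, assign


def cblof_scores(X, k=3, alpha=0.9, beta=5.0, weight_by_size=True, seed=0):
    """Hypothetical CBLOF sketch: a larger score means a more outlying point."""
    X = np.asarray(X, dtype=float)
    centers, assign = kmeans(X, k, seed=seed)
    sizes = np.bincount(assign, minlength=k)
    order = np.argsort(sizes)[::-1]  # cluster ids, largest first
    cum = np.cumsum(sizes[order])
    b = len(order) - 1  # default: every cluster counts as "large"
    for i in range(len(order) - 1):
        # Boundary rule: clusters up to i cover alpha of the data,
        # or cluster i is at least beta times larger than cluster i + 1.
        if cum[i] >= alpha * len(X) or sizes[order[i]] >= beta * sizes[order[i + 1]]:
            b = i
            break
    large = set(order[: b + 1].tolist())
    large_centers = centers[sorted(large)]
    scores = np.empty(len(X))
    for n, (x, c) in enumerate(zip(X, assign)):
        if int(c) in large:
            dist = np.linalg.norm(x - centers[c])  # distance to own centre
        else:
            dist = np.linalg.norm(large_centers - x, axis=1).min()  # nearest large centre
        scores[n] = (sizes[c] if weight_by_size else 1.0) * dist
    return scores
```

`weight_by_size=True` follows the original CBLOF definition. `False` drops the size factor, so points in tiny, remote clusters are not down-weighted by their cluster's small size.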

Paper Structure

This paper contains 40 sections, 3 theorems, 17 equations, 5 figures, 4 tables.

Key Result

theorem 1

Let $(\Omega,\mathcal{A})$ be a measurable space with $\sigma$-algebra $\mathcal{A}$, and let $\mu, \nu$ be two $\sigma$-finite measures such that $\nu \ll \mu$ ($\nu$ is absolutely continuous with respect to $\mu$). Then there exists a measurable function $f:\Omega \to [0,\infty)$ such that $\nu(A) = \int_A f \, d\mu$ for every $A \in \mathcal{A}$. This $f$, unique up to $\mu$-null sets, is the Radon–Nikodým derivative $d\nu/d\mu$.
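The mechanism behind RN-Loss follows from the theorem in one step. Here is a sketch in our own notation; the symbols $h$, $\ell$, $X$, $Y$ are assumptions, not necessarily the paper's. Let training data be drawn from $\mu$ and evaluation data from $\nu \ll \mu$. Let $f = d\nu/d\mu$ be the density with $\nu(A) = \int_A f\,d\mu$, and let $\ell \ge 0$ be any measurable loss of a predictor $h$:

```latex
\mathbb{E}_{\nu}\big[\ell(h(X),Y)\big]
  = \int_{\Omega} \ell \, d\nu
  = \int_{\Omega} \ell \, f \, d\mu
  = \mathbb{E}_{\mu}\big[f \cdot \ell(h(X),Y)\big]
```

The middle equality extends $\nu(A) = \int_A f\,d\mu$ from indicator functions to nonnegative measurable $\ell$ via simple functions and monotone convergence. Averaging $f \cdot \ell$ over training samples therefore gives an unbiased estimate of the evaluation risk. For instance, suppose $\mu$ and $\nu$ share the class-conditional distributions and differ only in the anomaly prior. Then $f(x,y) = \nu(Y=y)/\mu(Y=y)$, which is a per-class weighted loss.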

Figures (5)

  • Figure 1: Comparative Analysis of Anomaly Detection Algorithms. Performance evaluation prioritizes recall, with precision as the secondary metric for tied cases. The comparison spans three algorithm categories: (1) Deep Learning approaches (AutoEncoders, DAGMM, DevNet, GAN, DeepSAD, FTTransformer (current state-of-the-art), and PReNet), (2) Unsupervised methods (LOF, Elliptic Envelope, Isolation Forest, dBTAI, MGBTAI, and quantile-based approaches including q-LSTM variants and QReg), and (3) RN-Derivative algorithms (RN-Net and RN-LSTM). RN-Net outperforms state-of-the-art methods on 68% of multivariate datasets (based on F1 scores), while the RN-LSTM + RN-Net combination achieves peak F1 scores on 72% of time-series (univariate) datasets. For detailed numerical comparisons
  • Figure 2: Overview of Datasets. The datasets considered span a wide range of total sizes and anomaly percentages, and they are also diverse in other characteristics such as univariate vs. multivariate and time-series vs. non-time-series. (a) shows the number of datasets with different numbers of samples; sizes vary from $80$ to $619,329$. (b) shows the number of datasets with respect to the percentage of anomalies, grouped into 4 ranges corresponding to "very less", "less", "medium" and "large" numbers of anomalies. (c) shows the number of datasets that have a time-series characteristic.
  • Figure 3: Performance of RN-Loss for supervised anomaly detection. The three legend markers indicate, respectively, the percentage of datasets on which the specific algorithm performs better than, ties with, or performs worse than RN-Net. (a) shows the performance of RN-Loss with respect to AUROC. (b) shows the performance of RN-Loss with respect to F1 score. (d)-(f) show the performance of RN-Loss across various ranges of anomaly percentages.
  • Figure 4: Performance of RN-Net on Multivariate and Time-Series Datasets. (a) illustrates the results on multivariate datasets. (b) shows the results on time-series datasets.
  • Figure 5: Performance of Radon–Nikodým derivative correction of unsupervised algorithms with respect to AUROC. Observe that the unsupervised algorithms corrected by the Radon–Nikodým derivative (dbTAI(Mod.), KMeans-CBLOF and KMeans-CBLOF(Mod.)) perform better than recent state-of-the-art algorithms such as ICL, NeuTraL and SLAD. The three legend markers indicate, respectively, the percentage of datasets where the row algorithm outperforms the column algorithm, where both algorithms perform equally, and where the row algorithm underperforms relative to the column algorithm.

Theorems & Definitions (3)

  • theorem 1: Radon–Nikodým
  • proposition 1
  • theorem 2