Constraining Anomaly Detection with Anomaly-Free Regions

Maximilian Toller; Hussain Hussain; Roman Kern; Bernhard C. Geiger

TL;DR

The empirical results confirm that anomaly detection constrained via AFRs improves upon unconstrained anomaly detection and show that, when equipped with an estimated AFR, an efficient algorithm based on random guessing becomes a strong baseline that several widely used methods struggle to overcome.

Abstract

We propose the novel concept of anomaly-free regions (AFR) to improve anomaly detection. An AFR is a region in the data space for which it is known that there are no anomalies inside it, e.g., via domain knowledge. This region can contain any number of normal data points and can be anywhere in the data space. AFRs have the key advantage that they constrain the estimation of the distribution of non-anomalies: The estimated probability mass inside the AFR must be consistent with the number of normal data points inside the AFR. Based on this insight, we provide a solid theoretical foundation and a reference implementation of anomaly detection using AFRs. Our empirical results confirm that anomaly detection constrained via AFRs improves upon unconstrained anomaly detection. Specifically, we show that, when equipped with an estimated AFR, an efficient algorithm based on random guessing becomes a strong baseline that several widely-used methods struggle to overcome. On a dataset with a ground-truth AFR available, the current state of the art is outperformed.
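The consistency requirement stated in the abstract can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian model for the normal data and an interval AFR $[a, b]$; the function names (`afr_mass`, `implied_normal_fraction`) are hypothetical and are not taken from the paper's reference implementation. The sketch uses a simple moment-matching argument rather than the paper's constrained maximum likelihood estimator: every point inside the AFR is known to be normal, so the expected number of normal points in the AFR (normal fraction times sample size times Gaussian mass in the AFR) should match the observed count.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def afr_mass(a, b, mu, sigma):
    """Probability mass that the Gaussian assigns to the interval [a, b]."""
    return gaussian_cdf(b, mu, sigma) - gaussian_cdf(a, mu, sigma)


def implied_normal_fraction(data, a, b, mu, sigma):
    """Fraction of normal points implied by the AFR count (moment matching).

    Assumption of this sketch: n_in ~= p * n * mass, hence p ~= n_in / (n * mass).
    The result is clipped to 1 because a fraction cannot exceed 1.
    """
    n = len(data)
    n_in = sum(1 for x in data if a <= x <= b)  # all of these are normal
    mass = afr_mass(a, b, mu, sigma)
    if n == 0 or mass <= 0.0:
        raise ValueError("need non-empty data and positive AFR mass")
    return min(1.0, n_in / (n * mass))


# Toy values chosen for illustration only (not from the paper).
toy_data = [-0.9, -0.5, -0.2, 0.0, 0.3, 0.8, 1.6, -1.7, 4.0, 5.5]
p_hat = implied_normal_fraction(toy_data, a=-1.0, b=1.0, mu=0.0, sigma=1.0)
print(p_hat)
```

A density estimate whose mass inside the AFR is far too small would push this implied fraction above 1 before clipping, which is one way to see that such an estimate is inconsistent with the known-normal points.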

Paper Structure (54 sections, 5 theorems, 37 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Concept.
Novelty.
Illustration.
Mathematical foundation.
Real-world usefulness.
Contributions.
Related Work
Theory
Constrained Likelihood
Maximum Likelihood Estimators for $p$ and $\bm{\theta}$
Karush-Kuhn-Tucker conditions
MLE of $p$
MLE of $\bm{\theta}$ under the Gaussian assumption
Implementation
...and 39 more sections

Key Result

Theorem 1

For $\lambda_1=0$ and $\lambda_2\neq 0$, the equation system Eq. \ref{eq:KKT} has the following solution for $p$ [...], where $\Omega<n-\sum_{t=1}^{n}\hat{B}_t$ is the density surplus gradient, which is given by [...], and where the derivatives on the right-hand side (either of them) are evaluated at an MLE of $\theta$.

Figures (5)

Figure 1: Constraining density estimation with an AFR. For a fixed toy dataset and a fixed AFR, three different density estimates for normal data points are compared. The left and middle density estimates are inconsistent and assign too low a likelihood to the (known normal) data points inside the AFR. The right density estimate makes use of the available knowledge (the AFR) and gives a reasonable likelihood for the data points inside the AFR.
Figure 2: Illustration of the estimation procedure for $\mu$. Since Eq. \ref{eq:mu} is quasi-concave, it is easy to find the constrained MLE, which is located at the function's root that is closer to the unconstrained MLE. The center of the AFR is at $\frac{a+b}{2}$, and there we have $\frac{\partial L}{\partial \mu}=0$, so Eq. \ref{eq:mu} is undefined at $\frac{a+b}{2}$.
Figure 3: Results of the sensitivity analysis. CAMLE was computed with 1100 different AFRs per dimension of the Annthyroid dataset. CAMLE retains the highest scores among all compared detectors for AFR lengths $0.108\le\Delta<0.892$.
Figure 4: Sensitivity analysis for all other evaluation datasets. The dashed blue line indicates CAMLE's performance on each dataset using the default parametrization as reported in Table \ref{tab:benchmar_results}. These results indicate high stability of the method with respect to the estimated AFR. Please note the small variance of the results.
Figure 5: Visualization of the Office dataset. An anonymous worker's daily work time deviation fluctuates naturally due to flexible work time. Some deviations are anomalous, e.g., because the worker forgot to log out when leaving the company. The region $[-29;29]$ contains no anomalies and is a valid AFR derived from domain knowledge.
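
To make the role of a domain-knowledge AFR such as $[-29;29]$ concrete, the following minimal sketch shows the simplest possible use of it: points inside the AFR are never reported as anomalies, whatever an unconstrained detector says. This is a deliberate simplification and not the paper's method, which uses the AFR to constrain the density estimate itself. The function name `constrain_scores`, the toy deviation values, and the absolute-value scoring are all assumptions made for illustration.

```python
def constrain_scores(values, scores, afr_low, afr_high):
    """Return anomaly scores with points inside [afr_low, afr_high] set to 0.

    values: observed data points (e.g., daily work time deviations)
    scores: anomaly scores from any unconstrained detector, same length
    """
    if len(values) != len(scores):
        raise ValueError("values and scores must have the same length")
    return [
        0.0 if afr_low <= v <= afr_high else float(s)
        for v, s in zip(values, scores)
    ]


# Made-up deviations for illustration; not taken from the Office dataset.
deviations = [-5, 12, 240, -31, 29]
raw_scores = [abs(v) for v in deviations]  # toy detector: score = |deviation|
constrained = constrain_scores(deviations, raw_scores, -29, 29)
print(constrained)
```

Only the points outside the AFR keep a nonzero score, so a threshold on the constrained scores can never flag a point that domain knowledge guarantees to be normal.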

Theorems & Definitions (9)

Definition 1: Anomaly-free Region
Theorem 1
Theorem 2
Theorem 1
proof
Theorem 2
proof
Proposition 1
proof
