When lookout sees crackle: Anomaly detection via kernel density estimation

Rob J Hyndman, Sevvandi Kandanaarachchi, Katharine Turner

Abstract

We present an updated version of lookout -- an algorithm for detecting anomalies using kernel density estimates with bandwidth based on Rips death diameters -- with theoretical guarantees. The kernel density estimator for updated lookout is shown to be consistent, and the proposed multivariate scaling is robust and efficient. We show our updated algorithm performs better than the previous version on diverse examples.

Paper Structure (30 sections, 11 theorems, 46 equations, 10 figures, 2 algorithms)

Key Result

Theorem 3

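Let $X_1, \dots, X_n$ be independent and identically distributed random variables with $M_n = \max\{X_1, \dots, X_n\}$. If there exist sequences of constants $\{a_n>0\}$ and $\{b_n\}$ such that $P\{(M_n - b_n)/a_n \leq z\} \to H(z)$ as $n \to \infty$ for a non-degenerate distribution function $H$, then $H$ is a member of the Generalized Extreme Value (GEV) family $H(z) = \exp\left\{-\left[1 + \xi (z - \mu)/\sigma\right]^{-1/\xi}\right\}$, where the domain of the function is $\{z: 1 + \xi (z - \mu)/\sigma >0 \}$ and the parameters $\mu, \xi \in \mathbb{R}$ and $\sigma > 0$.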
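
The Fisher-Tippett-Gnedenko theorem concerns the limiting distribution of suitably normalized sample maxima. The sketch below is an illustration only and is not taken from the paper. It uses the textbook case of standard exponential variables, for which $a_n = 1$ and $b_n = \log n$ give the Gumbel limit (the GEV member with $\xi = 0$, $\mu = 0$, $\sigma = 1$). The function names `gev_cdf` and `normalized_exponential_maxima`, the sample sizes, and the seed are all assumptions chosen for the example.

```python
import numpy as np


def gev_cdf(z, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function H(z); xi = 0 is treated as the Gumbel limit."""
    s = (np.asarray(z, dtype=float) - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-s))
    t = 1.0 + xi * s
    # Outside the support {z : 1 + xi*(z - mu)/sigma > 0} the CDF equals
    # 0 when xi > 0 (below the lower endpoint) and 1 when xi < 0.
    outside = 0.0 if xi > 0 else 1.0
    safe_t = np.where(t > 0, t, 1.0)  # avoid invalid powers off the support
    return np.where(t > 0, np.exp(-safe_t ** (-1.0 / xi)), outside)


def normalized_exponential_maxima(n, reps, seed=0):
    """Simulate (M_n - b_n) / a_n for Exp(1) samples with a_n = 1, b_n = log n."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(size=(reps, n))
    return samples.max(axis=1) - np.log(n)


# Compare the empirical distribution of normalized maxima with the Gumbel CDF.
maxima = normalized_exponential_maxima(n=500, reps=5000)
grid = np.array([-1.0, 0.0, 1.0, 2.0])
empirical = np.array([(maxima <= z).mean() for z in grid])
max_gap = np.abs(empirical - gev_cdf(grid)).max()
print("largest gap between empirical and Gumbel CDF:", max_gap)
```

With these settings the gap should be small and should shrink further as `n` and `reps` grow. Other parent distributions need different normalizing sequences and may lead to $\xi \neq 0$.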

Figures (10)

  • Figure 1: Experiment 1 data and results. Left: data for two iterations of the experiment with $r = 0.2$ and $r = 0.9$. Anomalies with lower values of $r$ are easier to identify. Right: the relative performance of the two algorithms. The points in the right panel are slightly jittered horizontally to avoid overlapping points.
  • Figure 2: Experiment 2 data and results. Left: data for two iterations of the experiment with $\mu = 2.5$ and $\mu = 3.75$. Anomalies with higher values of $\mu$ are easier to identify. Right: the relative performance of the two algorithms. The points in the right panel are slightly jittered horizontally to avoid overlapping points.
  • Figure 3: Experiment 3 data and results using Normal distributions. Left: data for $n=10000$. Right: the relative performance of the two algorithms. The points in the right panel are slightly jittered horizontally to avoid overlapping points.
  • Figure 4: Experiment 4 data and results using Gamma distributions. Left: data for $n=10000$. Right: the relative performance of the two algorithms. The points in the right panel are slightly jittered horizontally to avoid overlapping points.
  • Figure 5: Comparing new lookout with the older version on some showcase examples
  • ...and 5 more figures

Theorems & Definitions (23)

  • Definition 1: Anomalies
  • Definition 2: Surprisals
  • Theorem 3: Fisher-Tippett-Gnedenko
  • Theorem 4: Pickands
  • Theorem 5: Consistency
  • Definition 6: Admissible bandwidth
  • Theorem 7
  • proof
  • Lemma 8
  • proof
  • ...and 13 more