Anomaly detection using surprisals

Rob J Hyndman, David T. Frazier

TL;DR

This paper proposes a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. It gives conditions under which tail ordering is preserved and derives finite-sample confidence guarantees via the Dvoretzky–Kiefer–Wolfowitz inequality.

Abstract

Anomaly detection methods are widely used but often rely on ad hoc rules or strong assumptions, and they often focus on tail events, missing "inlier" anomalies that occur in low-density gaps between modes. We propose a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. For each observation we compute its surprisal (the negative log generalized density) and define an anomaly score as the probability of a surprisal at least as large as that observed. This reduces anomaly detection for complex univariate or multivariate data to estimating the upper tail of a univariate surprisal distribution. We develop two model-robust estimators of these tail probabilities: an empirical estimator based on the observed surprisal distribution and an extreme-value estimator that fits a Generalized Pareto Distribution (GPD) above a high threshold. For the empirical method we give conditions under which tail ordering is preserved and derive finite-sample confidence guarantees via the Dvoretzky–Kiefer–Wolfowitz inequality. For the GPD method we establish broad tail conditions ensuring classical extreme-value behavior. Simulations and applications to French mortality and Test-cricket data show the approach remains effective under substantial model misspecification.
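
As a rough illustration of the surprisal score described in the abstract, the sketch below computes surprisals under an assumed two-component normal mixture and scores each observation by the proportion of observed surprisals at least as large as its own. The mixture model, the function names (`mixture_surprisal`, `empirical_scores`) and the simulated data are assumptions made for this example, not taken from the paper.

```python
import bisect
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_surprisal(y):
    """Surprisal (negative log density) under an ASSUMED model:
    an equal mixture of N(-4, 1) and N(4, 1)."""
    density = 0.5 * normal_pdf(y, -4.0, 1.0) + 0.5 * normal_pdf(y, 4.0, 1.0)
    return -math.log(density)


def empirical_scores(surprisals):
    """For each surprisal s_i, the proportion of observed surprisals >= s_i.
    Small scores flag candidate anomalies."""
    n = len(surprisals)
    ordered = sorted(surprisals)
    # bisect_left counts the values strictly smaller than s
    return [(n - bisect.bisect_left(ordered, s)) / n for s in surprisals]


# Simulated data (hypothetical): 500 draws from the mixture plus one
# "inlier" placed in the low-density gap between the two modes.
random.seed(1)
data = [random.gauss(random.choice([-4.0, 4.0]), 1.0) for _ in range(500)]
data.append(0.0)

scores = empirical_scores([mixture_surprisal(y) for y in data])
print("score of the inlier at 0.0:", scores[-1])
```

The point at 0.0 is not extreme in the data space, yet its surprisal lies in the upper tail, so a rule that only inspects the tails of the observations would miss it while the surprisal score does not.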

Paper Structure (16 sections, 6 theorems, 42 equations, 5 figures)

Key Result

Lemma 2.1

Let $S_1,\dots, S_n$ be iid with distribution function $G$. Then, for any $\alpha \in(0,1)$, and $\epsilon = \sqrt{\log(2/\alpha)/(2n)}$, we have for all $s \ge s_\star$, $\widehat{\mathbb{P}}_n(s)-\epsilon \leq G(s) \leq \widehat{\mathbb{P}}_n(s)+\epsilon$ with probability at least $1-\alpha$ if and o
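
To give a feel for the size of the band in Lemma 2.1, the sketch below evaluates $\epsilon = \sqrt{\log(2/\alpha)/(2n)}$ and the resulting interval around an empirical estimate. The function names are hypothetical, clipping the interval to $[0,1]$ is an added convenience, and the condition that is cut off in the statement above is not modelled.

```python
import math


def dkw_epsilon(n, alpha):
    """Half-width of the Dvoretzky-Kiefer-Wolfowitz band:
    sqrt(log(2 / alpha) / (2 n))."""
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and alpha in (0, 1)")
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def dkw_band(p_hat, n, alpha):
    """Interval [p_hat - eps, p_hat + eps], clipped to [0, 1].
    The clipping is an assumption of this sketch, not part of the lemma."""
    eps = dkw_epsilon(n, alpha)
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


n, alpha = 1000, 0.05
print("epsilon:", dkw_epsilon(n, alpha))
print("band around an estimate of 0.01:", dkw_band(0.01, n, alpha))
```

With $n = 1000$ and $\alpha = 0.05$ the half-width is roughly 0.043, which is wide relative to a tail probability near 0.01. Because $\epsilon$ shrinks at rate $1/\sqrt{n}$, quadrupling the sample size halves the band.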

Figures (5)

  • Figure 1: Left: Observations are computed from a $N(0,1)$ distribution. Right: Observations are computed from a $t(4)$ distribution. Distribution indicates which distribution was used to compute the surprisal values. Estimate shows how the tail probabilities were computed. This demonstrates that the surprisal probabilities computed using the empirical distribution or under the GPD are still relatively accurate, even when the wrong distribution is used to compute the surprisal values.
  • Figure 2: Estimated false anomaly rate under different approximations when $\alpha = 0.01$.
  • Figure 3: French mortality rates by sex and age from 1816 to 1999. We use a log scale because the rates are vastly different for different age groups.
  • Figure 4: Anomalies identified in the French mortality data by year and age. Wars and epidemics in French history are revealed.
  • Figure 5: Proportion of not outs for each batter as a function of the number of innings they played. The blue line and associated 95% confidence interval shows the probability of a batter not being dismissed as a function of the number of innings they have played.
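
Figures 1 and 2 compare tail probabilities computed from the empirical distribution with those computed under a GPD. The sketch below shows one way a GPD-based tail estimate can be formed from surprisals. It fits the GPD to threshold excesses by the method of moments, chosen here only because it needs no optimizer; the source does not say how the paper fits the GPD. The 0.90 threshold quantile, the function names and the simulated surprisals are likewise assumptions.

```python
import math
import random


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit, returning (xi, sigma).
    Uses mean = sigma / (1 - xi) and var = mean^2 / (1 - 2 xi),
    so it is only meaningful when xi < 1/2 and the excesses vary."""
    m = len(excesses)
    mean = sum(excesses) / m
    var = sum((e - mean) ** 2 for e in excesses) / (m - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)


def gpd_survival(x, xi, sigma):
    """P(X > x) for a GPD-distributed excess x >= 0."""
    if abs(xi) < 1e-9:
        return math.exp(-x / sigma)  # exponential limit as xi -> 0
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        return 0.0  # beyond the finite upper endpoint when xi < 0
    return base ** (-1.0 / xi)


def gpd_tail_prob(surprisals, s, threshold_quantile=0.90):
    """Estimate P(S >= s) as P(S > u) * P(S - u > s - u | S > u),
    where u is an empirical quantile of the surprisals (assumed 0.90)."""
    ordered = sorted(surprisals)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]
    if s <= u:
        # Below the threshold, fall back to the empirical proportion.
        return sum(v >= s for v in ordered) / n
    excesses = [v - u for v in ordered if v > u]
    xi, sigma = fit_gpd_moments(excesses)
    return (len(excesses) / n) * gpd_survival(s - u, xi, sigma)


# Simulated surprisals (hypothetical): exponential, so the true upper
# tail is P(S >= s) = exp(-s) and the estimate can be checked against it.
random.seed(42)
surprisals = [random.expovariate(1.0) for _ in range(5000)]
print("GPD estimate:", gpd_tail_prob(surprisals, 6.0))
print("true value:  ", math.exp(-6.0))
```

Unlike the empirical proportion, which cannot fall below $1/n$, the fitted tail extrapolates beyond the largest observed surprisal.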

Theorems & Definitions (12)

  • Lemma 2.1
  • Theorem 3.1
  • proof
  • Lemma B.1
  • proof
  • Corollary B.1
  • proof
  • Lemma B.2
  • proof
  • Corollary B.2
  • ...and 2 more