Oddballness: universal anomaly detection with language models

Filip Graliński, Ryszard Staruch, Krzysztof Jurkiewicz

TL;DR

In grammatical error detection tasks (a specific case of text anomaly detection), oddballness performs better than simply flagging low-likelihood events, assuming a totally unsupervised setup.

Abstract

We present a new method for detecting anomalies in texts (and, more generally, in sequences of any data) using language models in a totally unsupervised manner. The method uses the probabilities (likelihoods) generated by a language model, but instead of focusing on low-likelihood tokens, it relies on a new metric introduced in this paper: oddballness. Oddballness measures how "strange" a given token is according to the language model. We demonstrate in grammatical error detection tasks (a specific case of text anomaly detection) that oddballness is better than just considering low-likelihood events, if a totally unsupervised setup is assumed.

Paper Structure (7 sections, 6 equations, 1 figure, 3 tables)

This paper contains 7 sections, 6 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Illustration of oddballness $\xi_D$ and "probability of probability" ($\pi_D$) for event $\omega_2$ of probability $p_2=0.25$ for $D_3 = \{p_1=0.7, p_2=0.25, p_3=0.05\}$
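
The formulas for $\xi_D$ and $\pi_D$ are not reproduced in this section, so the following is only a minimal sketch for the distribution in Figure 1. It assumes that oddballness is the sum of the positive gaps $\max(0, p_j - p_i)$ over all outcomes $j$, and that the "probability of probability" is the total mass of outcomes no more likely than $\omega_i$. Both definitions, and the function names `oddballness` and `probability_of_probability`, are assumptions made for illustration and should be checked against the paper's equations.

```python
from typing import Sequence


def oddballness(probs: Sequence[float], i: int) -> float:
    """ASSUMED definition: sum over j of max(0, p_j - p_i).

    Only outcomes strictly more likely than outcome i contribute.
    """
    p_i = probs[i]
    return sum(max(0.0, p_j - p_i) for p_j in probs)


def probability_of_probability(probs: Sequence[float], i: int) -> float:
    """ASSUMED definition: total probability of outcomes with p_j <= p_i."""
    p_i = probs[i]
    return sum(p_j for p_j in probs if p_j <= p_i)


# Distribution D_3 from Figure 1; index 1 corresponds to omega_2 (p_2 = 0.25).
d3 = [0.7, 0.25, 0.05]
xi = oddballness(d3, 1)
pi = probability_of_probability(d3, 1)
print(f"p_2 = {d3[1]}, oddballness = {xi:.2f}, probability of probability = {pi:.2f}")
```

Under these assumed definitions, $\omega_2$ gets an oddballness of about 0.45 (only the gap to $p_1 = 0.7$ counts) and a "probability of probability" of about 0.30. Any outcome with probability of at least 0.5 gets oddballness 0, because no other outcome can be more likely than it.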