Out-of-Distribution Detection using Maximum Entropy Coding

Mojtaba Abolfazli; Mohammad Zaeri Amirani; Anders Høst-Madsen; June Zhang; Andras Bratincsak

TL;DR

This work tackles out-of-distribution (OOD) detection when the default distribution is either known or unknown. It does so by extending Kolmogorov–Martin–Löf randomness to continuous data through maximum-entropy statistics and universal coding. It builds a principled test that compares the null code length $L_P(\mathbf{x}^M)$ with universal code lengths $L_{P_T}(\mathbf{x}^M)$ derived from statistic-specific maximum-entropy models, optionally combining them with context-tree weighting (CTW). The authors provide practical instantiations, including a histogram-based detector, a universal multivariate Gaussian coder with sparse precision graphs, a Gamma coder for radial statistics, and a Glow-based latent transform to handle unknown $P$. Extensive synthetic and real-world MNIST experiments show better performance than KS tests and a Typicality baseline in many scenarios. Limitations include the cost of inverting high-dimensional covariance matrices and the assumption that the latent transform yields Gaussian data. These point future work toward mixtures of Gaussians, higher-dimensional histograms, and scalable inversion techniques.
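As a rough illustration of a histogram-based test, the sketch below assumes the data have already been mapped to $[0,1)$ by the probability integral transform of a known $P$, so that under the null the bin indices are i.i.d. uniform. The universal coder here is a sequential Krichevsky-Trofimov (KT) estimator, chosen for simplicity; the paper's actual histogram coder and threshold may differ, and all function names, the bin count, and the threshold rule are hypothetical. The threshold relies on the no-hypercompression inequality: because the KT code is a valid probability distribution on sequences, $P(L_P - L_{KT} \ge k) \le 2^{-k}$ under the null.

```python
import math
import random

def kt_codelength_bits(symbols, num_bins):
    """Sequential Krichevsky-Trofimov codelength (bits) of a symbol sequence.

    Each symbol is coded with probability (count + 1/2) / (n + num_bins/2),
    where count is how often it appeared before and n is the number seen so far.
    """
    counts = [0] * num_bins
    bits = 0.0
    for n, s in enumerate(symbols):
        p = (counts[s] + 0.5) / (n + num_bins / 2)
        bits -= math.log2(p)
        counts[s] += 1
    return bits

def histogram_ood_score(u, num_bins=8):
    """Return L_null - L_universal in bits for samples u in [0, 1).

    Assumed null model: after the probability integral transform, bin indices
    are i.i.d. uniform, so each sample costs log2(num_bins) bits.
    """
    symbols = [min(int(x * num_bins), num_bins - 1) for x in u]
    null_bits = len(symbols) * math.log2(num_bins)
    return null_bits - kt_codelength_bits(symbols, num_bins)

def is_ood(u, alpha=0.01, num_bins=8):
    # Under the null, P(score >= k) <= 2**-k, so a threshold of
    # log2(1/alpha) bits keeps the false-alarm rate at or below alpha.
    return histogram_ood_score(u, num_bins) >= math.log2(1 / alpha)

random.seed(0)
in_dist = [random.random() for _ in range(500)]
skewed = [random.random() ** 3 for _ in range(500)]  # piles mass near 0
print(is_ood(in_dist), is_ood(skewed))
```

The skewed sample should be flagged and the uniform one should not. For in-distribution data, the KT regret (roughly $\tfrac{B-1}{2}\log_2 M$ bits for $B$ bins) keeps the score negative.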

Abstract

Given a default distribution $P$ and a set of test data $x^M=\{x_1,x_2,\ldots,x_M\}$, this paper seeks to answer the question of whether it is likely that $x^M$ was generated by $P$. For discrete distributions, the definitive answer is in principle given by Kolmogorov-Martin-Löf randomness. In this paper we seek to generalize this to continuous distributions. We consider a set of statistics $T_1(x^M),T_2(x^M),\ldots$. To each statistic we associate its maximum entropy distribution and, with this, a universal source coder. The maximum entropy distributions are subsequently combined to give a total codelength, which is compared with $-\log P(x^M)$. We show that this approach satisfies a number of theoretical properties. For real-world data, $P$ is usually unknown. We transform data into a standard distribution in the latent space using a bidirectional generative network and use maximum entropy coding there. We compare the resulting method to other methods that also use generative neural networks to detect anomalies. In most cases, our results show better performance.
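A toy version of this pipeline, under strong simplifying assumptions, might look like the sketch below. The default model is taken to be i.i.d. $\mathcal{N}(0,1)$. The statistic $T_1=\sum_i x_i$ gives the exponentially tilted model $\mathcal{N}(\mu,1)$, and $T_2=(\sum_i x_i,\sum_i x_i^2)$ gives $\mathcal{N}(\mu,\sigma^2)$. Instead of an exact universal coder, each model uses the asymptotic two-part MDL approximation: the maximum-likelihood codelength plus $\tfrac{k}{2}\log_2 M$ bits for $k$ parameters. The codes are then combined by a uniform mixture, which costs at most $\log_2 K$ bits over the best of $K$ codes. All names are hypothetical, and the paper's coders and combination rule (e.g. CTW weighting) differ. Codelengths are differential, so a common quantization term cancels in the difference with $-\log_2 P(x^M)$.

```python
import math
import random

LOG2E = 1 / math.log(2)

def gauss_nll_bits(xs, mu, var):
    """Sum over samples of -log2 of the N(mu, var) density."""
    return sum(0.5 * math.log2(2 * math.pi * var) + (x - mu) ** 2 / (2 * var) * LOG2E
               for x in xs)

def null_bits(xs):
    # Assumed default model P: i.i.d. standard normal.
    return gauss_nll_bits(xs, 0.0, 1.0)

def mean_coder_bits(xs):
    # T1 = sum(x): tilted model N(mu, 1), one free parameter (two-part approx).
    m = len(xs)
    mu = sum(xs) / m
    return gauss_nll_bits(xs, mu, 1.0) + 0.5 * math.log2(m)

def mean_var_coder_bits(xs):
    # T2 = (sum(x), sum(x^2)): model N(mu, var), two free parameters.
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    return gauss_nll_bits(xs, mu, var) + math.log2(m)

def mixture_bits(codelengths):
    """-log2 of a uniform mixture of K codes: between min and min + log2(K)."""
    best = min(codelengths)
    k = len(codelengths)
    return best - math.log2(sum(2 ** (best - L) for L in codelengths) / k)

def ood_score(xs):
    # Positive values mean the data compress better than under P.
    return null_bits(xs) - mixture_bits([mean_coder_bits(xs), mean_var_coder_bits(xs)])

random.seed(1)
xs_in = [random.gauss(0.0, 1.0) for _ in range(500)]
xs_out = [random.gauss(0.0, 2.0) for _ in range(500)]
print(ood_score(xs_in), ood_score(xs_out))
```

The wider-variance sample should receive a large positive score, while the in-distribution sample stays near or below zero because of the parameter cost.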

Paper Structure (14 sections, 7 theorems, 52 equations, 1 figure, 4 tables, 2 algorithms)

Introduction
Related Work
Methodology
Coding
Histogram
Transformations
Multivariate Gaussian Default Model, $P$
OOD under multivariate Gaussian default model
Universal Multivariate Gaussian Coder
Universal Gamma Coder
Experiments on Synthetic Data
Unknown Default Model, $P$
Experiments on Real-World Data
Conclusion, Limitations, and Future work

Key Result

Proposition 1

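Consider the set of distributions $P'$ that satisfy $E_{P'}[\mathbf{T}(\mathbf{x})]=\mathbf{t}$. Among those, the maximum entropy distribution $P_T$ is the minimax coding distribution, i.e., it achieves

$$\min_{Q}\,\max_{P':\,E_{P'}[\mathbf{T}(\mathbf{x})]=\mathbf{t}} E_{P'}\left[-\log Q(\mathbf{x})\right],$$

and the minimax value equals the entropy $H(P_T)$.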
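For intuition, the claim that the maximum entropy $P_T$ minimizes the worst-case expected codelength can be checked numerically on a finite alphabet. The sketch assumes the alphabet $\{0,1,2\}$ and a single mean constraint $E[x]=t$, a special case of $\mathbf{T}(\mathbf{x})$. Under this constraint $P_T(x)\propto e^{\lambda x}$, and the sketch finds $\lambda$ by bisection. Because $-\log P_T(x)=-\lambda x+\log Z$ is affine in $x$, every $P'$ with the same mean has expected codelength exactly $H(P_T)$. A different coder, such as the uniform $Q$, does worse on at least one such $P'$. The function names are hypothetical.

```python
import math

def mean(p, values):
    return sum(pi * v for pi, v in zip(p, values))

def cross_entropy_bits(p, q):
    """Expected codelength E_p[-log2 q(x)] in bits (terms with p = 0 skipped)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def tilted_pmf(values, lam):
    w = [math.exp(lam * v) for v in values]
    z = sum(w)
    return [wi / z for wi in w]

def maxent_with_mean(values, t, lo=-50.0, hi=50.0, iters=200):
    """Max-entropy pmf on `values` with mean t, i.e. p(x) proportional to exp(lam * x).

    The mean of the tilted pmf increases with lam, so bisection finds lam.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(tilted_pmf(values, mid), values) < t:
            lo = mid
        else:
            hi = mid
    return tilted_pmf(values, (lo + hi) / 2)

values, t = [0, 1, 2], 0.8
p_t = maxent_with_mean(values, t)
h_t = cross_entropy_bits(p_t, p_t)  # entropy H(P_T) in bits

# Every pmf (1 - t + c, t - 2c, c) has mean t; here c ranges over [0, t/2].
family = [[1 - t + c, t - 2 * c, c] for c in (0.0, 0.1, 0.2, 0.3, 0.4)]
print([round(cross_entropy_bits(p, p_t), 6) for p in family], round(h_t, 6))
print(max(cross_entropy_bits(p, [1 / 3] * 3) for p in family))  # log2(3) > H(P_T)
```

The first line should show the same value repeated, equal to $H(P_T)$. The uniform coder pays $\log_2 3$ bits for every member of the family, which exceeds $H(P_T)$ since $P_T$ is not uniform when $t \ne 1$.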

Figures (1)

Figure 1: Different datasets used in the MNIST experiments. Note that the synthetically perturbed images look very similar to the originals.

Theorems & Definitions (13)

Proposition 1
proof
Lemma 2
proof
Theorem 3
proof
Theorem 4
proof
Theorem 5
proof
...and 3 more
