Out-of-Distribution Detection using Maximum Entropy Coding
Mojtaba Abolfazli, Mohammad Zaeri Amirani, Anders Høst-Madsen, June Zhang, Andras Bratincsak
TL;DR
This work tackles out-of-distribution detection for both known and unknown default distributions by extending Kolmogorov–Martin–Löf randomness to continuous data through maximum-entropy statistics and universal coding. It builds a principled test that compares the null code length $L_P(x^M)$ with universal code lengths $L_{P_T}(x^M)$ derived from statistic-specific maximum-entropy models, optionally combining them with CTW weighting. The authors provide practical instantiations: a histogram-based detector, a universal multivariate Gaussian coder with sparse precision graphs, a Gamma coder for radial statistics, and a Glow-based latent transform to handle unknown $P$. Extensive experiments on synthetic data and on real-world data (MNIST) show better performance than KS tests and a Typicality baseline in many scenarios. Limitations include the cost of covariance inversion in high dimensions and the assumption that the latent transform yields Gaussian data; these point future work toward mixtures of Gaussians, higher-dimensional histograms, and scalable inversion techniques.
Abstract
Given a default distribution $P$ and a set of test data $x^M=\{x_1,x_2,\ldots,x_M\}$, this paper seeks to answer the question of whether it is likely that $x^M$ was generated by $P$. For discrete distributions, the definitive answer is in principle given by Kolmogorov-Martin-Löf randomness. In this paper we seek to generalize this to continuous distributions. We consider a set of statistics $T_1(x^M),T_2(x^M),\ldots$. To each statistic we associate its maximum entropy distribution and, with this, a universal source coder. The maximum entropy distributions are subsequently combined to give a total codelength, which is compared with $-\log P(x^M)$. We show that this approach satisfies a number of theoretical properties. For real-world data, $P$ is usually unknown. We transform data into a standard distribution in the latent space using a bidirectional generative network and use maximum entropy coding there. We compare the resulting method to other methods that also use generative neural networks to detect anomalies. In most cases, our results show better performance.
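The codelength comparison described in the abstract can be illustrated with a minimal sketch for a single case. The assumptions here are not taken from the paper: the default distribution $P$ is a scalar standard Gaussian, the statistics are the sample mean and variance (whose maximum entropy distribution is Gaussian), and the universal coder is approximated by a two-part code, that is, the maximum-likelihood codelength plus a $\frac{k}{2}\log M$ parameter cost with $k=2$. The function names are hypothetical, and codelengths are computed from densities, so the common quantization term is omitted because it cancels in the difference.

```python
import numpy as np

LN2 = np.log(2.0)

def null_codelength_bits(x, mu0=0.0, sigma0=1.0):
    """-log2 P(x^M) for an assumed default distribution N(mu0, sigma0^2)."""
    x = np.asarray(x, dtype=float)
    nll_nats = (0.5 * np.sum(((x - mu0) / sigma0) ** 2)
                + x.size * (0.5 * np.log(2.0 * np.pi) + np.log(sigma0)))
    return nll_nats / LN2

def gaussian_universal_codelength_bits(x):
    """Two-part approximation of a universal Gaussian codelength.

    Assumption: maximum-likelihood fit of mean and variance, plus a
    (k/2) * log(M) penalty for the k = 2 estimated parameters.
    """
    x = np.asarray(x, dtype=float)
    m = x.size
    var_hat = max(np.var(x), 1e-12)            # ML variance, guarded against 0
    nll_nats = 0.5 * m * (np.log(2.0 * np.pi * var_hat) + 1.0)
    penalty_nats = 0.5 * 2 * np.log(m)         # k = 2 parameters
    return (nll_nats + penalty_nats) / LN2

def atypicality_score_bits(x):
    """Bits saved by the universal code relative to the null code.

    Large positive values suggest x^M is unlikely to come from the default.
    """
    return null_codelength_bits(x) - gaussian_universal_codelength_bits(x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    in_dist = rng.normal(0.0, 1.0, size=200)   # drawn from the assumed default
    shifted = rng.normal(2.0, 1.0, size=200)   # mean-shifted test set
    print("in-distribution score (bits):", atypicality_score_bits(in_dist))
    print("shifted score (bits):", atypicality_score_bits(shifted))
```

In this sketch a test set is flagged when the score exceeds a chosen threshold. The parameter penalty keeps the score from being positive merely because the fitted model has extra degrees of freedom; combining several statistics, as the abstract describes, would require an additional weighting step that is not shown here.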
