Detecting Out-of-Distribution Samples via Conditional Distribution Entropy with Optimal Transport
Chuanwen Feng, Wenlong Chen, Ao Ke, Yilong Ren, Xike Xie, S. Kevin Zhou
TL;DR
This work tackles out-of-distribution detection when test inputs are available by modeling training and test data as empirical distributions and measuring their geometric discrepancy through discrete entropic optimal transport. The authors introduce a conditional distribution entropy score derived from the OT transport plan, enabling a principled, parameterized measure of uncertainty that distinguishes ID from OOD samples. The method integrates supervised or self-supervised contrastive training to obtain compact, discriminative features and demonstrates state-of-the-art performance across benchmarks such as CIFAR-100 vs CIFAR-10 and in large semantic spaces, with efficient Sinkhorn-based computation. By combining pair- and population-wise information without distributional assumptions, the approach offers a practical, training-agnostic framework for robust OOD detection in open-world and continual-learning contexts.
Abstract
When a trained machine learning model is deployed in the real world, inputs from out-of-distribution (OOD) sources are inevitable. For instance, in continual learning settings, OOD samples are commonly encountered because the domain is non-stationary. More generally, when a set of test inputs is available, the existing rich line of OOD detection solutions, including the recently promising distance-based methods, falls short of effectively utilizing the distribution information in training samples and test inputs. In this paper, we argue that empirical probability distributions that incorporate geometric information from both training samples and test inputs can be highly beneficial for OOD detection when test inputs are available. To this end, we propose to model OOD detection as a discrete optimal transport problem. Within the optimal transport framework, we propose a novel score function, the conditional distribution entropy, to quantify the uncertainty of a test input being an OOD sample. Our proposal inherits the merits of certain distance-based methods while eliminating the reliance on distributional assumptions, a priori knowledge, and specific training mechanisms. Extensive experiments on benchmark datasets demonstrate that our method outperforms its competitors in OOD detection.
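The idea of scoring test inputs through an entropic optimal transport plan can be illustrated with a minimal sketch. It is not the authors' implementation. The following choices are assumptions made here for illustration: uniform marginals on both sample sets, a cosine-distance cost on L2-normalized features, plain Sinkhorn iterations, and a score defined as the entropy of each test input's normalized column of the transport plan. The function names `sinkhorn_plan` and `conditional_entropy_scores` and the parameters `reg` and `n_iters` are hypothetical.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic OT plan between two uniform empirical distributions.

    cost: (n_train, n_test) pairwise cost matrix.
    reg: entropic regularization strength (assumed value, not from the paper).
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on training samples (assumption)
    b = np.full(m, 1.0 / m)          # uniform mass on test inputs (assumption)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def conditional_entropy_scores(train_feats, test_feats, reg=0.1, n_iters=200):
    """One entropy score per test input, computed from the transport plan."""
    # L2-normalize so that 1 - dot product is the cosine distance in [0, 2].
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    te = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    cost = 1.0 - tr @ te.T
    plan = sinkhorn_plan(cost, reg=reg, n_iters=n_iters)
    # Normalize each column: distribution over training samples given a test input.
    cond = plan / plan.sum(axis=0, keepdims=True)
    # Shannon entropy of each conditional distribution (small epsilon avoids log(0)).
    return -(cond * np.log(cond + 1e-12)).sum(axis=0)
```

Calling `conditional_entropy_scores(train_feats, test_feats)` on two feature arrays of shapes `(n_train, d)` and `(n_test, d)` returns one score per test input, bounded above by `log(n_train)`. How the scores are thresholded to separate ID from OOD inputs, and how the features are trained, is left to the paper's own formulation.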
