Vendi Novelty Scores for Out-of-Distribution Detection
Amey P. Pasarkar, Adji Bousso Dieng
TL;DR
The paper introduces the Vendi Novelty Score (VNS), a non-parametric, diversity-based OOD detector that uses the Vendi Scores to quantify novelty in a class-conditional feature space. VNS computes a novelty contribution for each class and aggregates these per-class signals using the model's predicted class probabilities over the top-K classes, with an optional global density correction that captures dataset-wide novelty. Empirically, VNS achieves state-of-the-art or competitive OOD performance on CIFAR-10, CIFAR-100, and ImageNet-1K benchmarks across multiple architectures, and it is data-efficient, performing well with as little as 1% of the training data. The approach combines local (class-conditional) and global (dataset-wide) novelty signals in linear time, scales to large datasets, and offers a practical, robust tool for the safe deployment of deep models.
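The aggregation step described above can be sketched as follows. The TL;DR does not give the exact formula, so every detail here is an assumption: the per-class novelty scores are taken as given (higher means more novel), the top-K predicted probabilities are renormalized into weights for an average, and the optional global term is simply added with a hypothetical weight. The function name `aggregate_novelty` and its parameters are illustrative only, not the authors' API.

```python
import numpy as np


def aggregate_novelty(class_novelty, probs, k=3, global_novelty=None, weight=1.0):
    """Hypothetical top-K, probability-weighted aggregation of per-class novelty.

    Assumptions (not taken from the paper): class_novelty[c] is the test sample's
    novelty relative to class c's features, higher = more novel; the global term
    is added with a hypothetical `weight`.
    """
    class_novelty = np.asarray(class_novelty, dtype=float)
    probs = np.asarray(probs, dtype=float)
    top_k = np.argsort(probs)[::-1][:k]        # indices of the K most probable classes
    w = probs[top_k] / probs[top_k].sum()      # renormalize probabilities over the top-K set
    score = float(np.dot(w, class_novelty[top_k]))  # weighted average of local (per-class) novelty
    if global_novelty is not None:
        score += weight * global_novelty       # optional dataset-wide correction (assumed additive)
    return score


# Usage: the model is confident in class 0, so class 0's novelty dominates.
print(aggregate_novelty([0.1, 0.5, 2.0, 3.0], [0.7, 0.2, 0.05, 0.05], k=2))  # approximately 0.189
```

In this sketch, weighting by the predicted probabilities keeps the score dominated by the classes the model considers most likely, so a sample that looks novel relative to its predicted class scores high even if it happens to resemble an unlikely class.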
Abstract
Out-of-distribution (OOD) detection is critical for the safe deployment of machine learning systems. Existing post-hoc detectors typically rely on model confidence scores or likelihood estimates in feature space, often under restrictive distributional assumptions. In this work, we introduce a third paradigm and formulate OOD detection from a diversity perspective. We propose the Vendi Novelty Score (VNS), an OOD detector based on the Vendi Scores (VS), a family of similarity-based diversity metrics. VNS quantifies how much a test sample increases the VS of the in-distribution feature set, providing a principled notion of novelty that does not require density modeling. VNS is linear-time, non-parametric, and naturally combines class-conditional (local) and dataset-level (global) novelty signals. Across multiple image classification benchmarks and network architectures, VNS achieves state-of-the-art OOD detection performance. Remarkably, VNS retains this performance when computed using only 1% of the training data, enabling deployment in memory- or access-constrained settings.
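The core idea in the abstract, novelty as the increase in Vendi Score when a test sample joins the in-distribution features, can be illustrated with a minimal sketch. It assumes a cosine-similarity kernel K = X Xᵀ on unit-normalized features and the order-1 Vendi Score, exp(−Σᵢ λᵢ log λᵢ), where λᵢ are the eigenvalues of K/n. Because the nonzero eigenvalues of X Xᵀ/n equal those of Xᵀ X/n, each call costs O(n d²), which is linear in the number of samples n. The helper names `vendi_score` and `vs_increase` are hypothetical; this naive whole-set recomputation is not the paper's class-conditional VNS with its top-K aggregation and global correction.

```python
import numpy as np


def vendi_score(features: np.ndarray) -> float:
    """Order-1 Vendi Score with a cosine-similarity (linear) kernel.

    Uses the d x d matrix X^T X / n, whose nonzero eigenvalues match those of
    K / n = X X^T / n, so the cost is linear in the number of rows n.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)  # unit rows, so K_ii = 1
    n = x.shape[0]
    eigvals = np.linalg.eigvalsh(x.T @ x / n)  # symmetric PSD; eigenvalues sum to 1
    eigvals = eigvals[eigvals > 1e-12]         # drop numerical zeros (0 log 0 := 0)
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))


def vs_increase(reference: np.ndarray, x: np.ndarray) -> float:
    """Hypothetical novelty: change in Vendi Score when x joins the reference set."""
    augmented = np.vstack([reference, x[None, :]])
    return vendi_score(augmented) - vendi_score(reference)


# Usage: in-distribution features cluster around one direction.
rng = np.random.default_rng(0)
in_dist = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((200, 3))
near = np.array([1.0, 0.02, 0.0])  # resembles the in-distribution data
far = np.array([0.0, 0.0, 1.0])    # points in a direction the data barely covers
print(vs_increase(in_dist, near) < vs_increase(in_dist, far))  # True
```

A sample that adds a new direction to the feature set spreads the eigenvalue mass and raises the score, while a sample aligned with existing features barely changes it (or lowers it slightly), which is the diversity-based notion of novelty the abstract describes.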
