
Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression

Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, Antonio Ortega

TL;DR

The paper addresses the need for scalable OOD detection as language models become more general purpose. It proposes NNK-Means, a soft clustering approach based on non-negative kernel regression, and its entropy-constrained variant EC-NNK-Means, both of which detect OOD instances with reduced compute and storage. Across four benchmarks, the methods achieve competitive or superior AUROC while substantially reducing inference time and memory, and they remain effective with or without labeled ID data and across diverse embeddings. The approach offers a practical, scalable option for detecting tail-end phenomena in extreme-scale data.

Abstract

As language models become more general purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing OOD detection methods are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regression. Our approach greatly reduces computational and space complexity (up to an 11x improvement in inference time and an 87% reduction in storage requirements) and outperforms existing approaches by up to 4 AUROC points on four different benchmarks. We also introduce an entropy-constrained version of our algorithm, which further reduces storage requirements (up to 97% lower than comparable approaches) while retaining competitive performance. These results highlight the potential of soft clustering for detecting tail-end phenomena in extreme-scale data settings.
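
The abstract does not spell out the scoring rule, so the following is only a minimal sketch under stated assumptions: the OOD score of a query is taken to be its kernel-space reconstruction error after a non-negative kernel regression (NNK) fit over its k most similar dictionary atoms, with a Gaussian kernel. ID points near the atoms reconstruct well, while distant points do not. The function names, the Gaussian kernel, the projected-gradient solver, and the parameters `k` and `sigma` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def nonneg_kernel_regression(K_SS, k_Sx, iters=1000):
    # Minimize 0.5 * theta^T K_SS theta - k_Sx^T theta subject to theta >= 0.
    # Projected gradient descent with step 1/L is a simple stand-in for an
    # exact non-negative least-squares solver (assumption for this sketch).
    step = 1.0 / (np.linalg.eigvalsh(K_SS).max() + 1e-12)
    theta = np.zeros_like(k_Sx)
    for _ in range(iters):
        grad = K_SS @ theta - k_Sx
        theta = np.maximum(theta - step * grad, 0.0)
    return theta


def nnk_ood_score(x, atoms, k=5, sigma=1.0):
    # Similarity of the query to every atom; keep the k most similar atoms.
    k_all = gaussian_kernel(x[None, :], atoms, sigma)[0]
    S = np.argsort(-k_all)[:k]
    K_SS = gaussian_kernel(atoms[S], atoms[S], sigma)
    theta = nonneg_kernel_regression(K_SS, k_all[S])
    # Reconstruction error ||phi(x) - Phi_S theta||^2 in kernel space;
    # K(x, x) = 1 for a Gaussian kernel. Higher means more likely OOD.
    return 1.0 - 2.0 * theta @ k_all[S] + theta @ K_SS @ theta
```

Under this sketch, scoring needs kernel evaluations only against the stored atoms rather than against every training embedding, which is where savings relative to a KNN-style detector would come from.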

Paper Structure

This paper contains 48 sections, 21 equations, 4 figures, 11 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Illustration comparing KNN (top) with kMeans (middle) and our proposed NNK-Means (bottom). The use of soft-clustering allows our method to detect OOD instances even when they are close to ID training data. It also better captures the underlying data geometry, enabling more accurate identification of ID data points than kMeans.
  • Figure 2: Final number of atoms and AUROC for different values of the entropy-constraint hyperparameter $\lambda$ and the number of starting atoms, reported on 20 Newsgroups with 25% ID classes. EC-NNK-Means can yield competitive performance with 90% less memory usage.
  • Figure 3: OOD Detection AUROC on 20 Newsgroups with 50% ID classes, with different Sentence-BERT embeddings. Results are averaged over 5 random seeds.
  • Figure 4: 2D visualization of the 20 Newsgroups validation set and the learned clusters, with 25% ID classes.
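
Figure 2 shows that the entropy-constraint hyperparameter $\lambda$ controls how many atoms survive. The sketch below illustrates that mechanism, assuming the constraint behaves like classic entropy-constrained vector quantization (ECVQ): each point pays its squared distance plus $\lambda$ times the code length $-\log p_j$ of its chosen atom, so rarely used atoms become expensive and are dropped once they receive no points. It uses hard kMeans-style assignments rather than NNK soft assignments, and the function name and parameters are hypothetical.

```python
import numpy as np


def entropy_constrained_kmeans(X, n_atoms, lam, iters=20, seed=0):
    # Initialize atoms at randomly chosen data points with uniform usage.
    rng = np.random.default_rng(seed)
    atoms = X[rng.choice(len(X), n_atoms, replace=False)].copy()
    probs = np.full(n_atoms, 1.0 / n_atoms)
    for _ in range(iters):
        # Cost = squared distance + lam * code length (-log p_j).
        sq = ((X[:, None, :] - atoms[None, :, :]) ** 2).sum(-1)
        cost = sq - lam * np.log(probs)[None, :]
        assign = cost.argmin(axis=1)
        counts = np.bincount(assign, minlength=len(atoms))
        # Prune atoms that received no points, then recompute the surviving
        # centroids and their usage probabilities (kept aligned by index).
        kept = np.flatnonzero(counts > 0)
        atoms = np.stack([X[assign == j].mean(axis=0) for j in kept])
        probs = counts[kept] / len(X)
    return atoms
```

With $\lambda = 0$ this reduces to ordinary Lloyd iterations, while a large $\lambda$ tends to leave roughly one atom per dense region, mirroring the memory savings reported for EC-NNK-Means.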