Prompt Optimization Meets Subspace Representation Learning for Few-shot Out-of-Distribution Detection

Faizul Rakib Sayem, Shahana Ibrahim

TL;DR

This work targets robust few-shot out-of-distribution detection in vision-language models by integrating subspace learning with prompt optimization. By projecting ID features into a subspace spanned by learnable prompt vectors and ID-irrelevant features into the orthogonal null space, the method enforces geometry-aware separation between ID and OOD samples. The end-to-end loss combines cross-entropy, subspace regularizations, and entropy terms, enabling strong OOD performance with a small set of prompts and minimal extra cost. Empirical results on ImageNet-1k/100 and standard OOD benchmarks show SubCoOp consistently surpassing prior prompt-tuning methods in OOD detection while preserving ID accuracy, highlighting its practical potential for open-world deployment of CLIP-based systems.

Abstract

The reliability of artificial intelligence (AI) systems in open-world settings depends heavily on their ability to flag out-of-distribution (OOD) inputs unseen during training. Recent advances in large-scale vision-language models (VLMs) have enabled promising few-shot OOD detection frameworks using only a handful of in-distribution (ID) samples. However, existing prompt-learning-based OOD methods rely solely on softmax probabilities, overlooking the rich discriminative potential of the feature embeddings learned by VLMs trained on millions of samples. To address this limitation, we propose a novel context optimization (CoOp)-based framework that integrates subspace representation learning with prompt tuning. Our approach improves ID-OOD separability by projecting the ID features into a subspace spanned by prompt vectors, while projecting ID-irrelevant features into an orthogonal null space. To train such an OOD detection framework, we design an easy-to-handle end-to-end learning criterion that ensures strong OOD detection performance as well as high ID classification accuracy. Experiments on real-world datasets showcase the effectiveness of our approach.
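
The geometric idea in the abstract, splitting a feature into a part inside the span of the prompt vectors and a part in the orthogonal complement, can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`orthonormal_basis`, `split_feature`, `subspace_energy_ratio`), the toy 4-dimensional vectors, and the energy-ratio quantity are all assumptions made for illustration, and the summary above does not specify the actual loss terms or OOD score used by SubCoOp.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

def split_feature(feature, prompt_vectors):
    """Return (component in span of prompts, component in orthogonal complement)."""
    basis = orthonormal_basis(prompt_vectors)
    inside = [0.0] * len(feature)
    for b in basis:
        c = dot(feature, b)
        inside = [p + c * bi for p, bi in zip(inside, b)]
    outside = [f - p for f, p in zip(feature, inside)]
    return inside, outside

def subspace_energy_ratio(feature, prompt_vectors):
    """Hypothetical quantity: fraction of squared norm lying in the prompt subspace."""
    inside, _ = split_feature(feature, prompt_vectors)
    return dot(inside, inside) / dot(feature, feature)

# Toy data (assumed): two prompt vectors spanning the first two coordinates.
prompts = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
id_like = [2.0, -1.0, 0.1, 0.0]   # mostly inside the prompt subspace
ood_like = [0.1, 0.0, 3.0, 2.0]   # mostly in the orthogonal complement

print(subspace_energy_ratio(id_like, prompts))
print(subspace_energy_ratio(ood_like, prompts))
```

In this toy setting the ID-like feature keeps almost all of its energy inside the prompt subspace while the OOD-like feature keeps almost none, which is the kind of separation the abstract describes. How the paper turns this geometry into training regularizers is not stated in the summary, so the sketch stops at the decomposition itself.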

Paper Structure

This paper contains 7 sections, 9 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: The proposed Subspace learning-based Context Optimization (SubCoOp) framework for prompt-learning-based OOD detection.
  • Figure 2: Example images from the iNaturalist dataset that are visually and semantically similar to certain ImageNet-1k classes. Comparison of similarity scores from SCT and our proposed SubCoOp. While SCT assigns high similarity scores to the ImageNet-1k ID classes, leading to incorrect detection as ID, SubCoOp effectively suppresses such scores, enabling the correct OOD detection.
  • Figure 3: OOD detection performance of various few-shot techniques on the ImageNet-1k dataset
  • Figure 4: OOD performance of our method SubCoOp and other methods across various OOD datasets, with ImageNet-100 as the ID dataset
  • Figure 5: Average OOD detection performance across different image encoders on the ImageNet-1k dataset
  • ...and 5 more figures

Theorems & Definitions (1)

  • Remark 1