Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs

Myong Chol Jung, He Zhao, Joanna Dipnall, Belinda Gabbe, Lan Du

TL;DR

This paper tackles the challenging problem of near OOD detection in CLIP-based prompt learning, a setting where in-distribution and near OOD samples share the same domain but have disjoint labels. It introduces a simple, fast post-hoc framework that augments existing logit-based scores with an Empty-Class score and derives Relative Energy/MaxLogit scores by subtracting a margin-scaled Empty-Class term, with the tightening margin $\beta$ estimated from ID data via a bivariate normal MLE to minimize score correlation. Empirically, the method consistently improves near OOD AUROC and reduces FPR95 across 13 datasets and 8 prompt-learning models (including CoOp, CoCoOp, LoCoOp, and others), achieving up to 11.67% AUROC gains without retraining. The approach also sheds light on the relationship between dataset distance and OOD detector performance, shows applicability to far OOD detection with broad improvements, and discusses limitations and potential extensions to MCM and other OOD settings. Overall, the work provides a practical, architecture-agnostic tool to bolster near OOD detection in vision-language prompts, with clear implications for safer deployment of CLIP-style systems.
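The score construction summarised above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the Empty-Class score is already available as one scalar per sample (how it is computed is not specified here), uses the common log-sum-exp form of the Energy score with higher values meaning more ID-like, and the function names `energy_score`, `max_logit_score`, and `relative_score` are hypothetical.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Energy-style score T * logsumexp(logits / T); higher means more ID-like."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    return temperature * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def max_logit_score(logits):
    """MaxLogit score: the largest class logit of each sample."""
    return np.asarray(logits, dtype=float).max(axis=-1)

def relative_score(base_score, empty_class_score, beta):
    """Base score minus the margin-scaled Empty-Class term."""
    base = np.asarray(base_score, dtype=float)
    empty = np.asarray(empty_class_score, dtype=float)
    return base - beta * empty

# Toy inputs (made up for illustration): two samples, three ID classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.3, 0.2, 0.1]])
empty = np.array([0.4, 0.25])  # assumed per-sample Empty-Class scores

rel_max = relative_score(max_logit_score(logits), empty, beta=0.5)
rel_energy = relative_score(energy_score(logits), empty, beta=0.5)
print(rel_max, rel_energy)
```

Because the correction is a single subtraction on scores that are already computed, it leaves the model, its prompts, and its logits untouched, which is what makes the approach post-hoc.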

Abstract

Prompt learning has been shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising performance of prompt learning in far OOD detection. However, the more challenging task of few-shot near OOD detection has not yet been addressed. In this study, we investigate the near OOD detection capabilities of prompt learning models and observe that commonly used OOD scores have limited performance in near OOD detection. To enhance the performance, we propose a fast and simple post-hoc method that complements existing logit-based scores, improving near OOD detection AUROC by up to 11.67% with minimal computational cost. Our method can be easily applied to any prompt learning model without changing its architecture or retraining it. Comprehensive empirical evaluations across 13 datasets and 8 models demonstrate the effectiveness and adaptability of our method.

Paper Structure (30 sections, 2 theorems, 8 equations, 5 figures, 24 tables)

This paper contains 30 sections, 2 theorems, 8 equations, 5 figures, 24 tables.

Key Result

Lemma 3.1

Given $N$ scalar observations $\{\hat{x}_i\}_{i=1}^N$ and $\{\hat{y}_i\}_{i=1}^N$, we define two variables $x=\hat{x}$ and $y=\hat{y}-\beta \cdot \hat{x}$. The scale parameter $\beta$ that zeros out the covariance of the two variables (i.e., the off-diagonal entries of their covariance matrix) is approximated in terms of the sample means $\mu_{\hat{x}}=\frac{1}{N}\sum_{i=1}^N\hat{x}_i$ and $\mu_{\hat{y}}=\frac{1}{N}\sum_{i=1}^N\hat{y}_i$.
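The zero-covariance condition in the lemma can be worked out directly from the definitions of $x$ and $y$. The derivation below is a sketch under those definitions only; it shows what the condition implies for sample estimates and may differ in form from the expression given in the paper.

```latex
\begin{align}
\operatorname{Cov}(x, y)
  &= \operatorname{Cov}(\hat{x},\, \hat{y} - \beta\hat{x})
   = \operatorname{Cov}(\hat{x}, \hat{y}) - \beta\,\operatorname{Var}(\hat{x}) = 0 \\
\Longrightarrow\quad
\beta
  &= \frac{\operatorname{Cov}(\hat{x}, \hat{y})}{\operatorname{Var}(\hat{x})}
   \approx \frac{\sum_{i=1}^{N} (\hat{x}_i - \mu_{\hat{x}})(\hat{y}_i - \mu_{\hat{y}})}
                {\sum_{i=1}^{N} (\hat{x}_i - \mu_{\hat{x}})^2}
\end{align}
```

The normalising factor ($1/N$ or $1/(N-1)$) cancels between numerator and denominator, so the estimate does not depend on which covariance estimator is used. It coincides with the ordinary least-squares slope of $\hat{y}$ regressed on $\hat{x}$.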

Figures (5)

  • Figure 1: Density plots of Energy scores (left) and MaxLogit (right) computed with CoOp zhou2022learning on Flowers102 nilsback2008automated. Large regions of ID and near OOD samples overlap, which are highlighted by shaded boxes.
  • Figure 2: (a) Original MaxLogit scores, (b) Relative MaxLogit scores, and (c) Relative MaxLogit scores with scale $\beta$ of test ID and near OOD samples with respect to Empty-Class scores. Areas where ID samples and near OOD samples overlap are highlighted with shaded boxes. All scores are computed using MaPLe khattak2023maple on Caltech101 li2004learning.
  • Figure 3: Near OOD detection AUROC using Relative MaxLogit score vs. margin scale for CoOp zhou2022learning, CoCoOp zhou2022conditional, IVLP khattak2023maple, KgCoOp yao2023visual, ProGrad zhu2023prompt, MaPLe khattak2023maple, PromptSRC khattak2023self, and LoCoOp miyai2023locoop on 16-shot UCF101 soomro2012ucf101. The margin scale is approximated by Eq. \ref{eq:scale}, shown as red dotted lines.
  • Figure 4: Density plot of dataset distance between ID test dataset and near OOD dataset. The distance is measured when a relative score or MCM outperforms others in each dataset.
  • Figure 5: Comparison of MaxLogit (left) or MCM score (right) vs. Empty-Class score for near OOD samples with IVLP khattak2023maple on EuroSAT helber2019eurosat.
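
The margin-scale sweep described in the Figure 3 caption can be mimicked on synthetic data. The sketch below is illustrative only. The scores are randomly generated and are not results from the paper. The helper names `estimate_beta` and `auroc` are hypothetical, and the estimator is the zero-sample-covariance ratio (the least-squares slope), assumed here to play the role of the paper's margin-scale approximation.

```python
import numpy as np

def estimate_beta(empty_scores, base_scores):
    """Scale that zeros the sample covariance between x and y - beta * x."""
    x = np.asarray(empty_scores, dtype=float)
    y = np.asarray(base_scores, dtype=float)
    dx = x - x.mean()
    return float((dx * (y - y.mean())).sum() / (dx ** 2).sum())

def auroc(id_scores, ood_scores):
    """AUROC with ID as the positive class: P(ID > OOD) + 0.5 * P(tie)."""
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float((id_s > ood_s).mean() + 0.5 * (id_s == ood_s).mean())

# Synthetic scores (assumption): x plays the Empty-Class score and y the base
# score, e.g. MaxLogit. Both groups share a strong dependence of y on x.
rng = np.random.default_rng(0)
x_id = rng.normal(0.0, 1.0, 500)
y_id = 0.8 * x_id + rng.normal(1.0, 0.5, 500)
x_ood = rng.normal(1.0, 1.0, 500)
y_ood = 0.8 * x_ood + rng.normal(0.5, 0.5, 500)

# Estimate the scale from ID data only, then check the decorrelation.
beta_hat = estimate_beta(x_id, y_id)
residual_cov = np.cov(x_id, y_id - beta_hat * x_id)[0, 1]
print(f"beta_hat={beta_hat:.3f}, residual covariance={residual_cov:.2e}")

# Sweep the margin scale and report AUROC of the relative score at each value.
for beta in np.linspace(0.0, 1.5, 7):
    score = auroc(y_id - beta * x_id, y_ood - beta * x_ood)
    print(f"beta={beta:.2f}  AUROC={score:.3f}")
```

Only ID samples enter `estimate_beta`, so the scale can be set without access to any OOD data. The OOD scores are used solely to evaluate the sweep.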

Theorems & Definitions (3)

  • Lemma 3.1
  • Lemma A.1
  • proof