AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

Yabin Zhang, Lei Zhang

TL;DR

This work introduces adaptive negative proxies, which are dynamically generated during testing by exploring actual OOD images, to align more closely with the underlying OOD label space and enhance the efficacy of negative proxy guidance.

Abstract

Recent research has shown that pre-trained vision-language models are effective at identifying out-of-distribution (OOD) samples by using negative labels as guidance. However, employing the same negative labels across different OOD datasets often results in semantic misalignments, as these text labels may not accurately reflect the actual space of OOD images. To overcome this issue, we introduce adaptive negative proxies, which are dynamically generated during testing by exploring actual OOD images, to align more closely with the underlying OOD label space and enhance the efficacy of negative proxy guidance. Specifically, our approach utilizes a feature memory bank to selectively cache discriminative features from test images, representing the targeted OOD distribution. This facilitates the creation of proxies that can better align with specific OOD datasets. While task-adaptive proxies average features to reflect the unique characteristics of each dataset, sample-adaptive proxies weight features based on their similarity to individual test samples, exploring detailed sample-level nuances. The final score for identifying OOD samples integrates static negative labels with our proposed adaptive proxies, effectively combining textual and visual knowledge for enhanced performance. Our method is training-free and annotation-free, and it maintains fast testing speed. Extensive experiments across various benchmarks demonstrate the effectiveness of our approach, abbreviated as AdaNeg. Notably, on the large-scale ImageNet benchmark, AdaNeg significantly outperforms existing methods, with a 2.45% increase in AUROC and a 6.48% reduction in FPR95. Code is available at https://github.com/YBZh/OpenOOD-VLM.
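
The abstract distinguishes task-adaptive proxies (a plain average of cached features) from sample-adaptive proxies (cached features weighted by their similarity to the current test sample). The following is a minimal sketch of that distinction only, not the authors' implementation. The function names, the exponential weighting, and the way `beta` enters are assumptions made for illustration; `beta` is named after the hyper-parameter mentioned in the caption of Figure A4, but the paper's exact equation is not reproduced on this page.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if its norm is zero)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    """Inner product; equals cosine similarity for unit-norm inputs."""
    return sum(x * y for x, y in zip(a, b))


def task_adaptive_proxy(bank):
    """Assumed form: unweighted mean of the cached features, re-normalized."""
    dim = len(bank[0])
    mean = [sum(f[i] for f in bank) / len(bank) for i in range(dim)]
    return normalize(mean)


def sample_adaptive_proxy(bank, query, beta=5.0):
    """Assumed form: mean of cached features weighted by exp(beta * similarity).

    With beta = 0 every weight is 1, so this reduces to the task-adaptive proxy.
    """
    weights = [math.exp(beta * dot(f, query)) for f in bank]
    total = sum(weights)
    dim = len(bank[0])
    mixed = [
        sum(w * f[i] for w, f in zip(weights, bank)) / total for i in range(dim)
    ]
    return normalize(mixed)


# Toy memory slot holding two cached (unit-norm) features, plus one test feature.
bank = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
query = normalize([1.0, 0.1])

task_proxy = task_adaptive_proxy(bank)
sample_proxy = sample_adaptive_proxy(bank, query, beta=5.0)

task_sim = dot(task_proxy, query)
sample_sim = dot(sample_proxy, query)
```

In this toy setup the sample-adaptive proxy leans toward the cached feature that resembles the query, so `sample_sim` exceeds `task_sim`. How the memory bank selects which test features to cache, and how the adaptive proxies are combined with the static negative labels in the final score, are outside the scope of this sketch.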

Paper Structure

This paper contains 17 sections, 15 equations, 4 figures, 13 tables, 1 algorithm.

Figures (4)

  • Figure 1: Qualitative and quantitative analyses of semantic misalignment between OOD labels and negative proxies using the ImageNet (ID) and SUN (OOD) datasets. (a) Visualization of ID labels, OOD labels, negative labels from NegLabel, and adaptive negative proxies (AdaNeg). (b) Quantitative analysis based on the ID-Similarity to OOD Ratio (ISOR for short, see Appendix \ref{subsec:id_aligning_scores}). A lower ISOR indicates higher similarity to OOD labels and reduced similarity to ID labels. AdaNeg consistently achieves a lower ISOR, demonstrating enhanced alignment with OOD characteristics. Visualizations include the top 1,000 discriminative proxies from both NegLabel and AdaNeg.
  • Figure 2: The overall framework of AdaNeg, where we selectively cache test images and generate adaptive proxies with an external feature memory bank. The final score combines textual and visual knowledge from static negative labels and our adaptive proxies, integrating multi-modal information.
  • Figure 3: Analyses of the hyper-parameters (a) threshold $\gamma$ in Eq. \ref{Equ:criterion_memorized_tau_g}, (b) gap value $g$ in Eq. \ref{Equ:criterion_memorized_tau_g}, and (c) memory length $L$ on the ImageNet dataset under the OpenOOD setting.
  • Figure A4: Analyses of the hyper-parameters (a) $\lambda$ in Eq. \ref{equ:adaneg_score_all} and (b) $\beta$ in Eq. \ref{Equ:sample_adaptive_classifier} on the ImageNet dataset under the OpenOOD setting.