Leveraging NTPs for Efficient Hallucination Detection in VLMs

Ofir Azachi, Kfir Eliyahu, Eyal El Ani, Rom Himelstein, Roi Reichart, Yuval Pinter, Nitay Calderon

TL;DR

This work tackles hallucination detection in vision-language models (VLMs) by avoiding costly external predictors and instead leveraging next-token probabilities (NTPs) collected during generation. It introduces Description NTPs and Linguistic NTPs, derives both raw and statistical features from them, and trains lightweight ML models that can operate on-the-fly; the approach is evaluated on a dataset of 1,400 human-annotated probes drawn from 350 images. Results show that lightweight models built on statistical NTP features approach the accuracy of VLM-based predictors while avoiding their computational cost, and that combining NTP features with predictor outputs yields the strongest performance. The study demonstrates a practical, scalable path to improving VLM reliability and highlights how linguistic biases in generation influence hallucination signals, pointing to future directions in uncertainty-aware generation and feature integration.

Abstract

Hallucinations of vision-language models (VLMs), which are misalignments between visual content and generated text, undermine the reliability of VLMs. One common approach for detecting them employs the same VLM, or a different one, to assess generated outputs. This process is computationally intensive and increases model latency. In this paper, we explore an efficient on-the-fly method for hallucination detection by training traditional ML models over signals based on the VLM's next-token probabilities (NTPs). NTPs provide a direct quantification of model uncertainty. We hypothesize that high uncertainty (i.e., a low NTP value) is strongly associated with hallucinations. To test this, we introduce a dataset of 1,400 human-annotated statements derived from VLM-generated content, each labeled as hallucinated or not, and use it to test our NTP-based lightweight method. Our results demonstrate that NTP-based features are valuable predictors of hallucinations, enabling fast and simple ML models to achieve performance comparable to that of strong VLMs. Furthermore, augmenting these NTPs with linguistic NTPs, computed by feeding only the generated text back into the VLM, enhances hallucination detection performance. Finally, integrating hallucination prediction scores from VLMs into the NTP-based models led to better performance than using either VLMs or NTPs alone. We hope this study paves the way for simple, lightweight solutions that enhance the reliability of VLMs.
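
The pipeline described in the abstract (per-statement NTPs, then statistical features, then a lightweight classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature set (mean, minimum, standard deviation), the choice of a plain logistic regression trained by gradient descent, the helper names `ntp_features` and `fit_logistic_regression`, and the toy NTP values, which are invented and are not results.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def ntp_features(ntps: Sequence[float]) -> List[float]:
    """Summarize one statement's NTPs as [mean, min, population std]."""
    if not ntps:
        raise ValueError("need at least one next-token probability")
    return [statistics.fmean(ntps), min(ntps), statistics.pstdev(ntps)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that the statement is hallucinated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias with full-batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical NTP sequences (invented for this sketch, not from the dataset).
faithful = [[0.95, 0.90, 0.88, 0.97], [0.85, 0.92, 0.99], [0.90, 0.80, 0.93, 0.96]]
hallucinated = [[0.90, 0.20, 0.35, 0.80], [0.40, 0.15, 0.60], [0.70, 0.10, 0.50, 0.30]]

X = [ntp_features(seq) for seq in faithful + hallucinated]
y = [0] * len(faithful) + [1] * len(hallucinated)  # 1 = hallucinated
w, b = fit_logistic_regression(X, y)

score_low = predict_proba(w, b, ntp_features([0.80, 0.12, 0.30]))
score_high = predict_proba(w, b, ntp_features([0.96, 0.91, 0.94]))
print(f"low-NTP statement:  {score_low:.2f}")
print(f"high-NTP statement: {score_high:.2f}")
```

Under the same assumptions, linguistic NTPs could be summarized with the same `ntp_features` helper and concatenated to the feature vector, and a VLM's hallucination score could be appended as one more feature before fitting.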

Paper Structure

This paper contains 17 sections, 1 equation, 8 figures, 1 table.

Figures (8)

  • Figure 1: Illustration of our method: Description NTPs are extracted during the VLM’s text generation process. Linguistic NTPs require an additional forward pass using only the generated text. Statistical features are then computed from the NTPs, and a lightweight traditional ML model uses these features to detect hallucinations.
  • Figure 2: An example of the data features.
  • Figure 3: AUC-ROC performance of traditional ML models using statistical features of NTPs and various $\mathbf{Pred}$ features. Each bar group corresponds to a specific feature combination, while the dashed lines denote the LLaVA and PaliGemma baselines. Error bars indicate 95% confidence intervals.
  • Figure 4:
  • Figure 5: Leave-one-out ablation study on our features. Excluding (left) and including (right) LLaVA predictions.
  • ...and 3 more figures