Leveraging NTPs for Efficient Hallucination Detection in VLMs
Ofir Azachi, Kfir Eliyahu, Eyal El Ani, Rom Himelstein, Roi Reichart, Yuval Pinter, Nitay Calderon
TL;DR
This work tackles hallucination detection in vision-language models (VLMs) by avoiding costly external predictors and instead leveraging next-token probabilities (NTPs) collected during generation. It introduces Description NTPs and Linguistic NTPs, derives both raw and statistical features from them, and trains lightweight ML models that can operate on-the-fly. The approach is evaluated on a dataset of 1,400 probes from 350 images. Results show that statistical NTP features are cheaper to compute than predictor-based signals while approaching their accuracy, and that combining NTP features with predictor outputs yields the strongest performance. The study demonstrates a practical, scalable path to improving VLM reliability and highlights how linguistic biases in generation influence hallucination signals, pointing to future directions in uncertainty-aware generation and feature integration.
Abstract
Hallucinations of vision-language models (VLMs), which are misalignments between visual content and generated text, undermine the reliability of VLMs. One common approach for detecting them employs the same VLM, or a different one, to assess generated outputs. This process is computationally intensive and increases model latency. In this paper, we explore an efficient on-the-fly method for hallucination detection by training traditional ML models over signals based on the VLM's next-token probabilities (NTPs). NTPs provide a direct quantification of model uncertainty. We hypothesize that high uncertainty (i.e., a low NTP value) is strongly associated with hallucinations. To test this, we introduce a dataset of 1,400 human-annotated statements derived from VLM-generated content, each labeled as hallucinated or not, and use it to test our NTP-based lightweight method. Our results demonstrate that NTP-based features are valuable predictors of hallucinations, enabling fast and simple ML models to achieve performance comparable to that of strong VLMs. Furthermore, augmenting these NTPs with linguistic NTPs, computed by feeding only the generated text back into the VLM, enhances hallucination detection performance. Finally, integrating hallucination prediction scores from VLMs into the NTP-based models yields better performance than using either VLMs or NTPs alone. We hope this study paves the way for simple, lightweight solutions that enhance the reliability of VLMs.
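As a rough illustration of how NTP sequences might be turned into inputs for a traditional ML model, the sketch below summarizes a description NTP sequence and a linguistic NTP sequence with a few simple statistics. The function names, the chosen statistics, and the contrast feature are assumptions made for illustration, not the feature set used in the paper. The probabilities are assumed to have been collected already, during generation and from a text-only pass respectively.

```python
import math
import statistics
from typing import Dict, List


def ntp_statistical_features(ntps: List[float], prefix: str) -> Dict[str, float]:
    """Summarize a sequence of next-token probabilities, each in (0, 1].

    The statistics chosen here are illustrative assumptions, not the
    paper's exact feature set.
    """
    if not ntps:
        raise ValueError("ntps must contain at least one probability")
    log_probs = [math.log(p) for p in ntps]
    return {
        f"{prefix}_min": min(ntps),  # lowest-confidence token
        f"{prefix}_max": max(ntps),
        f"{prefix}_mean": statistics.fmean(ntps),
        f"{prefix}_std": statistics.pstdev(ntps),
        f"{prefix}_mean_logprob": statistics.fmean(log_probs),
    }


def build_feature_vector(
    description_ntps: List[float], linguistic_ntps: List[float]
) -> Dict[str, float]:
    """Combine features from image-conditioned and text-only NTPs."""
    features = ntp_statistical_features(description_ntps, "desc")
    features.update(ntp_statistical_features(linguistic_ntps, "ling"))
    # Hypothetical contrast feature: how much more confident the model is
    # with the image than from the generated text alone.
    features["desc_minus_ling_mean"] = (
        features["desc_mean"] - features["ling_mean"]
    )
    return features


if __name__ == "__main__":
    # Made-up probabilities for one statement; not real model output.
    example = build_feature_vector([0.9, 0.8, 0.2, 0.7], [0.5, 0.5, 0.5, 0.5])
    for name, value in example.items():
        print(f"{name}: {value:.3f}")
```

A feature dictionary like this could then be passed, together with a hallucinated-or-not label, to any lightweight classifier. A hallucination score from a VLM predictor could be appended as one more feature to mirror the combined setting described above.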
