Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -

Laura Fieback, Nishilkumar Balar, Jakob Spiegelberg, Hanno Gottschalk

TL;DR

Large Vision Language Models (LVLMs) still produce hallucinations, that is, responses that do not align with the visual input. This work introduces Efficient Contrastive Decoding (ECD), a decoding approach that requires no additional LVLM training: a lightweight probabilistic hallucination detector, built from intermediate LVLM features, penalizes likely hallucinations during generation. ECD contrasts the detector's hallucination scores with the LVLM's token probabilities to form the decoding distribution, and an Adaptive Plausibility Constraint further refines outputs. Across the CHAIR, AMBER, POPE, and MME benchmarks on LLaVA 1.5, InstructBLIP, and MiniGPT-4, ECD consistently reduces hallucinations while delivering favorable time efficiency compared to prior contrastive decoding methods, which makes it a practical option where reliable vision-language reasoning is critical.

Abstract

Despite recent advances in Large Vision Language Models (LVLMs), these models still suffer from generating hallucinatory responses that do not align with the visual input provided. To mitigate such hallucinations, we introduce Efficient Contrastive Decoding (ECD), a simple method that leverages probabilistic hallucination detection to shift the output distribution towards contextually accurate answers at inference time. By contrasting token probabilities and hallucination scores, ECD subtracts hallucinated concepts from the original distribution, effectively suppressing hallucinations. Notably, our proposed method can be applied to any open-source LVLM and does not require additional LVLM training. We evaluate our method on several benchmark datasets and across different LVLMs. Our experiments show that ECD effectively mitigates hallucinations, outperforming state-of-the-art methods with respect to performance on LVLM benchmarks and computation time.
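
To make the idea of contrasting token probabilities with hallucination scores concrete, the following is a minimal sketch of a single decoding step. It is not the paper's exact formulation: the contrast rule `(1 + alpha) * log p - alpha * log s`, the plausibility threshold `beta`, the function name `ecd_step`, and the assumption that the detector returns one score in (0, 1) per candidate token are all illustrative choices borrowed from common contrastive decoding setups.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array (tolerates -inf entries)."""
    shifted = x - np.max(x)
    return shifted - np.log(np.sum(np.exp(shifted)))


def ecd_step(logits, halluc_scores, alpha=1.0, beta=0.1):
    """Illustrative contrastive adjustment for one decoding step.

    logits        : next-token logits from the LVLM, shape (V,)
    halluc_scores : hypothetical detector outputs in (0, 1), shape (V,);
                    higher means "more likely hallucinated" (an assumption)
    alpha         : contrast strength (assumed role of the paper's alpha)
    beta          : plausibility cutoff relative to the top token (assumed)
    """
    log_p = log_softmax(logits)

    # Assumed contrast rule: boost the LVLM log-probabilities and subtract
    # the log hallucination score, so high-score tokens are penalized.
    eps = 1e-12
    contrast = (1.0 + alpha) * log_p - alpha * np.log(halluc_scores + eps)

    # Plausibility constraint: only keep tokens whose original probability
    # is at least beta times the probability of the most likely token.
    keep = log_p >= np.log(beta) + np.max(log_p)
    contrast = np.where(keep, contrast, -np.inf)

    # Renormalize into a distribution over the surviving tokens.
    return np.exp(log_softmax(contrast))


# Toy example: token 0 is slightly preferred by the LVLM but flagged by the
# detector; token 2 is implausible and is removed by the constraint.
logits = np.array([2.0, 1.9, -3.0])
scores = np.array([0.9, 0.1, 0.1])
probs = ecd_step(logits, scores)
print(probs, int(np.argmax(probs)))
```

In this toy case greedy decoding on the raw logits would pick token 0, while the adjusted distribution prefers token 1 because its hallucination score is much lower. The plausibility cutoff keeps the contrast from promoting tokens the LVLM itself considers very unlikely.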

Paper Structure

This paper contains 38 sections, 9 equations, 2 figures, 10 tables.

Figures (2)

  • Figure 1: Visualization of (a) log probability values and (b) hallucination scores for true and hallucinated tokens.
  • Figure 2: Ablation study for hyperparameter $\alpha$.