Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -
Laura Fieback, Nishilkumar Balar, Jakob Spiegelberg, Hanno Gottschalk
TL;DR
Large Vision Language Models (LVLMs) still suffer from hallucinations, i.e., generated content that is inconsistent with the visual input. This work introduces Efficient Contrastive Decoding (ECD), a training-free decoding method. ECD uses a lightweight probabilistic hallucination detector, built on intermediate LVLM features, to penalize likely hallucinations during generation. A discrimination mechanism combines the detector output with the LVLM's token probabilities to form a contrastive decoding distribution, and an Adaptive Plausibility Constraint further refines the output. Across the CHAIR, AMBER, POPE, and MME benchmarks on LLaVA 1.5, InstructBLIP, and MiniGPT-4, ECD consistently reduces hallucinations and is more time-efficient than prior contrastive decoding methods. This offers a practical path to more reliable LVLMs without additional training. These results suggest the method is well suited to real-world deployments where reliable vision-language reasoning is critical.
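The detector is described only as lightweight, probabilistic, and built from intermediate LVLM features. The following is a minimal sketch of such a detector, assuming a logistic-regression classifier over per-token feature vectors. The class `HallucinationDetector`, the choice of features, and the training loop are hypothetical illustrations, not the authors' implementation. The sketch maps a feature vector to a hallucination probability in [0, 1].

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HallucinationDetector:
    """Hypothetical lightweight probabilistic detector: logistic regression
    over per-token feature vectors (assumed to be derived from intermediate
    LVLM activations; the actual features used by ECD are not specified here)."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        # Estimated probability that the token described by x is hallucinated
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return sigmoid(z)

    def fit(self, X: List[Sequence[float]], y: List[int],
            lr: float = 0.5, epochs: int = 500) -> "HallucinationDetector":
        # Full-batch gradient descent on the mean binary cross-entropy loss
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # dLoss/dz for BCE
                for j, xij in enumerate(xi):
                    grad_w[j] += err * xij
                grad_b += err
            self.w = [wj - lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
        return self


# Toy usage with made-up two-dimensional features (label 1 = hallucinated)
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 0, 0]
detector = HallucinationDetector(n_features=2).fit(X, y)
p_flagged = detector.predict_proba([0.85, 0.15])
p_grounded = detector.predict_proba([0.15, 0.85])
```

Because the detector outputs a probability rather than a hard label, its score can be combined directly with the LVLM's token probabilities at decoding time. A linear model also keeps the per-token overhead small, which is consistent with the efficiency goal stated above.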
Abstract
Despite recent advances in Large Vision Language Models (LVLMs), these models still generate hallucinated responses that do not align with the provided visual input. To mitigate such hallucinations, we introduce Efficient Contrastive Decoding (ECD), a simple method that leverages probabilistic hallucination detection to shift the output distribution towards contextually accurate answers at inference time. By contrasting token probabilities with hallucination scores, ECD subtracts hallucinated concepts from the original distribution and thereby suppresses hallucinations. Notably, ECD can be applied to any open-source LVLM and does not require additional LVLM training. We evaluate our method on several benchmark datasets and across different LVLMs. Our experiments show that ECD effectively mitigates hallucinations and outperforms state-of-the-art methods in both benchmark performance and computation time.
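The contrast between token probabilities and hallucination scores can be sketched for a single decoding step. This illustration rests on several assumptions. The function `contrastive_step`, the penalty form log p(y) − α·h(y) (with h(y) the detector's hallucination probability), and the parameters `alpha` and `beta` are all hypothetical. The plausibility rule shown, which keeps only tokens with p(y) ≥ β · max p, is the form commonly used in contrastive decoding and is not necessarily the exact constraint used by ECD.

```python
import math
from typing import Dict


def contrastive_step(
    log_probs: Dict[str, float],
    halluc_probs: Dict[str, float],
    alpha: float = 1.0,
    beta: float = 0.1,
) -> Dict[str, float]:
    """Hypothetical single decoding step: penalize tokens flagged by a
    hallucination detector, restricted to an adaptive plausibility set."""
    # Adaptive plausibility constraint (assumed form):
    # keep tokens with p(y) >= beta * max_y p(y), computed in log space
    max_lp = max(log_probs.values())
    cutoff = math.log(beta) + max_lp
    plausible = {t: lp for t, lp in log_probs.items() if lp >= cutoff}

    # Contrastive score (assumed form): subtract the scaled hallucination score
    scores = {t: lp - alpha * halluc_probs.get(t, 0.0) for t, lp in plausible.items()}

    # Renormalize with a softmax over the plausible set (max-shift for stability)
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


# Toy usage with made-up numbers: "dog" is likely but flagged as hallucinated
log_probs = {"dog": math.log(0.5), "cat": math.log(0.45), "car": math.log(0.05)}
halluc_probs = {"dog": 0.9, "cat": 0.1, "car": 0.0}
dist = contrastive_step(log_probs, halluc_probs, alpha=1.0, beta=0.2)
next_token = max(dist, key=dist.get)
```

In this toy case, "car" falls below the plausibility cutoff and is dropped. The detector's penalty then shifts the greedy choice from "dog" to "cat". The plausibility step matters because subtracting scores alone could otherwise promote tokens that the LVLM itself considers very unlikely. A larger `alpha` penalizes flagged tokens more strongly.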
