HII-DPO: Eliminate Hallucination via Accurate Hallucination-Inducing Counterfactual Images
Yilin Yang, Zhenghui Guo, Yuke Wang, Omprakash Gnawali, Sheng Di, Chengming Zhang
TL;DR
The paper tackles hallucinations driven by language priors in Vision-Language Models (VLMs) with a three-part framework: generating Hallucination-Inducing Images (HIIs), constructing the Masked-Object-Hallucination (MOH) benchmark to quantify scene-conditioned hallucinations, and applying Direct Preference Optimization (DPO) on HIIs to strengthen the models' visual grounding. HIIs are created through an object-detection and iterative-masking pipeline, filtered by model-specific DDG responses, and used to build fine-grained preference data that focuses on hallucinated sentences. The reported results show state-of-the-art reductions in hallucination across multiple benchmarks and model scales (up to a 38% improvement on standard hallucination benchmarks and up to a 92% reduction in hallucination rate on some tasks) while preserving general VQA capabilities. The approach provides a diagnostic tool (MOH) and a scalable alignment strategy (HII-DPO) for mitigating linguistic priors in multimodal systems, which matters for deploying VLMs in safety-critical domains.
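To make the DPO step mentioned above concrete, here is a minimal sketch of the standard sequence-level DPO loss for one preference pair. It assumes the summed log-probabilities of the preferred (grounded) and rejected (hallucinated) responses are already available from the policy and a frozen reference model. The function name, argument names, and the default `beta` are illustrative assumptions; the summary does not say how HII-DPO restricts or weights the loss over hallucinated sentences, so that part is not modeled here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response.
    beta=0.1 is an illustrative default, not a value from the paper.
    """
    # Implicit rewards: how much more the policy likes a response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference has zero margin, so the loss is log(2).
baseline = dpo_loss(0.0, 0.0, 0.0, 0.0)
# A policy that favors the grounded response more than the reference does
# gets a lower loss than the baseline.
improved = dpo_loss(-1.0, -5.0, -2.0, -2.0)
```

The loss falls as the policy assigns relatively more probability to the grounded response than the reference model does, which is the alignment pressure that preference data built from HIIs would supply.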
Abstract
Large Vision-Language Models (VLMs) have achieved remarkable success across diverse multimodal tasks but remain vulnerable to hallucinations rooted in inherent language bias. Despite recent progress, existing hallucination mitigation methods often overlook the underlying hallucination patterns driven by language bias. In this work, we design a novel pipeline to accurately synthesize Hallucination-Inducing Images (HIIs). Using synthesized HIIs, we reveal a consistent scene-conditioned hallucination pattern: models tend to mention objects that are highly typical of the scene even when visual evidence is removed. To quantify the susceptibility of VLMs to this hallucination pattern, we establish the Masked-Object-Hallucination (MOH) benchmark to rigorously evaluate existing state-of-the-art alignment frameworks. Finally, we leverage HIIs to construct high-quality preference datasets for fine-grained alignment. Experimental results demonstrate that our approach effectively mitigates hallucinations while preserving general model capabilities. Specifically, our method achieves up to a 38% improvement over the current state-of-the-art on standard hallucination benchmarks.
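As an illustration of what a masked-object hallucination rate could measure, the sketch below counts how often a model's description still mentions an object that has been masked out of the image. This is an assumed, simplified metric, not the MOH benchmark's actual protocol, which the abstract does not specify; the function names, the `(masked_object, response)` sample format, and the word-matching rule are all hypothetical.

```python
import re


def mentions(response, obj):
    """Return True if `response` names `obj` as a whole word.

    Matching is case-insensitive and allows a simple plural "s".
    This is a crude stand-in for whatever matching a real benchmark uses.
    """
    pattern = r"\b" + re.escape(obj.lower()) + r"s?\b"
    return re.search(pattern, response.lower()) is not None


def hallucination_rate(samples):
    """Fraction of samples whose response mentions the masked object.

    `samples` is a list of (masked_object, response) pairs, where the
    response was generated for an image with that object removed.
    """
    if not samples:
        return 0.0
    hits = sum(mentions(response, obj) for obj, response in samples)
    return hits / len(samples)


# Toy data: the object in each pair is assumed to be masked in the image.
samples = [
    ("fork", "A plate with a fork and a knife on a table."),
    ("fork", "An empty wooden table."),
    ("cup", "Two cups sit next to a laptop."),
    ("car", "A patterned carpet in a hallway."),
]
rate = hallucination_rate(samples)  # 2 of 4 responses name the masked object
```

Under this toy definition, a lower rate means the model relies less on scene-typical expectations once the visual evidence for an object is gone.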
