Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Fangxun Shu, Hao Jiang, Linchao Zhu

TL;DR

This work tackles hallucination in Large Vision Language Models by introducing a fine-grained AI feedback framework that yields sentence-level, type-specific, and severity-aware signals. It builds a detect-then-rewrite pipeline to automatically construct a scalable preference dataset and trains a hallucination detector to drive a severity-aware Direct Preference Optimization (HSA-DPO) for mitigation. The approach achieves state-of-the-art results on detection benchmarks and delivers substantial reductions in hallucinations on multiple LVLM benchmarks, while reducing annotation cost via scalable oversight. Overall, the method demonstrates effective, cost-efficient, and scalable mitigation of LVLM hallucinations with preserved multimodal capabilities.

Abstract

Rapidly developing Large Vision Language Models (LVLMs) have shown notable capabilities on a range of multi-modal tasks, but they still suffer from hallucination, in which the generated text does not align with the given context, and this significantly restricts the use of LVLMs. Most previous work detects and mitigates hallucination at a coarse-grained level or requires expensive annotation (e.g., labeling by proprietary models or human experts). To address these issues, we propose detecting and mitigating hallucinations in LVLMs via fine-grained AI feedback. The basic idea is to generate a small sentence-level hallucination annotation dataset with proprietary models and use it to train a hallucination detection model that performs sentence-level hallucination detection, covering the primary hallucination types (i.e., object, attribute, and relationship). We then propose a detect-then-rewrite pipeline that automatically constructs a preference dataset for training a hallucination-mitigating model. Furthermore, we propose differentiating the severity of hallucinations and introduce Hallucination Severity-Aware Direct Preference Optimization (HSA-DPO), which mitigates hallucination in LVLMs by incorporating the severity of hallucinations into preference learning. Extensive experiments demonstrate the effectiveness of our method.
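
The detect-then-rewrite idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Finding`, `PreferencePair`, `build_preference_pair`, `toy_detect`, and `toy_rewrite`, the numeric severity scale, and the use of a string as the image handle are all assumptions made for illustration. The toy detector and rewriter stand in for the trained detection model and the rewriting model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hallucination types named in the abstract.
HALLUCINATION_TYPES = ("object", "attribute", "relationship")


@dataclass
class Finding:
    """One sentence-level detection result (hypothetical structure)."""
    sentence: str
    hallucination_type: str  # one of HALLUCINATION_TYPES
    severity: float          # assumed scale: higher means more severe


@dataclass
class PreferencePair:
    """A preference example: rewritten response preferred over the original."""
    image: str               # assumed image handle, e.g. a file path
    chosen: str              # rewritten response
    rejected: str            # original response containing hallucinations
    severities: List[float]  # kept so a severity-aware objective could use them


def build_preference_pair(
    image: str,
    response: str,
    detect: Callable[[str, str], List[Finding]],
    rewrite: Callable[[str, str, List[Finding]], str],
) -> Optional[PreferencePair]:
    """Detect hallucinated sentences, then rewrite them to form a pair."""
    findings = detect(image, response)
    if not findings:
        return None  # nothing detected, so no preference pair is produced
    chosen = rewrite(image, response, findings)
    return PreferencePair(image, chosen, response, [f.severity for f in findings])


def split_sentences(text: str) -> List[str]:
    """Naive period-based sentence splitter, sufficient for the toy example."""
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_detect(image: str, response: str) -> List[Finding]:
    """Toy stand-in for a detector: flags any sentence mentioning 'dog'."""
    return [Finding(s, "object", 3.0) for s in split_sentences(response) if "dog" in s]


def toy_rewrite(image: str, response: str, findings: List[Finding]) -> str:
    """Toy stand-in for a rewriter: drops the flagged sentences."""
    flagged = {f.sentence for f in findings}
    kept = [s for s in split_sentences(response) if s not in flagged]
    return ". ".join(kept) + "."
```

Calling `build_preference_pair("img_001.jpg", "A man rides a bike. A dog runs beside him.", toy_detect, toy_rewrite)` would yield a pair whose rejected response is the original text and whose chosen response omits the flagged sentence. A real rewriter would correct the hallucinated content rather than delete it.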

Paper Structure

This paper contains 28 sections, 4 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Comparison of HSA-DPO (red) with state-of-the-art models (Silkie, GPT-4V, RLHF-V) at mitigating hallucinations on the Object HalBench and AMBER benchmarks. HSA-DPO outperforms these models on all metrics.
  • Figure 2: Our work consists of three components: (1) fine-grained hallucination detection from GPT-4/GPT-4V; (2) a hallucination detection model for the detect-then-rewrite preference dataset construction pipeline; and (3) hallucination severity-aware direct preference optimization.
  • Figure 3: Effect of scaling the preference dataset (panel A) and of different hallucination types (panel B).
  • Figure 4: Qualitative results of different models on VCR and DDG. Correct answers and factual hallucinations are highlighted in red and green, respectively.