Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention

Zhe Xu, Zhicai Wang, Junkang Wu, Jinda Lu, Xiang Wang

TL;DR

This work targets object hallucination in Large Vision-Language Models (LVLMs) by formalizing spurious co-occurrence biases through a Structural Causal Model (SCM) and introducing Visual Content Intervention (VCI) to generate counterfactuals. It presents Causal-HalBench, a scalable benchmark whose counterfactual samples are produced by an automated data pipeline based on image inpainting, and which quantifies causal effects via the ACE, DCS, CAC, AAC, and CHR metrics. Across multiple LVLMs, the study reveals pervasive susceptibility to spurious correlations, with model behavior varying by architecture and scale, and suggests that newer models may exhibit increased bias in some cases. The framework and benchmark offer a concrete causal-analysis toolset for evaluating object hallucination in LVLMs and guiding its mitigation, with implications for more faithful multimodal reasoning in real-world applications.
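
ACE is listed among the metrics but not defined on this page. As a hedged illustration only, the sketch below assumes the textbook reading of an average causal effect, E[Y | do(C=1)] - E[Y | do(C=0)], estimated from paired answers: Y is a model's "yes" to a presence question about a fixed target object, and C indicates whether a co-occurring context object is present (original image) or inpainted out (counterfactual image). `PairedAnswer` and `average_causal_effect` are hypothetical names, and the paper's exact ACE formula may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairedAnswer:
    """Hypothetical record of one counterfactual pair for a fixed target object."""
    original_yes: bool        # model said "yes" with the context object present
    counterfactual_yes: bool  # model said "yes" after the context object was inpainted out


def average_causal_effect(pairs: List[PairedAnswer]) -> float:
    """Estimate E[Y | do(C=1)] - E[Y | do(C=0)] as a difference of "yes" rates.

    Treats the original image as do(C=1) and its inpainted counterfactual as
    do(C=0); this is an assumed reading, not the paper's published formula.
    """
    if not pairs:
        raise ValueError("need at least one paired sample")
    rate_with = sum(p.original_yes for p in pairs) / len(pairs)
    rate_without = sum(p.counterfactual_yes for p in pairs) / len(pairs)
    return rate_with - rate_without


# Toy answers, invented for illustration.
pairs = [
    PairedAnswer(original_yes=True, counterfactual_yes=True),
    PairedAnswer(original_yes=True, counterfactual_yes=False),
    PairedAnswer(original_yes=True, counterfactual_yes=False),
    PairedAnswer(original_yes=False, counterfactual_yes=False),
]
print(f"ACE = {average_causal_effect(pairs):.2f}")  # ACE = 0.50
```

Under this assumed reading, a positive value means the model answers "yes" more often when the context object is present, i.e. its judgment about the target leans on visual context rather than on the target alone.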

Abstract

Large Vision-Language Models (LVLMs) often suffer from object hallucination, making erroneous judgments about the presence of objects in images. We propose that this primarily stems from spurious correlations arising when models strongly associate highly co-occurring objects during training, leading to hallucinated objects influenced by visual context. Current benchmarks mainly focus on hallucination detection but lack a formal characterization and quantitative evaluation of spurious correlations in LVLMs. To address this, we introduce causal analysis into the object recognition scenario of LVLMs, establishing a Structural Causal Model (SCM). Utilizing the language of causality, we formally define spurious correlations arising from co-occurrence bias. To quantify the influence induced by these spurious correlations, we develop Causal-HalBench, a benchmark specifically constructed with counterfactual samples and integrated with comprehensive causal metrics designed to assess model robustness against spurious correlations. Concurrently, we propose an extensible pipeline for the construction of these counterfactual samples, leveraging the capabilities of proprietary LVLMs and Text-to-Image (T2I) models for their generation. Our evaluations on mainstream LVLMs using Causal-HalBench demonstrate that these models exhibit susceptibility to spurious correlations, albeit to varying extents.
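
To make "spurious correlation arising from co-occurrence bias" concrete, here is a minimal sketch of a generic confounded SCM, assumed for illustration and not taken from the paper's Figure 2: a scene type S causes both a context object A and a target object B, and A has no causal effect on B. Observational co-occurrence statistics then show a high P(B=1 | A=1), while the interventional P(B=1 | do(A=1)), computed here by backdoor adjustment over S, reveals that setting A changes nothing. All probabilities are invented toy values, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical toy SCM (illustrative, not the paper's SCM):
#   S -> A   (scene type causes the context object to appear)
#   S -> B   (scene type causes the target object to appear)
# There is no edge A -> B, so any A-B association is spurious.
P_S = 0.5                         # P(S=1)
P_A_GIVEN_S = {1: 0.9, 0: 0.1}    # P(A=1 | S=s)
P_B_GIVEN_S = {1: 0.8, 0: 0.05}   # P(B=1 | S=s)


def bern(p: float, x: int) -> float:
    """Probability that a Bernoulli(p) variable equals x (0 or 1)."""
    return p if x == 1 else 1.0 - p


def joint(s: int, a: int, b: int) -> float:
    """Observational P(S=s, A=a, B=b), factorized along the SCM's edges."""
    return bern(P_S, s) * bern(P_A_GIVEN_S[s], a) * bern(P_B_GIVEN_S[s], b)


def p_b_given_a(a: int) -> float:
    """Observational P(B=1 | A=a): what raw co-occurrence statistics report."""
    num = sum(joint(s, a, 1) for s in (0, 1))
    den = sum(joint(s, a, b) for s, b in product((0, 1), repeat=2))
    return num / den


def p_b_do_a(a: int) -> float:
    """Interventional P(B=1 | do(A=a)) via backdoor adjustment over S:
    sum_s P(S=s) * P(B=1 | A=a, S=s)."""
    total = 0.0
    for s in (0, 1):
        p_b1 = joint(s, a, 1) / (joint(s, a, 0) + joint(s, a, 1))
        total += bern(P_S, s) * p_b1
    return total


print(f"P(B=1 | A=1)     = {p_b_given_a(1):.3f}")  # 0.725
print(f"P(B=1 | do(A=1)) = {p_b_do_a(1):.3f}")     # 0.425
```

Editing image content, as the benchmark's counterfactual samples do, loosely plays the role of the do() operator here: a model whose belief about B tracks the observational 0.725 rather than the interventional 0.425 is relying on co-occurrence instead of visual evidence for B.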

Paper Structure

This paper contains 14 sections, 6 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of Causal-HalBench. We present a comparison between current hallucination benchmarks and our Causal-HalBench for Object Hallucination in LVLMs.
  • Figure 2: Illustration of SCM. We present the original SCM of spurious correlation (left), contrasted with the SCM derived via Visual Content Intervention (VCI) (right).
  • Figure 3: The Data Construction Pipeline of Causal-HalBench. We propose a three-stage, fully automated, and scalable pipeline for constructing counterfactual samples in Causal-HalBench.
  • Figure 4: Visualization of Co-occurrence Patterns. We present the heatmap of the co-occurrence matrix for original images (left), contrasted with the heatmap of the co-occurrence matrix for modified images (right).
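
Figure 4 compares co-occurrence heatmaps for original and modified images. A minimal sketch of how such a matrix can be counted from per-image object annotations follows; the label sets are made-up toy data, `cooccurrence_matrix` is a hypothetical helper, and the paper may normalize or filter the counts differently. Entry (i, j) counts the images that contain both categories, and the diagonal counts the images containing each category.

```python
from itertools import combinations
from typing import List, Set, Tuple


def cooccurrence_matrix(images: List[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return sorted category labels and a symmetric count matrix.

    counts[i][j] (i != j) = number of images containing both labels[i] and labels[j];
    counts[i][i] = number of images containing labels[i].
    """
    labels = sorted(set().union(*images))
    index = {name: i for i, name in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for objects in images:
        for name in objects:
            counts[index[name]][index[name]] += 1
        for a, b in combinations(sorted(objects), 2):
            i, j = index[a], index[b]
            counts[i][j] += 1
            counts[j][i] += 1
    return labels, counts


# Toy annotations: "modified" images have the context object inpainted out.
original = [{"fork", "knife", "dining table"}, {"fork", "dining table"}, {"knife", "dining table"}]
modified = [{"fork", "knife"}, {"fork"}, {"dining table"}]

orig_labels, orig_counts = cooccurrence_matrix(original)
mod_labels, mod_counts = cooccurrence_matrix(modified)
print(orig_labels)  # ['dining table', 'fork', 'knife']
print(orig_counts)  # [[3, 2, 2], [2, 2, 1], [2, 1, 2]]
print(mod_counts)   # [[1, 0, 0], [0, 2, 1], [0, 1, 1]]
```

Plotting either matrix as a heatmap (for example with matplotlib's `imshow`) gives a picture in the spirit of Figure 4; in this toy example, removing the dining table from images that also contain cutlery zeroes out its off-diagonal entries.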