Causal Decoding for Hallucination-Resistant Multimodal Large Language Models

Shiwei Tan, Hengyi Wang, Weiyi Qin, Qi Xu, Zhigang Hua, Hao Wang

TL;DR

This work proposes a causal decoding framework that applies targeted causal interventions during generation to curb spurious object mentions. It substantially lowers object-hallucination rates and achieves state-of-the-art faithfulness without degrading overall output quality.

Abstract

Multimodal Large Language Models (MLLMs) deliver detailed responses on vision-language tasks, yet remain susceptible to object hallucination (introducing objects not present in the image), undermining reliability in practice. Prior efforts often rely on heuristic penalties, post-hoc correction, or generic decoding tweaks, which do not directly intervene in the mechanisms that trigger object hallucination and thus yield limited gains. To address this challenge, we propose a causal decoding framework that applies targeted causal interventions during generation to curb spurious object mentions. By reshaping the decoding dynamics to attenuate spurious dependencies, our approach reduces false object tokens while maintaining descriptive quality. Across captioning and QA benchmarks, our framework substantially lowers object-hallucination rates and achieves state-of-the-art faithfulness without degrading overall output quality.

Paper Structure

This paper contains 27 sections, 11 equations, 15 figures, and 5 tables.

Figures (15)

  • Figure 1: Simplified causal graphs for typical MLLMs and our COAD. Left: Typical MLLMs implicitly hallucinate objects (e.g., "fork") in the hidden states ${\bf z}$ due to previously generated text ${\bf x}$ (e.g., "knife"). Right: Our COAD performs causal inference to remove links between the hidden states ${\bf z}$ and generated text ${\bf x}$, thereby avoiding hallucination.
  • Figure 2: Illustration of confounding. (a) $z$ induces a spurious association between $x$ and $y$ even without a causal effect. (b) Adding $x \rightarrow y$ introduces a genuine causal effect, but $P(y | x)$ remains confounded by $z$.
  • Figure 3: Overview of our COAD. We employ an object detector to identify the objects present in an image. The MLLM is then finetuned to condition its token predictions on both these detected objects and the input image. COAD subsequently uses causal inference to combine the output distributions of both the pretrained and finetuned MLLMs to generate the final prediction.
  • Figure 4: Rolled-out causal structures of the MLLM decoding process. (a) Full temporal (rolled-out) causal graph over decoding timesteps: the image ${\bf S}$ and initial prompt ${\bf x}^{(0)}$ are fixed, ${\bf z}$ denotes the image-derived object-belief variable (constant over time), and ${\bf x}^{(t)}$, $y^{(t)}$ evolve according to ${\bf x}^{(t)}, {\bf S}, {\bf z} \rightarrow y^{(t)}$ and ${\bf x}^{(t)}, y^{(t)} \rightarrow {\bf x}^{(t+1)}$. (b) Collapsed version of (a) obtained by treating $y^{(t)}$ as an implicit step in the autoregressive update, so that each ${\bf x}^{(t)}$ ($t>0$) is jointly determined by ${\bf S}$, ${\bf z}$, and ${\bf x}^{(t-1)}$. (c) Time-compressed representation in which the accumulated influence of all previous timesteps on ${\bf x}^{(t)}$ is summarized by dashed edges from $({\bf S}, {\bf z})$ to ${\bf x}^{(t)}$.
  • Figure 5: Illustration of our COAD's causal model before and after intervention.
  • ...and 10 more figures
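
The confounding pattern in the Figure 2 caption can be made concrete with a small numerical sketch. The code below is not COAD and is not taken from the paper: it is a generic illustration of the standard backdoor adjustment, $P(y \mid do(x)) = \sum_z P(y \mid x, z) P(z)$, on binary variables. All probability tables and the function names `observational` and `interventional` are hypothetical, chosen only so that $y$ depends on $z$ alone, as in panel (a).

```python
import numpy as np

# Hypothetical toy distributions (assumed for illustration, not from the paper).
# All variables are binary.
p_z = np.array([0.7, 0.3])              # P(z)
p_x_given_z = np.array([[0.9, 0.1],     # P(x | z=0)
                        [0.2, 0.8]])    # P(x | z=1)
# P(y=1 | x, z), indexed [x, z]. Both rows are equal, so y depends on z only:
# there is no causal effect of x on y, as in Figure 2(a).
p_y1_given_xz = np.array([[0.1, 0.9],
                          [0.1, 0.9]])


def observational(x: int) -> float:
    """P(y=1 | x): averages over z using the posterior P(z | x)."""
    joint_zx = p_z * p_x_given_z[:, x]          # P(z, x) for each z
    p_z_given_x = joint_zx / joint_zx.sum()     # Bayes' rule
    return float(p_y1_given_xz[x] @ p_z_given_x)


def interventional(x: int) -> float:
    """P(y=1 | do(x)): backdoor adjustment, averages over z using the prior P(z)."""
    return float(p_y1_given_xz[x] @ p_z)


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: P(y=1|x)={observational(x):.3f}, "
              f"P(y=1|do(x))={interventional(x):.3f}")
```

In this toy setting the observational quantity changes with $x$ even though $x$ has no causal effect on $y$, while the adjusted quantity is identical for both values of $x$. That gap is the spurious association the caption attributes to $z$.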