TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention

Jinhao Duan, Fei Kong, Hao Cheng, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

TL;DR

The paper tackles object hallucination (OH) in LVLMs by showing that internal hidden states carry per-token truthfulness signals that transfer across models. It introduces TruthPrInt, a two-stage decoding framework that learns a truthful direction in latent space and then performs truthful-guided pre-intervention, and ComnHallu, a subspace-alignment method that improves the cross-model and cross-data transfer of hallucination detection. In extensive experiments on CHAIR, POPE, and LLaVA-Bench across multiple LVLMs, TruthPrInt achieves state-of-the-art OH mitigation while maintaining or improving caption quality and efficiency. The work offers practical mechanisms for real-time OH mitigation and highlights the value of internal representations for trustworthy multimodal AI.
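A per-token hallucination detector of the kind described above can be sketched as a linear probe over hidden states. This is a minimal sketch under our own assumptions, not the TruthPrInt code: one hidden-state vector per generated object token, binary labels (1 = hallucinated), a plain logistic-regression probe, and a threshold chosen on truthful validation tokens so that the false-positive rate is roughly α, mirroring the high-specificity regime reported in Figure 2. The names `train_probe`, `probe_scores`, and `threshold_at_fpr` are hypothetical, and the data is synthetic.

```python
import numpy as np


def train_probe(H, y, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe on hidden states (hypothetical helper).

    H: (n_tokens, hidden_dim) hidden states; y: (n_tokens,) labels, 1 = hallucinated.
    Uses plain full-batch gradient descent; returns weights w and bias b.
    """
    n, dim = H.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted P(hallucinated)
        w -= lr * (H.T @ (p - y) / n + l2 * w)  # gradient of mean log-loss + L2 penalty
        b -= lr * np.mean(p - y)
    return w, b


def probe_scores(H, w, b):
    """Per-token hallucination score in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(H @ w + b)))


def threshold_at_fpr(val_scores, val_labels, alpha):
    """Threshold such that roughly a fraction alpha of truthful (label 0) tokens score above it."""
    truthful = val_scores[val_labels == 0]
    return float(np.quantile(truthful, 1.0 - alpha))


# Toy usage on synthetic "hidden states": hallucinated tokens are shifted along every axis.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
H = rng.normal(size=(400, 8)) + 1.5 * y[:, None]
w, b = train_probe(H, y)
s = probe_scores(H, w, b)
T = threshold_at_fpr(s, y, alpha=0.05)
fpr = np.mean(s[y == 0] > T)  # false alarms on truthful tokens
tpr = np.mean(s[y == 1] > T)  # hallucinated tokens that get flagged
```

For brevity the toy picks the threshold on the same tokens it evaluates; in practice the threshold would come from a held-out validation split. Only the ranking of scores matters for the threshold, so any calibrated or uncalibrated probe could be dropped in.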

Abstract

Object Hallucination (OH) has been acknowledged as one of the major trustworthiness challenges in Large Vision-Language Models (LVLMs). Recent advancements in Large Language Models (LLMs) indicate that internal states, such as hidden states, encode the "overall truthfulness" of generated responses. However, it remains under-explored how internal states in LVLMs function and whether they could serve as "per-token" hallucination indicators, which is essential for mitigating OH. In this paper, we first conduct an in-depth exploration of LVLM internal states in relation to OH and discover that (1) LVLM internal states are high-specificity per-token indicators of hallucination behaviors, and (2) different LVLMs encode universal patterns of hallucinations in common latent subspaces, indicating that there exist "generic truthful directions" shared by various LVLMs. Based on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt), which first learns the truthful direction of LVLM decoding and then applies truthful-guided inference-time intervention during LVLM decoding. We further propose ComnHallu to enhance both cross-LVLM and cross-data hallucination detection transferability by constructing and aligning hallucination latent subspaces. We evaluate TruthPrInt in extensive experimental settings, including in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks. Experimental results indicate that TruthPrInt significantly outperforms state-of-the-art methods. Code will be available at https://github.com/jinhaoduan/TruthPrInt.
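The abstract describes ComnHallu only as constructing and aligning hallucination latent subspaces. To illustrate what such an alignment step can look like, the sketch below swaps in classic PCA-based subspace alignment: compute the top-$d$ principal subspaces of source and target hidden states and learn the linear map $M = P_s^\top P_t$ between them. This is a well-known domain-adaptation technique, not the paper's exact ComnHallu procedure. It also assumes both domains share the same hidden dimension (e.g. the same LVLM on two datasets); `pca_basis`, `align_subspaces`, and the choice of $d$ are hypothetical.

```python
import numpy as np


def pca_basis(X, d):
    """Top-d principal directions of X (n_samples, dim), returned as columns (dim, d)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)  # rows of vt are principal axes
    return vt[:d].T


def align_subspaces(X_src, X_tgt, d):
    """Classic subspace alignment (assumed stand-in for ComnHallu).

    Maps the source PCA basis onto the target PCA basis so that source and target
    features live in the same d-dimensional coordinates; a detector trained on the
    aligned source features can then be applied to target features.
    """
    P_s = pca_basis(X_src, d)
    P_t = pca_basis(X_tgt, d)
    M = P_s.T @ P_t                                 # (d, d) alignment matrix
    Z_src = (X_src - X_src.mean(axis=0)) @ P_s @ M  # source, expressed in target-aligned axes
    Z_tgt = (X_tgt - X_tgt.mean(axis=0)) @ P_t      # target in its own PCA coordinates
    return Z_src, Z_tgt


# Toy usage: the "target" domain is a rescaled, noisy copy of the "source" domain.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 16))
X_tgt = X_src @ np.diag(rng.uniform(0.5, 2.0, size=16)) + 0.1 * rng.normal(size=(200, 16))
Z_src, Z_tgt = align_subspaces(X_src, X_tgt, d=4)
```

Aligning only a low-dimensional subspace, rather than the full hidden space, is what lets a detector ignore domain-specific directions while keeping the directions the two domains share.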

Paper Structure

This paper contains 25 sections, 6 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: The overall pipeline of TruthPrInt for OH mitigation. TruthPrInt first collects internal states from LVLMs and learns a "truthful direction" from the latent space. A subspace alignment method, ComnHallu, is also proposed to enhance test-time transferability across LVLMs and datasets. During decoding, TruthPrInt guides the target LVLM toward the truthful direction by rejecting hallucinated tokens and tracing back to "early starting points" for pre-intervention.
  • Figure 2: The performance of the designed hallucination detector across various LVLMs. Although internal states provide limited discriminative power in terms of overall accuracy, they yield high-specificity detection with low false-alarm rates.
  • Figure 3: ComnHallu (a) identifies common latent subspaces, shared by the training domain and the testing domain, that capture hallucination features, and (b) thereby keeps internal states high-specificity when transferring across both data domains and models. $T_{\alpha\, \text{fp}}$ denotes the threshold that yields $\text{FPR}=\alpha$ on the CC-Sbu-Align validation set.
  • Figure 4: The schematic diagram of TruthPrInt. When a hallucinated object token (e.g., the first occurrence of "cup") is detected, we trace back by locating the lowest-confidence token preceding this sentence (e.g., "including") and selecting its second-ranked candidate (e.g., "such") instead. This process is repeated $\mathcal{N}_{B}$ times.
  • Figure 5: Trade-off between truthfulness and diversity. TruthPrInt allows flexible adjustment of the threshold $\tau$: a smaller $\tau$ favors truthfulness in safety-critical scenarios, while a larger $\tau$ favors more diverse generations.
  • ...and 4 more figures
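The trace-back step in Figure 4, together with the detection threshold $\tau$ from Figure 5, can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions, not the TruthPrInt implementation: the `step` callback (returns the rank-k next-token candidate and its probability) and the `is_hallucinated` callback (e.g. a probe score compared against $\tau$) are hypothetical interfaces, and the search for the "early starting point" simply scans all previously generated tokens for the lowest-confidence one, which may differ from the paper's exact search window. The budget `n_backtrack` plays the role of $\mathcal{N}_{B}$.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (not TruthPrInt's API):
#   step(prefix, rank) -> (token, prob): the rank-th most likely next token given prefix.
#   is_hallucinated(prefix, token) -> bool: detector decision, e.g. probe score > tau.
Step = Callable[[List[str], int], Tuple[str, float]]
Detector = Callable[[List[str], str], bool]


def decode_with_backtracking(step: Step, is_hallucinated: Detector,
                             n_backtrack: int = 3, max_len: int = 32,
                             eos: str = "<eos>") -> List[str]:
    tokens: List[str] = []
    probs: List[float] = []
    forced_rank: Dict[int, int] = {}  # position -> candidate rank to use there
    budget = n_backtrack
    while len(tokens) < max_len:
        pos = len(tokens)
        tok, p = step(tokens, forced_rank.get(pos, 0))
        # An empty prefix has nothing to trace back to, so the check is skipped there.
        if budget > 0 and tokens and is_hallucinated(tokens, tok):
            # Trace back to the lowest-confidence earlier token ("early starting point"),
            # drop everything from there on, and force its second-ranked candidate.
            back = min(range(len(tokens)), key=lambda i: probs[i])
            del tokens[back:], probs[back:]
            forced_rank = {back: 1}
            budget -= 1
            continue
        # Accept the token (this also happens once the backtrack budget is exhausted).
        tokens.append(tok)
        probs.append(p)
        if tok == eos:
            break
    return tokens


# Toy next-token table: prefix -> ranked (token, prob) candidates.
TABLE = {
    (): [("a", 0.9)],
    ("a",): [("table", 0.8)],
    ("a", "table"): [("including", 0.4), ("such", 0.35)],
    ("a", "table", "including"): [("cup", 0.7)],
    ("a", "table", "including", "cup"): [("<eos>", 0.9)],
    ("a", "table", "such"): [("<eos>", 0.9)],
}


def toy_step(prefix, rank):
    return TABLE[tuple(prefix)][rank]


def toy_detector(prefix, token):
    return token == "cup"  # stand-in for "probe score > tau"


print(decode_with_backtracking(toy_step, toy_detector))
# → ['a', 'table', 'such', '<eos>']
```

Lowering $\tau$ makes the detector fire more often and triggers more trace-backs (favoring truthfulness), while raising it lets more tokens through (favoring diversity), which is the trade-off shown in Figure 5.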