
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data

Spencer Whitehead, Jacob Phillips, Sean Hendryx

TL;DR

The paper addresses the reliability challenge of multimodal language models by reframing hallucination detection as end-to-end sequence labeling, in which the detector localizes hallucinated text spans without relying on predefined spans. It introduces corrupted grounding data, generated by masking grounded spans and filling them with hallucinated phrases from a text-only LM, to pre-train detectors and improve sample efficiency during fine-tuning. Empirical results on M-HalDetect show that pre-training on corrupted grounding data improves performance in low-data regimes across model scales, and that the grounding annotations provide a meaningful learning signal. The approach offers a scalable path to detecting and localizing multimodal hallucinations, supporting downstream filtering and alignment strategies, while highlighting considerations around data quality and distribution.
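Framing detection as sequence labeling means every token of a response receives a label, and predicted hallucinated spans are read off as runs of positive labels. The sketch below is illustrative only: it assumes whitespace tokenization, binary word-level labels (1 = hallucinated) and character-offset spans. The helpers `spans_to_labels` and `labels_to_spans` are hypothetical, not the authors' code, and the paper's actual tokenization and label scheme may differ.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def spans_to_labels(text: str, spans: List[Span]) -> Tuple[List[str], List[int]]:
    """Split text on whitespace; label a word 1 if it overlaps any hallucinated span."""
    words: List[str] = []
    labels: List[int] = []
    pos = 0
    for word in text.split():
        # Only whitespace lies between pos and the word, so this finds the word itself.
        start = text.index(word, pos)
        end = start + len(word)
        pos = end
        words.append(word)
        overlaps = any(start < s_end and end > s_start for s_start, s_end in spans)
        labels.append(int(overlaps))
    return words, labels


def labels_to_spans(labels: List[int]) -> List[Span]:
    """Group consecutive 1-labels into (first_word, last_word_exclusive) spans."""
    spans: List[Span] = []
    start = None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans
```

For example, `spans_to_labels("A red car is parked next to a tree.", [(2, 9)])` marks "red" and "car" as hallucinated, and `labels_to_spans` on the resulting labels recovers the word span `(1, 3)`.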

Abstract

Multimodal language models can exhibit hallucinations in their outputs, which limits their reliability. The ability to automatically detect these errors is important for mitigating them, but has been less explored, and existing efforts do not localize hallucinations, instead framing this as a classification task. In this work, we first pose multimodal hallucination detection as a sequence labeling task where models must localize hallucinated text spans and present a strong baseline model. Given the high cost of human annotations for this task, we propose an approach to improve the sample efficiency of these models by creating corrupted grounding data, which we use for pre-training. Leveraging phrase grounding data, we generate hallucinations to replace grounded spans and create hallucinated text. Experiments show that pre-training on this data improves sample efficiency when fine-tuning, and that the learning signal from the grounding data plays an important role in these improvements.
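The corruption step (mask grounded spans, in-fill them with an LM, and label the in-filled phrases as hallucinations) can be sketched as follows. This is a minimal illustration under stated assumptions: grounded spans are non-overlapping character offsets, each span is corrupted independently with a hypothetical probability `corrupt_prob`, and `infill_fn` is a stand-in for the text-only LM whose prompt and decoding are not specified here. For simplicity, each span is masked in the original text rather than in the partially corrupted one. `corrupt_response` is a hypothetical name, not the authors' implementation.

```python
import random
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def corrupt_response(
    text: str,
    grounded_spans: List[Span],
    infill_fn: Callable[[str, str], str],  # stand-in LM: (masked_text, original_phrase) -> phrase
    corrupt_prob: float = 0.5,             # assumed per-span corruption rate
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[Span]]:
    """Replace a random subset of grounded spans with in-filled phrases.

    Returns the corrupted text and the character spans (in the corrupted text)
    of every replaced phrase; these serve as hallucination labels.
    """
    if rng is None:
        rng = random.Random()
    pieces: List[str] = []
    hallucinated: List[Span] = []
    cursor = 0   # position in the original text
    out_len = 0  # length of the corrupted text built so far
    for start, end in sorted(grounded_spans):
        prefix = text[cursor:start]
        pieces.append(prefix)
        out_len += len(prefix)
        original = text[start:end]
        if rng.random() < corrupt_prob:
            # Mask this grounded span and ask the stand-in LM for a replacement phrase.
            masked = text[:start] + "<mask>" + text[end:]
            replacement = infill_fn(masked, original)
            hallucinated.append((out_len, out_len + len(replacement)))
        else:
            replacement = original  # keep the grounded phrase unchanged
        pieces.append(replacement)
        out_len += len(replacement)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), hallucinated
```

With a toy `infill_fn` that maps "dog" to "cat" and "a red couch" to "a wooden bench", and `corrupt_prob=1.0`, the response "A dog sits on a red couch." becomes "A cat sits on a wooden bench." with hallucination spans `[(2, 5), (14, 28)]`.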

Paper Structure (23 sections, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Our approach for creating corrupted grounding data to pre-train multimodal hallucination detectors. Examples of this data are in the appendix (qualitative examples).
  • Figure 2: Sample efficiency of different models at 500, 1k, and 10k fine-tuning samples. Dotted lines are models that only fine-tune (FT), while solid lines are models that first pre-train on our data then fine-tune (PT+FT). Pre-training with our corrupted grounding data consistently improves the sample efficiency. Scores are listed in the appendix (sample-efficiency scores).
  • Figure 3: Ablations with LLaVA-1.6 13B for utilizing grounding annotations and LMs for our data. Random Spans indicates that random text spans are masked and in-filled instead of grounded spans. Random In-Fill uses grounded spans but fills them in with random phrases.
  • Figure 4: Classification sample efficiency of different models at 500, 1k, and 10k M-HalDetect fine-tuning samples. Dotted lines are models that only fine-tune (FT), while solid lines are models that first pre-train on our data then fine-tune (PT+FT).
  • Figure 5: Examples of our corrupted grounding data. We show the prompt and original response with grounded spans (green), followed by our corrupted response with some hallucinations inserted for grounded spans (red), and then the final hallucination labels that we use for pre-training. For clarity, in the hallucination labels, we only highlight phrases marked as hallucinations.
  • ...and 3 more figures
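The two ablations in Figure 3 each change one ingredient of the data: Random Spans swaps the grounded spans for random text spans, and Random In-Fill keeps the grounded spans but fills them with random phrases instead of LM output. A hedged sketch of both substitutions follows. The span-length limit `max_words`, the retry budget, and uniform sampling from a `phrase_pool` are assumptions, since the exact sampling procedure is not given here, and all function names are hypothetical.

```python
import random
import re
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def word_offsets(text: str) -> List[Span]:
    """Character offsets of each whitespace-delimited word."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def random_spans(text: str, num_spans: int, max_words: int, rng: random.Random) -> List[Span]:
    """'Random Spans' variant: sample non-overlapping random word spans instead of grounded ones."""
    words = word_offsets(text)
    spans: List[Span] = []
    taken: set = set()
    attempts = 0
    while len(spans) < num_spans and attempts < 100:  # assumed retry budget
        attempts += 1
        length = rng.randint(1, max_words)
        if length > len(words):
            continue
        i = rng.randrange(0, len(words) - length + 1)
        idx = set(range(i, i + length))
        if idx & taken:
            continue  # overlaps an existing span; resample
        taken |= idx
        spans.append((words[i][0], words[i + length - 1][1]))
    return sorted(spans)


def random_infill(phrase_pool: Sequence[str], rng: random.Random) -> str:
    """'Random In-Fill' variant: ignore context and draw a phrase uniformly from a pool."""
    return rng.choice(phrase_pool)
```

Comparing these variants against grounded spans with LM in-fill isolates how much each component contributes to the pre-training signal.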