
EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors

Ryuhei Miyazato, Shunsuke Kitada, Kei Harada

Abstract

Vision-Language Models (VLMs) excel at multimodal tasks, but they remain vulnerable to hallucinations that are factually incorrect or ungrounded in the input image. Recent work suggests that hallucination detection using internal representations is more efficient and accurate than approaches that rely solely on model outputs. However, existing internal-representation-based methods typically rely on a single representation or detector, limiting their ability to capture diverse hallucination signals. In this paper, we propose EnsemHalDet, an ensemble-based hallucination detection framework that leverages multiple internal representations of VLMs, including attention outputs and hidden states. EnsemHalDet trains independent detectors for each representation and combines them through ensemble learning. Experimental results across multiple VQA datasets and VLMs show that EnsemHalDet consistently outperforms prior methods and single-detector models in terms of AUC. These results demonstrate that ensembling diverse internal signals significantly improves robustness in multimodal hallucination detection.

Paper Structure

This paper contains 33 sections, 3 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: VLMs can produce hallucinated responses that are inconsistent with factual knowledge or the image content. However, such hallucinations leave detectable signals in the model's internal representations. We leverage multiple internal states of VLMs to achieve robust and accurate hallucination detection.
  • Figure 2: Overview of EnsemHalDet. The method extracts attention heads and hidden states across multiple layers. For each representation, a binary logistic-regression classifier is trained to detect hallucinations. By integrating detectors from the most effective layers and heads through ensemble learning, we detect hallucinated answers.
  • Figure 3: Overview of the detector-level ensemble process. For attention-head-based features (AH), we train $L \times H$ detectors, one for each layer-head pair. For hidden-state-based features (HS), we train $L$ detectors, one for each layer. The detectors are first ranked on validation$_1$, filtered to top-K candidates, further refined by greedy forward selection on validation$_2$, and finally combined by a stacking-based meta-classifier (logistic regression).
  • Figure 4: Prompt used for hallucination evaluation. We follow the CRAG-MM evaluation protocol.
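The detector-level ensemble described in the Figure 3 caption (per-representation detectors, AUC ranking on validation_1, top-K filtering, greedy forward selection on validation_2, and a stacking meta-classifier) can be sketched as below. This is a minimal illustration on synthetic features, not the authors' implementation; the detector count, top-K value, and averaging-based greedy criterion are assumptions.

```python
# Hedged sketch of the Figure 3 pipeline with synthetic per-layer /
# per-head features. Labels: 1 = hallucinated answer, 0 = grounded.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, d, n_detectors, top_k = 400, 16, 12, 5  # assumed sizes for illustration

y_tr, y_v1, y_v2 = (rng.integers(0, 2, n) for _ in range(3))

def feats(y):
    # One feature matrix per detector; label-correlated so AUCs are informative.
    return [y[:, None] * rng.normal(0.5, 1, (n, d)) + rng.normal(0, 1, (n, d))
            for _ in range(n_detectors)]

X_tr, X_v1, X_v2 = feats(y_tr), feats(y_v1), feats(y_v2)

# 1) Train one logistic-regression detector per internal representation.
dets = [LogisticRegression(max_iter=1000).fit(X, y_tr) for X in X_tr]

# 2) Rank detectors by AUC on validation_1 and keep the top-K candidates.
aucs = [roc_auc_score(y_v1, det.predict_proba(X)[:, 1])
        for det, X in zip(dets, X_v1)]
cand = sorted(range(n_detectors), key=lambda i: aucs[i], reverse=True)[:top_k]

# 3) Greedy forward selection on validation_2: keep a candidate only if
#    it improves the AUC of the averaged ensemble score.
chosen, best = [], 0.0
for i in cand:
    trial = chosen + [i]
    score = np.mean([dets[j].predict_proba(X_v2[j])[:, 1] for j in trial], axis=0)
    auc = roc_auc_score(y_v2, score)
    if auc > best:
        chosen, best = trial, auc

# 4) Stacking: a meta logistic regression over the selected detectors'
#    probabilities (fit on validation_2 here for brevity).
Z = np.column_stack([dets[j].predict_proba(X_v2[j])[:, 1] for j in chosen])
meta = LogisticRegression().fit(Z, y_v2)
```

The averaged-score criterion in step 3 is one plausible reading of "greedy forward selection"; the paper's actual objective may differ.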