HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing

Ahmed Akl, Abdelwahed Khamis, Ali Cheraghian, Zhe Wang, Sara Khalifa, Kewen Wang

TL;DR

The paper presents a systematic analysis of LVLM decoders built on three widely used large language model backbones, which reveals clear layer-wise differences in susceptibility to object hallucination, and introduces the Hallucination Insensitivity Score (HIS), a principled metric that quantifies each layer's sensitivity to hallucination and provides guidance for targeted intervention.

Abstract

Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal understanding capabilities, yet they remain prone to object hallucination, where models describe non-existent objects or attribute incorrect factual information, raising serious concerns for reliable real-world deployment. While fine-tuning is a commonly adopted mitigation strategy, its high computational cost and practical difficulty motivate the need for training-free alternatives, among which model editing has recently emerged as a promising direction. However, indiscriminate editing risks disrupting the rich implicit knowledge encoded in pre-trained LVLMs, leading to a fundamental question: how much intervention is necessary at each layer to suppress hallucinations while preserving pre-trained knowledge? To address this question, we present a systematic analysis of LVLM decoders built on three widely used large language model backbones (Qwen, LLaMA, and Vicuna), revealing clear layer-wise differences in susceptibility to object hallucination. Building on these insights, we introduce the Hallucination Insensitivity Score (HIS), a principled metric that quantifies each layer's sensitivity to hallucination and provides guidance for targeted intervention. Leveraging HIS, we propose Hallucination Insensitivity Model Editing (HIME), a simple yet effective layer-adaptive weight editing approach that selectively modifies latent features to suppress hallucinations while preserving pre-trained knowledge. Extensive experiments demonstrate that HIME reduces hallucinations by an average of 61.8% across open-ended generation benchmarks, including CHAIR, MME, and GPT-4V-aided evaluation, without introducing additional parameters, inference-time latency, or computational overhead.

Paper Structure (23 sections, 10 equations, 12 figures, 7 tables, 1 algorithm)

Figures (12)

  • Figure 1: Knowledge distortion caused by fixed weight editing. Top: an LVLM (e.g., LLaVA-1.5) hallucinates non-existent objects (chair, couch) that frequently co-occur with bed. Bottom left: fixed weight editing reduces the hallucinated objects but also drops an object that is present (bed). Bottom right: our approach, HIME, mitigates the hallucinations (chair, couch) while preserving the pre-trained knowledge (bed).
  • Figure 2: HIME hallucination mitigation framework. Given an image with truthful and hallucinated captions, the Hallucination Insensitivity Score (HIS) is derived by contrasting attention distributions via KL divergence. During model editing, model weights are orthogonally projected into the truthful subspace and weighted by inverse HIS. The resulting Edited LVLM Decoder emphasises truthful representations while reducing hallucination sensitivity across layers.
  • Figure 3: Layer-wise distribution of the Hallucination Insensitivity Score (HIS) across four VLM backbones. The y-axis reports the HIS value (higher indicates greater robustness to hallucinations). Across the four VLM decoders, we observe a highly structured, non-uniform depth profile: mid-depth layers exhibit consistently higher robustness, while early and late decoder layers show pronounced sensitivity to hallucinations. This recurring pattern suggests targeted editing of specific depth regions rather than global model intervention.
  • Figure 4: Results from the baseline LLaVA and our approach HIME on the cognition tasks.
  • Figure 5: Ablation study results for the Hallucination Insensitivity Score (HIS) on LLaVA-1.5. Lower CHAIR$_s$ and CHAIR$_i$ are better, while higher BLEU is better.
  • ...and 7 more figures
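
The pipeline described in the Figure 2 caption (contrast attention distributions with KL divergence, then project weights into a truthful subspace with a per-layer strength tied to inverse HIS) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the SVD-based construction of the truthful subspace, the blending rule, and the normalisation of inverse HIS into a strength in [0, 1] are all assumptions made for illustration. The summary does not state how the KL divergence is mapped to HIS, so the sketch takes HIS values as given input rather than guessing that mapping.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def layer_divergences(attn_truthful, attn_hallucinated):
    """Per-layer KL between attention under truthful vs. hallucinated captions.

    Both inputs have shape (num_layers, num_tokens). Hypothetical helper:
    the mapping from these divergences to HIS is not specified here.
    """
    return np.array([kl_divergence(t, h)
                     for t, h in zip(attn_truthful, attn_hallucinated)])

def truthful_projector(truthful_feats, rank):
    """Orthogonal projector onto the top-`rank` directions of truthful features.

    Assumption: the truthful subspace is spanned by the leading right
    singular vectors of a (num_samples, dim) feature matrix.
    """
    _, _, vt = np.linalg.svd(truthful_feats, full_matrices=False)
    basis = vt[:rank].T              # (dim, rank), orthonormal columns
    return basis @ basis.T           # (dim, dim), symmetric and idempotent

def edit_strengths(his):
    """Assumed rule: strength proportional to 1 / HIS, scaled so max is 1."""
    inv = 1.0 / np.asarray(his, dtype=float)
    return inv / inv.max()

def edit_weight(weight, projector, strength):
    """Blend a weight matrix with its projection: (1 - s) W + s P W."""
    return (1.0 - strength) * weight + strength * (projector @ weight)
```

Under these assumptions, a layer with a low HIS (more sensitive to hallucination) receives a strength near 1 and is pulled almost fully into the truthful subspace, while a layer with a high HIS stays close to its pre-trained weights, which matches the layer-adaptive intent stated in the abstract.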