
HulluEdit: Single-Pass Evidence-Consistent Subspace Editing for Mitigating Hallucinations in Large Vision-Language Models

Yangguang Lin, Quan Fang, Yufei Li, Jiachen Sun, Junyu Gao, Jitao Sang

TL;DR

HulluEdit is a single-pass, reference-free intervention framework that decomposes the model's hidden states into orthogonal subspaces, enabling selective suppression of hallucinatory patterns without interfering with visual grounding.

Abstract

Object hallucination in Large Vision-Language Models (LVLMs) significantly hinders their reliable deployment. Existing methods struggle to balance efficiency and accuracy: they often require expensive reference models and multiple forward passes, or apply static edits that risk suppressing genuine visual evidence. To address this, we introduce HulluEdit, a single-pass, reference-free intervention framework. Our core innovation is orthogonal subspace editing: we decompose the hidden states of the model into orthogonal subspaces - visual evidence, conflicting priors, and residual uncertainty - enabling selective suppression of hallucinatory patterns without interfering with visual grounding. This approach mathematically guarantees that edits applied to the prior subspace leave the visual component entirely unaffected. Extensive experiments show that HulluEdit achieves state-of-the-art hallucination reduction on benchmarks including POPE and CHAIR across diverse architectures, while preserving general capabilities on MME and maintaining efficient inference. Our method consistently outperforms contrastive decoding and static subspace editing baselines, offering a new pathway toward more trustworthy LVLMs.
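The orthogonality guarantee described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`orthonormal_basis`, `build_subspaces`, `edit_hidden_state`), the use of a plain SVD to estimate each subspace, and the fixed scaling factors `prior_scale` and `residual_scale` are all assumptions made for illustration. The sketch only shows why, once the prior subspace is constructed to be orthogonal to the visual subspace, rescaling the prior and residual components cannot change the visual component of a hidden state.

```python
import numpy as np


def orthonormal_basis(X, k):
    """Return a (d, k) matrix whose columns are the top-k right singular vectors of X.

    X has shape (n, d): one hidden-state vector per row.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T


def build_subspaces(visual_tokens, text_cache, k_vis, k_prior):
    """Estimate a visual basis U and a prior basis P with P orthogonal to U.

    Hypothetical construction: the paper's own weighting and estimation
    procedure is not given in this section, so a plain SVD is assumed.
    """
    U = orthonormal_basis(visual_tokens, k_vis)
    # Project the visual subspace out of the text states, so that every
    # direction kept in P lies in the orthogonal complement of span(U).
    text_perp = text_cache - (text_cache @ U) @ U.T
    P = orthonormal_basis(text_perp, k_prior)
    return U, P


def edit_hidden_state(h, U, P, prior_scale=0.2, residual_scale=0.9):
    """Split h into visual, prior, and residual parts, then rescale the last two.

    prior_scale and residual_scale are illustrative constants, not values
    taken from the paper.
    """
    h_vis = U @ (U.T @ h)          # component in the visual subspace
    h_prior = P @ (P.T @ h)        # component in the prior subspace
    h_res = h - h_vis - h_prior    # everything else (residual uncertainty)
    return h_vis + prior_scale * h_prior + residual_scale * h_res


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    visual_tokens = rng.standard_normal((32, d))  # stand-in for visual token states
    text_cache = rng.standard_normal((20, d))     # stand-in for cached text states
    h = rng.standard_normal(d)                    # hidden state to edit

    U, P = build_subspaces(visual_tokens, text_cache, k_vis=4, k_prior=3)
    h_edited = edit_hidden_state(h, U, P)

    # The visual coordinates of h are unchanged by the edit.
    print(np.allclose(U.T @ h, U.T @ h_edited))
```

Because `P` is built from text states with their visual component removed, `U.T @ P` is zero up to floating-point error, which is what makes the final check hold regardless of the scaling factors chosen.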

Paper Structure

This paper contains 35 sections, 36 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Comparison of (a) traditional LVLMs prone to object hallucinations versus (b) our HulluEdit method that mitigates hallucinations via orthogonal subspace decomposition.
  • Figure 2: Overview of HulluEdit. We estimate a visual subspace $U$ from weighted visual tokens, an orthogonal anti-prior subspace $P$ from the text cache, and retain the residual subspace $R$ as uncertainty that is softly regularized when editing $h$.
  • Figure 3: Decoding throughput comparison measured in tokens per second (TPS). HulluEdit achieves competitive inference speed, significantly faster than recent hallucination mitigation methods like OPERA and HALC while maintaining strong performance.
  • Figure 4: Qualitative comparison of object hallucination mitigation. An example image with captions generated by the Original model and our HulluEdit method. Hallucinated objects are highlighted in red.
  • Figure 5: Qualitative comparison of object hallucination mitigation on LLaVA-1.5-7B. The baseline model (top) generates descriptions containing spurious objects (highlighted in red), while HulluEdit (bottom) produces outputs strictly aligned with visual evidence. These examples demonstrate our method's ability to suppress conflicting linguistic priors while preserving accurate visual descriptions.
  • ...and 2 more figures