Suppressing VLM Hallucinations with Spectral Representation Filtering

Ameen Ali, Tamim Zoabi, Lior Wolf

TL;DR

This work tackles object hallucination in vision-language models by treating hallucinations as structured, low-rank covariance deviations in internal representations. It introduces Spectral Representation Filtering (SRF), a training-free, post-hoc method that identifies dominant hallucination modes via eigendecomposition of the hallucination covariance $\Sigma_H$ and applies a soft spectral damping $f_{\alpha}(\lambda) = 1/(1+\alpha\lambda)$ through a precomputed operator $\mathbf{P}_{\alpha}$ to the deeper FFN projections, effectively equalizing feature variance without architectural changes. SRF operates by updating FFN weights as $\mathbf{W}_{\ell}^{\text{corr}} = \mathbf{P}_{\alpha} \mathbf{W}_{\ell}^{\text{out}}$ for selected layers, incurring zero runtime cost. Empirically, SRF yields state-of-the-art reductions in object hallucination across LLaVA-1.5, MiniGPT-4, and mPLUG-Owl2 on benchmarks like CHAIR, POPE, A-OKVQA, and LLaVA-Bench, while maintaining caption quality and grounding, demonstrating a practical, broadly applicable solution to improve reliability of multimodal AI systems. The approach highlights the value of covariance geometry in diagnosing and correcting hallucination-prone directions, offering a scalable, generalizable pathway for robust multimodal grounding.
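The construction summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `srf_operator` and `apply_srf` are hypothetical, the features are synthetic, the covariance is estimated with mean-centering (the paper's exact estimator may differ), and $\mathbf{W}_{\ell}^{\text{out}}$ is assumed to have shape (model dim, FFN dim) so that left-multiplication by $\mathbf{P}_{\alpha}$ is well defined.

```python
import numpy as np

def srf_operator(h_truth, h_halluc, alpha):
    """Build P_alpha = Q diag(1 / (1 + alpha * lambda)) Q^T from paired features.

    h_truth, h_halluc: arrays of shape (N, d) holding features for truthful
    and hallucinatory captions (hypothetical inputs for this sketch).
    """
    # Per-sample differences between hallucinatory and truthful features.
    diffs = h_halluc - h_truth
    # Assumption: mean-centered sample covariance of the differences.
    sigma_h = np.cov(diffs, rowvar=False)
    # Symmetric eigendecomposition: Sigma_H = Q diag(lam) Q^T.
    lam, q = np.linalg.eigh(sigma_h)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off
    # Soft spectral damping f_alpha(lambda) = 1 / (1 + alpha * lambda).
    damping = 1.0 / (1.0 + alpha * lam)
    # Scale each eigenvector column by its damping factor, then rotate back.
    return (q * damping) @ q.T

def apply_srf(w_out, p_alpha):
    """Return the corrected projection W_corr = P_alpha @ W_out."""
    return p_alpha @ w_out

# Synthetic demo: plant one strong "hallucination" direction u.
rng = np.random.default_rng(0)
d, d_ff, n = 8, 16, 500
u = np.zeros(d)
u[0] = 1.0
h_truth = rng.normal(size=(n, d))
h_halluc = (h_truth
            + 0.1 * rng.normal(size=(n, d))      # small isotropic noise
            + 5.0 * rng.normal(size=(n, 1)) * u)  # high-variance mode along u

p_alpha = srf_operator(h_truth, h_halluc, alpha=1.0)
w_out = rng.normal(size=(d, d_ff))  # stand-in for one FFN output projection
w_corr = apply_srf(w_out, p_alpha)

# The planted direction is strongly contracted; an orthogonal one is not.
e1 = np.zeros(d)
e1[1] = 1.0
print("norm of P u :", np.linalg.norm(p_alpha @ u))
print("norm of P e1:", np.linalg.norm(p_alpha @ e1))
```

Because the operator is folded into the weights once, offline, the corrected layer has the same shape and cost as the original, which is consistent with the zero-runtime-cost claim.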

Abstract

Vision-language models (VLMs) frequently hallucinate, describing objects, attributes, or relations that do not exist in the image, because of over-reliance on language priors and imprecise cross-modal grounding. We introduce Spectral Representation Filtering (SRF), a lightweight, training-free method to suppress such hallucinations by analyzing and correcting the covariance structure of the model's representations. SRF identifies low-rank hallucination modes through eigendecomposition of the covariance of the differences between features collected for truthful and hallucinatory captions, revealing structured biases in the feature space. A soft spectral filter then attenuates these modes in the feed-forward projection weights of deeper vLLM layers, equalizing feature variance while preserving semantic fidelity. Unlike decoding-based or retraining-based approaches, SRF operates entirely post-hoc, incurs zero inference overhead, and requires no architectural modifications. Across three families of VLMs (LLaVA-1.5, MiniGPT-4, and mPLUG-Owl2), SRF consistently reduces hallucination rates on MSCOCO, POPE-VQA, and other visual-task benchmarks, achieving state-of-the-art faithfulness without degrading caption quality.
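A short worked equation may help show why a soft spectral filter "equalizes" variance. The following is a sketch under assumptions: it writes the hallucination covariance as $\Sigma_H = Q \Lambda Q^{\top}$ with eigenvalues $\lambda_i \ge 0$, reads the filter as the operator $\mathbf{P}_{\alpha} = Q\, f_{\alpha}(\Lambda)\, Q^{\top}$ with $f_{\alpha}(\lambda) = 1/(1+\alpha\lambda)$, and assumes the filtered features are obtained by applying $\mathbf{P}_{\alpha}$ directly. The paper's exact formulation may differ.

$$
\mathbf{P}_{\alpha}\, \Sigma_H\, \mathbf{P}_{\alpha}^{\top}
= Q\, \mathrm{diag}\!\left( \frac{\lambda_i}{(1+\alpha\lambda_i)^{2}} \right) Q^{\top},
\qquad
\frac{\lambda_i}{(1+\alpha\lambda_i)^{2}} \approx
\begin{cases}
\lambda_i, & \alpha\lambda_i \ll 1, \\[4pt]
\dfrac{1}{\alpha^{2}\lambda_i}, & \alpha\lambda_i \gg 1.
\end{cases}
$$

Under these assumptions, low-variance modes pass through almost unchanged, while high-variance modes are contracted more strongly the larger their eigenvalue, so the dominant spikes in the spectrum are flattened rather than hard-projected out.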

Paper Structure

This paper contains 10 sections, 12 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: UMAP visualization of MiniGPT-4 and mPLUG-Owl2 hidden activations for truthful (blue) and hallucinatory (amber) samples from LURE, showing distinct clusters that reveal low-rank hallucination subspaces.
  • Figure 2: Qualitative comparison of captions generated by LLaVA and mPLUG-Owl2 before and after applying Spectral Representation Filtering (SRF). The baseline model hallucinates non-existent fruits such as apples and bananas, while SRF suppresses these errors.
  • Figure 3: Hallucination spectrum (eigenvalues of $\Sigma_H = Q\Lambda Q^{\top}$) for three vision-language models. The curves show a small set of high-variance “spikes” followed by a long decaying tail, indicating that hallucination behavior is concentrated in a low-dimensional set of dominant eigenmodes, with the remaining spectrum reflecting noise-like variation.
  • Figure 4: Effect of the spectral suppression operator on the MiniGPT-4 hallucination spectrum. The attenuated curve (dashed) selectively contracts dominant hallucination-aligned eigenmodes while leaving lower-variance structure largely unchanged.
  • Figure 5: A qualitative case study comparing model-generated descriptions for a complex meme. We visualize the outputs from the Greedy Baseline, BEAM, Nullu, and our method, color-coding text segments as Hallucinations, Partially Grounded, or Truth.