Suppressing VLM Hallucinations with Spectral Representation Filtering
Ameen Ali, Tamim Zoabi, Lior Wolf
TL;DR
This work tackles object hallucination in vision-language models by treating hallucinations as structured, low-rank covariance deviations in internal representations. It introduces Spectral Representation Filtering (SRF), a training-free, post-hoc method that identifies dominant hallucination modes via eigendecomposition of the hallucination covariance $\Sigma_H$ and damps them with a soft spectral filter $f_{\alpha}(\lambda) = 1/(1+\alpha\lambda)$. The filter is applied through a precomputed operator $\mathbf{P}_{\alpha}$ to the deeper FFN projections, equalizing feature variance without architectural changes. For selected layers, SRF updates the FFN weights as $\mathbf{W}_{\ell}^{\text{corr}} = \mathbf{P}_{\alpha} \mathbf{W}_{\ell}^{\text{out}}$, so it incurs zero runtime cost. Empirically, SRF yields state-of-the-art reductions in object hallucination across LLaVA-1.5, MiniGPT-4, and mPLUG-Owl2 on benchmarks such as CHAIR, POPE, A-OKVQA, and LLaVA-Bench, while maintaining caption quality and grounding. The approach highlights the value of covariance geometry in diagnosing and correcting hallucination-prone directions, and offers a practical, broadly applicable way to improve the reliability of multimodal systems.
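One way to read the operator $\mathbf{P}_{\alpha}$ is sketched below. It assumes that $\mathbf{P}_{\alpha}$ applies $f_{\alpha}$ to the eigenvalues of $\Sigma_H$ in its own eigenbasis. The summary does not spell out this construction, so the symbols $\mathbf{U}$, $\Lambda$, $\lambda_i$, and $\mathbf{u}_i$ are introduced here for illustration only.

$$
\Sigma_H = \mathbf{U}\Lambda\mathbf{U}^{\top}, \qquad
\mathbf{P}_{\alpha} = \mathbf{U}\, f_{\alpha}(\Lambda)\, \mathbf{U}^{\top}
= \sum_i \frac{1}{1+\alpha\lambda_i}\, \mathbf{u}_i \mathbf{u}_i^{\top}
= (\mathbf{I} + \alpha\Sigma_H)^{-1}
$$

Under this reading, a direction with $\lambda_i = 0$ passes through unchanged, a direction with $\alpha\lambda_i = 9$ is scaled by $1/(1+9) = 0.1$, and $\alpha \to 0$ recovers the identity. This is why the damping is "soft": high-variance hallucination modes are shrunk in proportion to their eigenvalue rather than projected out entirely.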
Abstract
Vision-language models (VLMs) frequently hallucinate, describing objects, attributes, or relations that do not exist in the image, because of over-reliance on language priors and imprecise cross-modal grounding. We introduce Spectral Representation Filtering (SRF), a lightweight, training-free method that suppresses such hallucinations by analyzing and correcting the covariance structure of the model's representations. SRF identifies low-rank hallucination modes through eigendecomposition of the covariance of the differences between features collected for truthful and hallucinatory captions, revealing structured biases in the feature space. A soft spectral filter then attenuates these modes in the feed-forward projection weights of deeper vLLM layers, equalizing feature variance while preserving semantic fidelity. Unlike decoding-based or retraining-based approaches, SRF operates entirely post-hoc, incurs zero inference overhead, and requires no architectural modifications. Across three families of VLMs (LLaVA-1.5, MiniGPT-4, and mPLUG-Owl2), SRF consistently reduces hallucination rates on MSCOCO, POPE-VQA, and other visual task benchmarks, achieving state-of-the-art faithfulness without degrading caption quality.
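The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic features, not the authors' implementation. The function names, the centering of the difference vectors, the value of `alpha`, and the construction of the filter as $\mathbf{U} f_{\alpha}(\Lambda) \mathbf{U}^{\top}$ are all assumptions made for this example. The weight is assumed to have shape `(d_model, d_ff)` so that the filter acts on its output dimension.

```python
import numpy as np


def hallucination_covariance(truthful, hallucinated):
    """Covariance of paired feature differences; inputs have shape (n, d)."""
    diffs = hallucinated - truthful
    # Centering the differences is an assumption of this sketch.
    diffs = diffs - diffs.mean(axis=0, keepdims=True)
    return diffs.T @ diffs / max(len(diffs) - 1, 1)


def spectral_filter(sigma_h, alpha):
    """Build U diag(1 / (1 + alpha * lambda)) U^T from a symmetric PSD matrix."""
    eigvals, eigvecs = np.linalg.eigh(sigma_h)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    gains = 1.0 / (1.0 + alpha * eigvals)
    # Scale each eigenvector (column) by its gain, then rotate back.
    return (eigvecs * gains) @ eigvecs.T


def correct_ffn_weight(w_out, p_alpha):
    """Fold the filter into an FFN output projection of shape (d_model, d_ff)."""
    return p_alpha @ w_out


# Synthetic stand-in for collected features (hypothetical data): hallucinated
# features differ from truthful ones mostly along one planted direction.
rng = np.random.default_rng(0)
d, d_ff, n, alpha = 16, 32, 500, 1.0
direction = np.zeros(d)
direction[0] = 1.0
truthful = rng.normal(size=(n, d))
hallucinated = (
    truthful
    + 0.1 * rng.normal(size=(n, d))
    + rng.normal(scale=3.0, size=(n, 1)) * direction
)

sigma_h = hallucination_covariance(truthful, hallucinated)
p_alpha = spectral_filter(sigma_h, alpha)

w_out = rng.normal(size=(d, d_ff))
w_corr = correct_ffn_weight(w_out, p_alpha)

# Gain along the planted direction versus an unrelated direction.
gain_planted = direction @ p_alpha @ direction
gain_other = p_alpha[1, 1]
```

Because the filter is folded into the weight matrix once, the corrected layer has the same shape and the same cost as the original, which matches the zero-overhead claim. In this toy setup, `gain_planted` is small while `gain_other` stays close to 1, so only the high-variance difference direction is damped.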
