Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, Yi Lu, Zhiheng Xi, Yuming Yang, Wenming Tan, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR

This work attributes object hallucinations in multimodal LLMs (MLLMs) to over-sensitivity to frequency-domain image features. It introduces Multi-Frequency Perturbations (MFP), a pluggable pipeline that extracts high- and low-frequency image features, fuses them with the original visual tokens through cross-attention, and applies inference-time attenuation to suppress redundant frequency information. The method yields strong improvements across model architectures on the CHAIR, POPE, MME, and MMBench benchmarks, and combining it with existing state-of-the-art inference-time approaches such as PAI improves results further. Overall, MFP is a practical training-time strategy, compatible with inference-time methods, for making MLLMs more reliable in object grounding and description, and it applies broadly across visual encoders and model scales.
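The first two stages named in the TL;DR, frequency extraction and cross-attention fusion, can be pictured with the NumPy sketch that follows. It is a minimal illustration under stated assumptions, not the authors' implementation. The hard circular FFT mask, the `radius`, `patch`, and `dim` values, and the random linear patch embedding that stands in for the real vision encoder are all choices made for this example. Every function name (`split_frequencies`, `patchify`, `cross_attention`, `mfp_fuse`) is likewise hypothetical.

```python
import numpy as np


def split_frequencies(image: np.ndarray, radius: float):
    """Split a 2-D image into low- and high-frequency components.

    Assumption: a hard circular low-pass mask around the centred DC term.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move DC to the centre
    yy, xx = np.ogrid[:h, :w]
    low_mask = np.hypot(yy - h // 2, xx - w // 2) <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = image - low  # everything outside the low-pass disc
    return low, high


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Cut an (H, W) image into non-overlapping patch x patch tokens.

    H and W must be divisible by `patch`.
    """
    h, w = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch)
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries attend to context."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def mfp_fuse(image, radius=4.0, patch=4, dim=32, seed=0):
    """Hypothetical MFP-style fusion of visual and frequency tokens."""
    rng = np.random.default_rng(seed)
    # Random linear patch embedding standing in for the real vision encoder.
    w_embed = rng.normal(scale=0.1, size=(patch * patch, dim))
    w_q, w_k, w_v = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))

    low, high = split_frequencies(image, radius)
    visual = patchify(image, patch) @ w_embed  # (N, dim) original tokens
    freq = np.concatenate([patchify(low, patch), patchify(high, patch)]) @ w_embed  # (2N, dim)
    # Residual connection: frequency features perturb, not replace, the tokens.
    return visual + cross_attention(visual, freq, w_q, w_k, w_v)


image = np.random.default_rng(1).random((16, 16))
print(mfp_fuse(image).shape)  # (16, 32)
```

A 16x16 input with 4x4 patches gives 16 fused tokens of width 32, the same token count as the visual stream. The residual connection keeps the original visual tokens and adds the attended frequency information on top, matching the TL;DR's description of fusing frequency features *with* the original tokens rather than substituting them.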

Abstract

Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance on vision-language tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify that a key cause of these hallucinations is the model's over-susceptibility to specific image frequency features when detecting objects. In this paper, we introduce Multi-Frequency Perturbations (MFP), a simple, cost-effective, and pluggable method that perturbs visual feature representations using both the low-frequency and high-frequency features of images, and explicitly suppresses redundant frequency-domain features during inference, thereby mitigating hallucinations. Experimental results demonstrate that our method significantly mitigates object hallucinations across various model architectures. Furthermore, as a training-time method, MFP can be combined with inference-time methods to achieve state-of-the-art performance on the CHAIR benchmark.
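The abstract states that redundant frequency-domain features are explicitly suppressed during inference but does not say here how. One simple way to picture such suppression is to scale down one band of the image spectrum by a factor `alpha` before encoding. The sketch below does exactly that and nothing more. It is hypothetical: the paper may attenuate features inside the model rather than pixels, and `attenuate_band`, `radius`, `alpha`, and `band` are illustrative names, not the authors' API.

```python
import numpy as np


def attenuate_band(image: np.ndarray, radius: float, alpha: float,
                   band: str = "high") -> np.ndarray:
    """Scale one frequency band of a 2-D image by alpha (0 <= alpha <= 1).

    Hypothetical: a hard circular mask around the centred DC term separates
    the "low" band (inside radius) from the "high" band (outside).
    """
    if band not in ("low", "high"):
        raise ValueError("band must be 'low' or 'high'")
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move DC to the centre
    yy, xx = np.ogrid[:h, :w]
    inside = np.hypot(yy - h // 2, xx - w // 2) <= radius
    target = ~inside if band == "high" else inside
    gain = np.where(target, alpha, 1.0)  # attenuate only the chosen band
    return np.fft.ifft2(np.fft.ifftshift(spectrum * gain)).real


img = np.random.default_rng(0).random((16, 16))
smoothed = attenuate_band(img, radius=3.0, alpha=0.2, band="high")
```

Setting `alpha=1.0` leaves the image unchanged, and `alpha=0.0` removes the chosen band entirely. This makes `alpha` a single knob for how aggressively the band is suppressed at inference time.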
