MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Chenxi Wang; Xiang Chen; Ningyu Zhang; Bozhong Tian; Haoming Xu; Shumin Deng; Huajun Chen

TL;DR

This work investigates why multimodal LLMs (MLLMs) hallucinate. It finds that the model can recognize visual objects in its preceding layers, and suggests that the language model's knowledge priors suppress this information before the final output. It introduces DeCo, a training-free dynamic decoding framework that selects an anchor layer from the preceding layers and corrects the final-layer logits with that layer's knowledge, scaled by a dynamic modulation factor. DeCo is model-agnostic and integrates with common decoding strategies. It reduces hallucination on image captioning and VQA benchmarks while incurring modest latency overhead. The results, supported by analyses of robustness and hyperparameter sensitivity, indicate that leveraging preceding-layer information is an effective and practical way to reduce hallucinations in MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena, but the underlying reasons remain poorly understood. In this paper, we present an empirical analysis and find that, although MLLMs generate incorrect objects in the final output, they are actually able to recognize visual objects in the preceding layers. We speculate that this may be due to the strong knowledge priors of the language model suppressing the visual information, leading to hallucinations. Motivated by this, we propose DeCo, a novel dynamic correction decoding method for MLLMs, which adaptively selects the appropriate preceding layers and proportionally integrates their knowledge into the final layer to adjust the output logits. Note that DeCo is model-agnostic and can be seamlessly combined with various classic decoding strategies and applied to different MLLMs. We evaluate DeCo on widely used benchmarks, demonstrating that it can reduce hallucination rates by a large margin compared to baselines, highlighting its potential to mitigate hallucinations. Code is available at https://github.com/zjunlp/DeCo.
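To make the idea of "selecting a preceding layer and proportionally integrating its knowledge into the final logits" more concrete, here is a minimal sketch for a single decoding step. It is not taken from the DeCo repository. The function name `deco_style_correction`, the top-k candidate rule, the choice of anchor layer by peak candidate probability, and the parameters `candidate_k`, `first_layer`, and `alpha` are all illustrative assumptions; the abstract does not specify these details.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deco_style_correction(layer_logits, candidate_k=2, first_layer=0, alpha=0.5):
    """Correct final-layer logits with one preceding ("anchor") layer.

    layer_logits: per-layer vocabulary logits for one decoding step,
                  ordered from early to late; the last entry is the final layer.
    All selection rules below are assumptions made for illustration.
    """
    if len(layer_logits) < 2:
        raise ValueError("need at least one preceding layer and the final layer")
    final = layer_logits[-1]

    # Assumption: candidate tokens are the top-k tokens of the final layer.
    candidates = sorted(range(len(final)), key=lambda i: final[i], reverse=True)
    candidates = candidates[:candidate_k]

    # Assumption: the anchor is the preceding layer that assigns the highest
    # probability to any candidate token.
    best_layer, best_prob = None, -1.0
    for idx in range(first_layer, len(layer_logits) - 1):
        probs = softmax(layer_logits[idx])
        peak = max(probs[t] for t in candidates)
        if peak > best_prob:
            best_layer, best_prob = idx, peak

    # Dynamic modulation (assumed form): scale the anchor logits by the
    # anchor's peak probability, so a confident layer contributes more.
    anchor = layer_logits[best_layer]
    corrected = [f + alpha * best_prob * a for f, a in zip(final, anchor)]
    return best_layer, corrected


if __name__ == "__main__":
    # Toy vocabulary: 0 = "dog", 1 = "cat", 2 = "table".
    toy_layers = [
        [0.0, 0.1, 0.0],   # early layer, nearly uniform
        [0.0, 3.0, 0.0],   # preceding layer that strongly favors "cat"
        [2.0, 1.8, -1.0],  # final layer, slightly prefers "dog"
    ]
    layer, logits = deco_style_correction(toy_layers)
    print(layer, logits)
```

In this toy case the final layer alone would pick token 0, while the corrected logits favor token 1, the token that the confident preceding layer supports. Restricting the correction to final-layer candidates is one plausible way to keep noisy early-layer predictions from introducing unrelated tokens.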
