
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

Wei Chen, Xin Yan, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, Long Chen

TL;DR

The paper tackles hallucinations in multimodal LLMs by addressing the limitations of joint training-based approaches like Direct Preference Optimization and training-free methods such as Visual Contrastive Decoding. It introduces Decoupling Contrastive Decoding (DCD), which separately learns positive and negative image projections and uses a vision-aware negative projection during inference, forming an inference-time contrastive objective $\widehat{\text{logit}} = (1+\alpha)\,\text{logit}_w - \alpha\,\text{logit}_l$. By decoupling $L_{\text{pos}}$ and $L_{\text{neg}}$ during training, DCD avoids likelihood displacement and better captures authentic hallucination patterns, enabling competitive hallucination suppression while preserving general reasoning across benchmarks. Empirical results show that DCD, especially the Neg-only and Pos+Neg variants, achieves strong hallucination mitigation with minimal or no loss in general performance, outperforming VCD and matching or surpassing DPO on several tasks, with clear improvements in robustness and transfer.
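The inference-time objective above can be illustrated for a single decoding step. This is a minimal sketch in plain Python, assuming `logit_w` and `logit_l` are next-token logits over the same vocabulary from the positive branch (image through the positive projection) and the negative branch (image through the negative projection). The function names and toy logits are hypothetical, and greedy selection is used only for illustration.

```python
import math
from typing import List


def contrastive_logits(logit_w: List[float], logit_l: List[float], alpha: float) -> List[float]:
    """Per-token (1 + alpha) * logit_w - alpha * logit_l."""
    if len(logit_w) != len(logit_l):
        raise ValueError("both branches must score the same vocabulary")
    return [(1.0 + alpha) * w - alpha * l for w, l in zip(logit_w, logit_l)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(logit_w, logit_l, alpha):
    """Probability distribution over the next token after contrasting."""
    return softmax(contrastive_logits(logit_w, logit_l, alpha))


def greedy_next_token(logit_w, logit_l, alpha):
    probs = next_token_distribution(logit_w, logit_l, alpha)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy 3-token vocabulary (hypothetical numbers): token 1 scores high in
# both branches, i.e. the hallucination branch also supports it;
# token 0 is high only in the positive branch.
logit_w = [2.0, 2.2, 0.1]
logit_l = [0.5, 2.5, 0.1]
print(greedy_next_token(logit_w, logit_l, alpha=0.0))  # 1: plain decoding
print(greedy_next_token(logit_w, logit_l, alpha=1.0))  # 0: contrastive decoding
```

Setting `alpha=0` recovers ordinary decoding from the positive branch; larger values push probability away from tokens that the negative branch also favours.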

Abstract

Although multimodal large language models (MLLMs) exhibit remarkable reasoning capabilities on complex multimodal understanding tasks, they still suffer from the notorious hallucination issue: generating outputs misaligned with obvious visual or factual evidence. Currently, training-based solutions, like direct preference optimization (DPO), leverage paired preference data to suppress hallucinations. However, they risk sacrificing general reasoning capabilities due to likelihood displacement. Meanwhile, training-free solutions, like contrastive decoding, achieve this goal by subtracting the hallucination pattern estimated from a distorted input. Yet these handcrafted perturbations (e.g., adding noise to images) may poorly capture authentic hallucination patterns. To avoid these weaknesses of existing methods and realize robust hallucination mitigation (i.e., suppressing hallucinations while maintaining general reasoning performance), we propose a novel framework: Decoupling Contrastive Decoding (DCD). Specifically, DCD decouples the learning of positive and negative samples in preference datasets and trains separate positive and negative image projections within the MLLM. The negative projection implicitly models real hallucination patterns, which enables vision-aware negative images in the contrastive decoding inference stage. DCD alleviates likelihood displacement by avoiding pairwise optimization and generalizes robustly without handcrafted degradation. Extensive ablations across hallucination benchmarks and general reasoning tasks demonstrate the effectiveness of DCD: it matches DPO's hallucination suppression while preserving general capabilities, and it outperforms handcrafted contrastive decoding methods.
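The likelihood-displacement argument can be checked with a few numbers. The sketch below assumes the standard DPO loss of rafailov2024direct, computed from sequence log-probabilities, and, for the decoupled case, assumes each projection is trained with a plain negative log-likelihood on its own responses. The paper's exact $L_{\text{pos}}$ and $L_{\text{neg}}$ may differ; the helper names and all values are hypothetical toy numbers.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_pos: float, pol_neg: float, ref_pos: float, ref_neg: float, beta: float = 0.1) -> float:
    """Pairwise DPO loss from sequence log-probs of y+ and y- (policy vs. reference)."""
    margin = (pol_pos - ref_pos) - (pol_neg - ref_neg)
    return -log_sigmoid(beta * margin)


def decoupled_losses(logp_pos_branch: float, logp_neg_branch: float):
    """Assumed decoupled objectives: one plain NLL per projection, no pairing.

    logp_pos_branch: log p(y+ | v, x) through the positive projection
    logp_neg_branch: log p(y- | v, x) through the negative projection
    """
    return -logp_pos_branch, -logp_neg_branch


ref_pos, ref_neg = -10.0, -12.0                   # reference-model log-probs (toy)
before = dpo_loss(ref_pos, ref_neg, ref_pos, ref_neg)
after = dpo_loss(-11.0, -16.0, ref_pos, ref_neg)  # BOTH log-probs went down
print(f"DPO loss: {before:.4f} -> {after:.4f}")   # DPO loss: 0.6931 -> 0.5544
```

In this toy case the DPO loss falls even though the log-probability of the preferred response $y^{+}$ dropped from -10 to -11, because only the margin between $y^{+}$ and $y^{-}$ is optimized. Under the assumed decoupled objectives, the same drop would raise $L_{\text{pos}}$, so the positive projection has no incentive to trade it away.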

Paper Structure

This paper contains 18 sections, 10 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison between existing hallucination mitigation methods and DCD. (a) Training-based method (e.g., DPO rafailov2024direct): DPO directly optimizes the likelihood gap between positive (correct) and negative (hallucinatory) responses using preference datasets. However, maximizing this gap ($y^{+}$ vs. $y^{-}$) can inadvertently lower the probability of both responses, causing likelihood displacement and potential degradation of general reasoning capabilities. Here, $v$, $x$, $y^{+}$, and $y^{-}$ denote images, questions, positive responses, and negative responses, respectively; $\theta$ represents model parameters, and $\alpha$ is the contrastive decoding coefficient. (b) Training-free method (e.g., VCD leng2024mitigating) vs. DCD: Traditional contrastive decoding (VCD) reduces hallucinations by comparing model outputs from original ($v^{+}$) and artificially distorted ($v^{-}$; e.g., noise-added) visual inputs at inference time.
  • Figure 2: Comparison of DCD with DPO rafailov2024direct and VCD leng2024mitigating in the training and inference stages. (a) Training stage: DPO jointly optimizes positive–negative responses, risking likelihood displacement. Our method (DCD) separately learns positive and negative image projections to avoid this issue. (b) Inference stage: VCD uses artificial noise as negative inputs, whereas DCD leverages learned negative visual features that reflect authentic hallucination patterns, making hallucination suppression more effective.
  • Figure 3: Comparison of visualization samples among VCD leng2024mitigating, DPO rafailov2024direct, and our method (trained negatives solely on BPO pi2024strengthening).
  • Figure 4: Model responses generated when negative image embeddings are used as inputs in place of positive image embeddings. For "VCD", noisy images serve as the image inputs; for "Ours", the image inputs are projected with the negative image projection.
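The difference between the two negative branches in Figures 2(b) and 4 can be sketched with toy vectors. This sketch assumes a linear image projection (a simplification; real projectors may be MLPs), treats the vision-encoder output as a plain feature vector, and approximates VCD's image distortion as additive Gaussian noise on those features rather than on pixels. The helper names, `w_pos`, `w_neg`, and all numbers are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def project(weight: Matrix, features: Vector) -> Vector:
    """Hypothetical linear image projection into the LLM embedding space."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def vcd_negative(features: Vector, noise_std: float, rng: random.Random) -> Vector:
    """VCD-style handcrafted negative, simplified to additive Gaussian noise."""
    return [f + rng.gauss(0.0, noise_std) for f in features]


def dcd_negative(features: Vector, w_neg: Matrix) -> Vector:
    """DCD-style negative: same clean features, separately learned negative projection."""
    return project(w_neg, features)


rng = random.Random(0)
features = [0.5, -1.0, 2.0]                   # toy vision-encoder output
w_pos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # hypothetical positive projection
w_neg = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]    # hypothetical negative projection

pos_embeds = project(w_pos, features)                            # positive branch
vcd_embeds = project(w_pos, vcd_negative(features, 0.5, rng))    # noise, same projection
dcd_embeds = dcd_negative(features, w_neg)                       # clean input, learned projection
print(pos_embeds, vcd_embeds, dcd_embeds)
```

The design point the sketch isolates: VCD changes the input and reuses the positive projection, while DCD keeps the input clean and swaps in a projection trained on hallucinatory responses, so the negative branch is still conditioned on the actual image content.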