Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

Zefeng Zhang, Hengzhu Tang, Jiawei Sheng, Zhenyu Zhang, Yiming Ren, Zhenyang Li, Dawei Yin, Duohe Ma, Tingwen Liu

TL;DR

This work addresses modality bias in multimodal large language models by reframing debiasing as a preference optimization problem. It introduces RLAIF-V-Bias, a preference dataset whose biased (negative) responses are constructed by perturbing modalities, and NaPO, a noise-aware optimization method that uses a negative Box-Cox transformation to combine the noise-robust MAE with the BCE used in DPO, setting the balance between the two through a dynamic noise-robustness coefficient $q$. Empirical results on bias and hallucination benchmarks show substantial reductions in reliance on language and vision priors and fewer hallucinations, with ablations illustrating the importance of dynamic weighting and of combining data types. The approach preserves existing model capabilities while improving cross-modal integration, offering a practical path to more reliable multimodal reasoning in real-world deployments. Key theoretical and empirical contributions include the formulation of NaPO, adaptive loss weighting, and evidence that modality-balanced training reduces errors beyond simple data balancing.

Abstract

Multimodal Large Language Models excel in various tasks, yet often struggle with modality bias, where the model relies heavily on a single modality and overlooks critical information in the others, leading to misplaced focus and irrelevant responses. In this paper, we propose using the paradigm of preference optimization to solve the modality bias problem, contributing RLAIF-V-Bias, a debiased preference optimization dataset, and a Noise-Aware Preference Optimization (NaPO) algorithm. Specifically, we first construct the dataset by introducing perturbations that reduce the informational content of certain modalities, compelling the model to rely on a specific modality when generating negative responses. To address the inevitable noise in automatically constructed data, we combine the noise-robust Mean Absolute Error with the Binary Cross-Entropy in Direct Preference Optimization through a negative Box-Cox transformation, and dynamically adjust the algorithm's noise robustness based on the evaluated noise levels in the data. Extensive experiments validate our approach, demonstrating not only its effectiveness in mitigating modality bias but also its significant role in minimizing hallucinations.
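
As a brief illustration of how a negative Box-Cox transformation can bridge the two losses named in the abstract, the short derivation below uses the function $(1-x^q)q^{-1}$ from the Figure 3 caption. The symbol $\mathcal{L}_q$ and the reading of $x \in (0, 1]$ as the preference probability that DPO assigns to the chosen response are assumptions made for this sketch, not notation confirmed by the paper.

$$
\mathcal{L}_q(x) = \frac{1 - x^q}{q}, \qquad q \in (0, 1]
$$

$$
\mathcal{L}_1(x) = 1 - x \quad \text{(MAE)}, \qquad
\lim_{q \to 0} \mathcal{L}_q(x) = \lim_{q \to 0} \frac{1 - e^{q \ln x}}{q} = -\ln x \quad \text{(BCE)}
$$

The limit follows from the first-order expansion $e^{q \ln x} = 1 + q \ln x + O(q^2)$. Under these assumptions, a smaller $q$ behaves more like BCE and a larger $q$ more like the noise-robust MAE.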

Paper Structure

This paper contains 32 sections, 14 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Examples of different types of modality-biased responses and their preferred counterparts. Left: The model relies excessively on prior knowledge, assuming a bear is brown while overlooking the image, which shows a polar bear. Right: Although the model answers the question correctly, it provides unnecessary image details that are irrelevant to the question.
  • Figure 2: Method details. First, biased responses are generated with the base model, using masking to push it toward over-relying on the prompt. Next, NaPO is applied for noise-robust preference optimization to counteract noise in the automatically constructed data, dynamically assessing data noise levels to calculate NaPO’s noise robustness coefficient $q$ (see the paper's equation for the dynamic $q$). Here we assume that the original data is of high quality, so DPO is used to train on it directly. Additional experiments with NaPO on the original data are reported in the paper's appendix on additional experiments.
  • Figure 3: Comparison of different loss functions. We plot the function $(1-x^q)q^{-1}$ for $q \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$ and compare it with both MAE, $1 - x$, and BCE, $-\ln(x)$. By adjusting the value of $q$, we can balance the noise robustness and the rapid convergence ability of NaPO.
  • Figure 4: Analysis of noise and margin distribution in automatically constructed data. The first row shows LogP margins between each biased response type and the ground truth, while the second row shows average LogP margins. In language-biased responses, biased (noise-free) data have a higher average LogP margin than unbiased (noisy) data. Similarly, in vision-biased responses, biased (noise-free) data show a higher LogP margin than unbiased (noisy) data.
  • Figure 5: Hyperparameter analysis on language-biased data. The chart shows the model's results on VLind-Bench and Object HalBench when trained on language-biased data with different $\alpha$ values in NaPO. The model achieves its best performance across all four metrics when $\alpha$ is set to 0.5.
  • ...and 3 more figures
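
The comparison described in the Figure 3 caption can be reproduced numerically. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, and `x` is assumed to stand for the probability the model assigns to the preferred response (in DPO, a sigmoid of the scaled reward margin).

```python
import math


def box_cox_loss(x: float, q: float) -> float:
    """Negative Box-Cox transformed loss (1 - x**q) / q from the Figure 3 caption.

    x: assumed preference probability in (0, 1].
    q: robustness coefficient in (0, 1]; q = 1 gives MAE, q -> 0 approaches BCE.
    """
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in (0, 1]")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - x ** q) / q


def mae_loss(x: float) -> float:
    """Mean absolute error against a target of 1."""
    return 1.0 - x


def bce_loss(x: float) -> float:
    """Binary cross-entropy against a target of 1."""
    return -math.log(x)


def preference_probability(scaled_margin: float) -> float:
    """Sigmoid of a (hypothetical) already-scaled reward margin, as in DPO."""
    return 1.0 / (1.0 + math.exp(-scaled_margin))


if __name__ == "__main__":
    # A negative margin mimics a possibly mislabeled (noisy) preference pair.
    for margin in (-2.0, 0.0, 2.0):
        x = preference_probability(margin)
        row = ", ".join(
            f"q={q}: {box_cox_loss(x, q):.3f}" for q in (0.1, 0.3, 0.5, 0.7, 0.9)
        )
        print(f"x={x:.3f}  MAE={mae_loss(x):.3f}  BCE={bce_loss(x):.3f}  {row}")
```

For any `x` in (0, 1), the transformed loss lies between MAE and BCE, so raising `q` caps the penalty on low-probability (potentially noisy) pairs, while lowering `q` restores the steeper BCE-like behavior that the caption associates with rapid convergence.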