Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang, Hengzhu Tang, Jiawei Sheng, Zhenyu Zhang, Yiming Ren, Zhenyang Li, Dawei Yin, Duohe Ma, Tingwen Liu
TL;DR
This work addresses modality bias in multimodal large language models by reframing debiasing as a preference optimization problem. It introduces RLAIF-V-Bias, a preference dataset whose rejected responses are generated from inputs in which one modality has been perturbed. It also introduces NaPO, a noise-aware optimization method that uses a negative Box-Cox transformation to combine the noise-robust MAE loss with the BCE loss of DPO, and sets the balance between them through a dynamically adjusted noise-robustness coefficient $q$. On bias and hallucination benchmarks, the method substantially reduces reliance on language and vision priors and produces fewer hallucinations. Ablations show that both the dynamic weighting and the combination of data types matter. The approach preserves existing model capabilities while improving cross-modal integration, offering a practical path to more reliable multimodal reasoning in real-world deployments. The key contributions are the formulation of NaPO, the adaptive loss weighting, and evidence that modality-balanced training reduces errors beyond what simple data balancing achieves.
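For orientation, the interpolation between BCE and MAE can be written out as follows. This is a sketch that assumes NaPO applies the negative Box-Cox transformation to the DPO preference probability, in the style of a generalized cross-entropy loss. The symbols $\beta$, $\pi_\theta$, $\pi_{\text{ref}}$, $y_w$ (preferred response) and $y_l$ (rejected response) follow standard DPO notation and are not defined in this summary.

$$p_\theta = \sigma\!\left(\beta\left[\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} - \log\frac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right]\right), \qquad \mathcal{L}_q = \frac{1 - p_\theta^{\,q}}{q}, \quad q \in (0, 1]$$

$$\lim_{q \to 0} \mathcal{L}_q = -\log p_\theta \ \ (\text{the BCE loss of DPO}), \qquad \mathcal{L}_1 = 1 - p_\theta \ \ (\text{MAE against the target } 1)$$

The limit follows from $p^q = e^{q \log p} = 1 + q \log p + O(q^2)$. Intermediate values of $q$ trade the strong gradients of BCE against the bounded penalty of MAE. Because $1 - p_\theta \le 1$, a single mislabeled pair cannot dominate the loss.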
Abstract
Multimodal Large Language Models (MLLMs) excel at a wide range of tasks, yet they often suffer from modality bias: the model relies heavily on a single modality and overlooks critical information in the others, which leads to misplaced attention and irrelevant responses. In this paper, we propose to solve the modality bias problem with the preference optimization paradigm. Our approach consists of RLAIF-V-Bias, a debiased preference optimization dataset, and a Noise-Aware Preference Optimization (NaPO) algorithm. Specifically, we first construct the dataset by introducing perturbations that reduce the informational content of certain modalities, compelling the model to rely on a specific modality when generating negative responses. To address the inevitable noise in automatically constructed data, we combine the noise-robust Mean Absolute Error (MAE) with the Binary Cross-Entropy (BCE) loss of Direct Preference Optimization (DPO) through a negative Box-Cox transformation, and we dynamically adjust the algorithm's noise robustness based on the estimated noise level in the data. Extensive experiments validate our approach, demonstrating not only its effectiveness in mitigating modality bias but also its significant role in reducing hallucinations.
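The loss combination described in the abstract can be sketched in plain Python. This is not the authors' implementation. It assumes the loss takes the form $(1 - p^q)/q$ with $p = \sigma(\text{DPO margin})$, and the rule that maps an estimated noise level to $q$ is a hypothetical placeholder: it treats the share of pairs with a negative margin as the noise rate. All function names (`log_sigmoid`, `preference_margin`, `neg_box_cox_loss`, `napo_style_batch_loss`) and the parameters `beta`, `q_min` and `q_max` are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_margin(logp_w: float, logp_l: float,
                      ref_logp_w: float, ref_logp_l: float,
                      beta: float = 0.1) -> float:
    """DPO-style implicit reward margin: beta * (chosen log-ratio - rejected log-ratio)."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def neg_box_cox_loss(margin: float, q: float) -> float:
    """Negative Box-Cox loss (1 - p**q) / q with p = sigmoid(margin).

    q -> 0 recovers the DPO BCE loss -log p; q = 1 gives the MAE-style loss 1 - p.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    log_p = log_sigmoid(margin)
    if q < 1e-8:
        # Limit q -> 0: plain DPO loss.
        return -log_p
    # p**q = exp(q * log p); expm1 keeps precision when q * log p is small.
    return -math.expm1(q * log_p) / q


def napo_style_batch_loss(margins: list[float],
                          q_min: float = 0.0,
                          q_max: float = 1.0) -> tuple[float, float]:
    """Average loss over a batch, with q set from a HYPOTHETICAL noise estimate.

    Assumption: pairs whose margin is negative are counted as likely noisy,
    and a higher noise rate moves q toward q_max (more MAE-like, more robust).
    """
    if not margins:
        raise ValueError("margins must be non-empty")
    noise_rate = sum(1 for m in margins if m < 0) / len(margins)
    q = q_min + (q_max - q_min) * noise_rate
    loss = sum(neg_box_cox_loss(m, q) for m in margins) / len(margins)
    return loss, q


if __name__ == "__main__":
    margins = [preference_margin(-10.0, -12.0, -11.0, -11.5), 0.8, -0.4, 1.2]
    loss, q = napo_style_batch_loss(margins)
    print(f"q = {q:.2f}, loss = {loss:.4f}")
```

In the usage example, one of the four pairs has a negative margin, so the sketch sets `q = 0.25`. Using `expm1` and a stable log-sigmoid avoids overflow and cancellation for extreme margins. In an autodiff training loop, `q` would normally be computed per batch and treated as a constant rather than differentiated through.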
