The Role of Model Confidence on Bias Effects in Measured Uncertainties for Vision-Language Models
Xinyi Liu, Weiguang Wang, Hangfeng He
TL;DR
Disentangling epistemic and aleatoric uncertainty in vision-language models under prompt biases is essential for reliable decision-making. The authors introduce bias-shuffled prompt perturbations and evaluate GPT-4o and Qwen2-VL on VL-CheckList and CREPE, using AUROC and an entropy-based decomposition to quantify uncertainty. They find three things. Bias effects intensify as bias-free confidence decreases. Lower bias-free confidence is associated with greater bias-induced underestimation of epistemic entropy (overconfidence) but has no significant effect on the direction of the bias effect on aleatoric entropy. Combining multiple bias mitigations yields the largest gains. The work informs bias-mitigation strategies for uncertainty quantification in multimodal models and supports approaches that explicitly separate epistemic and aleatoric sources. The entropy decomposition used is $\text{Entropy} = \text{Epistemic Entropy} + P(\text{correct}) \cdot \text{Aleatoric Entropy}$.
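The decomposition formula can be sanity-checked numerically. The sketch below assumes one reading under which the identity is exact. Epistemic entropy is taken as the binary entropy of answer correctness. Aleatoric entropy is taken as the entropy of the answer distribution restricted to correct answers and renormalised. All incorrect answers are merged into a single outcome. Under these assumptions the formula is the chain rule (grouping property) of Shannon entropy. The function names `entropy` and `decompose` and the toy distribution are hypothetical illustrations, not the authors' code.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of an iterable of probabilities; zeros are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose(answer_probs, correct):
    """Return (total, epistemic, p_correct, aleatoric) for a sampled-answer distribution.

    Hypothetical reading of Entropy = Epistemic Entropy + P(correct) * Aleatoric Entropy:
      - epistemic = binary entropy of the correct/incorrect indicator;
      - aleatoric = entropy over correct answers, renormalised by P(correct);
      - total     = entropy after merging all incorrect answers into one outcome.
    """
    correct_mass = [p for a, p in answer_probs.items() if a in correct]
    p_correct = sum(correct_mass)
    p_wrong = 1.0 - p_correct

    epistemic = entropy([p_correct, p_wrong])
    aleatoric = entropy([p / p_correct for p in correct_mass]) if p_correct > 0 else 0.0
    # Collapse every incorrect answer into a single "wrong" bucket before measuring.
    total = entropy(correct_mass + [p_wrong])
    return total, epistemic, p_correct, aleatoric


# Toy distribution over sampled VQA answers (made-up numbers).
probs = {"red": 0.5, "crimson": 0.2, "blue": 0.3}
total, epi, p_c, ale = decompose(probs, correct={"red", "crimson"})
print(math.isclose(total, epi + p_c * ale))  # True: the chain rule of entropy
```

If incorrect answers are not merged, the chain rule adds a further term, $P(\text{incorrect}) \cdot H(\text{answer} \mid \text{incorrect})$, so the two-term form only holds under the merging assumption stated above.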
Abstract
With the growing adoption of Large Language Models (LLMs) for open-ended tasks, accurately assessing epistemic uncertainty, which reflects a model's lack of knowledge, has become crucial to ensuring reliable outcomes. However, quantifying epistemic uncertainty in such tasks is challenging due to the presence of aleatoric uncertainty, which arises from multiple valid answers. While bias can introduce noise into epistemic uncertainty estimation, it may also reduce noise from aleatoric uncertainty. To investigate this trade-off, we conduct experiments on Visual Question Answering (VQA) tasks and find that mitigating prompt-introduced bias improves uncertainty quantification in GPT-4o. Building on prior work showing that LLMs tend to copy input information when model confidence is low, we further analyze how these prompt biases affect measured epistemic and aleatoric uncertainty across varying bias-free confidence levels with GPT-4o and Qwen2-VL. We find that all considered biases have greater effects on both uncertainties when bias-free model confidence is lower. Moreover, lower bias-free model confidence is associated with greater bias-induced underestimation of epistemic uncertainty, resulting in overconfident estimates, whereas it has no significant effect on the direction of the bias effect in aleatoric uncertainty estimation. These distinct effects deepen our understanding of bias mitigation for uncertainty quantification and may inform the development of more advanced techniques.
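The core analysis pairs uncertainty measured under a biased prompt with uncertainty measured bias-free, grouped by bias-free confidence. A minimal sketch of that bookkeeping follows. It assumes each question has a bias-free confidence and paired biased and bias-free entropy values. It defines the bias effect as biased minus bias-free entropy, so a negative value means the bias lowered the estimate (overconfidence). It uses equal-width confidence bins. The `Record` layout, the `bias_effect_by_confidence` helper, the bin count, and the numbers are hypothetical illustrations, not the paper's data or protocol.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    bias_free_conf: float      # model confidence without the biasing prompt, in [0, 1]
    entropy_bias_free: float   # uncertainty measured with the bias-free prompt
    entropy_biased: float      # uncertainty measured with the biased prompt


def bias_effect_by_confidence(records, n_bins=2):
    """Return {bin_index: (mean signed effect, mean absolute effect)}.

    Signed effect = biased entropy - bias-free entropy; a negative mean means the
    bias pushed the uncertainty estimate down (overconfidence) in that bin.
    """
    bins = {}
    for r in records:
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 joins the last bin.
        idx = min(int(r.bias_free_conf * n_bins), n_bins - 1)
        bins.setdefault(idx, []).append(r.entropy_biased - r.entropy_bias_free)
    return {
        idx: (mean(effects), mean(abs(e) for e in effects))
        for idx, effects in sorted(bins.items())
    }


# Made-up records purely to exercise the function.
records = [
    Record(0.2, 1.0, 0.6),
    Record(0.3, 0.9, 0.5),
    Record(0.8, 0.3, 0.25),
    Record(0.9, 0.2, 0.2),
]
for idx, (signed, magnitude) in bias_effect_by_confidence(records).items():
    print(f"bin {idx}: signed={signed:+.3f}, magnitude={magnitude:.3f}")
```

Reporting both the signed and the absolute effect per bin keeps the two questions raised above apart: how large the bias effect is at a given confidence level, and in which direction it moves the estimate. The same helper applies unchanged to aleatoric entropy values.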
