Bias Beyond Demographics: Probing Decision Boundaries in Black-Box LVLMs via Counterfactual VQA
Zaiying Zhao, Toshihiko Yamasaki
TL;DR
This paper expands fairness evaluation for LVLMs beyond demographics by introducing a counterfactual VQA benchmark that tests model decision boundaries under controlled contextual shifts. It demonstrates that non-demographic attributes such as environment and social behavior can distort reasoning more than traditional demographic biases, and that instruction-based debiasing is often ineffective while exposure to human-norm exemplars yields more consistent, balanced responses. The work provides a practical, model-agnostic auditing framework for black-box LVLMs and reveals a trade-off between fairness and practicality in debiasing approaches. Overall, it advances understanding of contextual biases in multimodal reasoning and offers actionable insights for auditing and improving model behavior.
Abstract
Recent advances in large vision-language models (LVLMs) have amplified concerns about fairness, yet existing evaluations remain confined to demographic attributes and often conflate fairness with refusal behavior. This paper broadens the scope of fairness by introducing a counterfactual VQA benchmark that probes the decision boundaries of closed-source LVLMs under controlled context shifts. Each image pair differs in a single visual attribute that has been validated as irrelevant to the question, enabling ground-truth-free and refusal-aware analysis of reasoning stability. Comprehensive experiments reveal that non-demographic attributes, such as environmental context or social behavior, distort LVLM decision-making more strongly than demographic ones. Moreover, instruction-based debiasing shows limited effectiveness and can even amplify these asymmetries, whereas exposure to a small number of human-norm-validated examples from our benchmark encourages more consistent and balanced responses. This highlights the benchmark's potential not only as an evaluative framework but also as a means for understanding and improving model behavior. Together, these results provide a practical basis for auditing contextual biases even in black-box LVLMs and contribute to more transparent and equitable multimodal reasoning.
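To make the idea of ground-truth-free, refusal-aware analysis concrete, the following is a minimal sketch of how answers on counterfactual image pairs might be summarized. It is not the paper's metric: the function name `pair_stability`, the `REFUSAL` label, and the four outcome categories are assumptions introduced here for illustration, and the answers are assumed to be already normalized to short strings.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical label for a declined answer; the paper's own coding may differ.
REFUSAL = "refuse"


def pair_stability(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Summarize answer stability over counterfactual image pairs.

    Each pair holds the model's answers to the same question on the
    original image and on its counterfactual twin. No ground-truth
    answer is needed: only agreement within the pair is measured.
    """
    counts: Counter = Counter()
    for original, counterfactual in pairs:
        if original == REFUSAL and counterfactual == REFUSAL:
            counts["both_refused"] += 1       # refusal is symmetric
        elif REFUSAL in (original, counterfactual):
            counts["refusal_asymmetry"] += 1  # refused on one side only
        elif original == counterfactual:
            counts["consistent"] += 1         # same answer on both images
        else:
            counts["flipped"] += 1            # the attribute changed the answer
    total = sum(counts.values())
    keys = ("consistent", "flipped", "refusal_asymmetry", "both_refused")
    return {k: (counts[k] / total if total else 0.0) for k in keys}


# Toy responses, invented for illustration only.
responses = [("yes", "yes"), ("yes", "no"), ("refuse", "yes"), ("refuse", "refuse")]
rates = pair_stability(responses)
print(rates)
```

Keeping one-sided refusals in their own category, rather than counting them as flips or dropping them, is one way to avoid conflating fairness with refusal behavior. Because the differing attribute is validated as irrelevant to the question, any flipped or one-sided outcome can be read as instability attributable to that attribute.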
