Bias Beyond Demographics: Probing Decision Boundaries in Black-Box LVLMs via Counterfactual VQA

Zaiying Zhao, Toshihiko Yamasaki

TL;DR

This paper expands fairness evaluation for LVLMs beyond demographics by introducing a counterfactual VQA benchmark that tests model decision boundaries under controlled contextual shifts. It demonstrates that non-demographic attributes such as environment and social behavior can distort reasoning more than traditional demographic biases, and that instruction-based debiasing is often ineffective while exposure to human-norm exemplars yields more consistent, balanced responses. The work provides a practical, model-agnostic auditing framework for black-box LVLMs and reveals a trade-off between fairness and practicality in debiasing approaches. Overall, it advances understanding of contextual biases in multimodal reasoning and offers actionable insights for auditing and improving model behavior.

Abstract

Recent advances in large vision-language models (LVLMs) have amplified concerns about fairness, yet existing evaluations remain confined to demographic attributes and often conflate fairness with refusal behavior. This paper broadens the scope of fairness by introducing a counterfactual VQA benchmark that probes the decision boundaries of closed-source LVLMs under controlled context shifts. Each image pair differs in a single visual attribute that has been validated as irrelevant to the question, enabling ground-truth-free and refusal-aware analysis of reasoning stability. Comprehensive experiments reveal that non-demographic attributes, such as environmental context or social behavior, distort LVLM decision-making more strongly than demographic ones. Moreover, instruction-based debiasing shows limited effectiveness and can even amplify these asymmetries, whereas exposure to a small number of human-norm-validated examples from our benchmark encourages more consistent and balanced responses, highlighting the benchmark's potential not only as an evaluative framework but also as a means for understanding and improving model behavior. Together, these results provide a practical basis for auditing contextual biases even in black-box LVLMs and contribute to more transparent and equitable multimodal reasoning.
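To make the idea of ground-truth-free, refusal-aware analysis concrete, here is a minimal Python sketch of how answers on counterfactual pairs might be summarized. It is an illustration under stated assumptions, not the paper's implementation: the `PairResult` schema, the `summarize_pairs` function, and the three formulas (coverage, consistency, refusal asymmetry) are hypothetical and may differ from the definitions the authors use.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class PairResult:
    """Answers for one counterfactual image pair (hypothetical schema).

    Each field holds the model's answer to the same question on one image
    of the pair; None marks a refusal.
    """
    answer_original: Optional[str]
    answer_counterfactual: Optional[str]


def summarize_pairs(pairs: Sequence[PairResult]) -> Dict[str, float]:
    """Summarize pair-level stability without ground-truth labels.

    Assumed (illustrative) definitions:
      coverage          = share of pairs answered on both images
      consistency       = share of fully answered pairs with identical answers
      refusal_asymmetry = share of pairs refused on exactly one image
    """
    total = len(pairs)
    if total == 0:
        raise ValueError("at least one pair is required")

    # Pairs where the model answered both the original and the counterfactual.
    answered = [
        p for p in pairs
        if p.answer_original is not None and p.answer_counterfactual is not None
    ]
    # Pairs where the model refused on one image but not the other.
    one_sided = [
        p for p in pairs
        if (p.answer_original is None) != (p.answer_counterfactual is None)
    ]
    # Since the changed attribute is irrelevant to the question, a stable
    # model should give the same answer on both images.
    same = sum(1 for p in answered if p.answer_original == p.answer_counterfactual)

    return {
        "coverage": len(answered) / total,
        "consistency": same / len(answered) if answered else float("nan"),
        "refusal_asymmetry": len(one_sided) / total,
    }
```

No correct answer is needed in this sketch because the altered attribute is validated as irrelevant to the question, so any change in the answer or in the refusal decision across a pair is itself the signal being measured.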

Paper Structure

This paper contains 36 sections, 19 equations, 7 figures, 12 tables.

Figures (7)

  • Figure 1: We examine biases arising from both demographic and non-demographic attributes. Our counterfactual VQA reveals how changing a single visual attribute exposes instabilities in closed-source LVLMs, with non-demographic factors, e.g., social behaviors and aesthetic elements, inducing even stronger distortions.
  • Figure 2: Taxonomy of attributes for our counterfactual VQA. We organize them into 5 categories with 17 subcategories. The numbers in parentheses denote the number of unique attributes curated in each subcategory.
  • Figure 3: Overview of the counterfactual VQA construction process.
  • Figure 4: Consistency-coverage curves and symmetric refusal curves for representative models. RA denotes refusal asymmetry rate.
  • Figure 5: Examples where bias attributes influence LVLM decision-making. The bias attribute in each example belongs to category (a) Demography, (b) Culture, (c) Environment, (d) Behavior, and (e) Aesthetic, respectively. Additional examples are provided in the appendix.
  • ...and 2 more figures