Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals

Phillip Howard; Kathleen C. Fraser; Anahita Bhiwandiwalla; Svetlana Kiritchenko

TL;DR

This work examines social bias in large vision-language models (LVLMs) using SocialCounterfactuals, a collection of 171k synthetic counterfactual image-text sets, to isolate the influence of perceived race, gender, and physical characteristics on LVLM outputs. The study covers five open LVLMs plus GPT-4o and generates over 57 million responses to a diverse set of prompts. Bias is evaluated along four dimensions: MaxToxicity, lexical stereotypes (identified via PMI and filtered with GPT-4o), usage of competency words aligned with the Stereotype Content Model, and numeric ratings. Key findings: bias is nontrivial across models; the tail of the toxicity distribution disproportionately affects obese and Black groups; intersectional effects are present; and models vary in how much they rely on stereotypes in open-ended descriptions. Toxicity in LVLMs also correlates with the bias present in their underlying LLMs. Finally, the work shows that some inference-time mitigation is possible, though its effect is inconsistent, and argues for more robust debiasing strategies and broader evaluation to ensure safer, fairer LVLM deployment at scale.
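The TL;DR mentions identifying lexical stereotypes via pointwise mutual information (PMI). As a rough illustration only, the sketch below applies the standard definition PMI(w, g) = log(P(w | g) / P(w)) to rank words that one social group's responses use disproportionately often. The whitespace tokenizer, the `min_count` cutoff, and the function names `pmi_by_group` and `top_words` are hypothetical choices, not the paper's; the subsequent GPT-4o filtering step is not modeled.

```python
import math
from collections import Counter


def pmi_by_group(responses, min_count=5):
    """Compute PMI(word, group) = log(P(word | group) / P(word)).

    responses: iterable of (group_label, response_text) pairs.
    Tokenization is a naive lowercase whitespace split (an assumption).
    Words occurring fewer than min_count times overall are skipped
    (hypothetical threshold to avoid unstable estimates for rare words).
    """
    group_counts = {}          # group -> Counter of word occurrences
    total_counts = Counter()   # word -> occurrences across all groups
    for group, text in responses:
        tokens = text.lower().split()
        group_counts.setdefault(group, Counter()).update(tokens)
        total_counts.update(tokens)

    n_total = sum(total_counts.values())
    scores = {}
    for group, counts in group_counts.items():
        n_group = sum(counts.values())
        for word, count in counts.items():
            if total_counts[word] < min_count:
                continue
            p_word_given_group = count / n_group
            p_word = total_counts[word] / n_total
            scores[(group, word)] = math.log(p_word_given_group / p_word)
    return scores


def top_words(scores, group, k=10):
    """Return the k highest-PMI (word, score) pairs for one group."""
    ranked = sorted(
        ((word, s) for (g, word), s in scores.items() if g == group),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]
```

A positive PMI means the word appears more often in that group's responses than in the pool as a whole; candidates ranked this way would still need screening (the paper uses GPT-4o for this) before being treated as stereotypes.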

Abstract

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images, producing over 57 million responses from popular models. Our multi-dimensional bias evaluation framework reveals that social attributes such as perceived race, gender, and physical characteristics depicted in images can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of individuals.

Paper Structure (54 sections, 7 equations, 12 figures, 27 tables)

Introduction
Related Work
Methodology
Dataset
Generation Setting
Evaluation
MaxToxicity
Stereotypes
Competency
Numeric Ratings
Bias Probing Results for LVLMs
MaxToxicity
Evaluation of Open LVLMs
Evaluation of GPT-4o
Toxicity Evaluation with a Dataset of Real Images
...and 39 more sections

Figures (12)

Figure 1: Given counterfactual images depicting a common subject with different social attributes, we prompt LVLMs with open-ended questions such as "What are 5 keywords that describe the personality traits of this person?" We then evaluate generated responses for words related to competency, stereotypes, and toxicity.
Figure 2: Proportional representation of intersectional social groups among generations which exceed the 90th percentile of MaxToxicity scores.
Figure 3: Average ratings on the "Performance" prompt for males and females of different body types. (Error bars: 95% confidence intervals.)
Figure 4: Average length (in words) of generated LVLM responses to the Backstory prompt for different physical-gender groups.
Figure 5: Distribution of toxicity scores for BakLLaVA responses to the Characteristics prompt, broken down by occupation.
...and 7 more figures
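Figure 2 reports the share of intersectional social groups among generations whose MaxToxicity exceeds the 90th percentile. A minimal sketch of that aggregation follows, assuming each generation already carries a MaxToxicity score and a group label (how the scores themselves are computed is not shown). The linear-interpolation percentile, the strict `>` comparison, and the function names are illustrative assumptions rather than the authors' code.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Linear-interpolation percentile (the same convention as NumPy's default)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty sequence")
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def tail_representation(records, pct=90.0):
    """Share of each group among generations scoring above the pct-th percentile.

    records: list of (group_label, max_toxicity_score) pairs, one per generation.
    Returns (threshold, tail_share, overall_share); over-representation of a
    group g can be read off as tail_share[g] / overall_share[g].
    """
    threshold = percentile([score for _, score in records], pct)
    tail = Counter(group for group, score in records if score > threshold)
    overall = Counter(group for group, _ in records)
    n_tail = sum(tail.values())
    tail_share = {g: c / n_tail for g, c in tail.items()} if n_tail else {}
    overall_share = {g: c / len(records) for g, c in overall.items()}
    return threshold, tail_share, overall_share
```

Returning the overall share alongside the tail share matters because a group can dominate the high-toxicity tail simply by being common in the data; the ratio of the two separates genuine over-representation from base-rate effects.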
