Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models

Messi H. J. Lee, Soyeon Jeon, Jacob M. Montgomery, Calvin K. Lai

TL;DR

This study expands bias analysis in vision-language systems beyond simple trait associations by incorporating homogeneity bias and prototypicality in open-ended narratives. Using the RADIATE face set and four diverse VLMs, it shows gender prototypicality amplifies homogeneity bias while race patterns are more nuanced and model-dependent, with White Americans often appearing more homogeneous. Trait associations reveal Black Americans are consistently linked to basketball across models, while other associations vary by model, illustrating context-dependent and sometimes positive stereotyping with potential negative consequences. The work highlights that bias mitigation must address nuanced visual cues and open-ended contexts, and it calls for non-linear analyses and more inclusive datasets to better capture real-world diversity.

Abstract

Current research on bias in Vision-Language Models (VLMs) has important limitations: it focuses exclusively on trait associations while ignoring other forms of stereotyping, it examines specific contexts where biases are expected to appear, and it conceptualizes social categories like race and gender as binary, ignoring the multifaceted nature of these identities. Using standardized facial images that vary in prototypicality, we test four VLMs for both trait associations and homogeneity bias in open-ended contexts. We find that VLMs consistently generate more uniform stories for women than for men, and that people who are more gender-prototypical in appearance are represented more uniformly. By contrast, VLMs represent White Americans more uniformly than Black Americans. Unlike gender prototypicality, race prototypicality was not associated with greater uniformity. In terms of trait associations, we find limited evidence of stereotyping: Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM. These findings demonstrate that VLM stereotyping manifests in ways that go beyond simple group membership, suggesting that conventional bias mitigation strategies may be insufficient to address VLM stereotyping and that homogeneity bias persists even when trait associations are less apparent in model outputs.
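Homogeneity bias is quantified through cosine similarity between sentence embeddings of the generated stories (the figure captions name all-mpnet-base-v2 as one encoder). The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: embeddings are taken as precomputed NumPy arrays, similarity is averaged over all unordered pairs within a group, and "standardized" is assumed to mean z-scoring against the pooled pairwise similarities of all groups. The function names `pairwise_cosine` and `standardized_group_similarity` are hypothetical.

```python
import numpy as np


def pairwise_cosine(embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity for every unordered pair (i < j) of rows."""
    # L2-normalize rows so that dot products equal cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Upper triangle only: drops self-similarity and duplicate pairs.
    iu = np.triu_indices(unit.shape[0], k=1)
    return sims[iu]


def standardized_group_similarity(groups: dict) -> dict:
    """Mean z-scored pairwise similarity per group (assumed standardization)."""
    per_group = {name: pairwise_cosine(emb) for name, emb in groups.items()}
    pooled = np.concatenate(list(per_group.values()))
    mu, sd = pooled.mean(), pooled.std()
    # Higher values mean a group's stories are more alike (more homogeneous).
    return {name: float(((s - mu) / sd).mean()) for name, s in per_group.items()}


# Toy vectors standing in for story embeddings (hypothetical data).
near_identical = np.array([[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [1.0, 0.0, 0.01]])
orthogonal = np.eye(3)
scores = standardized_group_similarity({"A": near_identical, "B": orthogonal})
```

Under this assumption, a group whose stories cluster tightly in embedding space scores above zero and a more varied group scores below it, so comparing groups reduces to comparing these standardized means.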

Paper Structure

This paper contains 32 sections, 8 figures, 15 tables.

Figures (8)

  • Figure 1: Sample RADIATE facial stimuli showing the lowest-rated (left) and highest-rated (right) faces for racial prototypicality within each demographic category: Black men, Black women, White men, and White women.
  • Figure 2: Standardized cosine similarity of the two gender groups calculated using all-mpnet-base-v2.
  • Figure 3: Standardized cosine similarity of the two racial groups calculated using all-mpnet-base-v2.
  • Figure 4: Prevalence of basketball across racial groups. In all four VLMs, Black Americans were significantly more associated with basketball than White Americans. Error bars indicate 95% confidence intervals. Visualizations for other topics can be found in the race trait-association figure of the Supplementary Materials.
  • Figure S2: Standardized cosine similarity (1,000 random samples for each gender group) by prototypicality, calculated using all three encoder models. The top and bottom 10% of prototypicality values were excluded to minimize tail effects.
  • ...and 3 more figures
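The Figure S2 caption mentions two preprocessing steps: drawing 1,000 random similarity samples per gender group and excluding the top and bottom 10% of prototypicality values. The following is a minimal sketch of those two steps only, assuming prototypicality is a per-face array of ratings and that samples are cosine similarities between randomly drawn pairs of distinct story embeddings. How the authors pair samples with prototypicality is not specified here, and the helper names `trim_tails` and `sample_pair_similarities` are hypothetical.

```python
import numpy as np


def trim_tails(proto: np.ndarray, lower: float = 0.10, upper: float = 0.90) -> np.ndarray:
    """Boolean mask keeping values between the given quantiles (inclusive)."""
    lo, hi = np.quantile(proto, [lower, upper])
    return (proto >= lo) & (proto <= hi)


def sample_pair_similarities(embeddings: np.ndarray, n_samples: int = 1000, rng=None) -> np.ndarray:
    """Cosine similarity of n_samples random pairs of distinct rows (needs >= 2 rows)."""
    rng = np.random.default_rng(rng)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = unit.shape[0]
    i = rng.integers(0, n, size=n_samples)
    # Draw j from n - 1 values and shift past i, so j is uniform over j != i.
    j = rng.integers(0, n - 1, size=n_samples)
    j = j + (j >= i)
    # Row-wise dot product of unit vectors = cosine similarity.
    return np.einsum("ij,ij->i", unit[i], unit[j])
```

Sampling a fixed number of pairs per group keeps group comparisons balanced even when groups contain different numbers of faces, and trimming by quantile limits the influence of sparsely populated extremes of the prototypicality scale.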