
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

Leander Girrbach, Stephan Alaniz, Yiran Huang, Trevor Darrell, Zeynep Akata

TL;DR

The paper addresses gender bias in vision-language assistants by introducing VL-Gender, a framework that evaluates biases across personality traits, skills, and occupations in 22 open-source VLAs using a curated image dataset filtered to exclude occupation-related content. It combines prompt variations with a VQA-style evaluation to reveal systematic biases: models tend to associate men with negative traits and women with positive traits and skills, and some models also reproduce real-world occupational gender imbalances. Debiasing experiments compare five methods and find that full fine-tuning yields the strongest bias reduction at an acceptable performance cost, while the other methods make more conservative, task-preserving adjustments. The work emphasizes pre-deployment bias assessment, reproducibility, and the need for scalable debiasing strategies to promote equitable societal outcomes in VLAs.
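The VQA-style comparison can be sketched as follows, based on the Figure 1 and Figure 2 captions: each image is paired with a yes/no question about an attribute, the probability of "yes" is collected per image, and the bias for that attribute is $\mu_{\text{male}} - \mu_{\text{female}}$. Several details are assumptions, not the authors' code: renormalising the probability over only the "yes" and "no" logits, the hypothetical helper names (`p_yes`, `gender_gap`, `permutation_p_value`, `significant_ratio`), and the use of a two-sided permutation test at $\alpha = 0.05$ to decide whether a gap is significant (the paper's actual test may differ).

```python
import math
import random
from statistics import mean


def p_yes(logit_yes: float, logit_no: float) -> float:
    """P("yes") renormalised over the two answer options only (assumption)."""
    # Softmax over {yes, no} equals sigmoid(logit_yes - logit_no).
    return 1.0 / (1.0 + math.exp(logit_no - logit_yes))


def gender_gap(male_scores, female_scores) -> float:
    """mu_male - mu_female: positive means the attribute is attributed more to men."""
    return mean(male_scores) - mean(female_scores)


def permutation_p_value(male_scores, female_scores, n_perm=2000, seed=0) -> float:
    """Two-sided permutation test on the difference in means (illustrative choice)."""
    rng = random.Random(seed)
    observed = abs(gender_gap(male_scores, female_scores))
    pooled = list(male_scores) + list(female_scores)
    n_male = len(male_scores)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign gender labels and recompute the gap.
        rng.shuffle(pooled)
        if abs(gender_gap(pooled[:n_male], pooled[n_male:])) >= observed:
            hits += 1
    # Add-one smoothing so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def significant_ratio(per_attribute, alpha=0.05) -> float:
    """Share of attributes whose gender gap is significant at level alpha."""
    flags = [permutation_p_value(m, f) < alpha for m, f in per_attribute.values()]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical (yes, no) logits for three male- and three female-labeled images.
    male = [p_yes(2.0, 0.0), p_yes(1.5, 0.2), p_yes(1.8, -0.1)]
    female = [p_yes(0.1, 0.9), p_yes(-0.3, 0.4), p_yes(0.0, 0.5)]
    print("gap:", gender_gap(male, female))
    print("p-value:", permutation_p_value(male, female))
```

Computing `significant_ratio` separately for an original and a debiased model gives the kind of per-model ratio that Figure 3 compares.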

Abstract

Pre-trained large language models (LLMs) have been reliably integrated with visual input for multimodal tasks. The widespread adoption of instruction-tuned image-to-text vision-language assistants (VLAs) like LLaVA and InternVL necessitates evaluating their gender biases. We study gender bias in 22 popular open-source VLAs with respect to personality traits, skills, and occupations. Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances. Similarly, they tend to attribute more skills and positive personality traits to women than to men, and we see a consistent tendency to associate negative personality traits with men. Comparing debiasing methods for these models, we find that fine-tuning-based methods achieve the best trade-off between reducing bias and retaining performance on downstream tasks. We argue for assessing gender bias in VLAs before deployment and motivate further development of debiasing strategies to ensure equitable societal outcomes. Code is available at https://github.com/ExplainableML/vla-gender-bias.

Paper Structure

This paper contains 44 sections, 5 equations, 25 figures, 11 tables.

Figures (25)

  • Figure 1: We measure gender bias across personality traits, work-related soft skills, and occupations. First, we collect suitable attributes and integrate them into a predefined prompt template. The prompt and an image are provided to the VLAs. We analyze the VLAs' responses by comparing the probability of outputting the "yes" option across genders and apply several debiasing methods.
  • Figure 2: Top five male- and female-biased personality traits (top), skills (middle), and occupations (bottom). For each trait, skill, and occupation, we show $\mu_{\text{male}} - \mu_{\text{female}}$ averaged across models in the respective series to indicate the strength of gender bias.
  • Figure 3: Comparison of full fine-tuning (middle bar) and LoRA fine-tuning (right bar) as debiasing methods; the left bar in each group shows the original VLA. For each prompt group and evaluated model, we show the ratio of traits/skills/occupations with significant bias for the original and debiased models.
  • Figure 4: Qualitative results demonstrating the effect of different debiasing methods. For two images (female-labeled on top and male-labeled on bottom), we show distributions over options before (red plot) and after debiasing (blue-green plots) for four different prompt variants.
  • Figure 5: Ratio of removed images per dataset as a function of the removal threshold on the probability of the image containing occupation-related information. In addition to the full datasets (left), we show curves for male-labeled images (center) and female-labeled images (right). For FairFace (padding=1.25), MIAP, and Phase, male-labeled images, on average, have a higher probability of containing occupation-related content. The dashed lines indicate the threshold of 0.25 chosen in this study.
  • ...and 20 more figures
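The image filtering behind Figure 5 can be sketched as follows, assuming each image already carries a probability of containing occupation-related content (how that probability is produced is not described here). An image is removed when its probability exceeds the threshold, 0.25 in this study. The strict ">" comparison, the (probability, gender label) tuple layout, and the function names are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (probability of occupation-related content, gender label).
Image = Tuple[float, str]


def removal_ratio(probs: List[float], threshold: float) -> float:
    """Fraction of images whose occupation probability exceeds the threshold."""
    if not probs:
        return 0.0
    return sum(p > threshold for p in probs) / len(probs)


def removal_curves(images: List[Image], thresholds: List[float]) -> Dict[str, List[float]]:
    """Removal ratio at each threshold for all, male-labeled, and female-labeled images."""
    groups = {
        "all": [p for p, _ in images],
        "male": [p for p, g in images if g == "male"],
        "female": [p for p, g in images if g == "female"],
    }
    return {name: [removal_ratio(ps, t) for t in thresholds] for name, ps in groups.items()}


def filter_images(images: List[Image], threshold: float = 0.25) -> List[Image]:
    """Keep only images unlikely to show occupation-related content."""
    return [(p, g) for p, g in images if p <= threshold]


if __name__ == "__main__":
    # Toy data, not taken from FairFace, MIAP, or Phase.
    imgs = [(0.1, "male"), (0.3, "male"), (0.6, "male"), (0.2, "female"), (0.05, "female")]
    sweep = [i / 20 for i in range(21)]  # thresholds 0.0, 0.05, ..., 1.0
    print(removal_curves(imgs, sweep)["all"])
    print(filter_images(imgs))
```

Sweeping the threshold per gender group is what exposes the asymmetry noted for Figure 5: if male-labeled images tend to have higher occupation probabilities, their removal curve sits above the female-labeled one.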