BiasDora: Exploring Hidden Biased Associations in Vision-Language Models

Chahat Raj; Anjishnu Mukherjee; Aylin Caliskan; Antonios Anastasopoulos; Ziwei Zhu

TL;DR

BiasDora addresses the narrow scope of existing VLM bias evaluations with a cross-modal probing framework that uncovers hidden, implicit associations across 9 bias dimensions in Text-to-Text (T2T), Text-to-Image (T2I), and Image-to-Text (I2T) settings. It introduces a three-stage pipeline (probing, association salience, and LLM-based bias level assessment) together with a bias-isolation mechanism, and it analyzes roughly 400 demographic descriptors derived from CrowS-Pairs. The analysis surfaces statistically significant associations, including negative, toxic, and extreme ones, that vary by model and modality; many of them have not been reported before. The authors publicly release the Dora dataset of retrieved associations to support mitigation efforts. By moving bias detection beyond predefined vocabularies, the work offers a practical framework for evaluating hidden biases in real-world multimodal systems and for informing their responsible deployment.
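The association-salience stage can be illustrated with a minimal sketch. This summary does not specify the statistic the paper uses, so the example below substitutes pointwise mutual information (PMI) with a minimum-count filter as a stand-in; the function name `salient_associations`, its thresholds, and the placeholder probe outputs are all hypothetical and are not taken from the paper or the Dora dataset.

```python
import math
from collections import Counter


def salient_associations(pairs, min_count=2, min_pmi=0.0):
    """Rank (descriptor, word) pairs by pointwise mutual information.

    Assumption: PMI plus a count filter stands in for the paper's
    significance test, which this summary does not describe.
    """
    pair_counts = Counter(pairs)
    descriptor_counts = Counter(d for d, _ in pairs)
    word_counts = Counter(w for _, w in pairs)
    total = len(pairs)

    results = []
    for (descriptor, word), n in pair_counts.items():
        if n < min_count:
            continue  # too rare to call salient
        # PMI = log2( P(d, w) / (P(d) * P(w)) )
        pmi = math.log2(
            (n * total) / (descriptor_counts[descriptor] * word_counts[word])
        )
        if pmi > min_pmi:
            results.append((descriptor, word, n, pmi))
    # Highest PMI first; ties keep their first-seen order.
    return sorted(results, key=lambda r: r[3], reverse=True)


# Hypothetical probe outputs: one (descriptor, completed word) pair per
# model response. Placeholder tokens are used instead of real findings.
completions = [
    ("descriptor_a", "word_x"), ("descriptor_a", "word_x"),
    ("descriptor_a", "word_z"),
    ("descriptor_b", "word_y"), ("descriptor_b", "word_y"),
    ("descriptor_b", "word_z"),
]

ranked = salient_associations(completions)
for descriptor, word, count, pmi in ranked:
    print(f"{descriptor} -> {word}: count={count}, PMI={pmi:.2f}")
```

In this toy input, `word_z` appears once with each descriptor and is dropped by the count filter, while the two repeated pairings are kept. A real pipeline would feed the surviving pairs to sentiment, toxicity, and LLM-based bias level assessment.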

Abstract

Existing works examining Vision-Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender:profession or race:crime. This narrow scope often overlooks a vast range of unexamined implicit associations, restricting the identification and, hence, mitigation of such biases. We address this gap by probing VLMs to (1) uncover hidden, implicit associations across 9 bias dimensions. We systematically explore diverse input and output modalities and (2) demonstrate how biased associations vary in their negativity, toxicity, and extremity. Our work (3) identifies subtle and extreme biases that are typically not recognized by existing methodologies. We make the Dataset of retrieved associations (Dora) publicly available at https://github.com/chahatraj/BiasDora.

Paper Structure (24 sections, 18 figures, 4 tables)

Introduction
VLM Probing
Text-to-Text
Text-to-Image
Image-to-Text
VLM Association Assessment
Significant Associations
Negative and Toxic Associations
Bias Level Assessment
Bias Isolation
Empirical Analysis
Negative Stereotypical Associations
Toxic Associations
Bias Level Assessment
Discovered Associations
...and 9 more sections

Figures (18)

Figure 1: VLMs reinforce biases that are different from the documented stereotypical associations.
Figure 2: We probe VLMs in three modalities (T2T, T2I, and I2T) through word completion, image generation, and image description tasks. We compute statistically significant associations, then identify those that are sentiment-negative or toxic. We further evaluate the bias levels of these associations using LLM-based assessment.
Figure 3: GPT-4o and Llama-3-8B generate a high percentage of negative associations in the T2T modality. Each lexical setting captures a distinct level of negative sentiment across the bias dimensions and models. Sexual Orientation and Physical Appearance show more negative associations than the other dimensions.
Figure 4: Stable Diffusion (T2I) has higher bias than DALL-E 3 (T2I) in gender images. GPT-4o (I2T) and LLaVA (I2T) reflect high disability biases.
Figure 5: GPT-4o (T2I) image generations perpetuate stereotypes by associating humans with skin color, colors, objects, and attributes.
...and 13 more figures
