BioPro: On Difference-Aware Gender Fairness for Vision-Language Models

Yujie Lin, Jiayao Ma, Qingguo Hu, Derek F. Wong, Jinsong Su

TL;DR

The paper addresses gender bias in vision–language models by introducing difference-aware fairness for multimodal tasks. It proposes BioPro, a training-free Bias Orthogonal Projection method that selectively debiases neutral inputs while preserving explicit gender cues, using a gender-variation subspace learned from counterfactuals. The approach is extended to both image captioning and text-to-image generation, including a calibration term for generation and an explicit mechanism to handle continuous bias variables like scene brightness. Experimental results show BioPro reduces neutral bias with minimal impact on explicit faithfulness and semantic fidelity, outperforming several baselines and demonstrating generalization beyond discrete gender bias.

Abstract

Vision-Language Models (VLMs) inherit significant social biases from their training data, notably in gender representation. Current fairness interventions often adopt a difference-unaware perspective that enforces uniform treatment across demographic groups. These approaches, however, fail to distinguish between contexts where neutrality is required and those where group-specific attributes are legitimate and must be preserved. Building upon recent advances in difference-aware fairness for text-only models, we extend this concept to the multimodal domain and formalize the problem of difference-aware gender fairness for image captioning and text-to-image generation. We advocate for selective debiasing, which aims to mitigate unwanted bias in neutral contexts while preserving valid distinctions in explicit ones. To achieve this, we propose BioPro (Bias Orthogonal Projection), an entirely training-free framework. BioPro identifies a low-dimensional gender-variation subspace through counterfactual embeddings and applies projection to selectively neutralize gender-related information. Experiments show that BioPro effectively reduces gender bias in neutral cases while maintaining gender faithfulness in explicit ones, thus providing a promising direction toward achieving selective fairness in VLMs. Beyond gender bias, we further demonstrate that BioPro can effectively generalize to continuous bias variables, such as scene brightness, highlighting its broader applicability.
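The core idea in the abstract, estimating a gender-variation subspace from counterfactual embeddings and projecting it out only for neutral inputs, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the subspace is estimated by an SVD of the differences between counterfactual pairs, and that a neutral/explicit flag is supplied by the caller. The function names (`gender_subspace`, `project_out`, `selective_debias`), the `is_neutral` flag, and the synthetic embeddings are all hypothetical. The calibration term used for text-to-image generation is not covered.

```python
import numpy as np

def gender_subspace(male_emb, female_emb, k=1):
    """Estimate an orthonormal basis (d x k) for the directions along which
    counterfactual pairs differ. Assumption: SVD of the pairwise differences."""
    diffs = male_emb - female_emb                      # shape (n_pairs, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k].T                                    # orthonormal columns

def project_out(x, basis):
    """Orthogonal projection onto the complement of span(basis): x - U U^T x."""
    return x - (x @ basis) @ basis.T

def selective_debias(x, basis, is_neutral):
    """Project only inputs flagged as neutral; explicit inputs pass through.
    How the flag is obtained is outside this sketch (hypothetical input)."""
    return project_out(x, basis) if is_neutral else x

# Synthetic counterfactual pairs that differ only along the first axis.
rng = np.random.default_rng(0)
d, n_pairs = 16, 32
direction = np.zeros(d)
direction[0] = 1.0
content = rng.normal(size=(n_pairs, d))
male = content + 2.0 * direction
female = content - 2.0 * direction

U = gender_subspace(male, female, k=1)

x = rng.normal(size=d)
x_neutral = selective_debias(x, U, is_neutral=True)    # gender axis removed
x_explicit = selective_debias(x, U, is_neutral=False)  # left untouched
```

In this toy setup the recovered basis aligns with the first axis, so the neutral embedding loses only that coordinate while every other coordinate is unchanged. That is the property that lets a projection reduce bias without altering the remaining semantic content.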

Paper Structure

This paper contains 35 sections, 1 theorem, 29 equations, 9 figures, 9 tables.

Key Result

Lemma 1

The optimization problem defined by the calibration objective admits a closed-form solution. Specifically, the optimal projection matrix $\mathbf{P}_{f \to m}$ that minimizes this objective is given by

Figures (9)

  • Figure 1: BioPro can selectively perform debiasing primarily on neutral samples labeled as unsure gender.
  • Figure 2: An example of text-to-image generation. The figure shows eight images generated from the same prompt with fixed seeds ranging from 0 to 7. BioPro can help balance the gender distribution of the generated images.
  • Figure 3: Overview of BioPro on two multimodal tasks. For the image captioning task, the feature space of the samples is divided into three subspaces, and the embedding space is similarly partitioned into three subspaces according to output types. The model tends to incorrectly project some samples from the neutral feature subspace into the male or female embedding subspace. BioPro pulls these misprojected samples back to the correct neutral subspace. For the text-to-image generation task, the sample feature space is divided into three subspaces, while the embedding space is partitioned into two subspaces according to output types, since the model can only generate male or female images. The model tends to project prompts from the neutral feature subspace disproportionately into one embedding subspace, resulting in highly imbalanced gender representations in the generated images. BioPro mitigates this issue by shifting a portion of samples from one embedding subspace to the other, thereby ensuring gender balance in the generated outputs.
  • Figure 4: Absolute values after projection onto $\mathbf{U}_c$ when using the FLUX.1-dev base model.
  • Figure 5: Debiasing scene bias. The figure shows eight images generated from the same prompt with fixed seeds ranging from 0 to 7.
  • ...and 4 more figures

Theorems & Definitions (1)

  • Lemma 1: Closed-Form Solution for Calibration