Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models

Dachuan Zhao, Weiyue Li, Zhenda Shen, Yushu Qiu, Bowen Xu, Haoyu Chen, Yongchao Chen

TL;DR

Vision-Language Models often encode demographic biases that degrade fairness and cross-domain robustness. The authors show that a prior post-hoc debiasing method (SFID) relies on coordinate-wise edits, which fail because bias is distributed across subspaces, entangled across attributes, and dependent on the data distribution. They propose Subspace Projection Debiasing (SPD), which uses Iterative Null-space Projection to identify a bias subspace, projects embeddings onto its orthogonal complement, and then reinjects a neutral component to preserve semantics. SPD reduces demographic disparity more than prior baselines across zero-shot classification, text-to-image retrieval, and generation tasks with minimal loss in task performance, supporting a geometric, subspace-based view of bias in VLMs.

Abstract

Vision-Language Models (VLMs) have become indispensable for multimodal reasoning, yet their representations often encode and amplify demographic biases, resulting in biased associations and misaligned predictions in downstream tasks. Such behavior undermines fairness and distorts the intended alignment between vision and language. Recent post-hoc approaches attempt to mitigate bias by replacing the most attribute-correlated embedding coordinates with neutral values. However, our systematic analysis reveals three critical failures of this coordinate-wise approach: feature entanglement, poor cross-dataset generalization, and incomplete bias removal. We find that bias is not localized to a few coordinates but is instead distributed across a few linear subspaces. To address these limitations, we propose Subspace Projection Debiasing (SPD), a geometrically principled framework that identifies and removes the entire subspace of linearly decodable bias while reinserting a neutral mean component to preserve semantic fidelity. Extensive experiments across zero-shot classification, text-to-image retrieval, and image generation validate the effectiveness of SPD: our method achieves more robust debiasing with an average improvement of 18.5% across four fairness metrics, while maintaining minimal loss in task performance compared to the best debiasing baseline.
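
The projection-and-reinjection step described in the abstract can be written compactly. The following is a sketch reconstructed from the abstract and the Figure 1 caption, not necessarily the paper's exact formulation. It assumes the bias subspace is represented by a matrix $U \in \mathbb{R}^{d \times k}$ with orthonormal columns; the projector $P_U$ and the dimensions $d$, $k$ are notation introduced here for illustration.

$$
P_U = U U^\top, \qquad x' = (I - P_U)\,x + P_U\,\bar{x}_{\text{low}}
$$

Under these assumptions, $P_U x' = P_U \bar{x}_{\text{low}}$ for every input $x$, so the component inside the bias subspace is the same constant for all embeddings, while $(I - P_U)\,x' = (I - P_U)\,x$ leaves everything outside the subspace untouched.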

Paper Structure

This paper contains 27 sections, 11 equations, 2 figures, 9 tables.

Figures (2)

  • Figure 1: A 2D schematic overview of our SPD framework. (1) We first identify the bias subspace set $U$ via $T$ iterative logistic classifiers, each of which extracts and removes a bias-predictive direction. (2) We then estimate a neutral mean $\bar{x}_{\text{low}}$ from low-confidence validation samples using a Random Forest. (3) At inference, query embeddings are projected onto the null space of the learned bias subspace and then re-centered by reinjecting the projection of $\bar{x}_{\text{low}}$ along that subspace, yielding debiased representations.
  • Figure 2: Text-to-image generation results for the neutral prompt “a person who works as a film director.” The first row shows CoDi outputs, and the second row shows CoDi with SPD debiasing. Gender labels (“male” / “female”) are automatically assigned using BLIP-2 by asking “Does the person look like a male or a female?”.
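
The three steps in the Figure 1 caption can be sketched in NumPy on synthetic data. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names and hyperparameters (`fit_probe_direction`, `bias_subspace`, `neutral_mean`, `debias`, `T=2`, `frac=0.2`) are hypothetical, the logistic probe is a plain gradient-descent fit, and the low-confidence samples are chosen by distance from the first probe's decision direction as a stand-in for the Random Forest confidence the caption mentions.

```python
import numpy as np

def fit_probe_direction(X, y, steps=300, lr=0.5):
    """Fit a logistic probe by gradient descent and return its weight vector."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        logits = np.clip(X @ w + b, -30.0, 30.0)  # clip for numerical safety
        p = 1.0 / (1.0 + np.exp(-logits))
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w

def bias_subspace(X, y, T=2):
    """INLP-style loop: fit a probe, keep its direction, project it out, repeat."""
    residual, dirs = X.copy(), []
    for _ in range(T):
        w = fit_probe_direction(residual, y)
        for u in dirs:                      # Gram-Schmidt against earlier directions
            w = w - (w @ u) * u
        norm = np.linalg.norm(w)
        if norm < 1e-8:                     # nothing left to decode linearly
            break
        u = w / norm
        dirs.append(u)
        residual = residual - np.outer(residual @ u, u)
    return np.stack(dirs, axis=1)           # shape (d, k), orthonormal columns

def neutral_mean(X, confidence, frac=0.2):
    """Mean of the least-confident samples (confidence from any attribute classifier)."""
    k = max(1, int(frac * len(X)))
    idx = np.argsort(confidence)[:k]
    return X[idx].mean(axis=0)

def debias(X, U, x_neutral):
    """Remove the bias-subspace component, then reinject the neutral mean's component."""
    P = U @ U.T
    return X - X @ P + x_neutral @ P

# Synthetic embeddings: a binary attribute shifts points along a direction v
# that is deliberately not aligned with any single coordinate.
rng = np.random.default_rng(0)
n, d = 400, 16
y = rng.integers(0, 2, size=n).astype(float)
v = rng.normal(size=d)
v /= np.linalg.norm(v)
X = rng.normal(size=(n, d)) + np.outer(2.0 * y - 1.0, 1.5 * v)

U = bias_subspace(X, y, T=2)
# Stand-in confidence: distance from the mean along the first bias direction.
confidence = np.abs((X - X.mean(axis=0)) @ U[:, 0])
x_low = neutral_mean(X, confidence)
Z = debias(X, U, x_low)

def class_gap(A):
    """Norm of the difference between the two attribute-class means."""
    return np.linalg.norm(A[y == 1].mean(axis=0) - A[y == 0].mean(axis=0))

print("class-mean gap before:", class_gap(X))
print("class-mean gap after: ", class_gap(Z))
```

After `debias`, every row of `Z` has the same coordinates inside the span of `U`, so a linear probe restricted to that subspace has nothing left to separate, while components orthogonal to `U` are unchanged. In practice the probe and the confidence model would be fit on real VLM embeddings with held-out validation data.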