SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

Yusuke Hirota; Min-Hung Chen; Chien-Yi Wang; Yuta Nakashima; Yu-Chiang Frank Wang; Ryo Hachiuma


TL;DR

This paper tackles societal bias in vision-language models such as CLIP by addressing two limitations of existing debiasing approaches: loss of attribute information and reliance on protected-attribute annotations. It introduces SANER, a four-component, annotation-free debiasing pipeline that neutralizes attribute words in the input text and applies a learnable debiasing layer to CLIP text features. The layer is optimized with the loss $\mathcal{L} = \alpha \mathcal{L}_{\text{deb}} + \beta \mathcal{L}_{\text{recon}} + \gamma \mathcal{L}_{\text{cont}}$, where the core term $\mathcal{L}_{\text{deb}}$ enforces equalized similarity across attribute groups without using attribute labels. Empirical results on text-to-image retrieval and generation show that SANER debiases gender, age, and race more effectively than prior methods that require annotations, while preserving zero-shot classification performance. The approach is dataset- and task-agnostic: it can be trained on any image-text corpus, and it could potentially be extended to other modalities, for example by debiasing the image encoder. Overall, SANER delivers annotation-free bias mitigation that preserves semantics, facilitating safer deployment of CLIP-based systems in real-world applications.
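To make the weighted objective above concrete, the following is a minimal Python sketch. The weighted sum mirrors the stated loss $\mathcal{L} = \alpha \mathcal{L}_{\text{deb}} + \beta \mathcal{L}_{\text{recon}} + \gamma \mathcal{L}_{\text{cont}}$. The function `equalized_similarity_penalty` is only an illustrative assumption about what "equalized similarity across attribute groups" could look like (variance of cosine similarities); it is not the paper's exact definition of $\mathcal{L}_{\text{deb}}$, and all function names and default weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def equalized_similarity_penalty(neutral_feat, group_feats):
    """Illustrative stand-in for L_deb (an assumption, not the paper's formula).

    Returns the variance of the cosine similarities between one
    attribute-neutral feature and one feature per attribute group.
    The penalty is zero when the neutral feature is equally similar
    to every group.
    """
    sims = [cosine(neutral_feat, g) for g in group_feats]
    mean = sum(sims) / len(sims)
    return sum((s - mean) ** 2 for s in sims) / len(sims)


def total_loss(l_deb, l_recon, l_cont, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum L = alpha * L_deb + beta * L_recon + gamma * L_cont.

    The default weights are placeholders, not values from the paper.
    """
    return alpha * l_deb + beta * l_recon + gamma * l_cont


# Toy 2-D features: the first neutral feature is equidistant from both
# groups, the second is aligned with only one of them.
groups = [[1.0, 1.0], [1.0, -1.0]]
balanced = equalized_similarity_penalty([1.0, 0.0], groups)
skewed = equalized_similarity_penalty([1.0, 1.0], groups)
```

In this toy setup the balanced feature incurs no penalty while the skewed one does, which is the behavior a debiasing term of this kind would push the learnable layer toward. No attribute labels for images are needed to compute it, only text features for each group.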

Abstract

Large-scale vision-language models, such as CLIP, are known to contain societal bias regarding protected attributes (e.g., gender, age). This paper aims to address the problem of societal bias in CLIP. Although previous studies have proposed to mitigate societal bias through adversarial learning or test-time projection, our comprehensive study of these works identifies two critical limitations: 1) loss of attribute information when it is explicitly disclosed in the input and 2) use of attribute annotations during the debiasing process. To mitigate societal bias in CLIP and overcome these limitations simultaneously, we introduce a simple yet effective debiasing method called SANER (societal attribute neutralizer) that eliminates attribute information from CLIP text features only for attribute-neutral descriptions. Experimental results show that SANER, which does not require attribute annotations and preserves original information for attribute-specific descriptions, demonstrates debiasing ability superior to that of existing methods.
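The distinction between attribute-neutral and attribute-specific descriptions can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the authors' implementation: the word map `NEUTRAL_MAP` is a hypothetical, deliberately tiny list for binary gender (only the "woman" to "person" mapping is taken from the paper's Figure 2 caption), and `select_text_feature` is one possible reading of how debiased features might be used only for attribute-neutral text.

```python
import re

# Hypothetical word map for binary gender; a real list would be much larger.
NEUTRAL_MAP = {
    "woman": "person",
    "man": "person",
    "women": "people",
    "men": "people",
}

# Whole-word, case-insensitive match on any attribute word in the map.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NEUTRAL_MAP)) + r")\b",
    re.IGNORECASE,
)


def has_attribute_word(text):
    """True if the description explicitly mentions an attribute word."""
    return _PATTERN.search(text) is not None


def neutralize(text):
    """Replace attribute words with neutral counterparts."""
    return _PATTERN.sub(lambda m: NEUTRAL_MAP[m.group(0).lower()], text)


def select_text_feature(text, original_feat, debiased_feat):
    """Keep the original feature for attribute-specific text and use the
    debiased feature only for attribute-neutral text (assumed routing)."""
    return original_feat if has_attribute_word(text) else debiased_feat


neutral_caption = neutralize("A photo of a woman cooking")
```

Under this sketch, "A photo of a designer" would be routed to the debiased feature, while "A photo of a woman cooking" keeps its original feature, so explicitly stated attribute information is not discarded.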


Paper Structure (35 sections, 13 equations, 6 figures, 10 tables)

This paper contains 35 sections, 13 equations, 6 figures, 10 tables.

Introduction
Review: Existing Debiasing Methods
Adversarial debiasing
Projection-based debiasing
Summary of the challenges
Societal Attribute Neutralizer (SANER)
Attribute neutralization
Feature modification
Attribute annotation-free debiasing loss
Regularization losses
Training and inference
Experiments: Text-to-Image Retrieval
Experimental settings
Gender bias analysis
Age and racial biases analysis
...and 20 more sections

Figures (6)

Figure 1: Our debiasing method, SANER, overcomes the limitations in existing methods: (a) attribute information is retained after debiasing, and (b) protected attribute annotations are not required for debiasing.
Figure 2: An overview of SANER, exemplified by binary gender. SANER neutralizes attribute-specific text (e.g., "woman" $\rightarrow$ "person"), modifies features via debiasing layer, and uses three losses for debiasing: $\mathcal{L}_\text{deb}$ for attribute neutralization, $\mathcal{L}_\text{recon}$ for feature preservation, and $\mathcal{L}_\text{cont}$ for image-text alignment.
Figure 3: Generated images for the prompt, "A photo of a designer," by the original Stable Diffusion (SD), projection-based debiased CLIP (Projection), and our debiased CLIP (SANER). We randomly sample $10$ images from generated images. Images framed in green denote those of the minority gender in the generated images (i.e., female).
Figure 4: Generated images for the prompt, "A photo of a female doctor," by the original Stable Diffusion (SD), projection-based debiased CLIP (Projection), and our debiased CLIP (SANER). Red frame indicates images with incorrect gender (i.e., male).
Figure 5: Generated images for the prompt, "A photo of a teacher," by the original Stable Diffusion (SD), projection-based debiased CLIP (Projection), and our debiased CLIP (SANER). We randomly sample $10$ images from generated images. Images framed in green denote those of the minority gender in the generated images (i.e., male).
...and 1 more figure
