
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma

TL;DR

This paper tackles societal bias in vision-language models like CLIP by addressing two limitations of existing debiasing approaches: loss of attribute information and reliance on protected-attribute annotations. It introduces SANER, a four-component, annotation-free debiasing pipeline that neutralizes attribute words in input text and applies a learnable debiasing layer to CLIP text features. The layer is optimized with the loss $\mathcal{L} = \alpha \mathcal{L}_{\text{deb}} + \beta \mathcal{L}_{\text{recon}} + \gamma \mathcal{L}_{\text{cont}}$, where the core term $\mathcal{L}_{\text{deb}}$ enforces equalized similarity across attribute groups without using attribute labels. Empirical results on text-to-image retrieval and generation show that SANER achieves superior debiasing for gender, age, and race while preserving zero-shot classification performance, outperforming prior methods that require annotations. The approach is dataset- and task-agnostic: it can be trained on any image-text corpus and could potentially be extended to other modalities, for example by also debiasing the image encoder. Overall, SANER delivers strong, annotation-free bias mitigation while preserving semantics, facilitating safer deployment of CLIP-based systems in real-world applications.
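To make the weighted objective concrete, the following minimal sketch computes $\mathcal{L} = \alpha \mathcal{L}_{\text{deb}} + \beta \mathcal{L}_{\text{recon}} + \gamma \mathcal{L}_{\text{cont}}$ on toy feature vectors. Only the weighted sum comes from the text above. The individual loss forms are assumptions chosen for illustration and may not match the paper. In particular, `debias_loss` is a hypothetical equalized-similarity penalty: it measures the variance of cosine similarities between a debiased feature and one reference feature per attribute group. `recon_loss` is assumed to be a squared L2 distance, and `contrastive_loss` is assumed to be a CLIP-style InfoNCE loss.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> list[float]:
    # Scale a vector to unit L2 norm (as CLIP does before computing similarities).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two feature vectors.
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def debias_loss(debiased: Vector, group_feats: Sequence[Vector]) -> float:
    # HYPOTHETICAL L_deb: variance of the similarities between the debiased
    # feature and one reference feature per attribute group (e.g., text features
    # for "man" / "woman"). It is zero exactly when all groups are equally similar.
    sims = [cosine(debiased, g) for g in group_feats]
    mean = sum(sims) / len(sims)
    return sum((s - mean) ** 2 for s in sims) / len(sims)


def recon_loss(debiased: Vector, original: Vector) -> float:
    # HYPOTHETICAL L_recon: squared L2 distance to the original CLIP feature.
    return sum((x - y) ** 2 for x, y in zip(debiased, original))


def contrastive_loss(text_feats: Sequence[Vector], image_feats: Sequence[Vector],
                     temperature: float = 0.07) -> float:
    # HYPOTHETICAL L_cont: text-to-image InfoNCE; pair i is (text i, image i).
    total = 0.0
    for i, t in enumerate(text_feats):
        logits = [cosine(t, im) / temperature for im in image_feats]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(text_feats)


def total_loss(l_deb: float, l_recon: float, l_cont: float,
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    # Weighted sum L = alpha * L_deb + beta * L_recon + gamma * L_cont.
    return alpha * l_deb + beta * l_recon + gamma * l_cont
```

For example, a debiased feature `[1.0, 0.0]` is equally similar to the group features `[1.0, 1.0]` and `[1.0, -1.0]`, so `debias_loss` returns 0. In contrast, `[1.0, 1.0]` matches only the first group and is penalized. A variance-style penalty is used here because it needs no attribute labels on the training images. It only needs a fixed set of reference features per group.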

Abstract

Large-scale vision-language models, such as CLIP, are known to contain societal bias regarding protected attributes (e.g., gender, age). This paper aims to address societal bias in CLIP. Although previous studies have proposed to mitigate societal bias through adversarial learning or test-time projection, our comprehensive study of these works identifies two critical limitations: 1) loss of attribute information when it is explicitly disclosed in the input and 2) use of attribute annotations during the debiasing process. To mitigate societal bias in CLIP and overcome these limitations simultaneously, we introduce a simple yet effective debiasing method called SANER (societal attribute neutralizer) that eliminates attribute information only from CLIP text features of attribute-neutral descriptions. Experimental results show that SANER, which does not require attribute annotations and preserves the original information for attribute-specific descriptions, demonstrates superior debiasing ability compared with existing methods.
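The abstract states that attribute information is removed only from features of attribute-neutral descriptions, while attribute-specific descriptions keep their original features. It does not say how this selectivity is achieved. The sketch below makes the intended behavior explicit with a simple routing rule, and it is an illustration of the property, not the paper's mechanism. The attribute word list, `is_attribute_neutral`, and `encode_text` are all hypothetical. `clip_encode` and `debias_layer` stand in for the CLIP text encoder and the learned debiasing layer.

```python
import re
from typing import Callable, Sequence

# HYPOTHETICAL attribute word list (binary gender only, for illustration).
ATTRIBUTE_WORDS = {
    "man", "men", "woman", "women", "male", "female",
    "boy", "boys", "girl", "girls", "he", "she", "his", "her",
}


def is_attribute_neutral(text: str) -> bool:
    # Treat a description as attribute-neutral if no whole token is an attribute
    # word ("teacher" does not match "her" because matching is per token).
    tokens = re.findall(r"[a-z]+", text.lower())
    return not any(tok in ATTRIBUTE_WORDS for tok in tokens)


def encode_text(text: str,
                clip_encode: Callable[[str], Sequence[float]],
                debias_layer: Callable[[Sequence[float]], Sequence[float]]) -> Sequence[float]:
    # HYPOTHETICAL routing: debias only attribute-neutral descriptions and
    # return attribute-specific features unchanged.
    feat = clip_encode(text)
    if is_attribute_neutral(text):
        return debias_layer(feat)
    return feat
```

With this rule, "A photo of a doctor" would be passed through the debiasing layer. "A photo of a female doctor" would keep its original CLIP feature, so the requested gender is not erased.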

Paper Structure

This paper contains 35 sections, 13 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Our debiasing method, SANER, overcomes the limitations in existing methods: (a) attribute information is retained after debiasing, and (b) protected attribute annotations are not required for debiasing.
  • Figure 2: An overview of SANER, exemplified by binary gender. SANER neutralizes attribute-specific text (e.g., "woman" $\rightarrow$ "person"), modifies features via debiasing layer, and uses three losses for debiasing: $\mathcal{L}_\text{deb}$ for attribute neutralization, $\mathcal{L}_\text{recon}$ for feature preservation, and $\mathcal{L}_\text{cont}$ for image-text alignment.
  • Figure 3: Generated images for the prompt, "A photo of a designer," by the original Stable Diffusion (SD), projection-based debiased CLIP (Projection), and our debiased CLIP (SANER). We randomly sample $10$ images from the generated images. Images framed in green denote those of the minority gender in the generated images (i.e., female).
  • Figure 4: Generated images for the prompt, "A photo of a female doctor," by the original Stable Diffusion (SD), projection-based debiased CLIP (Projection), and our debiased CLIP (SANER). Red frames indicate images with the incorrect gender (i.e., male).
  • Figure 5: Generated images for the prompt, "A photo of a teacher," by the original Stable Diffusion (SD), projection-based debiased CLIP (Projection), and our debiased CLIP (SANER). We randomly sample $10$ images from the generated images. Images framed in green denote those of the minority gender in the generated images (i.e., male).
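The Figure 2 caption describes a text-neutralization step that replaces attribute-specific words with neutral ones (e.g., "woman" → "person"). A minimal sketch of such a word-level replacement is shown below. The mapping `NEUTRAL_MAP` and the function `neutralize` are hypothetical. The paper's actual word lists and replacement rules may differ, and pronouns are deliberately left out here because naive replacement breaks verb agreement ("she smiles" → "they smiles").

```python
import re

# HYPOTHETICAL mapping from attribute-specific words to neutral words (binary gender only).
NEUTRAL_MAP = {
    "woman": "person", "women": "people",
    "man": "person", "men": "people",
    "girl": "child", "girls": "children",
    "boy": "child", "boys": "children",
}

# Whole-word, case-insensitive match; \b prevents "man" from matching inside "woman".
_PATTERN = re.compile(r"\b(" + "|".join(NEUTRAL_MAP) + r")\b", re.IGNORECASE)


def neutralize(caption: str) -> str:
    # Replace each attribute word with its neutral counterpart, keeping an
    # initial capital letter if the original word had one.
    def repl(m: re.Match) -> str:
        word = m.group(0)
        neutral = NEUTRAL_MAP[word.lower()]
        return neutral.capitalize() if word[0].isupper() else neutral

    return _PATTERN.sub(repl, caption)
```

For example, `neutralize("A woman is riding a bike.")` yields `"A person is riding a bike."`, and captions without attribute words are returned unchanged. Matching whole words keeps substrings such as "the" or "human" intact.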