SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma
TL;DR
This paper tackles societal bias in vision-language models like CLIP by addressing two limitations of existing debiasing approaches: loss of attribute information and reliance on protected-attribute annotations. It introduces SANER, a four-component, annotation-free debiasing pipeline that neutralizes attribute words in input text and applies a learnable debiasing layer to CLIP text features, optimized with a loss $\mathcal{L} = \alpha \mathcal{L}_{\text{deb}} + \beta \mathcal{L}_{\text{recon}} + \gamma \mathcal{L}_{\text{cont}}$; the core of $\mathcal{L}_{\text{deb}}$ enforces equalized similarity across attribute groups without using attribute labels. Empirical results on text-to-image retrieval and generation show that SANER debiases gender, age, and race more effectively than prior methods that require annotations, while preserving zero-shot classification performance. The approach is dataset- and task-agnostic: it can be trained on any image-text corpus and could be extended to other modalities, including potential debiasing of the image encoder. Overall, SANER delivers annotation-free bias mitigation that preserves semantics, facilitating safer deployment of CLIP-based systems in real-world applications.
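As a rough illustration of how the weighted objective above fits together, the sketch below combines three scalar loss terms with weights $\alpha$, $\beta$, $\gamma$ and shows one possible reading of "equalized similarity across attribute groups". The variance-of-cosine-similarities penalty, the function names, and the toy 2-D vectors are all assumptions made for illustration; the summary does not specify the exact form of $\mathcal{L}_{\text{deb}}$, $\mathcal{L}_{\text{recon}}$, or $\mathcal{L}_{\text{cont}}$.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equalized_similarity_penalty(neutral_feat, group_feats):
    """Hypothetical stand-in for L_deb (an assumption, not the paper's formula).

    Penalizes the variance of the similarities between one
    attribute-neutral text feature and the features of its
    attribute-specific variants (one per attribute group).
    The penalty is zero when all similarities are equal.
    """
    sims = [cosine(neutral_feat, g) for g in group_feats]
    mean = sum(sims) / len(sims)
    return sum((s - mean) ** 2 for s in sims) / len(sims)


def total_loss(l_deb, l_recon, l_cont, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum L = alpha * L_deb + beta * L_recon + gamma * L_cont."""
    return alpha * l_deb + beta * l_recon + gamma * l_cont


# Toy 2-D features standing in for CLIP text features (illustrative only).
groups = [[1.0, 0.0], [0.0, 1.0]]
balanced = equalized_similarity_penalty([1.0, 1.0], groups)  # equally similar
skewed = equalized_similarity_penalty([1.0, 0.0], groups)    # favors group 0
```

In this toy setup the balanced feature incurs no penalty while the skewed one does, which is the behavior the debiasing term is described as encouraging; no attribute labels on images are involved, only text variants.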
Abstract
Large-scale vision-language models, such as CLIP, are known to contain societal bias regarding protected attributes (e.g., gender, age). This paper aims to address the problems of societal bias in CLIP. Although previous studies have proposed to mitigate societal bias through adversarial learning or test-time projection, our comprehensive study of these works identifies two critical limitations: 1) loss of attribute information when it is explicitly disclosed in the input and 2) use of attribute annotations during the debiasing process. To mitigate societal bias in CLIP and overcome these limitations simultaneously, we introduce a simple-yet-effective debiasing method called SANER (societal attribute neutralizer) that eliminates attribute information from CLIP text features only for attribute-neutral descriptions. Experimental results show that SANER, which does not require attribute annotations and preserves original information for attribute-specific descriptions, demonstrates debiasing ability superior to that of existing methods.
