AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation

Xinyu Hou, Xiaoming Li, Chen Change Loy

TL;DR

AITTI tackles stereotypical biases in text-to-image (T2I) generation without requiring explicit attribute specification or prior knowledge of the bias distribution. It learns concept-specific inclusive tokens through a lightweight adaptive mapping network and guides their training with an anchor loss that aligns the tokens with all target attribute classes, enabling generalization to unseen concepts. Empirical results show substantial fairness improvements (lower KL divergence, $D_{KL}$) while preserving text-image alignment and image quality, with model-agnostic applicability demonstrated on SD1.5, SD2.1, and SDXL. The approach also supports multi-bias mitigation by concatenating adaptive tokens, highlighting its practical potential for fairer, more inclusive T2I systems.
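The KL-based fairness score mentioned above can be illustrated with a small sketch. This summary does not give the paper's exact definition, so the version below is an assumption: it measures the KL divergence from the empirical attribute distribution of a batch of generated images to a uniform distribution over the attribute classes, with lower values meaning a more balanced output. The function name `kl_to_uniform` and the label strings are hypothetical, and the attribute labels are assumed to come from some external classifier.

```python
import math
from collections import Counter


def kl_to_uniform(labels, classes):
    """KL(p || u) between the empirical distribution p of `labels`
    and the uniform distribution u over `classes` (natural log).

    Assumption: the reference distribution is uniform; the paper's
    exact metric definition is not given in this summary.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"labels outside the class list: {sorted(unknown)}")

    counts = Counter(labels)
    n = len(labels)
    k = len(classes)

    kl = 0.0
    for c in classes:
        p = counts.get(c, 0) / n
        if p > 0:  # terms with p == 0 contribute nothing (0 * log 0 := 0)
            kl += p * math.log(p * k)  # p * log(p / (1/k))
    return kl


# Hypothetical predicted gender labels for images generated from one prompt.
biased = ["male"] * 9 + ["female"] * 1
balanced = ["male"] * 5 + ["female"] * 5

score_biased = kl_to_uniform(biased, ["male", "female"])
score_balanced = kl_to_uniform(balanced, ["male", "female"])
```

Under this definition a perfectly balanced batch scores zero, and the score grows as the generated images concentrate on a single attribute class, up to `log(k)` for `k` classes.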

Abstract

Despite the high-quality results of text-to-image generation, stereotypical biases have been observed in the generated content, compromising the fairness of generative models. In this work, we propose to learn adaptive inclusive tokens to shift the attribute distribution of the final generative outputs. Unlike existing de-biasing approaches, our method requires neither explicit attribute specification nor prior knowledge of the bias distribution. Specifically, the core of our method is a lightweight adaptive mapping network, which can customize the inclusive tokens for the concepts to be de-biased, making the tokens generalizable to unseen concepts regardless of their original bias distributions. This is achieved by tuning the adaptive mapping network with a handful of balanced and inclusive samples using an anchor loss. Experimental results demonstrate that our method outperforms previous bias mitigation methods without attribute specification while preserving the alignment between generative results and text descriptions. Moreover, our method achieves comparable performance to models that require specific attributes or editing directions for generation. Extensive experiments showcase the effectiveness of our adaptive inclusive tokens in mitigating stereotypical bias in text-to-image generation. The code will be available at https://github.com/itsmag11/AITTI.

Paper Structure

This paper contains 22 sections, 3 equations, 11 figures, 14 tables, 1 algorithm.

Figures (11)

  • Figure 1: Illustration of key concepts in T2I bias mitigation. The input prompt may contain biased concepts (e.g., doctor), and the generated images predominantly depict some attribute classes (e.g., male) over others, reflecting bias with respect to a sensitive attribute (e.g., gender). Such imbalances highlight the need for fairness-aware generation techniques.
  • Figure 2: Revised Textual Inversion (rTI) with fixed inclusive token causes semantic drifting of visual concepts. All images are generated with the same random seed. The caption above indicates the base prompt $T(c)$. Top: SD1.5 without rTI; Bottom: SD1.5 with rTI.
  • Figure 3: Framework of our proposed adaptive inclusive token for text-to-image generation. The blue color indicates frozen weights, and the green color indicates trainable weights. Left: single training stage. Right: details of the text model with the adaptive mapping network. The adaptive inclusive token is concept-specific. $TokenIDs$ are for illustration only.
  • Figure 4: Qualitative evaluation on gender bias mitigation of stereotypically male-dominated occupations. All images are generated with the same random seed. The captions above indicate the base prompt $T(c)$.
  • Figure 6: Qualitative evaluation on race bias mitigation. All images are generated with the same random seed. The captions above indicate the base prompt $T(c)$.
  • ...and 6 more figures