AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation
Xinyu Hou, Xiaoming Li, Chen Change Loy
TL;DR
AITTI tackles stereotype biases in text-to-image generation without requiring explicit attribute specification or prior bias distributions. It learns concept-specific inclusive tokens through a lightweight adaptive mapping network and guides their training with an anchor loss to align with all target attribute classes, enabling generalization to unseen concepts. Empirical results show substantial fairness improvements (lower $D_{KL}$) while preserving text-image alignment and image quality, with model-agnostic applicability demonstrated on SD1.5, SD2.1, and SDXL. The approach also enables multi-bias mitigation by concatenating adaptive tokens, highlighting practical potential for fairer, more inclusive T2I systems in real-world use cases.
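To make the KL-based fairness score mentioned above concrete, the following is a minimal sketch of one common way such a score can be computed: the KL divergence between the empirical distribution of a predicted attribute (for example, labels from an attribute classifier run on generated images) and the uniform distribution over attribute classes. The direction of the divergence, the use of the natural logarithm, and the function name `kl_to_uniform` are assumptions made for illustration; the paper's exact definition may differ.

```python
import math
from collections import Counter


def kl_to_uniform(labels, classes):
    """KL(p || u) between the empirical class distribution p of `labels`
    and the uniform distribution u over `classes` (natural log).

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    n = len(labels)
    k = len(classes)
    kl = 0.0
    for c in classes:
        p = counts.get(c, 0) / n
        if p > 0:
            # p * log(p / (1/k)) == p * log(p * k); zero-probability terms contribute 0
            kl += p * math.log(p * k)
    return kl


# A perfectly balanced set of predictions gives a divergence of 0;
# a set collapsed onto a single class out of two gives log(2).
balanced = kl_to_uniform(["a", "b", "a", "b"], ["a", "b"])
collapsed = kl_to_uniform(["a", "a", "a", "a"], ["a", "b"])
```

Under this definition, a lower value means the attribute distribution of the generated images is closer to uniform, which is the sense in which a lower score indicates a fairer output.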
Abstract
Despite the high-quality results of text-to-image models, stereotypical biases have been observed in their generated content, compromising the fairness of generative models. In this work, we propose to learn adaptive inclusive tokens to shift the attribute distribution of the final generative outputs. Unlike existing de-biasing approaches, our method requires neither explicit attribute specification nor prior knowledge of the bias distribution. Specifically, the core of our method is a lightweight adaptive mapping network, which customizes the inclusive tokens for the concepts to be de-biased, making the tokens generalizable to unseen concepts regardless of their original bias distributions. This is achieved by tuning the adaptive mapping network with a handful of balanced and inclusive samples using an anchor loss. Experimental results demonstrate that our method outperforms previous bias mitigation methods that do not require attribute specification, while preserving the alignment between generated results and text descriptions. Moreover, our method achieves performance comparable to models that require specific attributes or editing directions for generation. Extensive experiments showcase the effectiveness of our adaptive inclusive tokens in mitigating stereotypical bias in text-to-image generation. The code will be available at https://github.com/itsmag11/AITTI.
