FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, Siqi Deng
TL;DR
This work tackles bias in diffusion-based human image generation by conditioning a frozen pre-trained backbone on externally retrieved, demographically diverse reference images. It introduces FairRAG, a lightweight framework with a linear conditioning module and a fair retrieval system that uses debiased queries and balanced sampling to enrich demographic representation. Empirical results show improved demographic diversity, better image-text alignment, and competitive image fidelity, all with minimal inference overhead. The approach can be extended to broader domains by expanding the external reference dataset, and it can be combined with other retrieval-augmented generation strategies without retraining the backbone.
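The fair retrieval idea (debiased queries plus balanced sampling) can be pictured with a small, stdlib-only sketch. It is not the authors' implementation: the `RefImage` record, the helpers `debias_query` and `fair_retrieve`, the embedding vectors, and the group labels are all hypothetical. Query debiasing is illustrated here by projecting an assumed bias direction out of the query embedding, a common embedding-debiasing trick that may differ from FairRAG's actual strategy. Balanced sampling is illustrated as round-robin retrieval across demographic groups.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class RefImage:
    # Hypothetical database record: image id, precomputed embedding, demographic group label.
    image_id: str
    embedding: list[float]
    group: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def debias_query(query: list[float], bias_dir: list[float]) -> list[float]:
    """Remove the component of `query` along an assumed bias direction: q - (q.d / d.d) d."""
    dd = sum(d * d for d in bias_dir)
    if dd == 0:
        return list(query)
    coef = sum(q * d for q, d in zip(query, bias_dir)) / dd
    return [q - coef * d for q, d in zip(query, bias_dir)]


def fair_retrieve(query: list[float], database: list[RefImage],
                  bias_dir: list[float], k: int, seed: int = 0) -> list[RefImage]:
    """Hypothetical fair retrieval: debias the query, then take the best
    remaining match from each demographic group in round-robin order."""
    q = debias_query(query, bias_dir)
    ranked: dict[str, list[RefImage]] = {}
    for ref in database:
        ranked.setdefault(ref.group, []).append(ref)
    for refs in ranked.values():
        refs.sort(key=lambda r: cosine(q, r.embedding), reverse=True)
    order = sorted(ranked)
    random.Random(seed).shuffle(order)  # random group order, so no group is always first
    picked: list[RefImage] = []
    i = 0
    while len(picked) < k and any(ranked.values()):
        group = order[i % len(order)]
        if ranked[group]:
            picked.append(ranked[group].pop(0))  # best remaining match in this group
        i += 1
    return picked


# Toy database with two hypothetical groups "A" and "B".
db = [
    RefImage("a1", [1.0, 0.0, 0.0], "A"),
    RefImage("a2", [0.9, 0.1, 0.0], "A"),
    RefImage("b1", [0.0, 1.0, 0.0], "B"),
    RefImage("b2", [0.1, 0.9, 0.0], "B"),
]
refs = fair_retrieve([1.0, 0.5, 0.2], db, bias_dir=[0.0, 0.0, 1.0], k=4)
print([(r.image_id, r.group) for r in refs])
```

With two groups and `k=4`, each group contributes exactly two references, ordered within the group by similarity to the debiased query. Round-robin keeps group counts within one of each other for any `k`, as long as no group runs out of images, which is one simple way to realize balanced sampling.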
Abstract
Existing text-to-image generative models reflect or even amplify societal biases ingrained in their training data. This is especially concerning for human image generation, where models are biased against certain demographic groups. Prior attempts to rectify this issue are hindered by the inherent limitations of the pre-trained models and fail to substantially improve demographic diversity. In this work, we introduce Fair Retrieval Augmented Generation (FairRAG), a novel framework that conditions pre-trained generative models on reference images retrieved from an external image database to improve fairness in human generation. FairRAG enables conditioning through a lightweight linear module that projects reference images into the textual space. To enhance fairness, FairRAG applies simple yet effective debiasing strategies, providing images from diverse demographic groups during the generative process. Extensive experiments demonstrate that FairRAG outperforms existing methods in terms of demographic diversity, image-text alignment, and image fidelity while incurring minimal computational overhead during inference.
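The linear conditioning module can be sketched as a single affine map from an image-embedding space into the text-token embedding space. This is a minimal illustration under stated assumptions, not FairRAG's code: we assume reference images are already encoded into fixed-length vectors by some frozen image encoder, and that each projected vector is appended to the prompt's token embeddings as an extra pseudo-token. The names `linear` and `condition_prompt`, the toy weights, and all dimensions are hypothetical. Under this setup only `weight` and `bias` would be trained, so the generative backbone itself stays frozen.

```python
from __future__ import annotations


def linear(x: list[float], weight: list[list[float]], bias: list[float]) -> list[float]:
    """Affine map y = W x + b, with `weight` shaped (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def condition_prompt(text_tokens: list[list[float]],
                     image_embeddings: list[list[float]],
                     weight: list[list[float]],
                     bias: list[float]) -> list[list[float]]:
    """Hypothetical conditioning: project each reference-image embedding into the
    text-token space and append it to the prompt's token embeddings."""
    width = len(text_tokens[0])
    pseudo_tokens = []
    for emb in image_embeddings:
        token = linear(emb, weight, bias)
        if len(token) != width:
            raise ValueError("projected token width must match text embedding width")
        pseudo_tokens.append(token)
    return text_tokens + pseudo_tokens


# Toy sizes: 3-d image embeddings, 2-d text tokens (real encoders use far larger widths).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
bias_vec = [0.5, -0.5]
prompt = [[0.1, 0.2], [0.3, 0.4]]
conditioned = condition_prompt(prompt, [[1.0, 2.0, 3.0]], W, bias_vec)
print(conditioned[-1])  # → [1.5, 4.5]
```

Because the module is a single linear layer, the added inference cost in this sketch is one small matrix-vector product per reference image, which is consistent with the minimal overhead reported above.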
