Intersectional Unfairness Discovery
Gezheng Xu, Qi Chen, Charles Ling, Boyu Wang, Changjian Shui
TL;DR
The paper tackles intersectional unfairness by discovering high-bias subgroups defined over multiple sensitive attributes, going beyond single-attribute analyses. It reframes discovery as a generation problem and introduces a Bias-Guided Generative Network (BGGN) whose sampling distribution is trained to be proportional to the bias value, $p_{\theta}(\boldsymbol{a}) \propto \mathcal{L}_{f}(\boldsymbol{a})$, using variational inference with a reverse KL objective so that it produces diverse, high-bias intersectional attributes. The approach is validated on CelebA and Toxic, where BGGN yields more numerous and more diverse high-bias configurations, including unseen combinations. When these configurations are used as prompts for modern generative AI systems, they can induce biased outputs. The work acknowledges limitations such as training stability and its focus on discovery rather than mitigation, which underscores the need for careful handling of such biases in real-world deployments and for future work on mitigation strategies.
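To make the proportionality $p_{\theta}(\boldsymbol{a}) \propto \mathcal{L}_{f}(\boldsymbol{a})$ concrete, here is a minimal sketch on a toy space of three binary attributes. It is not the paper's implementation: BGGN is a learned generator, whereas this example simply enumerates every configuration. The bias function `toy_bias`, its coefficients, and the helper names are hypothetical and chosen only for illustration. The sketch normalizes the bias values into a target distribution and evaluates the reverse KL divergence, $\mathrm{KL}(q \,\|\, p)$, between a candidate distribution and that target.

```python
import itertools
import math


def toy_bias(a):
    """Hypothetical bias score L_f(a) > 0 for a tuple of binary attributes."""
    # Assumption: bias is highest when attributes 0 and 2 are both active.
    return 0.1 + 0.5 * a[0] * a[2] + 0.2 * a[1]


def target_distribution(bias_fn, n_attrs):
    """Normalize bias values so that p(a) is proportional to L_f(a)."""
    configs = list(itertools.product([0, 1], repeat=n_attrs))
    scores = {a: bias_fn(a) for a in configs}
    z = sum(scores.values())  # normalizing constant
    return {a: s / z for a, s in scores.items()}


def reverse_kl(q, p):
    """KL(q || p) = sum_a q(a) * (log q(a) - log p(a)), skipping q(a) = 0."""
    return sum(
        qa * (math.log(qa) - math.log(p[a])) for a, qa in q.items() if qa > 0
    )


target = target_distribution(toy_bias, n_attrs=3)
uniform = {a: 1.0 / len(target) for a in target}

# Configurations ranked by target probability (highest bias first).
ranked = sorted(target.items(), key=lambda kv: -kv[1])
print(ranked[:3])

# A uniform sampler ignores bias, so its reverse KL to the target is positive.
print(reverse_kl(uniform, target))
```

In this toy setting the target can be computed exactly. With many sensitive attributes the number of configurations grows exponentially, which is the regime where a learned generator becomes attractive.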
Abstract
AI systems have been shown to produce unfair results for certain subgroups of the population, highlighting the need to understand bias with respect to sensitive attributes. Current research often falls short, focusing primarily on subgroups characterized by a single sensitive attribute while neglecting intersectional fairness across multiple sensitive attributes. This paper addresses one fundamental aspect of the problem: discovering diverse high-bias subgroups under intersectional sensitive attributes. Specifically, we propose a Bias-Guided Generative Network (BGGN). By treating each bias value as a reward, BGGN efficiently generates high-bias intersectional sensitive attributes. Experiments on real-world text and image datasets demonstrate that BGGN's discovery is both diverse and efficient. To further evaluate the generated intersectional sensitive attributes that are unseen but potentially unfair, we formulate them as prompts and use modern generative AI to produce new texts and images. The frequent generation of biased data provides new insights into discovering potential unfairness in popular modern generative AI systems. Warning: This paper contains generative examples that are offensive in nature.
