
Intersectional Unfairness Discovery

Gezheng Xu, Qi Chen, Charles Ling, Boyu Wang, Changjian Shui

TL;DR

The paper tackles intersectional unfairness by discovering high-bias subgroups defined by combinations of multiple sensitive attributes, going beyond single-attribute analyses. It reframes discovery as a generation problem and introduces a Bias-Guided Generative Network (BGGN) in which $p_{\theta}(\boldsymbol{a}) \propto \mathcal{L}_{f}(\boldsymbol{a})$, trained via variational inference with a reverse KL objective to produce diverse, high-bias intersectional attributes. The approach is validated on the CelebA and Toxic datasets, where BGGN yields more numerous and more diverse high-bias configurations, including unseen combinations. When used as prompts for modern generative AI systems, these combinations can induce biased outputs. The work acknowledges limitations such as training stability and its focus on discovery rather than mitigation, underscoring the need for careful handling of such biases in real-world deployments and leaving mitigation strategies to future work.
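To make the target $p_{\theta}(\boldsymbol{a}) \propto \mathcal{L}_{f}(\boldsymbol{a})$ concrete, here is a minimal sketch, not the paper's BGGN. It assumes three binary sensitive attributes, a hypothetical bias function `bias` standing in for $\mathcal{L}_{f}$, and a plain softmax table over all eight attribute combinations in place of a neural generator. It minimizes the reverse KL divergence $\mathrm{KL}(p_{\theta} \,\|\, p^{*})$ with exact gradients, where $p^{*}(\boldsymbol{a}) = \mathcal{L}_{f}(\boldsymbol{a}) / Z$.

```python
import itertools
import math


def bias(a):
    """Hypothetical stand-in for L_f(a); NOT the paper's bias measure."""
    return 0.05 + 0.3 * a[0] * a[1] + 0.1 * a[2]


# Enumerate every combination of three binary attributes (an assumption
# made so the target distribution can be computed exactly).
configs = list(itertools.product([0, 1], repeat=3))
rewards = [bias(a) for a in configs]
z_total = sum(rewards)
target = [r / z_total for r in rewards]  # p*(a) proportional to bias(a)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def reverse_kl(q, p):
    """KL(q || p): the expectation is taken under the model q."""
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))


# Model: one logit per combination, initialized to the uniform distribution.
logits = [0.0] * len(configs)
lr = 1.0
for _ in range(3000):
    q = softmax(logits)
    kl = reverse_kl(q, target)
    # Exact gradient of KL(q || p*) with respect to the softmax logits.
    grad = [qi * ((math.log(qi) - math.log(pi)) - kl)
            for qi, pi in zip(q, target)]
    logits = [z - lr * g for z, g in zip(logits, grad)]

q = softmax(logits)
final_kl = reverse_kl(q, target)
best = configs[max(range(len(configs)), key=lambda i: q[i])]
```

After training, `q` matches `target`, so sampling from it returns each combination with probability proportional to its (hypothetical) bias value instead of only the single worst subgroup. In a realistic setting the attribute space is too large to enumerate, which is where a learned generator and sampled gradient estimates would be needed.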

Abstract

AI systems have been shown to produce unfair results for certain subgroups of the population, highlighting the need to understand bias with respect to certain sensitive attributes. Current research often falls short, primarily focusing on subgroups characterized by a single sensitive attribute while neglecting the intersectional fairness of multiple sensitive attributes. This paper focuses on one fundamental aspect of this problem: discovering diverse high-bias subgroups under intersectional sensitive attributes. Specifically, we propose a Bias-Guided Generative Network (BGGN). By treating each bias value as a reward, BGGN efficiently generates high-bias intersectional sensitive attributes. Experiments on real-world text and image datasets demonstrate that BGGN's discovery is diverse and efficient. To further evaluate the generated unseen but possible unfair intersectional sensitive attributes, we formulate them as prompts and use modern generative AI to produce new texts and images. The frequent generation of biased data provides new insights into discovering potential unfairness in popular modern generative AI systems. Warning: This paper contains generative examples that are offensive in nature.

Paper Structure (26 sections, 14 equations, 6 figures, 1 algorithm)


Figures (6)

  • Figure 1: Visualization of systematic generalization [lake2023human] for the relaxed search tree, a conventional generative model (VAE), and BGGN on the Toxic dataset. The search tree method is limited to identifying a narrow selection of high-bias subgroups. The VAE, on the other hand, tends to generate subgroups that mirror the original distribution of $\mathbf{a}$. In contrast, our BGGN can discover more $\mathbf{a}$, covering all cohorts of high-bias subgroups.
  • Figure 2: CelebA data. Results under a bias threshold $\tau = 0.3$. (a) The search algorithm depends only on the Observation dataset and is inadequate for discovering diverse intersectional sensitive attributes $\mathbf{a}$. (b-c) We compare BGGN with the regular VAE under various metrics (higher is better). The radar charts demonstrate the superiority of BGGN in efficiently generating high-bias and diverse $\mathbf{a}$.
  • Figure 3: Toxic data. Results under a bias threshold $\tau = 0.3$. (a) The search algorithm depends only on the Observation dataset and fails to discover diverse intersectional sensitive attributes $\mathbf{a}$. (b-c) We compare BGGN with the regular VAE under various metrics (higher is better). The radar charts demonstrate the superiority of BGGN in efficiently generating high-bias and diverse $\mathbf{a}$.
  • Figure 4: Analysis and ablation studies. (a) We visualize the probability density of the bias value in the Toxic dataset. A conventional generative model, such as a VAE, perfectly captures the raw data distribution $p_{\text{data}}(\mathbf{a})$, where most intersectional sensitive attributes lie in the low-bias region. In contrast, the proposed BGGN tends to generate high-bias intersectional sensitive attributes, with a higher bias value on average (dashed green line). (b-c) Across different bias thresholds $\tau$, BGGN is consistently better than search at discovering diverse high-bias $\mathbf{a}$.
  • Figure 5: Case study on texts (Toxic dataset). To evaluate the newly generated but unseen high-bias $\mathbf{a}$, we formulate them as prompts and then ask a modern generative AI system such as LLaMA [Touvron2023LLaMAOA] to generate opinionated comments. LLaMA tends to generate biased opinions for these $\mathbf{a}$. The prediction error on the generated texts is higher than the dataset average (e.g., the mean loss on the generated texts is 1.43, whereas the dataset-level loss is about 0.2). More results from other generative AI systems can be found in the Appendix.
  • ...and 1 more figure
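The figure captions above compare methods by how many diverse, high-bias, and previously unseen attribute combinations they find under a threshold $\tau$. The sketch below shows one way such counts could be computed. The bias function, the toy data, the `>=` comparison against $\tau$, and the three summary quantities are all assumptions made for illustration. They are not the metrics plotted in the paper's radar charts.

```python
def bias(a):
    """Hypothetical stand-in for the bias value of attribute vector a."""
    return 0.05 + 0.3 * a[0] * a[1] + 0.1 * a[2]


TAU = 0.3  # bias threshold, matching the value quoted in the captions

# Toy data (assumed): combinations present in the observed dataset and
# samples drawn from some generator.
observed = {(1, 1, 0), (0, 0, 0), (0, 1, 0)}
generated = [(1, 1, 1), (1, 1, 0), (1, 1, 1), (0, 0, 1)]


def discovery_summary(samples, seen, bias_fn, tau):
    """Summarize high-bias discovery for a list of sampled attribute vectors."""
    # Distinct combinations whose bias reaches the threshold (diversity).
    high = {a for a in samples if bias_fn(a) >= tau}
    # High-bias combinations that never appear in the observed data.
    unseen = high - seen
    # Fraction of all samples, duplicates included, that are high-bias.
    hit_rate = sum(bias_fn(a) >= tau for a in samples) / len(samples)
    return {
        "n_high_bias": len(high),
        "n_unseen": len(unseen),
        "hit_rate": hit_rate,
    }


summary = discovery_summary(generated, observed, bias, TAU)
```

Counting distinct combinations separately from the raw hit rate keeps a generator that repeats one high-bias subgroup from scoring as well as one that covers many different subgroups.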