Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities

Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh

TL;DR

This paper addresses the problem that foundation models can amplify disparities for marginalized communities through a cascading, interconnected set of effects rooted in embedding disparities. It formalizes eight disparity types, links them to lifecycle phases (design, training, and deployment), and argues that current evaluation and optimization practices miss root causes. The authors propose concrete, technically grounded calls to action, including mixture-of-experts architectures, representation-aware metrics, active data collection, and capacity-aware adaptation, to rebalance representations across heterogeneous distributions. The work aims to shift both evaluation and model design toward expressive, minority-friendly embeddings and equitable deployment, with sociotechnical considerations woven into the technical roadmap.

Abstract

The rise of foundation models holds immense promise for advancing AI, but this progress may amplify existing risks and inequalities, leaving marginalized communities behind. In this position paper, we argue that disparities towards marginalized communities (in performance, representation, privacy, robustness, interpretability, and safety) are not isolated concerns but interconnected elements of a cascading disparity phenomenon. We contrast foundation models with traditional models and highlight the potential for exacerbated disparity against marginalized communities. Moreover, we emphasize the unique threat of cascading impacts in foundation models, where interconnected disparities can trigger long-lasting negative consequences, particularly for people on the margins. We define marginalized communities within the machine learning context and explore the multifaceted nature of disparities. We analyze the sources of these disparities, tracing them through data creation, training, and deployment procedures to highlight the complex technical and socio-technical landscape. To address this pressing crisis, we conclude with a set of calls to action to mitigate disparity at its source.

Paper Structure (11 sections, 1 figure)

Figures (1)

  • Figure 1: Simplified example of data with a mixture of heterogeneous distributions. The aggregated distribution $D_m$, commonly assumed for training machine learning models, is shown in panel (a) as a black dotted line, while the underlying distributions are depicted in blue ($D_1$), orange ($D_2$), green ($D_3$), and red ($D_4$). In this example, all distributions are assumed to be Gaussian with equal weight, which does not reflect real-world scenarios, where marginalized communities often have smaller data sizes. Despite this simplification, the aggregated distribution still differs significantly from all the underlying distributions. Learning based solely on the aggregated distribution not only fails to accurately represent any of the underlying distributions, as shown in (a), but also risks missing variations if the dimensions of the underlying distributions differ, as shown in (b)-(e). Simply reweighting or adding more data points to marginalized distributions is not helpful; the complexities of the distributions can significantly impact the learning process and should be reflected in the method of aggregation.
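
The gap between an aggregated distribution and its components, as described in the Figure 1 caption, can be illustrated with a minimal numerical sketch. This is not the authors' code: the means and standard deviations below are hypothetical values chosen for illustration, and summarizing $D_m$ by a single moment-matched Gaussian is an assumption made here to keep the comparison closed-form. The sketch computes the mean and variance of an equal-weight mixture of four Gaussians and the KL divergence from each component to that single-Gaussian summary.

```python
import numpy as np

# Hypothetical (mean, std) parameters for the components D_1..D_4.
# These values are illustrative assumptions, not taken from the paper.
means = np.array([-4.0, -1.0, 2.0, 5.0])
stds = np.array([0.5, 1.0, 0.7, 1.5])
weights = np.full(4, 0.25)  # equal weights, as in the caption's simplification


def mixture_moments(means, stds, weights):
    """Mean and variance of a Gaussian mixture (law of total variance)."""
    mean = np.sum(weights * means)
    second_moment = np.sum(weights * (stds**2 + means**2))
    return mean, second_moment - mean**2


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """KL(P || Q) between univariate Gaussians P and Q (closed form)."""
    return np.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q) ** 2) / (2 * sd_q**2) - 0.5


# Single Gaussian that matches the first two moments of the aggregate D_m.
agg_mean, agg_var = mixture_moments(means, stds, weights)
agg_std = np.sqrt(agg_var)

# Divergence from each underlying distribution to the aggregate summary.
kl_to_aggregate = kl_gaussian(means, stds, agg_mean, agg_std)

for i, kl in enumerate(kl_to_aggregate, start=1):
    print(f"KL(D_{i} || aggregate) = {kl:.3f}")
print(f"aggregate mean = {agg_mean:.3f}, aggregate variance = {agg_var:.3f}")
```

Under these assumed parameters, the aggregate variance is larger than that of any single component and every divergence is strictly positive, which is the sense in which a model fit only to the aggregate represents none of the underlying groups well. Changing `weights` to unequal values is one way to explore the smaller-data-size case the caption mentions.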