Bias-inducing geometries: an exactly solvable data model with fairness implications

Stefano Sarao Mannelli; Federica Gerace; Negar Rostamzadeh; Luca Saglietti

TL;DR

The paper addresses how data geometry can induce bias in ML by introducing the Teacher-Mixture (T-M) model, an exactly solvable high-dimensional data model with coexisting sub-populations. Using replica analysis in the limit $n,d\to\infty$ at fixed $\alpha=n/d$, it derives fixed-point equations for a set of scalar order parameters $\Theta$ that yield exact predictions for classification performance and fairness metrics, including the Disparate Impact ($DI$). It identifies bias arising from group-label correlations $m_T^\pm$, rule similarity $q_T$, and representation imbalance $\rho$, showing that bias can persist even when both sub-populations share the same labelling rule ($q_T=1$), so that a single classifier could in principle fit both, and it illustrates positive transfer when sub-populations share similar rules. The paper then proposes two mitigation strategies, loss reweighing and coupled networks, and analytically characterizes their impact on fairness metrics and accuracy, with validation on real data (CelebA, MEPS) that supports the theoretical insights and highlights practical considerations for bias mitigation.
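The geometric ingredients named above can be made concrete with a small NumPy sketch. It assumes a specific parametrisation that is not taken from the paper: group signs $c=\pm1$ with $P(c=+1)=\rho$, inputs $\boldsymbol{x}=c\,\boldsymbol{v}/\sqrt{d}+\sqrt{\Delta_c}\,\boldsymbol{z}$ with $\boldsymbol{z}\sim\mathcal{N}(0,I_d)$ and group variances $\Delta_\pm$, labels $y=\mathrm{sign}(W_T^c\cdot\boldsymbol{x}/\sqrt{d})$, $q_T$ read as the normalised overlap between the two teachers $W_T^\pm$, and $m_T^\pm$ read as the overlaps between the centroid $\boldsymbol{v}$ and each teacher. The helper names `make_geometry` and `sample_teacher_mixture` are hypothetical.

```python
import numpy as np


def make_geometry(d, q_T, m_plus, m_minus, rng=None):
    """Hypothetical helper: teachers w_plus, w_minus and centroid v (squared norm d each) with
         w_plus . w_minus / d = q_T,  v . w_plus / d = m_plus,  v . w_minus / d = m_minus.
    Reading m_T^pm as teacher-centroid overlaps is an assumption for illustration.
    """
    if not -1.0 <= q_T <= 1.0:
        raise ValueError("q_T must lie in [-1, 1]")
    rng = np.random.default_rng(rng)
    # three orthonormal directions (reduced QR), rescaled to squared norm d
    Q, _ = np.linalg.qr(rng.standard_normal((d, 3)))
    e1, e2, e3 = (np.sqrt(d) * Q[:, k] for k in range(3))
    s = np.sqrt(1.0 - q_T**2)
    w_plus = e1
    w_minus = q_T * e1 + s * e2
    # coefficients of v on (e1, e2) are fixed by the two prescribed overlaps
    a = m_plus
    if s > 0:
        b = (m_minus - q_T * m_plus) / s
    elif np.isclose(m_minus, q_T * m_plus):
        b = 0.0
    else:
        raise ValueError("with |q_T| = 1 the overlaps must satisfy m_minus = q_T * m_plus")
    r2 = 1.0 - a**2 - b**2
    if r2 < -1e-12:
        raise ValueError("overlaps not jointly realisable (Gram matrix not positive semi-definite)")
    v = a * e1 + b * e2 + np.sqrt(max(r2, 0.0)) * e3
    return w_plus, w_minus, v


def sample_teacher_mixture(n, d, rho, delta_plus, delta_minus, v, w_plus, w_minus, rng=None):
    """Hypothetical sampler for the assumed parametrisation:
      c = +1 with probability rho, else -1        (representation imbalance)
      x = c * v / sqrt(d) + sqrt(Delta_c) * z    (group centroids +-v/sqrt(d), variances Delta_c)
      y = sign(w_c . x / sqrt(d))                (group-specific teacher)
    """
    rng = np.random.default_rng(rng)
    c = np.where(rng.random(n) < rho, 1, -1)
    delta = np.where(c == 1, delta_plus, delta_minus)
    z = rng.standard_normal((n, d))
    X = c[:, None] * v[None, :] / np.sqrt(d) + np.sqrt(delta)[:, None] * z
    teachers = np.where(c[:, None] == 1, w_plus[None, :], w_minus[None, :])
    y = np.sign(np.sum(teachers * X, axis=1) / np.sqrt(d))
    y[y == 0] = 1.0  # break measure-zero ties
    return X, y, c
```

With `q_T=1` the two teachers coincide (aligned rules), while `rho=0.5` together with `delta_plus == delta_minus` gives two equally represented, equally spread groups; overlaps that no three vectors can realise raise a `ValueError`.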

Abstract

Machine learning (ML) may be oblivious to human bias but it is not immune to its perpetuation. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even enhanced by the learning models. In the present work, we aim at clarifying the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Through the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables that are commonly employed for fairness assessment. Despite the simplicity of the data model, we retrace and unpack typical unfairness behaviour observed on real-world datasets. We also obtain a detailed analytical characterisation of a class of bias mitigation strategies. We first consider a basic loss-reweighing scheme, which allows for an implicit minimisation of different unfairness metrics, and quantify the incompatibilities between some existing fairness criteria. Then, we consider a novel mitigation strategy based on a matched inference approach, consisting in the introduction of coupled learning models. Our theoretical analysis of this approach shows that the coupled strategy can strike superior fairness-accuracy trade-offs.
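Both mitigation strategies act on the training objective, so they can be sketched as loss and gradient functions. Below, loss reweighing is a per-sample weight on a logistic loss, with inverse-frequency weights as one common choice, and the coupled strategy is written as two students, one per group, tied by an elastic penalty $\frac{\gamma}{2}\lVert w_+ - w_-\rVert^2$. The coupling form, the choice of weights, the function names and the parameters `gamma` and `lam` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np


def weighted_logistic(w, X, y, s):
    """Weighted loss sum_mu s_mu * log(1 + exp(-y_mu * w.x_mu / sqrt(d))) and its gradient."""
    d = X.shape[1]
    m = y * (X @ w) / np.sqrt(d)                 # margins
    loss = np.sum(s * np.logaddexp(0.0, -m))     # numerically stable log(1 + e^{-m})
    # d/dm log(1 + e^{-m}) = -sigmoid(-m) = -0.5 * (1 - tanh(m / 2))
    dloss_dm = -s * 0.5 * (1.0 - np.tanh(m / 2.0))
    grad = X.T @ (dloss_dm * y) / np.sqrt(d)
    return loss, grad


def inverse_frequency_weights(group):
    """Loss reweighing (assumed scheme): every group gets the same total weight; mean weight is 1."""
    labels, counts = np.unique(group, return_counts=True)
    n, k = len(group), len(labels)
    per_group = {c: n / (k * cnt) for c, cnt in zip(labels, counts)}
    return np.array([per_group[c] for c in group], dtype=float)


def coupled_objective(w_plus, w_minus, X, y, group, gamma, lam):
    """Two students (groups labelled +1 / -1) tied by (gamma/2)||w_plus - w_minus||^2.

    Hypothetical formulation: each student sees only its own group's data and both
    carry an L2 penalty (lam/2)||w||^2. Returns (loss, grad_plus, grad_minus).
    """
    p, q = group == 1, group == -1
    loss_p, g_p = weighted_logistic(w_plus, X[p], y[p], np.ones(p.sum()))
    loss_m, g_m = weighted_logistic(w_minus, X[q], y[q], np.ones(q.sum()))
    diff = w_plus - w_minus
    loss = (loss_p + loss_m + 0.5 * gamma * (diff @ diff)
            + 0.5 * lam * (w_plus @ w_plus + w_minus @ w_minus))
    grad_plus = g_p + gamma * diff + lam * w_plus
    grad_minus = g_m - gamma * diff + lam * w_minus
    return loss, grad_plus, grad_minus
```

The reweighed loss is `weighted_logistic(w, X, y, inverse_frequency_weights(group))`. In the coupled sketch, `gamma = 0` trains the two students independently, a large `gamma` pushes them towards a single shared classifier, and intermediate values interpolate between the two.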

Paper Structure (26 sections, 64 equations, 15 figures, 1 table)

Modelling Data Imbalance
Theoretical analysis in high-dimensions.
Investigating the sources of bias
Group-label correlation.
Bias and variance.
Positive transfer.
Mitigation strategies
Loss Reweighing.
Coupled Networks.
Discussion
Symbols and notation
Replica analysis
Replica symmetric ansatz.
Interaction term.
Energetic term.
...and 11 more sections

Figures (15)

Figure 1: The Teacher-Mixture (T-M) model can account for several types of data imbalance. Panel A: The T-M model is a generative model of high-dimensional structured data. Inputs are sampled from a combination of multivariate Gaussian distributions, with different centroids and covariances for each sub-population in the dataset. The probability of sampling from each sub-population can be tuned, giving rise to representation imbalance. In particular, the cartoon shows a larger relative representation for the male population, which also has a smaller variance. The cyan and yellow shaded regions (green in their intersection) denote the decision boundaries of the labelling rules for the different data sub-populations, which in principle can be misaligned. Panel B: Examples of how manipulating the parameters of the T-M model alters the data distribution: B.1 represents the balanced condition with equally represented, distributed and labelled samples; B.2 shows scarcity of data points in both clusters; B.3 displays an example of rule misalignment; B.4 shows different sub-population variances; B.5 shows relative representation imbalance; B.6 represents the case of unbalanced labels; B.7 shows a case of positive group-label correlation.
Figure 2: Training on the T-M model and comparison between the error on synthetic and real data. Panel A: Given a vector of input features and a group membership (male/female), the ground-truth label is assigned by the associated 1-layer teacher network (represented by one of the vectors $W_T^\pm$). The decision boundaries are demarcated in blue and yellow (their intersection is coloured in green). The labelling rules can be aligned, i.e. the decision rule does not depend on the group membership, or misaligned as in Panel A. A 1-layer student network is given inputs $\boldsymbol{x}^\mu$ and labels $y^\mu$, and trained to produce the correct outputs $\hat{y}$ via gradient descent on the loss $\ell(\hat{y},y)$. Panel B: Test performance on the two sub-populations for a student network trained on mixed data with variable relative representation. Unsurprisingly, when one sub-population is largely predominant in the dataset, the classifier becomes biased towards higher accuracy on it. The plot shows the match between the analytic curves derived in the analysis section (solid lines) and numerical simulations on the synthetic framework (dots). Panel C: A similar experiment with data from the CelebA dataset (Liu et al., 2015). Details are given in the Appendix on real data.
Figure 3: Simple geometrical properties cause the emergence of bias. Each point in the left diagrams shows the Disparate Impact (DI) of the trained model for a given choice of model parameters (darker colours represent stronger biases). On the x-axis we vary the relative representation $\rho$, while on the y-axis we vary the rule similarity $q_T$ (Panel A) or the group-label correlation $m_T^\pm$ (Panel B). The corresponding plots on the right show the accuracies on the two sub-populations along the cut marked by the dashed line on the left.
Figure 4: Emergence of bias even in balanced datasets. We show the disparate impact as the distributions of the two sub-populations are changed by altering their variances ($\Delta_{+}$ and $\Delta_{-}$). The diagonal line marks configurations where the two sub-populations have the same variance. The three panels consider different levels of representation, from left to right $\rho=0.1, 0.3, 0.5$; the last corresponds to both sub-populations being equally represented in the dataset. Red and blue quantify the bias against sub-populations $+$ and $-$, respectively.
Figure 5: Performance benefits for both sub-populations under shared training. With $10\%$ of the data points in sub-population $+$ ($\rho=0.1$), we compare the performance at different levels of rule similarity ($q_T$) as the size of the dataset increases, showing the disparate impact in the left plot and the individual accuracies in the central and right ones. In the central and right plots, the baselines (in black) show the accuracies attained when the model is trained only on the corresponding group's data. The inset of the rightmost plot highlights the differences in accuracy in the small-dataset regime. When the rules are sufficiently aligned, joint training on both groups yields better accuracy on the smaller sub-population, provided $\alpha$ is not too small. Moreover, at intermediate values of $\alpha$ the larger group can also benefit from the information transfer.
...and 10 more figures
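The captions above compare analytic curves with numerical simulations of a 1-layer student trained by gradient descent, and report per-group accuracies and the Disparate Impact. A minimal NumPy sketch of such a simulation follows. It assumes a logistic loss with an L2 penalty of strength `lam`, full-batch gradient descent with step $1/L$, labels and predictions in $\{-1,+1\}$, and the common convention $DI = P(\hat y=1\mid c=-)/P(\hat y=1\mid c=+)$; these choices and the function names are illustrative, not the paper's settings.

```python
import numpy as np


def train_student(X, y, lam=0.1, epochs=500):
    """Full-batch gradient descent for a 1-layer student y_hat = sign(w . x / sqrt(d)).

    Illustrative objective: sum_mu log(1 + exp(-y_mu w.x_mu / sqrt(d))) + (lam/2)||w||^2.
    """
    d = X.shape[1]
    # step 1/L, with L bounding the gradient's Lipschitz constant (logistic'' <= 1/4)
    L = 0.25 * np.linalg.norm(X, 2) ** 2 / d + lam
    w = np.zeros(d)
    for _ in range(epochs):
        m = y * (X @ w) / np.sqrt(d)                  # margins
        dloss_dm = -0.5 * (1.0 - np.tanh(m / 2.0))    # = -sigmoid(-m)
        w -= (X.T @ (dloss_dm * y) / np.sqrt(d) + lam * w) / L
    return w


def predict(w, X):
    """Student predictions in {-1, +1}; ties go to the positive class."""
    y_hat = np.sign(X @ w)
    y_hat[y_hat == 0] = 1.0
    return y_hat


def group_accuracies(w, X, y, group):
    """Test accuracy on each sub-population separately, keyed by group label."""
    y_hat = predict(w, X)
    return {int(c): float(np.mean(y_hat[group == c] == y[group == c])) for c in np.unique(group)}


def disparate_impact(y_pred, group, unprivileged=-1, privileged=1):
    """Empirical DI = P(y_hat = 1 | unprivileged) / P(y_hat = 1 | privileged); 1 means parity."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return np.mean(y_pred[group == unprivileged] == 1) / np.mean(y_pred[group == privileged] == 1)
```

`disparate_impact(predict(w, X_test), group_test)` then gives the empirical DI of a trained student; the ratio is undefined when the privileged group receives no positive predictions. Training on data in which one group is scarce and follows a different rule typically yields lower accuracy on that group, in line with Panel B of Figure 2.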

Theorems & Definitions (5)

Remark 1
Remark 2
Remark 3
Remark 4
Remark 5
