BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables

Ruta Binkyte; Daniele Gorla; Catuscia Palamidessi

TL;DR

BaBE tackles unfair discrimination when the legitimate explaining variable $E$ is latent and only a biased proxy $Z$ is observed. By combining Bayesian inference with the Expectation-Maximization (EM) algorithm, BaBE estimates $\mathbb{P}[E|S]$ from data and then derives $\hat{\mathbb{P}}[E|Z,S]$, so that decisions based on the inferred $E$ can satisfy conditional statistical parity (CSP) and equal opportunity (EO). The method includes two practical decision strategies and is evaluated on synthetic data with distribution shifts and on the NHANES dataset, where it shows good fairness and accuracy and remains robust to changes in $\mathbb{P}[E|S]$ across populations. BaBE does not assume independence between $E$ and $S$, and it can transfer causal knowledge through the bias mechanism $\mathbb{P}[Z|E,S]$, offering a principled pre-processing approach to fair decision-making across domains. Overall, BaBE provides a scalable, data-efficient framework for fairness with latent explaining variables that achieves CSP and EO while preserving predictive performance.
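
The step from $\hat{\mathbb{P}}[E|S]$ to $\hat{\mathbb{P}}[E|Z,S]$ mentioned above can be written out as a plain application of Bayes' rule. The display below is a sketch under the assumption that $E$ and $Z$ are discrete and that the bias mechanism $\mathbb{P}[Z|E,S]$ is known; the lowercase symbols $e$, $e'$, $z$, $s$ denote values of the variables and are introduced here only for illustration.

$$
\hat{\mathbb{P}}[E=e \mid Z=z, S=s] \;=\; \frac{\mathbb{P}[Z=z \mid E=e, S=s]\;\hat{\mathbb{P}}[E=e \mid S=s]}{\sum_{e'} \mathbb{P}[Z=z \mid E=e', S=s]\;\hat{\mathbb{P}}[E=e' \mid S=s]}
$$

The denominator is the marginal probability of observing $Z=z$ in group $s$ under the estimated prior, so the posterior sums to one over $e$ for every $(z, s)$ with non-zero marginal.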

Abstract

We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually lead to poor accuracy and do not really achieve fairness in situations where there is a correlation between the sensitive attribute $S$ and the legitimate attribute $E$ (the explanatory variable) that should determine the decision. To overcome these drawbacks, other notions of fairness have been proposed, in particular conditional statistical parity and equal opportunity. However, $E$ is often not directly observable in the data, i.e., it is a latent variable. We may observe some other variable $Z$ representing $E$, but $Z$ may also be affected by $S$, and hence $Z$ itself can be biased. To deal with this problem, we propose BaBE (Bayesian Bias Elimination), an approach based on a combination of Bayesian inference and the Expectation-Maximization method, to estimate the most likely value of $E$ for a given $Z$ in each group. The decision can then be based directly on the estimated $E$. We show, by experiments on synthetic and real data sets, that our approach provides a good level of fairness as well as high accuracy.
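
The estimation idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that $E$ and $Z$ are discrete, that the bias mechanism $\mathbb{P}[Z|E,S=s]$ is available as a matrix for one group $s$, and it uses the textbook EM update for a prior under a known likelihood. The function names `estimate_prior` and `posterior` and all parameters are hypothetical.

```python
import numpy as np

def estimate_prior(likelihood, z_counts, n_iter=500, tol=1e-10):
    """EM estimate of P[E | S=s] for a single group s (illustrative sketch).

    likelihood[e, z] : assumed-known P[Z=z | E=e, S=s]
    z_counts[z]      : number of samples in group s with Z=z
    """
    likelihood = np.asarray(likelihood, dtype=float)
    q = np.asarray(z_counts, dtype=float)
    q = q / q.sum()                           # empirical P[Z | S=s]
    n_e = likelihood.shape[0]
    prior = np.full(n_e, 1.0 / n_e)           # start from a uniform prior
    for _ in range(n_iter):
        post = posterior(likelihood, prior)   # E-step: P[E=e | Z=z, S=s]
        new_prior = post @ q                  # M-step: average posterior over observed Z
        converged = np.abs(new_prior - prior).sum() < tol
        prior = new_prior
        if converged:
            break
    return prior

def posterior(likelihood, prior):
    """Bayes' rule: returns post[e, z] = P[E=e | Z=z, S=s]."""
    likelihood = np.asarray(likelihood, dtype=float)
    joint = np.asarray(prior, dtype=float)[:, None] * likelihood  # P[E=e, Z=z | S=s]
    marg = joint.sum(axis=0)                                      # P[Z=z | S=s]
    # Columns with zero marginal are left at zero instead of dividing by zero.
    return np.divide(joint, marg, out=np.zeros_like(joint), where=marg > 0)
```

Under these assumptions, the most likely value of $E$ for each observed $Z$ in the group would be `posterior(likelihood, prior).argmax(axis=0)`, run once per value of $S$ with that group's own likelihood matrix and counts.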

Paper Structure (27 sections, 3 theorems, 14 equations, 13 figures, 1 algorithm)


Introduction
Related Work
Preliminaries and Notation
$\hat{E}$, $\hat{Y}$ and $Y$ notations
The Expectation-Maximization Framework
Metrics for the quality of estimations
The Wasserstein distance
Accuracy
Distortion
Metrics for fairness
The BaBE method
Deriving $\hat{\mathbb{P}}[E|S]$
Deriving $\hat{\mathbb{P}}[E|Z,S]$ from $\hat{\mathbb{P}}[E|S]$
Deriving ${\hat{E}}$ and $\hat{Y}_{\hat{E}}$ from $\hat{\mathbb{P}} [E|Z,S]$
Method 1
...and 12 more sections

Key Result

Lemma 1

Figures (13)

Figure 1: Left: illustration of the causal relation between the data. Right: illustration of our pre-processing method.
Figure 1: Execution time (in seconds) and average number of iterations of the BaBE algorithm, for each group of data sets in which the mean for S=0 is varied.
Figure 2: The pipeline of the BaBE application. The variable $E$ is observable in the source data, so $\mathbb{P}[Z|E,S]$ can be derived from it. The target data is the data set in which $E$ is not observable and must be recovered using $\mathbb{P}[Z|E,S]$ derived from the source data. We input $\mathbb{P}[Z|E,S]$ and statistics of the observable variables in the target data to BaBE and obtain an estimate $\hat{E}$ consistent with the target distribution (which may differ from the source distribution). We then again use $\mathbb{P}[Z|E,S]$ (from the source data), the observable variables (from the target data), and $\hat{E}$ (the BaBE estimate) to infer $\hat{E}|Z,S$ for each sample in the target data.
Figure 3: The distribution of $E|S$ in the source data and in the new populations.
Figure 4: Experiments on the synthetic data sets: The Wasserstein distance between $\hat{\mathbb{P}}[Z]$ and $\mathbb{P}[E]$ and between $\hat{\mathbb{P}}[E]$ and $\mathbb{P}[E]$.
...and 8 more figures

Theorems & Definitions (5)

Example 1
Example 2
Lemma 1
Lemma 2
Theorem 1
