Approximating Discrimination Within Models When Faced With Several Non-Binary Sensitive Attributes

Yijun Bian, Yujie Luo, Ping Xu

TL;DR

This work tackles the challenge of assessing discrimination when multiple sensitive attributes with multiple values interact, introducing a manifold-based Harmonic Fairness measure (HFM) with maximum and average variants. To enable scalable bias evaluation, it proposes ApproxDist and ExtendDist, efficient approximations that reduce distance computations from quadratic to near-linear complexity via random projections and acceleration subroutines, with theoretical guarantees for ApproxDist. Empirically, HFM and its approximations demonstrate strong alignment with discrimination signals across diverse datasets and offer competitive fairness-accuracy trade-offs compared to standard metrics, including improved sensitivity in multi-attribute scenarios. The approach is versatile, extendable to neural representations, and highlights practical pathways for robust, multi-attribute fairness in real-world ML systems.

Abstract

Mitigating discrimination within machine learning (ML) models can be complicated because multiple factors may be interwoven hierarchically and historically. Yet few existing fairness measures can capture the level of discrimination within ML models in the face of multiple sensitive attributes (SAs). To bridge this gap, we propose a fairness measure based on distances between sets from a manifold perspective, named 'Harmonic Fairness measure via Manifolds (HFM)', with two optional versions, which enables fine-grained discrimination evaluation for several SAs with multiple values. Because directly computing HFM may be costly, we further propose two approximation algorithms to accelerate its subprocedure, the computation of distances between sets: 'Approximation of distance between sets for one sensitive attribute with multiple values (ApproxDist)' and 'Approximation of extended distance between sets for several sensitive attributes with multiple values (ExtendDist)', which respectively handle bias evaluation for a single SA with multiple values and for several SAs with multiple values. Moreover, we provide an algorithmic effectiveness analysis for ApproxDist under certain assumptions to explain how well it could work. The empirical results demonstrate that our proposed fairness measure HFM is valid and that the approximation algorithms (i.e., ApproxDist and ExtendDist) are effective and efficient.
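
To make the cost argument concrete, here is a minimal Python sketch of why a distance between sets is expensive to compute directly and how a random projection can cut the work. It is a generic illustration, not the paper's ApproxDist or ExtendDist: the paper's definition of the distance is not reproduced in this section, so the sketch assumes a Hausdorff-style stand-in (the largest nearest-neighbour distance from one group to another), and the function names and the `window` and `repeats` parameters are hypothetical.

```python
import bisect
import math
import random


def directed_distance_exact(group_a, group_b):
    """Largest nearest-neighbour distance from group_a to group_b.

    Compares every pair of points, so the cost is O(|A| * |B|).
    """
    return max(min(math.dist(a, b) for b in group_b) for a in group_a)


def directed_distance_projected(group_a, group_b, window=3, repeats=5, seed=0):
    """Approximate the same quantity with random one-dimensional projections.

    ASSUMPTION: illustrative stand-in only. Each point of group_a is compared
    with the few points of group_b that land next to it after projection.
    """
    rng = random.Random(seed)
    dim = len(group_a[0])
    best = [math.inf] * len(group_a)  # nearest-neighbour distance found so far
    for _ in range(repeats):
        # Random direction; its scale does not affect the projected ordering.
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        projected_b = sorted(
            (sum(wi * bi for wi, bi in zip(w, b)), j) for j, b in enumerate(group_b)
        )
        keys = [p for p, _ in projected_b]
        for i, a in enumerate(group_a):
            pa = sum(wi * ai for wi, ai in zip(w, a))
            pos = bisect.bisect_left(keys, pa)
            lo, hi = max(0, pos - window), min(len(projected_b), pos + window)
            for _, j in projected_b[lo:hi]:
                best[i] = min(best[i], math.dist(a, group_b[j]))
    return max(best)


if __name__ == "__main__":
    rng = random.Random(42)
    members = [[rng.random() for _ in range(4)] for _ in range(200)]
    others = [[rng.random() for _ in range(4)] for _ in range(300)]
    print("exact      :", directed_distance_exact(members, others))
    print("projected  :", directed_distance_projected(members, others))
```

Because the projected variant only searches a subset of candidates, each nearest-neighbour distance it finds is at least as large as the true one, so the approximation can overestimate but never underestimate the exact value. Its cost per repeat is dominated by one sort plus a constant-size window per point.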

Paper Structure (27 sections, 2 theorems, 21 equations, 10 figures, 4 tables, 3 algorithms)

Key Result

Lemma 1

Let $\bm{v}_1$ (resp. $\bm{v}_2$) be a vector in the $n$-dimensional Euclidean space $\mathbb{R}^n$ with length $r_1$ (resp. $r_2$) such that $r_1\leqslant r_2$. Let $\bm{w}\in \mathbb{R}^n$ be a unit vector. We define $\mathbb{P}(\bm{v}_1,\bm{v}_2)$ as the probability that $|\langle \bm{w}, \ldots\rangle|$ [...] where $\phi$ represents the angle between $\bm{v}_1$ and $\bm{v}_2$.
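
The event in Lemma 1 is cut off in this text, so the following is only a hedged illustration of the kind of quantity it describes. It assumes, without confirmation from the source, that $\bm{w}$ is drawn uniformly from the unit sphere and that the event compares $|\langle \bm{w},\bm{v}_1\rangle|$ with $|\langle \bm{w},\bm{v}_2\rangle|$. The function names and the example vectors are hypothetical. Under those assumptions, a Monte Carlo estimate can be sketched as:

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw a direction uniformly from the unit sphere in R^n."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def estimate_projection_probability(v1, v2, trials=100_000, seed=0):
    """Monte Carlo estimate of P(|<w, v1>| >= |<w, v2>|) for a random unit w.

    ASSUMPTION: the compared event is a guess, because the lemma's
    statement is truncated in the source text.
    """
    rng = random.Random(seed)
    n = len(v1)
    hits = 0
    for _ in range(trials):
        w = random_unit_vector(n, rng)
        p1 = abs(sum(a * b for a, b in zip(w, v1)))
        p2 = abs(sum(a * b for a, b in zip(w, v2)))
        if p1 >= p2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    v1 = [1.0, 0.0, 0.0]  # length r_1 = 1
    v2 = [0.0, 2.0, 0.0]  # length r_2 = 2, orthogonal to v1
    print("Monte Carlo estimate:", estimate_projection_probability(v1, v2))
    print("Planar reference    :", 2 * math.atan(0.5) / math.pi)
```

For these example vectors the assumed event reduces to $|\tan\theta| \leqslant 1/2$, where $\theta$ is the direction angle of $\bm{w}$ within the plane spanned by $\bm{v}_1$ and $\bm{v}_2$, so the estimate should be close to $2\arctan(1/2)/\pi \approx 0.295$.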

Figures (10)

  • Figure 1: Correlation heatmap between normal evaluation metrics and fairness measures, for one single SA. The notations used here refer to those in Table \ref{tab:sen_att,sing}.
  • Figure 2: Correlation heatmap between normal evaluation metrics and fairness measures, for all SAs within the dataset. The notations used here refer to those in Table \ref{tab:sen_att,pl}.
  • Figure 3: Comparison of HFM with the extended SP. (a--b) Correlation heatmap between normal evaluation metrics and fairness measures, for all SAs within the dataset. Note that in the computation of $\mathbf{df}_\text{prev}$ here, multi-valued SAs are handled as bi-valued cases, equivalent to $\mathbf{df}_\text{prev\;bin-val}$ in Figure \ref{fig:sen_att,sing}; $\hat{\mathbf{df}}_\text{prev}$, $\hat{\mathbf{df}}$, and $\hat{\mathbf{df}}^\text{avg}$ indicate that the distances are obtained using approximation algorithms. (c--d) Plots of best test-set fairness-accuracy trade-offs per fairness metric \cite{cruz2022fairgbm} (the smaller the better).
  • Figure 4: Plots of best test-set fairness-performance trade-offs per fairness metric \cite{cruz2022fairgbm} (the smaller the better). (a) Plot of fairness-accuracy trade-off for one single SA; (b) Plot of fairness-accuracy trade-off for all SAs; (c--d) Plots of fairness-$\mathrm{f}_1$ score trade-off for one SA and for all SAs, respectively. Note that the notations in (a) and (c) refer to those in Table \ref{tab:sen_att,sing}, and those in (b) and (d) refer to those in Table \ref{tab:sen_att,pl}.
  • Figure 5: Comparison of approximated distances between sets with precise distances calculated directly by definition, evaluated on test data. Note that 'prev' denotes the approximation results obtained by our previous work \cite{bian2024does}. (a--b), (c--d), (e--f), and (g--h) Scatter plots comparing approximated and precise values of $\mathbf{D}_{\cdot,\bm{a}}(S)$, $\mathbf{D}_{\cdot,\bm{a}}(S,a_i)$, $\mathbf{D}_{\cdot,\bm{a}}^\text{avg}(S)$, and $\mathbf{D}_{\cdot,\bm{a}}^\text{avg}(S,a_i)$, respectively; (i--j) Time cost comparison between ExtendDist and direct computation via Eqs. \ref{eq:6a} and \ref{eq:6b}; (k--l) Time cost comparison between ApproxDist and direct computation via Eqs. \ref{eq:4a} and \ref{eq:4b}.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Lemma 1: Lemma 1 of \cite{bian2024does}
  • Proposition 2
  • proof