Size-adaptive Hypothesis Testing for Fairness

Antonio Ferrara, Francesco Cozzi, Alan Perotti, André Panisson, Francesco Bonchi

Abstract

Determining whether an algorithmic decision-making system discriminates against a specific demographic typically involves comparing a single point estimate of a fairness metric against a predefined threshold. This practice is statistically brittle: it ignores sampling error and treats small demographic subgroups the same as large ones. The problem intensifies in intersectional analyses, where multiple sensitive attributes are considered jointly, giving rise to a larger number of smaller groups. As these groups become more granular, the data representing them becomes too sparse for reliable estimation, and fairness metrics yield excessively wide confidence intervals, precluding meaningful conclusions about potential unfair treatments. In this paper, we introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. Our contribution is twofold. (i) For sufficiently large subgroups, we prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $α$. (ii) For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator; Monte-Carlo credible intervals are calibrated for any sample size and naturally converge to Wald intervals as more data becomes available. We validate our approach empirically on benchmark datasets, demonstrating how our tests provide interpretable, statistically rigorous decisions under varying degrees of data availability and intersectionality.
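The two regimes described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: outcomes are summarized as four counts (in group and positive, in group and negative, outside group and positive, outside group and negative), statistical parity is taken to be $\mathbb{P}(f(x)=1 \mid S) - \mathbb{P}(f(x)=1)$, and the Bayesian part uses a symmetric Dirichlet prior. The function names `sp`, `wald_interval`, and `bayes_interval` are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def sp(p):
    """Statistical parity difference from the four cell probabilities.

    Assumed cell order: (S, positive), (S, negative), (not S, positive),
    (not S, negative). Assumed definition: P(f=1 | S) - P(f=1).
    """
    p = np.asarray(p, dtype=float)
    return p[..., 0] / (p[..., 0] + p[..., 1]) - (p[..., 0] + p[..., 2])


def wald_interval(counts, alpha=0.05):
    """Delta-method (Wald) interval; requires at least one sample in S."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n
    p_s = p[0] + p[1]  # estimated probability of belonging to S
    # Gradient of sp() with respect to the four cell probabilities.
    grad = np.array([p[1] / p_s**2 - 1.0, -p[0] / p_s**2, -1.0, 0.0])
    cov = np.diag(p) - np.outer(p, p)  # multinomial covariance (one draw)
    se = np.sqrt(grad @ cov @ grad / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    est = sp(p)
    return est, (est - z * se, est + z * se)


def bayes_interval(counts, alpha=0.05, prior=1.0, draws=20000, seed=0):
    """Monte-Carlo credible interval under a Dirichlet(prior) prior."""
    rng = np.random.default_rng(seed)
    posterior = np.asarray(counts, dtype=float) + prior
    samples = sp(rng.dirichlet(posterior, size=draws))
    lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return samples.mean(), (lo, hi)


if __name__ == "__main__":
    large = [30, 70, 500, 400]  # illustrative counts, not real data
    small = [1, 4, 50, 45]
    print("Wald, large group: ", wald_interval(large))
    print("Bayes, large group:", bayes_interval(large))
    print("Bayes, small group:", bayes_interval(small))
```

Under these assumptions, a fairness violation would be flagged when the interval excludes zero. With many samples the two intervals should nearly coincide, while for a handful of samples the credible interval stays well defined but becomes much wider.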

Paper Structure

This paper contains 35 sections, 2 theorems, 30 equations, 17 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Let $\sigma(S) = \sqrt{V^\top \, \Sigma_4 \, V}$, where $V$ and $\Sigma_4$ are defined above. Then the estimate of the statistical parity difference for $S$, centered at its true value and normalized using $\sigma(S)$, converges in distribution ($\overset{d}{\to}$) to $N(0,1)$, the standard normal distribution.
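As a reading aid, a delta-method central limit statement of this kind is usually written as below, together with the Wald interval it implies. The symbols $\widehat{\mathrm{SP}}(S)$ (the empirical estimate), $n$ (the sample size), $\hat\sigma(S)$ (a plug-in estimate of $\sigma(S)$), and $z_{1-\alpha/2}$ (the standard normal quantile) are notational assumptions and are not taken from the source.

$$
\sqrt{n}\,\frac{\widehat{\mathrm{SP}}(S) - \mathrm{SP}(S)}{\sigma(S)} \;\overset{d}{\to}\; N(0,1),
\qquad
\widehat{\mathrm{SP}}(S) \;\pm\; z_{1-\alpha/2}\,\frac{\hat\sigma(S)}{\sqrt{n}} .
$$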

Figures (17)

  • Figure 1: Resolution limits for Statistical Parity violations under varying global negative rates $\mathbb{P}(f(x)=0)$, when detecting disadvantaged groups (the dual figure, showing the boundary for deciding whether a group is being advantaged, is Figure 5 in the Appendix). Each curve traces the minimal fraction of negative outcomes needed to reject $H_0{:}\,\mathrm{SP}(S)=0$ at $\alpha=0.05$ as a function of the group size $n_s$. To the left of each vertical bar is the “no-power” zone, where subgroups are too small to detect discrimination, regardless of the observed disparity. The shaded region above each curve is the “discrimination zone”, where the subgroup’s negative rate is enough to establish a statistically significant parity violation.
  • Figure 2: Point-wise estimation versus confidence intervals, COMPAS dataset.
  • Figure 3: Point-wise estimation versus confidence intervals, Adult dataset.
  • Figure 4: Protected groups size, $\gamma$SP scores, and interval-based fairness violations.
  • Figure 5: Resolution limits for Statistical Parity violations under varying global negative rates $\mathbb{P}(f(x)=0)$, when detecting advantaged groups (the dual figure, showing the boundary for deciding whether a group is being disadvantaged, is Figure 1 in the main paper). Each curve traces the maximal fraction of negative outcomes needed to reject $H_0{:}\,\mathrm{SP}(S)=0$ at $\alpha=0.05$ as a function of the group size $n_s$. To the left of each vertical bar is the “no-power” zone, where subgroups are too small to detect discrimination, regardless of the observed disparity. The shaded region below each curve is the “discrimination zone”, where the subgroup’s negative rate is enough to establish a statistically significant parity violation.
  • ...and 12 more figures
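The idea of a resolution limit in the Figure 1 and Figure 5 captions can be illustrated with a deliberately simplified sketch. It is not the paper's procedure: it assumes the global negative rate is known exactly and uses a two-sided one-sample z-test of the subgroup's negative rate against it. The function names `min_negative_rate` and `no_power_size` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_negative_rate(n_s, global_neg_rate, alpha=0.05):
    """Smallest subgroup negative rate that rejects H0 for a group of size n_s.

    Simplifying assumption: the global negative rate q is treated as known,
    so the null standard error is sqrt(q * (1 - q) / n_s).
    Returns None when even a 100% negative rate cannot reject (no power).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    q = global_neg_rate
    threshold = q + z * sqrt(q * (1 - q) / n_s)
    return threshold if threshold <= 1.0 else None


def no_power_size(global_neg_rate, alpha=0.05):
    """Smallest group size for which rejection is possible at all."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    q = global_neg_rate
    # threshold <= 1  is equivalent to  n_s >= z^2 * q / (1 - q)
    return ceil(z * z * q / (1 - q))


if __name__ == "__main__":
    for q in (0.3, 0.5, 0.7):
        print(f"global negative rate {q}: no-power below n_s = {no_power_size(q)}")
        for n_s in (5, 20, 100, 1000):
            print(f"  n_s = {n_s:4d} -> minimal negative rate {min_negative_rate(n_s, q)}")
```

In this simplified setting the threshold falls toward the global rate as $n_s$ grows, and a higher global negative rate pushes the no-power boundary to larger group sizes, which matches the qualitative shape the captions describe.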

Theorems & Definitions (8)

  • Definition 1: Statistical parity, or SP
  • Definition 2: $\delta$-Statistical parity, or $\delta \mathrm{SP}$
  • Definition 3: $\gamma$-Statistical parity, or $\gamma \mathrm{SP}$
  • Definition 4: Statistical Parity violation (this work)
  • Theorem 1: Central Limit Theorem for Statistical Parity
  • Theorem 2
  • Proof of Theorem \ref{thm:general_convergence}
  • Proof of Theorem \ref{thm:convergence}