Federated Measurement of Demographic Disparities from Quantile Sketches

Arthur Charpentier; Agathe Fernandes Machado; Olivier Côté; François Hu

TL;DR

This work studies federated auditing of demographic parity through score distributions. Disparity is measured as a Wasserstein–Fréchet variance between sensitive-group score laws, and the population metric is expressed in a federated form that makes explicit how silo-specific selection drives the mismatch between local and global metrics.
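In one dimension, the Wasserstein–Fréchet variance has a closed form in terms of quantile functions. The display below is a sketch that assumes the standard Fréchet-variance definition, with group proportions $\alpha_s$ as weights. This weighting is consistent with the barycenter $Q^\star=\alpha_{\mathrm{AA}}Q_{\mathrm{AA}}+\alpha_{\mathrm{C}}Q_{\mathrm{C}}$ used in the COMPAS figure captions. The paper's Definition 2.1 may include details, such as trimming, that are not shown here.

```latex
\[
\begin{aligned}
Q^\star(u) &= \sum_{s\in\mathcal{S}} \alpha_s\,Q_s(u), \qquad u\in(0,1),\\
U_2 &= \sum_{s\in\mathcal{S}} \alpha_s\,W_2^2(\nu_s,\nu^\star)
     = \sum_{s\in\mathcal{S}} \alpha_s \int_0^1 \bigl(Q_s(u)-Q^\star(u)\bigr)^2\,du .
\end{aligned}
\]
% Two groups, S = {a, b}, alpha_a + alpha_b = 1:
% Q_a - Q^* = alpha_b (Q_a - Q_b)  and  Q_b - Q^* = -alpha_a (Q_a - Q_b), hence
\[
U_2 = \bigl(\alpha_a\alpha_b^2 + \alpha_b\alpha_a^2\bigr)\,W_2^2(\nu_a,\nu_b)
    = \alpha_a\alpha_b\,W_2^2(\nu_a,\nu_b).
\]
```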

Abstract

Many fairness goals are defined at the population level, which is misaligned with siloed data collection: data held in separate silos often cannot be shared because of privacy regulations. Horizontal federated learning (FL) enables collaborative modeling across clients whose feature spaces are aligned, without exchanging raw data. We study federated auditing of demographic parity through score distributions. We measure disparity as a Wasserstein–Fréchet variance between sensitive-group score laws and express the population metric in a federated form that makes explicit how silo-specific selection drives local-global mismatch. For the squared Wasserstein distance, we prove an ANOVA-style decomposition that separates (i) selection-induced mixture effects from (ii) cross-silo heterogeneity, yielding tight bounds linking local and global metrics. We then propose a one-shot, communication-efficient protocol in which each silo shares only its group counts and a quantile summary of its local score distributions. From these summaries, the server estimates global disparity and its decomposition, with $O(1/k)$ discretization bias (for $k$ quantiles) and finite-sample guarantees. Experiments on synthetic data and COMPAS show that a few dozen quantiles suffice to recover global disparity and diagnose its sources.
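The one-shot protocol described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Several choices here are hypothetical:

- the midpoint grid $u_j=(j-1/2)/k$;
- the helper names `silo_message`, `pooled_quantiles` and `federated_u2`;
- the server rule of treating each silo sketch as $k$ equally weighted atoms, instead of the interpolation scheme the paper discusses.

The server pools silos per group by count-weighted mixing and forms the barycenter quantile $Q^\star=\sum_s\alpha_s Q_s$, where $\alpha_s$ are the global group shares. It then returns the plug-in Wasserstein–Fréchet variance $U_2(k)$. The sketch does not compute the heterogeneity term or the ANOVA decomposition.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def midpoint_grid(k):
    """Interior probability grid u_j = (j - 1/2) / k, j = 1..k (assumed grid)."""
    return [(j - 0.5) / k for j in range(1, k + 1)]


def silo_message(scores_by_group, k):
    """Silo side (hypothetical helper): per group, send only (count, k quantiles)."""
    grid = midpoint_grid(k)
    message = {}
    for group, scores in scores_by_group.items():
        xs = sorted(scores)
        n = len(xs)
        # left-continuous empirical quantile: smallest x with F_n(x) >= u
        q = [xs[min(n - 1, math.ceil(n * u) - 1)] for u in grid]
        message[group] = (n, q)
    return message


def pooled_quantiles(messages, group, grid):
    """Server side: quantiles of the count-weighted mixture of silo sketches.

    Assumption: each silo sketch is treated as k atoms of equal mass n_c / k.
    """
    atoms = []
    for msg in messages:
        if group in msg:
            n, q = msg[group]
            atoms.extend((x, n / len(q)) for x in q)
    atoms.sort()
    total = sum(w for _, w in atoms)
    cum = list(accumulate(w / total for _, w in atoms))
    # generalized inverse of the mixture CDF: first atom whose cumulative mass reaches u
    return [atoms[min(bisect_left(cum, u), len(atoms) - 1)][0] for u in grid]


def federated_u2(messages, k):
    """Server side: plug-in Wasserstein-Frechet variance U_2 on the k-point grid."""
    grid = midpoint_grid(k)
    groups = sorted({g for msg in messages for g in msg})
    counts = {g: sum(msg[g][0] for msg in messages if g in msg) for g in groups}
    n = sum(counts.values())
    alpha = {g: counts[g] / n for g in groups}  # global group shares
    q = {g: pooled_quantiles(messages, g, grid) for g in groups}
    # barycenter quantile: alpha-weighted average of the group quantile functions
    q_star = [sum(alpha[g] * q[g][j] for g in groups) for j in range(k)]
    # Frechet variance: alpha-weighted mean squared gap to the barycenter
    return sum(alpha[g] * sum((q[g][j] - q_star[j]) ** 2 for j in range(k)) / k
               for g in groups)


silo_1 = silo_message({"a": [0, 2], "b": [1, 3]}, k=2)
silo_2 = silo_message({"a": [1, 3], "b": [2, 4]}, k=2)
print(federated_u2([silo_1, silo_2], k=2))  # 0.25
```

In this toy example, the two silos hold different slices of one population in which group b is group a shifted by one unit. Pooling their sketches therefore gives $U_2=\alpha_a\alpha_b W_2^2=0.5\cdot0.5\cdot1=0.25$, the same value as running the computation on the centralized data.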

Paper Structure

This paper contains 82 sections, 26 theorems, 122 equations, 22 figures, 4 tables, and 1 algorithm.

Introduction
Motivation
Contributions
Setting and centralized fairness
Data model and notation
Demographic parity and homogeneity as distributional equalities
Quantile-grid approximation (centralized)
Quantile sketches and interpolation
Quantile sketches.
Discrete estimators and rates
Federated measurement from $k$ quantiles
Federated functionals
One-shot protocol
One-shot protocol (overview).
Estimator and convergence
...and 67 more sections

Key Result

Proposition 3.1

Assume that for each $s\in\mathcal{S}$, $\nu_s$ has compact support and admits a continuous density bounded away from $0$ and $\infty$ on its support. Then, for $p\in\{1,2\}$, there exist constants $C_U,C_H<\infty$ (depending only on these bounds and on $p$) such that
$$|U_{k,p}-U_p| \le \frac{C_U}{k}, \qquad |H_{k,p}-H_p| \le \frac{C_H}{k}.$$
Moreover, for fixed $k$, the plug-in estimators $\widehat{U}_{k,p}$ and $\widehat{H}_{k,p}$ are consistent as $n\to\infty$, and th…
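The $O(1/k)$ discretization bias stated in the abstract can be checked numerically in a toy case. The two groups, their weights ($\alpha_a=0.3$, $\alpha_b=0.7$) and the midpoint grid $u_j=(j-1/2)/k$ in this sketch are hypothetical choices, not taken from the paper. Both quantile functions have compact support and derivatives bounded away from $0$ and $\infty$, matching the density conditions of Proposition 3.1. Because $Q_a(u)-Q_b(u)=-0.1(1-u)^2$, the exact value is $U_2=\alpha_a\alpha_b\int_0^1 0.01(1-u)^4\,du=0.21\times0.002=0.00042$.

```python
def q_a(u):
    """Hypothetical group a: Uniform[0, 1] scores, so Q_a(u) = u."""
    return u


def q_b(u):
    """Hypothetical group b on [0.1, 1.0]: Q_b'(u) = 0.8 + 0.2u lies in [0.8, 1]."""
    return 0.1 + 0.8 * u + 0.1 * u * u


def u2_grid(k, alpha_a=0.3):
    """Frechet variance U_2(k) evaluated on the midpoint grid u_j = (j - 1/2) / k."""
    alpha_b = 1.0 - alpha_a
    total = 0.0
    for j in range(1, k + 1):
        u = (j - 0.5) / k
        qa, qb = q_a(u), q_b(u)
        q_star = alpha_a * qa + alpha_b * qb  # barycenter quantile at u_j
        total += alpha_a * (qa - q_star) ** 2 + alpha_b * (qb - q_star) ** 2
    return total / k


U2_EXACT = 0.3 * 0.7 * 0.002  # alpha_a * alpha_b * W_2^2(nu_a, nu_b)

for k in (1, 2, 4, 8, 16, 32):
    print(f"k={k:2d}  |U_2(k) - U_2| = {abs(u2_grid(k) - U2_EXACT):.2e}")
```

Here the squared quantile gap is smooth and convex. As a result, the midpoint grid underestimates $U_2$, and the error shrinks with every doubling of $k$, faster than the $1/k$ rate requires.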

Figures (22)

Figure 1: Synthetic Beta. Left: score distributions by group, $U_2=0.0076$. Right: $U_2(k)$ with $k\in\{1,2\}$ as a function of $\rho$.
Figure 2: Sensitivity to $k$ (synthetic). Convergence of $U_{2}(k)$ under independence allocation (left) and selection bias (right).
Figure 3: COMPAS: score distributions and Wasserstein barycenter. Left: (beta-)kernel density estimates of the jittered score $Z$ by group. Middle: empirical CDFs $F_{\mathrm{AA}}$ and $F_{\mathrm{C}}$ together with the Wasserstein barycenter distribution (dashed), obtained by inverting the barycenter quantile $Q^\star$. Right: group quantile functions $Q_{\mathrm{AA}}$ and $Q_{\mathrm{C}}$ and their Wasserstein barycenter $Q^\star = \alpha_{\mathrm{AA}}Q_{\mathrm{AA}} + \alpha_{\mathrm{C}}Q_{\mathrm{C}}$ (dashed).
Figure 4: COMPAS: original score distributions and Wasserstein barycenter. Left: histograms of score $Z$ by group. Middle: empirical CDFs $F_{\mathrm{AA}}$ and $F_{\mathrm{C}}$ together with the Wasserstein barycenter distribution (dashed), obtained by inverting the barycenter quantile $Q^\star$. Right: group quantile functions $Q_{\mathrm{AA}}$ and $Q_{\mathrm{C}}$ and their Wasserstein barycenter $Q^\star = \alpha_{\mathrm{AA}}Q_{\mathrm{AA}} + \alpha_{\mathrm{C}}Q_{\mathrm{C}}$ (dashed).
Figure 5: COMPAS: convergence in $k$. Mean absolute error $\mathrm{MAE}(k)=\mathbb{E}\,|\widehat{U}_2(k)-U_2|$ as a function of the number of quantiles $k$ (sent per group and per silo) on a log scale, for several numbers of silos $d$ and for different allocation regimes (random vs. selection bias). Across regimes, $\widehat{U}_2(k)$ stabilizes quickly, with diminishing returns beyond a few dozen quantiles.
...and 17 more figures
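The captions of Figures 3 and 4 obtain the barycenter distribution by inverting the barycenter quantile $Q^\star$. On a finite grid, one simple way to do this is to treat the $k$ barycenter quantiles as equally weighted atoms, so that $F^\star(x)$ is the fraction of grid values $Q^\star(u_j)$ not exceeding $x$. This rule is assumed here rather than taken from the paper. The helper names and the quantile values and weights in the example are hypothetical. They are not COMPAS estimates, and the dashed curves in the figures may use a smoother interpolation.

```python
from bisect import bisect_right


def barycenter_quantiles(q_groups, alpha):
    """Q*(u_j) = sum_s alpha_s Q_s(u_j); nondecreasing because each Q_s is."""
    k = len(next(iter(q_groups.values())))
    return [sum(alpha[g] * q_groups[g][j] for g in q_groups) for j in range(k)]


def barycenter_cdf(q_star, x):
    """Step-function inverse of Q*: share of grid quantiles that are <= x."""
    return bisect_right(q_star, x) / len(q_star)


# Hypothetical quantile sketches on a 4-point grid (made-up values, not COMPAS).
q_groups = {"AA": [2.0, 4.0, 6.0, 8.0], "C": [1.0, 2.0, 3.0, 4.0]}
alpha = {"AA": 0.5, "C": 0.5}
q_star = barycenter_quantiles(q_groups, alpha)
print(q_star)                       # [1.5, 3.0, 4.5, 6.0]
print(barycenter_cdf(q_star, 3.0))  # 0.5
```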

Theorems & Definitions (56)

Definition 2.1: Central unfairness functional
Definition 2.2: Heterogeneity functional
Remark: Two-group simplifications
Remark: Interior grids and trimmed functionals
Proposition 3.1: Consistency and discretization rate
Proposition 3.2: Bin-averaged discretization underestimates $U_2$
Definition 4.1: Federated demographic-parity functional
Proposition 4.2: Consistency with centralized targets
Proposition 4.3: Consistency and discretization rate
Proposition 4.4: High-probability control of communicated quantiles
...and 46 more
