Fair Risk Minimization under Causal Path-Specific Effect Constraints

Razieh Nabi; David Benkeser

TL;DR

This paper addresses fair prediction under path-specific causal constraints by formulating an infinite-dimensional constrained risk minimization problem solved via a Lagrange-multiplier framework. It derives closed-form updates for the fair predictor under both mean squared error and cross-entropy losses, linking the constrained and unconstrained minimizers through the risk and constraint gradients. The authors develop a flexible semiparametric estimation strategy for nuisance components, prove risk- and constraint-satisfaction results under plausible regularity conditions, and corroborate the theory with simulations that emphasize robustness of AIPW-type estimators and the effects of gradient variance. The work advances algorithmic fairness by integrating causal mechanisms directly into model training, offering practical guidelines for implementing fair predictions in real-world settings while highlighting limitations and avenues for sensitivity analyses.

Abstract

This paper introduces a framework for estimating fair optimal predictions using machine learning where the notion of fairness can be quantified using path-specific causal effects. We use a recently developed approach based on Lagrange multipliers for infinite-dimensional functional estimation to derive closed-form solutions for constrained optimization based on mean squared error and cross-entropy risk criteria. The theoretical forms of the solutions are analyzed in detail and described as nuanced adjustments to the unconstrained minimizer. This analysis highlights important trade-offs between risk minimization and achieving fairness. The theoretical solutions are also used as the basis for construction of flexible semiparametric estimation strategies for these nuisance components. We describe the robustness properties of our estimators in terms of achieving the optimal constrained risk, as well as in terms of controlling the value of the constraint. We study via simulation the impact of using robust estimators of pathway-specific effects to validate our theory. This work advances the discourse on algorithmic fairness by integrating complex causal considerations into model training, thus providing strategies for implementing fair models in real-world applications.
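To make the idea of a "closed-form adjustment to the unconstrained minimizer" concrete, the following is a minimal sketch under simplifying assumptions, not the paper's estimator. It replaces the path-specific effect with a simpler linear constraint, the inverse-probability-weighted average difference in predictions between $S=1$ and $S=0$, and assumes the unconstrained minimizer `psi0` and the propensity `pi` are known. The function name `constrained_mse_predictor` and the toy data-generating process are hypothetical. Under mean squared error, minimizing the Lagrangian $E[(\psi_0 - \psi)^2] + \lambda E[D\psi]$ gives $\psi = \psi_0 - \lambda D / 2$, and $\lambda$ is chosen so that the constraint equals zero.

```python
import numpy as np

def constrained_mse_predictor(psi0, D):
    """Project psi0 onto {psi : mean(D * psi) = 0} in the empirical L2 norm.

    psi0 : unconstrained predictions evaluated at each observation
    D    : constraint gradient evaluated at each observation
    """
    theta0 = np.mean(D * psi0)            # constraint value at the unconstrained fit
    lam = 2.0 * theta0 / np.mean(D ** 2)  # multiplier that sets the constraint to zero
    return psi0 - lam * D / 2.0, lam

# Toy data (all quantities below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-1.0, 1.0, size=n)
pi = 1.0 / (1.0 + np.exp(-X))             # P(S = 1 | X), assumed known
S = rng.binomial(1, pi)
psi0 = 1.0 + X + 0.5 * S                  # toy unconstrained minimizer E[Y | S, X]
D = S / pi - (1 - S) / (1.0 - pi)         # IPW gradient of the average-difference constraint

psi_star, lam = constrained_mse_predictor(psi0, D)
theta_before = np.mean(D * psi0)          # constraint under the unconstrained predictor
theta_after = np.mean(D * psi_star)       # constraint under the adjusted predictor
excess = np.mean((psi_star - psi0) ** 2)  # L2 price paid for satisfying the constraint
```

In this sketch the adjustment is proportional to the constraint gradient `D`, so observations with small propensity for their observed group receive larger corrections, and the squared distance between the two predictors equals `theta_before**2 / np.mean(D**2)`. The path-specific constraints studied in the paper involve additional nuisance components (for example mediator densities) that this toy example omits.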

Paper Structure (34 sections, 10 theorems, 92 equations, 17 figures, 1 table)

Introduction
Statistical learning under causal fairness constraints
Constrained functional parameter
A class of causal and counterfactual constraints
Closed-form solutions for fair optimal predictions
Intuition underlying theoretical results
Estimation of fair optimal predictions
Estimation of nuisance parameters
Conditions for optimal risk and constraint satisfaction
Simulations
Impact of inconsistent nuisance estimation
High-dimensional covariates and penalized regression
Discussion
Glossary of terms and notations
More details on constraint-specific path
...and 19 more sections

Key Result

Lemma 2

Given an NPSEM-IE associated with DAG $\mathcal{G}$, the $\rho$-specific effect $\Delta^\rho$, as detailed in Definition 1, is identifiable if and only if there are no recanting witnesses. Assuming $\rho$ includes the direct effect ($s_y=1$), the identification functional, denoted by ...

Figures (17)

Figure 1: Depicted are example DAGs demonstrating scenarios where (a) the total effect of sensitive attribute $S$ on outcome $Y$ is considered unacceptable, (b) the direct effect of $S$ on $Y$ is viewed as impermissible, and (c) any set of causal paths from $S$ to $Y$ could be identified as unacceptable.
Figure 2: Predictions for the $S = 0$ group (left, blue) and $S = 1$ group (right, red). The optimal prediction function $\psi_0$ (solid line) confers a disadvantage to the $S = 0$ group, as evidenced by lower predictions throughout the range of $X$. Using $\psi_0(0,X)$ to predict for both groups (dotted line) satisfies the constraint, but is suboptimal for prediction owing to large errors made in predictions for the $S = 1$ group. These errors are minimized through prediction using the optimal constrained prediction function $\psi_0^*$ (dashed line). The shaded regions, depicted in lighter tones at the bottom of each plot, represent the distribution of covariate profile $X$ within the sub-population stratified by $S$ values.
Figure 3: Illustrating the effects of covariate and mediator profiles on fairness adjustments in predictive modeling. (Top) This panel displays the probability and density ratios for mediator $M$ across different covariate profiles $X$, stratified by class membership $S$. Higher values of $X$ increase the likelihood of $M=1$ for both $S=0$ and $S=1$ groups. (Bottom) This plot contrasts unconstrained (solid lines) with fairness-adjusted (dashed lines) predictions, distinguished by class membership $S$ (color-coded) and mediator values (spatially arranged with predictions for $M=0$ at the top and $M=1$ at the bottom). The shaded regions, depicted in lighter tones at the bottom of each plot, represent the distribution of covariate profile $X$ within the sub-population stratified by values of $S$ and $M$. The visualization reveals that for the $S=1$ group, individuals with $M=0$ and higher $X$ values receive larger adjustments to promote fairness. In contrast, adjustments for the $S=0$ group remain consistent across both $X$ and $M$ values.
Figure 4: Estimates of optimal predictions under $\rho_1$-pathway constraint for mean squared error risk. Top row: Various estimators of the constraint are shown; from left to right: $\Theta^\text{plug-in}_{\rho_1, n}$, $\Theta^\text{ipw}_{\rho_1, n}$, $\Theta^\text{ipw-alt}_{\rho_1, n}$, $\Theta^\text{aipw}_{\rho_1, n}$. For each estimator, we show the distribution of risk of $\psi_{n,\lambda_n}$ over 1000 realizations for each sample size for the equality constraint $\Theta_{P_0}(\psi) = 0$. The dashed line indicates the optimal risk $R_{P_0}(\psi_0^*)$. Bottom row: Distribution of the true constraint over 1000 realizations for each sample size. The dashed line indicates the equality constraint value of zero. The dotted line indicates the true value of the constraint under $\psi_0$.
Figure 5: Comparison of pathway-specific effect estimates used in construction of $\psi_n^*$. Top row: The various estimators are shown in each figure for sample sizes (from left to right) of $n = 200, 400, 800, 1600$. For each estimator, we show the distribution of mean squared error of $\psi_{n,\lambda_n}$ over 1000 realizations for each sample size for the equality constraint $\Theta_{P_0}(\psi) = 0$. The dashed line indicates the optimal risk $R_{P_0}(\psi_0^*)$. Bottom row: Distribution of the true constraint over 1000 realizations for each sample size. The dashed line indicates the equality constraint value of zero. The dotted line indicates the true value of the constraint under $\psi_0$.
...and 12 more figures

Theorems & Definitions (12)

Definition 1: Unfair PSE effect
Lemma 2: Identification of causal constraints
Lemma 3: Canonical gradients
Theorem 4: Mean Squared Error risk
Theorem 5: Cross-entropy risk
Lemma 6
Remark 7
Lemma 8
Lemma 9
Theorem 10
...and 2 more
