Wasserstein Distributionally Robust Optimization Through the Lens of Structural Causal Models and Individual Fairness

Ahmad-Reza Ehyaei, Golnoosh Farnadi, Samira Samadi

TL;DR

The paper advances causality-aware distributionally robust optimization by integrating a causally fair dissimilarity function (CFDF) with Wasserstein DRO to enforce individual fairness under structural causal models. It derives a strong duality result that recasts the infinite min–max problem into a finite optimization with an explicit regularizer, and provides exact and first-order regularizers for linear and nonlinear SCMs. The framework connects DRO with classical robust optimization and, crucially, offers finite-sample guarantees when the SCM is unknown, enabling practical learning with empirical and estimated causal structures. Empirical results on Adult, COMPAS, and synthetic data demonstrate that causally fair DRO (CDRO) can reduce unfair areas and improve counterfactual fairness, albeit with some accuracy-cost trade-offs, underscoring its potential for more robust and equitable decision-making under distributional uncertainty.
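The idea of replacing the inner maximization with an explicit regularizer can be seen in the simplest non-causal setting. The sketch below is not the paper's CFDF-based regularizer: it assumes a plain linear loss $\theta^\top x$, a type-1 Wasserstein ball of radius `delta` with Euclidean transport cost, and hypothetical names (`worst_case_linear_loss`, `X`, `theta`). Under those assumptions the worst-case expected loss is the empirical mean plus `delta` times the dual norm of $\theta$, and shifting every sample by `delta` along $\theta$ attains it.

```python
import numpy as np

def worst_case_linear_loss(X, theta, delta):
    """Empirical mean of theta^T x plus delta * ||theta||_2.

    For a linear loss and Euclidean transport cost, this is the worst-case
    expectation over a type-1 Wasserstein ball of radius delta
    (the l2 norm is its own dual norm).
    """
    return float(np.mean(X @ theta) + delta * np.linalg.norm(theta))

# Illustrative data (assumed, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta = np.array([1.0, -2.0, 0.5])
delta = 0.1

# A feasible adversarial distribution: move every sample a distance delta
# in the direction of theta. Its transport cost is exactly delta.
X_shifted = X + delta * theta / np.linalg.norm(theta)
attained = float(np.mean(X_shifted @ theta))

regularized = worst_case_linear_loss(X, theta, delta)
print(abs(attained - regularized) < 1e-9)
```

The check compares the closed-form value against the loss on the shifted sample, so no inner maximization has to be solved numerically.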

Abstract

In recent years, Wasserstein Distributionally Robust Optimization (DRO) has garnered substantial interest for its efficacy in data-driven decision-making under distributional uncertainty. However, limited research has explored the application of DRO to address individual fairness concerns, particularly when considering causal structures and sensitive attributes in learning problems. To address this gap, we first formulate the DRO problem from causality and individual fairness perspectives. We then present the DRO dual formulation as an efficient tool to convert the DRO problem into a more tractable and computationally efficient form. Next, we characterize the closed form of the approximate worst-case loss quantity as a regularizer, eliminating the max-step in the min-max DRO problem. We further estimate the regularizer in more general cases and explore the relationship between DRO and classical robust optimization. Finally, by removing the assumption of a known structural causal model, we provide finite sample error bounds when designing DRO with empirical distributions and estimated causal structures to ensure efficiency and robust learning.
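For orientation, the generic (non-causal) form of Wasserstein DRO strong duality is sketched below. It is the standard result from the DRO literature, which holds under mild regularity conditions, and not the paper's causally fair version. The symbols are assumptions introduced here: $c$ is the transport cost, $\delta$ the ball radius, $\ell_\theta$ the loss, $\mathbb{P}_N$ the empirical distribution, and $\lambda$ the dual variable.

```latex
\inf_{\theta} \; \sup_{\mathbb{Q} \,:\, W_c(\mathbb{Q}, \mathbb{P}_N) \le \delta}
  \mathbb{E}_{v \sim \mathbb{Q}}\bigl[\ell_\theta(v)\bigr]
\;=\;
\inf_{\theta} \; \inf_{\lambda \ge 0}
  \Bigl\{ \lambda \delta
  + \frac{1}{N} \sum_{i=1}^{N} \sup_{v'}
    \bigl[ \ell_\theta(v') - \lambda\, c(v_i, v') \bigr] \Bigr\}
```

The right-hand side replaces a supremum over probability measures with a scalar dual variable and one pointwise supremum per sample, which is what makes a closed-form regularizer possible when the inner supremum can be evaluated.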

Paper Structure

This paper contains 45 sections, 18 theorems, 168 equations, 3 figures, 1 table.

Key Result

Proposition 1

Let $\mathcal{M}$ be an ANM, with $g$ as its corresponding map to the semi-latent space (eq:semi_map), and $P_{\mathcal{X}}(u)$ the projection of vector $u$ to the non-sensitive part $\mathcal{U}_\mathcal{X}$. Then: (i) If $d_\mathcal{X}$ is a continuous dissimilarity function on diagonal $\mathcal{U}$ ... satisfies the definitions of a CFDF. (ii) If $d: \mathcal{V} \times \mathcal{V} \rightarrow [0,\infty)$ ...
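One way to picture a dissimilarity built from the noise of the non-sensitive features is the minimal sketch below. It is an illustration under stated assumptions, not the paper's construction: the SCM is taken to be a linear ANM $V = BV + U$ with a parent-free sensitive attribute, the non-sensitive part is read off the recovered exogenous noise, and $d_\mathcal{X}$ is the Euclidean norm. The names `B`, `to_noise`, `counterfactual`, and `dissimilarity` are hypothetical.

```python
import numpy as np

def to_noise(v, B):
    """Recover the exogenous noise u from v = B v + u (linear ANM, assumed)."""
    return (np.eye(len(v)) - B) @ v

def counterfactual(v, B, sens_idx, new_value):
    """Counterfactual of v under do(A = new_value), keeping all other noise fixed.

    Assumes the sensitive attribute has no parents, so A equals its own noise.
    """
    u_cf = to_noise(v, B).copy()
    u_cf[sens_idx] = new_value
    return np.linalg.solve(np.eye(len(v)) - B, u_cf)

def dissimilarity(v1, v2, B, sens_idx):
    """Euclidean distance between the non-sensitive noise coordinates."""
    mask = np.ones(len(v1), dtype=bool)
    mask[sens_idx] = False  # drop the sensitive coordinate
    return float(np.linalg.norm(to_noise(v1, B)[mask] - to_noise(v2, B)[mask]))

# Illustrative 3-variable SCM: index 0 is the sensitive attribute (row of zeros).
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.2, -1.0, 0.0]])
v = np.array([1.0, 2.0, 0.5])
v_twin = counterfactual(v, B, sens_idx=0, new_value=0.0)
v_other = np.array([1.0, 3.0, 0.5])

d_twin = dissimilarity(v, v_twin, B, sens_idx=0)
d_other = dissimilarity(v, v_other, B, sens_idx=0)
print(d_twin, d_other)
```

In this sketch an individual and its counterfactual twin are at distance zero up to floating-point error, while changing a non-sensitive feature yields a positive distance.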

Figures (3)

  • Figure 1: Displays the findings from our numerical experiment, assessing the performance of DRO across different models and datasets. (left) Bar plot showing the comparison of models based on the unfair area percentage (lower values are better) for $\Delta = .05$. (right) Bar plot comparing methods by prediction accuracy performance (higher values are better).
  • Figure 2: Displays the findings from our numerical experiment, assessing the performance of DRO across different models and datasets. (left) Counterfactual unfair area percentage (lower values are better). (right) Non-robust area performance of classifier (higher values are better) for $\Delta = .05$.
  • Figure 3: Displays the findings from our numerical experiment, assessing the performance of DRO across different models and datasets. (left) Bar plot showing the comparison of models based on the unfair area percentage $U(\delta)$ (lower values are better) at $\Delta = .01$. (right) Bar plot showing the comparison of models based on the robustness area percentage $R(\delta)$ (lower values are better) at $\Delta = .01$.

Theorems & Definitions (30)

  • Example 1
  • Definition 1: Causally Fair Dissimilarity Function
  • Definition 2: Parent-Free Sensitive Attribute SCM
  • Proposition 1
  • Remark 1
  • Theorem 1: Causally Fair Strong Duality
  • Remark 2
  • Theorem 2: Higher Order Linear Loss
  • Remark 3
  • Example 2
  • ...and 20 more