Simulating counterfactuals

Juha Karvanen, Santtu Tikka, Matti Vihola

TL;DR

The paper addresses sampling from counterfactual distributions in a recursive structural causal model (SCM) when the evidence includes continuous variables. Conditioning on continuous evidence typically restricts the distribution to a manifold, which can make the counterfactual analytically intractable. The authors introduce a tuning-free conditional simulation algorithm that processes the conditioning variables in topological order. For a continuous condition, it solves for the variable's dedicated error term by binary search and calibrates the samples with sequential Monte Carlo; for a discrete condition, it resamples. The whole procedure can be interpreted as a particle filter. The authors prove mean-square-error convergence and a central limit theorem for the resulting estimator, and they apply the method in a fairness-evaluation framework, using synthetic credit-scoring data to assess the counterfactual fairness of opaque AI models. Together, these contributions support counterfactual inference and fairness analysis in SCMs with mixed discrete and continuous variables and unobserved confounding, and the method is accompanied by open-source software for practitioners.
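
The conditioning steps described above can be sketched on a small SCM. Everything in this block is our own illustration under stated assumptions, not the authors' software. The model is hypothetical: an unobserved confounder $U_C$, a binary $A = 1[U_C + U_A > 0]$, and a continuous $Y = (1 + A)U_C + U_Y + 0.1U_Y^3$ that is strictly increasing in its dedicated error term $U_Y$. The helper names, multinomial resampling, bracket expansion, and the central-difference derivative are also assumptions. Particles carry the non-dedicated error terms drawn from their prior. The discrete condition $A = a$ is enforced by resampling only the matching particles. The continuous condition $Y = y$ is met by solving for $U_Y$ with binary search and weighting each particle by the density of the solved $U_Y$ divided by $|\partial Y / \partial U_Y|$, the usual change-of-variables factor.

```python
import math
import random

# Hypothetical toy SCM (our assumption, not the paper's model):
#   U_C, U_A, U_Y ~ N(0, 1) independent
#   A = 1 if U_C + U_A > 0 else 0          (discrete)
#   Y = (1 + A) * U_C + U_Y + 0.1 * U_Y^3  (continuous, increasing in U_Y)


def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def bisect_increasing(g, target, iters=60):
    """Binary search for u with g(u) == target; assumes g is strictly increasing and unbounded."""
    lo, hi = -1.0, 1.0
    while g(lo) > target:  # widen the bracket to the left
        lo *= 2.0
    while g(hi) < target:  # widen the bracket to the right
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_a(u_c, u_a):
    return 1 if u_c + u_a > 0 else 0


def f_y(a, u_c, u_y):
    return (1 + a) * u_c + u_y + 0.1 * u_y ** 3


def simulate(a_obs, y_obs, a_new, n=10000, seed=1):
    """Hypothetical helper: draws of Y under do(A = a_new) given A = a_obs, Y = y_obs."""
    rng = random.Random(seed)
    # Particles hold the error terms not dedicated to a continuous condition.
    parts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

    # Condition on A (discrete, first in topological order): resample matching particles.
    w = [1.0 if f_a(u_c, u_a) == a_obs else 0.0 for u_c, u_a in parts]
    parts = rng.choices(parts, weights=w, k=n)

    # Condition on Y (continuous): solve for the dedicated error term U_Y by binary search.
    solved, w = [], []
    for u_c, u_a in parts:
        def g(u, u_c=u_c):
            return f_y(a_obs, u_c, u)

        u_y = bisect_increasing(g, y_obs)
        h = 1e-6
        slope = (g(u_y + h) - g(u_y - h)) / (2.0 * h)  # dY/dU_Y, central difference
        solved.append((u_c, u_a, u_y))
        w.append(normal_pdf(u_y) / abs(slope))  # change-of-variables weight
    solved = rng.choices(solved, weights=w, k=n)

    # Counterfactual world: replace A's equation by the constant a_new, keep the error terms.
    return [f_y(a_new, u_c, u_y) for u_c, u_a, u_y in solved]
```

In this model the evidence forces $2U_C + U_Y + 0.1U_Y^3 = y$, so the counterfactual $Y_{\mathrm{do}(A=0)} = U_C + U_Y + 0.1U_Y^3$ equals $y - U_C$. The sample mean of `simulate(1, y, 0)` should therefore approach $y$ minus the posterior mean of $U_C$, which can be checked independently by one-dimensional quadrature.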

Abstract

Counterfactual inference considers a hypothetical intervention in a parallel world that shares some evidence with the factual world. If the evidence specifies a conditional distribution on a manifold, counterfactuals may be analytically intractable. We present an algorithm for simulating values from a counterfactual distribution where conditions can be set on both discrete and continuous variables. We show that the proposed algorithm can be presented as a particle filter leading to asymptotically valid inference. The algorithm is applied to fairness analysis in credit scoring.
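
The following sketch illustrates the kind of individual-level fairness check the abstract refers to. It uses a hypothetical toy model, not the paper's credit-scoring model: a binary sensitive attribute $S$, an unobserved confounder $U_C$, one feature $X = (1 + S)U_C + U_X$, and a black-box score that depends only on $X$. Because the dedicated error term $U_X$ can be solved in closed form here, binary search is unnecessary, and the weight reduces to the $N(0, 1)$ density of the solved $U_X$. The individual's evidence includes every input of the predictor, so the factual prediction is fixed. The predictor is therefore counterfactually fair for that individual exactly when the mean absolute gap between factual and counterfactual predictions is zero. Using this gap as the summary is our assumption, not necessarily the measure used in the paper, and the names `counterfactual_features` and `fairness_gap` are hypothetical.

```python
import math
import random

# Hypothetical toy model (our assumption, not the paper's credit-scoring SCM):
#   U_C, U_X ~ N(0, 1); S in {0, 1} observed; X = (1 + S) * U_C + U_X


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual_features(s_obs, x_obs, s_new, n=20000, seed=3):
    """Draws of X under do(S = s_new) given the evidence S = s_obs, X = x_obs."""
    rng = random.Random(seed)
    u_c = [rng.gauss(0.0, 1.0) for _ in range(n)]  # non-dedicated error term (confounder)
    # Dedicated error term of X, solved in closed form instead of by binary search.
    u_x = [x_obs - (1 + s_obs) * c for c in u_c]
    # dX/dU_X = 1, so the weight is the N(0, 1) density of U_X (constant factor dropped).
    w = [math.exp(-0.5 * u * u) for u in u_x]
    idx = rng.choices(range(n), weights=w, k=n)
    return [(1 + s_new) * u_c[i] + u_x[i] for i in idx]


def fairness_gap(predict, s_obs, x_obs, s_new):
    """Mean |factual - counterfactual| prediction; 0 means counterfactually fair here."""
    factual = predict(x_obs)  # setting S to its observed value reproduces x_obs
    draws = counterfactual_features(s_obs, x_obs, s_new)
    return sum(abs(predict(x) - factual) for x in draws) / len(draws)
```

For an individual with $S = 0$ and $X = 1$, the evidence gives $X_{\mathrm{do}(S=1)} = 1 + U_C$, where $U_C \mid X = 1$ is normal with mean 0.5 and variance 0.5. The counterfactual feature thus has mean about 1.5, and `fairness_gap(lambda x: sigmoid(x - 1.0), 0, 1.0, 1)` is positive, whereas a predictor that ignores $X$ has a gap of exactly 0.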

Paper Structure

This paper contains 13 sections, 6 theorems, 31 equations, 1 figure, 2 tables, and 6 algorithms.

Key Result

Theorem 1

Let $\mathcal{M} = (\mathbf{V}, \mathbf{U}, \mathbf{F}, p(\mathbf{u}))$ be a recursive SCM and let $\eta = p(\mathbf{W}_{\mathrm{do}(\mathbf{X} = \mathbf{x})} = \mathbf{w} \,\vert\, \mathbf{C} = \mathbf{c})$ be a counterfactual distribution such that $\mathbf{W} \cup \mathbf{X} \cup \mathbf{C} \subs

Figures (1)

  • Figure 1: Causal model for the credit-scoring fairness example. Ethnicity and gender (red nodes) are the sensitive variables, and the risk of default (blue node) is the outcome to be predicted. Gray nodes depict other observed variables, and white nodes are unobserved confounders.

Theorems & Definitions (16)

  • Definition 1: SCM
  • Definition 2: Evaluation of a counterfactual distribution
  • Definition 3: Ancestral SCM
  • Theorem 1
  • Proof
  • Definition 4: Counterfactual fairness
  • Definition 5: Dedicated error term
  • Definition 6: u-monotonic SCM
  • Lemma 1
  • Proof
  • ...and 6 more