
Exogenous Matching: Learning Good Proposals for Tractable Counterfactual Estimation

Yikang Chen, Dehui Du, Lili Tian

TL;DR

This work tackles tractable estimation of counterfactual expressions in general SCM settings by introducing Exogenous Matching (EXOM), an importance-sampling framework that learns a conditional proposal $Q_{\mathbf{U}\mid\mathbf{y}_*}$ by minimizing a common upper bound on estimator variance. The core theoretical contribution is this variance bound, which yields a learning objective that recasts counterfactual estimation as conditional density modeling; a stochastic counterfactual process enables the learned proposal to be reused across multiple queries. The method incorporates counterfactual Markov boundaries as structural priors and integrates with identifiable proxy SCMs, achieving unbiased or low-bias estimation in practical scenarios and outperforming standard IS baselines. Empirically, EXOM achieves higher sampling efficiency (ESP) and lower failure rates (FR) across diverse SCMs and density estimators, and remains applicable to real-world problems through proxy SCMs such as CausalNF and NCM, albeit with acknowledged limitations such as reliance on partially specified models and faithfulness assumptions.

Abstract

We propose Exogenous Matching, an importance sampling method for tractable and efficient estimation of counterfactual expressions in general settings. By minimizing a common upper bound on the variance of counterfactual estimators, we transform the variance minimization problem into a conditional distribution learning problem, which allows integration with existing conditional distribution modeling approaches. We validate the theoretical results through experiments on Structural Causal Models (SCMs) of various types and settings, and show that the method outperforms other existing importance sampling methods on counterfactual estimation tasks. We also explore the impact of injecting structural prior knowledge (counterfactual Markov boundaries) on the results. Finally, we apply the method to identifiable proxy SCMs and demonstrate the unbiasedness of the estimates, empirically illustrating its applicability to practical scenarios.
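To make "conditional distribution learning" concrete, here is a minimal sketch under loudly stated assumptions. It uses a hypothetical two-variable SCM $Y := U_1 + U_2$ with $U_1, U_2 \sim \mathcal{N}(0,1)$ and a hypothetical linear-Gaussian conditional model $q(u_1 \mid y) = \mathcal{N}(a y + b, s^2)$, fitted by maximum likelihood on pairs $(u, y)$ simulated from the SCM. This is not the authors' training procedure, which relies on neural conditional density estimators; it only illustrates that a likelihood fit recovers the exogenous posterior $P(U_1 \mid Y = y) = \mathcal{N}(y/2, 1/2)$, the kind of conditional distribution a good proposal should approximate.

```python
import math
import random


def simulate_pairs(n, seed=0):
    # Hypothetical SCM: U1, U2 ~ N(0, 1) independent, Y := U1 + U2.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((u1, u1 + u2))
    return pairs


def fit_linear_gaussian(pairs):
    # Maximum-likelihood fit of q(u1 | y) = N(a*y + b, s^2). For this
    # linear-Gaussian model the MLE is ordinary least squares, with s^2
    # equal to the mean squared residual.
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_uy = sum((u - mean_u) * (y - mean_y) for u, y in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    a = cov_uy / var_y
    b = mean_u - a * mean_y
    s2 = sum((u - (a * y + b)) ** 2 for u, y in pairs) / n
    return a, b, s2


def log_q(u1, y, params):
    # Log-density of the fitted proposal q(u1 | y).
    a, b, s2 = params
    mu = a * y + b
    return -0.5 * (math.log(2.0 * math.pi * s2) + (u1 - mu) ** 2 / s2)


params = fit_linear_gaussian(simulate_pairs(50_000))
print(params)  # should be close to the exact posterior parameters (0.5, 0.0, 0.5)
```

Because the model class contains the true posterior here, the fit converges to it; with a richer SCM one would swap in a flexible conditional density estimator, which is where existing modeling approaches plug in.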

Paper Structure

This paper contains 83 sections, 14 theorems, 67 equations, 15 figures, 6 tables, 3 algorithms.

Key Result

Theorem 1 (Variance Upper Bound)

Let $\sigma_{\mathcal{Y}_*}(\mathbf{u})=\frac{p(\mathbf{u})}{q(\mathbf{u}\mid\mathbf{y}_*)}\,\mathbbm{1}_{\Omega_\mathbf{U}(\mathcal{Y}_*)}(\mathbf{u})$, where $q(\mathbf{u}\mid\mathbf{y}_*)$ denotes the density of the proposal distribution $Q_{\mathbf{U}\mid\mathbf{y}_*}$, and let … where the constant $c$ depends only on $\kappa$ and $P_\mathbf{U}$.
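The weight $\sigma_{\mathcal{Y}_*}$ in Theorem 1 is the standard importance-sampling weight: since $P(\mathcal{Y}_*) = \mathbb{E}_{P_\mathbf{U}}[\mathbbm{1}_{\Omega_\mathbf{U}(\mathcal{Y}_*)}(\mathbf{U})] = \mathbb{E}_{Q}[\sigma_{\mathcal{Y}_*}(\mathbf{U})]$, averaging $\sigma$ over draws from the proposal gives an unbiased estimate, provided $q > 0$ wherever $p \cdot \mathbbm{1} > 0$. The sketch below shows this on a hypothetical toy SCM ($X := \mathbb{1}[U_1 > 0]$, $Y := X + U_2$) and a hypothetical counterfactual event $\{Y > 2.5,\ Y_{x=0} > 2\}$ evaluated on a shared exogenous sample. The proposal is hand-picked (a shifted Gaussian for $U_2$), not learned as in EXOM.

```python
import math
import random


def normal_pdf(x, mean=0.0, std=1.0):
    # Density of N(mean, std^2) at x.
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def scm_y(u1, u2, do_x=None):
    # Hypothetical SCM: X := 1[U1 > 0], Y := X + U2; do_x overrides X.
    x = (1.0 if u1 > 0 else 0.0) if do_x is None else do_x
    return x + u2


def in_event(u1, u2):
    # Indicator of Omega_U(Y_*) for the hypothetical counterfactual event
    # {Y > 2.5 in the factual world, Y_{x=0} > 2 in the interventional world},
    # both worlds sharing the same exogenous sample (u1, u2).
    return scm_y(u1, u2) > 2.5 and scm_y(u1, u2, do_x=0.0) > 2.0


def is_estimate(n, shift=2.3, seed=0):
    # Importance sampling with a hand-picked proposal: U1 from its prior,
    # U2 from N(shift, 1). The p/q ratio for U1 is 1, so only U2 is reweighted.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u1 = rng.gauss(0.0, 1.0)
        u2 = rng.gauss(shift, 1.0)
        if in_event(u1, u2):
            total += normal_pdf(u2) / normal_pdf(u2, mean=shift)  # sigma(u)
    return total / n


def exact():
    # Closed form for this toy SCM: 0.5 * P(U2 > 2) + 0.5 * P(U2 > 2.5).
    def tail(a):
        return 0.5 * math.erfc(a / math.sqrt(2.0))
    return 0.5 * tail(2.0) + 0.5 * tail(2.5)


est = is_estimate(100_000)
print(f"IS estimate {est:.5f} vs exact {exact():.5f}")
```

The event has probability of roughly 0.0145, so sampling directly from $P_\mathbf{U}$ would discard most draws; shifting mass toward $\Omega_\mathbf{U}(\mathcal{Y}_*)$ is what reduces variance, and EXOM's contribution is learning such a shift per query instead of choosing it by hand.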

Figures (15)

  • Figure 1: A brief illustration of SCM, PCH and counterfactual concepts.
  • Figure 2: Overview of the conditioning and masking process. $\mathbf{y}_*$ serves as the input to the entire process, $\mathbf{m}$ represents the inferred mask, and the vectorized parameters $\theta_{\mathbf{y}_*}$ of the proposal distribution $Q_{\mathbf{U}\mid\mathbf{y}_*}$ are the output. Different colors represent information from different submodels. Both $h$ and $g$ represent neural networks.
  • Figure 3: LL (the negative of the optimization objective), ESP, and FR on SIMPSON-NLIN. As LL increases, ESP increases and FR decreases until convergence.
  • Figure 4: Ablation study for Markov boundaries on 4 different settings of SCMs: (a) SIMPSON-NLIN, (b) LARGEBD-NLIN, (c) M, (d) NAPKIN. A higher ESP signifies greater sampling efficiency. In most cases, EXOM with Markov boundaries masked (orange bar) outperforms EXOM without Markov-boundary masking (blue bar).
  • Figure 5: Exogenous Matching Learning
  • ...and 10 more figures
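The conditioning-and-masking pipeline of Figure 2 can be illustrated with a deliberately simplified sketch. Everything here is hypothetical: $h$ and $g$ are replaced by tiny closed-form functions rather than neural networks, the observed counterfactual values $\mathbf{y}_*$ are a plain dictionary, and the mask $\mathbf{m}$ is written by hand rather than inferred. The point is only the structural effect of masking: variables outside the mask cannot influence the proposal parameters $\theta_{\mathbf{y}_*}$.

```python
import math
from typing import Dict, Tuple


def h(value: float, weight: float) -> float:
    # Stand-in for the embedding network h (hypothetical: a scaled tanh).
    return math.tanh(weight * value)


def g(features: Dict[str, float]) -> Tuple[float, float]:
    # Stand-in for the head network g (hypothetical): maps pooled features
    # to (mean, log_std) of a Gaussian proposal for one exogenous variable.
    pooled = sum(features.values())
    return 0.5 * pooled, -0.1 * abs(pooled)


def proposal_params(y_star: Dict[str, float],
                    mask: Dict[str, int],
                    weights: Dict[str, float]) -> Tuple[float, float]:
    # Masking: features of variables outside the (assumed) Markov boundary
    # are multiplied by 0, so they cannot affect the output parameters.
    features = {k: mask[k] * h(v, weights[k]) for k, v in y_star.items()}
    return g(features)


weights = {"X": 1.0, "Y": 0.7, "Z": 2.0}
mask = {"X": 1, "Y": 1, "Z": 0}  # hypothetical: Z lies outside the boundary
theta_a = proposal_params({"X": 1.0, "Y": 0.3, "Z": -2.0}, mask, weights)
theta_b = proposal_params({"X": 1.0, "Y": 0.3, "Z": 5.0}, mask, weights)
print(theta_a == theta_b)  # True: changing Z does not change theta
```

Hard-coding this invariance is one way to read the ablation in Figure 4: the network no longer has to learn from data that masked-out variables are irrelevant to a given exogenous variable.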

Theorems & Definitions (28)

  • theorem 1: Variance Upper Bound
  • definition 1: Stochastic Counterfactual Process
  • corollary 1: Expected Variance Upper Bound
  • definition 2: Counterfactual Markov Boundary
  • theorem 2: Counterfactual Markov Boundary Independence
  • theorem 3: Counterfactual Markov Boundary on Graph
  • proposition 1
  • proof
  • lemma 1
  • proof
  • ...and 18 more
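For intuition about definition 2 and theorem 3, recall the ordinary graphical notion they build on. In a DAG, a node's parents, children, and children's other parents form a Markov blanket, which is the Markov boundary under faithfulness. The sketch below computes exactly this classical set. It does not implement the paper's counterfactual Markov boundary, which is defined over counterfactual (multi-world) variables and may differ.

```python
from typing import Dict, Iterable, Set, Tuple


def markov_blanket(edges: Iterable[Tuple[str, str]], node: str) -> Set[str]:
    # edges are directed (parent, child) pairs of a DAG.
    parents: Dict[str, Set[str]] = {}
    children: Dict[str, Set[str]] = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
        children.setdefault(a, set()).add(b)
    blanket = set(parents.get(node, set())) | set(children.get(node, set()))
    for child in children.get(node, set()):
        blanket |= parents.get(child, set())  # co-parents of each child
    blanket.discard(node)
    return blanket


edges = [("A", "B"), ("B", "C"), ("D", "C"), ("C", "E")]
print(sorted(markov_blanket(edges, "B")))  # ['A', 'C', 'D']
```

Here E is excluded from B's blanket because, given C, it carries no further information about B; this is the kind of conditional independence a mask can exploit.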