Retrospective Counterfactual Prediction by Conditioning on the Factual Outcome: A Cross-World Approach

Juraj Bodik

Abstract

Retrospective causal questions ask what would have happened to an observed individual had they received a different treatment. We study the problem of estimating $\mu(x,y)=\mathbb{E}[Y(1)\mid X=x,Y(0)=y]$, the expected counterfactual outcome for an individual with covariates $x$ and observed (untreated) outcome $y$, and of constructing valid prediction intervals for $Y(1)$ under the Neyman-Rubin superpopulation model. This quantity is generally not identified without additional assumptions. To link the observed and unobserved potential outcomes, we work with the cross-world correlation $\rho(x)=\operatorname{cor}(Y(1),Y(0)\mid X=x)$; plausible bounds on $\rho(x)$ enable a principled approach to this otherwise unidentified problem. We introduce retrospective counterfactual estimators $\hat{\mu}_\rho(x,y)$ and prediction intervals $C_\rho(x,y)$ that asymptotically satisfy $P[Y(1)\in C_\rho(x,y)\mid X=x, Y(0)=y]\ge 1-\alpha$ under standard causal assumptions. Many common baselines implicitly correspond to the endpoint choices $\rho=0$ (ignoring the factual outcome) or $\rho=1$ (treating the counterfactual as a shifted factual outcome). Interpolating between these cases through cross-world dependence yields substantial gains in both theory and practice.
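To make the two endpoints concrete, here is a minimal sketch under an assumption the abstract does not make: that $(Y(0),Y(1))\mid X=x$ is bivariate Gaussian with baseline means $\mu_0(x),\mu_1(x)$, standard deviations $\sigma_0(x),\sigma_1(x)$ and correlation $\rho$. Then $\mathbb{E}[Y(1)\mid X=x,Y(0)=y]=\mu_1(x)+\rho\,\sigma_1(x)\,(y-\mu_0(x))/\sigma_0(x)$ with conditional standard deviation $\sigma_1(x)\sqrt{1-\rho^2}$. The function `rcp_gaussian` and its arguments are hypothetical illustrations, not the paper's estimator.

```python
from statistics import NormalDist


def rcp_gaussian(y, mu0, mu1, sigma0, sigma1, rho, alpha=0.1):
    """Hypothetical RCP(rho)-style prediction of Y(1) given X=x, Y(0)=y.

    Assumes (Y(0), Y(1)) | X=x is bivariate Gaussian with means mu0, mu1,
    standard deviations sigma0, sigma1 and cross-world correlation rho.
    Returns the conditional mean and a central (1 - alpha) interval.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    if sigma0 <= 0 or sigma1 <= 0:
        raise ValueError("standard deviations must be positive")
    # Standardized factual residual: how far the observed Y(0) is from its baseline.
    z = (y - mu0) / sigma0
    # Shift the baseline mu1 in the direction of the factual residual.
    mean = mu1 + rho * sigma1 * z
    # The conditional spread shrinks as |rho| grows; it is 0 at |rho| = 1.
    sd = sigma1 * (1.0 - rho**2) ** 0.5
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mean, (mean - q * sd, mean + q * sd)
```

For example, with $y=3$, $\mu_0=1$, $\mu_1=2$ and $\sigma_0=\sigma_1=1$, the point prediction is 2 at $\rho=0$ (the factual outcome is ignored), 3 at $\rho=0.5$, and 4 at $\rho=1$, which equals $y+\mu_1-\mu_0$ (a shifted factual outcome) with a zero-width interval.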

Paper Structure

This paper contains 31 sections, 6 theorems, 46 equations, and 7 figures.

Key Result

Lemma 1 (Non-identifiability of $\mu(x,y)$)

Consider two non-degenerate distributions $P$ and $P'$ that satisfy strong ignorability and overlap, share the same marginal laws of $Y(0)\mid X$ and $Y(1)\mid X$, but differ in their cross-world dependence structure; that is, $\rho(x)\neq \rho'(x)$. Then $P$ and $P'$ induce the same observable distribution, yet $\mu(x,y)$ differs between $P$ and $P'$ for some $(x,y)$. Therefore $\mu(x,y)$ is not identified from the observed data, and its value depends on the cross-world dependence structure.
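The lemma can be illustrated numerically. The sketch below assumes a single covariate stratum, $\mathcal{N}(0,1)$ marginals for both potential outcomes, a Gaussian dependence with $\rho\in\{0, 0.9\}$, and a randomized treatment; the function `observable_and_target` and all parameter values are hypothetical. The observable summaries of $Y\mid T$ agree across the two values of $\rho$, while the (unobservable) target $\mathbb{E}[Y(1)\mid Y(0)\approx y]$ is about $\rho\,y$ and therefore differs.

```python
import numpy as np


def observable_and_target(rho, n=200_000, y=1.0, window=0.05, seed=0):
    """Simulate one covariate stratum with N(0,1) marginals for Y(0) and Y(1).

    Returns (observable summaries of Y | T, Monte Carlo estimate of
    E[Y(1) | Y(0) close to y]). Illustrative assumptions only.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    y0, y1 = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    # Randomized treatment, so strong ignorability and overlap hold by construction.
    t = rng.integers(0, 2, size=n)
    y_obs = np.where(t == 1, y1, y0)
    # What the analyst sees: the law of Y given T, summarized by mean and sd.
    observable = [(y_obs[t == k].mean(), y_obs[t == k].std()) for k in (0, 1)]
    # What the analyst wants but cannot see: it needs both potential outcomes jointly.
    near = np.abs(y0 - y) < window
    target = y1[near].mean()
    return observable, target


for rho in (0.0, 0.9):
    obs, tgt = observable_and_target(rho)
    print(f"rho={rho}: observable (mean, sd) per arm={obs}, E[Y(1)|Y(0)~1]={tgt:.3f}")
```

Matching moments is only a sanity check here; in this construction the observable laws coincide exactly because both arms have $\mathcal{N}(0,1)$ outcomes regardless of $\rho$.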

Figures (7)

  • Figure 1: Proposed $\mathrm{RCP}(\rho)$ estimator $\hat{Y}(1):=\hat{\mu}_\rho(x,y)$ and prediction interval $C_\rho(x,y)$, combining baseline predictions with cross-world dependence, shown for two choices of $\rho$. Across panels, the underlying data and baseline models are identical; increasing $\rho$ only shifts the counterfactual prediction in the direction implied by the factual residual and shrinks the interval. We show $\rho=0$ (no outcome conditioning) and $\rho=0.95$ (strong dependence) on five highlighted units.
  • Figure 2: Mean squared error of different estimators across datasets, averaged over 50 repetitions. For $\hat{\mu}_{\rho}$, we use either $\rho = \rho_{\text{true}}$ or, to mimic misspecification, $\rho = \rho_{\text{true}} + \mathrm{Unif}(-0.5, 0.5)$. Standard deviations for each entry are reported in a supplementary figure in the simulation appendix.
  • Figure 3: $\text{Gap} = \text{MSE}_{\text{RCP}} - \text{MSE}_{\text{oracle}}$ calculated across different misspecifications of $\rho$. Bias persists when $\rho$ is far from the truth, but vanishes asymptotically if $\rho$ is specified correctly. This demonstrates that incorporating even approximate knowledge of cross-world dependence improves counterfactual predictions.
  • Figure 4: $\text{Gap} = \text{MSE}_{\text{RCP}} - \text{MSE}_{\text{oracle}}$ calculated across different marginal–copula distributions of the potential outcomes $(Y(0), Y(1))$. Only correctly specified $\rho$ is used in the estimation.
  • Figure 5: Interval scores of different prediction interval methods across all datasets. Here, $C_\rho^{+CI}$, the bias-corrected version of $C_\rho$, is used. GANITE is excluded since it does not provide a natural way of constructing prediction intervals.
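The comparisons in Figures 2, 3 and 5 can be mimicked in a toy setting. The sketch below is not the paper's benchmark: it assumes a hypothetical Gaussian data-generating process with $\rho_{\text{true}}=0.7$, uses oracle baselines $\mu_0,\mu_1,\sigma_0,\sigma_1$ so that only the choice of $\rho$ matters, and assumes the reported metric is the standard interval score $(u-l)+\tfrac{2}{\alpha}(l-y)\mathbf{1}\{y<l\}+\tfrac{2}{\alpha}(y-u)\mathbf{1}\{y>u\}$. The names `interval_score`, `gaussian_rcp` and `run` are illustrative.

```python
import numpy as np
from statistics import NormalDist


def interval_score(lower, upper, y, alpha):
    """Interval score of central (1 - alpha) intervals; lower is better."""
    width = upper - lower
    below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return width + below + above


def gaussian_rcp(y0, mu0, mu1, s0, s1, rho, alpha):
    """Conditional mean and central interval for Y(1) | Y(0)=y0 (bivariate Gaussian assumption)."""
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    mean = mu1 + rho * s1 * (y0 - mu0) / s0
    sd = s1 * np.sqrt(1.0 - rho**2)
    return mean, mean - q * sd, mean + q * sd


def run(rho_true=0.7, n=100_000, alpha=0.1, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    # Hypothetical data-generating process with oracle (true) baselines.
    mu0, mu1, s0, s1 = x, 1.0 + 2.0 * x, np.ones(n), np.ones(n)
    z0 = rng.standard_normal(n)
    z1 = rho_true * z0 + np.sqrt(1.0 - rho_true**2) * rng.standard_normal(n)
    y0, y1 = mu0 + s0 * z0, mu1 + s1 * z1
    # One misspecified draw rho_true + Unif(-0.5, 0.5), clipped to [-1, 1].
    rho_mis = float(np.clip(rho_true + rng.uniform(-0.5, 0.5), -1.0, 1.0))
    results = {}
    for name, rho in [("rho=0", 0.0), ("rho=1", 1.0), ("rho_true", rho_true), ("rho_mis", rho_mis)]:
        mean, lo, hi = gaussian_rcp(y0, mu0, mu1, s0, s1, rho, alpha)
        results[name] = {
            "mse": float(np.mean((y1 - mean) ** 2)),
            "interval_score": float(np.mean(interval_score(lo, hi, y1, alpha))),
            "coverage": float(np.mean((lo <= y1) & (y1 <= hi))),
        }
    return results


for name, metrics in run().items():
    print(name, metrics)
```

With unit variances, using $r$ in place of the true $\rho$ gives MSE $(\rho-r)^2+1-\rho^2$, i.e. 1.0, 0.6 and 0.51 for $r=0$, $r=1$ and $r=0.7$. In this toy case $\rho=1$ beats $\rho=0$ in MSE, but its zero-width interval is heavily penalized by the interval score.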

Theorems & Definitions (13)

  • Definition 1 (bodik2025crossworld)
  • Lemma 1: Non-identifiability of $\mu(x,y)$
  • Definition 2
  • Theorem 1
  • Definition 3
  • Lemma 2
  • Proof
  • Theorem (consistency), restated
  • Proof
  • Lemma 1 (non-identifiability), restated