Causal Partial Identification via Conditional Optimal Transport

Sirui Lin, Zijun Gao, Jose Blanchet, Peter Glynn

TL;DR

The paper tackles partial identification (PI) of causal estimands that depend on the joint distribution of potential outcomes, using covariate-assisted PI sets characterized via conditional optimal transport (COT). It proves that the COT functional is continuous under the adapted Wasserstein distance and proposes a direct, nonparametric adapted COT value estimator that avoids nuisance-function estimation, with provable consistency and finite-sample convergence rates. In simulations and a study inspired by the STAR data, the method performs robustly and competitively against existing bounding approaches, particularly when covariates play a strong role. The work also discusses extensions to covariate-shift scenarios, triangular transport maps, and multi-treatment settings, highlighting practical implications for covariate-informed causal inference and distributional bounds.

Abstract

We study the estimation of causal estimands that involve the joint distribution of the treatment and control outcomes of a single unit. In typical causal inference settings, both outcomes cannot be observed simultaneously, which places the estimation problem within the domain of partial identification (PI). Pre-treatment covariates can substantially reduce estimation uncertainty by shrinking the partially identified set. Recent work has shown that covariate-assisted PI sets can be characterized through conditional optimal transport (COT) problems. However, finite-sample estimation of COT poses significant challenges, primarily because the COT functional is discontinuous under the weak topology, which renders the direct plug-in estimator inconsistent. To address this issue, the existing literature relies on relaxations or on indirect methods that require estimating non-parametric nuisance statistics. In this work, we prove that the COT functional is continuous under a stronger topology, namely the one induced by the adapted Wasserstein distance. Leveraging this result, we propose a direct, consistent, non-parametric estimator of the COT value that avoids nuisance parameter estimation. We derive the convergence rate of our estimator and validate its effectiveness through comprehensive simulations, which show improved performance compared with existing approaches.
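A small sketch can illustrate why discretizing the covariate helps. It is a simplified, hypothetical discretize-then-transport procedure and not the paper's adapted COT value estimator. It makes several assumptions:

- a scalar covariate $Z$ in $[0,1]$ and scalar outcomes;
- a randomized treatment, so no propensity weighting is needed;
- the cost $(y_1 - y_0)^2$;
- equal-width covariate cells, whose count `n_cells` is a tuning parameter introduced only for illustration.

Within each cell, the sketch solves one-dimensional OT with the sorted (comonotone) coupling, which is optimal for this cost. It then averages the cell values by cell mass. The helper names `cell_index`, `ot_cost_1d`, and `binned_cot_lower` are hypothetical.

```python
import numpy as np


def cell_index(z, n_cells):
    """Hypothetical cell projection for a scalar covariate in [0, 1].

    Splits [0, 1] into n_cells equal-width cells and returns each point's
    cell index; the matching cell center would be (index + 0.5) / n_cells.
    """
    z = np.asarray(z, dtype=float)
    return np.minimum(np.floor(z * n_cells).astype(int), n_cells - 1)


def ot_cost_1d(a, b):
    """Quadratic-cost OT value between two empirical samples on the real line.

    In 1-D the sorted (comonotone) coupling is optimal for (x - y)^2, so the
    value is the integral of (F_a^{-1}(t) - F_b^{-1}(t))^2 over t in (0, 1).
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    na, nb = a.size, b.size
    # Breakpoints where either empirical quantile function jumps.
    t = np.union1d(np.arange(1, na + 1) / na, np.arange(1, nb + 1) / nb)
    t = np.concatenate(([0.0], t))
    mid = (t[:-1] + t[1:]) / 2  # both quantile functions are constant on each piece
    qa = a[np.minimum((mid * na).astype(int), na - 1)]
    qb = b[np.minimum((mid * nb).astype(int), nb - 1)]
    return float(np.sum(np.diff(t) * (qa - qb) ** 2))


def binned_cot_lower(z1, y1, z0, y0, n_cells):
    """Binned estimate of the lower COT value for E[(Y(1) - Y(0))^2].

    Solves a 1-D OT problem inside each covariate cell and averages the cell
    values, weighting each cell by its share of the pooled covariates.
    Assumption: cells missing treated or control units are dropped and the
    remaining weights are renormalized.
    """
    y1, y0 = np.asarray(y1, float), np.asarray(y0, float)
    k1, k0 = cell_index(z1, n_cells), cell_index(z0, n_cells)
    pooled = np.concatenate([k1, k0])
    total, mass = 0.0, 0.0
    for k in np.unique(pooled):
        a, b = y1[k1 == k], y0[k0 == k]
        if a.size == 0 or b.size == 0:
            continue
        p = np.mean(pooled == k)
        total += p * ot_cost_1d(a, b)
        mass += p
    return total / mass


# Toy data: treated outcomes rise with Z, control outcomes fall with Z.
rng = np.random.default_rng(0)
n = 2000
z1, z0 = rng.uniform(size=n), rng.uniform(size=n)
y1 = z1 + 0.01 * rng.standard_normal(n)
y0 = 1 - z0 + 0.01 * rng.standard_normal(n)
print(binned_cot_lower(z1, y1, z0, y0, n_cells=1))   # ignores Z: near 0
print(binned_cot_lower(z1, y1, z0, y0, n_cells=20))  # uses Z: near 1/3
```

With a single cell, the sketch ignores $Z$ and returns the unconditional lower value. Here that value is close to 0, because both outcome marginals are roughly uniform on $[0,1]$. With 20 cells, the sketch respects the opposite dependence of the two outcomes on $Z$ and returns a value near $1/3$, which shows how covariates can shrink the PI set. One intuition for the plug-in problem: with a continuous $Z$, the empirical distribution typically holds a single unit at each observed covariate value. Grouping units into cells is what gives each within-cell OT problem several outcomes from both arms.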

Paper Structure

This paper contains 54 sections, 31 theorems, 152 equations, 4 figures, 3 tables.

Key Result

Proposition 1

Under Assumptions SUTVA through overlap, for $w=0,1$, where $\tilde{e}_w(z) = \frac{(1-w) - \bar{e}}{(1-w) - e(z)}$ and $\bar{e} \triangleq \int e(z)\,{\rm d}\mu_z(z)$.
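Specializing the weight to each arm makes its role easier to see. The derivation below rests on two assumptions that this excerpt does not state:

- $e(z) = \mathbb{P}(W = 1 \mid Z = z)$ is the propensity score;
- $\mu_{z\mid w}$, notation introduced here, denotes the covariate distribution in arm $w$.

Under these assumptions, $\tilde{e}_w$ is the density ratio that reweights arm $w$'s covariates to the full population, and the overlap assumption keeps it finite.

```latex
\tilde{e}_1(z) = \frac{0 - \bar{e}}{0 - e(z)} = \frac{\bar{e}}{e(z)},
\qquad
\tilde{e}_0(z) = \frac{1 - \bar{e}}{1 - e(z)}.

% With \bar{e} = \int e(z)\,{\rm d}\mu_z(z) = \mathbb{P}(W = 1), Bayes' rule gives
\frac{{\rm d}\mu_{z\mid 1}}{{\rm d}\mu_z}(z) = \frac{e(z)}{\bar{e}},
\qquad
\frac{{\rm d}\mu_{z\mid 0}}{{\rm d}\mu_z}(z) = \frac{1 - e(z)}{1 - \bar{e}},

% hence
\tilde{e}_w(z) = \frac{{\rm d}\mu_z}{{\rm d}\mu_{z\mid w}}(z), \qquad w = 0, 1.
```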

Figures (4)

  • Figure 1: Selection of the elbow point.
  • Figure 2: Estimation accuracy comparison between the adapted COT estimator and DualBounds. In the legend, "adapt" refers to the adapted COT estimator, "ridge" to the ridge regression–based DualBounds, and "knn" to the KNN–based DualBounds. The average relative error is computed over 500 Monte Carlo repetitions. To quantify uncertainty, we compute the standard error of the mean (SEM), i.e., $\textup{SD} / \sqrt{500}$, where $\textup{SD}$ is the standard deviation. The SEM is below $0.005$ in panel (a), below $0.02$ in panels (b) and (c), and below $0.018$ in panels (d), (e), and (f).
  • Figure 3: Cubic spline regression of the outcome variable $Y$ (GPA) versus the covariate $Z$ (baseline GPA) for the treatment group. The distribution of $Z$ is modeled using Gaussian kernel density estimation (KDE), while the relationship between the outcome $Y$ and $Z$ is modeled using cubic spline regression. The Wasserstein distance between the empirical and KDE-generated distributions of $Z$ is less than $0.2$. The Wasserstein distance between the observed and model-generated distributions of $Y$ is less than $0.07$ for the treatment group and less than $0.09$ for the control group.
  • Figure 4: Robustness of our method to covariate mismatch. The average relative error is defined as $\mathbb{E}\left[ \left| \text{estimator} - V_{\textup{c}} \right| \right] / V_{\textup{c}}$, where the expectation $\mathbb{E}$ is approximated by averaging over 500 Monte Carlo repetitions. To quantify uncertainty, we compute the standard error of the mean (SEM) as $\textup{SD} / \sqrt{500}$, where $\textup{SD}$ denotes the standard deviation of the relative errors across repetitions. The maximum SEM is below 0.006 in this setting.
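The error metric and uncertainty summary described in the Figure 2 and Figure 4 captions can be computed as below. This is a minimal sketch, and the function name `relative_error_summary` is hypothetical. Using the sample standard deviation (`ddof=1`) for SD is our assumption, since the captions do not state the normalization.

```python
import numpy as np


def relative_error_summary(estimates, v_true):
    """Average relative error E|estimate - V_c| / V_c over Monte Carlo runs, plus its SEM.

    Assumption: SD is the sample standard deviation (ddof=1) of the per-run
    relative errors, and SEM = SD / sqrt(number of runs).
    """
    rel = np.abs(np.asarray(estimates, dtype=float) - v_true) / v_true
    sem = rel.std(ddof=1) / np.sqrt(rel.size)
    return float(rel.mean()), float(sem)


# Toy usage with four runs and true value V_c = 1.0 (made-up numbers).
mean_err, sem = relative_error_summary([1.1, 0.9, 1.0, 1.2], v_true=1.0)
print(mean_err, sem)  # about 0.1 and 0.0408
```

With 500 repetitions, `rel.size` is 500, and the SEM reduces to $\textup{SD}/\sqrt{500}$ as in the captions.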

Theorems & Definitions (66)

  • Proposition 1: Identifiable marginals
  • Example 1: Variance of treatment effects
  • Proposition 2: Recursive formulation
  • Definition 1: Adapted Wasserstein distance
  • Theorem 1: Optimal PI bounds
  • Example 2: Inconsistent estimator
  • Theorem 2: Continuity under $\mathds W_{\textup{a}}$
  • Definition 2: Cell-center projection
  • Definition 3: Adapted empirical distribution
  • Definition 4: Adapted COT value estimator
  • ...and 56 more
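Example 1 concerns the variance of treatment effects. As a covariate-free baseline, which covariate-assisted bounds are meant to tighten, the identity $\operatorname{Var}(Y(1)-Y(0)) = \mathbb{E}[(Y(1)-Y(0))^2] - (\mathbb{E}[Y(1)] - \mathbb{E}[Y(0)])^2$ helps. The mean difference depends only on the marginals. For scalar outcomes, the squared-difference term is smallest under the sorted (comonotone) coupling and largest under the reverse-sorted (antitone) coupling.

The sketch below assumes equal numbers of treated and control draws and uses no covariates. It is not the paper's Example 1 as written, and the function name `variance_bounds_no_covariates` is hypothetical.

```python
import numpy as np


def variance_bounds_no_covariates(y1, y0):
    """Bounds on Var(Y(1) - Y(0)) from the two marginals alone (equal sample sizes).

    E[(Y(1) - Y(0))^2] is smallest under the comonotone coupling (both samples
    sorted ascending) and largest under the antitone coupling (one ascending,
    the other descending); the mean difference is fixed by the marginals.
    """
    y1, y0 = np.sort(np.asarray(y1, float)), np.sort(np.asarray(y0, float))
    if y1.size != y0.size:
        raise ValueError("this sketch assumes equal sample sizes")
    ate_sq = (y1.mean() - y0.mean()) ** 2
    lower = np.mean((y1 - y0) ** 2) - ate_sq
    upper = np.mean((y1 - y0[::-1]) ** 2) - ate_sq
    return float(lower), float(upper)


lo, hi = variance_bounds_no_covariates([1.0, 2.0, 3.0], [3.0, 1.0, 2.0])
print(lo, hi)  # 0.0 and 8/3 (about 2.667)
```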