A primer on optimal transport for causal inference with observational data

Florian F Gunsilius

TL;DR

This paper surveys the deep connections between optimal transport and causal inference with observational data, arguing that OT provides a foundational language for identifying and bounding causal effects under endogeneity. It develops the role of monotone rearrangements (and their multivariate Brenier-map generalizations) as a structural core for linking unobservables to outcomes, and shows how classic identification strategies (IV, DID, and synthetic controls) can be reframed within an OT framework. Key contributions include clarifying when the full causal mechanism is identifiable (under exogeneity or Brenier map structure), deriving tight distributional bounds via path-space OT, and extending methods to nonlinear and multivariate settings through comonotonicity and barycenters. The review also highlights practical tools, such as control variables and distributionally robust methods, that leverage OT to handle weak instruments, limited support, and distributional heterogeneity. Overall, the work provides a unifying perspective that connects causality, probability, and optimization, with implications for both theory and applied econometrics.

Abstract

The theory of optimal transportation has developed into a powerful and elegant framework for comparing probability distributions, with wide-ranging applications in all areas of science. The fundamental idea of analyzing probabilities by comparing their underlying state space naturally aligns with the core idea of causal inference, where understanding and quantifying counterfactual states is paramount. Despite this intuitive connection, explicit research at the intersection of optimal transport and causal inference is only beginning to develop. Yet, many foundational models in causal inference have implicitly relied on optimal transport principles for decades, without recognizing the underlying connection. Therefore, the goal of this review is to offer an introduction to the surprisingly deep existing connections between optimal transport and the identification of causal effects with observational data -- where optimal transport is not just a set of potential tools, but actually builds the foundation of model assumptions. As a result, this review is intended to unify the language and notation between different areas of statistics, mathematics, and econometrics, by pointing out these existing connections, and to explore novel problems and directions for future work in both areas derived from this realization.

Paper Structure

This paper contains 25 sections, 3 theorems, 65 equations, 5 figures.

Key Result

Proposition 4.1

Let $F_W$ be absolutely continuous and strictly increasing and let $h(z,w)$ be the monotone rearrangement between $F_W$ and $F_{X|Z=z}$ for all $z$. If $F_{X|Z=z}$ is continuous in $x$ for all $z$, then any univariate random variable $R$ independent of $Z$ for which there exists a measure-preserving

Figures (5)

  • Figure 1: The DAG corresponding to \ref{eq:struct_mod}, illustrating the backdoor path through the unobservable $U$.
  • Figure 2: Depiction of the monotone rearrangement $y = g(x,u_0) = F^{-1}_{Y|X=x}(F_U(u_0))$.
  • Figure 3: The DAG corresponding to \ref{eq:struct_IV}.
  • Figure 4: Fixed-point iteration in a univariate framework for identifying causal effects in \ref{eq:struct_IV}.
  • Figure 5: Illustration of various maps in the "nonlinear difference-in-differences" setup. An arrow indicates a pushforward map between two measures; for example $P_{Y_{C,1}}=\mathrm{d}_{\#} P_{Y_{C,0}}$. The maps $h_j$ are the "production functions" linking the unobservable measures $\nu$ and $\nu^*$ to the potential outcomes. A dashed arrow indicates a map from a measure to itself. $P_{Y_{T,1}^\dagger}$ is the counterfactual outcome measure of the treated units had they not received treatment. $\mathrm{d}$ is the natural trend map and $\mathrm{T}$ is the map from an observed outcome to its counterfactual. The observable data is drawn from the four boxed measures.
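
The monotone rearrangement in Figure 2, $g(x,u_0) = F^{-1}_{Y|X=x}(F_U(u_0))$, can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: it assumes univariate samples, replaces $F_U$ and $F^{-1}_{Y|X=x}$ with their empirical counterparts, and uses simulated data. The function name `monotone_rearrangement` and the sample distributions are hypothetical choices made only for illustration.

```python
import numpy as np


def monotone_rearrangement(source, target):
    """Return the empirical map u -> F_target^{-1}(F_source(u)).

    Hypothetical helper: both distribution functions are estimated
    from the supplied samples rather than known in closed form.
    """
    source_sorted = np.sort(np.asarray(source, dtype=float))
    target = np.asarray(target, dtype=float)

    def transport(u):
        # Empirical CDF of the source evaluated at u (values in [0, 1]).
        ranks = np.searchsorted(source_sorted, u, side="right") / source_sorted.size
        # Empirical quantile function of the target evaluated at those ranks.
        return np.quantile(target, ranks)

    return transport


# Simulated stand-ins (assumptions): U is uniform, Y | X = x is Gaussian.
rng = np.random.default_rng(0)
u_sample = rng.uniform(size=5000)
y_sample = rng.normal(loc=2.0, scale=1.5, size=5000)

g_x = monotone_rearrangement(u_sample, y_sample)
mapped = g_x(u_sample)  # pushes the sample of U onto the sample of Y | X = x
```

Because both the empirical CDF and the empirical quantile function are nondecreasing, the composed map is nondecreasing in $u$, which is the defining property of the rearrangement; the mapped sample approximately reproduces the distribution of the target sample.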

Theorems & Definitions (9)

  • Definition 3.1: matzkin2003nonparametric
  • Definition 4.1
  • Definition 4.2
  • Proposition 4.1: Generalized control variables
  • proof
  • Claim 4.1
  • Definition 5.1: Cyclic comonotonicity, torous2024optimal
  • Theorem 5.1: Multivariate extension of the changes-in-changes estimator, torous2024optimal
  • Corollary 5.1: torous2024optimal