Beyond Flatland: A Geometric Take on Matching Methods for Treatment Effect Estimation
Melanie F. Pradier, Javier González
TL;DR
GeoMatching addresses the core challenge of covariate confounding in observational causal inference by incorporating the geometry of the data manifold. It learns a latent representation together with a Riemannian metric and uses the resulting geodesic distances to guide nearest-neighbor matching along the manifold rather than in raw Euclidean space. On the synthetic Swissroll, semi-synthetic Mocap, IHDP, and Lalonde datasets, the approach yields more accurate treatment effect (TE) estimates, remains robust as input dimensionality increases and in the presence of outliers, and brings benefits in semi-supervised settings. This geometry-aware matching framework offers a principled way to reduce extrapolation bias and opens avenues for integrating differential geometry with causal discovery and broader causal inference methods. The work highlights practical improvements in TE estimation and lays a foundation for future optimization of geodesic computations and latent representations in causal tasks.
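To make "matching along the manifold rather than in raw Euclidean space" concrete, here is a minimal sketch that is not GeoMatching itself. Instead of a learned latent Riemannian metric, it swaps in a simpler, well-known proxy: Isomap-style geodesics, i.e. shortest paths on a k-nearest-neighbor graph built directly on the inputs. It then runs 1-NN matching with replacement to estimate the average treatment effect on the treated (ATT). The toy Swiss roll generator, the confounded assignment rule, the outcome model, k=10, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def swiss_roll(n, rng):
    """Hypothetical toy Swiss roll: intrinsic coordinate t along the roll, embedded in 3-D."""
    t = 1.5 * np.pi * (1 + 2 * rng.random(n))  # position along the roll
    h = 5.0 * rng.random(n)                    # height across the roll
    X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
    return X, t


def knn_geodesic(X, k=10):
    """Euclidean distances E and graph-geodesic distances G (shortest paths on a k-NN graph)."""
    n = len(X)
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(E, axis=1)[:, 1:k + 1]   # k nearest neighbours, skipping self
    rows = np.repeat(np.arange(n), k)
    G[rows, nbrs.ravel()] = E[rows, nbrs.ravel()]
    G = np.minimum(G, G.T)                      # make the graph undirected
    np.fill_diagonal(G, 0.0)
    for m in range(n):                          # Floyd-Warshall relaxation through node m
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return E, G


def att_by_matching(D, y, treat):
    """1-NN matching with replacement under distance D; returns (ATT, matched control indices)."""
    treated = np.flatnonzero(treat == 1)
    control = np.flatnonzero(treat == 0)
    D_tc = D[np.ix_(treated, control)]
    j = np.argmin(D_tc, axis=1)
    ok = np.isfinite(D_tc[np.arange(len(treated)), j])  # drop treated units with no reachable control
    matches = control[j[ok]]
    return np.mean(y[treated[ok]] - y[matches]), matches


rng = np.random.default_rng(0)
n, tau = 300, 2.0
X, t = swiss_roll(n, rng)
p = 1.0 / (1.0 + np.exp(-(t - t.mean())))     # confounded assignment: treatment depends on t
treat = (rng.random(n) < p).astype(int)
y = 0.5 * t + tau * treat + 0.1 * rng.standard_normal(n)  # outcome depends on t plus a constant effect

E, G = knn_geodesic(X, k=10)
att_euc, _ = att_by_matching(E, y, treat)
att_geo, _ = att_by_matching(G, y, treat)
print(f"true ATT = {tau:.2f}, Euclidean = {att_euc:.2f}, geodesic = {att_geo:.2f}")
```

The graph geodesic is never shorter than the straight-line distance, so a control unit sitting on an adjacent layer of the roll, which is close in 3-D but far along t, tends to be penalized. How much this helps depends on the sampling density and on k, since a k that is too large lets graph edges short-circuit between layers.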
Abstract
Matching is a popular approach in causal inference to estimate treatment effects by pairing treated and control units that are most similar in terms of their covariate information. However, classic matching methods completely ignore the geometry of the data manifold, which is crucial to define a meaningful distance for matching, and they struggle when covariates are noisy and high-dimensional. In this work, we propose GeoMatching, a matching method to estimate treatment effects that takes into account the intrinsic data geometry induced by existing causal mechanisms among the confounding variables. First, we learn a low-dimensional, latent Riemannian manifold that accounts for the uncertainty and geometry of the original input data. Second, we estimate treatment effects via matching in the latent space based on the learned latent Riemannian metric. We provide theoretical insights and empirical results in synthetic and real-world scenarios, demonstrating that GeoMatching yields more effective treatment effect estimators, even as we increase input dimensionality, in the presence of outliers, or in semi-supervised scenarios.
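The abstract does not say how the latent metric is built from "uncertainty and geometry". One construction used in the latent-geometry literature for generative models with a stochastic decoder, assumed here and not necessarily GeoMatching's exact choice, is the expected pull-back metric G(z) = J_mu(z)^T J_mu(z) + J_sigma(z)^T J_sigma(z). Here J_mu and J_sigma are the Jacobians of the decoder's mean and standard deviation. The sketch measures the Riemannian length of a discretized latent curve under this metric. The decoder functions are hypothetical stand-ins for a trained model, and Jacobians are taken by central finite differences.

```python
import numpy as np


def decoder_mean(z):
    # Hypothetical decoder mean: lifts a 2-D latent z onto a curved surface in 3-D.
    return np.array([z[0], z[1], np.sin(z[0])])


def decoder_std(z):
    # Hypothetical decoder std: uncertainty grows away from the origin,
    # standing in for latent regions that contain no training data.
    return (0.05 + 0.5 * np.dot(z, z)) * np.ones(3)


def jacobian(f, z, eps=1e-5):
    """Central finite-difference Jacobian of f: R^d -> R^D at z, shape (D, d)."""
    cols = []
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)


def pullback_metric(z):
    """Expected pull-back metric G(z) = J_mu^T J_mu + J_sigma^T J_sigma (assumed construction)."""
    Jm, Js = jacobian(decoder_mean, z), jacobian(decoder_std, z)
    return Jm.T @ Jm + Js.T @ Js


def curve_length(Z):
    """Riemannian length of a discretized latent curve Z (T x d): sum of sqrt(dz^T G dz)."""
    length = 0.0
    for a, b in zip(Z[:-1], Z[1:]):
        dz = b - a
        G = pullback_metric(0.5 * (a + b))  # metric evaluated at the segment midpoint
        length += np.sqrt(dz @ G @ dz)
    return length


z0, z1 = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
line = np.linspace(z0, z1, 50)  # straight line in latent space, shape (50, 2)
print("latent Euclidean length:", np.linalg.norm(z1 - z0))
print("Riemannian length:      ", curve_length(line))
```

A geodesic distance for matching would be the minimum of this length over all latent curves joining two units. Finding that minimizer, for example by optimizing the interior points of the curve, is omitted here. In this construction, the sigma term lengthens curves that cross regions where the decoder's uncertainty changes quickly, so that units separated by data-poor regions look far apart.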
