Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference
Kaiwen Hou
TL;DR
This work addresses finite-sample causal inference under distribution shift by combining continuous normalizing flows (CNFs) with Wasserstein gradient flows, yielding geometry-aware normalizing Wasserstein flows. The approach refines the parametric submodels used in TMLE by guiding perturbations along optimal-transport paths from a prior $p_0$ to a data-driven $p_1$, aiming to minimize the Cramér-Rao bound through $\mathcal{W}_2$-gradient dynamics. It introduces velocity-field alignment to avoid PDE gradient computations, proposes variance-regularized and velocity-aligned objectives, and offers optimal transport (OT) and diffusion interpretations to justify the framework. Preliminary experiments on toy distributions (e.g., 8gaussians, Pinwheel) show lower mean-squared error and improved efficiency relative to naïve flows, suggesting potential for more robust finite-sample causal estimation.
Abstract
This paper presents an approach to causal inference that integrates continuous normalizing flows (CNFs) with parametric submodels, enhancing their geometric sensitivity and improving upon traditional Targeted Maximum Likelihood Estimation (TMLE). Our method employs CNFs to refine TMLE, minimizing the Cramér-Rao bound while moving from a predefined distribution $p_0$ to a data-driven distribution $p_1$. We further embed Wasserstein gradient flows within Fokker-Planck equations, imposing geometric structure grounded in optimal transport theory that improves the robustness of CNFs. Our approach addresses the disparity between sample and population distributions, a critical source of bias in parameter estimation. We leverage optimal transport and Wasserstein gradient flows to develop causal inference methods with minimal variance in finite-sample settings, where they are designed to outperform traditional estimators such as TMLE and augmented inverse probability weighting (AIPW). The framework, centered on Wasserstein gradient flows, minimizes the variance of efficient influence functions under the distribution $p_t$. Preliminary experiments yield lower mean-squared errors than standard flows, demonstrating the potential of geometry-aware normalizing Wasserstein flows for statistical modeling and inference.
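As a minimal sketch of the standard machinery the abstract refers to, the relations below link a CNF velocity field to a $\mathcal{W}_2$-gradient flow. The symbols $v_t$, $\mathcal{F}$, and $V$ are illustrative assumptions that do not appear in the abstract, and the specific functional the paper minimizes is not stated here. A CNF transports samples along a velocity field, the density then obeys the continuity equation, and choosing the velocity as the negative gradient of the first variation of a functional $\mathcal{F}$ gives its Wasserstein gradient flow:

$$
\begin{aligned}
\frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_t(x_t), \qquad x_0 \sim p_0, \\
\partial_t p_t + \nabla \cdot (p_t v_t) &= 0, \\
v_t = -\nabla \frac{\delta \mathcal{F}}{\delta p}(p_t)
\;&\Longrightarrow\;
\partial_t p_t = \nabla \cdot \Big( p_t \, \nabla \frac{\delta \mathcal{F}}{\delta p}(p_t) \Big).
\end{aligned}
$$

For example, with the assumed choice $\mathcal{F}[p] = \int V p \,\mathrm{d}x + \int p \log p \,\mathrm{d}x$, the first variation is $V + \log p + 1$, so $p_t \nabla \frac{\delta \mathcal{F}}{\delta p} = p_t \nabla V + \nabla p_t$ and the flow reduces to the Fokker-Planck equation

$$
\partial_t p_t = \nabla \cdot (p_t \nabla V) + \Delta p_t .
$$

One plausible reading of velocity-field alignment is that a parametric velocity field is fitted to the target velocity $-\nabla \frac{\delta \mathcal{F}}{\delta p}(p_t)$ rather than solving the PDE directly; the abstract does not spell out the objective, so this should be treated as an interpretation rather than the paper's definition.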
