Statistical Inference via Generative Models: Flow Matching and Causal Inference

Shinto Eguchi

TL;DR

A statistical framework is developed in which generative models are used to estimate nuisance components while inferential validity is maintained through orthogonalization and cross-fitting in the spirit of double/debiased machine learning.

Abstract

Generative AI has achieved remarkable empirical success, but from the perspective of statistics it often remains opaque: its predictions may be accurate, yet the underlying mechanism is difficult to interpret, analyze, and trust. This book reinterprets generative AI in the language of statistics, using flow matching as a central example. The key idea is that generative models should be understood not merely as devices for producing plausible data, but as methods for the nonparametric learning of high-dimensional probability distributions. From this viewpoint, missing-data imputation becomes principled sampling from learned conditional distributions, counterfactual analysis becomes the estimation of intervention distributions, and distributional dynamics become statistically analyzable objects. Mathematically, flow matching represents distributional deformation through the continuity equation and a time-dependent velocity field, thereby extending score matching from the learning of static score fields to the learning of transport paths themselves. Building on this foundation, the book develops a statistical framework in which generative models are used to estimate nuisance components while inferential validity is maintained through orthogonalization and cross-fitting in the spirit of double/debiased machine learning. Applications to survival analysis, censoring, missingness, and causal inference show how generative models can be integrated into statistical inference for structured high-dimensional problems.
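The continuity-equation description in the abstract can be written out explicitly. The display below is the standard formulation from the flow matching literature, not a quotation from the book; the symbols $p_t$, $v_t$, $v_\theta$ and the conditional target velocity $u_t(x \mid x_1)$ are notation assumed here for illustration.

```latex
% Continuity equation: the density path p_t is transported by the velocity field v_t
\partial_t p_t(x) + \nabla \cdot \bigl( p_t(x)\, v_t(x) \bigr) = 0, \qquad t \in [0,1]

% Equivalent particle view: samples follow the ODE driven by the same field
\frac{d}{dt} X_t = v_t(X_t), \qquad X_0 \sim p_0 \;\Longrightarrow\; X_t \sim p_t

% Conditional flow matching: regress a model v_\theta onto a tractable conditional target
\mathcal{L}(\theta) = \mathbb{E}_{t,\; x_1,\; x \sim p_t(\cdot \mid x_1)}
  \bigl\| v_\theta(t, x) - u_t(x \mid x_1) \bigr\|^2
```

For comparison, score matching targets the static field $\nabla_x \log p(x)$ of a single distribution, whereas the objective above learns a field indexed by $t$ that moves $p_0$ to $p_1$.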

Paper Structure

This paper contains 287 sections, 4 theorems, 345 equations, 10 figures, 3 tables, and 4 algorithms.

Key Result

Theorem 4.8.1

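Assume the Neyman orthogonality condition (eq:neyman_orth) holds and the nuisance estimator satisfies a rate condition of the form $\|\widehat{\eta} - \eta_0\| = o_P(n^{-1/4})$. Then any solution $\widehat{\theta}$ to the score equation $\Psi_n(\widehat{\theta}, \widehat{\eta}) = 0$ satisfies $\sqrt{n}\,(\widehat{\theta} - \theta_0) \xrightarrow{d} N(0,\, J^{-1} V J^{-\top})$, where $J = \mathbb{E}[-\nabla_\theta \Psi(O;\theta_0,\eta_0)]$ and $V = \mathbb{E}[\Psi(O;\theta_0,\eta_0)\Psi(O;\theta_0,\eta_0)^\top]$.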
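To make the roles of $J$ and $V$ concrete, here is a minimal sketch of a cross-fitted estimator in a partially linear model, $Y = \theta D + g(X) + \varepsilon$. The model, the polynomial nuisance fits, and all function names (`fit_predict`, `dml_plm`) are assumptions chosen for illustration and are not taken from the book, where generative models play the role of the nuisance learner. The score used is the residual-on-residual score $\Psi = (\tilde{Y} - \theta \tilde{D})\,\tilde{D}$, and the standard error uses the usual sandwich form $J^{-1} V J^{-\top} / n$.

```python
import numpy as np

def poly_features(x, degree=3):
    # Design matrix with columns 1, x, ..., x^degree (a stand-in nuisance learner).
    return np.vander(x, degree + 1, increasing=True)

def fit_predict(x_train, y_train, x_test, degree=3):
    # Least-squares polynomial fit on the training fold, evaluated on the held-out fold.
    beta, *_ = np.linalg.lstsq(poly_features(x_train, degree), y_train, rcond=None)
    return poly_features(x_test, degree) @ beta

def dml_plm(y, d, x, n_folds=5, seed=0):
    """Cross-fitted estimate of theta in Y = theta*D + g(X) + eps (illustrative)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisances E[Y|X] and E[D|X] are fitted without the held-out fold.
        y_res[test_idx] = y[test_idx] - fit_predict(x[train], y[train], x[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict(x[train], d[train], x[test_idx])
    # Solve the empirical score equation sum((y_res - theta*d_res) * d_res) = 0.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta * d_res) * d_res
    J = np.mean(d_res ** 2)   # estimate of E[-d Psi / d theta]
    V = np.mean(psi ** 2)     # estimate of E[Psi^2]
    se = np.sqrt(V / J ** 2 / n)
    return theta, se

# Simulated data with true theta = 1.5 (all values here are made up for the demo).
rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-2, 2, n)
d = np.sin(x) + rng.normal(size=n)
y = 1.5 * d + x ** 2 + rng.normal(size=n)
theta_hat, se_hat = dml_plm(y, d, x)
print(f"theta_hat = {theta_hat:.3f}, se = {se_hat:.4f}")
```

Because each observation's residuals come from nuisance fits that never saw that observation, the first-order effect of nuisance error on the score is removed by orthogonality, which is what allows the plain sandwich standard error to be used.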

Figures (10)

  • Figure 1: Quartic potential model $p_\theta(x)$ with $\theta=(0,2,-0.5)$.
  • Figure 2: Vector field induced by a James–Stein-type shrinkage term.
  • Figure 3: Pairings induced by different couplings (illustration).
  • Figure 4: Stabilization and robustness induced by Lipschitz constraints on the velocity field. The color intensity visualizes the sensitivity ratio ${\|\Delta X_1\|}/{\|\Delta X_0\|}$ of the final state to a small perturbation in the initial state; red indicates instability. (Left) standard training. (Right) training with a Lipschitz constraint via spectral normalization.
  • Figure 5: Neyman orthogonality: parameter score vs nuisance score directions.
  • ...and 5 more figures
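The sensitivity ratio $\|\Delta X_1\| / \|\Delta X_0\|$ described in the caption of Figure 4 can be computed directly for a toy flow. The sketch below is not the book's experiment: it assumes a time-independent linear velocity field $v(x) = Ax$, uses explicit Euler integration on $[0,1]$, and mimics spectral normalization by dividing $A$ by its largest singular value. The function names `euler_flow` and `sensitivity_ratio` are hypothetical.

```python
import numpy as np

def euler_flow(x0, A, n_steps=100):
    # Integrate dx/dt = A x from t = 0 to t = 1 with explicit Euler steps.
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for _ in range(n_steps):
        x = x + h * (A @ x)
    return x

def sensitivity_ratio(x0, delta, A, n_steps=100):
    # ||Delta X_1|| / ||Delta X_0|| for a small perturbation delta of the initial state.
    x1 = euler_flow(x0, A, n_steps)
    x1_pert = euler_flow(x0 + delta, A, n_steps)
    return np.linalg.norm(x1_pert - x1) / np.linalg.norm(delta)

# Unconstrained field: Lipschitz constant equals the spectral norm of A (here 3).
A_raw = np.array([[3.0, 0.0], [0.0, -1.0]])
# Spectrally normalized field: spectral norm rescaled to 1.
A_sn = A_raw / np.linalg.norm(A_raw, 2)

x0 = np.array([1.0, 1.0])
delta = np.array([1e-3, 0.0])   # perturb along the most expanding direction

ratio_raw = sensitivity_ratio(x0, delta, A_raw)
ratio_sn = sensitivity_ratio(x0, delta, A_sn)
print(f"raw: {ratio_raw:.2f}, normalized: {ratio_sn:.2f}")
```

With Lipschitz constant $L$, Grönwall's inequality bounds the ratio by $e^{L}$ over unit time, so normalizing to $L = 1$ caps it at $e \approx 2.72$ in this toy setting, while the unnormalized field amplifies the perturbation far more.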

Theorems & Definitions (10)

  • Example 2.3.1: Quartic potential model
  • Example 2.3.2: Gaussian graphical model (GGM)
  • Theorem 4.8.1
  • proof
  • Proposition 4.8.1: Local robustness
  • proof
  • Definition 5.4.1
  • Proposition 5.4.1: Neyman orthogonality (centered score)
  • proof
  • Corollary 5.4.1: Local efficiency (reduction to Cox score)