Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference

Zheyu Oliver Wang, Ricardo Baptista, Youssef Marzouk, Lars Ruthotto, Deepanshu Verma

TL;DR

This work tackles conditional sampling and density estimation in Bayesian inference when the likelihood is intractable, by developing two neural network-based conditional optimal transport (COT) approaches. PCP-Map provides a static, Brenier-map-inspired transport via the gradient of a partially input-convex neural network, trained with maximum likelihood, while COT-Flow offers a dynamic transport via a neural ODE with OT-regularized velocity fields. The methods are demonstrated on six UCI datasets, stochastic Lotka–Volterra inference, and high-dimensional shallow water equations, showing competitive accuracy and favorable computational trade-offs compared to state-of-the-art baselines, with PCP-Map delivering faster training and COT-Flow enabling rapid sampling after training. The work contributes reproducible, scalable techniques for amortized posterior sampling in likelihood-free settings and lays groundwork for statistical and computational analyses of learned COT maps.

Abstract

We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference, particularly in the simulation-based ("likelihood-free") setting. Our methods represent the target conditional distribution as a transformation of a tractable reference distribution. Obtaining such a transformation, chosen here to be an approximation of the COT map, is computationally challenging even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize candidate maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.
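To make the static idea concrete, the following is a minimal sketch, not the authors' PCP-Map implementation. It assumes a single-layer potential $G(x, y)$ that is convex in $x$ (a softplus layer with nonnegative output weights plus a small quadratic term), a standard normal reference, and a map direction from target to reference. All names (`init_params`, `transport_map`, `conditional_log_density`, `gamma`) are hypothetical. The map is the gradient of the potential in $x$, and the conditional log-density follows from the change-of-variables formula, which is the quantity a maximum likelihood objective would average over training pairs.

```python
import math
import random

def softplus(z):
    # Stable log(1 + exp(z)); convex and nondecreasing in z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Derivative of softplus, written to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_params(dx, dy, width, seed=0):
    # Hypothetical parameter layout for a one-layer potential (an assumption).
    rng = random.Random(seed)
    rand = lambda: rng.gauss(0.0, 1.0)
    return {
        "W": [[rand() for _ in range(dx)] for _ in range(width)],  # acts on x
        "U": [[rand() for _ in range(dy)] for _ in range(width)],  # acts on y
        "b": [rand() for _ in range(width)],
        "a": [abs(rand()) for _ in range(width)],  # nonnegative: convex in x
        "gamma": 0.1,  # quadratic term: strongly convex, so the map is invertible
    }

def pre_activations(p, x, y):
    return [sum(w * xi for w, xi in zip(Wk, x))
            + sum(u * yi for u, yi in zip(Uk, y)) + bk
            for Wk, Uk, bk in zip(p["W"], p["U"], p["b"])]

def potential(p, x, y):
    # G(x, y) = gamma/2 |x|^2 + sum_k a_k softplus(W_k x + U_k y + b_k)
    z = pre_activations(p, x, y)
    quad = 0.5 * p["gamma"] * sum(xi * xi for xi in x)
    return quad + sum(a * softplus(zk) for a, zk in zip(p["a"], z))

def transport_map(p, x, y):
    # g(x, y) = grad_x G(x, y); monotone in x for every fixed y.
    s = [a * sigmoid(zk) for a, zk in zip(p["a"], pre_activations(p, x, y))]
    return [p["gamma"] * x[i] + sum(sk * Wk[i] for sk, Wk in zip(s, p["W"]))
            for i in range(len(x))]

def jacobian(p, x, y):
    # Hessian of G in x: gamma*I + W^T diag(c) W, symmetric positive definite.
    z = pre_activations(p, x, y)
    c = [a * sigmoid(zk) * (1.0 - sigmoid(zk)) for a, zk in zip(p["a"], z)]
    d = len(x)
    return [[(p["gamma"] if i == j else 0.0)
             + sum(ck * Wk[i] * Wk[j] for ck, Wk in zip(c, p["W"]))
             for j in range(d)] for i in range(d)]

def logdet_spd(A):
    # Log-determinant of a symmetric positive definite matrix via Cholesky.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def conditional_log_density(p, x, y):
    # log p(x | y) = log N(g(x, y); 0, I) + log det of the Jacobian of g in x.
    g = transport_map(p, x, y)
    log_ref = -0.5 * sum(gi * gi for gi in g) - 0.5 * len(x) * math.log(2.0 * math.pi)
    return log_ref + logdet_spd(jacobian(p, x, y))
```

In this sketch, training would minimize the negative mean of `conditional_log_density` over samples of $(x, y)$. Sampling $x$ given $y$ requires inverting `transport_map` in $x$, which is a convex problem here because the potential is strongly convex in $x$.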

Paper Structure

This paper contains 22 sections, 37 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: Schematic overview of related measure transport approaches, all of which produce diffeomorphic transport maps. Approaches are first separated into static versus dynamic, then grouped by their approximation target assumptions. The red dots represent specific, canonical transport maps: ($L^2$) optimal transport (OT) and the Knothe--Rosenblatt (KR) transport. The surrounding ellipses capture methods that seek to approximate these canonical maps.
  • Figure 1: Stochastic Lotka--Volterra experiment, \ref{sec:stochLV}. Comparison of estimated posterior distributions and MAP points between the proposed approaches and SMC-ABC (${\bf x}^* = (0.01, 0.5, 1, 0.01)^\top$). Red dots/bars represent ${\bf x}^\ast$; black crosses/bars represent MAP points. Column 1: Proposed approaches (50k training samples). Column 2: Proposed approaches (500k training samples). Column 3: SMC-ABC (around 17.9 million simulations).
  • Figure 2: Stochastic Lotka--Volterra experiment, \ref{sec:stochLV}. Comparison of estimated posterior distributions and MAP points between the proposed approaches and SMC-ABC (${\bf x}^* = (0.02, 0.02, 0.02, 0.02)^\top$). Red dots/bars represent ${\bf x}^\ast$; black crosses/bars represent MAP points. Column 1: Proposed approaches (50k training samples). Column 2: Proposed approaches (500k training samples). Column 3: SMC-ABC (around 5.8 million simulations).
  • Figure 3: Stochastic Lotka--Volterra experiment, \ref{sec:stochLV}. Relative normed errors between COT-Flow posterior samples generated with a smaller number of time steps $n_t$ and samples generated with $n_t=32$. Here ${\bf x}^*_1 = (0.01, 0.5, 1, 0.01)^\top$ and ${\bf x}^*_2 = (0.02, 0.02, 0.02, 0.02)^\top$.
  • Figure 4: Shallow water experiment, \ref{sec:shallow}. Comparison of MAP estimation accuracy, posterior sampling quality, and posterior predictive accuracy among PCP-Map, COT-Flow, and NPE. Column 1: Ground truth parameter ${\bf x}^\ast$ (black), prior parameter samples (gray, row 1), generated posterior samples (gray, rows 2--4), and MAP points (purple cross, rows 2--4). Column 2: 2D wave simulation images over 100 time and space grids. Horizontal colored lines mark the 3 time slices shown in Columns 3--5. Ground truth simulation (row 1) and simulations from generated posterior samples (rows 2--4). Columns 3--5: Wave amplitudes across the 3 time slices. Ground truth simulations (black), simulations from prior samples (row 1) and generated posterior samples (rows 2--4).
  • ...and 2 more figures