HJ-sampler: A Bayesian sampler for inverse problems of a stochastic process by leveraging Hamilton-Jacobi PDEs and score-based generative models

Tingwei Meng, Zongren Zou, Jérôme Darbon, George Em Karniadakis

TL;DR

This paper builds on the log transform, known as the Cole-Hopf transform in Brownian motion contexts, and extends it to a more abstract framework that includes a linear operator. It finds that the well-known relationship between the Cole-Hopf transform and optimal transport is the particular instance in which the linear operator is the infinitesimal generator of a stochastic process.

Abstract

The interplay between stochastic processes and optimal control has been extensively explored in the literature. With the recent surge in the use of diffusion models, stochastic processes have increasingly been applied to sample generation. This paper builds on the log transform, known as the Cole-Hopf transform in Brownian motion contexts, and extends it within a more abstract framework that includes a linear operator. Within this framework, we find that the well-known relationship between the Cole-Hopf transform and optimal transport is a particular instance where the linear operator acts as the infinitesimal generator of a stochastic process. We also introduce a novel scenario where the linear operator is the adjoint of the generator, linking to Bayesian inference under specific initial and terminal conditions. Leveraging this theoretical foundation, we develop a new algorithm, named the HJ-sampler, for Bayesian inference in the inverse problem of a stochastic differential equation with given terminal observations. The HJ-sampler involves two stages: (1) solving the viscous Hamilton-Jacobi partial differential equations, and (2) sampling from the associated stochastic optimal control problem. The algorithm allows flexibility in selecting the numerical solver for the viscous HJ PDEs. We introduce two variants of the solver: the Riccati-HJ-sampler, based on the Riccati method, and the SGM-HJ-sampler, which utilizes diffusion models. We demonstrate the effectiveness and flexibility of the proposed methods by applying them to Bayesian inverse problems involving various stochastic processes and prior distributions, including applications that address model misspecification and quantify model uncertainty.
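
The two-stage structure described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a scaled 1D Brownian motion $dY_t = \sqrt{\epsilon}\, dW_t$ with a single Gaussian prior on $Y_0$, so that stage 1 (the PDE solve) is replaced by a closed-form score of the marginal density, and stage 2 is a plain Euler-Maruyama simulation of the standard reverse-time SDE started at the observation. All function names and parameter values are hypothetical.

```python
import math
import random


def score_gaussian(x, t, m0, s0, eps):
    """Stage 1 stand-in (assumption): analytic score d/dx log mu_t(x),
    where mu_t = N(m0, s0^2 + eps * t) is the marginal of Y_t."""
    return -(x - m0) / (s0 * s0 + eps * t)


def sample_posterior(y_obs, T, m0, s0, eps, n_paths=2000, n_steps=200, seed=0):
    """Stage 2: simulate dZ = eps * score(Z, T - tau) dtau + sqrt(eps) dW
    from Z_0 = y_obs; Z_T approximates a draw from Y_0 | Y_T = y_obs."""
    rng = random.Random(seed)
    dt = T / n_steps
    samples = []
    for _ in range(n_paths):
        z = y_obs
        for k in range(n_steps):
            t = T - k * dt  # forward time matching reversed time tau = k * dt
            drift = eps * score_gaussian(z, t, m0, s0, eps)
            z += drift * dt + math.sqrt(eps * dt) * rng.gauss(0.0, 1.0)
        samples.append(z)
    return samples


def exact_posterior(y_obs, T, m0, s0, eps):
    """Closed-form Gaussian posterior (mean, variance) of Y_0 | Y_T = y_obs."""
    prior_var, noise_var = s0 * s0, eps * T
    total = prior_var + noise_var
    mean = (m0 * noise_var + y_obs * prior_var) / total
    var = prior_var * noise_var / total
    return mean, var


if __name__ == "__main__":
    zs = sample_posterior(y_obs=1.0, T=1.0, m0=0.0, s0=1.0, eps=1.0)
    mean = sum(zs) / len(zs)
    var = sum((z - mean) ** 2 for z in zs) / len(zs)
    print("sampled mean/var:", mean, var)
    print("exact   mean/var:", exact_posterior(1.0, 1.0, 0.0, 1.0, 1.0))
```

In this Gaussian case the sampled mean and variance can be compared directly against the conjugate formulas. For more general priors the score is no longer available in closed form, which is where a numerical solver such as the Riccati or SGM variants named in the abstract would take over.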

Paper Structure (35 sections, 84 equations, 12 figures, 5 tables)


Figures (12)

  • Figure 1: Roadmap of this paper. The black sections represent well-known concepts in the literature, while the red sections indicate our contributions.
  • Figure 2: This figure illustrates the log transform \ref{eqt:transform_mu_to_rho} applied when $\mathcal{A}_{\epsilon,t}$ is the infinitesimal generator of a stochastic process $X_t$. The left panel depicts a general process, and the right panel shows the specific instance of a scaled Brownian motion (see Example \ref{eg:log_transform_BM}). The time orientation selected here is consistent with that used in stochastic optimal control or stochastic optimal transport problems, aligning with the reversal of the viscous HJ PDE (see Remark \ref{rem:time_direction} for details).
  • Figure 3: Depiction of the log transform \ref{eqt:transform_mu_to_rho} when the linear operator $\mathcal{A}_{\epsilon,T-t}$ acts as the adjoint of the infinitesimal generator of the stochastic process $Y_t$, illustrating its application in Bayesian inference. With specific initial and terminal conditions, the function $\mu$ evolves from the prior distribution to the data distribution, while $\rho$ evolves from a Dirac delta centered at the observation $y_{obs}$ of $Y_T$ to the corresponding posterior distribution. The first row shows the evolution of $\mu$ from right to left, while the second and third rows display the evolutions of $\nu$ and $\rho$ from left to right. The panels show the graphs of the respective density functions, and the relationships among the three rows represent the first part of the log transform \ref{eqt:transform_mu_to_rho}.
  • Figure 4: The figure illustrates the SGM-HJ-sampler algorithm, which consists of two steps. The first step, shown in the top panel, is the training phase, where training data is generated by sampling $Y_t$ from $\mu_t$, and a neural network is trained to approximate the scaled control or score function. The heatmap in the middle represents the evolution of the density function $\mu_t$ from right to left, with time on the horizontal axis and space on the vertical axis. The black curves display the sample paths that make up the training data. The second step, in the bottom panel, is the inference phase, where posterior samples of $Y_t \mid Y_T = y_{obs}$ (with density $\rho_\tau$) are generated by sampling the controlled paths $Z_\tau$. The heatmap shows the evolution of $\rho_\tau$ from left to right, with the black curves representing the generated sample paths, and the white curve depicting the sample mean of the posterior distribution. The graphs of the initial and terminal densities are displayed on the sides of each panel.
  • Figure 5: Histograms depicting the distribution of posterior samples for the scaled 1D Brownian motion case with a Gaussian mixture prior, across different observation times $s$ and data values $y_{obs}$. For all cases, the posterior samples, obtained from the SGM-HJ-sampler, utilize the same pretrained neural network, trained on $t \in [0, T]$ with $T = 1$. The black dashed lines represent the exact posterior density functions (Gaussian mixture). Each histogram is generated from $1 \times 10^6$ samples.
  • ...and 7 more figures
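
The exact Gaussian-mixture posterior used as the reference curve in Figure 5 can be reproduced from standard conjugate-Gaussian algebra. The sketch below assumes the scaled Brownian motion has the form $Y_s = Y_0 + \sqrt{\epsilon}\, W_s$ and that the inferred quantity is the initial state $Y_0$ given $Y_s = y_{obs}$. Both are assumptions made for illustration, and the function names and mixture parameters are hypothetical rather than taken from the paper.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_posterior(y_obs, s, eps, weights, means, stds):
    """Posterior of Y_0 | Y_s = y_obs for a Gaussian-mixture prior on Y_0.

    Returns (weights, means, variances) of the posterior mixture.
    """
    noise_var = eps * s  # variance of sqrt(eps) * W_s
    post_w, post_m, post_v = [], [], []
    for w, m, sd in zip(weights, means, stds):
        prior_var = sd * sd
        total = prior_var + noise_var
        # Each component is reweighted by its marginal likelihood of y_obs.
        post_w.append(w * normal_pdf(y_obs, m, total))
        post_m.append((m * noise_var + y_obs * prior_var) / total)
        post_v.append(prior_var * noise_var / total)
    norm = sum(post_w)
    return [w / norm for w in post_w], post_m, post_v


def mixture_pdf(x, weights, means, variances):
    """Density of a Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


if __name__ == "__main__":
    w, m, v = mixture_posterior(
        y_obs=0.3, s=0.5, eps=1.0,
        weights=[0.5, 0.5], means=[-1.0, 1.0], stds=[0.5, 0.5],
    )
    print("posterior weights:", w)
    print("posterior means:", m)
    print("posterior variances:", v)
```

A histogram of posterior samples can then be overlaid on `mixture_pdf` evaluated on a grid, which is the kind of comparison the caption describes.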

Theorems & Definitions (9)

  • Remark 2.1: Initial or terminal conditions
  • Example 2.1: Brownian motion
  • Remark 2.2
  • Remark 2.3: Partial observation
  • Remark 3.1: Flexibility of the observation time $T$
  • Remark 3.2: Difference between SGM-HJ-sampler and SGM
  • Remark 3.3: Theoretical unification of SGM-HJ-sampler and SGM
  • Remark C.1
  • Remark C.2
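
For the Brownian motion case named in Example 2.1, the log transform can be checked by hand. The derivation below is a textbook-style sketch under an assumed sign convention ($S = \epsilon \log \mu$ with $\mu > 0$ solving a forward heat equation); the paper's own convention and time orientation may differ.

```latex
\begin{align*}
  &\text{Assume } \partial_t \mu = \tfrac{\epsilon}{2}\,\Delta \mu,
    \qquad S = \epsilon \log \mu \quad (\mu = e^{S/\epsilon} > 0). \\
  &\partial_t \mu = \tfrac{1}{\epsilon}\,\mu\,\partial_t S, \qquad
    \nabla \mu = \tfrac{1}{\epsilon}\,\mu\,\nabla S, \qquad
    \Delta \mu = \mu\left(\tfrac{1}{\epsilon}\,\Delta S
      + \tfrac{1}{\epsilon^{2}}\,|\nabla S|^{2}\right). \\
  &\text{Substituting and dividing by } \mu/\epsilon: \qquad
    \partial_t S = \tfrac{1}{2}\,|\nabla S|^{2} + \tfrac{\epsilon}{2}\,\Delta S .
\end{align*}
```

The linear heat equation for $\mu$ thus corresponds to a viscous Hamilton-Jacobi equation for $S$ with a quadratic Hamiltonian, which is the Cole-Hopf relationship the abstract refers to.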