Derivative-informed neural operator acceleration of geometric MCMC for infinite-dimensional Bayesian inverse problems

Lianghao Cao, Thomas O'Leary-Roseberry, Omar Ghattas

TL;DR

The paper introduces derivative-informed neural operators (DINOs) trained on joint samples of the parameter-to-observable (PtO) map and its Jacobian to accelerate geometric MCMC for infinite-dimensional Bayesian inverse problems. By embedding a reduced-basis DINO surrogate within a delayed-acceptance, dimension-independent geometric MCMC (mMALA) framework, the method avoids online forward/adjoint sensitivity solves when constructing proposals while preserving posterior geometry and sampling correctness. The authors provide $H^1_{\mu}$-level error analysis, cost decompositions for PDE-based training, and two PDE benchmarks (coefficient inversion in a nonlinear diffusion–reaction PDE and heterogeneous hyperelastic material property inference). In these benchmarks, DINO-driven MCMC generates effective posterior samples 3–9 times faster than geometric MCMC and 60–97 times faster than prior geometry-based MCMC, and the training cost breaks even after 10–25 effective posterior samples. This approach substantially lowers the online cost of Bayesian inference on function spaces and offers a scalable, rigorous route to fast uncertainty quantification in PDE-constrained problems.

Abstract

We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (PtO) map is defined through expensive-to-solve parametric partial differential equations (PDEs). We consider a delayed-acceptance geometric MCMC method driven by a neural operator surrogate of the PtO map, where the proposal exploits fast surrogate predictions of the log-likelihood and, simultaneously, its gradient and Hessian. To achieve a substantial speedup, the surrogate must accurately approximate the PtO map and its Jacobian, which often demands a prohibitively large number of PtO map samples via conventional operator learning methods. In this work, we present an extension of derivative-informed operator learning [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] that uses joint samples of the PtO map and its Jacobian. This leads to derivative-informed neural operator (DINO) surrogates that accurately predict the observables and posterior local geometry at a significantly lower training cost than conventional methods. Cost and error analyses for reduced-basis DINO surrogates are provided. Numerical studies demonstrate that DINO-driven MCMC generates effective posterior samples 3–9 times faster than geometric MCMC and 60–97 times faster than prior geometry-based MCMC. Furthermore, the training cost of DINO surrogates breaks even compared to geometric MCMC after just 10–25 effective posterior samples.
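
The delayed-acceptance idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a one-dimensional parameter with a standard normal prior, a toy `forward` map standing in for the expensive PDE-based PtO map, a hand-written `surrogate` standing in for a trained neural operator, and a preconditioned Crank–Nicolson (pCN) proposal swapped in for the paper's surrogate-driven geometric proposal. It shows only the two-stage accept/reject logic: the surrogate screens each proposal, and the true map is evaluated only for proposals that pass the first stage.

```python
import math
import random


def forward(m):
    """Toy 'expensive' PtO map (hypothetical stand-in for a PDE solve)."""
    return m ** 3 + m


def surrogate(m):
    """Cheap, slightly biased approximation of `forward` (hypothetical)."""
    return 1.05 * m ** 3 + 0.95 * m


def misfit(model, m, y_obs, sigma):
    """Negative log-likelihood, up to a constant, for Gaussian noise."""
    return 0.5 * ((model(m) - y_obs) / sigma) ** 2


def delayed_acceptance_pcn(n_steps, y_obs, sigma=0.5, beta=0.3, seed=0):
    """Two-stage Metropolis-Hastings with a pCN proposal and N(0, 1) prior.

    Returns the chain and the number of evaluations of the true map.
    """
    rng = random.Random(seed)
    m = 0.0
    phi_true = misfit(forward, m, y_obs, sigma)
    phi_surr = misfit(surrogate, m, y_obs, sigma)
    true_evals = 1
    chain = []
    for _ in range(n_steps):
        # pCN proposal: reversible with respect to the N(0, 1) prior.
        m_new = math.sqrt(1.0 - beta ** 2) * m + beta * rng.gauss(0.0, 1.0)
        phi_surr_new = misfit(surrogate, m_new, y_obs, sigma)

        # Stage 1: accept/reject using the surrogate likelihood only.
        log_a1 = min(0.0, phi_surr - phi_surr_new)
        if math.log(1.0 - rng.random()) < log_a1:
            # Stage 2: evaluate the true map and correct the surrogate error,
            # so the chain still targets the true posterior.
            phi_true_new = misfit(forward, m_new, y_obs, sigma)
            true_evals += 1
            log_a2 = min(0.0, (phi_true - phi_true_new)
                         + (phi_surr_new - phi_surr))
            if math.log(1.0 - rng.random()) < log_a2:
                m, phi_true, phi_surr = m_new, phi_true_new, phi_surr_new
        chain.append(m)
    return chain, true_evals


if __name__ == "__main__":
    chain, true_evals = delayed_acceptance_pcn(5000, y_obs=forward(1.0))
    print("true-map evaluations:", true_evals, "of", len(chain), "steps")
    print("posterior mean estimate:", sum(chain[1000:]) / len(chain[1000:]))
```

In this sketch the saving comes from proposals rejected at stage 1, which never trigger a call to `forward`. A more accurate surrogate makes stage 2 accept more often, so fewer true-map evaluations are wasted.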

Paper Structure (68 sections, 7 theorems, 118 equations, 22 figures, 7 tables, 1 algorithm)

Key Result

Theorem 3

If ${\mathcal{S}} \in H^1_{\mu}({\mathscr{M}})\coloneqq H^1_{\mu}({\mathscr{M}};\mathbb{R})$, then the logarithmic Sobolev inequality holds for ${\mathcal{S}}$, where $D_{{\mathscr{H}}_{\mu}}{\mathcal{S}}$ denotes the ${\mathscr{H}}_{\mu}$-Riesz representation of the stochastic derivative of ${\mathcal{S}}$.
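
For reference, the logarithmic Sobolev inequality for a Gaussian measure $\mu$ is commonly stated in the form below. This is the standard textbook form written in the notation of the theorem, under the assumption that ${\mathscr{H}}_{\mu}$ is the Cameron–Martin space of $\mu$; the paper's own display may use a different but equivalent normalization.

```latex
\int_{\mathscr{M}} \mathcal{S}^2 \ln|\mathcal{S}| \, d\mu
  \;\le\;
  \int_{\mathscr{M}} \bigl\| D_{\mathscr{H}_{\mu}} \mathcal{S} \bigr\|_{\mathscr{H}_{\mu}}^2 \, d\mu
  \;+\; \| \mathcal{S} \|_{L^2_{\mu}}^2 \, \ln \| \mathcal{S} \|_{L^2_{\mu}}
```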

Figures (22)

  • Figure 1: (left) A schematic of the Metropolis–Hastings (MH) algorithm for sampling from the posterior distribution $\mu^{\bm{y}}$ as described in \ref{subsec:mh}. (right) A schematic of the MH algorithm with delayed acceptance enabled by a surrogate PtO map $\widetilde{\boldsymbol{\mathcal{G}}}(\cdot;\bm{w})$ parameterized by $\bm{w}$. See \ref{subsec:da} for a detailed description of the components of this algorithm.
  • Figure 2: A schematic of reduced basis DINO architecture and learning for surrogate approximation $\widetilde{\boldsymbol{\mathcal{G}}}\approx \boldsymbol{\mathcal{G}}$ in $H^1_{\mu}({\mathscr{M}};{\mathscr{Y}})$.
  • Figure 3: Visualizations of prior samples ($1681$ DoFs), PDE solutions ($3362$ DoFs), and predicted observables ($\mathbb{R}^{25}$) for coefficient inversion in a nonlinear diffusion--reaction PDE.
  • Figure 4: Visualization of the BIP setting and the MAP estimate for coefficient inversion in a nonlinear diffusion--reaction PDE.
  • Figure 5: The generalization error and accuracy (\ref{eq:generalization_error}) for predicting the observable vector and the reduced Jacobian matrix via $L^2_{\mu}$-trained neural operators and $H^1_{\mu}$-trained DINOs for coefficient inversion in a nonlinear diffusion--reaction PDE. The error is plotted as a function of training sample generation cost, measured relative to the average cost of one nonlinear PDE solve.
  • ...and 17 more figures
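
The distinction between $L^2_{\mu}$ training and $H^1_{\mu}$ (derivative-informed) training referenced in Figures 2 and 5 can be sketched as follows. This is a simplified illustration, not the paper's formulation: it assumes a toy two-dimensional map `true_map` with an analytic Jacobian, a standard normal sampling distribution, and plain Euclidean and Frobenius norms in place of the noise- and prior-weighted norms and reduced bases used for DINO. All function names are hypothetical.

```python
import math
import random


def true_map(m):
    """Toy stand-in for the PtO map G: R^2 -> R^2 (hypothetical)."""
    return [math.sin(m[0]) + m[1], m[0] * m[1]]


def true_jacobian(m):
    """Analytic Jacobian of the toy map, one row per observable."""
    return [[math.cos(m[0]), 1.0], [m[1], m[0]]]


def make_training_set(n, seed=0):
    """Draw m ~ N(0, I) and store joint samples (m, G(m), J(m))."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        m = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        data.append((m, true_map(m), true_jacobian(m)))
    return data


def l2_loss(surrogate, data):
    """Mean squared error in the predicted observables only."""
    total = 0.0
    for m, y, _ in data:
        pred = surrogate(m)
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
    return total / len(data)


def h1_loss(surrogate, surrogate_jac, data):
    """Observable error plus mean squared Frobenius error of the Jacobian."""
    jac_total = 0.0
    for m, _, jac in data:
        pred_jac = surrogate_jac(m)
        jac_total += sum(
            (p - t) ** 2
            for pred_row, true_row in zip(pred_jac, jac)
            for p, t in zip(pred_row, true_row)
        )
    return l2_loss(surrogate, data) + jac_total / len(data)


def linear_surrogate(m):
    """A crude linear surrogate (hypothetical) used to compare the losses."""
    return [m[0] + m[1], 0.0]


def linear_surrogate_jac(m):
    """Jacobian of the linear surrogate: constant in m."""
    return [[1.0, 1.0], [0.0, 0.0]]


if __name__ == "__main__":
    data = make_training_set(200)
    print("L2 loss:", l2_loss(linear_surrogate, data))
    print("H1 loss:", h1_loss(linear_surrogate, linear_surrogate_jac, data))
```

The derivative term penalizes a surrogate whose outputs are close on the training samples but whose sensitivities are wrong, which matters here because the MCMC proposal relies on surrogate gradients and Hessian approximations.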

Theorems & Definitions (14)

  • Remark 1
  • Definition 2: $\mu$-a.e. Gâteaux differentiability
  • Theorem 3: Logarithmic Sobolev inequality [Bogachev, 1998]
  • Theorem 4: Poincaré inequality [Bogachev, 1998]
  • Remark 5
  • Remark 6
  • Proposition 7: $L^2_\mu$ approximation error (DIS)
  • Proposition 8: $L^2_\mu$ approximation error (KLE)
  • Remark 9
  • Lemma 10
  • ...and 4 more