Stochastic gradient descent based variational inference for infinite-dimensional inverse problems

Jiaming Sui, Junxiong Jia, Jinglai Li

TL;DR

This paper introduces two variational inference approaches for infinite-dimensional inverse problems, both based on stochastic gradient descent with a constant learning rate (cSGD), and develops a preconditioned version of cSGD to further improve sampling efficiency.

Abstract

This paper introduces two variational inference approaches for infinite-dimensional inverse problems, developed through gradient descent with a constant learning rate. The proposed methods enable efficient approximate sampling from the target posterior distribution using a constant-rate stochastic gradient descent (cSGD) iteration. Specifically, we introduce a randomization strategy that incorporates stochastic gradient noise, allowing the cSGD iteration to be viewed as a discrete-time process. This transformation establishes key relationships between the covariance operators of the approximate and true posterior distributions, thereby validating cSGD as a variational inference method. We also investigate the regularization properties of the cSGD iteration and provide a theoretical analysis of the discretization error between the approximated posterior mean and the true background function. Building on this framework, we develop a preconditioned version of cSGD to further improve sampling efficiency. Finally, we apply the proposed methods to two practical inverse problems: one governed by a simple smooth equation and the other by the steady-state Darcy flow equation. Numerical results confirm our theoretical findings and compare the sampling performance of the two approaches for solving linear and non-linear inverse problems.
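
To illustrate the idea of treating a constant-rate SGD iteration as an approximate sampler, the following is a minimal sketch on a toy finite-dimensional linear Gaussian problem. It is not the paper's algorithm: the forward matrix `H`, the noise level `sigma`, the identity prior covariance, the function name `csgd_samples`, and in particular the gradient-noise scaling `noise_std = sqrt(2 / lr)` are all assumptions chosen for illustration. With that scaling, the iteration coincides with an unadjusted Langevin step, and in this linear case its stationary covariance is $A^{-1}(I - \eta A / 2)^{-1}$, which is close to the exact posterior covariance $A^{-1}$ when the learning rate $\eta$ is small.

```python
import numpy as np


def csgd_samples(grad, u0, lr, noise_std, n_steps, burn_in, rng):
    """Run constant-rate SGD with additive Gaussian gradient noise.

    Returns the iterates after `burn_in` steps, stacked as rows.
    (Hypothetical helper; the noise model is an assumption.)
    """
    u = np.array(u0, dtype=float)
    noise = noise_std * rng.standard_normal((n_steps, u.size))
    kept = np.empty((n_steps - burn_in, u.size))
    for k in range(n_steps):
        # Constant learning rate: lr is never decayed.
        u = u - lr * (grad(u) + noise[k])
        if k >= burn_in:
            kept[k - burn_in] = u
    return kept


# Toy linear inverse problem d = H u + noise with prior N(0, I) (all assumed).
rng = np.random.default_rng(0)
H = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
sigma = 0.5
u_true = np.array([1.0, -0.5, 0.25])
d = H @ u_true + sigma * rng.standard_normal(H.shape[0])

# Exact Gaussian posterior for reference: precision A, mean A^{-1} b.
A = H.T @ H / sigma**2 + np.eye(3)
b = H.T @ d / sigma**2
post_cov = np.linalg.inv(A)
post_mean = post_cov @ b

lr = 0.01
noise_std = np.sqrt(2.0 / lr)  # assumed scaling, see the note above
samples = csgd_samples(lambda u: A @ u - b, np.zeros(3), lr, noise_std,
                       n_steps=102_000, burn_in=2_000, rng=rng)

est_mean = samples.mean(axis=0)
est_cov = np.cov(samples, rowvar=False)
print("max |mean error|:", np.max(np.abs(est_mean - post_mean)))
print("max |cov error| :", np.max(np.abs(est_cov - post_cov)))
```

The sample mean and covariance of the iterates can then be compared against `post_mean` and `post_cov`; shrinking `lr` reduces the covariance bias at the cost of slower mixing.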

Paper Structure

This paper contains 16 sections, 6 theorems, 110 equations, 8 figures, 2 tables, 2 algorithms.

Key Result

Theorem 2.1

Let $\mathcal{H}_u$ be a separable Hilbert space and $N_d$ a positive integer. Suppose $\mu_0$ is a Gaussian measure on $\mathcal{H}_u$ and for some constants $M_1, M_2 > 0$, $\Phi : \mathcal{H}_u \rightarrow \mathbb{R}$ satisfies: [...] Under these conditions, the posterior measure $\mu \ll \mu_0$ on $\mathcal{H}_u$ is well-defined with Radon-Nikodym derivative: [...] where the normalization constant $Z_{\ldots}$ ...

Figures (8)

  • Figure 1: (a): Trace plot of the pCN method under the same mesh size of $100$; (b): Logarithm of the eigenvalues $\lbrace c_k \rbrace^{n}_{k=1}$ of the prior measure. The horizontal red dashed line marks the eigenvalue $c_M$ satisfying $c_M / c_1 < 10^{-3}$.
  • Figure 2: (a): Comparison of the estimated posterior mean function obtained by cSGD-iVI with the background truth of $u$. The green shaded area represents the $95 \%$ credibility region of the estimated posterior mean function; (b): Comparison of the estimated posterior mean functions of $u$ obtained by cSGD-iVI and by pCN sampling.
  • Figure 3: (a): Comparison of the estimated posterior mean function obtained by pcSGD-iVI with the background truth of $u$. The green shaded area represents the $95 \%$ credibility region of the estimated posterior mean function; (b): Comparison of the estimated posterior mean functions of $u$ obtained by pcSGD-iVI and by pCN.
  • Figure 4: (a): Relative errors of the estimated posterior means of $u$ in the $L^2$-norm for the cSGD-iVI method with mesh size $n = 100$; (b): Relative errors of the estimated posterior means of $u$ in the $L^2$-norm for the pcSGD-iVI method with mesh size $n = 100$.
  • Figure 5: Comparison of the estimated posterior covariance operators of $u$ with mesh size $n=100$. (a): The covariance operator obtained by the cSGD-iVI method; (b): The covariance operator obtained by the pcSGD-iVI method; (c): The covariance operator obtained by the pCN method.
  • ...and 3 more figures

Theorems & Definitions (11)

  • Theorem 2.1
  • Definition 2.1
  • Remark 2.1
  • Remark 2.2
  • Theorem 2.5
  • Remark 2.3
  • Lemma 2.1
  • Theorem 2.7
  • Theorem 2.8
  • Remark 2.4
  • ...and 1 more