A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators

Neil K. Chada; Quanjun Lang; Fei Lu; Xiong Wang

TL;DR

A data-adaptive prior is introduced to achieve a stable posterior whose mean always has a small-noise limit. The prior's covariance is the inversion operator with a hyper-parameter selected adaptively from the data by the L-curve method.

Abstract

Kernels are efficient in representing nonlocal dependence and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small-noise limit. The data-adaptive prior's covariance is the inversion operator with a hyper-parameter selected adaptively from the data by the L-curve method. Furthermore, we provide a detailed analysis of the computational practice of the data-adaptive prior, and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of the four types of errors: discretization error, model error, partial observation, and wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small-noise limits.
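The divergence described in the abstract can be illustrated with a small finite-dimensional sketch. This is not the paper's code; it assumes a linear-Gaussian toy model in which the inversion operator is a singular matrix `A`, the data vector `b` carries a small perturbation in the null space of `A`, the basis matrix is the identity, and the hyper-parameter `lam` is fixed by hand instead of being chosen by the L-curve method. All names (`posterior_mean_fixed_prior`, `posterior_mean_adaptive_prior`, `A`, `b`, `Q0`, `lam`) are hypothetical.

```python
import numpy as np


def posterior_mean_fixed_prior(A, b, sigma, Q0):
    """Posterior mean under a fixed prior N(0, Q0): (A + sigma^2 Q0^{-1})^{-1} b."""
    return np.linalg.solve(A + sigma**2 * np.linalg.inv(Q0), b)


def posterior_mean_adaptive_prior(A, b, sigma, lam):
    """Posterior mean under the prior N(0, A / lam): (A^2 + sigma^2 lam I)^{-1} A b."""
    n = A.shape[0]
    return np.linalg.solve(A @ A + sigma**2 * lam * np.eye(n), A @ b)


# Toy inversion operator with one zero eigenvalue (assumption for illustration).
A = np.diag([1.0, 0.5, 0.0])
c_true = np.array([1.0, 1.0, 0.0])

# The data vector picks up a small perturbation in the null space of A,
# standing in for discretization or model error.
b = A @ c_true + np.array([0.0, 0.0, 1e-3])

Q0 = np.eye(3)  # fixed non-degenerate prior covariance (assumption)
lam = 1.0       # hand-picked here; the paper selects it by the L-curve method

sigmas = [1e-1, 1e-2, 1e-3, 1e-4]
fixed_errors = [
    np.linalg.norm(posterior_mean_fixed_prior(A, b, s, Q0) - c_true) for s in sigmas
]
adaptive_errors = [
    np.linalg.norm(posterior_mean_adaptive_prior(A, b, s, lam) - c_true) for s in sigmas
]

for s, ef, ea in zip(sigmas, fixed_errors, adaptive_errors):
    print(f"sigma={s:.0e}  fixed-prior error={ef:.3e}  adaptive-prior error={ea:.3e}")
```

In this toy setting the fixed prior amplifies the null-space perturbation by a factor of order `1 / sigma**2`, while multiplying by `A` in the adaptive formula removes that component before the inversion, so the error stays bounded as `sigma` shrinks.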

Paper Structure (36 sections, 13 theorems, 62 equations, 8 figures, 5 tables)

Introduction
Problem setup
Proposed: a data-adaptive RKHS prior
Related work
Prior selection for Bayesian inverse problems.
Regularization in a variational approach.
Operator learning.
Gaussian process and kernel-based regression.
Learning interacting kernels and nonlocal kernels.
The learning of kernels in operators
Examples
Variational approach
Function space of identifiability
A data-adaptive RKHS
Bayesian inversion and the risk in a non-degenerate prior
...and 21 more sections

Key Result

Theorem 2.7

Suppose the data in eq:data is generated from the system eq:map_R with a true kernel $\phi_{true}$. Suppose that Assumption assumption1 holds. Then, the following statements hold.

Figures (8)

Figure 1: The exploration measure and the eigenvalues of the basis matrix $B$, regression matrix $A_D$ and operator ${\mathcal{L}_{\overline{G}}}$ (computed via the generalized eigenvalue problem of $(A_D,B)$).
Figure 2: Interquartile range (IQR, the $75^{th}$, $50^{th}$ and $25^{th}$ percentiles) of the $L^2_\rho$ errors of the posterior means. They are computed in $200$ independent simulations with $\phi_{true}$ sampled from the fixed prior (hence outside the FSOI), in the presence of four types of errors. Top row: the regression matrix ${\overline{A}}$ is computed from continuous $\{u^k\}$; Bottom row: ${\overline{A}}$ is computed from discrete data. As $\sigma_\eta\to 0$, the fixed prior leads to diverging posterior means in 6 out of the 8 cases, while the data-adaptive (DA) prior leads to stable posterior means.
Figure 3: IQR of the $L^2_\rho$ errors of the posterior means in $200$ independent simulations with $\phi_{true}$ sampled inside the FSOI.
Figure 4: The posterior (its mean, the $75^{th}$ and $25^{th}$ percentiles) when $\phi_{true} \notin$ FSOI.
Figure 5: The posterior (its mean, the $75^{th}$ and $25^{th}$ percentiles) when $\phi_{true} \in$ FSOI.
...and 3 more figures

Theorems & Definitions (39)

Example 2.1: Kernels in Toeplitz matrices
Example 2.2: Integral operator
Example 2.3: Nonlocal operator
Example 2.4: Interaction operator
Definition 2.6
Theorem 2.7: Function space of identifiability
Lemma 2.8: A data-adaptive RKHS
Proposition 3.2: Risk in a fixed non-degenerate prior
Proof: Proof of Proposition \ref{thm:risk_prior}.
Proposition 3.3: Risk in a noise-adaptive non-degenerate prior
...and 29 more
