A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators

Neil K. Chada, Quanjun Lang, Fei Lu, Xiong Wang

TL;DR

The paper introduces a data-adaptive prior that yields a stable posterior whose mean always has a small-noise limit. The prior's covariance is the inversion operator with a hyper-parameter selected adaptively from the data by the L-curve method.

Abstract

Kernels are efficient in representing nonlocal dependence, and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean as the observation noise becomes small if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small-noise limit. The data-adaptive prior's covariance is the inversion operator with a hyper-parameter selected adaptively from the data by the L-curve method. Furthermore, we provide a detailed analysis of the computational practice of the data-adaptive prior and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of four types of errors: discretization error, model error, partial observation, and wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small-noise limits.
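The divergence described in the abstract can be illustrated on a toy finite-dimensional problem. The sketch below is not the paper's implementation. It assumes an identity basis matrix, a diagonal singular regression matrix `A`, a fixed prior with identity covariance, and a hand-picked hyper-parameter `lam` in place of the L-curve selection. The function name `posterior_means`, the perturbation size `eps`, and the closed-form means used here are illustrative assumptions.

```python
import numpy as np

def posterior_means(A, b, sigma, lam=1.0):
    """Posterior means for the quadratic loss c^T A c - 2 c^T b with noise level sigma.

    Assumptions (illustrative only): identity basis matrix, fixed prior N(0, I),
    and a data-adaptive prior whose covariance is proportional to A, with the
    hyper-parameter lam fixed by hand instead of selected by the L-curve method.
    """
    I = np.eye(A.shape[0])
    # Fixed prior N(0, I): mean = (A + sigma^2 I)^{-1} b
    fixed = np.linalg.solve(A + sigma**2 * I, b)
    # Prior covariance proportional to A: mean = (A^2 + lam sigma^2 I)^{-1} A b
    adaptive = np.linalg.solve(A @ A + lam * sigma**2 * I, A @ b)
    return fixed, adaptive

# Singular regression matrix: the third direction has a zero eigenvalue.
A = np.diag([1.0, 0.5, 0.0])
c_true = np.array([1.0, -2.0, 0.0])

# The data carries a small perturbation in the null space of A.
eps = 1e-3
b = A @ c_true + np.array([0.0, 0.0, eps])

for sigma in [1e-1, 1e-2, 1e-3]:
    fixed, adaptive = posterior_means(A, b, sigma)
    err_fixed = np.linalg.norm(fixed - c_true)
    err_adaptive = np.linalg.norm(adaptive - c_true)
    print(f"sigma={sigma:.0e}  fixed error={err_fixed:.3e}  adaptive error={err_adaptive:.3e}")
```

In this toy setting the fixed-prior mean picks up a component `eps / sigma**2` along the zero-eigenvalue direction, so its error grows as `sigma` shrinks, while multiplying by `A` in the adaptive form removes that component before inversion.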

Paper Structure

This paper contains 36 sections, 13 theorems, 62 equations, 8 figures, 5 tables.

Key Result

Theorem 2.7

Suppose the data in \eqref{eq:data} is generated from the system \eqref{eq:map_R} with a true kernel $\phi_{true}$. Suppose that Assumption \ref{assumption1} holds. Then, the following statements hold.

Figures (8)

  • Figure 1: The exploration measure and the eigenvalues of the basis matrix $B$, regression matrix $A_D$ and operator ${\mathcal{L}_{\overline{G}}}$ (computed via the generalized eigenvalue problem of $(A_D,B)$).
  • Figure 2: Interquartile range (IQR, the $75^{th}$, $50^{th}$ and $25^{th}$ percentiles) of the $L^2_\rho$ errors of the posterior means. They are computed in $200$ independent simulations with $\phi_{true}$ sampled from the fixed prior (hence outside the FSOI), in the presence of four types of errors. Top row: the regression matrix ${\overline{A}}$ is computed from continuous $\{u^k\}$; Bottom row: ${\overline{A}}$ is computed from discrete data. As $\sigma_\eta\to 0$, the fixed prior leads to diverging posterior means in 6 out of the 8 cases, while the data-adaptive (DA) prior leads to stable posterior means.
  • Figure 3: IQR of the $L^2_\rho$ errors of the posterior means in $200$ independent simulations with $\phi_{true}$ sampled inside the FSOI.
  • Figure 4: The posterior (its mean, the $75^{th}$ and $25^{th}$ percentiles) when $\phi_{true} \notin$ FSOI.
  • Figure 5: The posterior (its mean, the $75^{th}$ and $25^{th}$ percentiles) when $\phi_{true} \in$ FSOI.
  • ...and 3 more figures

Theorems & Definitions (39)

  • Example 2.1: Kernels in Toeplitz matrices
  • Example 2.2: Integral operator
  • Example 2.3: Nonlocal operator
  • Example 2.4: Interaction operator
  • Definition 2.6
  • Theorem 2.7: Function space of identifiability
  • Lemma 2.8: A data-adaptive RKHS
  • Proposition 3.2: Risk in a fixed non-degenerate prior
  • Proof of Proposition \ref{thm:risk_prior}
  • Proposition 3.3: Risk in a noise-adaptive non-degenerate prior
  • ...and 29 more