W-Kernel and Its Principal Space for Frequentist Evaluation of Bayesian Estimators

Yukito Iba

TL;DR

The paper develops a unified framework built around the posterior covariance matrix of the log-likelihoods, $W$ (the W-kernel), for frequentist evaluation of Bayesian estimators. Using the Bayesian infinitesimal jackknife (Bayesian IJ) and projecting onto the principal space of $W$, it shows how posterior covariances, cumulants, and bootstrap-like quantities can be approximated efficiently, even when higher-order terms are included. It connects $W$ to kernel methods, showing that a modified Fisher kernel approximates $W$ asymptotically and that $W$ itself serves as a reproducing kernel with a corresponding RKHS, while a dual matrix $Z$ provides an alternative PCA perspective. The work also links the data-space kernel $W$ to the classical information matrices through $\hat{\mathcal J}^{-1/2}\hat{\mathcal I}\hat{\mathcal J}^{-1/2}$, clarifies eigenvalue correspondences, and discusses efficient computation via incomplete Cholesky decomposition and representative subsets. Together, these results offer tools for sensitivity analysis, model checking, and frequentist uncertainty quantification in Bayesian models, with practical computational advantages.

Abstract

Evaluating the variability of posterior estimates is a key aspect of Bayesian model assessment. In this study, we focus on the posterior covariance matrix W, defined through the log-likelihoods of individual observations. Previous studies, notably MacEachern and Peruggia (2002) and Thomas et al. (2018), examined the role of the principal space of W in Bayesian sensitivity analysis. Here, we show that the principal space of W is also central to frequentist evaluation, using the recently proposed Bayesian infinitesimal jackknife (Bayesian IJ) approximation (Giordano and Broderick (2023)) as a key tool. We further clarify the relationship between W and the Fisher kernel, showing that a modified version of the Fisher kernel can be viewed as an approximation to W. Moreover, the matrix W itself can be interpreted as a reproducing kernel, which we refer to as the W-kernel. Based on this connection, we investigate the relation between the W-kernel formulation in the data space and the classical asymptotic formulation in the parameter space. We also introduce the matrix Z, which is effectively dual to W in the sense of PCA; this formulation provides another perspective on the relationship between W and the classical asymptotic theory. In the appendices, we explore approximate bootstrap methods for posterior means and show that projection onto the principal space of W facilitates frequentist evaluation when higher-order terms are included. In addition, we introduce incomplete Cholesky decomposition as an efficient method for computing the principal space of W, and discuss the concept of representative subsets of observations.
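As a rough illustration of the object described in the abstract, the sketch below estimates W as the posterior covariance of the per-observation log-likelihoods from posterior draws and extracts its leading eigenpairs. The toy model (normal observations with unit variance, a flat prior, exact posterior sampling) and the helper names `w_matrix` and `principal_space` are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def w_matrix(loglik):
    """Posterior covariance of per-observation log-likelihoods.

    loglik has shape (S, n): loglik[s, i] = log p(X_i | theta_s)
    for posterior draw theta_s. Returns the n x n matrix W.
    """
    centered = loglik - loglik.mean(axis=0, keepdims=True)
    return centered.T @ centered / (loglik.shape[0] - 1)

def principal_space(W, k):
    """Top-k eigenvalues and eigenvectors of the symmetric matrix W."""
    evals, evecs = np.linalg.eigh(W)          # ascending order
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest
    return evals[order], evecs[:, order]

rng = np.random.default_rng(0)
n, S = 30, 4000
x = rng.normal(loc=1.0, scale=1.0, size=n)

# Toy model (assumption): X_i ~ N(theta, 1) with a flat prior on theta,
# so the posterior is N(mean(x), 1/n) and can be sampled exactly.
theta = rng.normal(loc=x.mean(), scale=1.0 / np.sqrt(n), size=S)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - theta[:, None]) ** 2

W = w_matrix(loglik)
evals, evecs = principal_space(W, k=2)
print("top two eigenvalues:", evals)
print("share of trace:", evals.sum() / np.trace(W))
```

In this toy model each log-likelihood is a quadratic function of the single parameter, so the centered log-likelihoods span at most two directions and the two leading eigenvalues account for the whole trace of W. Realistic models will not be exactly low rank, but the same two functions apply to any (S, n) array of log-likelihood draws.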

Paper Structure

This paper contains 43 sections, 178 equations, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Weibull analysis. From left to right, the first and second panels present a histogram and a Weibull fit (MLE) of the data, respectively. The third panel shows the first 20 eigenvalues of $W$ in decreasing order; the horizontal axis represents the index $1\ldots 59$ of the eigenvalues. The fourth panel shows the eigenvectors corresponding to the first two eigenvalues; the horizontal axis represents the index $1\ldots 59$ of the observations. The black dots correspond to the first eigenvalue, while the red $+$s correspond to the second eigenvalue.
  • Figure 2: Regression. The first panel presents the data points used in the experiment (blue dots) and the fitted curve; the horizontal axis represents the values of the explanatory variable $z$, while the vertical axis represents $X_i$ and the posterior mean of $f(z)$. The second, third, and fourth panels present the eigenvalues of the matrix $W$; from right to left, results with normal ($\sigma$-given), normal ($\sigma$-estimated), and Student-t (df=5) likelihoods are shown. The horizontal and vertical axes represent the index of the eigenvalues and the eigenvalues, respectively.
  • Figure 3: Smoothing. The left panel presents the fitted curve; the horizontal axis corresponds to the year $t=1,\ldots, 20$, while the vertical axis corresponds to the posterior mean of $\exp(Z_t)$. The estimated curve is not smooth, because the prior penalizes the first-order difference. The right panel presents the first 30 eigenvalues; the horizontal and vertical axes represent the index of the eigenvalues and the eigenvalues, respectively.
  • Figure C 1: Schematic view of the incomplete Cholesky decomposition. The upper and lower panels present $W=LL^T$ and $L^TL$, respectively. Non-zero components of each matrix belong to the dark regions, while the light regions are filled with zeros. The symbol "$\times$" indicates a matrix product. We assume that the rows and columns of the matrix $W$ are already sorted and neglect the residual (see the main text for details).
  • Figure C 2: Residual variance and representative sets of the observations. The first and second rows of the figure correspond to normal and Student-t (df=5) observational noise, respectively; in both cases, the dispersion of the noise is estimated from the data. In the leftmost panel of each row, the residual variances of the incomplete Cholesky decomposition (red dots) and the optimal selection of the subspace (black $+$) are compared; the horizontal and vertical axes indicate the dimension of the subspace and the residual variance, respectively. In the middle and right panels of each row, the sets of selected points are presented: those selected by the incomplete Cholesky decomposition, and the result of an independent selection, respectively. The selected points are encircled, and the number above each point indicates its order of selection in each algorithm; note that these numbers are not the index $i$ of the observation. Five selected points are shown in each panel.
  • ...and 11 more figures
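
The incomplete Cholesky decomposition referred to in the captions of Figures C 1 and C 2 can be sketched as a greedy pivoted Cholesky factorization of a positive semidefinite matrix. The version below is a generic textbook form written for illustration, not the paper's implementation; the function name `incomplete_cholesky`, the tolerance `tol`, and the synthetic rank-3 test matrix are assumptions of this example. The pivot order plays the role of the selection order of observations, and the sum of the residual diagonal plays the role of the residual variance.

```python
import numpy as np

def incomplete_cholesky(W, k, tol=1e-12):
    """Greedy pivoted Cholesky: W is approximated by L @ L.T with k columns.

    Returns the factor L (n x k, or fewer columns if the residual vanishes),
    the list of pivot indices in order of selection, and the residual
    diagonal diag(W - L @ L.T).
    """
    n = W.shape[0]
    d = np.diag(W).astype(float).copy()   # residual diagonal
    L = np.zeros((n, k))
    pivots = []
    for j in range(k):
        p = int(np.argmax(d))             # largest remaining residual variance
        if d[p] <= tol:                   # nothing left to explain
            L = L[:, :j]
            break
        pivots.append(p)
        # Column p of the residual matrix, scaled by the pivot
        L[:, j] = (W[:, p] - L[:, :j] @ L[p, :j]) / np.sqrt(d[p])
        d = np.maximum(d - L[:, j] ** 2, 0.0)  # clip round-off negatives
    return L, pivots, d

# Synthetic positive semidefinite matrix of rank 3 (assumption for the demo)
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 3))
W = A @ A.T

# Residual trace after 1, 2, and 3 pivots
residuals = [incomplete_cholesky(W, k)[2].sum() for k in (1, 2, 3)]
L, pivots, d = incomplete_cholesky(W, 3)
print("pivot order:", pivots)
print("residual trace by rank:", residuals)
```

Because the test matrix has rank three, three pivots reproduce it up to round-off. For a matrix W computed from real posterior draws, the residual trace would instead be monitored as the number of pivots grows.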

Theorems & Definitions (11)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6
  • Remark 7
  • Remark 8
  • Remark 9
  • Remark 10
  • ...and 1 more