Uncertainty calibration for probabilistic projection methods

Vladimir Fanaskov

TL;DR

This work addresses the mismatch between probabilistic projections and actual error in linear solves by augmenting the prior covariance with a subspace spanning ${\sf Null}(W^{T}A)$, yielding nontrivial posterior uncertainty while preserving the projection mean. It develops practical, computable covariance constructions using projector-based forms for $\Sigma_0 = VV^{T} + Y G Y^{T}$, analyzes when probabilistic projection is sound, and provides a Krylov-specific uncertainty calibration procedure that relies on an extra calibration run to align posterior statistics with observed errors. The authors demonstrate that Reid's covariance is a special case of their general framework, and they show through extensive numerical experiments that their calibration approach substantially improves uncertainty quality over existing methods, albeit with higher calibration cost. They also illustrate a PDE-constrained optimization example to highlight practical implications and potential use in surrogate modeling and optimization. Overall, the paper advances probabilistic linear solves by enabling calibrated, interpretable uncertainty in projection methods and clarifies the limitations of applying these ideas directly to Krylov subspace methods.

Abstract

Classical Krylov subspace projection methods for the solution of the linear problem $Ax = b$ output an approximate solution $\widetilde{x}\simeq x$. Recently, it has been recognized that projection methods can be understood from a statistical perspective. These probabilistic projection methods return a distribution $p(\widetilde{x})$ in place of a point estimate $\widetilde{x}$. The resulting uncertainty, codified as a distribution, can, in theory, be meaningfully combined with other uncertainties, propagated through computational pipelines, and used in the framework of probabilistic decision theory. The problem we address is that current probabilistic projection methods lead to a poorly calibrated posterior distribution. We improve the covariance matrix from previous works so that it does not contain such undesirable objects as $A^{-1}$ or $A^{-1}A^{-T}$, results in nontrivial uncertainty, and reproduces an arbitrary projection method as the mean of the posterior distribution. We also propose a variant that is numerically inexpensive when the uncertainty is calibrated a priori. Since it usually is not, we put forward a practical way to calibrate uncertainty that performs reasonably well, albeit at the expense of roughly doubling the numerical cost of the underlying projection method.

Paper Structure (18 sections, 18 theorems, 33 equations, 6 figures, 3 algorithms)


Key Result

Theorem 1

Let $\det A \neq 0$, $p(x) = \mathcal{N}(x|x_0, \Sigma_{0})$ and $y_{m} = S_{m}^{T}Ax$, where $S_{m}\in\mathbb{R}^{n\times m}, m\leq n$ is a full-rank matrix. The mean of the conditional distribution $p(x|y_{m} = S^{T}_{m}b) = \mathcal{N}(x|x_m, \Sigma_{m})$ reproduces projection method projection_metho
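
The conditioning step in Theorem 1 can be illustrated with the standard formulas for conditioning a Gaussian on a linear observation. The following is a minimal NumPy sketch, not the paper's implementation: the function name `condition_on_projection`, the random test matrices, and the choices $x_0 = 0$, $\Sigma_0 = I$ are assumptions made purely for illustration. It shows that the posterior mean satisfies the Petrov-Galerkin condition $S_m^{T}(b - Ax_m) = 0$ and that the posterior covariance carries no uncertainty along the observed directions.

```python
import numpy as np

def condition_on_projection(A, b, S, x0, Sigma0):
    """Condition the prior N(x0, Sigma0) on the observation S^T A x = S^T b.

    Hypothetical helper: uses the textbook Gaussian conditioning formulas.
    """
    H = S.T @ A                               # observation operator, shape (m, n)
    gram = H @ Sigma0 @ H.T                   # S^T A Sigma0 A^T S, shape (m, m)
    K = np.linalg.solve(gram, H @ Sigma0).T   # gain Sigma0 H^T gram^{-1}, shape (n, m)
    x_m = x0 + K @ (S.T @ b - H @ x0)         # posterior mean
    Sigma_m = Sigma0 - K @ H @ Sigma0         # posterior covariance
    return x_m, Sigma_m

# Illustrative data (assumed, not taken from the paper).
rng = np.random.default_rng(0)
n, m = 6, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted to be well conditioned
b = rng.standard_normal(n)
S = rng.standard_normal((n, m))                   # full column rank with probability one
x0 = np.zeros(n)
Sigma0 = np.eye(n)

x_m, Sigma_m = condition_on_projection(A, b, S, x0, Sigma0)

# Residual of the posterior mean is orthogonal to the columns of S.
galerkin_residual = S.T @ (b - A @ x_m)
# Posterior covariance vanishes along the observed directions A^T S.
observed_cov = Sigma_m @ A.T @ S
```

With $\Sigma_0 = I$ the posterior covariance is an orthogonal projector onto the complement of the range of $A^{T}S_m$, so every unobserved direction keeps unit variance regardless of the actual error. This is one way to see why the choice of $\Sigma_0$ matters for calibration.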

Figures (6)

  • Figure 1: The figure demonstrates how the acute angles $\theta_{i},~i=1,2$ between the subspace spanned by $v_1$ and $v_2$ and the vectors $u_i,~i=1, 2$ depend on the vector $u_{i} = v_1 + v_2 + iv_3$. The angles can be computed as $\cos(\theta_{i}) = u_{i}^{T}P_{\perp} u_{i}/u_{i}^T u_{i}$. Lemma \ref{lemma:alignment} is a probabilistic counterpart of this situation. Namely, by rescaling the eigenvectors of the covariance matrix one can influence the distribution of the angle between the error and a given subspace.
  • Figure 2: The figure shows $\gamma_{i}\left\|r_{i-1}\right\|_2^2$ for the matrix bcsstm07 from the SuiteSparse Matrix Collection.
  • Figure 3: The figures show theoretical test statistics and empirical distributions for different prior distributions. Common legends for each column appear in the first row. The legend specifies the covariance matrices. For example, $s\left(VV^{T} + P_2\right)$ refers to the posterior described in Lemma \ref{lemma:cheap_UQ} with $\Psi = P_2$. The first two columns contain point estimation and hierarchical modelling for five projection steps. The first row presents results for the conjugate gradient method and the second for GMRES. The last column shows how the $L_1$ norm of the difference between the empirical distribution $p_{e}$ and the target distribution $p_{t}$ ($\chi^{2}$ or $F$, as explained in Section \ref{section:Numerical_experiments}) changes with the number of projection steps. Perfect uncertainty calibration corresponds to zero discrepancy. The worst possible mismatch corresponds to an $L_1$ norm of two. Overall, the method proposed in Lemma \ref{lemma:expensive_UQ} provides reasonable uncertainty for both projection processes.
  • Figure 4: The figures summarize the dependence of the proposed uncertainty calibration (Algorithm \ref{algorithm:UQ_calibration}) on the number of additional observations $k$. The first row corresponds to results for the conjugate gradient iteration and the second row to the GMRES iteration. The second and third columns, which contain point estimation and hierarchical modelling, respectively, share the common legends that appear in the first row. Graphs in these last two columns show how the $L_1$ norm of the difference between the empirical distribution $p_{e}$ and the target distribution $p_{t}$ ($\chi^{2}$ or $F$, as explained in Section \ref{section:Numerical_experiments}) changes with the number of projection steps for $k = 1, 5, 25$ additional observations in Algorithm \ref{algorithm:UQ_calibration}. Figures in the first column allow for visual inspection of the empirical and target distributions of the $Z$-statistic. Namely, for CG, we sketch the probability density function of the $Z$-statistic for point estimation in the first row (the target distribution is $\chi^2$), whereas the second row contains the same quantity but for hierarchical modelling (the target distribution is $F$). We can see that for point estimation, additional observations marginally improve uncertainty calibration, whereas for hierarchical modelling the situation is reversed. We conclude that, first, it makes little sense to use $k>1$ for the chosen family of linear systems. Second, such behaviour clearly indicates that the chosen statistical model is inadequate for Krylov subspace methods.
  • Figure 5: The figures show the exact error $e_{m}^TAe_{m}$ at iteration $m$ and samples from the $S$-statistic for three matrices. The first row corresponds to the uncertainty calibration proposed in reid2020probabilistic. The second row shows samples from the $S$-statistic calibrated according to Algorithm \ref{algorithm:UQ_calibration} with ${\sf statistic} = S$. We can see that the statistical uncertainty calibration proposed in this article leads to better uncertainty in all three cases.
  • ...and 1 more figure
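
The geometric setup in the Figure 1 caption can be reproduced numerically. The sketch below is illustrative only and rests on assumptions not stated in the caption: $v_1, v_2, v_3$ are taken to be the standard orthonormal basis of $\mathbb{R}^3$, and the projector is taken to be the orthogonal projector onto ${\sf span}(v_1, v_2)$. Under that convention the ratio $u^{T}Pu/u^{T}u$ equals the squared cosine of the angle, so the code takes a square root. The caption's $P_{\perp}$ may be defined or normalized differently. The helper name `angle_to_subspace` is hypothetical.

```python
import numpy as np

def angle_to_subspace(u, V):
    """Acute angle (in radians) between vector u and the column span of V."""
    Q, _ = np.linalg.qr(V)                 # orthonormal basis of span(V)
    P = Q @ Q.T                            # orthogonal projector onto span(V)
    cos_sq = (u @ P @ u) / (u @ u)         # squared cosine of the angle
    return np.arccos(np.sqrt(np.clip(cos_sq, 0.0, 1.0)))

# Assumed orthonormal vectors v1, v2, v3 (rows of the identity matrix).
v1, v2, v3 = np.eye(3)
V = np.column_stack([v1, v2])

# u_i = v1 + v2 + i * v3 for i = 1, 2, as in the caption.
angles = {i: angle_to_subspace(v1 + v2 + i * v3, V) for i in (1, 2)}
```

Increasing the weight on $v_3$ tilts $u_i$ away from the subspace, so $\theta_2 > \theta_1$. Rescaling one direction is the deterministic analogue of rescaling eigenvectors of a covariance matrix to shift the distribution of the angle.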

Theorems & Definitions (35)

  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 2
  • Theorem 3
  • proof
  • ...and 25 more