Preconditioned Truncated Single-Sample Estimators for Scalable Stochastic Optimization

Tianshi Xu, Difeng Cai, Hua Huang, Edmond Chow, Yuanzhe Xi

TL;DR

The paper addresses the computational bottlenecks in large-scale stochastic optimization arising from repeated linear solves and log-determinant evaluations. It introduces Preconditioned Truncated Single-Sample (PTSS) estimators, which fuse preconditioning with randomized, truncated Krylov iterations to produce unbiased, low-variance estimates for inverse-quadratic forms, log-determinants, and derivatives. The authors derive mean, variance, and concentration results, including Gamma-optimal sampling distributions that minimize variance as a function of the condition number $\kappa$, and provide concrete TSS-Solve and TSS-LogQF variants. Numerical experiments on Gaussian process NLML and training tasks demonstrate substantial gains in stability and variance control over existing unbiased and biased approaches, highlighting the practical impact for scalable Bayesian learning and stochastic optimization.

Abstract

Many large-scale stochastic optimization algorithms involve repeated solutions of linear systems or evaluations of log-determinants. In these regimes, computing exact solutions is often unnecessary; it is more computationally efficient to construct unbiased stochastic estimators with controlled variance. However, classical iterative solvers incur truncation bias, whereas unbiased Krylov-based estimators typically exhibit high variance and numerical instability. To mitigate these issues, we introduce the Preconditioned Truncated Single-Sample (PTSS) estimators--a family of stochastic Krylov methods that integrate preconditioning with truncated Lanczos iterations. PTSS yields low-variance, stable estimators for linear system solutions, log-determinants, and their derivatives. We establish theoretical results on their mean, variance, and concentration properties, explicitly quantifying the variance reduction induced by preconditioning. Numerical experiments confirm that PTSS achieves superior stability and variance control compared with existing unbiased and biased alternatives, providing an efficient framework for stochastic optimization.

Paper Structure

This paper contains 20 sections, 9 theorems, 80 equations, 10 figures, 2 algorithms.

Key Result

Proposition 1

Let $\widetilde{\Phi}_{tss}$ be the TSS estimator in eq:TSS-estimator-again. Then where $\Delta_*:=\sum\limits_{j=i_{\min}}^{i_{\max}}\Delta_j.$ As a result, for $\Phi$ defined in eq:Phi-sum-again, $\widetilde{\Phi}_{tss}$ is unbiased if $i_{\max}=n$.
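
The role of the condition $i_{\max}=n$ can be illustrated with a small sketch. It assumes one common randomized-truncation construction, which is not necessarily the paper's exact TSS definition: terms before $i_{\min}$ are always kept, and each term $i\ge i_{\min}$ is kept when $i\le Q$ and reweighted by $1/\mathbb{P}(Q\ge i)$. The function names, the toy increments, and the sampling law are all hypothetical. The expectation is computed exactly by enumerating $Q$, so no random sampling is involved.

```python
from fractions import Fraction


def truncated_single_sample(deltas, q, i_min, tail_prob):
    """Estimate sum(deltas) from one truncation level q (1-based indices).

    Assumed form: terms before i_min are always kept; term i >= i_min is
    kept when i <= q and divided by P(Q >= i).
    """
    head = sum(deltas[: i_min - 1])
    tail = sum(deltas[i - 1] / tail_prob[i] for i in range(i_min, q + 1))
    return head + tail


def expected_estimate(deltas, pmf, i_min):
    """Exact expectation over Q, enumerating the support of pmf."""
    i_max = max(pmf)
    tail_prob = {
        i: sum(p for j, p in pmf.items() if j >= i)
        for i in range(i_min, i_max + 1)
    }
    return sum(
        p * truncated_single_sample(deltas, q, i_min, tail_prob)
        for q, p in pmf.items()
    )


def geometric_pmf(i_min, i_max, ratio):
    """P(Q = j) proportional to ratio**j on {i_min, ..., i_max}."""
    weights = {j: ratio ** j for j in range(i_min, i_max + 1)}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Toy increments with n = 8 (hypothetical data, exact rational arithmetic).
deltas = [Fraction(1, 2 ** i) for i in range(1, 9)]
half = Fraction(1, 2)

# i_max = n: the expectation equals the full sum.
full = expected_estimate(deltas, geometric_pmf(3, 8, half), i_min=3)
# i_max < n: the expectation only recovers the first i_max terms.
short = expected_estimate(deltas, geometric_pmf(3, 6, half), i_min=3)
```

Under this assumed construction, `full` equals `sum(deltas)`, while `short` equals `sum(deltas[:6])`, so the truncation at $i_{\max}<n$ leaves a bias equal to the omitted terms.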

Figures (10)

  • Figure 1: Comparison of $\Gamma_{\textrm{Solve}}$ values in \ref{eq:GammaSolve} for different distributions $p_Q$: two commonly used distributions $\mathbb{P}_1(Q=j)\propto e^{-0.5j}$ and $\mathbb{P}_2(Q=j)\propto 2^{-j}$ versus the $\Gamma$-optimal distribution in \ref{eq:pSolve-opt}.
  • Figure 1: CG vs Lanczos with full reorthogonalization for solving linear systems with $\widehat{\mathbf{K}}$ and for solving eigenvalue problems. The tests use a Gaussian kernel with $\sigma=0.001$, $l=1.0$ and $f=1.0$ on a simple 1D dataset with $100$ points randomly distributed in $[0,100]$. The eigenvalue approximations are obtained from the Ritz values of the tridiagonal matrices generated by the CG and Lanczos algorithms.
  • Figure 1: Signed mean error of the quadratic form estimation $\mathbf y^\top \widehat{\mathbf K}^{-1}\mathbf y$ versus length-scale $l$ (RBF kernel with $f=1.0$, $\mu=10^{-2}$). We use $\mathbb{P}({Q}=j)\propto e^{-0.5j}$ with $i_{\min}=5$ and $i_{\max}=10$ for the TSS estimator, fix both the rank and Schur complement fill level for AFN to 32, and sample 10,000 times for each $l$ to report the mean. We also include the results for AFN-T-$n$ with $i_{\min}$, $i_{\max}$, and $\lceil\mathbb{E}({Q})\rceil$ iterations.
  • Figure 2: The $\Gamma$-optimal sampling distribution has a lighter tail for smaller condition number $\kappa$. Left: TSS-Solve; Right: TSS-LogQF.
  • Figure 2: Signed mean error of the inverse quadratic form estimation $\mathbf y^\top \widehat{\mathbf K}^{-1}\mathbf y$ versus length-scale $l$ (RBF kernel with $f=1.0$, $\mu=10^{-2}$). AFN-TSS-$n$ is compared with NP-TSS-$n$ with $\mathbb{P}({Q}=j)\propto e^{-0.5j}$, $\mathbb{P}({Q}=j)\propto 2^{-j}$, and the condition number-based distribution as in \ref{eq:pSolve-opt} using $i_{\min}=5$ and $i_{\max}=15$. We fix both the rank and Schur complement fill level for AFN to 64, and sample 10,000 times for each $l$ to report the mean (left) and the standard deviation (right).
  • ...and 5 more figures
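
The captions above mention the sampling laws $\mathbb{P}(Q=j)\propto e^{-0.5j}$ and $\mathbb{P}(Q=j)\propto 2^{-j}$ together with an iteration count $\lceil\mathbb{E}(Q)\rceil$. A minimal sketch of how such quantities could be computed follows. It assumes the laws are normalized over the finite support $\{i_{\min},\dots,i_{\max}\}$ with $i_{\min}=5$ and $i_{\max}=10$ as in one of the captions; the helper names are hypothetical and not taken from the paper.

```python
import math


def truncated_pmf(weight, i_min, i_max):
    """Normalize weight(j) over j = i_min, ..., i_max (assumed support)."""
    js = range(i_min, i_max + 1)
    raw = [weight(j) for j in js]
    total = sum(raw)
    return {j: w / total for j, w in zip(js, raw)}


def mean_iterations(pmf):
    """E[Q] under the given probability mass function."""
    return sum(j * p for j, p in pmf.items())


# The two commonly used laws named in the captions.
p_exp = truncated_pmf(lambda j: math.exp(-0.5 * j), 5, 10)
p_two = truncated_pmf(lambda j: 2.0 ** (-j), 5, 10)

# Deterministic iteration budgets matching the expected cost, ceil(E[Q]).
budget_exp = math.ceil(mean_iterations(p_exp))
budget_two = math.ceil(mean_iterations(p_two))
```

Because $2^{-j}$ decays faster than $e^{-0.5j}$, the second law puts more mass near $i_{\min}$ and has the smaller expected iteration count on the same support.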

Theorems & Definitions (18)

  • Proposition 1
  • Proof 1
  • Theorem 2: Mean and Variance of TSS-Solve
  • Proof 2
  • Corollary 3: Concentration of TSS-Solve
  • Proof 3
  • Theorem 4: Mean and Variance of TSS-LogQF
  • Proof 4
  • Corollary 5: Concentration of TSS-LogQF
  • Proof 5
  • ...and 8 more