Fast Approximate Determinants Using Rational Functions

Thomas Colthurst, Srinivas Vasudevan, James Lottes, Brian Patton

TL;DR

This paper addresses efficient approximation of $\log \det M$ for large symmetric positive definite (SPD) matrices, a key operation in Gaussian process (GP) training. It introduces the r* family of determinant estimators, which combine rational approximations to $\log x$ with Hutchinson's trace estimator and a randomized, truncated SVD preconditioner, and which use a multi-shift Krylov solver for efficient evaluation. Among the proposed approximants, the third-order $r_3$ (along with $r_5$) provides the best trade-off between accuracy and speed across Matérn-$5/2$ and RBF GP covariance matrices, outperforming stochastic Lanczos quadrature at similar runtimes, particularly for higher intrinsic dimension $d$. The method is implemented in TensorFlow Probability and JAX, with CPU and GPU benchmarks validating the approach and highlighting dimension-dependent accuracy and preconditioner effects. These results enable scalable determinant estimation for large GP models and related covariance-structured problems.
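To make the combination of a rational log approximation with Hutchinson's trace estimator concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes the simplest first-order approximant $r_1(z) = 2(z-1)/(z+1)$, replaces the preconditioner and multi-shift Krylov solver with a dense solve, and uses hypothetical helper names (`r1_apply`, `hutchinson_logdet_r1`, `make_test_matrix`). It relies on the identity $\log \det M = \operatorname{tr} \log M \approx \operatorname{tr} r_1(M)$.

```python
import numpy as np


def r1_apply(M, V):
    """Apply r1(M) = 2 (M + I)^{-1} (M - I) to the columns of V.

    A dense solve stands in for the Krylov solver used at scale (assumption).
    """
    I = np.eye(M.shape[0])
    return 2.0 * np.linalg.solve(M + I, (M - I) @ V)


def hutchinson_logdet_r1(M, num_probes=400, seed=0):
    """Estimate log det M as the Hutchinson estimate of tr(r1(M))."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    # Rademacher probes: E[v^T A v] = tr(A) for any square matrix A.
    V = rng.choice([-1.0, 1.0], size=(n, num_probes))
    quad_forms = np.einsum("ij,ij->j", V, r1_apply(M, V))
    return quad_forms.mean()


def make_test_matrix(n=50, seed=1):
    """Illustrative SPD matrix with eigenvalues close to 1 (hypothetical data)."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    return np.eye(n) + 0.2 * (G @ G.T) / n


M = make_test_matrix()
print("Hutchinson + r1 estimate:", hutchinson_logdet_r1(M))
print("Exact log det:           ", np.linalg.slogdet(M)[1])
```

The test matrix is deliberately well conditioned, so even the first-order approximant is accurate. For matrices whose eigenvalues spread far from 1, a higher-order approximant and a preconditioner are what keep the approximation error small.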

Abstract

We show how rational function approximations to the logarithm, such as $\log z \approx 4(z^2 - 1)/(z^2 + 6z + 1)$, can be turned into fast algorithms for approximating the determinant of a very large matrix. We empirically demonstrate that, when combined with a good preconditioner, the third-order rational function approximation offers a very good trade-off between speed and accuracy when measured on matrices coming from Matérn-$5/2$ and radial basis function Gaussian process kernels. In particular, it is significantly more accurate on those matrices than the state-of-the-art stochastic Lanczos quadrature method for approximating determinants while running at about the same speed.
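The quality of the quoted rational approximation can be checked on scalars. The sketch below assumes the approximant is $4(z^2 - 1)/(z^2 + 6z + 1)$; the leading factor 4 is what gives the ratio slope 1 at $z = 1$, matching $\log z$ there. The function name `r2` is a label chosen here for illustration.

```python
import math


def r2(z):
    """Second-order rational approximation to log z (assumed form, see lead-in)."""
    return 4.0 * (z * z - 1.0) / (z * z + 6.0 * z + 1.0)


# Compare against math.log at a few points around z = 1.
for z in (0.5, 0.9, 1.0, 1.1, 2.0, 4.0):
    err = math.log(z) - r2(z)
    print(f"z={z:4.1f}  log z={math.log(z):+.5f}  r2(z)={r2(z):+.5f}  error={err:+.5f}")
```

Like the logarithm, this approximant is antisymmetric under $z \mapsto 1/z$, so its error at $z$ and at $1/z$ has the same magnitude. The error grows as $z$ moves away from 1, which is why the preconditioner's job of clustering eigenvalues near 1 matters for accuracy.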

Paper Structure (6 sections, 10 equations, 17 figures, 1 table)

Figures (17)

  • Figure 1: $\log z - r_i(z)$ for the log approximations defined in equation 1.
  • Figure 2: Comparison of $\log \det$ algorithms as a function of $n$ on the radial basis function kernel with $d=1$ as measured on an Intel CPU. All measurements are averages over 100 randomly generated kernels.
  • Figure 3: Comparison of $\log \det$ algorithms as a function of $n$ on the radial basis function kernel with $d=5$ as measured on an Intel CPU. All measurements are averages over 100 randomly generated kernels.
  • Figure 4: Comparison of $\log \det$ algorithms as a function of $n$ on the Matérn-$5/2$ kernel with $d=1$ as measured on an Intel CPU. All measurements are averages over 100 randomly generated kernels.
  • Figure 5: Comparison of $\log \det$ algorithms as a function of $n$ on the Matérn-$5/2$ kernel with $d=5$ as measured on an Intel CPU. All measurements are averages over 100 randomly generated kernels.
  • ...and 12 more figures