Table of Contents
Fetching ...

GPU-Accelerated Vecchia Approximations of Gaussian Processes for Geospatial Data using Batched Matrix Computations

Qilong Pan, Sameh Abdulah, Marc G. Genton, David E. Keyes, Hatem Ltaief, Ying Sun

TL;DR

This paper tackles the computational bottleneck of Gaussian process modeling for massive geospatial datasets by deploying a GPU-accelerated Vecchia approximation with batched matrix computations. By leveraging batched POTRF/TRSV operations via the KBLAS library, the approach achieves substantial speedups (up to roughly 700–1380X) over exact MLE on modern NVIDIA GPUs and scales to up to 1 million locations on a single GPU while preserving accuracy. The authors validate the method on synthetic studies and real data (soil moisture in the Mississippi Basin and wind speed in the Middle East), showing that a conditioning size around $m=60$ yields near-exact parameter estimates and predictions. The work demonstrates a practical, scalable solution for high-resolution geospatial analyses and offers an open-source implementation to enable adoption in HPC geostatistics workflows.

Abstract

Gaussian processes (GPs) are commonly used for geospatial analysis, but they suffer from high computational complexity when dealing with massive data. For instance, the log-likelihood function required in estimating the statistical model parameters for geospatial data is a computationally intensive procedure that involves computing the inverse of a covariance matrix with size n X n, where n represents the number of geographical locations. As a result, in the literature, studies have shifted towards approximation methods to handle larger values of n effectively while maintaining high accuracy. These methods encompass a range of techniques, including low-rank and sparse approximations. Vecchia approximation is one of the most promising methods to speed up evaluating the log-likelihood function. This study presents a parallel implementation of the Vecchia approximation, utilizing batched matrix computations on contemporary GPUs. The proposed implementation relies on batched linear algebra routines to efficiently execute individual conditional distributions in the Vecchia algorithm. We rely on the KBLAS linear algebra library to perform batched linear algebra operations, reducing the time to solution compared to the state-of-the-art parallel implementation of the likelihood estimation operation in the ExaGeoStat software by up to 700X, 833X, 1380X on 32GB GV100, 80GB A100, and 80GB H100 GPUs, respectively. We also successfully manage larger problem sizes on a single NVIDIA GPU, accommodating up to 1M locations with 80GB A100 and H100 GPUs while maintaining the necessary application accuracy. We further assess the accuracy performance of the implemented algorithm, identifying the optimal settings for the Vecchia approximation algorithm to preserve accuracy on two real geospatial datasets: soil moisture data in the Mississippi Basin area and wind speed data in the Middle East.

GPU-Accelerated Vecchia Approximations of Gaussian Processes for Geospatial Data using Batched Matrix Computations

TL;DR

This paper tackles the computational bottleneck of Gaussian process modeling for massive geospatial datasets by deploying a GPU-accelerated Vecchia approximation with batched matrix computations. By leveraging batched POTRF/TRSV operations via the KBLAS library, the approach achieves substantial speedups (up to roughly 700–1380X) over exact MLE on modern NVIDIA GPUs and scales to up to 1 million locations on a single GPU while preserving accuracy. The authors validate the method on synthetic studies and real data (soil moisture in the Mississippi Basin and wind speed in the Middle East), showing that a conditioning size around yields near-exact parameter estimates and predictions. The work demonstrates a practical, scalable solution for high-resolution geospatial analyses and offers an open-source implementation to enable adoption in HPC geostatistics workflows.

Abstract

Gaussian processes (GPs) are commonly used for geospatial analysis, but they suffer from high computational complexity when dealing with massive data. For instance, the log-likelihood function required in estimating the statistical model parameters for geospatial data is a computationally intensive procedure that involves computing the inverse of a covariance matrix with size n X n, where n represents the number of geographical locations. As a result, in the literature, studies have shifted towards approximation methods to handle larger values of n effectively while maintaining high accuracy. These methods encompass a range of techniques, including low-rank and sparse approximations. Vecchia approximation is one of the most promising methods to speed up evaluating the log-likelihood function. This study presents a parallel implementation of the Vecchia approximation, utilizing batched matrix computations on contemporary GPUs. The proposed implementation relies on batched linear algebra routines to efficiently execute individual conditional distributions in the Vecchia algorithm. We rely on the KBLAS linear algebra library to perform batched linear algebra operations, reducing the time to solution compared to the state-of-the-art parallel implementation of the likelihood estimation operation in the ExaGeoStat software by up to 700X, 833X, 1380X on 32GB GV100, 80GB A100, and 80GB H100 GPUs, respectively. We also successfully manage larger problem sizes on a single NVIDIA GPU, accommodating up to 1M locations with 80GB A100 and H100 GPUs while maintaining the necessary application accuracy. We further assess the accuracy performance of the implemented algorithm, identifying the optimal settings for the Vecchia approximation algorithm to preserve accuracy on two real geospatial datasets: soil moisture data in the Mississippi Basin area and wind speed data in the Middle East.
Paper Structure (22 sections, 9 equations, 9 figures, 3 tables, 1 algorithm)

This paper contains 22 sections, 9 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: The example of random and Morton ordering on locations $20 \times 20$. (First row) The 45th and 250th locations (red stars) in the random ordering are marked with their nearest neighbors (orange circle); (Second row) The 45th and 250th locations (red stars) in the Morton order algorithm are marked with their nearest neighbors (orange circle). Blue circles indicate past locations for a given ordering algorithm.
  • Figure 2: Batched Vecchia algorithm description. $\bm \Sigma_{m:n}$ are constructed by the nearest neighbors of $\mathbf y^{\tau}_{m:n}$. The batched POTRF routine is applied to these matrices. After this decomposition, the resulting outputs are utilized as inputs for the batched TRSV operation with $\mathbf v_{m:n}$ and $\mathbf y^{\tau}_{\mathbf J_{m:n}}$, separately.
  • Figure 3: Contiguous data allocation in the GPU global memory.
  • Figure 4: Comparison of Arithmetic complexity: Vecchia algorithm versus Exact MLE.
  • Figure 5: KL divergence under 180K locations with log10 scale (y-axis is KL divergence). The red dashed line is the recommended threshold for choosing the conditioning size.
  • ...and 4 more figures