
Scaled Block Vecchia Approximation for High-Dimensional Gaussian Process Emulation on GPUs

Qilong Pan, Sameh Abdulah, Mustafa Abduljabbar, Hatem Ltaief, Andreas Herten, Mathis Bode, Matthew Pratola, Arindam Fadikar, Marc G. Genton, David E. Keyes, Ying Sun

Abstract

Emulating computationally intensive scientific simulations is crucial for enabling uncertainty quantification, optimization, and informed decision-making at scale. Gaussian Processes (GPs) offer a flexible and data-efficient foundation for statistical emulation, but their poor scalability limits applicability to large datasets. We introduce the Scaled Block Vecchia (SBV) algorithm for distributed GPU-based systems. SBV integrates the Scaled Vecchia approach for anisotropic input scaling with the Block Vecchia (BV) method to reduce computational and memory complexity while leveraging GPU acceleration techniques for efficient linear algebra operations. To the best of our knowledge, this is the first distributed implementation of any Vecchia-based GP variant. Our implementation employs MPI for inter-node parallelism and the MAGMA library for GPU-accelerated batched matrix computations. We demonstrate the scalability and efficiency of the proposed algorithm through experiments on synthetic and real-world workloads, including a 50M-point simulation from a respiratory disease model. SBV achieves near-linear scalability on up to 512 A100 and GH200 GPUs, handles 2.56B points, and reduces energy use relative to exact GP solvers, establishing SBV as a scalable and energy-efficient framework for emulating large-scale scientific models on GPU-based distributed systems.

Paper Structure

This paper contains 24 sections, 11 equations, 10 figures, 4 tables, 5 algorithms.

Figures (10)

  • Figure 1: The BV algorithm: (1) disjoint clustering, (2) Nearest Neighbor Search (NNS), and (3) batched GPU log-likelihood computation.
  • Figure 2: Pipeline of the distributed SBV algorithm: (0) parallel data loading; (1) anisotropic scaling and partitioning; (2) Random Anchor Clustering (RAC) to form disjoint spatial blocks; (3) neighborhood filtering for NNS within a radius $r = \lambda$; and (4) batched GPU computation of blockwise conditional log-likelihoods.
  • Figure 3: Filtered NNS algorithm pipeline. (1) workers expand local partitions by radius $\lambda$ and exchange boundary data; (2) finer candidates are selected within $r = \lambda$ around each block; (3) $m$ nearest neighbors are identified; (4) data and covariance matrices are generated.
  • Figure 4: Comparison of Vecchia-based GP methods on model fitting and prediction accuracy: Classic Vecchia (CV), Block Vecchia (BV), Scaled Vecchia (SV), and Scaled Block Vecchia (SBV). Subfigures (a) and (b) evaluate fitting accuracy (KL divergence) and prediction accuracy, respectively, showing that scaling and blocking both improve accuracy; subfigure (c) examines SBV with different block sizes (bs), showing that a larger $m$ and a smaller block size yield the best fitting accuracy; subfigure (d) shows that the proposed RAC achieves a log-likelihood comparable to, and as robust as, that of K-means clustering while having linear computational complexity.
  • Figure 5: Root Mean Squared Percentage Error (RMSPE) of Vecchia-based GP models on six outputs from the satellite drag benchmark.
  • ...and 5 more figures
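
The pipeline stages named in the captions for Figures 1 and 2 (anisotropic scaling, random anchor clustering, nearest-neighbor selection, blockwise conditional log-likelihoods) can be illustrated with a small single-process NumPy sketch. This is not the authors' implementation: the squared-exponential kernel, the brute-force neighbor search around each block center, the block ordering, and all function and parameter names (`sbv_loglik`, `beta`, `nugget`, and so on) are assumptions made for illustration, and the distributed MPI and batched MAGMA parts are omitted entirely.

```python
import numpy as np


def sq_exp_kernel(a, b, variance=1.0):
    # Squared-exponential covariance on already-scaled inputs (assumed kernel).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2)


def random_anchor_clustering(x, n_blocks, rng):
    # Draw random data points as anchors, then assign each point to its
    # nearest anchor: one pass over the data, linear in the number of points.
    anchors = x[rng.choice(len(x), size=n_blocks, replace=False)]
    d2 = ((x[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)


def gaussian_logpdf(y, mean, cov):
    # Multivariate normal log-density via a Cholesky factorization.
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, y - mean)
    return -0.5 * (z @ z) - np.log(np.diag(chol)).sum() - 0.5 * len(y) * np.log(2 * np.pi)


def sbv_loglik(x, y, beta, n_blocks=8, m=10, nugget=1e-4, seed=0):
    # Hypothetical helper: sum of blockwise conditional log-likelihoods.
    rng = np.random.default_rng(seed)
    xs = x / np.asarray(beta, dtype=float)  # anisotropic scaling, one scale per input dimension
    labels = random_anchor_clustering(xs, n_blocks, rng)
    blocks = [np.flatnonzero(labels == b) for b in range(n_blocks)]
    total, seen = 0.0, np.empty(0, dtype=int)
    for idx in (b for b in blocks if len(b) > 0):
        xb, yb = xs[idx], y[idx]
        k_bb = sq_exp_kernel(xb, xb) + nugget * np.eye(len(idx))
        if len(seen) == 0:
            mean, cov = np.zeros(len(idx)), k_bb  # first block: marginal density
        else:
            # Brute-force search: the m points from earlier blocks that lie
            # closest to this block's center (assumed neighbor rule).
            d2 = ((xs[seen] - xb.mean(axis=0)) ** 2).sum(-1)
            nn = seen[np.argsort(d2)[:m]]
            k_nn = sq_exp_kernel(xs[nn], xs[nn]) + nugget * np.eye(len(nn))
            k_bn = sq_exp_kernel(xb, xs[nn])
            w = np.linalg.solve(k_nn, k_bn.T)
            mean, cov = w.T @ y[nn], k_bb - k_bn @ w  # Gaussian conditioning
        total += gaussian_logpdf(yb, mean, cov)
        seen = np.concatenate([seen, idx])
    return total
```

A call such as `sbv_loglik(x, y, beta=[0.3, 0.6, 1.2], n_blocks=6, m=10)` returns the approximate log-likelihood for inputs `x` of shape `(n, 3)`. When `m` is at least the number of points, every block conditions on all earlier points, so by the chain rule the sum equals the exact GP log-likelihood up to floating-point error; smaller `m` trades accuracy for cheaper per-block linear algebra.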