Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications

Xiran Zhang; Sameh Abdulah; Jian Cao; Hatem Ltaief; Ying Sun; Marc G. Genton; David E. Keyes

TL;DR

This work tackles the computational bottleneck of high-dimensional multivariate normal (MVN) probability computation in confidence region detection. It proposes a parallel Separation-of-Variables (SOV) framework built on tile-based linear algebra (Chameleon/StarPU) and Tile Low-Rank (TLR) approximations (HiCMA). The framework provides both a dense and a compressed (TLR) implementation of the MVN probability (PMVN) algorithm, combined with quasi-Monte Carlo (QMC) integration and driven by task-based parallelism and dynamic runtime systems. It reports speedups of up to ~20X on shared memory and up to ~1.8X on distributed memory while preserving accuracy on synthetic and wind-speed datasets, enabling confidence region detection over hundreds of thousands of locations. The practical targets are large-scale spatial applications such as wind energy planning and environmental risk assessment.

Abstract

Addressing the statistical challenge of computing the multivariate normal (MVN) probability in high dimensions holds significant potential for enhancing various applications. One common way to compute high-dimensional MVN probabilities is the Separation-of-Variables (SOV) algorithm. This algorithm has a computational complexity of $O(n^3)$ and a space complexity of $O(n^2)$, mainly due to the Cholesky factorization of an $n \times n$ covariance matrix, where $n$ represents the dimensionality of the MVN problem. This work proposes a high-performance computing framework that allows scaling the SOV algorithm and, subsequently, the confidence region detection algorithm. The framework leverages parallel linear algebra algorithms with a task-based programming model to achieve performance scalability in computing process probabilities, especially on large-scale systems. In addition, we enhance our implementation by incorporating Tile Low-Rank (TLR) approximation techniques to reduce algorithmic complexity without compromising the necessary accuracy. To evaluate the performance and accuracy of our framework, we conduct assessments using simulated data and a wind speed dataset. Our proposed implementation effectively handles high-dimensional MVN probability computations on shared- and distributed-memory systems using finite-precision arithmetic and TLR approximation. Performance results show a speedup of up to 20X in solving the MVN problem using TLR approximation compared to the reference dense solution, without sacrificing the application's accuracy. The qualitative results on synthetic and real datasets demonstrate that we maintain high accuracy in detecting confidence regions even when relying on TLR approximation to perform the underlying linear algebra operations.
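To make the role of the Cholesky factorization concrete, the following is a minimal serial sketch of an SOV-style estimator for $P(a \le X \le b)$ with $X \sim N(0, \Sigma)$. It is an illustration under stated assumptions, not the paper's implementation: the function name `sov_mvn_probability` and its parameters are hypothetical, and plain pseudo-random points stand in for the QMC points used in the paper.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal CDF and inverse CDF


def sov_mvn_probability(a, b, sigma, n_samples=20000, seed=0):
    """Estimate P(a <= X <= b) for X ~ N(0, sigma) via the SOV transform.

    Hypothetical helper for illustration; uses plain Monte Carlo points
    (an assumption) instead of a quasi-Monte Carlo sequence.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    # Lower Cholesky factor of the covariance: the O(n^3) time, O(n^2) space step.
    L = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    rng = np.random.default_rng(seed)
    eps = 1e-12  # keeps inverse-CDF arguments strictly inside (0, 1)
    total = 0.0
    for _ in range(n_samples):
        w = rng.random(n - 1)  # one uniform point in [0, 1)^(n-1)
        y = np.zeros(n)
        d = _STD_NORMAL.cdf(float(a[0] / L[0, 0]))
        e = _STD_NORMAL.cdf(float(b[0] / L[0, 0]))
        f = e - d
        for i in range(1, n):
            # Sample y[i-1] inside its conditional integration limits.
            p = min(max(float(d + w[i - 1] * (e - d)), eps), 1.0 - eps)
            y[i - 1] = _STD_NORMAL.inv_cdf(p)
            # Shift the limits of variable i by the already sampled variables.
            shift = float(L[i, :i] @ y[:i])
            d = _STD_NORMAL.cdf(float((a[i] - shift) / L[i, i]))
            e = _STD_NORMAL.cdf(float((b[i] - shift) / L[i, i]))
            f *= e - d
        total += f
    return total / n_samples
```

For a bivariate normal with correlation $\rho$, the orthant probability $P(X_1 \le 0, X_2 \le 0) = 1/4 + \arcsin(\rho)/(2\pi)$ gives a convenient closed-form check for this sketch. The inner loop over `i` is sequential in each sample, which is why the cost at large $n$ is dominated by the dense linear algebra that the framework parallelizes and compresses.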

Paper Structure (24 sections, 8 equations, 7 figures, 3 tables, 3 algorithms)

Introduction
Contributions
Background
Multivariate Normal (MVN) Probability
Separation-of-Variables (SOV) Algorithm
Confidence Region Detection
Task-based Linear Algebra Libraries and Dynamic Runtime Systems
Tile Low-Rank (TLR) Approximation
Confidence Region Detection Framework
Confidence Region Detection Algorithm
Multivariate Normal Probability (PMVN) Algorithm
Quasi-Monte Carlo (QMC) Algorithm
Results
Environment Settings
Datasets
...and 9 more sections

Figures (7)

Figure 1: Confidence region detection accuracy assessment using $40$K synthetic datasets generated in a regular grid with varying correlation levels. The figure illustrates the accuracy of both dense and TLR results compared to MC results (MC error) and the accuracy of TLR results compared to dense results.
Figure 2: Results of summer wind speed data (on July 15, 2015) in the Middle East (Saudi Arabia).
Figure 3: Difference between dense and TLR results of summer wind speed data (on July 15, 2015) in Saudi Arabia.
Figure 4: Performance of one MVN integration operation on multiple shared-memory architectures using dense and TLR approximation.
Figure 5: Rank distributions of a $19600 \times 19600$ covariance matrix using a $980$ tile size with Matérn covariance function under three different settings when compressing the matrix using TLR approximation with accuracy 1e-3 (preserves the application accuracy).
...and 2 more figures
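
The rank distributions described in the Figure 5 caption can be illustrated at small scale. The sketch below is only an assumption-laden toy: it uses an exponential covariance (the Matérn family with smoothness 0.5) on a small regular grid in row-major order, and it defines a tile's rank as the number of singular values above an absolute threshold `tol`. The function names, the grid size, and the threshold rule are hypothetical and are not taken from HiCMA or the paper.

```python
import numpy as np


def exponential_covariance(locs, variance=1.0, length_scale=0.1):
    """Exponential covariance (Matern with smoothness 0.5) between all locations."""
    diff = locs[:, None, :] - locs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return variance * np.exp(-dist / length_scale)


def tile_ranks(cov, tile_size, tol=1e-3):
    """Numerical rank of every tile: count of singular values above tol.

    The absolute-threshold rule is an assumption made for this sketch.
    """
    n = cov.shape[0]
    if n % tile_size != 0:
        raise ValueError("matrix size must be a multiple of tile_size")
    nt = n // tile_size
    ranks = np.zeros((nt, nt), dtype=int)
    for i in range(nt):
        for j in range(nt):
            tile = cov[i * tile_size:(i + 1) * tile_size,
                       j * tile_size:(j + 1) * tile_size]
            s = np.linalg.svd(tile, compute_uv=False)
            ranks[i, j] = int(np.sum(s > tol))
    return ranks


# 20 x 20 regular grid on the unit square, flattened in row-major order.
side = 20
xs, ys = np.meshgrid(np.linspace(0.0, 1.0, side), np.linspace(0.0, 1.0, side))
locs = np.column_stack([xs.ravel(), ys.ravel()])
cov = exponential_covariance(locs)
ranks = tile_ranks(cov, tile_size=100, tol=1e-3)
print(ranks)
```

In this toy setting, tiles far from the diagonal couple distant groups of locations and need fewer singular values than diagonal tiles, which is the property that TLR compression exploits.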
