cuVegas: Accelerate Multidimensional Monte Carlo Integration through a Parallelized CUDA-based Implementation of the VEGAS Enhanced Algorithm

Emiliano Tolotti, Anas Jnini, Flavio Vella, Roberto Passerone

TL;DR

cuVegas presents a CUDA-based implementation of the VEGAS+ adaptive multidimensional Monte Carlo algorithm, maximizing GPU parallelism through a batch-oriented evaluation scheme, on-GPU map updates, and multi-GPU data sharing. It integrates adaptive importance sampling and adaptive stratified sampling with estimation aggregation, delivering substantial speedups over CPU VEGAS and competing GPU frameworks, especially for integrands with multiple peaks or diagonal structures. The paper provides extensive performance analyses, including benchmarks on Asian option pricing and Feynman path integrals, and demonstrates strong multi-GPU scalability with careful memory and RNG optimizations. The work shows that VEGAS+ on GPUs can achieve practical, real-world improvements in accuracy-time tradeoffs for high-dimensional integration tasks, with a usable Python binding for integration into scientific workflows.

Abstract

This paper introduces cuVegas, a CUDA-based implementation of the VEGAS Enhanced Algorithm (VEGAS+), optimized for multi-dimensional integration in GPU environments. The VEGAS+ algorithm is an advanced form of Monte Carlo integration, recognized for its adaptability and effectiveness in handling complex, high-dimensional integrands. It combines two variance reduction techniques, adaptive importance sampling and a variant of adaptive stratified sampling, which make it particularly adept at handling integrands with multiple peaks or with structures aligned with the diagonals of the integration volume. As a Monte Carlo integration method, VEGAS+ is well suited to parallelization and GPU execution. Our implementation, cuVegas, aims to harness the inherent parallelism of GPUs, addressing the challenge of workload distribution that often hampers efficiency in standard implementations. We present a comprehensive analysis comparing cuVegas with existing CPU and GPU implementations, demonstrating significant performance improvements: two to three orders of magnitude over CPU implementations, and a factor of two or more over the best existing GPU implementation. We also demonstrate the speedup for the integrands for which VEGAS+ was designed, those with multiple peaks or other significant structures aligned with diagonals of the integration volume.
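
To make the adaptive importance sampling idea in the abstract concrete, the following is a minimal one-dimensional CPU sketch in plain Python. It is not the cuVegas code: the names `refine_grid`, `vegas_1d`, and `peak` are hypothetical, the smoothing and damping rule is the commonly described VEGAS map update, and both adaptive stratified sampling and the GPU batching are left out. It only illustrates how a piecewise-linear map is refined so that samples concentrate where the integrand is large, and how per-iteration estimates are combined by inverse-variance weighting.

```python
import math
import random


def refine_grid(edges, weights):
    """Move interior edges so every interval holds an equal share of `weights`."""
    n = len(weights)
    target = sum(weights) / n
    new_edges = [edges[0]]
    acc, i = 0.0, 0  # accumulated weight, number of old intervals consumed
    for _ in range(n - 1):
        while acc < target and i < n:
            acc += weights[i]
            i += 1
        acc -= target  # overshoot past the new edge, inside old interval i - 1
        width = edges[i] - edges[i - 1]
        new_edges.append(edges[i] - (acc / weights[i - 1]) * width)
    new_edges.append(edges[-1])
    return new_edges


def vegas_1d(f, n_bins=50, n_samples=2000, n_iter=10, alpha=0.5, seed=0):
    """Estimate the integral of f over [0, 1]; returns (estimate, std_error)."""
    rng = random.Random(seed)
    edges = [i / n_bins for i in range(n_bins + 1)]  # start from a uniform map
    num = den = 0.0  # inverse-variance weighted sums over iterations
    for _ in range(n_iter):
        total = total_sq = 0.0
        d = [0.0] * n_bins
        counts = [0] * n_bins
        for _ in range(n_samples):
            y = rng.random() * n_bins          # uniform point in map space
            i = min(int(y), n_bins - 1)        # interval index
            width = edges[i + 1] - edges[i]
            x = edges[i] + (y - i) * width     # mapped point in [0, 1]
            w = n_bins * width * f(x)          # Jacobian times integrand
            total += w
            total_sq += w * w
            d[i] += w * w
            counts[i] += 1
        mean = total / n_samples
        var = max((total_sq / n_samples - mean * mean) / (n_samples - 1), 1e-30)
        num += mean / var
        den += 1.0 / var
        # Average, smooth with neighbours, normalise, then damp with alpha.
        d = [di / c if c else 0.0 for di, c in zip(d, counts)]
        sm = []
        for i in range(n_bins):
            left = d[i - 1] if i > 0 else d[i]
            right = d[i + 1] if i < n_bins - 1 else d[i]
            sm.append((left + 6.0 * d[i] + right) / 8.0)
        s = sum(sm)
        if s == 0.0:
            continue  # nothing to adapt to in this iteration
        weights = []
        for v in sm:
            r = min(max(v / s, 1e-12), 1.0 - 1e-12)  # clamp to keep the log finite
            weights.append(((1.0 - r) / math.log(1.0 / r)) ** alpha)
        edges = refine_grid(edges, weights)
    return num / den, math.sqrt(1.0 / den)


def peak(x):
    """Narrow Gaussian centred at 0.5, a simple sharply peaked test integrand."""
    return math.exp(-((x - 0.5) / 0.01) ** 2)


if __name__ == "__main__":
    estimate, error = vegas_1d(peak)
    print(estimate, error)
```

In this sketch every sample is independent of the others within an iteration, which is the property that makes the inner loop a natural candidate for batched GPU evaluation; only the map refinement between iterations needs the accumulated per-interval statistics.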

Paper Structure (31 sections, 11 equations, 8 figures, 10 tables, 2 algorithms)

Figures (8)

  • Figure 1: Parallelization diagram of the program in a single-GPU setting.
  • Figure 2: Parallelization diagram of the program in a multi-GPU setting.
  • Figure 3: VegasFill kernel performance scaling, changing algorithm parameters. The testing parameters are reported in Table \ref{tab:scaling}. Blue dots represent mean execution time values of the total kernel calls in the program and bars represent standard error. Orange and light blue dots represent minimum and maximum execution time respectively.
  • Figure 4: Performance comparison of cuVegas, Vegas, TorchQuad and VegasFlow across seven test functions. The average wall-clock time ($y$-axis) is plotted against the average relative standard error ($x$-axis). Both axes are in log scale. Lines represent the geometric mean over the seven integrands.
  • Figure 5: Speedup of multiple GPUs with respect to the single GPU version for the Ridge integrand, varying the number of function evaluations.
  • ...and 3 more figures