An NLO-Matched Initial and Final State Parton Shower on a GPU
Michael H. Seymour, Siddharth Sule
TL;DR
The paper advances GPU-accelerated Monte Carlo event generation by presenting GAPS version 2, an NLO-matched initial- and final-state parton shower implemented in CUDA C++ for GPUs, accompanied by a near-identical C++ generator for CPUs. It introduces non-algorithmic computational improvements, namely partitioning of finished events and kernel tuning, that substantially reduce run time: generating $10^6$ NLO events takes around 60 seconds on a V100, with speed and energy consumption on par with a 96-core CPU cluster. Physics validation shows good agreement for $pp \to Z$ observables against Herwig and related NLO tools, confirming correct matching and shower behavior. The work demonstrates the practical viability of GPU-based event generation and outlines paths for further performance gains, including 2D kernels, and for extending the program with hadronisation and MPI towards a full GPU event generator.
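The idea of partitioning finished events can be illustrated with a minimal CPU-side sketch. It assumes a hypothetical `Event` record with a `finished` flag and a hypothetical helper `partition_active`; neither name is taken from GAPS, and the actual GPU implementation may differ. The point is only that moving unfinished events to the front lets later passes run over a shrinking contiguous range instead of the full event list.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal event record (an assumption for illustration,
// not the GAPS event class).
struct Event {
    int id;
    bool finished;  // true once the shower has terminated for this event
};

// Reorders the events so that all unfinished ones come first and returns
// the number of unfinished (still active) events.
// std::stable_partition keeps the relative order within each group.
std::size_t partition_active(std::vector<Event>& events) {
    auto first_finished = std::stable_partition(
        events.begin(), events.end(),
        [](const Event& e) { return !e.finished; });
    return static_cast<std::size_t>(first_finished - events.begin());
}
```

After a call such as `std::size_t n = partition_active(events);`, a subsequent shower step would only need to visit `events[0]` to `events[n - 1]`, so work is not spent on events that have already terminated.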
Abstract
Recent developments have demonstrated the potential for high simulation speeds and reduced energy consumption by porting Monte Carlo Event Generators to GPUs. We release version 2 of the CUDA C++ parton shower event generator GAPS, which can simulate initial and final state emissions on a GPU and is capable of hard-process matching. As before, we accompany the generator with a near-identical C++ generator to run simulations on single-core and multi-core CPUs. Using these programs, we simulate NLO Z production at the LHC and demonstrate that the speed and energy consumption of an NVIDIA V100 GPU are on par with a 96-core cluster composed of two Intel Xeon Gold 5220R Processors, providing a potential alternative to cluster computing.
