A multi-event interface for next-to-leading order calculations in MadGraph5_aMC@NLO
Rikkert Frederix, Stefan Roiser, Robert Schöfbeck, Zenny Wettersten, Marco Zaro
TL;DR
This work introduces a multi-event interface to enable batched evaluation of tree-level amplitudes across multiple phase-space points within a single NLO cross-section computation in MadGraph5_aMC@NLO. A multithreaded OpenMP proof-of-concept demonstrates data-parallel evaluation of tree-level amplitudes and reproduces sequential results within numerical caveats. The authors show that tree-level amplitudes dominate runtime in NLO event generation, motivating data-parallel strategies and paving the way for on-CPU SIMD and SIMT GPU acceleration. They discuss algorithmic adjustments and overheads, address practical challenges in phase-space cuts and unweighting, and outline a path toward scalable hardware-accelerated NLO event generation.
Abstract
We detail the implementation of a multi-event interface for next-to-leading order (NLO) calculations in MadGraph5_aMC@NLO, allowing tree-level scattering amplitudes for multiple phase-space points to be evaluated in each call to the integrated NLO differential cross section during event generation. We additionally present a multithreaded implementation based on this interface, in which tree-level amplitudes are evaluated in parallel across multiple CPU threads, for the Monte Carlo generation of quantum chromodynamics (QCD) events. Although this work primarily concerns the implemented code, it also includes some algorithmic changes to the order in which phase-space cuts are applied and the different scattering amplitudes are called. The codebase currently supports multithreaded execution, and these changes pave the way for further data parallelism in the form of on-CPU SIMD instructions or SIMT GPU offloading. A study of the runtime fraction spent in different diagrammatic contributions across various processes suggests that NLO QCD event generation is computationally dominated by tree-level scattering amplitude evaluations, which we show are perfectly suited for data parallelisation.
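
The batching idea described in the abstract can be illustrated with a minimal C++ sketch. It is not taken from the MadGraph5_aMC@NLO codebase: the names `PhaseSpacePoint`, `toy_tree_amplitude`, and `evaluate_batch` are hypothetical, the "amplitude" is a toy function rather than a physical matrix element, and the choice to apply all cuts before the amplitude calls is only an assumption about how the reordering might look. The OpenMP pragma is ignored when the code is compiled without OpenMP support, in which case the loop runs sequentially with identical results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical phase-space point: two toy numbers stand in for the full
// set of external momenta.
struct PhaseSpacePoint {
    double s;   // toy invariant used by the toy amplitude
    double pt;  // toy transverse momentum used by the cut
};

// Hypothetical stand-in for a tree-level matrix element (not physical).
inline double toy_tree_amplitude(const PhaseSpacePoint& p) {
    return 1.0 / (1.0 + p.s);
}

// Multi-event call: cuts are applied to every point first, then all
// surviving points are evaluated in a single data-parallel loop.
// Points failing the cut keep a weight of zero.
std::vector<double> evaluate_batch(const std::vector<PhaseSpacePoint>& points,
                                   double pt_min) {
    assert(pt_min >= 0.0);
    const std::size_t n = points.size();
    std::vector<double> weights(n, 0.0);

    // Step 1 (sequential): collect the indices of points passing the cut.
    std::vector<std::size_t> passed;
    passed.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (points[i].pt >= pt_min) passed.push_back(i);
    }

    // Step 2 (parallel): each iteration writes to a distinct element of
    // `weights`, so no synchronisation between threads is needed.
    const long n_pass = static_cast<long>(passed.size());
    #pragma omp parallel for
    for (long k = 0; k < n_pass; ++k) {
        const std::size_t i = passed[static_cast<std::size_t>(k)];
        weights[i] = toy_tree_amplitude(points[i]);
    }
    return weights;
}
```

Because every iteration of the second loop is independent, the same loop body could in principle be mapped onto SIMD lanes or GPU threads instead of CPU threads, which is the sense in which a multi-event interface prepares the ground for other forms of data parallelism.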
