Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven Physics Research
Uraz Odyurt, Stephen Nicholas Swatman, Ana-Lucia Varbanescu, Sascha Caron
TL;DR
This paper tackles the computational bottleneck of particle track reconstruction in high-energy physics by proposing REDVID, a complexity-reduced, parametric detector and event simulator designed for simulation-in-the-loop ML workflows. REDVID reduces the behavioural-space and model complexity while preserving essential causal relations, enabling rapid data generation and design-space exploration for ML-based tracking approaches. It introduces a ROM-inspired reduction strategy, a configurable cylindrical detector geometry, multiple track randomisation protocols, and publicly available data sets along with performance benchmarks that show linear scaling of CPU time with track count. The work provides a practical, open-source middle ground between physics-accurate simulators and synthetic data generators, with pedagogical value and clear pathways for extending the framework to broader ML exploration tasks in HEP.
Abstract
Subatomic particle track reconstruction (tracking) is a vital task in High-Energy Physics experiments. Tracking is computationally very demanding, and fielded solutions, which rely on traditional algorithms, do not scale linearly. Machine Learning (ML) assisted solutions are a promising answer. We argue that a complexity-reduced problem description, together with the data representing it, will facilitate the solution exploration workflow. We provide the REDuced VIrtual Detector (REDVID), which combines a complexity-reduced detector model with a particle collision event simulator. REDVID is intended as a simulation-in-the-loop tool, both to generate synthetic data efficiently and to simplify the challenge of ML model design. In contrast to physics-accurate simulations, our tool is fully parametric with regard to system-level configuration, which allows the generation of simplified data for research and education at different levels. We showcase the computational efficiency that results from the reduced complexity by providing computational cost figures for a multitude of simulation benchmarks. As a simulation and generative tool for ML-assisted solution design, REDVID is highly flexible, reusable and open-source. Reference data sets generated with REDVID are publicly available. Data generated using REDVID has enabled the rapid development of multiple novel ML model designs, and this work is ongoing.
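To make the idea of a complexity-reduced, parametric detector and event simulator more concrete, the following is a minimal sketch and not REDVID's actual implementation or API. It assumes a barrel of concentric cylindrical layers centred on the z-axis, straight-line tracks that all start at the origin, and uniformly randomised track directions. The function name `generate_event` and all of its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def generate_event(num_tracks, layer_radii, half_length, rng):
    """Simulate one event in a hypothetical reduced cylindrical detector.

    Returns a list of tracks; each track is a list of (layer, x, y, z) hits.
    Assumption: tracks are straight lines starting at the origin.
    """
    event = []
    for _ in range(num_tracks):
        phi = rng.uniform(0.0, 2.0 * math.pi)    # azimuthal angle
        theta = rng.uniform(0.1, math.pi - 0.1)  # polar angle, kept off the beam axis
        hits = []
        for layer, radius in enumerate(layer_radii):
            # A straight line from the origin reaches transverse distance
            # `radius` at longitudinal position z = radius / tan(theta).
            z = radius / math.tan(theta)
            if abs(z) > half_length:
                continue  # the track exits the barrel before reaching this layer
            x = radius * math.cos(phi)
            y = radius * math.sin(phi)
            hits.append((layer, x, y, z))
        event.append(hits)
    return event


# Usage: a three-layer barrel with a seeded generator for reproducibility.
rng = random.Random(0)
radii = [1.0, 2.0, 3.0]
event = generate_event(num_tracks=5, layer_radii=radii, half_length=10.0, rng=rng)
for track in event:
    print(len(track), "hits")
```

In this sketch the work per track is a fixed loop over the layers, so the cost of generating an event grows in proportion to the number of tracks. The detector geometry (`layer_radii`, `half_length`) and the event size (`num_tracks`) are plain parameters, which is the sense of "parametric" the sketch is meant to convey.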
