Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven Physics Research

Uraz Odyurt, Stephen Nicholas Swatman, Ana-Lucia Varbanescu, Sascha Caron

TL;DR

This paper tackles the computational bottleneck of particle track reconstruction in high-energy physics by proposing REDVID, a complexity-reduced, parametric detector and event simulator designed for simulation-in-the-loop ML workflows. REDVID reduces the behavioural-space and model complexity while preserving essential causal relations, enabling rapid data generation and design-space exploration for ML-based tracking approaches. It introduces a ROM-inspired reduction strategy, a configurable cylindrical detector geometry, multiple track randomisation protocols, and publicly available data sets along with performance benchmarks that show linear scaling of CPU time with track count. The work provides a practical, open-source middle ground between physics-accurate simulators and synthetic data generators, with pedagogical value and clear pathways for extending the framework to broader ML exploration tasks in HEP.

Abstract

Subatomic particle track reconstruction (tracking) is a vital task in High-Energy Physics experiments. Tracking is exceptionally computationally challenging, and fielded solutions, which rely on traditional algorithms, do not scale linearly. Machine Learning (ML) assisted solutions are a promising answer. We argue that a complexity-reduced problem description, together with the data representing it, will facilitate the solution exploration workflow. We provide the REDuced VIrtual Detector (REDVID) as a combined complexity-reduced detector model and particle collision event simulator. REDVID is intended as a simulation-in-the-loop, both to generate synthetic data efficiently and to simplify the challenge of ML model design. In contrast to physics-accurate simulations, our tool is fully parametric with regard to system-level configuration, which allows for the generation of simplified data for research and education at different levels. As a result of the reduced complexity, REDVID is computationally efficient, which we showcase by providing computational cost figures for a multitude of simulation benchmarks. As a simulation and generative tool for ML-assisted solution design, REDVID is highly flexible, reusable and open-source. Reference data sets generated with REDVID are publicly available. Data generated using REDVID has enabled rapid development of multiple novel ML model designs, work that is currently ongoing.

Paper Structure (29 sections, 4 equations, 7 figures, 1 table)

This paper contains 29 sections, 4 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The simulation complexity spectrum, shown from the most simplistic to the most realistic, with high complexity rates for both model and simulator. Depending on the enabled features, different simulators are capable of providing different levels of complexity, depicted as grey areas. ATLFAST is not included because it does not generate hit data. Note that the figure does not cover data reduction strategies, which are not relevant to changes in model or simulator complexity.
  • Figure 2: An overview of a reduced simulation as part of an ML model design workflow, e.g., a Neural Architecture Search (NAS), for which it provides the data set. This paper focuses on the area with the yellow fill, covered by our simulation tool, REDVID.
  • Figure 3: The fully parametric detector geometry, allowing for inclusion/exclusion of different sub-detector types, with full control over sub-layer counts, sizes and placements.
  • Figure 4: An overview of the REDVID modules, including a detector model generator, an event simulator that generates randomised tracks and calculates sub-detector hit points based on tracks and geometric data, and different reporting elements.
  • Figure 5: Basic definition and parameters of the cylindrical coordinate system, radial distance, azimuth and height ($r$, $\theta$, $z$), which is the basis of our geometric structures.
  • ...and 2 more figures
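
The captions for Figures 3 to 5 describe a parametric cylindrical geometry and hit points calculated from tracks and geometric data. The following is a minimal sketch of that idea, not REDVID's actual code or API. It assumes a straight track leaving the origin and a barrel of concentric cylindrical layers; the names `CylPoint`, `to_cartesian`, `straight_track_hits`, `dz_dr`, `layer_radii` and `half_length` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CylPoint:
    """A point in cylindrical coordinates (r, theta, z)."""
    r: float      # radial distance from the z-axis
    theta: float  # azimuth in radians
    z: float      # height along the cylinder axis


def to_cartesian(p: CylPoint) -> tuple[float, float, float]:
    """Convert a cylindrical point to Cartesian (x, y, z)."""
    return (p.r * math.cos(p.theta), p.r * math.sin(p.theta), p.z)


def straight_track_hits(theta, dz_dr, layer_radii, half_length):
    """Hit points of a straight track from the origin on concentric layers.

    Assumption (illustrative only): the track keeps a fixed azimuth `theta`
    and rises by `dz_dr` units of z per unit of r. A layer registers a hit
    only if the crossing lies within +/- `half_length` along z.
    """
    hits = []
    for radius in sorted(layer_radii):
        z = dz_dr * radius
        if abs(z) <= half_length:
            hits.append(CylPoint(radius, theta, z))
    return hits


# Hypothetical three-layer barrel; the outermost crossing falls outside
# the barrel length, so only two hits are recorded.
hits = straight_track_hits(theta=math.pi / 2, dz_dr=0.5,
                           layer_radii=[1.0, 2.0, 4.0], half_length=1.5)
```

Because each hit is a closed-form evaluation per layer, the cost of this toy model grows in proportion to the number of tracks times the number of layers, which is in line with the linear scaling of CPU time with track count mentioned in the TL;DR.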