
A Comprehensive Evaluation of Generative Models in Calorimeter Shower Simulation

Farzana Yasmin Ahmad, Vanamala Venkataswamy, Geoffrey Fox

TL;DR

This work benchmarks three open-source calorimeter-shower surrogates—CaloDiffusion, CaloScore, and CaloINN—against Geant4 across CaloChallenge-2022 datasets, using a standardized suite of physics-, vision-, and statistics-inspired metrics. It additionally investigates full versus mixed-precision inference on GPUs to assess speed-accuracy tradeoffs. The study finds CaloDiffusion and CaloScore to be the most accurate overall, with CaloINN underperforming in several metrics; mixed precision generally preserves fidelity while offering meaningful speedups on higher-dimensional data. The results provide a comprehensive, transferable benchmark that can guide future surrogate-model development and integration into fast-simulation pipelines, while highlighting areas for improvement in fidelity, especially for certain observables and low-dimensional regimes.

Abstract

The pursuit of understanding fundamental particle interactions has reached unparalleled levels of precision. Particle physics detectors play a crucial role in generating low-level object signatures that encode collision physics. However, simulating these particle collisions is demanding in both memory and computation, and the cost will be exacerbated by larger data volumes, more complex detectors, and a higher pileup environment in the High-Luminosity LHC. The introduction of "Fast Simulation" has been pivotal in overcoming computational bottlenecks. The use of deep generative models has sparked a surge of interest in surrogate modeling for detector simulations, generating particle showers that closely resemble the observed data. Nonetheless, there is a pressing need for a comprehensive evaluation of their performance using a standardized set of metrics. In this study, we conducted a rigorous evaluation of three generative models using standard datasets and a diverse set of metrics derived from physics, computer vision, and statistics. Furthermore, we explored the impact of using full versus mixed precision modes during inference. Our evaluation revealed that the CaloDiffusion and CaloScore generative models demonstrate the most accurate simulation of particle showers, yet there remains substantial room for improvement. Our findings identified areas where the evaluated models fell short in accurately replicating Geant4 data.

Paper Structure

This paper contains 37 sections, 7 equations, 30 figures, 5 tables.

Figures (30)

  • Figure 1: Histogram of two physics observables for dataset 2.
  • Figure 2: Layer-wise correlation for dataset 2.
  • Figure 3: Layer-wise correlation for dataset 3.
  • Figure 4: EMD scores and separation power for sparsity with datasets 2 and 3.
  • Figure 5: Comparison of histogram of two physics observables with dataset 3 between full precision and mixed precision mode using CaloDiffusion.
  • ...and 25 more figures
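
The caption of Figure 4 refers to EMD scores and separation power, two histogram-level distances between a Geant4 reference and a generated sample. This excerpt does not give the paper's exact definitions, so the following is only a minimal sketch under stated assumptions: separation power is taken to be the triangular discriminator on normalized histograms, a form commonly used in calorimeter-shower evaluations, and EMD is taken to be the one-dimensional earth mover's distance between equal-size samples. The function names, the default binning, and the choice of NumPy are illustrative and not taken from the paper.

```python
import numpy as np


def separation_power(ref, gen, bins=50, value_range=None):
    """Triangular discriminator between two normalized histograms.

    Assumed definition: S = 0.5 * sum_i (p_i - q_i)^2 / (p_i + q_i).
    S is 0 for identical histograms and 1 for histograms with no shared bins.
    """
    ref = np.asarray(ref, dtype=float)
    gen = np.asarray(gen, dtype=float)
    if value_range is None:
        # Use a common range so both samples share the same bin edges.
        value_range = (min(ref.min(), gen.min()), max(ref.max(), gen.max()))
    h_ref, _ = np.histogram(ref, bins=bins, range=value_range)
    h_gen, _ = np.histogram(gen, bins=bins, range=value_range)
    p = h_ref / h_ref.sum()
    q = h_gen / h_gen.sum()
    denom = p + q
    mask = denom > 0  # skip bins that are empty in both histograms
    return float(0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask]))


def emd_1d(ref, gen):
    """1D earth mover's distance for two samples of equal size.

    For equal-size samples this is the mean absolute gap between the
    sorted values of the two samples.
    """
    ref = np.sort(np.asarray(ref, dtype=float))
    gen = np.sort(np.asarray(gen, dtype=float))
    if ref.shape != gen.shape:
        raise ValueError("this sketch assumes equal-size samples")
    return float(np.mean(np.abs(ref - gen)))


def sparsity(showers, threshold=0.0):
    """Per-shower fraction of voxels at or below a (hypothetical) energy threshold."""
    showers = np.asarray(showers, dtype=float)
    flat = showers.reshape(showers.shape[0], -1)
    return np.mean(flat <= threshold, axis=1)
```

Under these assumptions, a per-shower observable such as `sparsity(showers)` would be computed for both the reference and the generated showers, and the two resulting arrays passed to `separation_power` and `emd_1d`. Lower values of either score indicate closer agreement with the reference.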