Ultra Fast Calorimeter Simulation with Generative Machine Learning on FPGAs

P. Alex May, Qibin Liu, Julia Gonski, Benjamin Nachman

Abstract

Computationally expensive, high-accuracy detector simulations are a major bottleneck for many particle physics experiments, including those at the Large Hadron Collider (LHC) and those planned for future colliders. This challenge has motivated the development of fast surrogates based on generative machine learning. We present a hardware-aware variational autoencoder (VAE) model for fast calorimeter simulation that is designed specifically for field programmable gate array (FPGA) deployment, offering faster and lower-power inference. Quantization-aware training and other compression techniques are applied to respect the resource constraints of a single FPGA. The synthesized implementation of the VAE decoder achieves sub-millisecond latency, a substantial speedup over a traditional GPU implementation with only a small drop in performance. This feasibility study demonstrates the potential of using existing FPGA architecture at the LHC and other facilities for efficient offline computing with online resources.

Paper Structure

This paper contains 12 sections, 7 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Illustration of the dataset geometry. For ease of visualization, the schematic does not show how the granularity along the radial direction varies between layers.
  • Figure 2: Visualization of the VAE model. Forward propagation runs from left to right starting with the preprocessed input vector $x$ and condition $x_\mathrm{con}$ which feed into the encoder to output the vectors $\mu$ and $\sigma$. The condition $x_\mathrm{con}$ and the latent vector $z$ pass through the decoder and output the reconstructed ratios which are concatenated into the output vector $\tilde{x}$.
  • Figure 3: Example of a generated shower at $2^{12}$ MeV produced by the VAE-FPGA model.
  • Figure 4: Average per-layer energy deposition, comparing the Geant4 truth (top) with VAE-FPGA generated showers (bottom).
  • Figure 5: Energy response (left) and voxel energy distribution (right) histograms, comparing the Geant4 truth (gray) with VAE-FPGA generated showers (blue). The separation metric $S$ is provided for each feature and indicates good agreement.
  • ...and 3 more figures
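
The forward propagation described in the Figure 2 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the single hidden layer, the ReLU and sigmoid activations, the log-sigma parameterization, and the helper names (`dense`, `init_layer`, `vae_forward`, `generate`) are all hypothetical. The caption also describes several ratio outputs that are concatenated into $\tilde{x}$; the sketch collapses them into one output layer for brevity.

```python
import math
import random

def dense(vec, weights, bias):
    """Affine layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def relu(vec):
    return [max(0.0, v) for v in vec]

def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (placeholder for trained values)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def decode(z, x_con, params):
    """Decoder: latent vector z and condition x_con -> ratios in (0, 1)."""
    hidden = relu(dense(z + x_con, *params["dec"]))
    # Assumption: a sigmoid keeps each reconstructed ratio between 0 and 1.
    return sigmoid(dense(hidden, *params["out"]))

def vae_forward(x, x_con, params, rng):
    """Training-time pass: encoder -> (mu, sigma) -> sampled z -> decoder."""
    hidden = relu(dense(x + x_con, *params["enc"]))
    mu = dense(hidden, *params["mu"])
    # Assumption: the encoder predicts log(sigma), so sigma is always positive.
    sigma = [math.exp(v) for v in dense(hidden, *params["log_sigma"])]
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return mu, sigma, z, decode(z, x_con, params)

def generate(x_con, params, rng, n_latent):
    """Generation uses only the decoder, with z drawn from a standard normal."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n_latent)]
    return decode(z, x_con, params)

# Hypothetical sizes chosen only to keep the example small.
N_X, N_CON, N_HID, N_LAT = 8, 1, 16, 4
rng = random.Random(0)
params = {
    "enc": init_layer(rng, N_X + N_CON, N_HID),
    "mu": init_layer(rng, N_HID, N_LAT),
    "log_sigma": init_layer(rng, N_HID, N_LAT),
    "dec": init_layer(rng, N_LAT + N_CON, N_HID),
    "out": init_layer(rng, N_HID, N_X),
}

x = [rng.random() for _ in range(N_X)]   # stand-in for a preprocessed shower
x_con = [0.5]                            # stand-in for the condition
mu, sigma, z, x_tilde = vae_forward(x, x_con, params, rng)
sample = generate(x_con, params, rng, N_LAT)
```

Only `decode` is needed once training is done, which is consistent with the abstract's statement that the decoder is the part synthesized for the FPGA.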