Calo-VQ: Vector-Quantized Two-Stage Generative Model in Calorimeter Simulation

Qibin Liu, Chase Shimmin, Xiulong Liu, Eli Shlizerman, Shu Li, Shih-Chieh Hsu

TL;DR

Calo-VQ tackles the computational bottleneck of calorimeter detector simulation with a two-stage generative framework: a VQ-VAE first compresses high-dimensional calorimeter responses into a discrete latent space, and a GPT-style model then samples the latent tokens autoregressively, conditioned on incident energy and auxiliary information. The approach achieves a speedup of more than $2000\times$ over GEANT4, generating showers in milliseconds on Calo-challenge data while reproducing the distributions of energy deposition and shower-shape metrics across four high-granularity datasets. The method uses geometry-aware encoders (including cylindrical convolutions and FFT-based resampling), adapts to both irregular and cylinder-like detector geometries, and scales to as many as 40,500 channels. Although it does not surpass state-of-the-art normalizing-flow or diffusion models in raw accuracy, Calo-VQ provides a practical, flexible speed-accuracy trade-off and points to future improvements via latent diffusion in the second stage. The work thus offers a viable path toward fast, large-scale calorimeter simulation for next-generation high-energy physics experiments.

Abstract

We introduce a novel machine learning method for the fast simulation of calorimeter detector response, adapting the vector-quantized variational autoencoder (VQ-VAE). Our model adopts a two-stage generation strategy: it first compresses geometry-aware calorimeter data into a discrete latent space, then applies a sequence model to learn and generate the latent tokens. Extensive experiments on the Calo-challenge dataset demonstrate the efficiency of our approach, which improves generation speed by a factor of 2000 over the conventional method and generates calorimeter showers within milliseconds. Furthermore, comprehensive quantitative evaluations across various metrics are performed to validate the physics performance of the generated showers.
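
The compression into a discrete latent space can be illustrated with the nearest-codebook lookup at the core of a generic VQ-VAE. The following is only a minimal sketch, not the authors' implementation: the function name `quantize`, the codebook size, and the latent dimension are all assumptions chosen for illustration.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (n, d) array of continuous encoder outputs
    codebook: (k, d) array of code vectors
    Returns the integer tokens (n,) and the looked-up code vectors (n, d).
    """
    # Squared Euclidean distance between every latent and every code: (n, k)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d2.argmin(axis=1)
    return tokens, codebook[tokens]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))  # hypothetical: 8 codes of dimension 4
# Latents built as slightly perturbed copies of codes 3, 0 and 5
latents = codebook[[3, 0, 5]] + 0.01 * rng.normal(size=(3, 4))
tokens, quantized = quantize(latents, codebook)
```

The integer `tokens` are what a second-stage sequence model would learn to generate; a decoder would then map the corresponding code vectors back to calorimeter cells.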

Paper Structure

This paper contains 16 sections, 8 equations, 16 figures, 7 tables, 2 algorithms.

Figures (16)

  • Figure 1: Demonstration of the vector-quantization-based two-stage generative model. The upper and lower parts show the first and second stages of the model, respectively.
  • Figure 2: Architecture of Stage-1 Model.
  • Figure 3: Validation of cylindrical convolution. The equivariance property is well preserved under different transformations.
  • Figure 4: Demonstration of FFT down-sampling.
  • Figure 5: Average energy deposition of calorimeter cells. Each shower is induced by a single incident pion. Generated showers are shown on the top and the reference on the bottom.
  • ...and 11 more figures
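
The captions for Figures 3 and 4 mention cylindrical convolution and FFT down-sampling. As a rough illustration of what those terms usually mean, the sketch below applies a convolution with wrap-around padding along a periodic (azimuthal) axis, checks that it commutes with rotations, and down-samples a periodic signal by truncating its Fourier spectrum. This is a one-dimensional toy under stated assumptions, not the paper's architecture; `circular_conv1d` and `fft_downsample` are hypothetical names.

```python
import numpy as np

def circular_conv1d(x, kernel):
    """Cross-correlate x with an odd-length kernel along a periodic axis."""
    k = len(kernel)
    half = k // 2
    # Wrap-around padding: the last angular bin is a neighbour of the first
    padded = np.pad(x, half, mode="wrap")
    return np.array([np.dot(padded[i:i + k], kernel) for i in range(len(x))])

def fft_downsample(x, m):
    """Resample a periodic signal of length n to m < n bins by keeping
    only the lowest Fourier modes."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    truncated = spectrum[: m // 2 + 1]
    # Rescale so amplitudes are preserved after changing the length
    return np.fft.irfft(truncated, n=m) * (m / n)

rng = np.random.default_rng(0)
x = rng.normal(size=16)
kernel = np.array([0.25, 0.5, 0.25])

# Equivariance check: rotating then convolving equals convolving then rotating
shift_then_conv = circular_conv1d(np.roll(x, 5), kernel)
conv_then_shift = np.roll(circular_conv1d(x, kernel), 5)
equivariant = np.allclose(shift_then_conv, conv_then_shift)

# A band-limited signal is reproduced on the coarser grid
smooth = np.cos(2 * np.pi * np.arange(16) / 16)
coarse = fft_downsample(smooth, 8)
```

Zero padding would break the equivariance check at the seam between the first and last angular bins, which is why the periodic axis is padded by wrapping.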