Calo-VQ: Vector-Quantized Two-Stage Generative Model in Calorimeter Simulation
Qibin Liu, Chase Shimmin, Xiulong Liu, Eli Shlizerman, Shu Li, Shih-Chieh Hsu
TL;DR
Calo-VQ tackles the computational bottleneck of calorimeter detector simulation with a two-stage generative framework: a VQ-VAE first compresses high-dimensional calorimeter responses into a discrete latent space, and a GPT-style model then autoregressively samples the latent tokens, conditioned on incident energy and auxiliary information. The approach achieves a speedup of more than $2000\times$ over GEANT4, delivering millisecond-scale shower generation on Calo-challenge data while maintaining faithful distributions of energy deposition and shower-shape metrics across four high-granularity datasets. The method uses geometry-aware encoders (including cylindrical convolutions and FFT-based resampling) and adapts to both irregular and cylinder-like detector geometries, scaling to as many as 40,500 channels. Although it does not surpass state-of-the-art normalizing-flow or diffusion models in raw accuracy, Calo-VQ provides a practical, flexible speed-accuracy trade-off and points to future improvements via latent diffusion in the second stage. This work thus offers a viable path toward fast, large-scale calorimeter simulations for next-generation high-energy physics experiments.
Abstract
We introduce a novel machine learning method for the fast simulation of calorimeter detector response, adapting the vector-quantized variational autoencoder (VQ-VAE). Our model adopts a two-stage generation strategy: it first compresses geometry-aware calorimeter data into a discrete latent space, and then applies a sequence model to learn and generate the latent tokens. Extensive experimentation on the Calo-challenge dataset demonstrates the efficiency of our approach, improving generation speed by a factor of 2000 compared with the conventional method. Notably, our model generates calorimeter showers within milliseconds. Furthermore, comprehensive quantitative evaluations across various metrics are performed to validate the physics performance of the generated showers.
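To make the first stage concrete, the following is a minimal sketch of the generic vector-quantization step used in VQ-VAE-style models: each continuous latent vector is replaced by the index of its nearest codebook entry, and those indices are the discrete tokens a sequence model would later learn. It is an illustration under stated assumptions, not the authors' implementation; the function name `quantize`, the codebook size, and the latent dimension are all hypothetical, and the encoder that would produce the latents is replaced here by random numbers.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (generic VQ step).

    latents:  array of shape (n_latents, dim), stand-in for encoder outputs
    codebook: array of shape (n_codes, dim), stand-in for learned code vectors
    Returns the token indices and the corresponding quantized vectors.
    """
    # Squared Euclidean distance between every latent and every code,
    # giving an array of shape (n_latents, n_codes).
    diff = latents[:, None, :] - codebook[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # The discrete token is the index of the closest code.
    tokens = dist2.argmin(axis=1)
    return tokens, codebook[tokens]

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 codes of dimension 4
latents = rng.normal(size=(8, 4))     # 8 latent vectors from a pretend encoder

tokens, quantized = quantize(latents, codebook)
```

In a two-stage setup of this kind, the integer array `tokens` is what the second-stage sequence model would be trained to generate, and a decoder would map `quantized` back to the detector response.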
