LEMURS dataset: Large-scale multi-detector ElectroMagnetic Universal Representation of Showers
Peter McKeown, Piyush Raikwar, Anna Zaborowska
TL;DR
LEMURS tackles the need for scalable, cross-detector fast calorimeter simulations by providing a large-scale electromagnetic shower dataset across five detectors with diverse geometries. It introduces the Universal grid Representation to describe showers in a detector-agnostic, high-granularity 3D voxel grid, enabling transfer of fast-simulation concepts between detectors. The dataset comprises nearly 1 million EM showers per detector for training and a carefully designed 1,000-shower testing grid for physics validation, generated with Geant4 Par04 via the ddfastsim workflow and released openly in HDF5. This open resource, together with reproducible code and validation demonstrating practical utility (e.g., CaloDiT-2 pretraining), supports benchmarking, cross-detector studies, and foundation-model development in calorimetry for high-energy physics.
Abstract
We present LEMURS, an extensive dataset of simulated calorimeter showers designed to support the development and benchmarking of fast simulation methods in high-energy physics, most notably as a step towards the development of foundation models. This new dataset is more robust than the well-established CaloChallenge dataset 2, featuring substantially greater statistics, a wider range of incident angles in the detector, and, most crucially, multiple detector geometries (including more realistic calorimeters). The dataset is provided in HDF5 format, with a file structure inspired by the CaloChallenge shower representation but including additional variables. The scale and diversity of LEMURS make it particularly suitable for the development of foundation models, and it has already been used for CaloDiT-2, a pre-trained model released in the community-standard simulation toolkit Geant4 (version 11.4.beta). All data and code for generation and analysis are openly accessible, facilitating reproducibility and reuse across the community.
