Machine learning in LHCb Simulation: From fast to flash

Michał Mazurek

TL;DR

This paper tackles the heavy computational cost of Monte Carlo simulations in LHCb, focusing on the calorimeter as a major bottleneck. It presents two ML-driven strategies. CaloML is a production-ready fast simulation for electromagnetic showers that uses variational autoencoders and profile-aware refinements. Lamarr is a flash simulation framework that parameterizes high-level detector responses through modular ML pipelines (tracking, PID, and calorimeter effects) built on GANs and advanced architectures. CaloML achieves a speedup of up to two orders of magnitude while keeping the systematic error on reconstructed energies as low as $0.01\%$, and is validated on physics observables such as the $B^+$ and $B_s^0$ invariant masses. Lamarr delivers a similar two-orders-of-magnitude reduction in CPU time for the whole simulation phase, validated against detailed Geant4-based samples. The work also outlines concrete integration pathways (Gaussino, scikinC, PyLamarr) and future directions, including more sophisticated calorimeter modeling with Graph Attention Networks and Transformers and broader community deployment.

Abstract

Monte Carlo simulations are essential for physics analyses in high-energy physics, but their computational demands are continuously increasing. In LHCb, 90\% of computing resources are used for simulations, with the calorimeter simulation being the most computationally intensive part. Fast simulations and flash simulations, leveraging machine learning techniques, offer promising solutions to this challenge with different levels of detail and speed. The CaloML framework accelerates electromagnetic shower propagation of photons and electrons in the LHCb calorimeter by up to two orders of magnitude, achieving a systematic error on reconstructed energies as low as 0.01\%. Lamarr is an in-house flash simulation framework that reduces CPU time of the whole simulation phase by two orders of magnitude compared to traditional Geant4-based methods. In this paper, these two approaches are presented, highlighting their methodologies, performance, and validation results, as well as future development plans.

Paper Structure

This paper contains 4 sections and 2 figures.

Figures (2)

  • Figure 1: Preliminary physics validation of the CaloML fast simulation using two benchmark decay channels.
  • Figure 2: Scheme of the Lamarr modular pipeline, illustrating the distinct parameterization paths for charged and neutral particles.