Machine learning in online and offline reconstruction and identification with CMS
Uttiya Sarkar
TL;DR
The paper surveys the expanding role of machine learning in CMS online and offline reconstruction and identification, highlighting a shift from traditional cut-based methods to deep learning and transformer-based approaches to meet the challenges of Run-3 and the HL-LHC. It details key algorithms across jets, taus, electrons, photons, and muons, including ParticleNet in the High-Level Trigger (HLT), UParT for offline heavy-flavor tagging, DeepTau, DeepSuperCluster, and TICL with GNNs for Phase-2 reconstruction. It also emphasizes robustness techniques such as adversarial training and domain adaptation, which mitigate simulation mismodeling and detector effects. Together, these developments improve tagging efficiency, energy resolution, and background rejection, enabling CMS to maximize its physics potential under extreme pileup at the HL-LHC and maintain leadership in ML-driven particle identification.
Abstract
Machine learning (ML) plays an increasingly important role in both online and offline event reconstruction and identification at the CMS experiment. A variety of ML techniques are used to improve the identification of physics objects. Dedicated algorithms enhance jet flavor tagging, including new approaches that strengthen sensitivity to Higgs boson decays to charm quarks. Tau identification has been significantly improved with ML-based methods, while in the electromagnetic calorimeter, ML-driven clustering techniques provide better energy reconstruction. Muon identification also benefits from multivariate approaches, which yield higher signal efficiency and stronger background rejection. Looking ahead, ML will be central to the reconstruction strategy for the High-Granularity Calorimeter at the High-Luminosity LHC (HL-LHC). New algorithms for the upgraded detectors are being developed to cope with extreme pileup conditions. Together, these advances ensure that CMS can fully exploit the physics potential of Run-3 and the HL-LHC, while also exploring novel ML strategies to maintain robust performance under evolving experimental conditions.
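Signal efficiency and background rejection, the two figures of merit mentioned above for identification algorithms, can be illustrated with a minimal sketch. It is not taken from any CMS software: the function name `working_point_metrics`, the toy scores, and the threshold are all hypothetical. Background rejection is assumed here to mean one minus the misidentification rate; some analyses quote the inverse of the misidentification rate instead.

```python
from typing import Sequence, Tuple


def working_point_metrics(
    signal_scores: Sequence[float],
    background_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (signal efficiency, background rejection) for a score cut.

    Hypothetical helper for illustration only. An object is accepted
    when its classifier score is strictly above `threshold`.
    """
    if not signal_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")

    # Fraction of true signal objects that pass the cut.
    signal_pass = sum(1 for s in signal_scores if s > threshold)
    signal_efficiency = signal_pass / len(signal_scores)

    # Fraction of background objects wrongly accepted (misidentification rate).
    background_pass = sum(1 for s in background_scores if s > threshold)
    misid_rate = background_pass / len(background_scores)

    # Assumed convention: rejection = 1 - misidentification rate.
    background_rejection = 1.0 - misid_rate
    return signal_efficiency, background_rejection


# Toy classifier outputs, invented for this example.
signal = [0.95, 0.80, 0.65, 0.40]
background = [0.70, 0.30, 0.20, 0.10, 0.05]

efficiency, rejection = working_point_metrics(signal, background, threshold=0.5)
print(f"signal efficiency:    {efficiency:.2f}")
print(f"background rejection: {rejection:.2f}")
```

Scanning the threshold over the full score range traces out the trade-off between the two quantities, which is how working points for an identification algorithm are typically compared.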
