Machine learning in online and offline reconstruction and identification with CMS

Uttiya Sarkar

TL;DR

The paper surveys the expanding role of machine learning in CMS online and offline reconstruction and identification, highlighting a shift from traditional cut-based methods to deep learning and transformer-based approaches to handle Run-3 and HL-LHC challenges. It details key algorithms across jets, taus, electrons, photons, and muons, including ParticleNet in the HLT, UParT for offline heavy-flavor tagging, DeepTau, DeepSuperCluster, and TICL with GNNs for phase-2 reconstruction. It also emphasizes robustness advances such as adversarial training and domain adaptation to mitigate simulation mismodeling and detector effects. Collectively, these developments enhance tagging efficiency, energy resolution, and background rejection, enabling CMS to maximize physics potential under extreme pileup at the HL-LHC and maintain leadership in ML-driven particle identification.

Abstract

Machine learning (ML) plays an increasingly important role in both online and offline event reconstruction and identification at the CMS experiment. A variety of ML techniques are used to improve the identification of physics objects. Dedicated algorithms enhance jet flavor tagging, including new approaches that strengthen sensitivity to Higgs boson decays to charm quarks. Tau identification has been significantly improved with ML-based methods, while in the electromagnetic calorimeter, ML-driven clustering techniques provide better energy reconstruction. Muon identification also benefits from multivariate approaches, leading to higher signal efficiency and stronger background rejection. Looking ahead, ML will be central to the reconstruction strategy for the High-Granularity Calorimeter at the High-Luminosity LHC (HL-LHC). New algorithms for the upgraded detectors are being developed to cope with extreme pileup conditions. Together, these advances ensure that CMS can fully exploit the physics potential of Run-3 and the HL-LHC, while also exploring novel ML strategies to maintain robust performance under evolving experimental conditions.

Paper Structure (6 sections, 5 figures)

This paper contains 6 sections, 5 figures.

Figures (5)

  • Figure 1: Left: per-jet efficiency for passing the online medium working point (M) vs. the transformed offline BvsAll score during the 2024 data-taking period. The top panel compares the efficiency across run eras, from RunC to RunI, while the bottom panel shows the ratio of each run era to RunC. Right: per-event online PNet b-tag efficiency vs. the mean transformed BvsAll score of the two leading b-jets. The top panel compares 2022, 2023, and 2024, while the bottom panel shows the ratios with respect to the previous year. In both cases, the jets (events) are selected from a $t\bar{t}$-enriched phase space.
  • Figure 2: Left: b-tagging efficiency vs. c/udsg-jet misidentification efficiency, comparing b-taggers starting from DeepJet, used during late Run 2. Right: c-tagging efficiency vs. b/udsg-jet misidentification efficiency. UParT shows state-of-the-art performance in both b- and c-jet tagging efficiency as well as light-jet rejection.
  • Figure 3: Top panels: comparison of DeepTau v2.1 (blue) and v2.5 (yellow) performance, shown as hadronic tau identification efficiency vs. jet misidentification probability. Bottom panels: ratio of the two, in blue. The left plot is for $p_\mathrm{T} < 100$ GeV and the right plot for $p_\mathrm{T} > 100$ GeV. Significant improvements are observed in efficiency and background rejection.
  • Figure 4: Left: comparison of photon energy resolution (Raw/Sim) in bins of generated energy between Mustache and DeepSuperCluster. The bottom panel shows the ratio between the two methods, with DeepSuperCluster being much closer to 1. Right: performance of MVA-based muon identification in bins of muon $p_\mathrm{T}$, compared to traditional cut-based selections. The MVA method shows a gain in efficiency, bringing the Data/MC ratio in the bottom panel much closer to 1.
  • Figure 5: Left: 3D clusters of electromagnetic/hadronic showers reconstructed from RecHits in HGCAL by the TICLv5 algorithm. Right: ROC curve of the GNN-based particle property estimation. Good separation is achieved between the electromagnetic (photon) signal and the hadronic (pion) background.
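
Several of the figures above plot a signal efficiency against a background misidentification rate (Figures 2, 3, and 5). As a minimal sketch of how such a curve can be computed from classifier scores, assuming only NumPy and toy inputs: the function name `roc_points`, the example scores, and the thresholds below are hypothetical illustrations and are not taken from the paper or from CMS software.

```python
import numpy as np


def roc_points(scores, is_signal, thresholds):
    """Return (signal efficiency, background misidentification rate) per threshold.

    Hypothetical helper for illustration only. An object is counted as
    "tagged" when its score is strictly greater than the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)
    sig = scores[is_signal]    # scores of true signal objects (e.g. b-jets)
    bkg = scores[~is_signal]   # scores of background objects (e.g. light jets)
    # Fraction of signal / background objects passing each threshold
    eff = np.array([(sig > t).mean() for t in thresholds])
    misid = np.array([(bkg > t).mean() for t in thresholds])
    return eff, misid


# Toy example: three signal and three background objects (made-up scores)
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_thresholds = [0.0, 0.5, 0.95]

eff, misid = roc_points(toy_scores, toy_labels, toy_thresholds)
for t, e, m in zip(toy_thresholds, eff, misid):
    print(f"threshold={t:.2f}  efficiency={e:.3f}  misid={m:.3f}")
```

Scanning the threshold traces out the curve; a working point such as the medium one mentioned in Figure 1 would correspond to choosing a single threshold along it, although how CMS defines its working points is not specified here.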