Vision Transformers and Graph Neural Networks for Charged Particle Tracking in the ATLAS Muon Spectrometer

Jonathan Renusch

Abstract

The identification and reconstruction of charged particles, such as muons, is a central challenge for the physics program of the ATLAS experiment at the Large Hadron Collider. This task will become increasingly difficult with the start of the High-Luminosity LHC era after 2030, when the number of proton-proton collisions per bunch crossing will increase from 60 to up to 200. This elevated interaction density will also increase the occupancy within the ATLAS Muon Spectrometer, requiring more efficient and robust real-time data processing strategies within the experiment's trigger system, particularly the Event Filter. To address these algorithmic challenges, we present two machine-learning-based approaches. First, we target background-hit rejection in the Muon Spectrometer with Graph Neural Networks integrated into the non-ML baseline reconstruction chain, demonstrating a 15 % improvement in reconstruction speed (from 255 ms to 217 ms per event). Second, we present a proof-of-concept for end-to-end muon tracking using state-of-the-art Vision Transformer architectures, achieving ultra-fast approximate muon reconstruction in 2.3 ms on consumer-grade GPUs at 98 % tracking efficiency.

Paper Structure

This paper contains 8 sections, 1 equation, and 10 figures.

Figures (10)

  • Figure 1: Architecture of the EdgeConv-based GNN used for background-hit rejection. The model operates on graphs constructed from Muon Buckets to achieve high computational efficiency (a minimal code sketch of this idea follows after this list).
  • Figure 2: Comparison of reconstructed muon kinematic distributions for the standard R4 reconstruction (orange) and the chain equipped with the GNN-based Bucket Filter (blue), matched to generator-level muons [GGN_MuonBucketFiltering]. The bottom panels show the ratio between the two methods, indicating no significant loss in performance.
  • Figure 3: Average per-event execution time of the muon reconstruction chain for $Z \to \mu\mu$ events across various pileup levels [GGN_MuonBucketFiltering]. The application of the Bucket Filter yields approximately a 15 % improvement in total processing speed at $\langle\mu\rangle=200$.
  • Figure 4: Transverse momentum distribution of signal muons in the testing dataset used for ViT-based tracking [ATLAS_MDET_2025]. The dataset includes $J/\psi \to \mu\mu$, $t\overline{t}$, and $Z \to \mu\mu$ events simulated at HL-LHC conditions ($\langle\mu\rangle=200$).
  • Figure 5: Schematic of the Transformer-based tracking architecture adapted from the Mask2Former model [Mask2Formerhepattn]. The model treats individual hits as tokens, allowing the decoder to iteratively refine track candidates (queries) through mask-conditioned cross-attention (a hedged sketch of this mechanism follows after this list).
  • ...and 5 more figures
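
The Figure 1 caption summarizes the Bucket Filter as an EdgeConv-based GNN that classifies hits within Muon Buckets. The snippet below is a minimal, hypothetical sketch of that idea in PyTorch Geometric; the input features, layer sizes, and the k-nearest-neighbour graph construction are illustrative assumptions, not the configuration used in the paper.

```python
# Hedged sketch: an EdgeConv-based per-hit classifier for Muon Buckets.
# Feature content (4 placeholder hit features), k-NN graph building and layer
# sizes are assumptions for illustration, not the paper's actual setup.
import torch
from torch import nn
from torch_geometric.nn import EdgeConv, knn_graph

class BucketHitFilter(nn.Module):
    """Scores every hit in a Muon Bucket as signal-like (1) or background-like (0)."""

    def __init__(self, in_dim: int = 4, hidden: int = 64, k: int = 8):
        super().__init__()
        self.k = k
        # EdgeConv applies an MLP to [x_i, x_j - x_i] for every edge and max-aggregates.
        self.conv1 = EdgeConv(nn.Sequential(nn.Linear(2 * in_dim, hidden), nn.ReLU(),
                                            nn.Linear(hidden, hidden)), aggr="max")
        self.conv2 = EdgeConv(nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                            nn.Linear(hidden, hidden)), aggr="max")
        self.head = nn.Linear(hidden, 1)  # per-hit keep/reject logit

    def forward(self, x: torch.Tensor, batch: torch.Tensor) -> torch.Tensor:
        # Connect each hit to its k nearest neighbours within the same bucket.
        edge_index = knn_graph(x, k=self.k, batch=batch)
        h = self.conv1(x, edge_index)
        h = self.conv2(h, edge_index)
        return self.head(h).squeeze(-1)  # train with BCEWithLogitsLoss against truth labels

# Example: one bucket of 200 hits with 4 features each.
hits = torch.randn(200, 4)
bucket_id = torch.zeros(200, dtype=torch.long)
logits = BucketHitFilter()(hits, bucket_id)
keep = torch.sigmoid(logits) > 0.5  # surviving hits go back to the standard reconstruction chain
```

Keeping each graph local to a single Muon Bucket keeps the graphs small, which is what makes this kind of filter cheap enough to run inside the Event Filter before the standard reconstruction chain takes over.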
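
The Figure 5 caption describes the Transformer tracker as a Mask2Former-style decoder in which track-candidate queries attend to hit tokens through mask-conditioned cross-attention. The sketch below illustrates that mechanism only; the embedding of raw hits into tokens, the mask head, and all hyperparameters are assumptions for illustration and do not reproduce the paper's model.

```python
# Hedged sketch of mask-conditioned cross-attention between track queries and hit tokens.
# Dimensions, the dot-product mask head and the 3-step refinement loop are illustrative.
import torch
from torch import nn

class MaskedTrackDecoderLayer(nn.Module):
    """One decoder step: each track query attends only to the hits it currently claims."""

    def __init__(self, dim: int = 128, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, queries, hit_tokens, attn_mask):
        # attn_mask[b, q, h] = True means query q may NOT attend to hit h in event b.
        q, _ = self.cross_attn(queries, hit_tokens, hit_tokens,
                               attn_mask=attn_mask.repeat_interleave(self.cross_attn.num_heads, dim=0))
        queries = self.norm1(queries + q)
        q, _ = self.self_attn(queries, queries, queries)
        queries = self.norm2(queries + q)
        return self.norm3(queries + self.ffn(queries))

def mask_from_queries(queries, hit_tokens):
    """Dot-product mask head: per-query, per-hit assignment logits (Mask2Former-style)."""
    logits = torch.einsum("bqd,bhd->bqh", queries, hit_tokens)
    mask = torch.sigmoid(logits) < 0.5           # block hits the query does not claim
    mask = mask & ~mask.all(dim=-1, keepdim=True)  # if a query claims nothing, leave it unmasked
    return logits, mask

# One event: 1000 hit tokens and 16 track queries of width 128.
hits = torch.randn(1, 1000, 128)
queries = torch.randn(1, 16, 128)
layer = MaskedTrackDecoderLayer()
for _ in range(3):                               # iterative refinement of the track candidates
    _, attn_mask = mask_from_queries(queries, hits)
    queries = layer(queries, hits, attn_mask)
track_logits, _ = mask_from_queries(queries, hits)  # final hit-to-track assignment scores
```

The point of the mask conditioning is that each query's cross-attention is restricted to the hits it currently claims, so successive decoder layers sharpen the hit-to-track assignment rather than re-scanning the full event at every step.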