Spatially Aware Linear Transformer (SAL-T) for Particle Jet Tagging
Aaron Wang, Zihan Zhao, Subash Katel, Vivekanand Gyanchand Sahu, Elham E Khoda, Abhijith Gandrakota, Jennifer Ngadiuba, Richard Cavanaugh, Javier Duarte
TL;DR
Transformers offer strong modeling power for jet tagging but carry $O(n^2)$ attention costs that hinder real-time trigger deployment at the LHC. The Spatially Aware Linear Transformer (SAL-T) combines linear partitioned particle multi-head attention with physics-informed sorting ($k_{\text{T}} = p_{\text{T}} \Delta R$) and a lightweight depthwise 2D convolution to embed local spatial structure while preserving linear-like complexity $O(n p)$. Across jet-tagging benchmarks (and generalizing to ModelNet10), SAL-T outperforms Linformer baselines and matches full-attention transformers in accuracy and AUC while delivering substantially lower memory and latency, illustrating the value of physics-informed locality in low-rank attention. The approach shows potential for trigger-level deployment and broader point-cloud tasks, offering a practical, efficient path to high-performance real-time ML in high-energy physics. These results suggest that integrating locality priors into linear attention can yield substantial gains for large-scale, physics-aware data analyses.
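As an illustration of the physics-informed sorting step, the sketch below computes $k_{\text{T}} = p_{\text{T}} \Delta R$ for each particle, orders the particles by it, and splits them into contiguous partitions. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`kt`, `sort_and_partition`), the tuple layout `(pT, d_eta, d_phi)` measured relative to the jet axis, the descending sort order, and the equal-size chunking are all hypothetical choices made for the example.

```python
import math


def kt(pt, d_eta, d_phi):
    """Return k_T = p_T * Delta R, with Delta R = sqrt(d_eta^2 + d_phi^2).

    Assumes d_eta and d_phi are already measured relative to the jet axis.
    """
    return pt * math.hypot(d_eta, d_phi)


def sort_and_partition(particles, num_partitions):
    """Sort particles by k_T and split them into contiguous chunks.

    The descending order and the equal-size chunks are assumptions made
    for this illustration only.
    """
    ordered = sorted(particles, key=lambda p: kt(*p), reverse=True)
    size = max(1, math.ceil(len(ordered) / num_partitions))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Toy jet with four particles, each given as (pT, d_eta, d_phi).
jet = [
    (50.0, 0.01, 0.02),
    (10.0, 0.30, 0.40),
    (20.0, 0.10, 0.00),
    (5.0, 0.00, 0.20),
]
partitions = sort_and_partition(jet, num_partitions=2)
for index, chunk in enumerate(partitions):
    print(index, [round(kt(*p), 3) for p in chunk])
```

After sorting, particles with similar $k_{\text{T}}$ sit next to each other, so an attention or convolution step that acts on neighbouring entries sees physically related particles.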
Abstract
Transformers are very effective in capturing both global and local correlations within high-energy particle collisions, but they present deployment challenges in high-data-throughput environments, such as the CERN LHC. The quadratic complexity of transformer models demands substantial resources and increases latency during inference. To address these issues, we introduce the Spatially Aware Linear Transformer (SAL-T), a physics-inspired enhancement of the Linformer architecture that maintains linear attention. Our method incorporates spatially aware partitioning of particles based on kinematic features, thereby computing attention between regions of physical significance. Additionally, we employ convolutional layers to capture local correlations, informed by insights from jet physics. In addition to outperforming the standard Linformer in jet classification tasks, SAL-T also achieves classification results comparable to full-attention transformers, while using considerably fewer resources with lower latency during inference. Experiments on a generic point cloud classification dataset (ModelNet10) further confirm this trend. Our code is available at https://github.com/aaronw5/SAL-T4HEP.
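To make the complexity argument concrete, the following sketch shows generic Linformer-style attention in plain Python: keys and values are first compressed from $n$ rows to $p$ rows, so the score matrix has shape $n \times p$ rather than $n \times n$. This is a simplified single-head illustration, not SAL-T itself; the helper names are hypothetical, and the fixed averaging matrix `e` stands in for projections that would be learned in practice.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e_i / total for e_i in exps]


def projected_attention(q, k, v, e, f):
    """Linformer-style attention with sequence-length projections.

    q, k, v are n x d matrices; e and f are p x n projection matrices
    (learned in a real model, fixed here for illustration).
    Returns the n x d output and the n x p attention weights.
    """
    d = len(q[0])
    k_proj = matmul(e, k)                            # p x d
    v_proj = matmul(f, v)                            # p x d
    k_proj_t = [list(col) for col in zip(*k_proj)]   # d x p
    scores = matmul(q, k_proj_t)                     # n x p, not n x n
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v_proj), weights


# Toy input: n = 4 tokens, d = 2 features, p = 2 projected rows.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k = [row[:] for row in q]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
# Hypothetical projection: average adjacent pairs of tokens.
e = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

out, weights = projected_attention(q, k, v, e, e)
print(len(weights), "x", len(weights[0]), "attention weights")
```

With $n$ tokens and $p$ projected rows, the score matrix holds $n p$ entries instead of $n^2$, which is where the memory and latency savings come from when $p \ll n$.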
