Spatially Aware Linear Transformer (SAL-T) for Particle Jet Tagging

Aaron Wang, Zihan Zhao, Subash Katel, Vivekanand Gyanchand Sahu, Elham E Khoda, Abhijith Gandrakota, Jennifer Ngadiuba, Richard Cavanaugh, Javier Duarte

TL;DR

Transformers offer strong modeling power for jet tagging, but their $O(n^2)$ attention cost hinders real-time trigger deployment at the LHC. The Spatially Aware Linear Transformer (SAL-T) combines linear partitioned particle multi-head attention with physics-informed sorting ($k_{\mathrm{T}} = p_{\mathrm{T}} \Delta R$) and a lightweight depthwise 2D convolution to embed local spatial structure while preserving linear-like complexity $O(np)$, where $n$ is the number of particles and $p$ the number of partitions. Across jet-tagging benchmarks (and generalizing to ModelNet10), SAL-T outperforms Linformer baselines and matches full-attention transformers in accuracy and AUC while delivering substantially lower memory use and latency, illustrating the value of physics-informed locality in low-rank attention. The approach shows potential for trigger-level deployment and broader point-cloud tasks, offering a practical, efficient path to high-performance real-time ML in high-energy physics. These results suggest that integrating locality priors into linear attention can yield substantial gains for large-scale, physics-aware data analyses.
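The physics-informed sorting step can be illustrated with a short NumPy sketch. It computes $k_{\mathrm{T}} = p_{\mathrm{T}} \Delta R$ for each constituent, sorts by it, and splits the sorted indices into $p$ groups. The assumptions here are ours, not taken from the SAL-T code: $\Delta\eta$ and $\Delta\phi$ are already measured relative to the jet axis, $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$, the order is descending, and the groups are contiguous chunks of near-equal size. The function name `kt_sort_and_partition` is hypothetical.

```python
import numpy as np

def kt_sort_and_partition(pt, deta, dphi, p):
    """Sort constituents by k_T = p_T * Delta R and split them into p groups.

    Hypothetical sketch: assumes deta/dphi are relative to the jet axis,
    sorts in descending k_T, and uses contiguous, near-equal-size groups.
    """
    pt = np.asarray(pt, dtype=float)
    dr = np.hypot(deta, dphi)                 # angular distance to the jet axis
    kt = pt * dr                              # k_T = p_T * Delta R
    order = np.argsort(-kt, kind="stable")    # constituent indices, descending k_T
    groups = np.array_split(order, p)         # p contiguous partitions of indices
    return kt, order, groups

# Toy jet with four constituents (illustrative values only).
pt = [10.0, 5.0, 20.0, 1.0]
deta = [0.1, 0.3, 0.0, 0.4]
dphi = [0.0, 0.4, 0.2, 0.3]
kt, order, groups = kt_sort_and_partition(pt, deta, dphi, p=2)
print(order.tolist(), [g.tolist() for g in groups])  # [2, 1, 0, 3] [[2, 1], [0, 3]]
```

Sorting before splitting means each group collects constituents with similar $k_{\mathrm{T}}$, which is what lets a later attention step operate on physically meaningful regions rather than on arbitrary index ranges.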

Abstract

Transformers are highly effective at capturing both global and local correlations within high-energy particle collisions, but they present deployment challenges in high-data-throughput environments such as the CERN LHC. The quadratic complexity of transformer models demands substantial resources and increases latency during inference. To address these issues, we introduce the Spatially Aware Linear Transformer (SAL-T), a physics-inspired enhancement of the Linformer architecture that maintains linear attention. Our method incorporates spatially aware partitioning of particles based on kinematic features, thereby computing attention between regions of physical significance. Additionally, we employ convolutional layers to capture local correlations, informed by insights from jet physics. In addition to outperforming the standard Linformer in jet classification tasks, SAL-T achieves classification results comparable to full-attention transformers while using considerably fewer resources and achieving lower inference latency. Experiments on a generic point cloud classification dataset (ModelNet10) further confirm this trend. Our code is available at https://github.com/aaronw5/SAL-T4HEP.
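To make the idea of attention between partitions concrete, here is a single-head NumPy sketch under simplifying assumptions. Each of the $n$ queries attends to $p$ partition summaries, so the attention map has shape $n \times p$ instead of $n \times n$. A small kernel is then convolved over that map to mix neighboring entries. The mean over each partition's keys and values is a hypothetical stand-in for the paper's partition-wise projection, whose exact form is not given here. The names `partitioned_attention` and `depthwise_conv2d_same` and the $3 \times 3$ smoothing kernel are also illustrative assumptions. This is not the SAL-T implementation from the linked repository.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv2d_same(a, kernel):
    """Apply one small kernel to one 2D map (one 'channel'), zero 'same' padding.
    Implemented as cross-correlation, as in most deep-learning frameworks."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(a, ((ph, ph), (pw, pw)))
    out = np.zeros_like(a)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + a.shape[0], j:j + a.shape[1]]
    return out

def partitioned_attention(x, wq, wk, wv, groups, kernel):
    """Hypothetical single-head sketch: n queries attend to p partition summaries."""
    q, k, v = x @ wq, x @ wk, x @ wv                        # each (n, d)
    k_p = np.stack([k[g].mean(axis=0) for g in groups])    # (p, d), mean is an assumption
    v_p = np.stack([v[g].mean(axis=0) for g in groups])    # (p, d)
    attn = softmax(q @ k_p.T / np.sqrt(q.shape[-1]))       # (n, p): O(n p) scores
    attn = depthwise_conv2d_same(attn, kernel)             # local mixing on the map
    return attn @ v_p, attn                                # (n, d), (n, p)

rng = np.random.default_rng(0)
n, m, d, p = 12, 4, 8, 3
x = rng.normal(size=(n, m))
wq, wk, wv = (rng.normal(size=(m, d)) for _ in range(3))
groups = np.array_split(np.arange(n), p)    # rows assumed already k_T-sorted
kernel = np.full((3, 3), 1.0 / 9.0)         # hypothetical smoothing kernel
out, attn = partitioned_attention(x, wq, wk, wv, groups, kernel)
print(out.shape, attn.shape)  # (12, 8) (12, 3)
```

With zero padding, the smoothed rows no longer sum exactly to one. A learned kernel (one per head, hence "depthwise") can absorb this, but whether SAL-T renormalizes afterwards is not stated here.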

Paper Structure

This paper contains 14 sections, 5 equations, 8 figures, 16 tables.

Figures (8)

  • Figure 1: (Left) Jet constituents partitioned and sorted by $k_{\mathrm{T}}$ in the $(\Delta\eta, \Delta\phi)$ plane in SAL-T, showing how constituents are binned spatially before projection. (Center) Jet constituents partitioned by transverse momentum in the $(\Delta\eta, \Delta\phi)$ plane. (Right) Visualization of the projection partitioning strategy used in SAL-T. Jet constituents are partitioned into spatial bins before projection, preserving local structure.
  • Figure 2: (Left) Architecture of the linear partitioned particle multi-head attention (LPP-MHA) module used in SAL-T. The input query, key, and value sequences of dimension $n \times m$ are linearly projected to dimension $n \times d$ and then spatially partitioned into $p$ groups, giving a representation of size $p \times d$. Attention weights are computed via scaled dot-product attention within each partition, followed by a depthwise convolution over the attention map to promote local context mixing. The resulting attention matrix is used to aggregate value representations, forming the basis for the output of the attention layer. This design preserves computational efficiency while retaining locality-aware expressivity. (Right) A one-layer SAL-T model.
  • Figure 3: (Left) Jet classification accuracy of SAL-T, Linformer, and a standard transformer across bins of increasing number of particles per jet. SAL-T consistently matches or exceeds the accuracy of Linformer and remains competitive with full transformers. Performance variance in the highest bin (115–150 particles) is attributable to its small sample size (only 41 jets). (Right) Floating-point operation (FLOP) counts as a function of sequence length for the three models. While transformer FLOPs grow quadratically with input length, SAL-T maintains nearly linear scaling, closely tracking Linformer while offering improved performance on high-multiplicity jets.
  • Figure 4: Attention matrices for a top quark jet with 81 particles. Each trio of attention plots represents a separate head of the respective models. The convolutional layer on top of the attention smooths the attention values of SAL-T, demonstrating how convolution helps SAL-T leverage immediate neighbors. In the top-right and bottom-left attention plots, SAL-T attention concentrates on a single partition before convolution, suggesting that SAL-T identifies the partitions most relevant to the jet.
  • Figure 5: Attention matrices for a top quark jet with 121 particles. Each trio of attention matrices represents a separate head of the respective models. The convolutional layer on top of the attention smooths the attention values of SAL-T, demonstrating how convolution helps SAL-T leverage immediate neighbors.
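The scaling contrast in Figure 3 (Right) follows from a back-of-the-envelope count. Full attention forms an $n \times n$ score matrix, while partitioned attention forms an $n \times p$ one, so with $p$ fixed the cost grows linearly in $n$. The sketch below is a rough hypothetical model, not the paper's FLOP counter. It counts only the multiply-accumulates of the two attention matrix products (scores and weighted sum) plus an optional $k \times k$ convolution pass over the $n \times p$ map, and it ignores the Q/K/V/output projections and the softmax. The values $d = 16$ and $p = 8$ are arbitrary choices for illustration.

```python
def attention_core_macs(n, d, p=None, conv_k=0):
    """Rough multiply-accumulate count for the attention core.

    p=None: full attention, n queries against n keys.
    p=int:  n queries against p partition summaries, plus an optional
            conv_k x conv_k convolution pass over the (n, p) attention map.
    Projections and softmax are ignored (hypothetical simplified model).
    """
    keys = n if p is None else p
    macs = 2 * n * keys * d                 # scores (Q K^T) and weighted sum (A V)
    if p is not None and conv_k:
        macs += n * p * conv_k * conv_k     # one kernel pass over the attention map
    return macs

for n in (32, 64, 128, 150):
    full = attention_core_macs(n, d=16)
    part = attention_core_macs(n, d=16, p=8, conv_k=3)
    print(n, full, part, round(full / part, 1))
```

Doubling $n$ quadruples the full-attention count but only doubles the partitioned count, which is the quadratic-versus-linear behavior the figure reports.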