
Economical Jet Taggers -- Equivariant, Slim, and Quantized

Antoine Petitjean, Tilman Plehn, Jonas Spinner, Ullrich Köthe

TL;DR

The paper addresses the challenge of resource-intensive jet tagging at the LHC by designing slim, quantized, Lorentz-equivariant transformers. It introduces L-GATr-slim, a Lorentz-equivariant transformer with a scalar–vector latent representation, and benchmarks it against the LLoCa-Transformer on jet tagging, amplitude regression, and event generation, showing competitive performance at significantly reduced compute. The study reports substantial resource savings, up to orders of magnitude in training efficiency and a six-fold reduction in energy cost, for a moderate decrease in performance. These savings rely on quantization-aware training with PARQ and the straight-through estimator (STE). The results point to trigger-level jet tagging and online deployment as potential applications; the code is public, and the authors outline further hardware-oriented optimizations.
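
To make the idea of a scalar–vector latent space concrete, here is a minimal Python sketch of a hypothetical Lorentz-equivariant layer. It is not the L-GATr-slim implementation: the names `equivariant_layer`, `minkowski_dot`, and `boost_z` and the weight layout are assumptions for illustration. Vector channels are mixed only through linear combinations, which commute with any Lorentz transformation. Scalars receive information from the vectors only through Minkowski inner products, which are invariant. The check at the end boosts the inputs along z and confirms that the scalar outputs stay unchanged while the vector outputs transform like four-momenta.

```python
import math


def minkowski_dot(p, q):
    """Minkowski inner product with signature (+, -, -, -) for (E, px, py, pz)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]


def boost_z(p, beta):
    """Lorentz boost of a four-vector along z with velocity beta (|beta| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    e, px, py, pz = p
    return (gamma * (e - beta * pz), px, py, gamma * (pz - beta * e))


def equivariant_layer(vectors, scalars, w_vec, w_sca, w_mix):
    """Hypothetical scalar-vector layer (illustration only, not L-GATr-slim).

    w_vec[o][i]: weight of input vector channel i in output vector channel o.
    w_sca[o][i]: weight of input scalar channel i in output scalar channel o.
    w_mix[o][k]: weight of the k-th invariant <v_i, v_j> (i <= j) in scalar o.
    """
    # Linear combinations of four-vectors are four-vectors: equivariant.
    new_vectors = [
        tuple(sum(w * v[mu] for w, v in zip(row, vectors)) for mu in range(4))
        for row in w_vec
    ]
    # Pairwise Minkowski products are Lorentz invariant.
    invariants = [
        minkowski_dot(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i, len(vectors))
    ]
    # Scalars mix only with each other and with the invariants.
    new_scalars = [
        sum(w * s for w, s in zip(row_s, scalars))
        + sum(w * x for w, x in zip(row_m, invariants))
        for row_s, row_m in zip(w_sca, w_mix)
    ]
    return new_vectors, new_scalars


# Usage: two particles (four-momenta), one scalar feature, arbitrary weights.
vecs = [(10.0, 1.0, 2.0, 3.0), (5.0, -1.0, 0.5, 2.0)]
scas = [0.3]
w_vec = [[0.5, -1.2], [2.0, 0.1]]
w_sca = [[1.5], [-0.7]]
w_mix = [[0.01, 0.02, -0.03], [0.0, 0.1, 0.05]]  # 3 invariants for 2 vectors

out_v, out_s = equivariant_layer(vecs, scas, w_vec, w_sca, w_mix)
boosted_in = [boost_z(v, 0.6) for v in vecs]
boost_v, boost_s = equivariant_layer(boosted_in, scas, w_vec, w_sca, w_mix)

# Scalars are invariant and vectors are equivariant under the boost.
print(all(abs(a - b) < 1e-9 for a, b in zip(out_s, boost_s)))  # True
print(all(
    abs(x - y) < 1e-9
    for v, bv in zip(out_v, boost_v)
    for x, y in zip(boost_z(v, 0.6), bv)
))  # True
```

Restricting the vector update to channel mixing keeps equivariance exact by construction, at the price of never letting scalars rescale vectors. A real architecture would typically add such gated interactions on top.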

Abstract

Modern machine learning is transforming jet tagging at the LHC, but the leading transformer architectures are large, not particularly fast, and training-intensive. We present a slim version of the L-GATr tagger, reduce the number of parameters of jet-tagging transformers, and quantize them. We compare different quantization methods for standard and Lorentz-equivariant transformers and estimate their gains in resource efficiency. We find a six-fold reduction in energy cost for a moderate performance decrease, down to 1000-parameter taggers. This might be a step towards trigger-level jet tagging with small and quantized versions of the leading equivariant transformer architectures.
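
Quantization-aware training keeps full-precision latent weights, quantizes them in the forward pass, and needs a gradient through the non-differentiable rounding step. The straight-through estimator (STE) handles this by passing the gradient through unchanged. The pure-Python sketch below assumes symmetric uniform quantization with a fixed scale and a toy linear model. `quantize_uniform`, `ste_grad`, and `train_qat` are hypothetical helpers, not the paper's code or the PARQ API.

```python
def quantize_uniform(w, scale, n_bits=4):
    """Symmetric uniform quantization: round w/scale to a signed n_bits integer."""
    q_max = 2 ** (n_bits - 1) - 1
    q = max(-q_max, min(q_max, round(w / scale)))
    return q * scale


def ste_grad(w, grad_q, scale, n_bits=4):
    """Straight-through estimator: treat rounding as the identity in the
    backward pass, but block the gradient where w is clipped."""
    q_max = 2 ** (n_bits - 1) - 1
    return grad_q if abs(w) <= q_max * scale else 0.0


def train_qat(xs, ys, lr=0.05, steps=200, scale=0.25, n_bits=4):
    """Fit y = w . x with quantized weights; returns latent and quantized w."""
    w = [0.0] * len(xs[0])  # full-precision latent weights
    for _ in range(steps):
        # The forward pass uses only the quantized weights.
        wq = [quantize_uniform(wi, scale, n_bits) for wi in w]
        grads = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(a * b for a, b in zip(wq, x)) - y
            for i, xi in enumerate(x):
                # Gradient of the mean of 0.5 * err**2 with respect to wq[i].
                grads[i] += err * xi / len(xs)
        # STE: apply the gradient w.r.t. wq directly to the latent weights.
        w = [wi - lr * ste_grad(wi, g, scale, n_bits) for wi, g in zip(w, grads)]
    return w, [quantize_uniform(wi, scale, n_bits) for wi in w]


# Usage: targets generated by w* = (0.75, -0.5), which lie on the 0.25 grid.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
ys = [0.75 * x0 - 0.5 * x1 for x0, x1 in xs]
w_latent, w_quant = train_qat(xs, ys)
print(w_quant)  # [0.75, -0.5]
```

Because the target weights lie on the quantization grid, training stops once the quantized weights reach them. The latent weights only need to land inside the correct rounding interval, which is the slack that STE exploits.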

Paper Structure

This paper contains 14 sections, 18 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Left: Progress in top tagging through advanced network architectures over time. Right: Efficiency of tagging architectures on the JetClass dataset.
  • Figure 2: Event generation performance of L-GATr-slim for top pair production with additional jets. We compare it to the original results for L-GATr, Transformer, and LLoCa-Transformer [Favaro:2025pgz].
  • Figure 3: Top taggers with a decreasing number of parameters, down to 1000. We decrease the number of blocks from 10 to 4, 2, and 1 (left) or keep it fixed at 10 (right). The left panel includes results for taggers pretrained on the JetClass dataset. ParT requires at least two blocks in the standard implementation, so we do not show a 1000-parameter version.
  • Figure 4: Regularizer function $R(\theta_k)$ for ternary quantization (left), proximal map $\text{prox}_R(\tilde{\theta}_k)$ for soft quantization at time $t$ (middle), and final proximal map for hard quantization (right). The proximal map parametrized by $\rho(t)$ corresponds to the regularizer function defined with $a_0 = (1+\rho(t)) q_1/2$ and $b_1 = a_0 q_1$.
  • Figure 5: Energy consumption per jet for different taggers and quantization approaches. We show old (left) and new, still in-progress (right) results.
  • ...and 1 more figure
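
Figure 4 describes a soft-to-hard annealing of a ternary quantizer. The sketch below illustrates only the qualitative behavior under a simplifying assumption: a symmetric piecewise-affine map onto {-q1, 0, +q1} controlled by a hypothetical schedule parameter `s` in [0, 1], with a dead zone of half-width s·q1/2 and saturation beyond q1 - s·q1/2. This is not the PARQ regularizer or its proximal map, and `s` is not the paper's ρ(t). At s = 0 the map is the identity on [-q1, q1]; at s = 1 it reduces to nearest-level rounding, which we take to correspond to the hard-quantization panel.

```python
import math


def hard_ternary(theta, q1):
    """Nearest-level rounding of theta onto {-q1, 0, +q1}."""
    if abs(theta) <= q1 / 2:
        return 0.0
    return q1 if theta > 0 else -q1


def soft_ternary(theta, q1, s):
    """Illustrative piecewise-affine map onto [-q1, q1] (not PARQ's prox).

    s in [0, 1] is a hypothetical annealing parameter:
    s = 0 gives the identity on [-q1, q1] (clipped outside),
    s = 1 gives hard_ternary.
    """
    a = 0.5 * s * q1       # dead zone |theta| <= a is mapped to 0
    b = q1 - 0.5 * s * q1  # |theta| >= b is mapped to +-q1
    mag = abs(theta)
    if mag <= a:
        out = 0.0
    elif mag >= b:
        out = q1
    else:
        # Affine ramp from (a, 0) to (b, q1); only reached when a < b.
        out = q1 * (mag - a) / (b - a)
    return math.copysign(out, theta)


# Usage: anneal s from 0 to 1 and watch weights move toward ternary levels.
q1 = 0.3
for s in (0.0, 0.5, 1.0):
    print(s, [round(soft_ternary(t, q1, s), 4) for t in (-0.2, 0.1, 0.2)])
```

At intermediate s, magnitudes above q1/2 are pushed toward q1 and magnitudes below it toward 0, because the ramp passes through (q1/2, q1/2) with slope at least one. This gradual sharpening is the behavior the middle panel of Figure 4 depicts for the actual proximal map.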