
A Lorentz-Equivariant Transformer for All of the LHC

Johann Brehmer, Víctor Bresó, Pim de Haan, Tilman Plehn, Huilin Qu, Jonas Spinner, Jesse Thaler

TL;DR

The paper introduces L-GATr, a Lorentz-equivariant transformer built in spacetime geometric algebra to efficiently handle relativistic LHC data. By encoding inputs as multivectors and crafting equivariant linear, attention, and normalization operations, the model preserves Lorentz symmetry while allowing controlled symmetry breaking via reference vectors. It delivers strong results across three tasks: high-precision amplitude regression, improved jet tagging with pre-training and multiclass capabilities, and state-of-the-art Lorentz-equivariant event generation using conditional flow matching (CFM), a diffusion-like generative framework. The approach yields significant performance gains over existing architectures, with practical benefits in data efficiency and scalability, and comes with public code to enable reproducibility and further development.
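The multivector encoding and the equivariant linear and attention operations can be illustrated with a small NumPy sketch under simplifying assumptions. Four-momenta are placed in the vector (grade-1) slots of a 16-component multivector of the spacetime algebra Cl(1,3). The linear layer is reduced to one weight per grade, which is only a subset of the equivariant linear maps. Attention weights come from the invariant inner product, the scalar part of the reverse of x times y. The basis ordering and every function name below are hypothetical and not taken from the L-GATr code. The Lorentz transformation acts only on the vector part, which is valid here because these inputs carry no bivector or higher-grade components.

```python
import itertools
import numpy as np

# Hypothetical basis ordering for Cl(1,3): the 16 blades e_A grouped by grade 0..4.
METRIC = np.array([1.0, -1.0, -1.0, -1.0])  # e_0^2 = +1, e_i^2 = -1
BLADES = [b for k in range(5) for b in itertools.combinations(range(4), k)]
GRADE = np.array([len(b) for b in BLADES])
# reverse(e_A) * e_A is the product of e_i^2 over i in A, so the inner product is diagonal.
SIGN = np.array([np.prod(METRIC[list(b)]) for b in BLADES])

def embed_momentum(p):
    """Place four-momenta (E, px, py, pz) in the grade-1 slots (indices 1..4)."""
    mv = np.zeros(p.shape[:-1] + (16,))
    mv[..., 1:5] = p
    return mv

def inner(x, y):
    """Invariant inner product <reverse(x) y>_0 in this diagonal basis."""
    return np.sum(SIGN * x * y, axis=-1)

def equi_linear(x, w):
    """Grade-wise rescaling: one weight per grade, so grades are never mixed."""
    return w[GRADE] * x

def equi_attention(q, k, v):
    """Softmax over invariant inner products, applied to multivector values."""
    logits = inner(q[:, None, :], k[None, :, :]) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

def boost_z(rapidity):
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    return np.array([[ch, 0, 0, sh], [0, 1, 0, 0], [0, 0, 1, 0], [sh, 0, 0, ch]])

def transform(mv, lam):
    """Boost the vector part only; valid because these multivectors have grades 0 and 1 only."""
    out = mv.copy()
    out[..., 1:5] = mv[..., 1:5] @ lam.T
    return out

rng = np.random.default_rng(0)
n, mass = 5, 1.0
p3 = rng.normal(size=(n, 3))
energy = np.sqrt(mass**2 + np.sum(p3**2, axis=-1))
x = embed_momentum(np.column_stack([energy, p3]))

w_q, w_k, w_v = (rng.normal(size=5) for _ in range(3))
lam = boost_z(0.7)

def layer(x):
    return equi_attention(equi_linear(x, w_q), equi_linear(x, w_k), equi_linear(x, w_v))

lhs = transform(layer(x), lam)  # layer first, then boost
rhs = layer(transform(x, lam))  # boost first, then layer
print(np.allclose(lhs, rhs))              # True
print(np.allclose(inner(x, x), mass**2))  # True
```

Both checks print True: applying the layer and then boosting matches boosting first, and the invariant norm of an embedded momentum is its squared mass. A full implementation would also need the geometric product and the transformation rules for bivectors and higher grades.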

Abstract

We show that the Lorentz-Equivariant Geometric Algebra Transformer (L-GATr) yields state-of-the-art performance for a wide range of machine learning tasks at the Large Hadron Collider. L-GATr represents data in a geometric algebra over space-time and is equivariant under Lorentz transformations. The underlying architecture is a versatile and scalable transformer, which is able to break symmetries if needed. We demonstrate the power of L-GATr for amplitude regression and jet classification, and then benchmark it as the first Lorentz-equivariant generative network. For all three LHC tasks, we find significant improvements over previous architectures.
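The ability to break symmetries when needed, which the TL;DR attributes to reference vectors, can be illustrated with a short NumPy sketch. It assumes the lab time direction and the beam axis as references; this choice and all names are ours, for illustration only. Minkowski products with the references become input features. Holding the references fixed leaves only the symmetries that preserve them, such as rotations about the beam axis, while transforming them together with the momenta restores full Lorentz invariance.

```python
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric, signature (+, -, -, -)

def minkowski(a, b):
    return np.sum((a @ ETA) * b, axis=-1)

# Hypothetical reference vectors: lab time direction and beam (z) axis.
T_REF = np.array([1.0, 0.0, 0.0, 0.0])
Z_REF = np.array([0.0, 0.0, 0.0, 1.0])

def features(p, t_ref=T_REF, z_ref=Z_REF):
    """Invariant of p alone (m^2) plus its products with the reference vectors."""
    return np.stack([minkowski(p, p), minkowski(p, t_ref), minkowski(p, z_ref)], axis=-1)

def rotation_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def boost_z(rapidity):
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    return np.array([[ch, 0, 0, sh], [0, 1, 0, 0], [0, 0, 1, 0], [sh, 0, 0, ch]])

rng = np.random.default_rng(0)
p3 = rng.normal(size=(6, 3))
p = np.column_stack([np.sqrt(1.0 + np.sum(p3**2, axis=1)), p3])  # mass-1 particles

rot, boost = rotation_z(0.4), boost_z(0.6)
# Fixed references: rotations about the beam survive, boosts along it do not.
invariant_under_rotation = np.allclose(features(p @ rot.T), features(p))
invariant_under_boost = np.allclose(features(p @ boost.T), features(p))
# Transforming the references together with p restores full invariance.
invariant_with_moved_refs = np.allclose(
    features(p @ boost.T, boost @ T_REF, boost @ Z_REF), features(p))
print(invariant_under_rotation, invariant_under_boost, invariant_with_moved_refs)
# True False True
```

Treating the references as ordinary inputs keeps the network itself formally equivariant; the symmetry breaking enters only through which reference values are supplied.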

Paper Structure

This paper contains 11 sections, 28 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Scaling behavior of L-GATr, a standard transformer, and the equivariant graph network CGENN. The left panel was previously discussed in Ref. Spinner:2024hjm. Error bands show the mean and standard deviation over three separate evaluations; the spread originates from GPU nondeterminism and the timing software.
  • Figure 2: Left: prediction error from L-GATr and all baselines for $Z+ng$ amplitudes with increasing particle multiplicity. All networks are trained on $4 \times 10^5$ points. Right: prediction error as a function of the training dataset size. Error bands show the mean and standard deviation over five random seeds for the network weight initialization. These figures are also included in Ref. Spinner:2024hjm.
  • Figure 3: Prediction error from L-GATr and selected baselines for $Z+5g$ amplitudes. Here, all networks are reduced in size and trained on $4 \times 10^4$ points. Error bands show the mean and standard deviation over five random seeds for the network weight initialization.
  • Figure 4: AUC metric on JetClass as a function of the training dataset fraction (left) and the history of top taggers (right).
  • Figure 5: To construct the L-GATr velocity, we extract equivariantly predicted multivectors and symmetry-breaking scalars. We go back and forth between the parametrization $x$ and Minkowski space $p$ using the mapping $f$ from Eq. \ref{eq:momentum_rep}.
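Figure 5 refers to a mapping $f$ between a parametrization $x$ and Minkowski-space momenta $p$. The exact form of the paper's mapping is not reproduced in this section, so the NumPy sketch below assumes a common collider parametrization $x = (\log p_T, \eta, \phi, \log m^2)$; the function names are hypothetical. It implements $f$ and its inverse and checks the round trip.

```python
import numpy as np

def x_to_p(x):
    """Assumed f: x = (log pT, eta, phi, log m^2) -> p = (E, px, py, pz)."""
    log_pt, eta, phi, log_m2 = np.moveaxis(x, -1, 0)
    pt, m2 = np.exp(log_pt), np.exp(log_m2)
    px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
    energy = np.sqrt(m2 + (pt * np.cosh(eta)) ** 2)  # |p|^2 = pT^2 cosh^2(eta)
    return np.stack([energy, px, py, pz], axis=-1)

def p_to_x(p):
    """Inverse map, assuming pT > 0 and m^2 > 0."""
    energy, px, py, pz = np.moveaxis(p, -1, 0)
    pt = np.hypot(px, py)
    eta = np.arcsinh(pz / pt)
    phi = np.arctan2(py, px)
    m2 = energy**2 - px**2 - py**2 - pz**2
    return np.stack([np.log(pt), eta, phi, np.log(m2)], axis=-1)

rng = np.random.default_rng(1)
n = 1000
x = np.column_stack([
    rng.uniform(0.0, 3.0, n),    # log pT
    rng.uniform(-2.0, 2.0, n),   # eta
    rng.uniform(-3.0, 3.0, n),   # phi, inside (-pi, pi] so arctan2 recovers it
    rng.uniform(0.0, 2.0, n),    # log m^2
])
p = x_to_p(x)
print(np.allclose(p_to_x(p), x))  # True
```

Working with logarithms of $p_T$ and $m^2$ in $x$ keeps both quantities positive by construction, which is one plausible reason to generate in $x$ rather than directly in $p$.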
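Figure 4 reports the AUC, the area under the ROC curve. As a reminder of what the metric measures, the NumPy sketch below computes a binary AUC as the probability that a randomly chosen signal jet scores higher than a randomly chosen background jet, with ties counting one half. It also computes a one-vs-rest macro average for multiclass outputs. How the paper aggregates AUC over the JetClass classes is not stated in this section, so the macro average is an assumption, and both function names are hypothetical.

```python
import numpy as np

def binary_auc(sig_scores, bkg_scores):
    """P(score_sig > score_bkg) + 0.5 * P(tie), over all signal/background pairs."""
    sig = np.asarray(sig_scores, dtype=float)
    bkg = np.asarray(bkg_scores, dtype=float)
    diff = sig[:, None] - bkg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def macro_auc_ovr(probs, labels):
    """Average of one-vs-rest AUCs over classes (assumed aggregation)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    aucs = [binary_auc(probs[labels == c, c], probs[labels != c, c])
            for c in range(probs.shape[1])]
    return float(np.mean(aucs))

print(binary_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75: 3 of 4 pairs ranked correctly
```

The pairwise difference matrix needs memory proportional to the product of the two sample sizes; for large test sets, a rank-based formula or scikit-learn's `roc_auc_score` is the practical choice.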