Sub-microsecond Transformers for Jet Tagging on FPGAs
Lauri Laatu, Chang Sun, Arianna Cox, Abhijith Gandrakota, Benedikt Maier, Jennifer Ngadiuba, Zhiqiang Que, Wayne Luk, Maria Spiropulu, Alexander Tapper
TL;DR
This paper tackles the challenge of real-time jet tagging at the HL-LHC by delivering the first sub-microsecond transformer implementation on an FPGA, achieving $\mathcal{O}(100)$ ns latency while remaining competitive with state-of-the-art baselines. The authors employ an encoder-only transformer with either vanilla multi-head attention (MHA) or Linformer-based linear attention. The model operates on sequences of up to 64 particles, each described by $(p_\mathrm{T},\eta,\phi)$, without positional encoding, and is trained with High Granularity Quantization (HGQ) and an EBOPs-driven objective so that it fits on hardware. A key contribution is the Linformer variant, which provides scalable, accurate performance at all sequence lengths and is shown to outperform baselines in FPGA resource efficiency and latency. The work also extends hls4ml with linear attention support, enabling broader adoption for real-time high-energy physics tasks, and points to future applications in jet tagging under high pile-up, particle reconstruction, and foundation-model components. The results demonstrate that transformer-based approaches can power next-generation real-time triggers in high-energy physics and beyond, with practical FPGA deployment guiding both hardware and algorithmic design.
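To make the linear-attention idea concrete, the following is a minimal NumPy sketch of a single Linformer-style attention head acting on a jet of 64 particles with features $(p_\mathrm{T},\eta,\phi)$. It is an illustration under assumptions, not the authors' model: the embedding width `d_model`, the projected length `k_proj`, the single head, and the random weights are all hypothetical choices made here. The point it shows is that keys and values are compressed along the sequence axis before attention, so the score matrix has shape (n, k) rather than (n, n).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def linformer_attention(h, w_q, w_k, w_v, proj_k, proj_v):
    """Single-head Linformer-style attention.

    h:       (n, d) embedded particle sequence
    w_q/k/v: (d, d) query/key/value weights
    proj_k:  (k, n) sequence-axis projection for keys
    proj_v:  (k, n) sequence-axis projection for values
    """
    q = h @ w_q                    # (n, d)
    k = proj_k @ (h @ w_k)         # (k, d): keys compressed from n to k rows
    v = proj_v @ (h @ w_v)         # (k, d): values compressed the same way
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, k) instead of (n, n)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights    # output is (n, d)


# Hypothetical sizes: 64 particles, 3 input features, no positional encoding.
rng = np.random.default_rng(0)
n_particles, n_features, d_model, k_proj = 64, 3, 16, 8

x = rng.normal(size=(n_particles, n_features))      # stand-in for (pT, eta, phi)
w_embed = rng.normal(size=(n_features, d_model))    # assumed linear embedding
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
proj_k = rng.normal(size=(k_proj, n_particles))
proj_v = rng.normal(size=(k_proj, n_particles))

out, weights = linformer_attention(x @ w_embed, w_q, w_k, w_v, proj_k, proj_v)
print(out.shape, weights.shape)
```

With these sizes the attention weights occupy 64 x 8 entries rather than 64 x 64, which is the kind of reduction that makes the operation easier to fit within a fixed hardware budget.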
Abstract
We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exceptional performance on multiple tasks in modern machine learning applications, including jet tagging at the CERN Large Hadron Collider (LHC). However, their computational complexity has so far prohibited their use in real-time applications such as the hardware trigger systems of the collider experiments. In this work, we demonstrate the first application of transformers for jet tagging on FPGAs, achieving $\mathcal{O}(100)$ nanosecond latency with superior performance compared to alternative baseline models. We leverage high-granularity quantization and distributed arithmetic optimization to fit the entire transformer model on a single FPGA, achieving the required throughput and latency. Furthermore, we add multi-head attention and linear attention support to hls4ml, making our work accessible to the broader fast machine learning community. This work advances next-generation trigger systems for the High Luminosity LHC, enabling the use of transformers for real-time applications in high-energy physics and beyond.
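As a rough illustration of what fine-grained quantization means in practice, the sketch below rounds each weight to a fixed-point grid with its own number of fractional bits. It is a generic example under assumptions and does not reproduce the HGQ training procedure or any library API: the function name `quantize_fixed_point`, the per-weight bit allocation, and the omission of range clipping are choices made here for brevity.

```python
import numpy as np


def quantize_fixed_point(w, frac_bits):
    """Round each weight to a grid of step 2**-frac_bits.

    w:         float array of weights
    frac_bits: integer array of the same shape, one bitwidth per weight
               (hypothetical allocation; no clipping of the integer range)
    """
    scale = 2.0 ** frac_bits          # per-weight scale factor
    return np.round(w * scale) / scale


rng = np.random.default_rng(1)
w = rng.uniform(-1.0, 1.0, size=(4, 4))

# Assumed allocation: between 2 and 6 fractional bits, chosen per weight.
frac_bits = rng.integers(2, 7, size=w.shape)

w_q = quantize_fixed_point(w, frac_bits)
max_err = np.abs(w - w_q).max()
print(max_err)
```

Each quantized weight differs from the original by at most half a grid step, $2^{-(f+1)}$ for $f$ fractional bits, so assigning fewer bits to a weight trades precision for a cheaper multiplier in hardware.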
