Fast Jet Tagging with MLP-Mixers on FPGAs

Chang Sun, Jennifer Ngadiuba, Maurizio Pierini, Maria Spiropulu

TL;DR

The paper tackles real-time jet tagging at the L1 trigger level under HL-LHC constraints by deploying MLP-Mixer architectures on FPGAs. It combines High-Granularity Quantization and Distributed Arithmetic within the hls4ml/Vitis HLS workflow to produce bit-accurate, resource-efficient firmware, achieving state-of-the-art accuracy on a realistic jet dataset while reducing hardware resource usage by up to 97%. The MLP-Mixer models match or surpass prior architectures in accuracy while doubling the throughput and halving the latency, and heterogeneous bitwidths let them prioritize the most informative features. This work demonstrates the practicality of deploying advanced ML for real-time data processing at particle colliders and outlines avenues for further hardware-aware optimizations.

Abstract

We explore the innovative use of MLP-Mixer models for real-time jet tagging and establish their feasibility on resource-constrained hardware such as FPGAs. MLP-Mixers excel in processing sequences of jet constituents, achieving state-of-the-art performance on datasets mimicking Large Hadron Collider conditions. By using advanced optimization techniques such as High-Granularity Quantization and Distributed Arithmetic, we achieve unprecedented efficiency. These models match or surpass the accuracy of previous architectures, reduce hardware resource usage by up to 97%, double the throughput, and halve the latency. Additionally, non-permutation-invariant architectures enable smart feature prioritization and efficient FPGA deployment, setting a new benchmark for machine learning in real-time data processing at particle colliders.

Paper Structure

This paper contains 22 sections, 11 figures, 6 tables.

Figures (11)

  • Figure I: Distribution of the number of particles per jet in the jet tagging dataset. On the right, only particles with $p_T \ge 2$ GeV are considered.
  • Figure II: Architecture of the MLP-Mixer models used in this work. Each model consists of four MLP blocks with a single skip-connection. The input channel size matches the number of per-particle input features. The implementation of each MLP and the classification head is shown in the corresponding dashed blocks. MLP1 and MLP3 act on the feature dimension; MLP2 and MLP4 act on the particle dimension. DenseBn represents a dense layer followed by a batch normalization layer during training, which are fused into a single layer during inference. The exact implementation of the network can be found in the repository in Section \ref{sec:data_avail}.
  • Figure III: Workflow for training and deploying the MLP-Mixer models.
  • Figure IV: Accuracy vs. LUT usage and latency for the quantized MLP-Mixer and JEDI-net models, each trained with 16 features per particle. Dashed lines indicate the accuracy of the corresponding full-precision models.
  • Figure V: Accuracy vs. LUT usage and latency for the quantized MLP and JEDI-net models, each trained with 16 features per particle. Dashed lines indicate the accuracy of the corresponding full-precision models. Models marked with "$\times$" successfully underwent HDL synthesis but failed timing closure during the place & route phase. The LUT usage of these models can still be used for reference, but their reported latencies are not accurate.
  • ...and 6 more figures
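
The Figure II caption mentions two mechanisms that a small example may make more concrete: a dense layer followed by batch normalization can be folded into a single dense layer for inference, and the same dense layer can mix along either the feature or the particle dimension depending on which axis it is applied to. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The function names (`dense`, `fuse_dense_bn`, `mix_features`, `mix_particles`), the toy shapes, the numeric values, and `eps` are all hypothetical; activations, the skip-connection, and quantization are omitted.

```python
import math

def dense(x, W, b):
    """Apply y = x @ W + b row by row (x: rows x n_in, W: n_in x n_out)."""
    return [[sum(xi * W[i][j] for i, xi in enumerate(row)) + b[j]
             for j in range(len(b))] for row in x]

def batch_norm(y, gamma, beta, mean, var, eps=1e-3):
    """Inference-time batch normalization over the last axis."""
    return [[gamma[j] * (v - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
             for j, v in enumerate(row)] for row in y]

def fuse_dense_bn(W, b, gamma, beta, mean, var, eps=1e-3):
    """Fold the batch-norm statistics into the dense weights and bias."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_fused = [[w * scale[j] for j, w in enumerate(row)] for row in W]
    b_fused = [(b[j] - mean[j]) * scale[j] + beta[j] for j in range(len(b))]
    return W_fused, b_fused

def transpose(x):
    return [list(col) for col in zip(*x)]

def mix_features(x, W, b):
    """Dense layer on the feature axis (x: particles x features)."""
    return dense(x, W, b)

def mix_particles(x, W, b):
    """Dense layer on the particle axis: transpose, apply, transpose back."""
    return transpose(dense(transpose(x), W, b))

# Toy input (assumed values): 3 particles, 2 features per particle.
x = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]

# Feature-mixing layer with hypothetical weights and batch-norm statistics.
W = [[0.5, -1.0], [2.0, 1.0]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.5], [0.0, 1.0]
mean, var = [0.2, -0.3], [4.0, 0.25]

# Training-time view: dense followed by batch normalization.
reference = batch_norm(mix_features(x, W, b), gamma, beta, mean, var)

# Inference-time view: one fused dense layer.
W_fused, b_fused = fuse_dense_bn(W, b, gamma, beta, mean, var)
fused = mix_features(x, W_fused, b_fused)

max_err = max(abs(r - f) for row_r, row_f in zip(reference, fused)
              for r, f in zip(row_r, row_f))

# Particle-mixing layer: a 3 x 3 weight matrix acting across particles.
W_p = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.25, 0.25, 1.0]]
b_p = [0.0, 0.0, 0.0]
mixed = mix_particles(x, W_p, b_p)
```

The fused layer reproduces the dense-plus-batch-norm output up to floating-point rounding, so only one multiply-accumulate stage is needed at inference. The particle-mixing call returns an array with the same particles-by-features shape as its input, which is what allows feature-mixing and particle-mixing blocks to be chained.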