
TrackCore-F: Deploying Transformer-Based Subatomic Particle Tracking on FPGAs

Arjan Blankestijn, Uraz Odyurt, Amirreza Yousefzadeh

TL;DR

This work tackles the challenge of deploying Transformer-based subatomic particle tracking on FPGAs to meet High-Luminosity LHC (HL-LHC) latency and energy constraints. It introduces TrackCore-F, a methodology for monolithic or partitioned Transformer synthesis for inference. The methodology is built around two encoder-only TrackFormers designs (EncCla for classification and EncReg for regression) and a practical deployment flow from ONNX through Vitis HLS and Vivado to a ZCU102 platform. Quantization studies show that quantizing activations costs more accuracy than quantizing weights: baselines reach around $0.97$, while mixed INT8/INT16 configurations yield accuracies of $\sim$0.70–0.90. BRAM is a key bottleneck: deployments built from single encoder layers are limited to roughly four layers unless memory trade-offs are accepted. The results demonstrate the feasibility of FPGA-based, low-latency ML-assisted tracking. They also provide a concrete workflow and resource-budget analysis to guide future hardware deployments for HL-LHC-scale problems, enabling energy-efficient on-site processing. Detectors at the LHC rely on measurements of momentum $p$ and energy $E$, from which mass follows via $E^2 = (mc^2)^2 + (pc)^2$; this physics background motivates the hardware-focused approach.
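The quantization finding can be illustrated with a small sketch. It assumes a generic symmetric, per-tensor quantizer, which is not necessarily the scheme used in this work. Under that quantizer, a tensor with a wide dynamic range (as activations often have) loses more precision at a given bit width than a narrow-range tensor, and INT16 recovers most of that loss. The function names and toy data below are hypothetical. This illustrates one commonly cited mechanism only; it is not evidence for the accuracies reported above.

```python
"""Symmetric per-tensor fake quantization (generic sketch, not the paper's scheme)."""
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed ints in [-(2^(bits-1) - 1), 2^(bits-1) - 1] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer levels back to floats."""
    return [x * scale for x in q]


def max_abs_error(values: List[float], bits: int) -> float:
    """Worst-case absolute round-trip error after quantize -> dequantize."""
    q, scale = quantize_symmetric(values, bits)
    return max(abs(a - b) for a, b in zip(values, dequantize(q, scale)))


if __name__ == "__main__":
    # Hypothetical "weights": a narrow range of small values.
    weights = [0.01 * i for i in range(-50, 51)]
    # Hypothetical "activations": the same values plus one large outlier,
    # which inflates the shared scale and coarsens every other level.
    activations = weights + [40.0]
    for bits in (8, 16):
        print(bits, max_abs_error(weights, bits), max_abs_error(activations, bits))
```

Because a single scale is shared by the whole tensor, one outlier widens the step size for all other values. Per-channel scales or higher activation bit widths (as in mixed INT8/INT16 configurations) are common ways to limit that effect.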

Abstract

The Transformer Machine Learning (ML) architecture has been gaining considerable momentum in recent years. In particular, computational High-Energy Physics tasks such as jet tagging and particle track reconstruction (tracking) have either achieved proper solutions or reached considerable milestones using Transformers. On the other hand, specialised hardware accelerators, especially FPGAs, are an effective means of achieving online or pseudo-online latencies. The development and integration of Transformer-based ML on FPGAs is still ongoing, and support from current tools ranges from very limited to non-existent. Additionally, FPGA resources present a significant constraint. Considering model size alone, smaller models can be deployed directly, whereas larger models must be partitioned in a meaningful and, ideally, automated way. We aim to develop methodologies and tools for monolithic or partitioned Transformer synthesis, specifically targeting inference. Our primary use-case involves two machine learning model designs for tracking, derived from the TrackFormers project. We elaborate on our development approach, present preliminary results, and provide comparisons.
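The need to partition larger models can be made concrete with a back-of-envelope budget. The sketch below rests on several assumptions:

- a standard encoder layer (Q/K/V and output projections, a two-layer feed-forward block, two LayerNorms);
- a BRAM36 data capacity of 32 Kib when parity bits are unused;
- 912 BRAM36 blocks on the ZCU102's XCZU9EG device.

All function names and dimensions are hypothetical, not taken from the TrackFormers models. The weights-only estimate is a lower bound, since intermediate activation buffers and HLS array partitioning typically push real BRAM usage well above it. The greedy `partition_layers` helper is one simple way to group consecutive layers into slices under a per-slice budget. It is not the partitioning method of this work.

```python
"""Back-of-envelope BRAM budgeting and greedy layer slicing (hypothetical sketch)."""
import math
from typing import List

# Assumption: one BRAM36 stores 32 Kib of data when parity bits are unused
# (e.g. 4K x 8 or 2K x 16 configurations).
BRAM36_DATA_BITS = 32 * 1024
# Assumption: the XCZU9EG device on the ZCU102 provides 912 BRAM36 blocks.
ZCU102_BRAM36 = 912


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Weights + biases of one standard encoder layer (hypothetical layout)."""
    attention = 4 * (d_model * d_model + d_model)          # Q, K, V, output projection
    ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model  # two linear layers
    layer_norms = 2 * 2 * d_model                           # gamma and beta, twice
    return attention + ffn + layer_norms


def bram36_for_params(n_params: int, bits: int) -> int:
    """Lower bound on BRAM36 blocks if all weights sit on-chip at `bits` bits each."""
    return math.ceil(n_params * bits / BRAM36_DATA_BITS)


def partition_layers(layer_costs: List[int], budget: int) -> List[List[int]]:
    """Greedily group consecutive layer indices into slices whose summed cost fits `budget`."""
    slices: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, cost in enumerate(layer_costs):
        if cost > budget:
            raise ValueError(f"layer {idx} alone needs {cost} > budget {budget}")
        if used + cost > budget:
            # Close the current slice and start a new one with this layer.
            slices.append(current)
            current, used = [], 0
        current.append(idx)
        used += cost
    if current:
        slices.append(current)
    return slices


if __name__ == "__main__":
    # Hypothetical dimensions, not the TrackFormers models' actual sizes.
    n = encoder_layer_params(d_model=64, d_ff=128)
    print(n, bram36_for_params(n, 8), bram36_for_params(n, 16))
    # Made-up per-layer BRAM costs, e.g. as read from HLS synthesis reports.
    print(partition_layers([300] * 5, ZCU102_BRAM36))
```

In practice, the per-layer costs passed to `partition_layers` would come from synthesis reports rather than from the weights-only estimate. Keeping slices contiguous matches the sequential structure of an encoder stack, so each slice's output feeds the next.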

Paper Structure

This paper contains 9 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: The development flow describing the handling of pre-trained ML models and the preparations for selective slice deployment on an FPGA.