TrackCore-F: Deploying Transformer-Based Subatomic Particle Tracking on FPGAs
Arjan Blankestijn, Uraz Odyurt, Amirreza Yousefzadeh
TL;DR
This work tackles the challenge of deploying Transformer-based subatomic particle tracking on FPGAs to meet HL-LHC latency and energy constraints. It introduces TrackCore-F, a methodology for monolithic or partitioned Transformer synthesis for inference. The methodology is built around two encoder-only designs from the TrackFormers project (EncCla for classification and EncReg for regression) and a practical deployment flow via ONNX, Vitis HLS, and Vivado on a ZCU102 platform. Quantization studies show that quantizing activations costs more accuracy than quantizing weights: baselines reach accuracies of around $0.97$, while mixed INT8/INT16 configurations yield accuracies of $\sim$0.70–0.90. BRAM is a key bottleneck, limiting deployment to roughly four encoder layers unless memory trade-offs are accepted. The results demonstrate the feasibility of FPGA-based, low-latency, ML-assisted tracking. They also provide a concrete workflow and resource-budget analysis to guide future hardware deployments for HL-LHC-scale problems, working towards energy-efficient on-site processing. As physics background, detectors at the LHC measure momentum $p$ and energy $E$, from which a particle's mass follows via $E^2 = (mc^2)^2 + (pc)^2$.
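To make the INT8/INT16 trade-off concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not the TrackCore-F quantization flow. The functions `quantize`, `dequantize`, and `max_abs_error` and the sample activation values are hypothetical. The sketch only illustrates one commonly cited factor: a single large value stretches the quantization scale, so small values lose resolution at narrow bit widths.

```python
def quantize(values, bits):
    """Symmetric per-tensor quantization to signed `bits`-bit integer codes.

    Returns (codes, scale) with value ~= code * scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 32767 for INT16
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # round() maps each value to the nearest integer code; clamp to the symmetric range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to real values."""
    return [c * scale for c in codes]


def max_abs_error(values, bits):
    """Largest absolute round-trip error after quantize/dequantize."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical activation tensor: mostly small values plus one outlier.
activations = [0.01, -0.02, 0.03, 0.005, 8.0]
for bits in (8, 16):
    codes, _ = quantize(activations, bits)
    print(f"INT{bits}: codes={codes}, max error={max_abs_error(activations, bits):.6f}")
```

With these hypothetical activations, all four small values round to code 0 at INT8, while INT16 keeps every round-trip error below half of its much smaller step size. Per-channel scales or range clipping are common mitigations in general. Their effect on the TrackFormers models would need to be measured.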
Abstract
The Transformer Machine Learning (ML) architecture has been gaining considerable momentum in recent years. In particular, computational High-Energy Physics tasks such as jet tagging and particle track reconstruction (tracking) have either achieved proper solutions or reached considerable milestones using Transformers. On the other hand, specialised hardware accelerators, especially FPGAs, are an effective way to achieve online or pseudo-online latencies. The development and integration of Transformer-based ML on FPGAs is still ongoing, and support from current tools ranges from very limited to non-existent. Additionally, FPGA resources present a significant constraint. Considering model size alone, smaller models can be deployed directly, whereas larger models must be partitioned in a meaningful and, ideally, automated way. We aim to develop methodologies and tools for monolithic or partitioned Transformer synthesis, specifically targeting inference. Our primary use case involves two machine learning model designs for tracking, derived from the TrackFormers project. We elaborate our development approach, present preliminary results, and provide comparisons.
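As an illustration of what automated partitioning could look like, the following sketch greedily groups consecutive layers into partitions whose estimated BRAM cost fits a per-device budget. The per-layer costs, the budget, and the function name `partition_layers` are hypothetical. The XCZU9EG device on the ZCU102 provides 912 BRAM36 blocks in total, but the share usable by a model is lower once buffers and interfaces are placed. The partitions of a sequential layer stack must stay contiguous. Under this simple additive cost model, filling each partition as far as possible yields the minimum number of partitions.

```python
from typing import List


def partition_layers(layer_costs: List[int], budget: int) -> List[List[int]]:
    """Split a sequential stack of layers into contiguous partitions.

    layer_costs[i] is a (hypothetical) BRAM estimate for layer i; each
    returned partition is a list of layer indices whose summed cost
    does not exceed `budget`.
    """
    partitions: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, cost in enumerate(layer_costs):
        if cost > budget:
            # A layer that does not fit alone would need intra-layer splitting.
            raise ValueError(f"layer {idx} needs {cost} blocks; budget is {budget}")
        if used + cost > budget:
            # Close the current partition and start a new one with this layer.
            partitions.append(current)
            current, used = [], 0
        current.append(idx)
        used += cost
    if current:
        partitions.append(current)
    return partitions


# Hypothetical costs in BRAM36 blocks: embedding, six encoder layers, output head.
costs = [100, 250, 250, 250, 250, 250, 250, 50]
print(partition_layers(costs, budget=700))  # → [[0, 1, 2], [3, 4], [5, 6, 7]]
```

Real BRAM usage in HLS also depends on array partitioning and port requirements, not only on raw storage. A practical partitioner would therefore need cost estimates from synthesis reports rather than a simple sum.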
