Enabling Low-Latency Machine Learning on Radiation-Hard FPGAs with hls4ml
Katya Govorkova, Julian Garcia Pardinas, Vladimir Loncar, Victoria Nguyen, Sebastian Schmitt, Marco Pizzichemi, Loris Martinazzoli, Eluned Anne Smith
TL;DR
The paper tackles the challenge of deploying ultra-low-latency machine learning at the detector front-end in harsh radiation environments. It introduces a lightweight autoencoder that compresses a $32$-sample calorimeter pulse into a $2$-D latent space, paired with hardware-aware quantization to 10-bit fixed-point weights ($<10,4>$) with minimal performance loss. A core contribution is a new hls4ml backend targeting Microchip SmartHLS, which enables automated synthesis and deployment on radiation-hard PolarFire FPGAs; synthesis of the autoencoder indicates a latency of $25$ ns and a throughput of $40$ MHz. The work shows that on-detector ML is viable end to end in HL-LHC contexts and provides open-source tooling to broaden adoption across high-radiation applications. This paves the way for scalable, low-latency data compression and real-time reconstruction in future collider experiments and related domains.
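To make the $<10,4>$ notation concrete, the following is a minimal sketch of fixed-point weight quantization. It assumes the notation follows the `ap_fixed<W,I>` convention (10 total bits, 4 integer bits including the sign, hence 6 fractional bits). The function name `quantize_fixed`, the round-to-nearest rule, and the saturation behaviour are illustrative choices, not the paper's quantization strategy.

```python
import math

def quantize_fixed(x, total_bits=10, int_bits=4):
    """Map a float to the nearest value on a signed fixed-point grid.

    Assumption: <total_bits, int_bits> follows the ap_fixed<W,I> convention,
    where int_bits includes the sign bit. Rounding to nearest and saturating
    at the range limits are choices made for this sketch only.
    """
    frac_bits = total_bits - int_bits        # 6 fractional bits for <10,4>
    scale = 1 << frac_bits                   # 64 steps per unit
    lo = -(1 << (total_bits - 1))            # smallest integer code: -512
    hi = (1 << (total_bits - 1)) - 1         # largest integer code: 511
    code = math.floor(x * scale + 0.5)       # round to the nearest grid point
    code = max(lo, min(hi, code))            # saturate instead of wrapping
    return code / scale

step = 1 / (1 << (10 - 4))                   # grid spacing: 0.015625
weights = [0.1, -0.3, 0.5, 100.0, -100.0]    # hypothetical example values
quantized = [quantize_fixed(w) for w in weights]
```

Under these assumptions the representable range is $[-8, 8 - 2^{-6}]$ with a step of $2^{-6}$, so any in-range weight moves by at most $2^{-7}$ when quantized.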
Abstract
This paper presents the first demonstration of a viable, ultra-fast, radiation-hard machine learning (ML) application on FPGAs, which could be used in future high-energy physics experiments. We present a three-fold contribution, using the PicoCal calorimeter, planned for the LHCb Upgrade II experiment, as a test case. First, we develop a lightweight autoencoder to compress a 32-sample timing readout, representative of that of the PicoCal, into a two-dimensional latent space. Second, we introduce a systematic, hardware-aware quantization strategy and show that the model can be reduced to 10-bit weights with minimal performance loss. Third, we develop a new backend for hls4ml, the High-Energy Physics community's standard ML synthesis tool, whose lack of support for radiation-hard FPGAs has been a barrier to the adoption of on-detector ML. This new backend enables the automatic translation of ML models into High-Level Synthesis (HLS) projects for the Microchip PolarFire family of FPGAs, one of the few commercially available radiation-hard FPGA families. We present the synthesis of the autoencoder on a target PolarFire FPGA, which indicates that a latency of 25 ns can be achieved. We show that the resources utilized are low enough that the model can be placed within the inherently protected logic of the FPGA. Our extension to hls4ml is a significant contribution, paving the way for broader adoption of ML on FPGAs in high-radiation environments.
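The data flow of the autoencoder (32 samples in, 2 latent values, 32 samples reconstructed) can be sketched as below. This is a shape-only illustration: the single linear layer on each side, the random placeholder weights, and all names (`dense`, `random_matrix`, `pulse`) are assumptions made for the example, since the abstract does not specify the layer structure or activations.

```python
import random

N_SAMPLES = 32    # length of the timing readout, from the abstract
LATENT_DIM = 2    # size of the latent space, from the abstract

def dense(x, weights, biases):
    """Apply one fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def random_matrix(rows, cols, rng):
    """Placeholder weights; a real model would use trained values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# Hypothetical one-layer encoder and decoder (architecture is an assumption).
enc_w, enc_b = random_matrix(LATENT_DIM, N_SAMPLES, rng), [0.0] * LATENT_DIM
dec_w, dec_b = random_matrix(N_SAMPLES, LATENT_DIM, rng), [0.0] * N_SAMPLES

pulse = [rng.random() for _ in range(N_SAMPLES)]   # stand-in for a readout
latent = dense(pulse, enc_w, enc_b)                # 32 values -> 2 values
reconstruction = dense(latent, dec_w, dec_b)       # 2 values -> 32 values

compression_factor = N_SAMPLES / LATENT_DIM        # values in per value out
```

With these dimensions, only two values per pulse would need to leave the detector front-end, a sixteen-fold reduction in the number of values; the reconstruction quality depends entirely on the trained weights, which this sketch does not model.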
