Enabling Low-Latency Machine learning on Radiation-Hard FPGAs with hls4ml

Katya Govorkova; Julian Garcia Pardinas; Vladimir Loncar; Victoria Nguyen; Sebastian Schmitt; Marco Pizzichemi; Loris Martinazzoli; Eluned Anne Smith

TL;DR

The paper addresses the deployment of ultra-low-latency machine learning at the detector front-end in harsh radiation environments. It introduces a lightweight autoencoder that compresses a $32$-sample calorimeter pulse into a $2$-D latent space, paired with hardware-aware quantization to $<10,4>$ fixed-point weights that preserves physics-relevant information. A core contribution is a new hls4ml backend targeting Microchip SmartHLS, which enables automated synthesis for radiation-hard PolarFire FPGAs; synthesis results indicate a $25$ ns latency and a $40$ MHz throughput. The work supports the end-to-end viability of on-detector ML in HL-LHC contexts and provides open-source tooling to broaden adoption across high-radiation applications, including low-latency data compression and real-time reconstruction in future collider experiments and related domains.

Abstract

This paper presents the first demonstration of a viable, ultra-fast, radiation-hard machine learning (ML) application on FPGAs, which could be used in future high-energy physics experiments. We present a three-fold contribution, with the PicoCal calorimeter, planned for the LHCb Upgrade II experiment, used as a test case. First, we develop a lightweight autoencoder to compress a 32-sample timing readout, representative of that of the PicoCal, into a two-dimensional latent space. Second, we introduce a systematic, hardware-aware quantization strategy and show that the model can be reduced to 10-bit weights with minimal performance loss. Third, because a barrier to the adoption of on-detector ML is the lack of support for radiation-hard FPGAs in hls4ml, the High-Energy Physics community's standard ML synthesis tool, we develop a new backend for this library. This new backend enables the automatic translation of ML models into High-Level Synthesis (HLS) projects for the Microchip PolarFire family of FPGAs, one of the few commercially available radiation-hard FPGA families. We present the synthesis of the autoencoder on a target PolarFire FPGA, which indicates that a latency of 25 ns can be achieved. We show that the resources utilized are low enough that the model can be placed within the inherently protected logic of the FPGA. Our extension to hls4ml is a significant contribution, paving the way for broader adoption of ML on FPGAs in high-radiation environments.
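
To make the 10-bit weight format concrete, the following is a minimal sketch of rounding values onto a $<10,4>$ fixed-point grid. It assumes that $<10,4>$ follows the `ap_fixed<W, I>` convention common in HLS tools (10 total bits, 4 integer bits including the sign, hence 6 fractional bits), and it uses round-to-nearest with saturation, which may differ from the rounding and overflow modes used in the paper. The function name `quantize_fixed` and the example weights are hypothetical.

```python
import numpy as np

def quantize_fixed(x, total_bits=10, int_bits=4):
    """Round x onto a signed fixed-point grid and saturate at the range limits.

    Assumption: ap_fixed<W, I>-style layout, i.e. W total bits and I integer
    bits including the sign bit, leaving W - I fractional bits.
    """
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits                 # smallest representable increment
    lo = -(2.0 ** (int_bits - 1))            # most negative representable value
    hi = 2.0 ** (int_bits - 1) - step        # most positive representable value
    # np.round rounds exact halves to the nearest even grid point.
    q = np.round(np.asarray(x, dtype=np.float64) / step) * step
    return np.clip(q, lo, hi)

# Hypothetical weights, chosen to show rounding and saturation.
weights = np.array([0.1, -0.3337, 7.999, -9.0])
print(quantize_fixed(weights))
```

Under these assumptions the grid spacing is $2^{-6} = 0.015625$ and the representable range is $[-8, 7.984375]$, so values outside that range are clipped rather than wrapped.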

Paper Structure (24 sections, 7 figures, 1 table)


Introduction
Related Work
Data Compression and Anomaly Detection with Autoencoders
ML-to-FPGA Toolchains and Radiation-Hard Hardware
Radiation Hardness Paradigms in FPGAs
Autoencoder Architecture and Co-Design Rationale
Model Architecture and Co-Design Rationale
Simulation Dataset
Training
Pulse Shape Reconstruction Performance
Latent Space Analysis
Validation of Timestamp and Rise-time Reconstruction
Timestamp Regression
Hardware-Aware Quantization for Efficient FPGA Inference
Quantization and Impact Analysis
...and 9 more sections

Figures (7)

Figure 1: Training and validation loss curves for the full-precision autoencoder (in green) and the quantized autoencoder (in pink), discussed fully in Sec. \ref{sec:compression}. The y-axis uses a logarithmic scale to emphasize convergence behavior.
Figure 2: Examples of autoencoder reconstruction performance on calorimeter pulse shapes from the test set. Original waveforms (solid blue lines) are compared with their corresponding reconstructions (dashed lines). In green is the full-precision model, and in pink is the quantized model, discussed in Sec. \ref{sec:compression}. Visual agreement is supported by low MSE values across diverse pulse amplitudes, highlighting the robustness of the reconstruction.
Figure 3: Correlation between latent space variables of the full-precision autoencoder and three pulse-level features on the test set: rise time (10%--90% interval), pulse true timestamp, and peak amplitude. Each row corresponds to one latent dimension ($z[0], z[1]$), and each column to one feature. Scatter plots include Pearson's $r$ and Spearman's $\rho$ coefficients in the titles. The latent representation is strongly correlated with peak amplitude, while correlations with timing features (rise time and cell time) are weaker.
Figure 4: Residual distributions between CFD-reconstructed timestamps and true simulation times for original and autoencoder-reconstructed pulses (left), and the ratio of their absolute values (right). Reconstruction systematically reduces the timing residual for approximately half of the events.
Figure 5: Histogram of the ratio of absolute differences in 10%--90% rise time: (reconstructed vs. 32-sample original) relative to (32-sample original vs. 1024-sample original).
...and 2 more figures
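
The captions for Figures 4 and 5 refer to CFD-reconstructed timestamps and to the 10%--90% rise time. Both can be read as threshold crossings on the leading edge of a pulse, and the sketch below illustrates that idea. It is an assumption-laden illustration, not the paper's implementation: it assumes a positive-going pulse with zero baseline, uses a simple digital constant-fraction rule (the time at which the pulse first reaches a fixed fraction of its peak) with linear interpolation between samples, and the names `crossing_time`, `cfd_timestamp`, `rise_time_10_90` and the default fraction of 0.5 are hypothetical.

```python
import numpy as np

def crossing_time(t, v, level):
    """Time at which the leading edge first reaches `level`, linearly interpolated."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    above = np.nonzero(v >= level)[0]
    if above.size == 0:
        raise ValueError("pulse never reaches the requested level")
    i = above[0]
    if i == 0:
        return t[0]  # already at or above the level at the first sample
    # Interpolate between the last sample below and the first sample at/above the level.
    frac = (level - v[i - 1]) / (v[i] - v[i - 1])
    return t[i - 1] + frac * (t[i] - t[i - 1])

def cfd_timestamp(t, v, fraction=0.5):
    """Digital constant-fraction timestamp: crossing of fraction * peak amplitude."""
    return crossing_time(t, v, fraction * np.max(v))

def rise_time_10_90(t, v):
    """Time taken to rise from 10% to 90% of the peak amplitude."""
    peak = np.max(v)
    return crossing_time(t, v, 0.9 * peak) - crossing_time(t, v, 0.1 * peak)

# Hypothetical 32-sample pulse in arbitrary time units: a linear ramp from
# t = 4 to t = 14 followed by a flat top.
t = np.arange(32, dtype=float)
v = np.clip((t - 4.0) / 10.0, 0.0, 1.0)
print(cfd_timestamp(t, v), rise_time_10_90(t, v))
```

Applying the same functions to an original pulse and to its autoencoder reconstruction gives the two quantities whose residuals and ratios the captions describe.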
