Symbolic Regression on FPGAs for Fast Machine Learning Inference

Ho Fung Tsoi; Adrian Alan Pol; Vladimir Loncar; Ekaterina Govorkova; Miles Cranmer; Sridhara Dasu; Peter Elmer; Philip Harris; Isobel Ojalvo; Maurizio Pierini

TL;DR

This work demonstrates an end-to-end symbolic regression (SR) pipeline for FPGA-based fast inference in high-energy physics, using PySR to search for expressions and extending hls4ml to support PySR-generated expressions. It shows that SR can produce interpretable algebraic expressions that approximate neural networks, and that selecting models on the Pareto front allows the performance-resource trade-off to be optimized directly. The approach is validated on LHC jet tagging, where it approximates a 3-layer neural network with up to a 13-fold decrease in execution time, down to 5 ns, while preserving more than 90% approximation accuracy. Function approximation with LUTs further reduces resource usage and latency. The method offers a practical, interpretable, and resource-efficient alternative to deep learning in latency-constrained settings and opens pathways for broader SR-on-FPGA deployment.

Abstract

The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to enhance physics sensitivity while still meeting data processing time constraints. In this contribution, we introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR). It searches the equation space to discover algebraic relations approximating a dataset. We use PySR (software that uncovers these expressions with an evolutionary algorithm) and extend the functionality of hls4ml (a package for machine learning inference on FPGAs) to support PySR-generated expressions for resource-constrained production environments. Deep learning models often optimize the top metric by pinning the network size because the vast hyperparameter space prevents an extensive search for neural architecture. Conversely, SR selects a set of models on the Pareto front, which allows for optimizing the performance-resource trade-off directly. By embedding symbolic forms, our implementation can dramatically reduce the computational resources needed to perform critical tasks. We validate our method on a physics benchmark: the multiclass classification of jets produced in simulated proton-proton collisions at the CERN Large Hadron Collider. We show that our approach can approximate a 3-layer neural network using an inference model that achieves up to a 13-fold decrease in execution time, down to 5 ns, while still preserving more than 90% approximation accuracy.
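The Pareto-front selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes each candidate expression carries a complexity score and a loss; the `Candidate` type, the helper names, and all expressions and numbers below are hypothetical and are not taken from the paper or from PySR's API.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    expression: str   # hypothetical symbolic form
    complexity: int   # hypothetical complexity score (lower is cheaper)
    loss: float       # hypothetical loss (lower is better)


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no simpler-or-equal candidate beats on loss."""
    front = []
    best_loss = float("inf")
    # Walk from simplest to most complex; a candidate survives only if it
    # improves on the best loss seen so far.
    for cand in sorted(candidates, key=lambda c: (c.complexity, c.loss)):
        if cand.loss < best_loss:
            front.append(cand)
            best_loss = cand.loss
    return front


def pick_under_budget(front: list[Candidate], max_complexity: int) -> Optional[Candidate]:
    """Return the lowest-loss front member within a complexity budget."""
    affordable = [c for c in front if c.complexity <= max_complexity]
    return min(affordable, key=lambda c: c.loss) if affordable else None


# Made-up candidates, for illustration only.
candidates = [
    Candidate("x0", 1, 0.90),
    Candidate("x0 + x1", 3, 0.55),
    Candidate("x0 * x1", 3, 0.70),
    Candidate("sin(x0) + x1", 4, 0.60),
    Candidate("sin(x0 + x1) * x2", 6, 0.30),
]
front = pareto_front(candidates)
for cand in front:
    print(cand.complexity, cand.loss, cand.expression)
```

Choosing a model then amounts to reading off the front at whatever complexity budget the target device allows, for example `pick_under_budget(front, 4)`, rather than retraining a network of a different size.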

Paper Structure

This paper contains 8 sections, 2 equations, 5 figures, and 3 tables.

Introduction
Benchmark and Baseline
Implementations and Results
Plain implementation
Function approximation with LUTs
Latency-aware training
Summary and Outlook
Acknowledgments

Figures (5)

Figure 1: The sine (left) and tangent (right) functions evaluated with and without the use of LUTs, implemented in HLS with precision $\langle 12, 6\rangle$, i.e., a 12-bit variable with 6 integer bits. The LUT notation reads: $[$range start, range end; table size$]$ for the table definition. The lower panel shows the deviation of each function from the truth.
Figure 2: Relative accuracy as a function of bit width, for polynomial (top left), trigonometric (top right), exponential (bottom left), and logarithmic (bottom right) models. The relative accuracy is evaluated with respect to the baseline QAT NN trained and implemented at the corresponding precision. The number of integer bits is fixed at $I=12$ for the exponential model and at $I=6$ for the other models.
Figure 3: DSP usage (left), LUT usage (middle), and latency (right) as a function of bit width. From top to bottom: polynomial, trigonometric, exponential, and logarithmic models. The baseline QAT NN trained and implemented at the corresponding precision is shown for comparison. Resource usage and latency are obtained from C-synthesis on a Xilinx VU9P FPGA with part number 'xcvu9p-flga2577-2-e'.
Figure 4: ROC curves for the trigonometric models with $c_{\text{max}}=80$ implemented with precision $\langle\text{16},\text{6}\rangle$, as compared to the baseline QAT NN. Numbers in parentheses correspond to the AUC per class.
Figure 5: Relative accuracy (top), DSP usage (bottom left), LUT usage (bottom middle), and latency (bottom right) as a function of $c_{\text{max}}$ ranging from 20 to 80, comparing models obtained from the plain implementation (solid) and LAT (dashed). Two precision settings are implemented: $\langle 16, 6\rangle$ and $\langle 18, 8\rangle$. The relative accuracy is evaluated with respect to the baseline model. Resource usage and latency are obtained from C-synthesis on a Xilinx VU9P FPGA with part number 'xcvu9p-flga2577-2-e'.
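The LUT-based function approximation described in the Figure 1 caption can be emulated in software with a short sketch. It assumes a signed fixed-point format with 12 total bits and 6 integer bits, rounding down and saturating at the range limits (an assumption of this sketch; HLS fixed-point types can be configured differently), and a table defined by $[$range start, range end; table size$]$. The helper names and the table definition $[-4, 4; 1024]$ are hypothetical and are not the paper's or hls4ml's implementation.

```python
import math


def quantize(x: float, total_bits: int = 12, int_bits: int = 6) -> float:
    """Round x down onto a signed fixed-point grid, saturating at the limits."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                  # 64 steps per unit for <12,6>
    lowest = -(1 << (total_bits - 1))       # most negative integer code
    highest = (1 << (total_bits - 1)) - 1   # most positive integer code
    code = math.floor(x * scale)
    code = max(lowest, min(highest, code))  # saturation is an assumption here
    return code / scale


def build_lut(func, start: float, end: float, size: int) -> list[float]:
    """Tabulate func at `size` evenly spaced points on [start, end)."""
    step = (end - start) / size
    return [quantize(func(start + i * step)) for i in range(size)]


def lut_eval(table: list[float], start: float, end: float, x: float) -> float:
    """Look up the entry whose bin contains x, clamping outside the range."""
    size = len(table)
    index = int((x - start) / (end - start) * size)
    index = max(0, min(size - 1, index))
    return table[index]


# Hypothetical table definition [-4, 4; 1024] for the sine function.
START, END, SIZE = -4.0, 4.0, 1024
sin_table = build_lut(math.sin, START, END, SIZE)

# Measure the worst-case deviation from math.sin on a fine grid.
grid = [START + k * (END - START) / 4000 for k in range(4000)]
max_err = max(abs(lut_eval(sin_table, START, END, x) - math.sin(x)) for x in grid)
print(f"max |LUT - sin| on the grid: {max_err:.4f}")
```

In this sketch the deviation has two sources: the bin width of the table and the 6 fractional bits of the stored values. Enlarging the table shrinks only the first, which is why the table size and the precision are tuned together.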
