Nanosecond hardware regression trees in FPGA at the LHC

Pavel Serhiayenka; Stephen Roche; Benjamin Carlson; Tae Min Hong

TL;DR

The paper tackles real-time ML regression for trigger systems at the LHC, specifically aiming to estimate $E_\textrm{T}^\textrm{miss}$ using a low-latency, FPGA-based forest of decision trees. It introduces an HDL-based Deep Decision Tree Engine (DDTE) that implements a boosted decision tree regression in hardware, avoiding DSP and BRAM while delivering sub-10 to a few dozen nanoseconds latency depending on configuration. Key contributions include the VHDL DDTE implementation, two adder architectures (pipeline and combinational), pre-division to compute averages, and a detailed scaling analysis across forest size and input width, plus a muon momentum estimation application for ATLAS RPC at HL-LHC with competitive FPGA performance. The results demonstrate that ultra-fast, resource-efficient BDT regression is feasible for trigger-level ML tasks, enabling more capable real-time analyses in constrained FPGA environments.

Abstract

We present a generic parallel implementation of the decision tree-based machine learning (ML) method in hardware description language (HDL) on field programmable gate arrays (FPGA). A regression problem in high energy physics at the Large Hadron Collider is considered: the estimation of the magnitude of missing transverse momentum using boosted decision trees (BDT). A forest of twenty decision trees each with a maximum depth of ten using eight input variables of 16-bit precision is executed with a latency of less than 10 ns using O(0.1%) resources on Xilinx UltraScale+ VU9P -- approximately ten times faster and five times smaller compared to similar designs using high level synthesis (HLS) -- without the use of digital signal processors (DSP) while eliminating the use of block RAM (BRAM). We also demonstrate a potential application in the estimation of muon momentum for ATLAS RPC at HL-LHC.
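As a rough illustration of the idea summarized above, the following Python sketch models a forest in which every decision path of a tree is tested independently (so exactly one path fires per tree) and each leaf value is divided by the number of trees ahead of time, so that a plain sum of the tree outputs already yields the average. It is a software model under stated assumptions, not the authors' VHDL: the names `Path`, `path_fires`, `eval_tree`, `eval_forest`, and `pre_divide`, the cut encoding, the integer truncation in the pre-division, and all thresholds and leaf values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A cut is (variable index, threshold, require_less). Hypothetical encoding.
Cut = Tuple[int, int, bool]


@dataclass
class Path:
    cuts: List[Cut]  # conjunction of threshold comparisons along one root-to-leaf path
    leaf: int        # integer leaf value, already pre-divided by the number of trees


def pre_divide(raw_leaf: int, n_tree: int) -> int:
    # Done once, offline: moves the division out of the per-event computation.
    # Integer truncation here is an assumption of this sketch.
    return raw_leaf // n_tree


def path_fires(path: Path, x: List[int]) -> bool:
    # Every comparison of a path can be evaluated independently of the others.
    return all((x[i] < thr) == less for i, thr, less in path.cuts)


def eval_tree(paths: List[Path], x: List[int]) -> int:
    fired = [path_fires(p, x) for p in paths]
    # The paths of a valid tree partition the input space, so exactly one fires.
    assert sum(fired) == 1
    return sum(p.leaf for p, f in zip(paths, fired) if f)


def eval_forest(forest: List[List[Path]], x: List[int]) -> int:
    # The sum of pre-divided leaves is the average of the raw tree outputs.
    return sum(eval_tree(paths, x) for paths in forest)


# Toy forest: two depth-1 trees with made-up thresholds and leaf values.
N_TREE = 2
forest = [
    [Path([(0, 100, True)], pre_divide(40, N_TREE)),
     Path([(0, 100, False)], pre_divide(80, N_TREE))],
    [Path([(1, 50, True)], pre_divide(20, N_TREE)),
     Path([(1, 50, False)], pre_divide(100, N_TREE))],
]

print(eval_forest(forest, [90, 70]))   # (40 + 100) / 2
print(eval_forest(forest, [150, 10]))  # (80 + 20) / 2
```

In this model the only per-event arithmetic is a set of comparisons and one sum, which is consistent with the abstract's statement that the design runs without DSP or BRAM; how the comparisons and the sum are actually mapped onto FPGA logic is described in the paper itself.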

Paper Structure (5 sections, 8 figures, 2 tables)

This paper contains 5 sections, 8 figures, 2 tables.

Introduction
Method
Results
Adder details
Muon momentum for ATLAS RPC at HL-LHC

Figures (8)

Figure 1: Block diagram of the VHDL version of the Deep Decision Tree Engine (DDTE). Each tree is represented by an HDL Tree Engine (HTE), which is composed of One Hot Decision Paths (OHDP) corresponding to Parallel Decision Paths (PDP) (from Figure 2 of Ref. Carlson:2022dgb).
Figure 2: Algorithm latency scaling vs. $N_\textrm{tree}$. The scaling is done with respect to the parameters stated on the plot, corresponding to the benchmark configuration in Table \ref{tab:main}. The diamond represents the HLS results of Ref. Carlson:2022dgb.
Figure 3: Resource usage scaling vs. $N_\textrm{tree}$ (left) and $N_\textrm{bit}$ (right). On each plot, lookup table usage is shown as circles with the scale on the left side, and flip-flop usage is shown as triangles with the scale on the right side. The scaling is done with respect to the parameters stated on the plot, corresponding to the benchmark configuration in Table \ref{tab:main}. The diamond represents the HLS results of Ref. Carlson:2022dgb; its flip-flop usage is off the scale, as indicated by the arrow.
Figure 4: Two adder designs using pipeline (left) and combinational logic (right) for the sum block in Fig. \ref{fig:dt}.
Figure 5: Distribution of the difference in the $z$ coordinate between the impact point of the seed line and the cluster position in RPC1 (left) and RPC3 (right). Muons are shown separately by charge, together with noise fakes. The sample is produced using the code available at Ref. MuonTriggerPhase2RPC.
...and 3 more figures
