Nanosecond hardware regression trees in FPGA at the LHC
Pavel Serhiayenka, Stephen Roche, Benjamin Carlson, Tae Min Hong
TL;DR
The paper tackles real-time machine learning (ML) regression for trigger systems at the LHC, specifically the estimation of the missing transverse momentum, $E_\textrm{T}^\textrm{miss}$, using a low-latency, FPGA-based forest of decision trees. It introduces an HDL-based Deep Decision Tree Engine (DDTE) that implements boosted decision tree (BDT) regression in hardware without using DSPs or BRAM, with latencies ranging from under 10 ns to a few dozen nanoseconds depending on the configuration. Key contributions include the VHDL implementation of the DDTE, two adder architectures (pipelined and combinational), pre-division to compute averages, and a detailed scaling analysis across forest size and input width. The paper also presents a muon momentum estimation application for the ATLAS RPC at the HL-LHC with competitive FPGA performance. The results show that ultra-fast, resource-efficient BDT regression is feasible for trigger-level ML tasks, enabling more capable real-time analyses on resource-constrained FPGAs.
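The TL;DR mentions pre-division and two adder architectures without spelling them out. The Python sketch below illustrates one reading under stated assumptions: `pre_divide` and `adder_tree` are hypothetical names, pre-division is assumed to mean dividing each tree's fixed-point leaf values by the number of trees offline so that the hardware only has to add, and the adder is modeled as a balanced pairwise tree whose level count equals the number of register stages a pipelined version would need. It is a software model, not the DDTE's VHDL.

```python
import math
from typing import List, Tuple


def pre_divide(leaf_values: List[int], n_trees: int) -> List[int]:
    """Hypothetical helper: scale integer leaf values by 1/n_trees offline.

    Rounding each leaf separately can accumulate up to about n_trees * 0.5 LSB
    of error; a real design would choose the bit width to keep this acceptable.
    """
    return [round(v / n_trees) for v in leaf_values]


def adder_tree(values: List[int]) -> Tuple[int, int]:
    """Sum values pairwise, one adder level per loop iteration.

    Returns (total, levels). In a pipelined design each level would be one
    register stage; in a combinational design all levels settle in one cycle.
    """
    layer = list(values)
    levels = 0
    while len(layer) > 1:
        nxt = [layer[i] + layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])  # an odd element passes through to the next level
        layer = nxt
        levels += 1
    return (layer[0] if layer else 0), levels


if __name__ == "__main__":
    n = 20  # forest size used in the abstract's benchmark
    leaves = [100 * (k + 1) for k in range(n)]  # hypothetical per-tree outputs
    total, levels = adder_tree(pre_divide(leaves, n))
    print(total, levels)  # 1050 5, i.e. the average of the raw leaves
    assert levels == math.ceil(math.log2(n))
```

With twenty trees the pairwise sum needs five adder levels. In general, a pipelined adder registers each level (more clock cycles, shorter critical path), while a combinational adder lets all levels settle within a single cycle (fewer cycles, longer path).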
Abstract
We present a generic parallel implementation of the decision-tree-based machine learning (ML) method in hardware description language (HDL) on field programmable gate arrays (FPGAs). We consider a regression problem in high energy physics at the Large Hadron Collider: estimating the magnitude of the missing transverse momentum using boosted decision trees (BDTs). A forest of twenty decision trees, each with a maximum depth of ten and using eight input variables of 16-bit precision, is executed with a latency of less than 10 ns using O(0.1%) of the resources on a Xilinx UltraScale+ VU9P. This is approximately ten times faster and five times smaller than similar designs using high level synthesis (HLS), and the design uses neither digital signal processors (DSPs) nor block RAM (BRAM). We also demonstrate a potential application in the estimation of muon momentum for the ATLAS RPC at the HL-LHC.
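The abstract's benchmark configuration (twenty trees, depth at most ten, eight 16-bit inputs) can be made concrete with a small software model. The Python sketch below assumes one common way of mapping a decision tree onto FPGA logic: every node comparison is evaluated independently (in parallel, in hardware), and the resulting bits then select a leaf. The abstract does not describe the DDTE's internal structure, so `Node`, `evaluate_tree`, and `evaluate_forest` are hypothetical names, and the structure is an illustration rather than the authors' design. The model uses only comparisons and additions, which is consistent with a design that needs no DSP multipliers.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    feature: int    # index into the input vector (eight variables in the benchmark)
    threshold: int  # unsigned 16-bit threshold
    left: "Tree"    # subtree taken when x[feature] <= threshold
    right: "Tree"   # subtree taken otherwise


Tree = Union[Node, int]  # a bare int is a fixed-point leaf value


def internal_nodes(tree: Tree) -> List[Node]:
    """Collect every comparison node; in hardware each becomes one comparator."""
    if not isinstance(tree, Node):
        return []
    return [tree] + internal_nodes(tree.left) + internal_nodes(tree.right)


def evaluate_tree(tree: Tree, x: List[int]) -> int:
    assert all(0 <= v < (1 << 16) for v in x), "inputs must fit in 16 bits"
    # Stage 1: the comparisons do not depend on each other, so an FPGA can
    # compute them simultaneously; here they are simply computed up front.
    decision = {id(n): x[n.feature] <= n.threshold for n in internal_nodes(tree)}
    # Stage 2: the precomputed decision bits select exactly one leaf, which in
    # logic corresponds to a multiplexer network, not a sequential traversal.
    node = tree
    while isinstance(node, Node):
        node = node.left if decision[id(node)] else node.right
    return node


def evaluate_forest(trees: List[Tree], x: List[int]) -> int:
    # Regression output as the sum of tree outputs: comparisons and adds only.
    return sum(evaluate_tree(t, x) for t in trees)


if __name__ == "__main__":
    t1 = Node(0, 1000, Node(1, 500, 10, 20), 30)  # hypothetical depth-2 tree
    t2 = Node(2, 100, 1, 2)                       # hypothetical depth-1 tree
    x = [800, 600, 50, 0, 0, 0, 0, 0]             # eight 16-bit inputs
    print(evaluate_forest([t1, t2], x))           # 21
```

Because the comparators do not wait on one another in this structure, only the leaf-selection logic, not the comparisons themselves, grows with tree depth. Whether the DDTE reaches its sub-10 ns latency this way is not stated in the abstract.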
