Hardware-Accelerated GNN-based Hit Filtering for the Belle II Level-1 Trigger

Greta Heine, Fabio Mayer, Marc Neu, Jürgen Becker, Torben Ferber

TL;DR

This work demonstrates a hardware-accelerated Graph Neural Network-based hit filtering system for the Belle II Level-1 Trigger, implemented as a dataflow accelerator on an FPGA. Hits are represented as graph nodes connected by geometry-informed edges, and a compressed Interaction Network classifies hits to distinguish signal from background. The model is trained on simulated and real data with 4-bit quantization and pruning, achieving robust performance while fitting tight latency budgets. In a single-sector demonstrator, the system processes data at 31.804 MHz with a total latency of 632.4 ns and moderate FPGA resource usage, yielding background rejection of about 83% at 95% signal efficiency. These results establish hit-level GNN-based filtering on FPGAs as a scalable, low-latency solution for real-time data reduction in high-luminosity collider environments, paving the way for full-detector deployment via sector-wise parallelization.
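The idea of nodes joined by geometry-informed edges can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy wire layout `WIRES`, the neighbourhood rule in `build_static_edges`, and the helper `active_edges` are hypothetical and do not reproduce the Belle II sector geometry or the authors' edge definition. The point is only that the candidate edge list can be fixed ahead of time, so per-event work reduces to masking.

```python
from itertools import combinations

# Hypothetical toy geometry: wire id -> (layer, position within layer).
# The real single-sector layout is NOT reproduced here.
WIRES = {wid: (wid // 8, wid % 8) for wid in range(32)}  # 4 layers x 8 wires


def build_static_edges(wires, max_dlayer=1, max_dpos=1):
    """Precompute all wire pairs that are geometrically compatible.

    Assumed rule (illustrative only): two wires may be connected if they lie
    at most `max_dlayer` layers and `max_dpos` positions apart.
    """
    edges = []
    for a, b in combinations(sorted(wires), 2):
        (la, pa), (lb, pb) = wires[a], wires[b]
        if abs(la - lb) <= max_dlayer and abs(pa - pb) <= max_dpos:
            edges.append((a, b))
    return edges


# Fixed once, before any event is processed.
STATIC_EDGES = build_static_edges(WIRES)


def active_edges(hit_wires):
    """Keep only the precomputed edges whose two endpoints both fired."""
    hits = set(hit_wires)
    return [(a, b) for a, b in STATIC_EDGES if a in hits and b in hits]


print(active_edges([0, 1, 8, 31]))
```

Because the edge list never changes, a hardware implementation could in principle hard-wire it; the per-event step is then a lookup rather than a search.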

Abstract

We present a hardware-accelerated hit filtering system employing Graph Neural Networks (GNNs) on Field-Programmable Gate Arrays (FPGAs) for the Belle II Level-1 Trigger. The GNN exploits spatial and temporal relationships among sense wire hits and is optimized for high-throughput hardware operation via quantization, pruning, and static graph-building. Sector-wise spatial parallelization permits scaling to full-detector coverage, satisfying stringent latency and throughput requirements. At a sustained throughput of 31.804 MHz, the system processes sense wire data in real time and achieves detector-level background suppression with a measured latency of 632.4 ns while utilizing 35.65% of Look-Up Tables (LUTs) and 29.75% of Flip-Flops, with zero Digital Signal Processing (DSP) usage, as demonstrated in a prototype implementation for a single sector on an AMD UltraScale XCVU190. Offline validation using Belle II data yields a background hit rejection of 83% while maintaining 95% signal hit efficiency. This work establishes hit-level GNN-based filtering on FPGAs as a scalable low-latency solution for real-time data reduction in high-luminosity collider conditions.
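The quoted throughput and latency can be related with a short calculation. This is a back-of-the-envelope sketch: the two constants are taken from the abstract, while reading the ratio as the number of inputs in flight assumes a fully pipelined design that accepts one input per interval, which is an interpretation rather than a figure from the paper.

```python
THROUGHPUT_HZ = 31.804e6   # sustained throughput quoted above
LATENCY_NS = 632.4         # measured latency quoted above

# Time between two consecutive inputs at the quoted throughput.
period_ns = 1e9 / THROUGHPUT_HZ

# Number of input intervals spanned by the latency. In a fully pipelined
# dataflow design this is roughly how many inputs are in flight at once
# (an interpretation, not a number reported by the authors).
intervals_in_flight = LATENCY_NS / period_ns

print(f"input interval: {period_ns:.2f} ns")            # 31.44 ns
print(f"latency / interval: {intervals_in_flight:.1f}")  # 20.1
```

Under these assumptions the latency spans about 20 input intervals, so the accelerator has to overlap the processing of successive inputs rather than finish one before starting the next.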

Paper Structure

This paper contains 9 sections and 3 figures.

Figures (3)

  • Figure 1: Overview of the GNN-based hit filtering process: (a) hits before filtering, with signal hits (pink) and background hits (grey); (b) hits are represented as graphs with edges connecting spatially compatible hits; (c) our GNN performs edge or node classification to identify signal patterns, where a dark colour denotes signal-like and a light colour background-like classification; and (d) classification outputs are mapped back to individual hits for filtering.
  • Figure 2: Block diagram of the hardware-accelerated Interaction Network architecture, where Vitis HLS synthesized network blocks are mapped to dedicated . Static graphs generated from -supplied sense wire data are propagated and updated via a series of scatter and aggregate Switch Boxes in-between , realized in Chisel into a register-transfer level design. The final classifier outputs, after threshold application, are sent to downstream tracking modules.
  • Figure 3: (a) Background-like hit distributions $n_{\text{extraCDC}}$ processing Belle II data (-selected $\mu\mu(\gamma)$ events from late 2024) with offline simulation on the full including and excluding the hit filtering. Both the full-precision and 4-bit quantised models achieve a background hit rejection of $>80\%$ at 95% signal hit efficiency. (b) Resource utilization and latency per logic block for 495 sense wires and 2163 edges, showing modest LUT/FF use and zero DSP usage. The total pipeline latency amounts to 632.4 ns; results are reported from Vivado 2024.2 after routing in out-of-context mode.
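
A working point such as "background rejection at 95% signal hit efficiency" is typically obtained by fixing the classifier threshold from the signal score distribution and then counting the background hits that fall below it. The sketch below illustrates that procedure under stated assumptions: the function names are hypothetical, and the scores are synthetic toy values drawn from beta distributions, not Belle II data or the authors' model outputs.

```python
import math
import random


def threshold_for_efficiency(signal_scores, target_eff):
    """Largest threshold that keeps at least `target_eff` of the signal hits
    when hits with score >= threshold are kept."""
    ordered = sorted(signal_scores, reverse=True)
    n_keep = math.ceil(target_eff * len(ordered))
    return ordered[n_keep - 1]


def background_rejection(background_scores, threshold):
    """Fraction of background hits removed by the cut score >= threshold."""
    rejected = sum(1 for s in background_scores if s < threshold)
    return rejected / len(background_scores)


# Toy classifier outputs (synthetic, NOT Belle II data): signal scores peak
# near 1, background scores near 0.
rng = random.Random(0)
signal = [rng.betavariate(5, 1) for _ in range(10_000)]
background = [rng.betavariate(1, 3) for _ in range(10_000)]

thr = threshold_for_efficiency(signal, 0.95)
eff = sum(s >= thr for s in signal) / len(signal)
rej = background_rejection(background, thr)
print(f"threshold={thr:.3f}  signal eff={eff:.3f}  background rej={rej:.3f}")
```

The numbers printed by this toy have no relation to the results in the figure; only the procedure of fixing the efficiency first and reading off the rejection second is meant to carry over.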