Hardware-Aware Design of a GNN-Based Hit Filtering Algorithm for the Belle II Level-1 Trigger

Greta Heine; Fabio Mayer; Marc Neu; Jürgen Becker; Torben Ferber

TL;DR

This work presents a hardware-aware model-compression workflow for a GNN-based hit-filtering algorithm targeting deployment on FPGA devices within the Belle II Level-1 trigger system. It identifies a configuration that decreases the total number of bit operations by more than two orders of magnitude relative to the full-precision reference implementation.

Abstract

The Belle II experiment operates at high luminosity, where an increasing beam-induced background imposes stringent demands on the hardware Level-1 trigger system, which must operate under tight latency and bandwidth constraints. To achieve online data reduction within the Level-1 trigger system, we have developed a hit-filtering algorithm based on the lightweight Interaction Network architecture. In this work, we present a hardware-aware model-compression workflow for this hit-filtering algorithm targeting deployment on FPGA devices within the Belle II trigger system. The network is adapted to the detector and trigger conditions through model-size and graph-size reduction, low-precision (4-bit) fixed-point arithmetic, and unstructured pruning. We assess the resulting design using the total number of bit operations as a hardware-aware computational complexity metric. Using this metric, we identify a configuration that decreases this cost by more than two orders of magnitude relative to the full-precision reference implementation. This reduction is achieved while preserving performance close to the reference model in terms of hit efficiency and background rejection, as indicated by only a modest decrease in the AUC score from 97.4 to 96.8, evaluated on Belle II collision data.
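To make the bit-operations metric concrete, the sketch below estimates it for a stack of dense layers. It assumes one commonly used per-layer estimate, n_in * n_out * ((1 - f_p) * b_a * b_w + b_a + b_w + log2(n_in)), where b_a and b_w are the activation and weight bit widths and f_p is the pruned weight fraction; the abstract does not state the exact formula the authors use. All layer sizes, the `applications` multiplier, and the function names are hypothetical and serve only to show how model size, graph size, bit width, and pruning each enter the count.

```python
import math


def dense_layer_bops(n_in, n_out, bits_act, bits_weight, pruned_fraction=0.0):
    """Estimate the bit operations of one dense layer (assumed formula)."""
    if not 0.0 <= pruned_fraction <= 1.0:
        raise ValueError("pruned_fraction must lie in [0, 1]")
    # Multiplications scale with both bit widths; pruned weights are skipped.
    multiply_cost = (1.0 - pruned_fraction) * bits_act * bits_weight
    # Accumulation cost grows with the operand widths and the fan-in.
    accumulate_cost = bits_act + bits_weight + math.log2(n_in)
    return n_in * n_out * (multiply_cost + accumulate_cost)


def model_bops(layers, bits_act, bits_weight, pruned_fraction=0.0, applications=1):
    """Sum the per-layer estimates and scale by the number of applications.

    In a graph network each MLP runs once per node or edge, so
    `applications` stands in for the graph size (hypothetical parameter).
    """
    per_pass = sum(
        dense_layer_bops(n_in, n_out, bits_act, bits_weight, pruned_fraction)
        for n_in, n_out in layers
    )
    return applications * per_pass


# Hypothetical layer shapes and graph sizes, NOT taken from the paper.
reference_layers = [(6, 64), (64, 64), (64, 1)]
compressed_layers = [(6, 8), (8, 8), (8, 1)]

reference = model_bops(reference_layers, 32, 32, applications=1000)
compressed = model_bops(
    compressed_layers, 4, 4, pruned_fraction=0.5, applications=500
)
ratio = reference / compressed

print(f"reference BOPs:  {reference:.3e}")
print(f"compressed BOPs: {compressed:.3e}")
print(f"reduction factor: {ratio:.1f}")
```

Because the multiplication term scales with the product of the two bit widths, moving from 32-bit to 4-bit operands alone shrinks that term by a factor of 64; the remaining reduction in this toy example comes from the smaller layers, the smaller graph, and pruning.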

Paper Structure (13 sections, 2 equations, 2 figures)

Introduction
Compression pipeline: from full-precision to online
Full-precision baseline model
Graph inputs:
Training strategy:
Compressed quantized model
Model and graph size reduction:
4-bit quantization:
Pruning:
Performance evaluation
Hit filtering performance
Number of bit operations
Conclusion

Figures (2)

Figure 1: Our workflow for deploying a PyTorch Geometric model on an FPGA: starting from model configuration and layer replacement with Brevitas quantization layers, followed by model compression including quantization-aware training, hardware generation for the dataflow accelerator, and subsequent RTL netlist creation for implementation.
Figure 2: (a) Hit-level classification ROC curves and AUC values for different network compression steps and (b) the corresponding number of bit operations as a proxy for computational complexity towards implementation.
