Deploying a Hybrid PVFinder Algorithm for Primary Vertex Reconstruction in LHCb's GPU-Resident HLT1

Simon Akar; Mohamed Elashri; Conor Henderson; Michael Sokoloff

TL;DR

This work presents an inference engine for PVFinder, a hybrid deep neural network that finds primary vertices (the proton-proton collision points from which all subsequent particle decays originate), and its integration into Allen, LHCb's High Level Trigger (HLT1) framework.

Abstract

LHCb's Run 3 upgrade introduced a fully software-based trigger system operating at 30 MHz, processing an average of 5.6 proton-proton collision vertices per bunch crossing (event). This work presents an inference engine for PVFinder, a hybrid deep neural network that finds primary vertices (the proton-proton collision points from which all subsequent particle decays originate), and its integration into Allen, LHCb's High Level Trigger (HLT1) framework. The integration addresses critical real-time constraints, including fixed memory pools, single-stream execution, and sub-400 μs per-event processing budgets on NVIDIA GPUs. We introduce a translation layer that bridges Allen's Structure-of-Arrays (SoA) data layout with cuDNN's tensor format while maintaining zero-copy semantics and deterministic behavior. Current measurements show that the CNN stage adds significant throughput overhead. We present a roadmap targeting order-of-magnitude improvements through mixed-precision computing, model compression, and other techniques.
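The zero-copy idea behind the translation layer can be illustrated with a minimal host-side sketch. It is not the Allen or cuDNN code: it only assumes that the SoA buffer stores one contiguous float32 run per feature, so the same memory can be read as a (N=1, C, H=1, W) tensor without moving any data. The names `N_FEATURES`, `N_BINS`, and `as_nchw_view` are hypothetical, as are the sizes.

```python
from array import array

N_FEATURES = 3  # hypothetical number of per-bin feature arrays (channels)
N_BINS = 4      # hypothetical number of histogram bins (width)

# SoA storage: one contiguous float32 run per feature, laid out back to back.
soa = array("f", [0.0] * (N_FEATURES * N_BINS))


def as_nchw_view(buffer, channels, width):
    """Reinterpret a flat float32 SoA buffer as a (1, C, 1, W) view, no copy."""
    raw = memoryview(buffer).cast("B")  # byte view over the same memory
    return raw.cast("f", shape=[1, channels, 1, width])


tensor = as_nchw_view(soa, N_FEATURES, N_BINS)

# A write through the SoA side is visible through the tensor view,
# which shows that both names refer to the same memory.
soa[1 * N_BINS + 2] = 7.5   # feature 1, bin 2
print(tensor[0, 1, 0, 2])   # 7.5
```

On the GPU the equivalent step would be describing an existing device buffer with a tensor descriptor rather than copying it into a new allocation; the sketch only demonstrates the layout equivalence that makes this possible.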

Paper Structure (6 sections, 2 figures, 2 tables)

This paper contains 6 sections, 2 figures, 2 tables.

Introduction
PVFinder Architecture for Real-Time Inference
Allen Integration and Translation Layer
Integration Performance Results
Optimization Roadmap and Outlook
Conclusions

Figures (2)

Figure 1: PVFinder physics performance showing efficiency vs. false positive rate for different configurations. The magenta configuration (FP32, 64-channel UNet) selected for deployment achieves > 97% efficiency with 0.03 false positives per event, significantly outperforming the LHCb heuristic baseline [dziurda_parallel_2025]. FP16 configurations show minimal performance degradation.
Figure 2: PVFinder hybrid architecture showing the three-stage pipeline: FC layers process track parameters (9 features/track) into an intermediate representation, the UNet CNN refines spatial patterns into probability histograms, and peak finding extracts vertex positions. The FC stage is implemented in native CUDA while the CNN stage uses cuDNN, bridged by the translation layer.
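The final stage named in Figure 2, peak finding on a probability histogram, can be sketched as follows. This is an illustrative stand-in rather than the PVFinder implementation: it assumes a 1D histogram along the beam axis, a simple threshold plus local-maximum test, and a three-bin weighted centroid for the position. The function name `find_peaks`, the threshold, and the binning values are all hypothetical.

```python
def find_peaks(hist, threshold, z_min, bin_width):
    """Return z positions of local maxima at or above `threshold`.

    Each position is the weighted centroid of the peak bin and its two
    neighbours, converted from bin index to z using bin centres.
    """
    peaks = []
    for i in range(1, len(hist) - 1):
        centre = hist[i]
        if centre < threshold:
            continue
        # Strict on the left, non-strict on the right, so a flat two-bin
        # plateau is reported once rather than twice.
        if centre > hist[i - 1] and centre >= hist[i + 1]:
            total = hist[i - 1] + centre + hist[i + 1]
            # Centroid offset from the peak bin, in units of bins.
            offset = (hist[i + 1] - hist[i - 1]) / total
            peaks.append(z_min + (i + 0.5 + offset) * bin_width)
    return peaks


# Hypothetical histogram with two well-separated, symmetric peaks.
hist = [0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.2, 0.6, 0.2, 0.0]
print(find_peaks(hist, threshold=0.5, z_min=-1.0, bin_width=0.25))
# [-0.375, 0.875]
```

A per-bin test like this has no dependence between bins, which is why peak extraction of this kind maps naturally onto one GPU thread per bin; the actual kernel design is not described in this section.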
