Track reconstruction as a service for collider physics
Haoran Zhao, Yuan-Tang Chou, Yao Yao, Xiangyang Ju, Yongbin Feng, William Patrick McCormack, Miles Cochran-Branson, Jan-Frederik Schulte, Miaoyuan Liu, Javier Duarte, Philip Harris, Shih-Chieh Hsu, Kevin Pedro, Nhan Tran
TL;DR
The paper addresses the growing computational burden of charged-particle track reconstruction at the HL-LHC by proposing an inference-as-a-service framework that offloads tracking to GPUs via NVIDIA Triton. It evaluates two representative pipelines, Patatrack (rule-based) and Exa.TrkX (ML-based), showing improved GPU utilization and the ability to serve multiple CPU cores concurrently without increasing per-request latency. Key contributions include custom Triton backends for both pipelines, throughput and latency measurements, and integration with the ACTS framework to demonstrate end-to-end workflow performance. The results indicate substantial speedups and efficiency gains over CPU-only approaches, with potential reductions in GPU count and operational power, offering a scalable path for HL-LHC computing as pileup increases.
Abstract
Optimizing charged-particle track reconstruction algorithms is crucial for efficient event reconstruction in Large Hadron Collider (LHC) experiments because of their significant computational demands. Existing track reconstruction algorithms have been adapted to run on massively parallel coprocessors, such as graphics processing units (GPUs), to reduce processing time. Nevertheless, challenges remain in fully harnessing the computational capacity of coprocessors in a scalable and non-disruptive manner. This paper proposes an inference-as-a-service approach for particle tracking in high energy physics experiments. To evaluate the efficacy of this approach, two distinct tracking algorithms are tested: Patatrack, a rule-based algorithm, and Exa.TrkX, a machine learning-based algorithm. The as-a-service implementations show enhanced GPU utilization and can process requests from multiple CPU cores concurrently without increasing per-request latency. The overhead from data transfer is negligible compared to running on local coprocessors. This approach greatly improves the computational efficiency of charged-particle tracking, providing a solution to the computing challenges anticipated in the High-Luminosity LHC era.
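
The as-a-service pattern described above, in which several CPU-side clients send requests to one shared tracking service, can be illustrated with a minimal sketch. This is not the paper's Triton implementation: the server here is an in-process thread standing in for a remote GPU server, and the names `TrackingServer`, `fake_tracking`, and `run_clients` are hypothetical, introduced only for illustration.

```python
import queue
import threading
from concurrent.futures import Future


def fake_tracking(hits):
    """Hypothetical stand-in for a tracking pipeline.

    Groups consecutive hit IDs into "tracks" of three; a real pipeline
    (e.g. Patatrack or Exa.TrkX) would run on a GPU instead.
    """
    return [hits[i:i + 3] for i in range(0, len(hits), 3)]


class TrackingServer:
    """Toy service: one worker thread drains a shared request queue."""

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            hits, future = item
            try:
                future.set_result(fake_tracking(hits))
            except Exception as exc:  # report failures back to the client
                future.set_exception(exc)

    def infer(self, hits):
        """Submit one request and block until its result is ready."""
        future = Future()
        self._requests.put((hits, future))
        return future.result()

    def shutdown(self):
        self._requests.put(None)
        self._worker.join()


def run_clients(n_clients, hits_per_event):
    """Run n_clients concurrent clients against a single shared server."""
    server = TrackingServer()
    results = [None] * n_clients

    def client(i):
        # Each client owns a disjoint range of hit IDs (its "event").
        hits = list(range(i * hits_per_event, (i + 1) * hits_per_event))
        results[i] = server.infer(hits)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    return results
```

Calling `run_clients(4, 6)` starts four clients that share one server, and each client receives only the tracks built from its own hits. The sketch shows the request routing only; it says nothing about GPU utilization, batching, or latency, which the paper measures on real hardware.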
