Improvements of the GPU Processing Framework for ALICE

David Rohr

TL;DR

ALICE addresses the challenge of processing collisions at a rate of up to 50 kHz and a data rate of up to 3.4 TB/s using GPUs within the O2 framework; the paper presents a set of GPU framework improvements to boost online and offline performance. The main contributions are per-kernel compilation units, a deterministic processing mode for bitwise reproducibility, framework support for sharing components across GPU libraries, Run Time Compilation (RTC) for on-the-fly optimization, and architecture-aware build strategies. These approaches reduce compile times, improve debugging and validation, and broaden hardware portability. Online processing is GPU-bound, offline processing is accelerated by $2\times$–$2.5\times$ on GPUs, and the roadmap targets an overall speedup of about $5\times$. The results demonstrate a practical path for expanding GPU use to GRID-based workflows and for porting more reconstruction steps to GPUs in future runs.

Abstract

ALICE is the dedicated heavy-ion experiment at the LHC at CERN and records lead-lead collisions at a rate of up to 50 kHz. The detector with the highest data rate, up to 3.4 TB/s, is the TPC. ALICE performs the full online TPC processing, corresponding to more than 95% of the total workload, on GPUs, and when there is no beam in the LHC, the online computing farm's GPUs are used to speed up the offline processing. After the deployment of the first version of the online TPC processing needed for data taking, ALICE has implemented many improvements to its GPU processing framework. These include a run-time compilation mode applying on-the-fly optimizations, improvements to parallelize and speed up the GPU compilation, debugging modes to guarantee reproducible and deterministic results in concurrent reconstruction, and framework features to leverage common components in the code of different detectors. The proceedings give an overview of the ALICE experience with GPUs in online and offline processing and present the latest GPU processing framework features.

Paper Structure

This paper contains 10 sections, 1 figure.

Figures (1)

  • Figure 1: Evolution of the ALICE TPC GPU online processing time per time frame. The relative speedups and degradations in columns 3 and 4 are normalized to the performance during 2023 Pb–Pb data taking (first line). The performance impact (column 3) compares the performance with the feature in a given line to that of the previous line, while Performance vs. 2023 Pb–Pb (column 4) shows the relative impact of the software including this feature and all previous features compared to the software version running in November 2023.