Minimising Event Size, Maximising Physics: Inclusive Particle Isolation for LHCb's Run 3
Marta Calvi, Tommaso Fulghesu, George Hallett, Luca Hartman, Basem Khanji, Veronica S. Kirsebom, Thomas Latham, Marion Lehuraux, Ching-Hua Li, Abhijit Mathad, Matthew Monk, Andy Morris, Matthew Scott Rudolph, Francesca Swystun, Dorothea vom Bruch
TL;DR
This work addresses the sharp rise in Run 3 data rates at LHCb by reducing the per-event size without sacrificing heavy-flavour physics. It introduces Inclusive Multivariate Isolation (IMI), a fast, XGBoost-based classifier that unifies the strengths of classical isolation methods (track, cone, and vertex) to retain only the most relevant particles in each event. IMI achieves about 45% data-size reduction while preserving ~99% signal efficiency and delivering superior background rejection across diverse decay channels and event multiplicities, and it has been validated on Run 3 data. The method is integrated into the LHCb framework and designed for future extensions, potentially serving as a fast pruning layer to support more compute-intensive reconstruction as the experiment scales toward the High-Luminosity LHC era. The results demonstrate a practical path to scalable, physics-preserving data processing in high-rate environments.
Abstract
Run 3 of the LHC brings unprecedented luminosity and a surge in data volume to the LHCb detector, necessitating a critical reduction in the size of each reconstructed event without compromising the physics reach of the heavy-flavour programme. While signal decays typically involve just a few charged particles, a single proton-proton collision produces hundreds of tracks, with charged particle information dominating the event size. To address this imbalance, a suite of inclusive isolation tools has been developed, including both classical methods and a novel Inclusive Multivariate Isolation (IMI) algorithm. The IMI unifies the key strengths of classical isolation techniques and is designed to robustly handle diverse decay topologies and kinematics, enabling efficient reconstruction of decay chains with varying final-state multiplicities. It consistently outperforms traditional methods, with superior background rejection and high signal efficiency across a range of channels and event multiplicities. By retaining only the most relevant particles in each event, the method achieves a 45% reduction in data size while preserving full physics performance, selecting signal particles with 99% efficiency. We also validate IMI on Run 3 data, confirming its robustness under real data-taking conditions. In the long term, IMI could provide a fast, lightweight front-end to support more compute-intensive selection strategies in the high-multiplicity environment of the High-Luminosity LHC.
