Minimising Event Size, Maximising Physics: Inclusive Particle Isolation for LHCb's Run 3

Marta Calvi, Tommaso Fulghesu, George Hallett, Luca Hartman, Basem Khanji, Veronica S. Kirsebom, Thomas Latham, Marion Lehuraux, Ching-Hua Li, Abhijit Mathad, Matthew Monk, Andy Morris, Matthew Scott Rudolph, Francesca Swystun, Dorothea vom Bruch

TL;DR

This work addresses the sharp increase in Run 3 data rates at LHCb by reducing the per-event size without sacrificing heavy-flavour physics. It introduces Inclusive Multivariate Isolation (IMI), a fast, XGBoost-based classifier that combines the strengths of the classical isolation methods (track, cone, and vertex) to retain only the most relevant particles in each event. IMI reduces the data size by about 45% while preserving ~99% signal efficiency, and it rejects background better than the classical methods across diverse decay channels and event multiplicities. The performance is validated on Run 3 data. The method is integrated into the LHCb framework and designed for future extensions; it could serve as a fast pruning layer ahead of more compute-intensive reconstruction as the experiment scales toward the High-Luminosity LHC era. The results show a practical path to scalable, physics-preserving data processing in high-rate environments.

Abstract

Run 3 of the LHC brings unprecedented luminosity and a surge in data volume to the LHCb detector, necessitating a substantial reduction in the size of each reconstructed event without compromising the physics reach of the heavy-flavour programme. While signal decays typically involve just a few charged particles, a single proton-proton collision produces hundreds of tracks, and charged-particle information dominates the event size. To address this imbalance, a suite of inclusive isolation tools has been developed, including both classical methods and a novel Inclusive Multivariate Isolation (IMI) algorithm. The IMI unifies the key strengths of the classical isolation techniques and is designed to handle diverse decay topologies and kinematics robustly, enabling efficient reconstruction of decay chains with varying final-state multiplicities. It consistently outperforms the traditional methods, with superior background rejection and high signal efficiency across diverse channels and event multiplicities. By retaining only the most relevant particles in each event, the method achieves a 45% reduction in data size while preserving full physics performance, selecting signal particles with 99% efficiency. We also validate IMI on Run 3 data, confirming its robustness under real data-taking conditions. In the long term, IMI could provide a fast, lightweight front-end to support more compute-intensive selection strategies in the high-multiplicity environment of the High-Luminosity LHC.
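
The trade-off quoted in the abstract, a fixed signal efficiency against the fraction of other particles that survive, can be illustrated with a toy calculation. The sketch below is not the IMI implementation: the score distributions are invented Beta-distributed toys, the function names `threshold_for_efficiency` and `retained_fraction` are hypothetical, and the resulting numbers are not meant to reproduce the 45% size reduction. It only shows how a cut on a per-particle classifier score could be chosen to keep at least 99% of signal particles.

```python
import math
import random


def threshold_for_efficiency(signal_scores, target_eff):
    """Return the largest score cut that keeps at least target_eff of the signal."""
    if not signal_scores or not 0.0 < target_eff <= 1.0:
        raise ValueError("need a non-empty score list and 0 < target_eff <= 1")
    ordered = sorted(signal_scores, reverse=True)
    # Number of signal particles that must pass the cut.
    n_keep = math.ceil(target_eff * len(ordered))
    return ordered[n_keep - 1]


def retained_fraction(scores, cut):
    """Fraction of particles whose score passes the cut (score >= cut)."""
    return sum(s >= cut for s in scores) / len(scores)


# Toy scores (assumption): signal peaks near 1, background near 0.
rng = random.Random(42)
signal = [rng.betavariate(5, 1) for _ in range(1000)]
background = [rng.betavariate(1, 5) for _ in range(20000)]

cut = threshold_for_efficiency(signal, 0.99)
sig_eff = retained_fraction(signal, cut)
bkg_kept = retained_fraction(background, cut)

print(f"cut = {cut:.3f}, signal efficiency = {sig_eff:.3f}, "
      f"background retained = {bkg_kept:.3f}")
```

In this toy the event-size saving comes entirely from the background particles that fail the cut; how that translates into bytes per event depends on the persisted content of each particle, which this sketch does not model.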

Paper Structure

This paper contains 25 sections, 4 equations, 19 figures, 1 table.

Figures (19)

  • Figure 1: (Top) Event size breakdown for semileptonic Full stream events in LHCb, based on a minimum-bias Run 3 simulation. The base particles and metadata include a few particles from a partially reconstructed $b$-hadron decay (e.g., $\mathrm{D}^0$ and a muon), together with primary vertex, trigger, and reconstruction metadata. The extra charged particle component covers all other charged tracks (Long, Downstream, Upstream) and RICH PID information. Additional VELO and T tracks contribute 10% and 15% of the event size, respectively, and are not shown as they do not contribute to the semileptonic event size. The extra neutral particle component corresponds to reconstructed neutrals, such as photons and neutral hadrons. (Bottom) Track categories in LHCb, defined by the tracking sub-detectors where hits are recorded (Ref. LHCb:2023hlw).
  • Figure 2: The LHCb data flow during Run 3, illustrating the throughput and event rates for different streams, and indicating where the classical and IMI isolation algorithms are applied. Note that none of the Spruce selection lines using IMI rely on candidates preselected with the classical isolation tool. Based on numbers from Ref. LHCb:2018mlt.
  • Figure 3: Schematic illustration of the three classical isolation strategies: track isolation (top), based on large impact-parameter significance; cone isolation (middle), based on the track’s proximity to the signal candidate; and vertex isolation (bottom), based on compatibility with the signal decay vertex. For clarity, the terms "Signal" and "Background" follow the definitions introduced in Section \ref{subsec:sig_bkg_definition}, while in the context of isolation the more general terminology "isolated" and "non-isolated" is always used.
  • Figure 4: Illustration of typical signal- and background-like topologies relevant to the IMI tool. The depicted $B_s^{*} \rightarrow B K$ decay is shown as an illustrative example and was used only in the development of the cut-based method. In the left diagram, the signal candidates include non-isolated particles originating from the decay vertex of a $b$-hadron or its higher excited state. In the right diagram, the background candidates are defined as those formed by pairing with particles originating from the primary vertex or other $b$-hadrons in the event.
  • Figure 5: Distributions of the input features used to train the IMI. The curves for signal particles (non-isolated particles) are shown as red histograms, while background particles (isolated particles) are represented by green histograms. These variables serve as inputs to the multivariate classifier. See Sec. \ref{subsec:training_feature} for a detailed description of each feature.
  • ...and 14 more figures
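
The Figure 3 caption describes cone isolation as a selection on a track's proximity to the signal candidate. A minimal sketch of that idea follows, assuming proximity is measured as the angular distance $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$. This is a common choice in collider analyses, but the summary above does not state which metric or cone size the paper uses. The function names, the dictionary layout, and the cone radius of 0.5 are all hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two directions, with phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def tracks_in_cone(candidate, tracks, max_dr=0.5):
    """Return the tracks lying within max_dr of the signal candidate."""
    return [
        t for t in tracks
        if delta_r(t["eta"], t["phi"], candidate["eta"], candidate["phi"]) < max_dr
    ]


# Toy inputs (assumption): pseudorapidity and azimuthal angle only.
candidate = {"eta": 3.0, "phi": 0.1}
tracks = [
    {"id": 0, "eta": 3.1, "phi": 0.2},   # close to the candidate
    {"id": 1, "eta": 4.5, "phi": -2.0},  # far away
    {"id": 2, "eta": 2.9, "phi": -0.1},  # close to the candidate
]

nearby_ids = [t["id"] for t in tracks_in_cone(candidate, tracks)]
print(nearby_ids)
```

Wrapping the azimuthal difference matters for tracks on either side of the $\phi = \pm\pi$ boundary, which would otherwise appear almost $2\pi$ apart.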