Particle identification with machine learning from incomplete data in the ALICE experiment
Maja Karwowska, Łukasz Graczykowski, Kamil Deja, Miłosz Kasak, Małgorzata Janik
TL;DR
High-level summary: The paper tackles particle identification in ALICE under incomplete data and domain-shift conditions by replacing fixed cuts with neural-network binary classifiers trained on Monte Carlo data. It introduces an attention-enabled Feature Set Embedding to handle missing detector information and uses Domain Adversarial Neural Networks to align simulated and real data, all within the ALICE O² framework via ONNX integration. Key contributions include improved PID efficiency and purity over traditional methods, a scalable mechanism to train and deploy ML models in the experiment, and a path toward Run 3 production. The work demonstrates practical ML-PID deployment in a large-scale collider experiment with emphasis on data-format interoperability and domain adaptation.
Abstract
The ALICE experiment at the LHC measures properties of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. Such studies require accurate particle identification (PID). ALICE provides PID information via several detectors for particles with momentum from about 100 MeV/c up to 20 GeV/c. Traditionally, particles are selected with rectangular cuts. Much better performance can be achieved with machine learning (ML) methods. Our solution uses multiple neural networks (NN) serving as binary classifiers. Moreover, we extended our particle classifier with Feature Set Embedding and attention to train on data with incomplete samples. We also present the integration of the ML project with the ALICE analysis software, and we discuss domain adaptation, the ML technique needed to transfer knowledge between simulated and real experimental data.
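To illustrate how Feature Set Embedding with attention can cope with incomplete samples, here is a minimal sketch in plain Python. It is not the authors' implementation. The feature names, the fixed weights, and the helper functions (`to_feature_set`, `embed`, `attention_pool`, `encode_track`) are all hypothetical, and a real model would learn the weights and typically pass the embeddings through further network layers. The idea shown is that each track becomes a set of (feature index, value) pairs containing only the measured features, and attention pooling reduces that variable-size set to a fixed-length vector that a binary classifier could consume.

```python
import math

# Hypothetical detector feature names; a missing or None entry means "not measured".
FEATURES = ["tpc_signal", "tof_beta", "trd_signal"]

# Fixed illustrative parameters (assumptions); a real model would learn these.
WEIGHTS = [[0.5, -0.2, 0.1, 1.0],
           [0.3, 0.4, -0.1, 0.5]]   # embedding dim 2, input dim 3 (one-hot) + 1 (value)
BIAS = [0.0, 0.1]
QUERY = [1.0, -1.0]                 # attention query vector


def to_feature_set(track):
    """Keep only measured features, each as a (one-hot index, value) pair."""
    pairs = []
    for i, name in enumerate(FEATURES):
        value = track.get(name)
        if value is None:
            continue  # missing detector signal: skip instead of imputing
        one_hot = [1.0 if j == i else 0.0 for j in range(len(FEATURES))]
        pairs.append((one_hot, value))
    return pairs


def embed(pair):
    """Linear embedding of the concatenated [one_hot, value] vector."""
    one_hot, value = pair
    x = one_hot + [value]
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(WEIGHTS, BIAS)]


def attention_pool(embeddings):
    """Softmax attention over set elements, returning a fixed-length vector."""
    if not embeddings:
        return [0.0] * len(QUERY), []
    scores = [sum(q * e for q, e in zip(QUERY, emb)) for emb in embeddings]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    pooled = [sum(a * emb[d] for a, emb in zip(attn, embeddings))
              for d in range(len(QUERY))]
    return pooled, attn


def encode_track(track):
    """Full pipeline: feature set, then embeddings, then attention pooling."""
    return attention_pool([embed(p) for p in to_feature_set(track)])


if __name__ == "__main__":
    full = {"tpc_signal": 1.2, "tof_beta": 0.9, "trd_signal": 0.4}
    partial = {"tpc_signal": 1.2, "tof_beta": None}  # TOF and TRD missing
    print(encode_track(full))
    print(encode_track(partial))
```

Both tracks yield a pooled vector of the same length, so the downstream classifier sees a consistent input shape regardless of which detectors contributed.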
