Machine-learning-based particle identification with missing data

Miłosz Kasak, Kamil Deja, Maja Karwowska, Monika Jakubowska, Łukasz Graczykowski, Małgorzata Janik

TL;DR

This work introduces a novel Particle Identification (PID) method for the ALICE experiment at CERN's Large Hadron Collider. The method can be trained on incomplete detector data and improves the PID purity and efficiency of the selected sample for all investigated particle species.
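In PID studies, purity is the fraction of selected tracks that truly belong to the target species, and efficiency is the fraction of true target-species tracks that the selection keeps; in machine learning terms these are precision and recall. A minimal sketch, assuming integer species labels (here PDG codes) and a boolean selection mask; the function name `purity_efficiency` and the toy arrays are hypothetical:

```python
import numpy as np

def purity_efficiency(true_labels, selected, target):
    """Purity and efficiency of a selection for one particle species (hypothetical helper).

    true_labels: true species labels, e.g. PDG codes from simulation truth
    selected:    boolean mask, True where the classifier accepts the track
    target:      label of the species being selected
    """
    true_labels = np.asarray(true_labels)
    selected = np.asarray(selected, dtype=bool)
    is_target = true_labels == target
    true_pos = np.sum(selected & is_target)
    n_selected = np.sum(selected)
    n_target = np.sum(is_target)
    # Purity (precision): share of selected tracks that are truly the target species.
    purity = true_pos / n_selected if n_selected else 0.0
    # Efficiency (recall): share of true target tracks that the selection kept.
    efficiency = true_pos / n_target if n_target else 0.0
    return float(purity), float(efficiency)

# Toy example: 211 = pi+, 321 = K+, 2212 = proton; four true pions, three selected tracks.
labels = np.array([211, 211, 321, 211, 211, 2212])
sel = np.array([True, True, True, False, False, False])
print(purity_efficiency(labels, sel, target=211))  # purity = 2/3, efficiency = 0.5
```

A looser selection usually raises efficiency at the cost of purity, which is why the two are reported together.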

Abstract

In this work, we introduce a novel method for Particle Identification (PID) within the scope of the ALICE experiment at the Large Hadron Collider (LHC) at CERN. Identifying the products of ultrarelativistic collisions delivered by the LHC is one of the crucial objectives of ALICE. Commonly employed PID methods rely on hand-crafted selections that compare experimental data to theoretical simulations. To improve on these baseline methods, newer approaches use machine learning models that learn the correct particle assignment as a classification task. However, because different subdetectors use different detection techniques, and because detector efficiency and acceptance are limited, produced particles do not always yield signals in all ALICE components. The resulting data contain missing values. Standard machine learning models cannot be trained on such incomplete examples, so a significant part of the data is discarded during training. In this work, we propose the first PID method that can be trained on all available data examples, including incomplete ones. Our approach improves the PID purity and efficiency of the selected sample for all investigated particle species.
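To make the missing-data problem concrete: if each track is a row of detector features with NaN wherever a subdetector gave no signal, training only on complete rows discards every row containing a NaN. The sketch below uses made-up values (not ALICE data) and hypothetical column meanings. It also shows one way to keep incomplete rows, by encoding each track as a variable-size set of (feature index, value) pairs over its observed features; this encoding is an illustrative assumption, not necessarily the input format used in the paper.

```python
import numpy as np

# Hypothetical toy data: rows are tracks, columns are detector features
# (e.g. a TPC signal, a TOF signal, a TRD signal). NaN marks a missing value.
X = np.array([
    [1.2, 0.8, np.nan],
    [0.9, np.nan, np.nan],
    [1.1, 0.7, 0.3],
    [1.0, 0.6, np.nan],
])

observed = ~np.isnan(X)               # True where a value exists
complete_rows = observed.all(axis=1)  # rows with no missing value

# Fraction of tracks that a "complete rows only" training set would drop.
dropped_fraction = 1.0 - complete_rows.mean()
print(f"dropped: {dropped_fraction:.0%}")  # 3 of 4 rows are incomplete -> 75%

def to_feature_set(row):
    """Keep a track as (feature index, value) pairs for its observed features only."""
    idx = np.flatnonzero(~np.isnan(row))
    return np.stack([idx.astype(float), row[idx]], axis=1)  # shape (n_observed, 2)

sets = [to_feature_set(row) for row in X]
print([s.shape for s in sets])  # [(2, 2), (1, 2), (3, 2), (2, 2)]
```

The set representation has no placeholder for absent values, so no imputation is needed; the trade-off is that the model must accept inputs of varying size.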

Paper Structure (21 sections, 5 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: Components of the ALICE detector in its Run 2 configuration botta:2017a.
  • Figure 2: The proposed model architecture. Layered blocks are applied separately to each vector in a set. Single blocks are applied to their input as a whole.
  • Figure 2: Particle type distribution. Approximately 97.8% of the examples belong to the 6 most populous particle types.
  • Figure 3: Missing data distribution. Over 62.8% of the examples are missing at least one value.
  • Figure 4: Precision-recall curve for different ML-based approaches with missing data
  • ...and 4 more figures
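The Figure 2 caption describes blocks applied separately to each vector in a set, followed by blocks applied to the input as a whole. That matches the general shape of a DeepSets-style set classifier: a shared per-element transform, an order-independent pooling step, and a classification head. A minimal sketch under that reading, with random untrained weights; the helper names (`dense`, `encode`, `classify`), layer sizes, mean pooling, one-hot feature-index encoding and six output classes are all assumptions for illustration, and the paper's actual blocks (for example, whether they use attention) are not given here.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 3  # assumed number of detector features
EMB = 8         # assumed per-element embedding size
HIDDEN = 16     # assumed hidden size of the classification head
N_CLASSES = 6   # assumed: one output per most populous particle type

def dense(n_in, n_out):
    """Random, untrained weights for one fully connected layer (illustration only)."""
    return rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

W_elem, b_elem = dense(N_FEATURES + 1, EMB)  # per-element block (weights shared across elements)
W_h1, b_h1 = dense(EMB, HIDDEN)              # whole-input block, layer 1
W_h2, b_h2 = dense(HIDDEN, N_CLASSES)        # whole-input block, layer 2

def encode(feature_set):
    """Map (feature index, value) pairs to [one-hot feature index, value] vectors."""
    idx = feature_set[:, 0].astype(int)
    onehot = np.eye(N_FEATURES)[idx]
    return np.concatenate([onehot, feature_set[:, 1:2]], axis=1)

def classify(feature_set):
    """Species probabilities for one track; assumes at least one observed feature."""
    h = relu(encode(feature_set) @ W_elem + b_elem)  # applied to each set element separately
    pooled = h.mean(axis=0)                          # order-independent pooling over the set
    z = relu(pooled @ W_h1 + b_h1)                   # applied to the pooled vector as a whole
    logits = z @ W_h2 + b_h2
    p = np.exp(logits - logits.max())                # numerically stable softmax
    return p / p.sum()

track_a = np.array([[0.0, 1.2], [1.0, 0.8]])  # two observed features
track_b = np.array([[0.0, 0.9]])              # only one observed feature
print(classify(track_a).shape, classify(track_b).shape)  # (6,) (6,)
```

Because the pooled vector has a fixed size regardless of how many features were observed, the same weights handle complete and incomplete tracks, which is the property that lets incomplete examples stay in the training set.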