
LEP Data@EDM4hep: mitigating data loss risks by increasing data FAIRness, with a view on FCC-ee

Jacopo Fanini, Gerardo Ganis, Marcello Maggi

Abstract

The LEP data represent the most precise and highest centre-of-mass energy sample of $e^+e^-$ collision data collected to date. Numerous scientific articles have been published since the conclusion of the experiments, underscoring the ongoing relevance of this dataset and the need to secure its long-term availability according to FAIR data preservation principles. These data could also play a crucial new role in evaluating the physics potential of FCC-ee: because the centre-of-mass energies overlap, they offer a valuable benchmark for detector performance and physics analyses. To fulfill this role, the data should be made available in EDM4hep, the standardized event data format currently being developed within the common HEP software ecosystem Key4hep. Migrating to EDM4hep would not only be beneficial to future studies but would also significantly mitigate the risk of data loss and increase accessibility and interoperability, hence facilitating long-term data preservation. A proof-of-concept workflow to perform the migration has been developed and successfully applied to ALEPH data.

Paper Structure (16 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: EDM4hep data types schema from simulation to reconstruction. Relations are indicated by the black arrows, while blue arrows indicate external links.
  • Figure 2: Workflow of the migration process. Green rectangles show the files, blue ellipses the programs developed to perform the data migration. The two computing environments involved in the migration are shown in light grey.
  • Figure 3: Charged particles' total energy distribution in 1994 class 16 selected events. Data migrated to EDM4hep are shown as blue markers, while archived data are shown as a red line. The lower panel shows the difference in the number of events in each energy bin between archived and EDM4hep data.
  • Figure 4: Charged particles' total energy distribution in 1994 class 16 selected data and $q\bar{q}$ simulated data EDM4hep samples. Data are shown as blue markers, while the Monte Carlo simulation is shown as a green line. The lower panel shows the ratio between data and Monte Carlo for each energy bin.
  • Figure 5: Screenshot of the website home page documenting, among other things, how to recreate ALEPH's computing environment and the content of migrated files.
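The lower panels described for Figures 3 and 4 compare two binned distributions, once as a per-bin difference and once as a per-bin ratio. A minimal sketch of that kind of comparison is given below, assuming NumPy and using toy values; the function name `compare_histograms`, the binning, and the input arrays are hypothetical and are not taken from the paper's analysis code. Any normalization of the simulated sample to the data is left out.

```python
import numpy as np


def compare_histograms(sample_a, sample_b, bins, value_range):
    """Bin two samples identically and return edges, per-bin difference and ratio."""
    counts_a, edges = np.histogram(sample_a, bins=bins, range=value_range)
    counts_b, _ = np.histogram(sample_b, bins=bins, range=value_range)

    # Per-bin difference in the number of entries (Figure 3 style comparison).
    diff = counts_a - counts_b

    # Per-bin ratio (Figure 4 style comparison); bins with an empty
    # denominator are left as NaN instead of dividing by zero.
    ratio = np.full(bins, np.nan)
    nonzero = counts_b > 0
    ratio[nonzero] = counts_a[nonzero] / counts_b[nonzero]
    return edges, diff, ratio


# Toy energies in GeV (illustrative values only, not ALEPH data).
archived = np.array([10.0, 20.0, 20.5, 45.0, 90.0])
migrated = archived.copy()  # a lossless migration should reproduce the input

edges, diff, ratio = compare_histograms(
    archived, migrated, bins=5, value_range=(0.0, 100.0)
)
print(diff)   # all zeros when the two samples agree bin by bin
print(ratio)  # 1 in populated bins, NaN in empty ones
```

For a validation of the kind shown in Figure 3, a difference that vanishes in every bin is the expected outcome, whereas a data-to-simulation ratio like the one in Figure 4 would generally fluctuate around one.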