The Virtual Research Environment: towards a comprehensive analysis platform

Elena Gazzarrini, Enrique Garcia, Domenic Gosein, Alba Vendrell Moya, Agisilaos Kounelis, Xavier Espinal

TL;DR

The paper addresses the need for end-to-end, FAIR, reproducible data workflows in disciplines that handle exabyte-scale datasets. It proposes a modular Virtual Research Environment (VRE) that integrates a Rucio-based Data Lake, a Reana-driven computing cluster, a federated INDIGO IAM authentication framework, and an enhanced Jupyter-based notebook interface, all deployed on the CERN Cloud with Kubernetes. Its key contributions are demonstrating cross-domain applicability (HEP, high-energy astrophysics, neutrino astronomy, gravitational-wave science), enabling data governance, scalable computing, and reproducible analyses, and providing a blueprint for open, reproducible infrastructure deployment under EOSC-Future and ESCAPE. In practice, the platform supports faster, more transparent scientific workflows, fosters cross-disciplinary collaboration, and offers a replicable deployment model for other institutions.

Abstract

The Virtual Research Environment is an analysis platform developed at CERN serving the needs of scientific communities involved in European Projects. Its scope is to facilitate the development of end-to-end physics workflows, providing researchers with access to an infrastructure and to the digital content necessary to produce and preserve a scientific result in compliance with FAIR principles. The platform's development is aimed at demonstrating how sciences spanning from High Energy Physics to Astrophysics could benefit from the use of common technologies, initially born to satisfy CERN's exabyte-scale data management needs. The Virtual Research Environment's main components are (1) a federated distributed storage solution (the Data Lake), providing functionalities for data injection and replication through a Data Management framework (Rucio), (2) a computing cluster supplying the processing power to run full analyses with Reana, a re-analysis software, (3) a federated and reliable Authentication and Authorization layer, and (4) an enhanced notebook interface with containerised environments to hide the infrastructure's complexity from the user. The deployment of the Virtual Research Environment is open-source and modular, in order to make it easily reproducible by partner institutions; it is publicly accessible and kept up to date by taking advantage of state-of-the-art IT infrastructure technologies.

Paper Structure (8 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: EOSC-Future's Dark Matter Science Project aims at bringing together different search approaches (Astrophysics, Theory, Direct Detection, Collider Physics, Indirect Detection), to ultimately investigate limits on DM mass.
  • Figure 2: A graphical representation of the VRE components, i.e. (1) a federated distributed storage solution (blue), (2) a computing cluster (red), (3) a federated AAI layer (pink) and (4) an enhanced notebook interface (purple).