The Virtual Research Environment: towards a comprehensive analysis platform
Elena Gazzarrini, Enrique Garcia, Domenic Gosein, Alba Vendrell Moya, Agisilaos Kounelis, Xavier Espinal
TL;DR
The paper addresses the need for end-to-end, FAIR, reproducible data workflows across disciplines handling exabyte-scale datasets. It proposes a modular Virtual Research Environment (VRE) that integrates a Rucio-based Data Lake, a Reana-driven computing cluster, a federated INDIGO IAM authentication framework, and an enhanced Jupyter-based notebook interface, all deployed on CERN Cloud with Kubernetes. The key contributions are demonstrating cross-domain applicability (HEP, high-energy astrophysics, neutrino astronomy, gravitational-wave science), enabling data governance, scalable computing, and reproducible analyses, and providing a blueprint for open, reproducible infrastructure deployment under EOSC-Future and ESCAPE. In practice, the work enables faster, more transparent scientific workflows, fosters cross-disciplinary collaboration, and offers a replicable deployment model for other institutions.
Abstract
The Virtual Research Environment is an analysis platform developed at CERN to serve the needs of scientific communities involved in European Projects. Its scope is to facilitate the development of end-to-end physics workflows, providing researchers with access to the infrastructure and the digital content necessary to produce and preserve a scientific result in compliance with FAIR principles. The platform's development aims to demonstrate how sciences ranging from High Energy Physics to Astrophysics could benefit from common technologies initially created to satisfy CERN's exabyte-scale data management needs. The Virtual Research Environment's main components are (1) a federated distributed storage solution (the Data Lake), providing functionalities for data injection and replication through a Data Management framework (Rucio), (2) a computing cluster supplying the processing power to run full analyses with Reana, a re-analysis software, (3) a federated and reliable Authentication and Authorization layer, and (4) an enhanced notebook interface with containerised environments to hide the infrastructure's complexity from the user. The deployment of the Virtual Research Environment is open-source and modular, so that partner institutions can easily reproduce it; it is publicly accessible and kept up to date by taking advantage of state-of-the-art IT infrastructure technologies.
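As an illustration of the modular, four-component layout described above, the following minimal Python sketch models the components as plain data and derives an order in which they could be brought up. It is not the project's deployment code. The identifiers (`Component`, `VRE_COMPONENTS`, `deployment_order`) are hypothetical, and the `requires` links are assumptions made for the example (every service is assumed to authenticate through the AAI layer, and the notebook is assumed to sit on top of the Data Lake and the computing cluster); the text does not specify these dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One building block of the platform (hypothetical model, for illustration)."""
    name: str            # short identifier chosen for this sketch
    role: str            # role as described in the abstract
    technology: str      # technology named in the text
    requires: tuple = () # ASSUMED dependencies, not stated in the source


# The four components listed in the abstract. Dependencies are assumptions.
VRE_COMPONENTS = [
    Component("aai", "federated authentication and authorization", "INDIGO IAM"),
    Component("data_lake", "federated distributed storage", "Rucio",
              requires=("aai",)),
    Component("compute", "computing cluster for full analyses", "Reana",
              requires=("aai",)),
    Component("notebook", "enhanced notebook interface", "Jupyter",
              requires=("aai", "data_lake", "compute")),
]


def deployment_order(components):
    """Return component names so that each one follows its requirements.

    Raises KeyError if a requirement is not part of the selection and
    ValueError if the requirements form a cycle.
    """
    by_name = {c.name: c for c in components}
    ordered = []      # names in the order they can be deployed
    visiting = set()  # names on the current dependency path

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        if name not in by_name:
            raise KeyError(f"missing required component {name!r}")
        visiting.add(name)
        for dep in by_name[name].requires:
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for component in components:
        visit(component.name)
    return ordered


if __name__ == "__main__":
    for name in deployment_order(VRE_COMPONENTS):
        print(name)
```

Keeping the description as data means that a partner institution reproducing only part of the platform would get an explicit error when a selected component depends on one that was left out, under the dependency assumptions stated above.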
