Conceptual Design Report for FAIR Computing

Johan Messchendorp, Mohammad Al-Turany, Volker Friese, Thorsten Kollegger, Bastian Loeher, Jochen Markert, Andrew Mistry, Thomas Neff, Adrian Oeftiger, Michael Papenbrock, Stephane Pietri, Shahab Sanjari, Tobias Stockmanns

TL;DR

This Conceptual Design Report articulates a forward‑looking plan for FAIR’s computing infrastructure, integrating a centrally orchestrated Tier0 (GreenCube) with federated external centers to meet HPC/HTC needs across diverse research pillars. It outlines a phased timeline (FS+, MSVc) and a federated, FAIR‑conscious model (AAI, Data Lake, OSSR) to support open science while containing costs and energy use. Resource estimates span compute, storage, and bandwidth across CBM, PANDA, NUSTAR, APPA, HADES, and THEORY, with detailed online/offline workflows, data flows, and data management policies. It also emphasizes R&D in ML/AI, heterogeneous architectures, and EOSC/NFDI‑aligned interfaces to ensure scalable, interoperable access to data and services, underpinned by governance structures that coordinate funding, usage, and policy compliance. Overall, the document lays out a comprehensive blueprint for a sustainable, flexible, and open FAIR computing ecosystem that can scale to TB/s data rates and hundreds of PBs of archival data while enabling broad international collaboration.

Abstract

This Conceptual Design Report (CDR) presents the plans for the computing infrastructure for research at FAIR, Darmstadt, Germany. It covers the computing requirements of the various research groups, the policies for the computing and storage infrastructure, and the foreseen FAIR computing model, including the open data, software, and services policies and architecture, for the period from the "first science (plus)" phase starting in 2028 to the modularized start version of FAIR. The overall ambition is to create a federated and centrally orchestrated infrastructure that serves the large diversity of research lines with sufficient scalability and flexibility to cope with the future data challenges at FAIR.

Paper Structure

This paper contains 92 sections, 13 figures, 51 tables.

Figures (13)

  • Figure 1: Sketch of the required compute capacity for a nominal FS+ year. The light-grey area depicts the total required shared compute capacity for online and offline computations, with the online part averaged over the year. The dark-grey area indicates the required online computing capacity, taking into account CBM (100 days), NUSTAR (180 days), HADES (30 days), and APPA (180 days). The dotted line represents the compute capacity presently used at the GreenCube for FAIR Phase Zero activities. The dashed line depicts the minimum required capacity at FAIR Tier0, which includes the maximum online compute capacity plus data-intensive tasks.
  • Figure 2: The required amount of storage as a function of year, with FS+ starting in 2028 and MSVc in 2032. The top panel depicts the requested disk space for fast access, whereas the bottom panel presents the needed long-term storage space (archive). The contributions of the various research lines are indicated by different colors. The dashed line shows the storage used on the Lustre filesystem for FAIR Phase Zero activities. Copies of the raw data at other FAIR facilities are not included, although they are a requirement established by an external audit.
  • Figure 3: Schematic view of FAIR with its beam lines and experimental areas. The color code indicates the various stages of the construction relevant for this CDR. The left-hand side shows images of the existing GreenCube as seen from the outside (top) and inside (bottom).
  • Figure 4: An overview of the compute usage at the Virgo cluster at GSI, classified according to the various research lines. The presented data correspond to the period 1/2021-3/2023 and amount to about 82 kcore years.
  • Figure 5: A stacked overview of the maximum disk usage at the Virgo cluster (Lustre) at GSI, classified according to the various research lines, for the past three years.
  • ...and 8 more figures