Towards FAIR Astrophysical Simulations

Susanne Pfalzner, Stephan Hachinger, Jolanta Zjupa, Salvatore Cielo, Frank W. Wagner, Marcus Brüggen, Annika Hagemeier

TL;DR

The paper addresses the reproducibility crisis in theoretical astrophysics by arguing that FAIR data alone is insufficient for reproducibility in HPC-driven simulations. It surveys current data management practices, identifies barriers, and outlines a practical path toward FAIR and reproducible simulations, emphasizing metadata, provenance, and persistent identifiers. Key contributions include a framework linking FAIR principles to reproducibility, a review of tooling and workflows, and culturally informed recommendations to support small research groups. The work aims to catalyze community discussions and the adoption of low-threshold, HPC-friendly solutions that enable transparent, reusable, and verifiable simulation results with minimal barriers to entry.

Abstract

Reproducibility is a cornerstone of science. FAIR (findable, accessible, interoperable, and reusable) data is often a vital step towards testing the reproducibility of results. The implementation of FAIR principles in the astrophysical simulation community is still varied. We approach the discussion of this topic mainly from a high-performance computing (HPC) point of view. We identify three main obstacles to FAIR astrophysics simulations. First, the vast datasets created in simulations on HPC facilities complicate FAIR data management. Second, incentives to fully share codes, results, and diagnostic data are missing. Third, workflows that include data publication and technical support are lacking. As a result, smaller research groups in particular struggle in their efforts towards FAIR and open simulations, because they lack dedicated personnel and time. We propose actionable steps towards achieving "FAIRer" data and open source publication standards in numerical astrophysics. Our suggestions include low-threshold methods to fulfil the basic FAIR requirements as well as basic tools for FAIR (meta-)data generation and data/code publication. This work is a high-level overview intended to initiate discussions within the community, offering initial solutions to these challenges.

Paper Structure (17 sections, 4 figures)

Figures (4)

  • Figure 1: Steps from a "publication-only" workflow to one that enables "full replication" by providing executable code, data, and diagnostics in a linked format, inspired by the gold-standard illustration for simulation publications by Peng (2011). "Code (public)" refers to the situation where some version of the code is publicly available, which is not necessarily the version used for the specific publication. By contrast, "Code (used)" refers to the situation in which the exact code used to produce the simulation results is made publicly available. "Diagnostics" refers to the full set of analysis scripts used to obtain the figures and results presented in the published paper.
  • Figure 2: Current workflow from initial scientific idea to publication. The code, simulation setup, and results are often documented in full detail only in PhD theses, while journal publications provide a more concise, result-focused, and therefore often incomplete version.
  • Figure 3: Workflow including the essential components towards a more FAIR-compliant process for astrophysics simulations.
  • Figure 4: Minimalistic workflow for publishing data directly on high-performance computing (HPC) systems without moving them (Hachinger 2025). Metadata are stored alongside the data, ingested into a database, exposed via landing pages, and linked with EUDAT-B2HANDLE persistent identifiers (PIDs) and mass-data transfer mechanisms to enable third-party access and reuse.
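
The first step of the Figure 4 workflow, storing metadata alongside the data, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `.meta.json` suffix, and all field names are assumptions chosen for illustration, and the `pid` field is only a placeholder for an identifier that a service such as B2HANDLE would assign in a later step.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(data_path: Path, description: dict) -> Path:
    """Write <data file>.meta.json next to the data file and return its path.

    The record layout is hypothetical; a real deployment would follow
    whatever schema the hosting database expects.
    """
    record = {
        "file_name": data_path.name,
        "size_bytes": data_path.stat().st_size,
        "sha256": sha256sum(data_path),
        "pid": None,  # placeholder: set once a persistent identifier exists
        "description": description,  # free-form, user-supplied metadata
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

Calling `write_sidecar(Path("snapshot_000.bin"), {"code_version": "unknown"})` would leave the data file in place and create `snapshot_000.bin.meta.json` next to it. Because the record is plain JSON, a separate ingestion job could later read it into a database without touching the simulation output itself.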