Towards FAIR Astrophysical Simulations
Susanne Pfalzner, Stephan Hachinger, Jolanta Zjupa, Salvatore Cielo, Frank W. Wagner, Marcus Brüggen, Annika Hagemeier
TL;DR
The paper addresses the reproducibility crisis in theoretical astrophysics, arguing that FAIR data alone is insufficient for reproducibility in HPC-driven simulations. It surveys current data management practices, identifies barriers, and outlines a practical path toward FAIR and reproducible simulations, emphasizing metadata, provenance, and persistent identifiers. Key contributions include a framework linking FAIR principles to reproducibility, a review of tooling and workflows, and culturally informed recommendations to support small research groups. The work aims to catalyze community discussion and the adoption of low-threshold, HPC-friendly solutions that enable transparent, reusable, and verifiable simulation results.
Abstract
Reproducibility is a cornerstone of science. FAIR (findable, accessible, interoperable, and reusable) data is often a vital step towards testing the reproducibility of results. The implementation of FAIR principles in the astrophysical simulation community still varies. We approach the discussion of this topic mainly from a high-performance computing (HPC) point of view. We identify three main obstacles to FAIR astrophysical simulations. First, the vast datasets created in simulations on HPC facilities complicate FAIR data management. Second, incentives to fully share codes, results, and diagnostic data are missing. Third, workflows that include data publication and technical support are lacking. As a result, smaller research groups in particular struggle in their efforts towards FAIR and open simulations, because they lack dedicated personnel and time. We propose actionable steps towards achieving "FAIRer" data and open-source publication standards in numerical astrophysics. Our suggestions include low-threshold methods to fulfil the basic FAIR requirements as well as basic tools for FAIR (meta-)data generation and data/code publication. This work is a high-level overview intended to initiate discussions within the community, offering initial solutions to these challenges.
