Reproducible container solutions for codes and workflows in materials science
Dylan Bissuel, Léo Orveillon, Benjamin Arrondeau, Paulo Almeida De Mendonça, Irina Piazza, Martin Uhrin, Étienne Polack, Akshay Krishna Ammothum Kandy, David Martin-Calle, Jonathan Chapignac, Aadhityan Arivazhagan, Lorenzo Paulatto, Pierre-Antoine Bouttier, M.-I. Richard, Thierry Deutsch, David Rodney, A. M. Saitta, Noël Jakse
TL;DR
The paper addresses the challenge of maintaining complex, cross-facility software stacks for materials science in a reproducible and portable form. It proposes a reproducible computing platform that combines the GNU Guix functional package manager with the Apptainer container system, enabling fully declarative environments and near-native performance on HPC systems. The authors demonstrate the approach through AiiDA/VASP AIMD workflows, path-integral molecular dynamics with MACE, ML interatomic-potential training, and Ewoks-driven analysis of X-ray diffraction data, all within a unified, provenance-aware framework. This work advances autonomous, data-driven materials research by providing a community-friendly, long-term-stable infrastructure (the DIAMOND platform) that can be easily shared and extended across groups and facilities.
Abstract
A computing solution combining the GNU Guix functional package manager with the Apptainer container system is presented. This approach provides fully declarative and reproducible software environments suitable for computational materials science. Its versatility and performance enable the construction of complete frameworks that integrate workflow managers such as AiiDA and Ewoks and can be deployed on HPC infrastructures. The efficiency of the solution is illustrated through several examples: (i) AiiDA workflows for automated dataset construction and analysis, as well as for path-integral molecular dynamics based on ab initio calculations; (ii) workflows for training machine-learning interatomic potentials; and (iii) an Ewoks workflow for the automated analysis of coherent X-ray diffraction data at large-scale synchrotron facilities. These examples demonstrate that the proposed environment provides a reliable and reproducible basis for computational and data-driven research in materials science.
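As a rough illustration of how a Guix-declared environment can be turned into a container image and run with Apptainer, the sketch below assembles the two command lines involved. It is a minimal sketch under stated assumptions, not the authors' actual build procedure: the file names `manifest.scm`, `channels.scm`, and `env.squashfs` are hypothetical, and it assumes the image is produced with `guix pack -f squashfs` from a manifest pinned through `guix time-machine`. The script only builds and prints the commands; it does not run them.

```python
import shlex


def guix_pack_command(manifest="manifest.scm", channels="channels.scm"):
    """Return the argv that packs a Guix manifest into a SquashFS image.

    Both file names are hypothetical placeholders. Pinning the channels
    file via `guix time-machine` is what makes the build repeatable.
    """
    return [
        "guix", "time-machine", "-C", channels, "--",
        "pack", "-f", "squashfs",   # SquashFS output format
        "-S", "/bin=bin",           # expose the profile's bin/ as /bin
        "-m", manifest,             # declarative list of packages
    ]


def apptainer_exec_command(image, program, *args):
    """Return the argv that runs `program` inside `image` with Apptainer."""
    return ["apptainer", "exec", image, program, *args]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    print(shlex.join(guix_pack_command()))
    print(shlex.join(apptainer_exec_command("env.squashfs", "python3", "--version")))
```

Keeping the package list in a manifest and the Guix revision in a channels file means the whole environment is described by two small text files, which is the sense in which such an environment is declarative.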
