Reproducible container solutions for codes and workflows in materials science

Dylan Bissuel, Léo Orveillon, Benjamin Arrondeau, Paulo Almeida De Mendonça, Irina Piazza, Martin Uhrin, Étienne Polack, Akshay Krishna Ammothum Kandy, David Martin-Calle, Jonathan Chapignac, Aadhityan Arivazhagan, Lorenzo Paulatto, Pierre-Antoine Bouttier, M.-I. Richard, Thierry Deutsch, David Rodney, A. M. Saitta, Noël Jakse

TL;DR

The paper addresses the challenge of maintaining complex, cross-facility software stacks for materials science in a reproducible and portable form. It proposes a reproducible computing platform that combines the GNU Guix functional package manager with the Apptainer container system, enabling fully declarative environments and near-native performance on HPC systems. The authors demonstrate the approach through AiiDA/VASP AIMD workflows, path-integral molecular dynamics with MACE, ML interatomic-potential training, and Ewoks-driven analysis of X-ray diffraction data, all within a unified, provenance-aware framework. This work advances autonomous, data-driven materials research by providing a community-friendly, long-term-stable infrastructure (the DIAMOND platform) that can be easily shared and extended across groups and facilities.

Abstract

A computing solution combining the GNU Guix functional package manager with the Apptainer container system is presented. This approach provides fully declarative and reproducible software environments suitable for computational materials science. Its versatility and performance enable the construction of complete frameworks that integrate workflow managers such as AiiDA and Ewoks and can be deployed on HPC infrastructures. The efficiency of the solution is illustrated through several examples: (i) AiiDA workflows for automated dataset construction and analysis as well as path-integral molecular dynamics based on ab initio calculations; (ii) workflows for the training of machine-learning interatomic potentials; and (iii) an Ewoks workflow for the automated analysis of coherent X-ray diffraction data in large-scale synchrotron facilities. These examples demonstrate that the proposed environment provides a reliable and reproducible basis for computational and data-driven research in materials science.
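
As a rough illustration of how a Guix-defined environment could be turned into an image that Apptainer can run, the sketch below only assembles the two command lines and does not execute them. It assumes the image is produced with `guix pack` in SquashFS format, optionally pinned to a channel file through `guix time-machine`. The file names `manifest.scm`, `channels.scm`, and `diamond.squashfs`, as well as the `lmp` invocation, are hypothetical and not taken from the paper.

```python
import shlex


def guix_pack_command(manifest, channels=None):
    """Return the argv for building a SquashFS image from a Guix manifest.

    Assumption: the manifest lists `bash` among its packages, because
    Singularity/Apptainer expects /bin/sh inside the image.
    """
    pack = [
        "pack",
        "--format=squashfs",      # image format readable by Apptainer
        "--symlink=/bin=bin",     # expose the profile's bin/ as /bin
        f"--manifest={manifest}",
    ]
    if channels is None:
        return ["guix", *pack]
    # Pin the Guix revision so the same image can be rebuilt later.
    return ["guix", "time-machine", f"--channels={channels}", "--", *pack]


def apptainer_exec_command(image, argv):
    """Return the argv for running a program inside the image."""
    return ["apptainer", "exec", image, *argv]


if __name__ == "__main__":
    # Hypothetical file names, shown for illustration only.
    print(shlex.join(guix_pack_command("manifest.scm", "channels.scm")))
    print(shlex.join(apptainer_exec_command("diamond.squashfs",
                                            ["lmp", "-in", "in.lj"])))
```

Keeping the manifest and the channel file under version control is what makes the environment declarative in this sketch: the image is a build product that can be regenerated from those two files.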

Paper Structure

This paper contains 19 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Schematic representation of the three repositories maintained and their interactions in the creation of fully reproducible container images. All repositories are publicly available on GRICAD's GitLab forge [gricad-gitlab].
  • Figure 2: LAMMPS benchmark: Comparison of execution runtimes between the local installation, the Guix-packed version, and the Apptainer-containerized version of the DIAMOND platform [Diamond].
  • Figure 3: N2P2 benchmark: (a) Throughput of training runs as a function of dataset size for 32 MPI processes; (b) total time for 100 epochs using 51702 structures as a function of the number of MPI processes for the normal installation and the Guix-packed Apptainer containerization on the Dahu machine; (b) absolute total time as a function of the number of MPI processes ($\log_2$--$\log_2$ scale) and corresponding efficiency $R$, plotted as in (c).
  • Figure 4: Schematic representation of the DIAMOND VASP workflow modules. (WF1) Inherent-structure extraction; (WF2) equilibrium volume estimation via EOS fitting; (WF3) integrated data-generation workflow for machine-learning force-field training.
  • Figure 5: Example of results obtained with the automated VASP-based AiiDA workflow: (a) Statistical (a.1) and Spearman correlation (a.2) analyses of the local atomic arrangements in Al$_{97}$Zr$_{3}$ liquid alloys at $T = 1500\,\mathrm{K}$ using WF1. The boxplot shows the distribution of the CNA signatures (a.4) for bonded-pair clusters such as 555, 544, 433, and 422 [faken1994systematic]. The Spearman correlation matrices for Al and Zr indicate the dominant local ordering, consistent with the structural snapshot in (a.3). (b) Temperature dependence of the self-diffusion coefficient $D$ in liquid indium, shown as an Arrhenius plot (b.1). Diffusion coefficients are obtained from the slope of the mean-square displacement (b.2); a plateau in the MSD at the lowest temperature indicates partial crystallisation visible in the structure in (b.3), coloured as a function of the Effective Coordination Number (ECoN) [Hoppe1979].
  • ...and 2 more figures
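
The caption of Figure 5 states that diffusion coefficients are obtained from the slope of the mean-square displacement. A minimal sketch of that step is given below, assuming the standard Einstein relation in three dimensions, MSD(t) ≈ 6 D t at long times, and an ordinary least-squares fit. The function names and the synthetic data are illustrative only and do not come from the paper or its workflows.

```python
def fit_slope(x, y):
    """Ordinary least-squares slope of y versus x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def diffusion_coefficient(times, msd, dim=3):
    """Estimate D from the linear regime of the MSD.

    Einstein relation: MSD(t) = 2 * dim * D * t, so D = slope / (2 * dim).
    The caller is assumed to pass only the diffusive (linear) part of the
    curve; a plateau, as in a partially crystallised sample, would give a
    slope close to zero.
    """
    return fit_slope(times, msd) / (2 * dim)


if __name__ == "__main__":
    # Synthetic MSD with D = 0.5 (arbitrary units) and a constant offset.
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    msd = [6 * 0.5 * t + 0.1 for t in times]
    print(diffusion_coefficient(times, msd))
```

Repeating this fit at several temperatures and plotting ln D against 1/T gives the kind of Arrhenius plot described for panel (b.1).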