Source Code Archiving to the Rescue of Reproducible Deployment

Ludovic Courtès, Timothy Sample, Simon Tournier, Stefano Zacchiroli

TL;DR

The paper tackles the challenge of reproducible deployment by ensuring long-term access to package source code. It introduces a bridge between Guix and the Software Heritage archive and a Disarchive tool to reconstruct tarballs, enabling automatic source-code recovery and verification. Over five years, the approach achieves substantial archival coverage (about 85-90% of sources with SWHIDs) and demonstrates that most sources remain retrievable, strengthening the reliability of time-travel deployments. The work offers a practical pathway for deploying computational experiments with provable provenance and persistent access to source code, with broader implications for scientific workflows and deployment tools.

Abstract

The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available. We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive. Our contribution is twofold: first, we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.

Paper Structure (18 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Package definitions of Python and Scikit-learn.
  • Figure 2: Populating the Software Heritage archive (orange arrows) and retrieving source code (blue arrows).
  • Figure 3: Relative high-level source types by sampled Guix commit.
  • Figure 4: Relative VCS source types by sampled Guix commit.
  • Figure 5: Disarchive tarball disassembly (orange arrows) takes a "tarball" as input and produces metadata along with a SWHID pointing to the tarball contents. Assembly (blue arrows) reconstructs the tarball by combining its metadata and its contents.
  • ...and 3 more figures
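
The disassembly and assembly round trip described in the Figure 5 caption can be illustrated with a small sketch. This is not Disarchive's actual format or API: the function names `disassemble` and `assemble`, the metadata layout, and the in-memory `store` dictionary are hypothetical, and the sketch assumes an uncompressed ustar tarball containing only regular files. Each file is keyed by a content identifier of the form `swh:1:cnt:<hash>`, where the hash is the Git blob hash of the file's bytes; the real tool instead records a SWHID for the tarball contents as a whole.

```python
import hashlib
import io
import tarfile


def content_swhid(data: bytes) -> str:
    """Content identifier: 'swh:1:cnt:' followed by the Git blob SHA-1."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def disassemble(tarball: bytes):
    """Split a tarball into per-member metadata and a content store.

    Simplifying assumption: only regular files are supported.
    """
    metadata, store = [], {}
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                raise ValueError(f"unsupported member type: {member.name}")
            data = tar.extractfile(member).read()
            key = content_swhid(data)
            store[key] = data  # stands in for an external archive
            metadata.append({
                "name": member.name,
                "mode": member.mode,
                "mtime": member.mtime,
                "content": key,
            })
    return metadata, store


def assemble(metadata, store) -> bytes:
    """Rebuild the tarball from its metadata and the stored contents."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for entry in metadata:
            data = store[entry["content"]]
            info = tarfile.TarInfo(entry["name"])
            info.mode = entry["mode"]
            info.mtime = entry["mtime"]
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()


def sample_tarball() -> bytes:
    """Build a tiny ustar tarball to exercise the round trip."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in [("pkg-1.0/README", b"hello world\n"),
                           ("pkg-1.0/empty", b"")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 1
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()


if __name__ == "__main__":
    original = sample_tarball()
    metadata, store = disassemble(original)
    print(assemble(metadata, store) == original)
```

The point of the split is that the small metadata record can be kept separately while the file contents are fetched by identifier; in this toy setting, reassembly yields the same bytes as the input, so a hash of the original tarball could be checked against the rebuilt one. Compression, non-regular members, and other header fields are left out of the sketch.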