Let's Talk About It: Making Scientific Computational Reproducibility Easy

Lázaro Costa; Susana Barbosa; Jácome Cunha

TL;DR

This work targets the reproducibility crisis in computational science by introducing SciConv, a conversational tool that automatically infers execution environments and dependencies and packages experiments for re-execution via Docker. In a qualitative evaluation, SciConv reproduced most of the curated experiments; in a comparative user study against Code Ocean, a professional platform, it achieved higher usability and lower workload as measured by SUS and NASA TLX. The study supports the promise of natural-language interfaces for simplifying complex reproducibility workflows, while candidly discussing limitations with database usage, Jupyter support, and large-scale projects. Overall, SciConv represents a practical, AI-assisted approach to making computational reproducibility easier and more accessible across disciplines, with open avenues for broader support and further automation.

Abstract

Computational reproducibility of scientific results, that is, the execution of a computational experiment (e.g., a script) using its original settings (data, code, etc.), should always be possible. However, reproducibility has become a significant challenge, as researchers often face difficulties in accurately replicating experiments due to inconsistencies in documentation, setup configurations, and missing data. This lack of reproducibility may undermine the credibility of scientific results. To address this issue, we propose a conversational, text-based tool that allows researchers to easily reproduce computational experiments (their own or those of others) and package them in a single file that can be re-executed with just a double click on any computer, requiring only the installation of a single piece of widely used software. Researchers interact with the platform in natural language, which our tool processes to automatically create a computational environment able to execute the provided experiment/code. We conducted two studies to evaluate our proposal. In the first study, we gathered qualitative data by executing 18 experiments from the literature. Although in some cases it was not possible to execute the experiment, in most instances the tool needed little or no interaction to reproduce the results. We also conducted a user study comparing our tool with an enterprise-level one. During this study, we measured the usability of both tools using the System Usability Scale (SUS) and participants' workload using the NASA Task Load Index (TLX). The results show a statistically significant difference between the two tools in favor of our proposal, demonstrating that our tool offers better usability and a lower workload than the current state of the art.
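
For readers unfamiliar with the System Usability Scale mentioned above, the following is a minimal sketch of the standard SUS scoring rule: ten items answered on a 1 to 5 scale, where odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a score from 0 to 100. This is not the authors' analysis code; the function name `sus_score` and the sample responses are hypothetical and chosen only for illustration.

```python
def sus_score(responses):
    """Return the standard SUS score (0-100) for one participant.

    `responses` is a sequence of ten integers in 1..5, in questionnaire order.
    The function name and input format are illustrative assumptions.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for item, response in enumerate(responses, start=1):
        if item % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += response - 1
        else:
            # Even items are negatively worded: lower agreement is better.
            total += 5 - response
    return total * 2.5


# Hypothetical participant who answers "neutral" (3) to every item.
neutral = [3] * 10
print(sus_score(neutral))
```

Scores for each tool can then be compared across participants; the statistical test used for that comparison is not specified in this section, so it is left out of the sketch.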
