
From Guidelines to Practice: Evaluating the Reproducibility of Methods in Computational Social Science

Fakhri Momeni, Sarah Sajid, Johannes Kiesel

TL;DR

This study presents a systematic evaluation of reproducibility across three conditions: uncurated documentation, curated documentation, and curated documentation paired with a preset execution environment, demonstrating that reproducibility barriers are multi-layered and require coordinated improvements in documentation quality, environment stability, and conceptual clarity.

Abstract

Reproducibility remains a central challenge in computational social science, where complex workflows, evolving software ecosystems, and inconsistent documentation hinder researchers' ability to re-execute published methods. This study presents a systematic evaluation of reproducibility across three conditions: uncurated documentation, curated documentation, and curated documentation paired with a preset execution environment. Using 47 usability test sessions, we combine behavioral performance indicators (success rates, task time, and error profiles) with questionnaire data and thematic analysis to identify technical and conceptual barriers to reproducibility. Curated documentation substantially reduced repository-level errors and improved users' ability to interpret method outputs. Standardizing the execution environment further improved reproducibility, yielding the highest success rate and shortest task completion times. Across conditions, participants frequently relied on AI tools for troubleshooting, often enabling independent resolution of issues without facilitator intervention. Our findings demonstrate that reproducibility barriers are multi-layered and require coordinated improvements in documentation quality, environment stability, and conceptual clarity. We discuss implications for the design of reproducibility platforms and infrastructure in computational social science.

Paper Structure

This paper contains 43 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Task completion (reproducibility) rates across reproduction conditions (larger is better).
  • Figure 2: Average task completion time across reproduction conditions (smaller is better).
  • Figure 3: Distribution of errors by codes across reproduction conditions (smaller is better).
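The three figures summarize per-condition completion rate, average completion time, and error-code counts. As a minimal sketch of how such summaries could be derived from per-session records, the code below groups sessions by condition and aggregates them. The `Session` record layout, the `summarize` helper, and any condition or error-code labels are hypothetical illustrations, not the authors' data format or analysis code. Averaging time only over successfully completed sessions is also an assumption, since the captions do not say how unsuccessful sessions were handled.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Session:
    """One usability test session (hypothetical record layout, not the paper's)."""
    condition: str                  # reproduction condition label, e.g. "curated"
    completed: bool                 # whether the participant reproduced the method
    minutes: float                  # time spent on the task, in minutes
    error_codes: list = field(default_factory=list)  # hypothetical error codes


def summarize(sessions):
    """Group sessions by condition and compute the three figure-style metrics."""
    by_condition = defaultdict(list)
    for s in sessions:
        by_condition[s.condition].append(s)

    summary = {}
    for condition, group in by_condition.items():
        # Assumption: completion time is averaged over successful sessions only.
        done_times = [s.minutes for s in group if s.completed]
        mean_minutes: Optional[float] = mean(done_times) if done_times else None
        summary[condition] = {
            "n": len(group),
            # Share of sessions in this condition that ended in a successful reproduction.
            "completion_rate": sum(s.completed for s in group) / len(group),
            "mean_minutes": mean_minutes,
            # Frequency of each error code across all sessions in this condition.
            "errors": Counter(code for s in group for code in s.error_codes),
        }
    return summary
```

Keeping error codes as a `Counter` per condition makes it straightforward to compare error distributions across conditions, in the spirit of Figure 3, without fixing the code set in advance.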