Navigating the Landscape of Reproducible Research: A Predictive Modeling Approach

Akhil Pandey Akella, Sagnik Ray Choudhury, David Koop, Hamed Alhoori

TL;DR

This work uses the author-centric spectrum, followed by the external-agent spectrum, to guide a structured, model-based approach to quantifying and assessing reproducibility. It finds that linguistic features such as readability and lexical diversity are strongly correlated with papers achieving the highest statuses on both spectrums.

Abstract

The reproducibility of scientific articles is central to the advancement of science. Despite this importance, evaluating reproducibility remains challenging due to the scarcity of ground truth data. Predictive models can address this limitation by streamlining the tedious evaluation process. Typically, a paper's reproducibility is inferred based on the availability of artifacts such as code, data, or supplemental information, often without extensive empirical investigation. To address these issues, we utilized artifacts of papers as fundamental units to develop a novel, dual-spectrum framework that focuses on author-centric and external-agent perspectives. We used the author-centric spectrum, followed by the external-agent spectrum, to guide a structured, model-based approach to quantify and assess reproducibility. We explored the interdependencies between different factors influencing reproducibility and found that linguistic features such as readability and lexical diversity are strongly correlated with papers achieving the highest statuses on both spectrums. Our work provides a model-driven pathway for evaluating the reproducibility of scientific research. The code, methods, and artifacts for our study are publicly available at: https://github.com/reproducibilityproject/NLRR/

Paper Structure

This paper contains 15 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Visualization of badge category overlaps for the scholarly articles in our dataset.
  • Figure 2: Joint framework to assess reproducibility levels in scientific papers.
  • Figure 3: Most important features for predicting labels in the author-centric and external-agent frameworks.
  • Figure 4: Confidence calibration of $\phi_{\text{author}}$ Random Forest model, author-centric framework (left) and $\phi_{\text{external}}$ Random Forest model, external-agent centric framework (right).
  • Figure 5: Confidence calibration of $\phi_{\text{author}}$ Longformer-MLP model, author-centric framework (left) and $\phi_{\text{external}}$ Longformer-MLP model, external-agent centric framework (right).