Towards an OSF-based Registered Report Template for Software Engineering Controlled Experiments
Ana B. M. Bett, Thais S. Nepomuceno, Edson OliveiraJr, Maria Teresa Baldassarre, Valdemar V. Graciano Neto, Marcos Kalinowski
TL;DR
The paper tackles the lack of rigor in reporting software engineering controlled experiments by advocating OSF-based Registered Reports to improve transparency, reproducibility, and bias control. It analyzes OSF RR types against Jedlitschka et al. (2008) guidelines, finding that no current type fully satisfies the guidelines, though RR.3 offers the broadest coverage. The authors propose an initial OSF-based RR template for SE, discuss its mapping to guidelines, and report lessons and limitations, including the need for SE-specific customization and artifact support. They present prospective actions to harmonize RR practices with SE workflows, artifact documentation, and ethical considerations, aiming to establish SE-centric RR guidelines and tooling. The work highlights both the potential of RRs to elevate research quality and the practical barriers to widespread adoption in software engineering.
Abstract
Context: The empirical software engineering (ESE) community has contributed to improving experimentation over the years. However, there is still a lack of rigor in describing controlled experiments, hindering reproducibility and transparency. Registered Reports (RRs) have been discussed in the ESE community to address these issues. An RR registers a study's hypotheses, methods, and/or analyses before execution, involving peer review and potential acceptance before data collection. This helps mitigate problematic practices such as p-hacking, publication bias, and inappropriate post hoc analysis. Objective: This paper presents initial results toward establishing an RR template for Software Engineering controlled experiments using the Open Science Framework (OSF). Method: We analyzed templates of selected OSF RR types in light of documentation guidelines for controlled experiments. Results: The observed lack of rigor motivated our investigation of OSF-based RR types. Our analysis showed that, although one of the RR types aligned with many of the documentation suggestions contained in the guidelines, none of them covered the guidelines comprehensively. The study also highlights limitations in OSF RR template customization. Conclusion: Despite progress in ESE, planning and documenting experiments still lack rigor, compromising reproducibility. We propose adopting OSF-based RRs to address this. However, no currently available RR type fully satisfies the guidelines, so we consider establishing RR-specific guidelines for SE essential.
