Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research

Daniel Vranješ; Oliver Niggemann

TL;DR

This work addresses the lack of standardization in empirical ML research, focusing on falsifiability, replicability, reproducibility, and generalizability. It introduces a linear-but-iterative experimental process model that begins with clearly defined hypotheses and proceeds through design, execution, analysis, and publication, linking results directly to the hypotheses via $H_0: \phi(X, C, Y) = 0$ and $H_a: \phi(X, C, Y) \neq 0$. Key contributions include formal definitions of hypothesis types, a structured experimental design framework, a modular execution approach with traceability, and a comprehensive statistical analysis and documentation protocol aligned with FAIR data principles, plus a practical checklist for researchers. By standardizing practices across hypotheses, variables, datasets, and reporting, the paper aims to enhance reproducibility, reliability, and impact in empirical ML, with strong emphasis on inter-subjectivity and robustness in evaluation.

Abstract

Empirical research plays a fundamental role in the machine learning domain. At the heart of impactful empirical research lies the development of clear research hypotheses, which then shape the design of experiments. Experiments must be executed with precision to ensure reliable results, and the outcomes must then be interpreted through statistical analysis. This process is key to either supporting or refuting the initial hypotheses. Despite its importance, research practices vary widely across the machine learning community, and there is no uniform understanding of quality criteria for empirical research. To address this gap, we propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research. By embracing these recommendations, greater consistency, enhanced reliability and increased impact can be achieved.
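
As a rough illustration of how a result can be tied back to a hypothesis pair of the form $H_0: \phi(X, C, Y) = 0$ versus $H_a: \phi(X, C, Y) \neq 0$, the sketch below assumes that $\phi$ is the mean paired difference in a score between two models evaluated under the same conditions (for example, the same random seeds). This choice of $\phi$, the function name `paired_sign_flip_test`, and the example scores are hypothetical and are not taken from the paper; an exact sign-flip permutation test is only one of several reasonable tests for such a setup.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Assumption (illustrative only): phi is the mean paired difference.
    Under H0 (phi = 0) the sign of each paired difference is exchangeable,
    so every sign assignment is equally likely.
    Returns (observed mean difference, p-value).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired (same length)")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    # Enumerate all 2^n sign assignments; only feasible for small n.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so floating-point ties count as "at least as extreme".
        if stat >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total


# Hypothetical accuracies of two models over five shared seeds.
model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
model_b = [0.78, 0.77, 0.80, 0.79, 0.78]
effect, p_value = paired_sign_flip_test(model_a, model_b)
print(f"mean difference = {effect:.3f}, p = {p_value:.4f}")
```

With only five paired runs the smallest attainable two-sided p-value is 2/32, which shows why the number of repetitions needs to be fixed during experiment design rather than after the results are in.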

Paper Structure (10 sections, 4 equations, 1 figure, 1 table)

Introduction
State of the Art
Experimental Process
Formulation of Hypotheses
Experiment Design
Experiment Execution
Statistical Data Analysis
Documentation and Publication
Conclusion and Outlook
Checklist

Figures (1)

Figure 1: Research process model with process phases and generated artifacts
