Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research

Daniel Vranješ, Oliver Niggemann

TL;DR

This work addresses the lack of standardization in empirical ML research, focusing on falsifiability, replicability, reproducibility, and generalizability. It introduces a linear-but-iterative experimental process model that begins with clearly defined hypotheses and proceeds through design, execution, analysis, and publication, linking results directly to the hypotheses via $H_0: \phi(X, C, Y) = 0$ and $H_a: \phi(X, C, Y) \neq 0$. Key contributions include formal definitions of hypothesis types, a structured experimental design framework, a modular execution approach with traceability, and a comprehensive statistical analysis and documentation protocol aligned with FAIR data principles, plus a practical checklist for researchers. By standardizing practices across hypotheses, variables, datasets, and reporting, the paper aims to enhance reproducibility, reliability, and impact in empirical ML, with strong emphasis on inter-subjectivity and robustness in evaluation.
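The hypothesis pair $H_0: \phi(X, C, Y) = 0$ versus $H_a: \phi(X, C, Y) \neq 0$ can be made concrete with a small test. The sketch below is illustrative only and is not taken from the paper. It assumes that $X$ is the choice between two models, that $C$ is a set of controlled conditions (here, matched random seeds), that $Y$ is a test accuracy, and that $\phi$ is the mean paired difference in $Y$. The function names and the scores are hypothetical, and the exact sign-flip permutation test is one possible analysis among many.

```python
from itertools import product
from statistics import mean


def phi(diffs):
    """Assumed effect statistic: mean paired difference in the outcome Y."""
    return mean(diffs)


def sign_flip_p_value(scores_a, scores_b, tol=1e-12):
    """Exact two-sided sign-flip permutation test of H0: phi = 0.

    scores_a[i] and scores_b[i] must come from the same controlled
    condition C (for example, the same seed and data split).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(phi(diffs))
    extreme = 0
    total = 0
    # Under H0 the sign of each paired difference is exchangeable,
    # so enumerate all 2^n sign assignments (feasible for small n).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        if abs(phi(flipped)) >= observed - tol:
            extreme += 1
    return extreme / total


# Hypothetical accuracies of two models on five matched seeds.
model_a = [0.91, 0.93, 0.92, 0.94, 0.90]
model_b = [0.88, 0.90, 0.91, 0.89, 0.87]
p_value = sign_flip_p_value(model_a, model_b)
print(p_value)  # 0.0625
```

With five pairs the smallest attainable two-sided p-value is 2/32 = 0.0625, so this design could never reject $H_0$ at the 0.05 level. Checking this before any experiment is run is a simple instance of fixing the hypothesis and the analysis plan at design time.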

Abstract

Empirical research plays a fundamental role in the machine learning domain. At the heart of impactful empirical research lies the development of clear research hypotheses, which then shape the design of experiments. The execution of experiments must be carried out with precision to ensure reliable results, followed by statistical analysis to interpret these outcomes. This process is key to either supporting or refuting the initial hypotheses. Despite its importance, there is high variability in research practices across the machine learning community and no uniform understanding of quality criteria for empirical research. To address this gap, we propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research. By embracing these recommendations, greater consistency, enhanced reliability and increased impact can be achieved.

Paper Structure

This paper contains 10 sections, 4 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Research process model with process phases and generated artifacts