How to avoid machine learning pitfalls: a guide for academic researchers
Michael A. Lones
TL;DR
The paper addresses widespread ML pitfalls in academic research, from data handling to reporting, and offers practical guardrails. It presents a structured, annually updated guide of dos and don'ts spanning data use, model building, evaluation, fair comparison, and reporting. Key contributions include concrete practices such as independent test sets, nested cross-validation for hyperparameter tuning, meaningful baselines, multi-metric reporting, and fairness checks, with emphasis on transparency and reproducibility. The guidance aims to improve the robustness, trustworthiness, and real-world impact of ML research by making methodological rigor feasible for researchers.
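To make one of the practices named above concrete, here is a minimal sketch of nested cross-validation for hyperparameter tuning. It is an illustration under stated assumptions, not code from the guide: the synthetic two-class data, the k-nearest-neighbour classifier, the candidate values of k, and every function name (`make_data`, `folds`, `nested_cv`, and so on) are hypothetical choices made for this example. The point it shows is that the hyperparameter is chosen using only the outer training fold, so the outer test fold stays independent of tuning.

```python
import random


def make_data(n, seed):
    """Synthetic 2-D points from two overlapping Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        point = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point`."""
    def sq_dist(item):
        (x0, x1), _ = item
        return (x0 - point[0]) ** 2 + (x1 - point[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    """Fraction of test points that a k-NN fitted on `train` labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def folds(data, n_folds):
    """Yield (train, held_out) pairs; every item is held out exactly once."""
    for i in range(n_folds):
        held_out = data[i::n_folds]
        train = [d for j, d in enumerate(data) if j % n_folds != i]
        yield train, held_out


def inner_score(train, k, n_inner):
    """Mean validation accuracy of k over inner folds of the outer training set."""
    scores = [accuracy(tr, va, k) for tr, va in folds(train, n_inner)]
    return sum(scores) / len(scores)


def nested_cv(data, ks, n_outer=5, n_inner=3):
    """Return one test accuracy per outer fold, tuning k inside each fold."""
    outer_scores = []
    for train, test in folds(data, n_outer):
        # Tuning sees only the outer training data, never the outer test fold.
        best_k = max(ks, key=lambda k: inner_score(train, k, n_inner))
        outer_scores.append(accuracy(train, test, best_k))
    return outer_scores


data = make_data(150, seed=0)
scores = nested_cv(data, ks=[1, 3, 5, 7])
print("outer-fold accuracies:", [round(s, 3) for s in scores])
print("mean accuracy:", round(sum(scores) / len(scores), 3))
```

The mean of the outer-fold scores estimates how well the whole procedure (tuning included) generalises. It is not the score of any single tuned model, since different outer folds may select different values of k.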
Abstract
Mistakes in machine learning practice are commonplace, and can result in a loss of confidence in the findings and products of machine learning. This guide outlines common mistakes that occur when using machine learning, and what can be done to avoid them. Whilst it should be accessible to anyone with a basic understanding of machine learning techniques, it focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions. It covers five stages of the machine learning process: what to do before model building, how to reliably build models, how to robustly evaluate models, how to compare models fairly, and how to report results.
