Don't Waste Your Time: Early Stopping Cross-Validation
Edward Bergman, Lennart Purucker, Frank Hutter
TL;DR
This paper investigates how to reduce the computational burden of cross-validation in AutoML for tabular data by introducing two simple strategies for early stopping inner cross-validation during model selection. Through an extensive study of MLPs and random forests (RF) across 36 datasets and several fold configurations, the authors show that a forgiving early stopping strategy consistently speeds up convergence and lets model selection explore more of the search space, often improving overall performance, whereas aggressive stopping can be unreliable. They further evaluate the approach with Bayesian optimization (BO) and repeated cross-validation, finding that the forgiving strategy (Forgiving) generally retains its gains and can outperform the no-early-stopping baseline (No ES) under several conditions. The work contributes a practical, easy-to-implement framework for early stopping cross-validation, analyzes its interaction with common AutoML workflows, and outlines future research directions, including integration into BO and multi-fidelity approaches. Overall, the findings suggest that simple, robust early stopping can substantially improve the efficiency of model selection in AutoML without compromising performance in many scenarios, with potential benefits for the efficiency and sustainability of AI systems.
Abstract
State-of-the-art automated machine learning systems for tabular data often employ cross-validation to ensure that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While cross-validation ensures better generalization and, by extension, better performance, its additional cost is often prohibitive for effective model selection within a time budget. We aim to make model selection with cross-validation more effective. Therefore, we study early stopping of cross-validation during model selection. We investigate the impact of early stopping on random search for two algorithms, MLP and random forest, across 36 classification datasets. We further analyze the impact of the number of folds by considering 3-, 5-, and 10-fold cross-validation. In addition, we investigate the impact of early stopping with Bayesian optimization instead of random search, as well as with repeated cross-validation. Our exploratory study shows that even a simple-to-understand and easy-to-implement method consistently allows model selection to converge faster: on ~94% of all datasets, by ~214% on average. Moreover, stopping cross-validation early enables model selection to explore the search space more exhaustively, considering 167% more configurations on average within one hour, while also obtaining better overall performance.
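To make the idea concrete, here is a minimal Python sketch of early stopping cross-validation inside random search. The two stopping rules are assumptions for illustration and are not necessarily the paper's exact definitions: the assumed "forgiving" rule stops a configuration once the running mean of its fold scores falls below the incumbent's worst fold score, and the assumed "aggressive" rule stops it once that running mean falls below the incumbent's mean score. The names `early_stopped_cv`, `random_search`, and `toy_evaluate_fold` are hypothetical; a real setup would train a model such as an MLP or random forest on each fold instead of calling a toy objective.

```python
import random
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Config = dict
# Hypothetical fold evaluator: (config, fold_index) -> validation score, higher is better.
FoldEval = Callable[[Config, int], float]
# A stopping rule sees this config's fold scores so far and the incumbent's fold scores.
StopRule = Callable[[Sequence[float], Sequence[float]], bool]


def no_early_stopping(scores: Sequence[float], incumbent: Sequence[float]) -> bool:
    """Baseline (No ES): always evaluate every fold."""
    return False


def forgiving_should_stop(scores: Sequence[float], incumbent: Sequence[float]) -> bool:
    """Assumed rule: stop once the running mean drops below the incumbent's worst fold."""
    return mean(scores) < min(incumbent)


def aggressive_should_stop(scores: Sequence[float], incumbent: Sequence[float]) -> bool:
    """Assumed rule: stop once the running mean drops below the incumbent's mean."""
    return mean(scores) < mean(incumbent)


def early_stopped_cv(config: Config, k: int, evaluate_fold: FoldEval,
                     incumbent: Optional[Sequence[float]],
                     should_stop: StopRule) -> Tuple[List[float], bool]:
    """Evaluate folds one at a time; return (fold scores, whether all k folds ran)."""
    scores: List[float] = []
    for fold in range(k):
        scores.append(evaluate_fold(config, fold))
        # Stopping after the last fold would save nothing, so only check before it.
        if incumbent is not None and fold < k - 1 and should_stop(scores, incumbent):
            return scores, False
    return scores, True


def random_search(sampler: Callable[[random.Random], Config], evaluate_fold: FoldEval,
                  k: int, n_trials: int, should_stop: StopRule, seed: int = 0
                  ) -> Tuple[Optional[Config], Optional[List[float]], int]:
    """Random search where only fully cross-validated configs can become the incumbent."""
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_scores: Optional[List[float]] = None
    folds_used = 0  # total number of fold evaluations, a proxy for compute spent
    for _ in range(n_trials):
        config = sampler(rng)
        scores, completed = early_stopped_cv(config, k, evaluate_fold, best_scores, should_stop)
        folds_used += len(scores)
        if completed and (best_scores is None or mean(scores) > mean(best_scores)):
            best_config, best_scores = config, scores
    return best_config, best_scores, folds_used


# --- Toy problem (hypothetical): one hyperparameter x, best near x = 0.3. ---
def sample_config(rng: random.Random) -> Config:
    return {"x": rng.random()}


def toy_evaluate_fold(config: Config, fold: int) -> float:
    # Deterministic stand-in for training on k-1 folds and scoring the held-out fold.
    return 1.0 - (config["x"] - 0.3) ** 2 + 0.01 * fold


if __name__ == "__main__":
    for name, rule in [("No ES", no_early_stopping),
                       ("Forgiving", forgiving_should_stop),
                       ("Aggressive", aggressive_should_stop)]:
        best, scores, folds = random_search(sample_config, toy_evaluate_fold,
                                            k=5, n_trials=50, should_stop=rule)
        print(f"{name:10s} best x={best['x']:.3f} mean={mean(scores):.4f} folds={folds}")
```

In this toy problem every configuration receives the same per-fold offset, so a configuration that would beat the incumbent scores above the incumbent's worst fold on every fold; the assumed forgiving rule therefore never discards it and reaches the same incumbent as No ES while evaluating fewer folds. The assumed aggressive rule compares a partial mean against a full mean and can discard such a configuration after its first fold, which illustrates why aggressive stopping may be unreliable.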
