An Open Source AutoML Benchmark
Pieter Gijsbers, Erin LeDell, Janek Thomas, Sébastien Poirier, Bernd Bischl, Joaquin Vanschoren
TL;DR
AutoML benchmarking has suffered from non-standardized datasets and inconsistent evaluation. This work introduces an open-source, extensible benchmark framework with a public results website to enable ongoing, fair comparisons across AutoML systems. It evaluates four AutoML tools on 39 classification datasets using standardized resource budgets and evaluation metrics, finding that no tool consistently outperforms a tuned Random Forest and that performance varies by dataset. The paper also discusses fairness concerns around meta-learning and future directions, emphasizing reproducibility and practical impact for AutoML research.
Abstract
In recent years, an active field of research has developed around automated machine learning (AutoML). Unfortunately, comparing different AutoML systems is hard and often done incorrectly. We introduce an open, ongoing, and extensible benchmark framework that follows best practices and avoids common mistakes. The framework is open source, uses public datasets, and has a website with up-to-date results. We use the framework to conduct a thorough comparison of four AutoML systems across 39 datasets and analyze the results.
