Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning
Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, Frank Hutter
TL;DR
Auto-sklearn 2.0 advances hands-free AutoML by combining a portfolio-based warm start with budget-aware evaluation and a learned policy selector that automates high-level design decisions per dataset. It introduces PoSH Auto-sklearn (portfolio + successive halving) for fast, robust performance under tight time budgets, and then extends it into a fully automated AutoML system that uses meta-learning to select the optimization policy. Experiments on 39 AutoML benchmark datasets show relative error reduced by up to a factor of 4.5 in the comparison against Auto-sklearn 1.0 and other popular AutoML frameworks, and performance after 10 minutes that is substantially better than what Auto-sklearn 1.0 reaches within an hour. The work lays a practical foundation for scalable, automatic AutoML and points to future directions in adaptive budgeting, richer meta-features, and broader policy spaces.
Abstract
Automated Machine Learning (AutoML) supports practitioners and researchers with the tedious task of designing machine learning pipelines and has recently achieved substantial success. In this paper, we introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge. We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits by using a new, simple and meta-feature-free meta-learning technique and by employing a successful bandit strategy for budget allocation. However, PoSH Auto-sklearn introduces even more ways of running AutoML and might make it harder for users to set it up correctly. Therefore, we also go one step further and study the design space of AutoML itself, proposing a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn 2.0. We verify the improvements by these additions in an extensive experimental study on 39 AutoML benchmark datasets. We conclude the paper by comparing to other popular AutoML frameworks and Auto-sklearn 1.0, reducing the relative error by up to a factor of 4.5, and yielding a performance in 10 minutes that is substantially better than what Auto-sklearn 1.0 achieves within an hour.
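To make the "portfolio plus bandit strategy" idea concrete, the following is a minimal sketch of successive halving run over a fixed portfolio of candidates. It is not Auto-sklearn's implementation: the function `successive_halving`, the candidate names, the parameter `eta`, and the toy loss function are all assumptions chosen for illustration. The idea shown is only the general one: evaluate every candidate on a small budget, keep the best fraction, and repeat with a larger budget.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def successive_halving(
    candidates: Sequence[str],
    evaluate: Callable[[str, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> Tuple[str, List[Dict[str, float]]]:
    """Return the last surviving candidate and the losses of each round.

    `evaluate(candidate, budget)` must return a loss (lower is better).
    After each round only the best 1/eta of the candidates survive and
    the budget per candidate is multiplied by eta.
    """
    survivors = list(candidates)
    budget = min_budget
    history: List[Dict[str, float]] = []
    while len(survivors) > 1:
        # Evaluate every remaining candidate on the current budget.
        losses = {c: evaluate(c, budget) for c in survivors}
        history.append(losses)
        # Keep the best 1/eta of them (at least one).
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=losses.__getitem__)[:keep]
        budget *= eta
    return survivors[0], history


# Hypothetical portfolio: names stand in for pipeline configurations.
# Each has an assumed asymptotic loss that is approached as budget grows.
ASSUMED_FINAL_LOSS = {"cfg_a": 0.3, "cfg_b": 0.1, "cfg_c": 0.4, "cfg_d": 0.2}


def toy_evaluate(candidate: str, budget: int) -> float:
    # Toy learning curve: loss shrinks toward the assumed final loss.
    return ASSUMED_FINAL_LOSS[candidate] + 1.0 / budget


best, history = successive_halving(list(ASSUMED_FINAL_LOSS), toy_evaluate)
print(best, len(history))
```

In this toy run the four candidates are first compared on the smallest budget, the two best are re-evaluated on twice that budget, and a single candidate remains. In a real system the budget would correspond to something like training iterations or data subset size, and the portfolio would be built offline from configurations that performed well on other datasets.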
