Automated Machine Learning: From Principles to Practices
Zhenqian Shen, Yongqi Zhang, Lanning Wei, Huan Zhao, Quanming Yao
TL;DR
AutoML tackles the challenge of manually configuring complex ML systems by treating learning configurations as modular components that can be atomized and recombined. The authors formalize AutoML as a bi-level optimization problem, provide a theoretical error decomposition, and develop a taxonomy based on search space, search algorithm, and evaluation strategy. They survey representative methods for general, structured, and transformed search spaces, and for search algorithms ranging from simple baselines to gradient-based, Bayesian, and reinforcement learning (RL)-based optimizers. Applications include ML pipeline configuration, one-shot neural architecture search (NAS), and foundation-model workflows. The survey also outlines emerging directions in problem setups, techniques, theory, and applications, aiming to guide future AutoML research and practice.
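One common way to write such a bi-level objective is shown below. This is a sketch whose notation is assumed and may differ from the survey's own. A configuration $x$ is drawn from a search space $\mathcal{X}$. Model parameters $w$ are trained on the training data $\mathcal{D}_{\mathrm{tr}}$, and the configuration is then judged by a performance measure on the validation data $\mathcal{D}_{\mathrm{val}}$.

```latex
\begin{aligned}
x^{*} &= \arg\min_{x \in \mathcal{X}} \; \mathcal{M}\big(w^{*}(x), x; \mathcal{D}_{\mathrm{val}}\big), \\
\text{s.t.}\quad w^{*}(x) &= \arg\min_{w} \; \mathcal{L}\big(w, x; \mathcal{D}_{\mathrm{tr}}\big).
\end{aligned}
```

Here $\mathcal{L}$ is the training loss and $\mathcal{M}$ is a validation loss (lower is better). In the taxonomy's terms, the search algorithm explores $\mathcal{X}$, and the evaluation strategy determines how $w^{*}(x)$ is obtained or approximated.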
Abstract
Machine learning (ML) methods have developed rapidly, but configuring and selecting appropriate methods to achieve the desired performance has become increasingly difficult and tedious. To address this challenge, automated machine learning (AutoML) has emerged; it aims to generate satisfactory ML configurations for given tasks in a data-driven way. In this paper, we provide a comprehensive survey on this topic. We begin with the formal definition of AutoML and then introduce its principles, including the bi-level learning objective, the learning strategy, and the theoretical interpretation. Next, we summarize AutoML practices by setting up a taxonomy of existing works based on three main factors: the search space, the search algorithm, and the evaluation strategy. Each category is explained with representative methods. We then illustrate the principles and practices with example applications: configuring ML pipelines, one-shot neural architecture search, and integration with foundation models. Finally, we highlight emerging directions of AutoML and conclude the survey.
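The following Python sketch makes the three-part taxonomy concrete by wiring a search space, a search algorithm, and an evaluation strategy around the bi-level objective. It is a toy illustration under stated assumptions, not a method from the survey. The search space (a feature transform plus a ridge penalty `lam`), the names `SEARCH_SPACE`, `fit`, `validation_loss`, and `random_search`, and the use of plain random search with a single holdout split are all hypothetical choices. The inner `fit` solves the lower-level training problem in closed form. The outer loop keeps the configuration with the lowest validation error.

```python
import random

# Hypothetical search space: a configuration picks a feature transform
# and a ridge penalty. Neither is taken from the survey.
SEARCH_SPACE = {
    "transform": ["identity", "square", "cube"],
    "lam": [0.0, 0.01, 0.1, 1.0, 10.0],
}

TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "cube": lambda x: x ** 3,
}


def sample_config(rng):
    """Search algorithm step: draw one configuration uniformly at random."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def fit(config, data):
    """Lower-level problem: w*(config) = argmin_w sum (y - w*phi(x))^2 + lam*w^2.

    With one feature and no intercept this has the closed form
    w = sum(phi(x)*y) / (sum(phi(x)^2) + lam).
    """
    phi = TRANSFORMS[config["transform"]]
    num = sum(phi(x) * y for x, y in data)
    den = sum(phi(x) ** 2 for x, _ in data) + config["lam"]
    return num / den if den > 0 else 0.0


def validation_loss(config, w, data):
    """Upper-level objective: mean squared error of the trained model on held-out data."""
    phi = TRANSFORMS[config["transform"]]
    return sum((y - w * phi(x)) ** 2 for x, y in data) / len(data)


def random_search(train, valid, budget=20, seed=0):
    """Outer loop. Evaluation strategy: train each candidate fully, then score it on `valid`."""
    rng = random.Random(seed)
    best = None  # (config, weight, validation loss)
    for _ in range(budget):
        config = sample_config(rng)
        w = fit(config, train)
        loss = validation_loss(config, w, valid)
        if best is None or loss < best[2]:
            best = (config, w, loss)
    return best


if __name__ == "__main__":
    # Toy data from y = 2 * x^2; the search should favor the "square" transform.
    train = [(x, 2 * x * x) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
    valid = [(x, 2 * x * x) for x in (0.75, 1.25, 1.75, 2.25, 2.75)]
    config, w, loss = random_search(train, valid, budget=200)
    print(config, round(w, 3), round(loss, 4))
```

Random search is used here only because it is the simplest baseline. In richer setups, a gradient-based, Bayesian, or RL-based optimizer would take the place of `sample_config` and the loop. A cheaper evaluation strategy, such as early stopping or weight sharing, would take the place of the full call to `fit`.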
