Automated Machine Learning: From Principles to Practices

Zhenqian Shen, Yongqi Zhang, Lanning Wei, Huan Zhao, Quanming Yao

TL;DR

AutoML tackles the challenge of manually configuring complex ML systems by treating learning configurations as modular components that can be atomized and recombined. The authors formalize AutoML as a bi-level optimization problem, provide a theoretical error decomposition, and develop a taxonomy based on search space, search algorithm, and evaluation strategy. They survey representative methods across general/structured/transformed search spaces and across baselines, gradient-based, Bayesian, and RL-based optimizers, with applications to ML pipelines, one-shot NAS, and foundation-model workflows. The survey also outlines emerging directions in problem setups, techniques, theory, and applications, aiming to guide future AutoML research and practice.
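For orientation, a bi-level formulation of this kind is commonly written as below. This is a generic sketch rather than the survey's own notation: the symbols $\lambda$ (a learning configuration), $\Lambda$ (the search space), $w$ (model parameters), and the two losses are assumptions introduced here for illustration.

```latex
\begin{aligned}
\lambda^{*} &= \arg\min_{\lambda \in \Lambda} \; \mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\lambda), \lambda\bigr) \\
\text{s.t.} \quad w^{*}(\lambda) &= \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w, \lambda)
\end{aligned}
```

The lower level trains a model under a fixed configuration, and the upper level picks the configuration whose trained model performs best on held-out data.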

Abstract

Machine learning (ML) methods have been developing rapidly, but configuring and selecting proper methods to achieve a desired performance is increasingly difficult and tedious. To address this challenge, automated machine learning (AutoML) has emerged, which aims to generate satisfactory ML configurations for given tasks in a data-driven way. In this paper, we provide a comprehensive survey on this topic. We begin with the formal definition of AutoML and then introduce its principles, including the bi-level learning objective, the learning strategy, and the theoretical interpretation. Next, we summarize AutoML practices by setting up a taxonomy of existing works based on three main factors: the search space, the search algorithm, and the evaluation strategy. Each category is explained through representative methods. We then illustrate the principles and practices with exemplary applications: configuring ML pipelines, one-shot neural architecture search, and integration with foundation models. Finally, we highlight the emerging directions of AutoML and conclude the survey.
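To make the three factors concrete, the following is a minimal sketch, not a method from the survey. It assumes a toy 1-D regression task, a tiny search space (a constant "mean" predictor or a k-nearest-neighbor regressor with hyperparameter `k`), random search as the search algorithm, and hold-out validation error as the evaluation strategy. All function names and data are hypothetical and chosen only for illustration.

```python
import random

def make_data(n, rng):
    """Toy 1-D regression data: y = x^2 plus small noise (illustrative only)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, x * x + rng.gauss(0.0, 0.05)) for x in xs]

def fit(config, train):
    """Lower level: fit a model on training data under a fixed configuration."""
    if config["algo"] == "mean":
        mean_y = sum(y for _, y in train) / len(train)
        return lambda x: mean_y
    k = config["k"]
    def knn(x):
        # Average the targets of the k training points closest to x.
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return knn

def evaluate(model, val):
    """Evaluation strategy: mean squared error on held-out validation data."""
    return sum((model(x) - y) ** 2 for x, y in val) / len(val)

def sample_config(rng):
    """Search space: an algorithm choice plus its hyperparameter, if any."""
    algo = rng.choice(["mean", "knn"])
    if algo == "knn":
        return {"algo": algo, "k": rng.randint(1, 15)}
    return {"algo": algo}

def random_search(train, val, n_trials, rng):
    """Search algorithm (upper level): keep the lowest-validation-error config."""
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(fit(config, train), val)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

rng = random.Random(0)
train, val = make_data(80, rng), make_data(40, rng)
best_config, best_score = random_search(train, val, n_trials=20, rng=rng)
print(best_config, best_score)
```

Swapping `random_search` for a Bayesian or gradient-based optimizer, or `evaluate` for a cheaper proxy, changes one factor while leaving the other two untouched, which is the modularity the taxonomy is meant to capture.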

Paper Structure

This paper contains 44 sections, 8 equations, 21 figures, 9 tables, 1 algorithm.

Figures (21)

  • Figure 1: A brief illustration of the AutoML framework (taking image classification as an example). In classical machine learning, human experts design the solutions to machine learning tasks. AutoML instead first atomizes the learning configurations and then recombines them to generate machine learning solutions. In this figure, "time" means inference time and "comp. cost" refers to computational cost.
  • Figure 2: General pipeline for handling AutoML. The LEGO blocks represent atomized learning configurations, and the shapes built from the blocks represent machine learning solutions.
  • Figure 3: Comparison of error decomposition between classical machine learning and AutoML.
  • Figure 4: Taxonomies of current AutoML approaches.
  • Figure 5: Example of a search space structured as a directed acyclic graph for a GNN.
  • ...and 16 more figures

Theorems & Definitions (4)

  • Definition 1: Machine learning [tom1997machine, mohri2018foundations, zhou2021machine]
  • Definition 2: AutoML
  • Remark 2.1
  • Example 1: CASH Problem [thornton2013auto, feurer2015efficient, hutter2019automated]