Hyper-Parameter Optimization: A Review of Algorithms and Applications

Tong Yu, Hong Zhu

TL;DR

Automated hyper-parameter optimization (HPO) is essential to scale deep learning beyond manual trial-and-error. The paper surveys key training and design hyper-parameters, and reviews a broad set of search algorithms and trial schedulers, including grid/random search, Bayesian optimization, TPE, SHA/HyperBand, ASHA, and PBT. It also compares widely used toolkits and cloud services (Vizier, SageMaker, NNI, Ray.Tune) and discusses evaluation strategies and practical extensions. The analysis provides guidance for practitioners on choosing appropriate HPO strategies under computational budgets and highlights ongoing challenges in efficiency, comparability, and integration with AutoML.

Abstract

Since their development, deep neural networks have made major contributions to everyday life. Machine learning now offers advice in almost every aspect of daily life that is often more rational than what humans can provide. However, despite this achievement, designing and training neural networks remain challenging and unpredictable. To lower the technical threshold for non-expert users, automated hyper-parameter optimization (HPO) has become a popular topic in both academia and industry. This paper reviews the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods for defining their value ranges. The paper then focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. It next reviews major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, feasibility with major deep learning frameworks, and extensibility for user-designed modules. The paper concludes with the problems that arise when HPO is applied to deep learning, a comparison of optimization algorithms, and prominent approaches for model evaluation with limited computational resources.

Paper Structure

This paper contains 27 sections, 37 equations, 20 figures, 5 tables.

Figures (20)

  • Figure 1: Linear decay of learning rate with time-based (left) and drop-based (right) schedules
  • Figure 2: Exponential decay of learning rate
  • Figure 3: Learning rate decay in a cyclic schedule (source: https://github.com/bckenstler/CLR)
  • Figure 4: Effect of learning rate (adapted figure from: https://www.jeremyjordan.me/nn-learning-rate/)
  • Figure 5: (a) SGD without momentum; (b) SGD with momentum (source: https://www.willamette.edu/~gorr/classes/cs449/momrate.html)
  • ...and 15 more figures
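
The captions for Figures 1 to 3 name four learning-rate schedules: time-based, drop-based, exponential, and cyclic. As a rough illustration only, the sketch below writes each one as a function of the epoch or iteration index t. The closed forms are the commonly used textbook ones and are assumed here rather than copied from the paper's equations; the function and parameter names (lr0, k, drop, every, base_lr, max_lr, step_size) are hypothetical, and the cyclic variant is the triangular policy, which may not match the exact curve shown in Figure 3.

```python
import math


def time_based_decay(lr0: float, k: float, t: int) -> float:
    """Time-based decay (assumed form): lr_t = lr0 / (1 + k * t)."""
    return lr0 / (1.0 + k * t)


def drop_based_decay(lr0: float, drop: float, every: int, t: int) -> float:
    """Drop-based (step) decay: multiply by `drop` once per `every` epochs."""
    return lr0 * drop ** (t // every)


def exponential_decay(lr0: float, k: float, t: int) -> float:
    """Exponential decay (assumed form): lr_t = lr0 * exp(-k * t)."""
    return lr0 * math.exp(-k * t)


def triangular_cyclic(base_lr: float, max_lr: float, step_size: int, t: int) -> float:
    """Triangular cyclic schedule: rise linearly from base_lr to max_lr over
    `step_size` iterations, fall back over the next `step_size`, then repeat."""
    cycle = math.floor(1 + t / (2 * step_size))
    x = abs(t / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


if __name__ == "__main__":
    # Print a few sample points from each schedule for comparison.
    for t in (0, 5, 10, 20):
        print(
            t,
            round(time_based_decay(0.1, 0.1, t), 5),
            round(drop_based_decay(0.1, 0.5, 10, t), 5),
            round(exponential_decay(0.1, 0.1, t), 5),
            round(triangular_cyclic(0.001, 0.006, 10, t), 5),
        )
```

In an HPO setting, the constants in these functions (the initial rate, the decay factor, the drop interval, the cycle length) are themselves hyper-parameters that a search algorithm would tune alongside the rest of the configuration.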