CMA-ES for Hyperparameter Optimization of Deep Neural Networks
Ilya Loshchilov, Frank Hutter
TL;DR
The paper investigates the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as a derivative-free, parallel-friendly alternative to Bayesian optimization for hyperparameter tuning of deep neural networks. It benchmarks CMA-ES against Gaussian-process-based methods (Spearmint with EI and PES) and tree-based Bayesian optimizers (TPE, SMAC) on MNIST, using 30 GPUs in parallel. Results show that CMA-ES steadily improves validation performance, often reaching below 0.4% error when given a large parallel budget, while GP-based methods incur higher wall-clock costs because their cost scales cubically with the number of evaluated configurations. The study suggests CMA-ES as a competitive component in hyperparameter optimization, especially in high-parallelism regimes, and provides releasable code and supplementary material for reproducibility.
Abstract
Hyperparameters of deep neural networks are often optimized by grid search, random search or Bayesian optimization. As an alternative, we propose to use the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which is known for its state-of-the-art performance in derivative-free optimization. CMA-ES has some useful invariance properties and is friendly to parallel evaluations of solutions. We provide a toy example comparing CMA-ES and state-of-the-art Bayesian optimization algorithms for tuning the hyperparameters of a convolutional neural network for the MNIST dataset on 30 GPUs in parallel.
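To illustrate why a population-based strategy is friendly to parallel evaluation, here is a minimal sketch of the sample, evaluate, and update loop. It is not the authors' code and not full CMA-ES: it is a simplified evolution strategy with isotropic Gaussian sampling, log-rank recombination weights, and a fixed step-size decay in place of covariance and step-size adaptation. The names `toy_validation_error` and `simple_es`, the quadratic objective, and all parameter values are assumptions made for illustration. A thread pool stands in for the 30 GPUs, with one candidate per worker.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def toy_validation_error(x):
    """Hypothetical stand-in for 'train a network and return its validation error'."""
    # Smooth bowl with its minimum at (0.3, -0.2); a real objective would train a CNN.
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2


def simple_es(objective, x0, sigma0, popsize=30, generations=40, seed=0):
    """Simplified evolution strategy (assumption: isotropic sampling, no covariance adaptation)."""
    rng = random.Random(seed)
    dim = len(x0)
    mu = popsize // 2  # number of parents used to recombine the new mean
    # Log-rank recombination weights, normalized to sum to 1.
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    total = sum(raw)
    weights = [w / total for w in raw]

    mean, sigma = list(x0), sigma0
    best_x, best_f = list(x0), objective(x0)

    with ThreadPoolExecutor(max_workers=popsize) as pool:
        for _ in range(generations):
            # Sample one candidate per worker around the current mean.
            candidates = [
                [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                for _ in range(popsize)
            ]
            # Evaluate the whole population concurrently (one job per worker).
            fitnesses = list(pool.map(objective, candidates))
            # Rank candidates by fitness; lower is better.
            ranked = sorted(zip(fitnesses, candidates), key=lambda pair: pair[0])
            if ranked[0][0] < best_f:
                best_f, best_x = ranked[0][0], list(ranked[0][1])
            # Move the mean to the weighted average of the best mu candidates.
            mean = [
                sum(w * cand[d] for w, (_, cand) in zip(weights, ranked))
                for d in range(dim)
            ]
            # Crude placeholder for step-size adaptation: fixed geometric decay.
            sigma *= 0.9
    return best_x, best_f


best_x, best_f = simple_es(toy_validation_error, [0.0, 0.0], sigma0=0.5)
print(best_x, best_f)
```

Because every candidate in a generation is sampled before any of them is evaluated, the population size can be matched to the number of available workers. A real setup would replace the toy objective with a training run and the fixed decay with the adaptation rules of CMA-ES.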
