Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution

Thomas Elsken, Jan Hendrik Metzen, Frank Hutter

TL;DR

Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution introduces LEMONADE, an evolutionary NAS framework that uses Lamarckian inheritance via network morphisms to warm-start offspring, enabling efficient exploration of large architecture spaces under multiple objectives. It distinguishes cheap objectives (parameters, FLOPs) from expensive ones (validation accuracy) and uses KDE-guided two-stage sampling to focus resources on promising regions of the Pareto front. The method supports arbitrary search spaces, including full architectures and repeatable cells, and demonstrates competitive results on CIFAR-10 and transferable performance to ImageNet64x64 and mobile ImageNet with substantially less compute than prior NAS methods. By returning a Pareto set rather than a single optimum, the work advances practical automation of model discovery under resource constraints.
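The density-guided, two-stage sampling mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the helper names (`gaussian_kde`, `inverse_density_weights`, `sample`), the bandwidth, the single cheap objective (log10 of the parameter count), and the way children are perturbed are all assumptions chosen for illustration. The idea shown is only that candidates lying in sparsely populated regions of the cheap-objective space receive a higher sampling probability.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate fitted to `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

def inverse_density_weights(candidates, reference, bandwidth=0.5, eps=1e-12):
    """Probabilities inversely proportional to the density (fitted on `reference`)."""
    density = gaussian_kde(reference, bandwidth)
    raw = [1.0 / (density(c) + eps) for c in candidates]  # eps avoids division by zero
    total = sum(raw)
    return [r / total for r in raw]

def sample(candidates, reference, k, rng, bandwidth=0.5):
    """Draw k candidates (with replacement) using inverse-density weights."""
    weights = inverse_density_weights(candidates, reference, bandwidth)
    return rng.choices(candidates, weights=weights, k=k)

rng = random.Random(0)

# Hypothetical population, described only by a cheap objective:
# three crowded models and one isolated, larger model.
population = [6.0, 6.05, 6.1, 7.5]
parent_weights = inverse_density_weights(population, population)

# Stage 1: pick parents, favoring sparse regions of the current front.
parents = sample(population, population, k=8, rng=rng)

# Children are stand-ins here: each parent's cheap objective, slightly perturbed.
children = [p + rng.uniform(-0.2, 0.2) for p in parents]

# Stage 2: accept only a subset of children for (expensive) training.
accepted = sample(children, population, k=3, rng=rng)
```

Both stages only evaluate cheap objectives, so the expensive objective (validation accuracy after training) is computed for the accepted subset alone. Sampling with replacement is a simplification of this sketch.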

Abstract

Neural Architecture Search aims at automatically finding neural architectures that are competitive with architectures designed by human experts. While recent approaches have achieved state-of-the-art predictive performance for image recognition, they are problematic under resource constraints for two reasons: (1) the neural architectures found are solely optimized for high predictive performance, without penalizing excessive resource consumption; (2) most architecture search methods require vast computational resources. We address the first shortcoming by proposing LEMONADE, an evolutionary algorithm for multi-objective architecture search that allows approximating the entire Pareto front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. We address the second shortcoming by proposing a Lamarckian inheritance mechanism for LEMONADE which generates children networks that are warm-started with the predictive performance of their trained parents. This is accomplished by using (approximate) network morphism operators for generating children. The combination of these two contributions allows finding models that are on par with or even outperform both hand-crafted and automatically designed networks.
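To make the idea of a network morphism concrete, the following minimal sketch shows one function-preserving mutation on a toy fully connected ReLU network: inserting a hidden layer initialized to the identity. The network representation (a list of `(weights, bias)` pairs) and the names `forward`, `insert_identity_layer`, and `count_params` are assumptions made for this example and are not taken from the paper. The child computes exactly the same function as its parent, so it starts from the parent's predictive performance while having more parameters to train.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(layers, x):
    """Hidden layers apply ReLU(Wx + b); the last layer is linear."""
    h = x
    for W, b in layers[:-1]:
        h = relu([a + c for a, c in zip(matvec(W, h), b)])
    W, b = layers[-1]
    return [a + c for a, c in zip(matvec(W, h), b)]

def insert_identity_layer(layers, index):
    """Insert an identity-initialized hidden layer at `index`.

    Requires 1 <= index <= len(layers) - 1, so the new layer follows a
    ReLU layer. Its input is then non-negative and ReLU(I h + 0) = h,
    which leaves the network function unchanged.
    """
    width = len(layers[index - 1][0])  # output width of the preceding layer
    identity = [[1.0 if i == j else 0.0 for j in range(width)] for i in range(width)]
    bias = [0.0] * width
    return layers[:index] + [(identity, bias)] + layers[index:]

def count_params(layers):
    return sum(len(W) * len(W[0]) + len(b) for W, b in layers)

# Hypothetical trained parent: 2 inputs -> 3 hidden units -> 1 output.
parent = [
    ([[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]], [0.1, -0.2, 0.0]),
    ([[1.0, -0.5, 0.3]], [0.05]),
]
child = insert_identity_layer(parent, 1)

x = [0.8, -0.4]
parent_out = forward(parent, x)
child_out = forward(child, x)  # identical to parent_out
```

An approximate network morphism, by contrast, would be a mutation such as pruning units, where the child's function only stays close to the parent's; that case is not shown here.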

Paper Structure

This paper contains 30 sections, 9 equations, 14 figures, 2 tables, 1 algorithm.

Figures (14)

  • Figure 1: Conceptual illustration of LEMONADE. (Left) LEMONADE maintains a population of trained networks that constitute a Pareto front in the multi-objective space. Parents are selected from the population inversely proportional to their density. Children are generated by mutation operators with Lamarckian inheritance that are realized by network morphisms and approximate network morphisms. NM operators generate children with the same initial error as their parent. In contrast, children generated with ANM operators may incur a (small) increase in error compared to their parent. However, their initial error is typically still very small. (Right) Only a subset of the generated children is accepted for training. After training, the performance of the children is evaluated and the population is updated to be the Pareto front.
  • Figure 2: Progress of the Pareto front of LEMONADE during architecture search. The Pareto front becomes increasingly densely populated over time. Very large models found (e.g., in generation 25) are discarded in a later generation as smaller, better ones are discovered. Note: generation 1 denotes the generation after one iteration of LEMONADE.
  • Figure 3: Comparison of LEMONADE with NASNet and MobileNet V2. LEMONADE optimized five objectives: performance on CIFAR-10 (x-axis in all plots), performance on CIFAR-100 (top left), number of parameters (top right), number of multiply-add operations (bottom left) and inference time (bottom right, measured in seconds on a Titan X GPU).
  • Figure 4: Transferring the cells discovered on CIFAR-10 to ImageNet64x64. A single cell, namely Cell 2, outperforms all baselines. Utilizing 5 different cells (red line) further improves the results.
  • Figure 5: Performance on CIFAR-10 test data of models that have been trained under identical conditions.
  • ...and 9 more figures
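
The population update described in the captions of Figures 1 and 2 (after training, the population is reset to the Pareto front of parents and children) can be sketched as a non-dominated filter. This is a minimal illustration under assumptions: every objective is minimized, each model is reduced to a tuple of objective values, and the numbers below are made up rather than results from the paper.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (validation error, parameters in millions) pairs.
population = [(0.10, 1.0), (0.06, 4.0), (0.05, 10.0)]
children = [(0.055, 3.5), (0.12, 2.0), (0.045, 12.0)]

# New population: non-dominated models among parents and trained children.
new_population = sorted(pareto_front(population + children), key=lambda p: p[1])
# (0.06, 4.0) is dropped: the child (0.055, 3.5) is both smaller and more accurate.
# (0.12, 2.0) is dropped: the parent (0.10, 1.0) dominates it.
```

This quadratic-time filter is adequate for small populations; faster non-dominated sorting schemes exist but are not needed to convey the idea.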