Scalable Reinforcement Learning-based Neural Architecture Search

Amber Cassimon, Siegfried Mercelis, Kevin Mets

TL;DR

This paper tackles the scalability bottleneck in Neural Architecture Search (NAS) by training a reinforcement learning agent that learns how to search for architectures, rather than one that returns a single optimal model. It casts NAS as an incremental decision process over architecture graphs, with an explicit MDP formulation, neighbor-generation rules, reward shaping, and a transformer-based Ape-X Q-learning agent. Evaluations on NAS-Bench-101 (tabular) and NAS-Bench-301 (predictive) show that the approach scales to very large search spaces and can outperform baselines at low query budgets, though on larger benchmarks it becomes less robust to hyperparameter choices and slower to train. The work highlights the potential for reusing the learned search policy and the need for faster performance estimators to make RL-based NAS practical in real-world scenarios.

Abstract

In this publication, we assess the ability of a novel Reinforcement Learning-based solution to the problem of Neural Architecture Search, where a Reinforcement Learning (RL) agent learns to search for good architectures, rather than to return a single optimal architecture. We consider both the NAS-Bench-101 and NAS-Bench-301 settings, and compare against various known strong baselines, such as local search and random search. We conclude that our Reinforcement Learning agent displays strong scalability with regard to the size of the search space, but limited robustness to hyperparameter changes.

Paper Structure (33 sections, 21 figures)

Figures (21)

  • Figure 1: An example of an "operations-on-edges" architecture (left) converted to an "operations-on-nodes" representation (right). All edges with associated operations are converted to nodes. Each of these nodes is given an in-edge from the source of the original edge, and an out-edge to the destination of the original edge. The nodes in the original architecture are replaced by reduction operations, a summation in this case. Finally, as was the case before, all reduction operations are given an edge to the output operation.
  • Figure 2: Vertex Removal Process. 1) A graph consisting of 5 vertices, where we want to remove the vertex with index 2. 2) After removing vertex 2, we generate all possible edges connecting the sources of vertex 2's in-edges to the destinations of its out-edges. 3a) An invalid selection of generated edges, leaving vertex 1 with an out-degree of 0. 3b) A valid selection of edges, leaving none of the other vertices disconnected from the graph.
  • Figure 3: Histogram of the validation accuracy of the architectures included in NAS-Bench-101, across all random initializations. The original accuracy distribution is shown in blue, while the distribution after reward shaping is shown in red. Note the logarithmic Y-axis.
  • Figure 4: The reward shaping that was used in this paper. Experiments on NAS-Bench-101 used $\alpha=6$, experiments on NAS-Bench-301 used $\alpha=32$. Other values were used in ablation studies.
  • Figure 5: Histogram of the number of vertices and edges in a random sample of 10000 architectures, compared between a uniform sample from the NAS-Bench-101 dataset, the "Random Cell Adj" sampler from the BANANAS repository, and our sampler. Note the logarithmic Y-axis.
  • ...and 16 more figures
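
The graph manipulations described in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the edge-list representation, the treatment of input nodes (kept as plain "input" vertices), the validity rule (every vertex except the sink needs an out-edge, every vertex except the source needs an in-edge), and the 5-vertex example graph are all assumptions made for illustration.

```python
def edges_to_nodes(inputs, intermediates, op_edges):
    """Figure 1 (sketch): turn an operations-on-edges cell into operations-on-nodes.

    op_edges is a list of (src, dst, op_name) triples. Returns (labels, edges).
    """
    labels = {name: "input" for name in inputs}            # assumption: inputs stay as-is
    labels.update({name: "sum" for name in intermediates})  # original nodes become reductions
    labels["output"] = "output"
    edges = []
    for i, (src, dst, op) in enumerate(op_edges):
        op_node = f"op{i}"                # hypothetical name for the new operation node
        labels[op_node] = op
        edges.append((src, op_node))      # in-edge from the source of the original edge
        edges.append((op_node, dst))      # out-edge to the destination of the original edge
    for name in intermediates:
        edges.append((name, "output"))    # every reduction feeds the output operation
    return labels, edges


def candidate_edges(edges, vertex):
    """Figure 2, step 2 (sketch): drop `vertex` and list the edges that could bridge the gap."""
    sources = [s for s, d in edges if d == vertex]
    dests = [d for s, d in edges if s == vertex]
    remaining = [(s, d) for s, d in edges if vertex not in (s, d)]
    candidates = [(s, d) for s in sources for d in dests if (s, d) not in remaining]
    return remaining, candidates


def is_valid(vertices, edges, source, sink):
    """Figure 2, step 3 (assumed rule): no vertex may be left without an in- or out-edge."""
    has_out = {s for s, _ in edges}
    has_in = {d for _, d in edges}
    return all(
        (v == sink or v in has_out) and (v == source or v in has_in)
        for v in vertices
    )


# Hypothetical 5-vertex graph (not the one drawn in Figure 2); vertex 2 is removed.
graph = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
remaining, candidates = candidate_edges(graph, 2)
kept_vertices = [0, 1, 3, 4]

# Choosing only (0, 3) leaves vertex 1 with out-degree 0, so it is rejected.
invalid = is_valid(kept_vertices, remaining + [(0, 3)], source=0, sink=4)
# Choosing (1, 3) keeps every remaining vertex connected.
valid = is_valid(kept_vertices, remaining + [(1, 3)], source=0, sink=4)
```

In this sketch, an agent's "remove vertex" action would correspond to picking one valid subset of `candidates`; how the paper enumerates or selects among such subsets is not stated in the captions and is left open here.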