Neural Architecture Search by Learning a Hierarchical Search Space
Mehraveh Javan Roshtkhari, Matthew Toews, Marco Pedersoli
TL;DR
The paper addresses the efficiency of Neural Architecture Search by improving the exploration strategy of Monte-Carlo Tree Search through a learned hierarchical search space. It proposes constructing this hierarchy by clustering architectures based on pairwise distances between their output vectors produced by a partially trained supernet, enabling semantically meaningful early splits in the search tree. Empirically, the method yields state-of-the-art or competitive results on CIFAR10 (Pooling and NAS-Bench-Macro) and ImageNet under constrained computational budgets, without the need for additional regularization. This approach enhances NAS practicality by accelerating convergence and improving final architecture quality through a data-driven tree structure that better guides exploration in the search space.
Abstract
Monte-Carlo Tree Search (MCTS) is a powerful tool for many non-differentiable search-related problems such as adversarial games. However, the performance of such an approach depends heavily on the order of the nodes considered at each branching of the tree. If the first branches cannot distinguish between promising and deceptive configurations for the final task, the efficiency of the search is exponentially reduced. In Neural Architecture Search (NAS), as only the final architecture matters, the visiting order of the branching can be optimized to improve learning. In this paper, we study the application of MCTS to NAS for image classification. We analyze several sampling methods and branching alternatives for MCTS and propose to learn the branching by hierarchical clustering of architectures based on their similarity. The similarity is measured by the pairwise distance between the output vectors of architectures. Extensive experiments on two challenging benchmarks on CIFAR10 and ImageNet show that MCTS, if provided with a good branching hierarchy, can yield promising solutions more efficiently than other approaches for NAS problems.
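
To make the idea of learning the branching by hierarchical clustering concrete, the following is a minimal sketch, not the authors' implementation. It assumes each architecture is summarized by one output vector (for example, supernet outputs on a fixed batch), and it assumes Euclidean distance and average linkage, neither of which is specified in the abstract. The names `euclidean`, `build_hierarchy`, and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_hierarchy(outputs):
    """Agglomerative clustering with average linkage (assumed linkage).

    outputs: dict mapping an architecture id to its output vector.
    Returns a binary tree as nested tuples whose leaves are architecture ids.
    """
    # Each cluster is a pair: (subtree, list of member architecture ids).
    clusters = [(name, [name]) for name in outputs]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            members_i, members_j = clusters[i][1], clusters[j][1]
            # Average pairwise distance between the two clusters' members.
            total = sum(
                euclidean(outputs[a], outputs[b])
                for a in members_i
                for b in members_j
            )
            dist = total / (len(members_i) * len(members_j))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        # Replace the two closest clusters with their merge.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy output vectors for four hypothetical architectures.
toy_outputs = {
    "A": [0.9, 0.1],
    "B": [0.85, 0.15],
    "C": [0.2, 0.8],
    "D": [0.1, 0.9],
}
tree = build_hierarchy(toy_outputs)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

In this toy case the root split separates {A, B} from {C, D}, so the first branching decision of a tree search over this hierarchy already distinguishes architectures that behave differently, which is the property the abstract argues a good branching order should have.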
