Graph HyperNetworks for Neural Architecture Search
Chris Zhang, Mengye Ren, Raquel Urtasun
TL;DR
The paper introduces the Graph HyperNetwork (GHN), a framework that learns to generate all weights of unseen CNN architectures directly from their computation graphs, thereby amortizing the cost of neural architecture search (NAS). By combining a graph neural network with a shared hypernetwork, the GHN generates weights in a topology-aware way and enables rapid evaluation of thousands of architectures. Experiments on CIFAR-10 and ImageNet-Mobile show that the GHN achieves competitive NAS performance while searching nearly 10 times faster than other random search methods, and that it extends naturally to anytime prediction, where the goal is to optimize the speed-accuracy tradeoff. Ablation studies confirm the benefits of forward-backward message passing, motif sharing, and smaller training graphs, and the approach generalizes to multi-scale, budget-aware deployments. Overall, the GHN provides a scalable, topology-aware surrogate for NAS that can substantially reduce computational cost while delivering strong performance.
Abstract
Neural architecture search (NAS) automatically finds the best task-specific neural network topology, outperforming many manual architecture designs. However, it can be prohibitively expensive, as the search requires training thousands of different networks, each of which can take hours. In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. GHNs model the topology of an architecture and can therefore predict network performance more accurately than regular hypernetworks and premature early stopping. To perform NAS, we randomly sample architectures and use the validation accuracy of networks with GHN-generated weights as the surrogate search signal. GHNs are fast: they can search nearly 10 times faster than other random search methods on CIFAR-10 and ImageNet. GHNs can be further extended to the anytime prediction setting, where they have found networks with a better speed-accuracy tradeoff than state-of-the-art manual designs.
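The search procedure described in the abstract (sample architectures at random, score each with a surrogate signal, keep the best) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the op vocabulary `OPS`, the graph encoding, and the functions `sample_architecture`, `random_search`, and `toy_surrogate` are all hypothetical names introduced here. In particular, `toy_surrogate` is only a placeholder for the real signal, which would be the validation accuracy of the network when run with GHN-generated weights.

```python
import random

# Hypothetical op vocabulary; the paper's actual search space is not reproduced here.
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]


def sample_architecture(num_nodes, rng):
    """Sample a random DAG as a list of (op, input_node_indices) pairs.

    Node i may only take inputs from nodes with a smaller index,
    which guarantees that the graph is acyclic.
    """
    arch = []
    for i in range(num_nodes):
        op = rng.choice(OPS)
        if i == 0:
            inputs = []  # the first node reads the network input
        else:
            k = min(i, rng.randint(1, 2))  # one or two predecessors
            inputs = sorted(rng.sample(range(i), k))
        arch.append((op, inputs))
    return arch


def random_search(surrogate_score, num_samples, num_nodes=5, seed=0):
    """Return the sampled architecture with the highest surrogate score."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_samples):
        arch = sample_architecture(num_nodes, rng)
        score = surrogate_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


def toy_surrogate(arch):
    """Placeholder score: the fraction of nodes that are convolutions.

    ASSUMPTION: stands in for 'validation accuracy with GHN-generated
    weights'; it has no relation to real network performance.
    """
    return sum(op.startswith("conv") for op, _ in arch) / len(arch)


if __name__ == "__main__":
    best_arch, best_score = random_search(toy_surrogate, num_samples=50)
    print(best_score)
    for node_id, (op, inputs) in enumerate(best_arch):
        print(node_id, op, inputs)
```

Because the surrogate is passed in as a function, the same loop would work unchanged if `toy_surrogate` were replaced by a routine that generates weights for `arch` and measures validation accuracy. The amortization argument is that such a call is far cheaper than training each candidate from scratch.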
