Global optimization of graph acquisition functions for neural architecture search
Yilin Xie, Shiqiang Zhang, Jixiang Qing, Ruth Misener, Calvin Tsay
TL;DR
This work tackles the challenge of globally optimizing graph-structured architectures in neural architecture search by introducing NAS-GOAT, a framework that encodes the graph search space into a variable space enriched with reachability, distances, and shortest-path information. The graph encoding yields a bijection with the graph space and supports NAS-specific constraints to model DAGs with one source and one sink, for both node-labeled and edge-labeled cases. Graph kernels are designed to combine structure and label information, enabling GP surrogates within a graph-BO setup, while a MIP formulation globally optimizes the acquisition function at each NAS iteration. Empirical results on NAS-Bench-101 and NAS-Bench-201 demonstrate that NAS-GOAT can efficiently identify high-performing architectures and often outperforms established baselines in deterministic settings, with competitive performance under noise, highlighting the practical value of exact acquisition optimization for graph-based NAS.
Abstract
Graph Bayesian optimization (BO) has shown potential as a powerful and data-efficient tool for neural architecture search (NAS). Most existing graph BO works focus on developing graph surrogate models, i.e., metrics of networks and/or different kernels to quantify the similarity between networks. However, acquisition optimization, a discrete optimization task over graph structures, is not well studied due to the complexity of formulating the graph search space and acquisition functions. This paper presents explicit optimization formulations for the graph input space, including properties such as reachability and shortest paths, which are then used to formulate graph kernels and the acquisition function. We theoretically prove that the proposed encoding is an equivalent representation of the graph space and provide restrictions for the NAS domain with either node or edge labels. Numerical results over several NAS benchmarks show that our method efficiently finds the optimal architecture in most cases, highlighting its efficacy.
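To make the graph properties named above concrete, the following minimal Python sketch computes reachability and shortest-path lengths from an adjacency matrix and checks the NAS-style restriction of a DAG with one source and one sink. It is an illustration only, not the paper's optimization formulation: the function names are hypothetical, and the sketch assumes nodes are indexed in topological order, so that node 0 is the source and the last node is the sink.

```python
from collections import deque


def shortest_path_lengths(adj):
    """All-pairs shortest-path lengths (number of edges) via BFS from each node.

    adj[u][v] == 1 means there is a directed edge u -> v.
    Unreachable pairs are marked with None.
    """
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def reachability(dist):
    """reach[u][v] is True iff v can be reached from u (every node reaches itself)."""
    return [[d is not None for d in row] for row in dist]


def is_valid_cell(adj):
    """Check for a DAG with exactly one source (node 0) and one sink (last node).

    Assumption of this sketch: nodes are topologically ordered, so only
    edges u -> v with u < v are allowed (strictly upper-triangular adj).
    """
    n = len(adj)
    # Any edge on or below the diagonal would break the assumed ordering.
    if any(adj[u][v] for u in range(n) for v in range(u + 1)):
        return False
    reach = reachability(shortest_path_lengths(adj))
    # Every node must lie on some path from the source to the sink.
    return all(reach[0][v] and reach[v][n - 1] for v in range(n))


# A 4-node cell: 0 -> 1 -> 3 and 0 -> 2 -> 3.
cell = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
dist = shortest_path_lengths(cell)
valid = is_valid_cell(cell)

# Dropping the edge 2 -> 3 turns node 2 into a second sink.
broken = [row[:] for row in cell]
broken[2][3] = 0
broken_valid = is_valid_cell(broken)
```

In an optimization formulation these quantities would be decision variables tied together by constraints rather than values computed from a fixed graph. The sketch only shows what reachability and shortest-path information look like for one concrete architecture.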
