Global optimization of graph acquisition functions for neural architecture search

Yilin Xie, Shiqiang Zhang, Jixiang Qing, Ruth Misener, Calvin Tsay

TL;DR

This work tackles the challenge of globally optimizing graph-structured architectures in neural architecture search by introducing NAS-GOAT, a framework that encodes the graph search space into a variable space enriched with reachability, distances, and shortest-path information. The graph encoding yields a bijection with the graph space and supports NAS-specific constraints to model DAGs with one source and one sink, for both node-labeled and edge-labeled cases. Graph kernels are designed to combine structure and label information, enabling GP surrogates within a graph-BO setup, while a MIP formulation globally optimizes the acquisition function at each NAS iteration. Empirical results on NAS-Bench-101 and NAS-Bench-201 demonstrate that NAS-GOAT can efficiently identify high-performing architectures and often outperforms established baselines in deterministic settings, with competitive performance under noise, highlighting the practical value of exact acquisition optimization for graph-based NAS.

Abstract

Graph Bayesian optimization (BO) has shown potential as a powerful and data-efficient tool for neural architecture search (NAS). Most existing graph BO works focus on developing graph surrogate models, i.e., metrics of networks and/or different kernels to quantify the similarity between networks. However, acquisition optimization, as a discrete optimization task over graph structures, is not well studied due to the complexity of formulating the graph search space and acquisition functions. This paper presents explicit optimization formulations for the graph input space, including properties such as reachability and shortest paths, which are used later to formulate graph kernels and the acquisition function. We theoretically prove that the proposed encoding is an equivalent representation of the graph space and provide restrictions for the NAS domain with either node or edge labels. Numerical results over several NAS benchmarks show that our method efficiently finds the optimal architecture for most cases, highlighting its efficacy.
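
To make the quantities named in the abstract concrete, the following minimal Python sketch computes reachability and shortest-path lengths for a fixed graph given as an adjacency matrix, and checks the NAS-style restriction of a DAG with one source and one sink. It is an illustration under stated assumptions, not the paper's formulation: in the paper these quantities are decision variables tied together by constraints, whereas here they are simply computed for a known graph. The function names and the 4-node example cell are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj):
    """All-pairs shortest-path lengths via BFS; None marks an unreachable pair."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def reachability(dist):
    """Boolean matrix: entry [u][v] is True when v can be reached from u."""
    return [[d is not None for d in row] for row in dist]


def is_single_source_sink_dag(adj):
    """Check acyclicity plus exactly one source and exactly one sink."""
    n = len(adj)
    dist = shortest_path_lengths(adj)
    for u in range(n):
        if adj[u][u]:
            return False  # a self-loop is a cycle
        for v in range(u + 1, n):
            # two distinct nodes that reach each other lie on a cycle
            if dist[u][v] is not None and dist[v][u] is not None:
                return False
    sources = [v for v in range(n) if not any(adj[u][v] for u in range(n))]
    sinks = [u for u in range(n) if not any(adj[u][v] for v in range(n))]
    return len(sources) == 1 and len(sinks) == 1


# Hypothetical 4-node cell: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
dist = shortest_path_lengths(adj)
reach = reachability(dist)
```

For this example cell, node 3 is two steps from node 0, while nodes 1 and 2 cannot reach each other, so `dist[1][2]` is `None` and `reach[1][2]` is `False`.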

Paper Structure

This paper contains 23 sections, 2 theorems, 34 equations, 6 figures, 3 tables.

Key Result

Theorem 1

There is a bijection between the feasible domain restricted by Eq. eq:final_MIP with size $[n_0,n]$ and the whole graph space with node numbers in $[n_0,n]$.

Figures (6)

  • Figure 1: Illustration of NAS-GOAT. The main idea is to represent graphs in variable space and introduce constraints to build a bijection between all graphs and the feasible domain. The graph kernel value between an unknown graph (the optimization target) and a given graph is then formulated as expressions of variables, or constraints, enabling global optimization of the acquisition function to propose the next neural architecture to evaluate.
  • Figure 2: Predictive performance of graph GPs with different kernels. 50 and 400 architectures are randomly sampled from NAS-Bench-201 for training and testing, respectively. Predicted deterministic validation errors are plotted against the true values, with one standard deviation as error bars.
  • Figure 3: Numerical results of Graph BO on NAS-Bench-101 (N101) and NAS-Bench-201 (N201). (Top) Deterministic validation error. (Bottom) The corresponding test error. Median with one standard deviation over 20 replications is plotted.
  • Figure 4: Predictive performance of graph GPs with different kernels. 50 and 400 architectures are randomly sampled from NAS-Bench-101 for training and testing, respectively. Predicted deterministic validation errors are plotted against the true values, with one standard deviation as error bars.
  • Figure 5: Comparison of NAS-GOAT with the remaining baselines. Numerical results of Graph BO on NAS-Bench-101 (N101) and NAS-Bench-201 (N201). (Top) Deterministic validation error. (Bottom) The corresponding test error. Median with one standard deviation over 20 replications is plotted.
  • ...and 1 more figure
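
The Figure 1 caption describes a graph kernel evaluated between an unknown graph and a given graph. As a rough illustration of how a kernel can combine structure and label information, the sketch below implements a simplified shortest-path kernel in the classical style: it counts matching (source label, target label, shortest-path length) triples between two node-labeled graphs. This is an assumption made for illustration and is not claimed to be the kernel used in the paper; in particular, both graphs here are concrete, whereas in NAS-GOAT one of them is represented by optimization variables. The function names, graphs, and operation labels are hypothetical.

```python
from collections import Counter, deque


def sp_features(adj, labels):
    """Count (label_u, label_v, distance) triples over reachable ordered pairs u != v."""
    n = len(adj)
    feats = Counter()
    for s in range(n):
        dist = {s: 0}  # BFS distances from s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if v != s:
                feats[(labels[s], labels[v], d)] += 1
    return feats


def sp_kernel(g1, g2):
    """Inner product of the two triple-count feature vectors."""
    f1, f2 = sp_features(*g1), sp_features(*g2)
    return sum(count * f2[key] for key, count in f1.items())


# Hypothetical labeled cells: a diamond and a chain
diamond = (
    [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]],
    ["in", "conv3x3", "conv1x1", "out"],
)
chain = (
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    ["in", "conv3x3", "out"],
)
k_dd = sp_kernel(diamond, diamond)
k_dc = sp_kernel(diamond, chain)
```

Because the kernel is an inner product of count vectors, it is symmetric and positive semidefinite, which is what allows it to serve as a GP covariance in a sketch like this.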

Theorems & Definitions (5)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 1