GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts
Sofia Casarin, Oswald Lanz, Sergio Escalera
TL;DR
This work tackles the challenge of transferring predictor-based NAS across datasets with distribution shifts. It introduces GRASP-GCN, a ranking Graph Convolutional Network that additionally ingests the vertex shapes of architectures, and evaluates it on a Kronecker-product-based randomly wired search space whose networks are trained on four image datasets. The authors construct a 2000-architecture NAS benchmark to study transferability and show that GRASP-GCN, especially when combined with vertex shapes and early stopping, achieves superior top-k ranking and robust generalization under distribution shifts, outperforming prior predictor-based methods on CIFAR-10 and transferring effectively to other datasets. Because the predictor can be trained on early-stopped accuracies, the approach reduces NAS search time. The results also indicate that more complex datasets yield more transferable rankings, offering practical improvements for multi-dataset NAS scenarios.
Abstract
Neural Architecture Search (NAS) methods have been shown to produce networks that largely outperform human-designed networks. However, conventional NAS methods have mostly tackled the single-dataset scenario, incurring a large computational cost because the procedure has to be run from scratch for every new dataset. In this work, we focus on predictor-based algorithms and propose a simple and efficient way of improving their prediction performance when dealing with data distribution shifts. We exploit the Kronecker product on the randomly wired search space and create a small NAS benchmark composed of networks trained on four different datasets. To improve generalization, we propose GRASP-GCN, a ranking Graph Convolutional Network that takes the shapes of the network's layers as an additional input. GRASP-GCN is trained with not-at-convergence accuracies and improves on the state of the art by 3.3% on CIFAR-10, while also improving generalization under data distribution shift.
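The abstract does not spell out how the Kronecker product is applied to the randomly wired search space, so the following is only a minimal sketch of the underlying operation: taking the Kronecker product of a small adjacency matrix with itself to obtain a larger wiring pattern. The `kron` helper and the 2-node `seed` graph are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from typing import List

Matrix = List[List[int]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two matrices given as nested lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [
            a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
            for j in range(len(a[0]) * cols_b)
        ]
        for i in range(len(a) * rows_b)
    ]


# Hypothetical seed adjacency matrix (assumption, not from the paper):
# two nodes, an edge 0 -> 1, and ones on the diagonal.
seed: Matrix = [
    [1, 1],
    [0, 1],
]

# One Kronecker step turns the 2-node pattern into a 4-node pattern.
expanded = kron(seed, seed)

for row in expanded:
    print(row)
```

Because the seed is upper triangular, every Kronecker power of it stays upper triangular, so the expanded pattern keeps a fixed node ordering with edges pointing only forward. Whether the authors rely on this property is an assumption here.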
