GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts

Sofia Casarin, Oswald Lanz, Sergio Escalera

TL;DR

This work tackles the challenge of transferring predictor-based NAS across datasets with distribution shifts. It introduces GRASP-GCN, a ranking Graph Convolutional Network that additionally ingests vertex shapes of architectures, and evaluates it on a Kronecker-product-based randomly wired search space trained across four image datasets. The authors construct a 2000-architecture NAS benchmark to study transferability and show that GRASP-GCN, especially when combined with vertex shapes and early stopping, achieves superior top-k ranking and robust generalization under distribution shifts, outperforming prior predictor-based methods on CIFAR-10 and transferring effectively to other datasets. The approach reduces NAS search time by enabling early stopping and demonstrates that more complex datasets yield more transferable rankings, offering practical improvements for multi-dataset NAS scenarios.

Abstract

Neural Architecture Search (NAS) methods have been shown to produce networks that largely outperform human-designed networks. However, conventional NAS methods have mostly tackled the single-dataset scenario, incurring a large computational cost because the procedure has to be run from scratch for every new dataset. In this work, we focus on predictor-based algorithms and propose a simple and efficient way of improving their prediction performance when dealing with data distribution shifts. We exploit the Kronecker product on the randomly wired search space and create a small NAS benchmark composed of networks trained over four different datasets. To improve generalization, we propose GRASP-GCN, a ranking Graph Convolutional Network that takes the shapes of the neural networks' layers as additional input. GRASP-GCN is trained with not-at-convergence accuracies; it improves on the state of the art by 3.3% on CIFAR-10 and also generalizes better under data distribution shift.

Paper Structure (17 sections, 9 equations, 5 figures, 7 tables, 2 algorithms)

This paper contains 17 sections, 9 equations, 5 figures, 7 tables, 2 algorithms.

Figures (5)

  • Figure 1: Architectures are sampled from the search space and trained on four datasets. The structure of each DNN, together with the shapes of its layers, is given as input to a ranking GCN, which uses the measured accuracies to learn to rank DNNs so that the search space is narrowed down.
  • Figure 2: Working principles of a GCN used as a ranking network
  • Figure 3: CIFAR-10 results. Ranking evolution during training, shown as a cumulative (a) and a derivative (b) plot. The x-axis shows the architectures in descending order of accuracy; the y-axis shows the training epochs. The heatmap ranges from a large number of rank changes (yellow) to no changes (blue). (c) Validation accuracy the GCN can obtain when trained with the previous rankings.
  • Figure 4: The ranking of each architecture on one of the four datasets, sorted by its ranking on CIFAR-10. Correlation is measured with Kendall's $\tau$.
  • Figure 5: 1-NDCG plot showing how closely the ordering of the architectures at epoch $i$ matches their ordering at epoch 120. Lower is better. Each panel shows the results for one dataset. The dashed lines in each plot mark where the learning rate is dropped. Each plot shows both NDCG@10 (light color) and NDCG@2092 (dark color).
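
The two ranking measures named in the captions for Figures 4 and 5 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the NDCG gain of an architecture is its final accuracy and that the ordering under test comes from sorting by an earlier-epoch accuracy; the paper's exact gain definition is not given here. The function names (`dcg_at_k`, `one_minus_ndcg_at_k`, `kendall_tau`) and the toy accuracies are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k items, with a log2 position discount."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def one_minus_ndcg_at_k(early_acc, final_acc, k):
    """1 - NDCG@k of the ordering induced by early_acc, scored against final_acc.

    Assumption: the gain of an architecture is its final accuracy.
    Returns 0.0 when the early ordering matches the ideal ordering in the top k.
    """
    # Order architectures by their early (not-at-convergence) accuracy, best first.
    order = sorted(range(len(early_acc)), key=lambda i: early_acc[i], reverse=True)
    gains = [final_acc[i] for i in order]
    ideal = sorted(final_acc, reverse=True)
    idcg = dcg_at_k(ideal, k)
    if idcg == 0:
        return 0.0
    return 1.0 - dcg_at_k(gains, k) / idcg


def kendall_tau(a, b):
    """Kendall's tau-a between two score lists (tied pairs count as neither)."""
    n = len(a)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (a[i] - a[j]) * (b[i] - b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy accuracies for three hypothetical architectures.
early = [0.30, 0.50, 0.40]   # accuracy at an early epoch
final = [0.90, 0.80, 0.70]   # accuracy at the last epoch
print(one_minus_ndcg_at_k(early, final, k=2))
print(kendall_tau(early, final))
```

With a small `k`, as in NDCG@10, the score only reflects whether the best architectures are already at the top, whereas setting `k` to the full benchmark size, as in NDCG@2092, scores the whole ordering. Kendall's $\tau$ weighs every pair of architectures equally, which suits the cross-dataset comparison in Figure 4.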