Few-shot Neural Architecture Search

Yiyang Zhao, Linnan Wang, Yuandong Tian, Rodrigo Fonseca, Tian Guo

TL;DR

Few-shot NAS addresses the high cost of vanilla NAS and the accuracy gaps of one-shot NAS by using multiple sub-supernets that partition the search space to reduce operation co-adaptation. The approach integrates with both gradient-based and search-based NAS, leveraging transfer learning to rapidly fine-tune sub-supernets and select high-quality architectures. Empirical results across NasBench-201, NasBench1-shot-1, CIFAR-10, ImageNet, AutoGAN, and Penn Treebank demonstrate consistent improvements in ranking accuracy, search efficiency, and final model performance, achieving state-of-the-art results in several settings. The method offers a scalable, generalizable framework for faster, more reliable neural architecture search with broad practical impact on various DL tasks and domains.

Abstract

Efficient evaluation of a network architecture drawn from a large search space remains a key challenge in Neural Architecture Search (NAS). Vanilla NAS evaluates each architecture by training it from scratch, which gives the true performance but is extremely time-consuming. More recently, one-shot NAS has substantially reduced the computation cost by training only one supernetwork, a.k.a. supernet, to approximate the performance of every architecture in the search space via weight-sharing. However, the performance estimation can be very inaccurate due to the co-adaptation among operations. In this paper, we propose few-shot NAS, which uses multiple supernetworks, called sub-supernets, each covering a different region of the search space, to alleviate the undesired co-adaptation. Compared to one-shot NAS, few-shot NAS improves the accuracy of architecture evaluation with a small increase in evaluation cost. With only up to 7 sub-supernets, few-shot NAS establishes new SoTAs: on ImageNet, it finds models that reach 80.5% top-1 accuracy at 600 MFLOPS and 77.5% top-1 accuracy at 238 MFLOPS; on CIFAR10, it reaches 98.72% top-1 accuracy without using extra data or transfer learning. In Auto-GAN, few-shot NAS outperforms the previously published results by up to 20%. Extensive experiments show that few-shot NAS significantly improves various one-shot methods, including 4 gradient-based and 6 search-based methods on 3 different tasks in NasBench-201 and NasBench1-shot-1.
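
The idea of sub-supernets that each cover a different region of the search space can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the three-edge search space, the operation names, and the helper functions `enumerate_space` and `split_on_edge` are hypothetical and are not taken from the paper's code. The sketch only shows how fixing the operation on one edge partitions the architectures into disjoint groups, one group per sub-supernet; it does not train anything.

```python
from itertools import product

# Hypothetical toy search space: 3 edges, each picking one of 3 operations.
OPS = ("conv3x3", "skip", "pool")
NUM_EDGES = 3


def enumerate_space(num_edges=NUM_EDGES, ops=OPS):
    """Return every architecture as a tuple with one operation per edge."""
    return list(product(ops, repeat=num_edges))


def split_on_edge(space, edge):
    """Partition the space by the operation chosen on `edge`.

    Each group is the region of the search space that a single
    sub-supernet would be responsible for in this toy setting.
    """
    groups = {}
    for arch in space:
        groups.setdefault(arch[edge], []).append(arch)
    return groups


space = enumerate_space()
sub_spaces = split_on_edge(space, edge=0)
for op, archs in sub_spaces.items():
    print(f"sub-supernet with edge 0 fixed to {op}: {len(archs)} architectures")
```

In this toy setting the 27 architectures fall into 3 disjoint groups of 9, so the number of sub-supernets grows with the number of operations on the split edge, which is consistent with the abstract's emphasis on keeping their count small.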

Paper Structure

This paper contains 38 sections, 8 figures, 6 tables, 3 algorithms.

Figures (8)

  • Figure 1: Few-shot NAS is a trade-off between vanilla NAS and one-shot NAS that aims to retain the accurate evaluation of vanilla NAS and the speed advantage of one-shot NAS.
  • Figure 2: (a) Masking the supernet to a specific architecture for fast evaluation of that architecture. (b) The motivation for using few-shot NAS to alleviate the impact of co-adaptation. After splitting on edge $a$, supernet $\Omega_{B}$ exclusively predicts architectures in $\Omega_{B}$, and supernet $\Omega_{C}$ exclusively predicts architectures in $\Omega_{C}$.
  • Figure 3: (a) Using multiple supernets clearly improves the correlation; (c) reports the correlation score (Kendall Tau) for the different numbers of supernets in (a); (b) shows that the improved performance predictions result in better NAS performance.
  • Figure 4: A generic architecture space.
  • Figure 5: Anytime accuracy comparison of state-of-the-art gradient-based algorithms on few-shot NAS. We ran each algorithm five times.
  • ...and 3 more figures
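
The Kendall Tau correlation score mentioned in the Figure 3 caption measures how well supernet-predicted accuracies rank architectures relative to their true accuracies. The following is a minimal sketch of the tau-a variant using only the standard library. The function name `kendall_tau` and the accuracy values are made up for illustration and are not results from the paper; the paper may use a different variant or a library implementation.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall Tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up numbers: true accuracies vs. accuracies predicted by a supernet.
true_acc = [0.91, 0.93, 0.89, 0.94]
predicted_acc = [0.60, 0.58, 0.55, 0.70]
print(round(kendall_tau(true_acc, predicted_acc), 3))
```

A score of 1 means the predicted ranking matches the true ranking exactly, and a score of -1 means it is fully reversed. In the made-up example, one of the six pairs is ranked in the wrong order.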