Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach

Beichen Zhang; Xiaoxing Wang; Xiaohan Qin; Junchi Yan

TL;DR

This work analyzes order-preserving ability on the whole search space (global) and on a sub-space of top architectures (local), and empirically shows that the local order-preserving ability of current two-stage NAS methods still needs improvement.

Abstract

The supernet is a core component in many recent Neural Architecture Search (NAS) methods. It not only embodies the search space but also provides a (relative) estimate of the final performance of candidate architectures. It is therefore critical that the top architectures ranked by a supernet are consistent with those ranked by true performance, a property known as order-preserving ability. In this work, we analyze order-preserving ability on the whole search space (global) and on a sub-space of top architectures (local), and empirically show that the local order-preserving ability of current two-stage NAS methods still needs improvement. To rectify this, we propose Supernet Shifting, a refined search strategy that combines architecture search with supernet fine-tuning. Specifically, in addition to evaluating each sampled architecture, we accumulate its training loss during the search and update the supernet at every iteration. Since superior architectures are sampled more frequently in evolutionary search, the supernet is encouraged to focus on top architectures, which improves local order-preserving ability. In addition, a pre-trained supernet is often not reusable in one-shot methods. We show that Supernet Shifting can transfer a supernet to a new dataset: the final classifier layer is reset and trained during evolutionary search. Comprehensive experiments show that our method has better order-preserving ability and can find a dominating architecture. Moreover, the pre-trained supernet can be easily transferred to a new dataset with no loss of performance.
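The search-time update described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the "supernet" here is a hypothetical table with one shared scalar weight per (layer, operation) choice, and a made-up quadratic loss stands in for both the evaluation score and the training loss, which the real method computes on data. All names and parameters (`supernet_shifting_toy`, `targets`, `lr`, and so on) are assumptions for illustration. The point is only the mechanism: gradients are accumulated while the population is evaluated, the shared weights are updated once per iteration, and choices used by surviving architectures receive more updates.

```python
import random


def supernet_shifting_toy(num_layers=4, num_ops=3, pop_size=8,
                          iterations=10, lr=0.1, seed=0):
    """Toy evolutionary search that also fine-tunes shared weights."""
    rng = random.Random(seed)
    # Hypothetical supernet: one shared scalar weight per (layer, op) choice.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(num_ops)]
               for _ in range(num_layers)]
    # Made-up per-choice optimum; stands in for real training data.
    targets = [[rng.uniform(-1.0, 1.0) for _ in range(num_ops)]
               for _ in range(num_layers)]
    update_counts = [[0] * num_ops for _ in range(num_layers)]

    def loss(arch):
        # An architecture is a tuple holding one op index per layer.
        return sum((weights[l][o] - targets[l][o]) ** 2
                   for l, o in enumerate(arch))

    population = [tuple(rng.randrange(num_ops) for _ in range(num_layers))
                  for _ in range(pop_size)]
    best_loss_history = []

    for _ in range(iterations):
        grads = [[0.0] * num_ops for _ in range(num_layers)]
        scored = []
        for arch in population:
            scored.append((loss(arch), arch))  # evaluate the candidate
            for l, o in enumerate(arch):       # accumulate its gradient
                grads[l][o] += 2.0 * (weights[l][o] - targets[l][o])
                update_counts[l][o] += 1

        # Update the shared weights once per search iteration.
        for l in range(num_layers):
            for o in range(num_ops):
                weights[l][o] -= lr * grads[l][o] / pop_size

        # Keep the better half and mutate one layer choice per parent.
        scored.sort()
        best_loss_history.append(scored[0][0])
        parents = [arch for _, arch in scored[: pop_size // 2]]
        children = []
        for parent in parents:
            child = list(parent)
            child[rng.randrange(num_layers)] = rng.randrange(num_ops)
            children.append(tuple(child))
        population = parents + children

    return best_loss_history, update_counts


if __name__ == "__main__":
    history, counts = supernet_shifting_toy()
    print("best loss per iteration:", [round(x, 4) for x in history])
    print("updates per (layer, op):", counts)
```

Because parents survive into the next generation, their operation choices appear in every subsequent population and accumulate the most updates, which is a rough analogue of the supernet shifting toward top architectures.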

Paper Structure (20 sections, 7 equations, 8 figures, 8 tables, 1 algorithm)

Introduction
Related Works
Method
NAS Retrospection from Order-preserving
Single-Path Supernet Training
Supernet Shifting
Supernet Transferring
Approach Summary and Remarks
Experiment
Experiment Setting
Searching Result
Order-preserving Ability
Supernet Transfer
Time Cost Analysis
Conclusion and Outlook
...and 5 more sections

Figures (8)

Figure 1: An illustration of global and local order-preserving ability. For the global one, we care about coarse-grained comparison to eliminate poor architectures from the entire search space. For the local one, we care about fine-grained comparison to rank the architectures within a subspace of top architectures.
Figure 2: Pipeline of our method with two stages. In the training stage, a single-path supernet is trained by uniform sampling, so each architecture is treated equally. In the searching stage, evolutionary search is applied. When an architecture is sampled, its training loss is calculated and accumulated in addition to the evaluation. At the end of each iteration, the supernet is updated. Since superior architectures are sampled more frequently in evolutionary search, the supernet is expected to shift its focus to top architectures.
Figure 3: Trajectory of Supernet Shifting process. We sample 5 searched superior architectures (bottom) and 5 random architectures (top). We monitor their error rate over iterations of evolutionary searching. Iteration 0 denotes the original supernet trained by uniform sampling. The shifting supernet gradually focuses on superior architectures and dismisses inferior ones.
Figure 4: We choose 5 depth multipliers: 0.5, 1.0, 1.5, 2.0 and 4.0. For each we train a new supernet to which we apply our method and the SPOS method. Then, we retrain the searched architecture separately and compare the results on ImageNet-100.
Figure 5: Experiments on order-preserving ability. The number of good architectures predicted correctly as the top-10 architectures indicates the global ranking. The Kendall's tau coefficient of the 10 good architectures indicates the local consistency.
...and 3 more figures
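The two measurements named in the Figure 5 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's evaluation code: `kendall_tau` computes the tau-a variant (no tie correction), `top_k_overlap` counts how many of the true top-k architectures the supernet also places in its top-k, and the accuracy values in the usage example are made up.

```python
from itertools import combinations


def kendall_tau(pred, true):
    """Kendall's tau-a between two equal-length score lists."""
    n = len(pred)
    if n != len(true) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (pred[i] - pred[j]) * (true[i] - true[j])
        if sign > 0:
            concordant += 1   # pair ordered the same way in both rankings
        elif sign < 0:
            discordant += 1   # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)


def top_k_overlap(pred, true, k):
    """Number of architectures in both the predicted and true top-k."""
    def top(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i],
                       reverse=True)
        return set(order[:k])
    return len(top(pred) & top(true))


if __name__ == "__main__":
    # Made-up accuracies for six hypothetical architectures.
    supernet_acc = [0.61, 0.58, 0.64, 0.40, 0.35, 0.62]
    true_acc = [0.74, 0.75, 0.76, 0.60, 0.55, 0.73]
    print("global (top-3 overlap):", top_k_overlap(supernet_acc, true_acc, 3))
    print("local (Kendall's tau):", kendall_tau(supernet_acc, true_acc))
```

In this framing, the overlap count is the coarse, global check (are the good architectures identified at all?), while Kendall's tau restricted to the good architectures is the fine-grained, local check (are they ranked in the right order?).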
