Subnet-Aware Dynamic Supernet Training for Neural Architecture Search

Jeimin Jeon; Youngmin Oh; Junghyup Lee; Donghyeon Baek; Dohyung Kim; Chanho Eom; Bumsub Ham

TL;DR

This work tackles two core problems in N-shot NAS: unfairness toward high-complexity subnets and noisy momentum from shared optimizers. It introduces CaLR, a complexity-aware LR scheduler, and MS, a momentum separation strategy that clusters subnets by structure and uses cluster-specific momentum buffers, to stabilize training and improve subnet ranking. Across NAS-Bench-201 and MobileNet spaces on CIFAR-10/100 and ImageNet, CaLR+MS consistently improves Kendall's Tau ranking and retrieved subnet accuracy with negligible overhead, and it is compatible as a plug-in with SPOS, FairNAS, and FSNAS. The approach provides a practical, generalizable enhancement to dynamic supernet training, advancing reliable NAS with minimal computational burden.

Abstract

N-shot neural architecture search (NAS) exploits a supernet containing all candidate subnets for a given search space. The subnets are typically trained with a static training strategy (e.g., using the same learning rate (LR) scheduler and optimizer for all subnets). This, however, ignores the fact that individual subnets have distinct characteristics, leading to two problems: (1) supernet training is biased towards low-complexity subnets (unfairness); (2) the momentum update in the supernet is noisy (noisy momentum). We present a dynamic supernet training technique that addresses these problems by adapting the training strategy to each subnet. Specifically, we introduce a complexity-aware LR scheduler (CaLR) that adjusts the LR decay ratio according to the complexity of each subnet, which alleviates the unfairness problem. We also present a momentum separation technique (MS), which groups subnets with similar structural characteristics and uses a separate momentum for each group, avoiding the noisy momentum problem. Our approach is applicable to various N-shot NAS methods at marginal cost, while improving the search performance drastically. We validate the effectiveness of our approach on various search spaces (e.g., NAS-Bench-201, MobileNet spaces) and datasets (e.g., CIFAR-10/100, ImageNet).
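To make the two ideas concrete, the following is a minimal sketch and not the authors' implementation. The abstract gives neither the CaLR decay formula nor the MS grouping rule, so everything specific here is an assumption: the polynomial decay `(1 - t) ** gamma`, the `gamma_min`/`gamma_max` range, the normalized `complexity` in [0, 1], the choice to let high-complexity subnets decay more slowly, and the externally supplied `group_id`. All function and class names are hypothetical.

```python
from collections import defaultdict


def complexity_aware_lr(step, total_steps, base_lr, complexity,
                        gamma_min=0.5, gamma_max=2.0):
    """Hypothetical complexity-aware LR schedule.

    `complexity` is assumed to be normalized to [0, 1]. The decay exponent is
    interpolated so that high-complexity subnets (complexity near 1) get a
    smaller exponent and therefore keep a larger LR for longer.
    """
    t = step / total_steps                      # training progress in [0, 1]
    gamma = gamma_max - (gamma_max - gamma_min) * complexity
    return base_lr * (1.0 - t) ** gamma


class SeparatedMomentumSGD:
    """SGD with one momentum buffer per subnet group (illustrative only)."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        # group_id -> {parameter name -> momentum buffer}
        self.buffers = defaultdict(dict)

    def step(self, params, grads, lr, group_id):
        """Update shared `params` in place using the buffer of `group_id`."""
        bufs = self.buffers[group_id]
        for name, grad in grads.items():
            velocity = self.momentum * bufs.get(name, 0.0) + grad
            bufs[name] = velocity
            params[name] -= lr * velocity


# Usage: two sampled subnets from different (assumed) groups share one weight.
params = {"w": 1.0}
opt = SeparatedMomentumSGD(momentum=0.9)

lr_high = complexity_aware_lr(50, 100, base_lr=0.1, complexity=1.0)
lr_low = complexity_aware_lr(50, 100, base_lr=0.1, complexity=0.0)

opt.step(params, {"w": 1.0}, lr=0.1, group_id=0)   # subnet from group 0
opt.step(params, {"w": -1.0}, lr=0.1, group_id=1)  # subnet from group 1
```

With a single shared buffer, the second update would inherit the momentum accumulated by the first subnet even though its gradient points the other way. Keeping one buffer per group keeps those histories apart, which is the intuition the abstract gives for MS.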
