Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets
Chenxia Tang
TL;DR
This work investigates applying Differentiable Architecture Search (DARTS) to long-tailed datasets and finds that standard re-sampling and re-weighting techniques can harm NAS performance. It introduces a heterogeneous learning-rate scheduling strategy (HLS) for the architecture parameters within a Bilateral Branch Network (BBN), which stabilizes DARTS training on imbalanced data, and couples it with a symmetric mixing-ratio scheme for the two heads. Empirical results on long-tailed CIFAR-10 show that HLS achieves accuracy comparable to the DARTS baseline, while re-sampling methods consistently degrade performance. The study highlights the importance of controlling the architecture-parameter learning rate and keeping training dynamics balanced in differentiable NAS (DNAS) under class imbalance, and identifies careful data augmentation as a critical factor when applying DNAS to imbalanced scenarios.
Abstract
In this paper, we address the challenge of applying Neural Architecture Search (NAS) algorithms, specifically Differentiable Architecture Search (DARTS), to long-tailed datasets in which the class distribution is highly imbalanced. We observe that traditional re-sampling and re-weighting techniques, which are effective in standard classification tasks, degrade performance when combined with DARTS. To mitigate this, we propose a novel adaptive learning rate scheduling strategy tailored to the architecture parameters of DARTS when it is integrated with the Bilateral Branch Network (BBN) for handling imbalanced datasets. Our approach adjusts the learning rate of the architecture parameters dynamically according to the training epoch, which prevents well-trained representations from being disrupted in the later stages of training. We also explore how the branch mixing factors affect the algorithm's performance. Through extensive experiments on the CIFAR-10 dataset with an artificially induced long-tailed distribution, we demonstrate that our method achieves accuracy comparable to that of DARTS alone. The experimental results further suggest that re-sampling methods inherently harm the performance of the DARTS algorithm. Our findings highlight the importance of careful data augmentation when applying DNAS to imbalanced learning scenarios.
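To make the two mechanisms named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the exact schedule, so the cosine decay in `arch_lr`, the default `base_lr` value, and all function names are assumptions chosen for illustration. The mixing factor in `bbn_alpha` follows the parabolic decay used by the original BBN formulation, alpha = 1 - (T / T_max)^2; the paper's own mixing scheme may differ.

```python
import math


def arch_lr(epoch, total_epochs, base_lr=3e-4, min_lr=0.0):
    """Hypothetical epoch-dependent LR for the architecture parameters.

    Assumption: a cosine decay from base_lr to min_lr, so that late epochs
    barely move the architecture and learned representations stay intact.
    """
    if total_epochs <= 1:
        return base_lr
    # Clamp the epoch into [0, total_epochs - 1] and normalize to [0, 1].
    progress = min(max(epoch, 0), total_epochs - 1) / (total_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def bbn_alpha(epoch, total_epochs):
    """Parabolic-decay mixing factor from the original BBN formulation."""
    return 1.0 - (epoch / total_epochs) ** 2


def mix_logits(conv_logits, rebal_logits, alpha):
    """Blend the conventional-branch and re-balancing-branch outputs."""
    return [alpha * c + (1.0 - alpha) * r for c, r in zip(conv_logits, rebal_logits)]


if __name__ == "__main__":
    total = 50  # illustrative epoch budget, not taken from the paper
    for epoch in (0, 25, 49):
        print(epoch, arch_lr(epoch, total), bbn_alpha(epoch, total))
```

In a training loop, the value returned by `arch_lr` would be written into the architecture optimizer's learning rate at the start of each epoch, while the network-weight optimizer keeps its own, separate schedule; that separation is what makes the scheduling heterogeneous in this sketch.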
