Regularizing Differentiable Architecture Search with Smooth Activation
Yanlin Zhou, Mostafa El-Khamy, Kee-Bong Song
TL;DR
This work tackles robustness and generalization gaps in differentiable architecture search by introducing SA-DARTS, a regularization that applies a smooth activation function to the architecture weights $\boldsymbol{\alpha}$ and uses it as an auxiliary loss. The method mitigates skip dominance and the discretization mismatch between the supernet and the final one-hot architecture, and a variant combined with partial-channel connections (SAC-DARTS) preserves or improves search efficiency. The authors report state-of-the-art performance on NAS-Bench-201, CIFAR and ImageNet classification, and super-resolution, and show that the approach yields better loss landscapes and more robust operator ranking. The contributions offer a principled, low-overhead path to more reliable neural architecture search across vision tasks.
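The discretization mismatch mentioned above can be illustrated with the standard DARTS continuous relaxation: during search, each edge outputs a softmax-weighted mixture of all candidate operations, while the final architecture keeps only the operation with the largest $\alpha$. The sketch below is a toy, scalar-valued illustration under that assumption; the operation set `OPS` and all helper names are hypothetical and are not taken from the paper.

```python
import math

def softmax(alpha):
    """Numerically stable softmax over a list of architecture weights."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy candidate operations on a scalar input (illustration only).
OPS = {
    "skip": lambda x: x,        # weight-free identity (skip connection)
    "conv": lambda x: 0.5 * x,  # stand-in for a parametric operation
    "zero": lambda x: 0.0,      # the "none" operation
}

def mixed_op(x, alpha):
    """Supernet edge: softmax(alpha)-weighted sum of every candidate op."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def discretized_op(x, alpha):
    """Final architecture: keep only the op with the largest alpha (one-hot).
    Ties are broken in favor of the first op in OPS."""
    names = list(OPS)
    best = max(range(len(alpha)), key=lambda i: alpha[i])
    return OPS[names[best]](x)

def discretization_gap(x, alpha):
    """Absolute difference between the supernet edge and its one-hot version."""
    return abs(mixed_op(x, alpha) - discretized_op(x, alpha))

if __name__ == "__main__":
    print(discretization_gap(1.0, [0.0, 0.0, 0.0]))  # uniform alpha
    print(discretization_gap(1.0, [5.0, 0.0, 0.0]))  # alpha peaked on "skip"
```

With uniform $\alpha$ the gap is 0.5, while with $\alpha$ strongly peaked on one operation it shrinks to roughly 0.01: the closer the softmax weights are to one-hot, the smaller the difference between the supernet and the derived architecture.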
Abstract
Differentiable Architecture Search (DARTS) is an efficient Neural Architecture Search (NAS) method, but it suffers from robustness, generalization, and discrepancy issues. Many efforts have addressed the performance collapse caused by skip dominance, using regularization of operation weights or path weights, noise injection, and super-network redesign. This raises the question of whether a better and more elegant way exists to return the search to its intended goal: NAS is a selection problem. In this paper, we propose a simple but effective approach, named Smooth Activation DARTS (SA-DARTS), to overcome the skip dominance and discretization discrepancy challenges. By applying a smooth activation function to the architecture weights as an auxiliary loss, SA-DARTS mitigates the unfair advantage of weight-free operations, converges to fanned-out architecture weight values, and can recover the search process from a skip-dominant initialization. Through theoretical and empirical analysis, we demonstrate that SA-DARTS can yield new state-of-the-art (SOTA) results on NAS-Bench-201, classification, and super-resolution. Further, we show that SA-DARTS can help improve the performance of SOTA models with fewer parameters, such as the Information Multi-distillation Network (IMDN), on the super-resolution task.
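The abstract states that SA-DARTS applies a smooth activation to the architecture weights as an auxiliary loss, but it does not give the exact form of that term. The following is only a hypothetical sketch of where such a term could enter the architecture update, $\mathcal{L}_{\text{val}}(w, \alpha) + \lambda R(\alpha)$. The choice of softplus as the activation, the penalty $R(\alpha) = \sum_i \mathrm{softplus}(\alpha_i)$, the weight `lam`, the learning rate `lr`, and all helper names are assumptions for illustration; the validation-loss gradient `val_grad` is supplied by the caller.

```python
import math

def softplus(z):
    """Smooth activation log(1 + e^z), computed in a numerically stable way."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    """Derivative of softplus, computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# ASSUMPTION: R(alpha) = sum_i softplus(alpha_i). This is a hypothetical stand-in,
# not the regularizer defined in the paper.
def aux_loss(alpha, lam):
    """Auxiliary loss lam * R(alpha) on the architecture weights."""
    return lam * sum(softplus(a) for a in alpha)

def aux_grad(alpha, lam):
    """Gradient of aux_loss with respect to each alpha_i (lam * sigmoid(alpha_i))."""
    return [lam * sigmoid(a) for a in alpha]

def architecture_step(alpha, val_grad, lam=0.1, lr=0.01):
    """One gradient-descent step on alpha for L_val + lam * R(alpha).

    val_grad: gradient of the validation loss w.r.t. alpha, provided by the caller
    (e.g. from a first-order DARTS update); lam and lr are illustrative values.
    """
    reg = aux_grad(alpha, lam)
    return [a - lr * (g + r) for a, g, r in zip(alpha, val_grad, reg)]

if __name__ == "__main__":
    alpha = [2.0, 0.0, -1.0]
    print(architecture_step(alpha, val_grad=[0.0, 0.0, 0.0]))
```

Because the derivative of softplus is the sigmoid, this particular assumed penalty pushes larger $\alpha$ values down more strongly than smaller ones while keeping its gradient bounded in $(0, \lambda)$. Whether this matches the regularizer, or the fanned-out behavior reported for SA-DARTS, is not established by this sketch.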
