ZO-DARTS++: An Efficient and Size-Variable Zeroth-Order Neural Architecture Search Algorithm

Lunchen Xie, Eugenio Lomurno, Matteo Gambella, Danilo Ardagna, Manuel Roveri, Matteo Matteucci, Qingjiang Shi

TL;DR

ZO-DARTS++ tackles the efficiency and resource-adaptation challenges of differentiable NAS by combining three components: a zeroth-order gradient estimator, a sparsity-promoting architecture distribution (sparsemax with temperature annealing), and a size-variable search that jointly optimizes kernel size and network depth. The bi-level NAS formulation includes a parameter-budget constraint enforced through a penalty, which enables search under realistic resource limits. On MedMNIST datasets, the method improves average accuracy and substantially reduces search time, and its constrained variants cut parameter counts by more than 35% while maintaining competitive accuracy. The approach also makes operation choices easier to interpret and compares favorably with POPNASv3, which supports its use for medical imaging on devices with limited resources.
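To make the idea of a zeroth-order gradient estimator concrete, the following is a minimal sketch of a generic random-direction estimator that uses only function evaluations. It is not taken from the paper: the function names `zo_gradient` and `quadratic`, the central-difference form, the Gaussian directions, and the values of `mu` and `num_samples` are all assumptions chosen for illustration, and the way ZO-DARTS++ applies such an estimator inside its bi-level problem may differ.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_samples=20000, seed=0):
    """Estimate the gradient of f at x from function values only.

    Generic illustrative estimator (assumption, not the paper's exact scheme):
    average of central-difference directional derivatives along random
    Gaussian directions u, each scaled by u.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)  # random search direction
        # Finite-difference approximation of the directional derivative along u.
        directional = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)
        grad += directional * u
    return grad / num_samples

def quadratic(x):
    """Toy objective whose true gradient at x is x itself."""
    return 0.5 * float(np.sum(x ** 2))

x0 = np.array([1.0, -2.0, 0.5])
estimate = zo_gradient(quadratic, x0)
print("estimate:", estimate)
print("true gradient:", x0)
```

The estimate is noisy for small `num_samples`; the large default here only serves to make the toy comparison against the known gradient visible.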

Abstract

Differentiable Neural Architecture Search (NAS) provides a promising avenue for automating the complex design of deep learning (DL) models. However, current differentiable NAS methods often face constraints in efficiency, operation selection, and adaptability under varying resource limitations. We introduce ZO-DARTS++, a novel NAS method that effectively balances performance and resource constraints. By integrating a zeroth-order approximation for efficient gradient handling, employing a sparsemax function with temperature annealing for clearer and more interpretable architecture distributions, and adopting a size-variable search scheme for generating compact yet accurate architectures, ZO-DARTS++ establishes a new balance between model complexity and performance. In extensive tests on medical imaging datasets, ZO-DARTS++ improves the average accuracy by up to 1.8% over standard DARTS-based methods and shortens search time by approximately 38.6%. Additionally, its resource-constrained variants can reduce the number of parameters by more than 35% while maintaining competitive accuracy levels. Thus, ZO-DARTS++ offers a versatile and efficient framework for generating high-quality, resource-aware DL models suitable for real-world medical applications.
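The effect of sparsemax with temperature annealing can be illustrated with a short sketch. The `sparsemax` function below is the standard Euclidean projection onto the probability simplex; dividing the architecture logits by a temperature before projecting is one common way to control sparsity. The logits `alpha`, the helper name `sparsemax_with_temperature`, and the temperature values are hypothetical and are not taken from the paper, whose exact annealing schedule is not given in this summary.

```python
import numpy as np

def sparsemax(z):
    """Project a logit vector z onto the probability simplex (sparsemax)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]           # logits in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum   # True for the entries kept in the support
    k_max = k[support][-1]                # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max # threshold subtracted from every logit
    return np.maximum(z - tau, 0.0)

def sparsemax_with_temperature(alpha, temperature):
    """Scale logits by 1/temperature before sparsemax (illustrative assumption)."""
    return sparsemax(np.asarray(alpha, dtype=float) / temperature)

# Hypothetical architecture logits for four candidate operations on one edge.
alpha = np.array([1.0, 0.8, 0.1, -0.5])

# Hypothetical annealing: the temperature decreases as the search progresses.
for temperature in (10.0, 1.0, 0.1):
    probs = sparsemax_with_temperature(alpha, temperature)
    print(f"T={temperature}: {np.round(probs, 3)}")
```

At a high temperature every operation keeps some probability mass, while at a low temperature the distribution collapses onto a single operation, which is what makes the final operation choice easy to read off.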

Paper Structure

This paper contains 20 sections, 17 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overall search framework of ZO-DARTS++, which produces a flexible network model. Each stage contains several cells, each cell contains several operations, and the convolution kernel sizes and the number of cells can be adjusted during the search. (a) Basic searchable cell. (b) Operation set. (c) Kernel-variable convolution. (d) Depth-variable stage. (e) Reduction cell. (f) Final classifier.
  • Figure 2: Violin plots of the distribution of model parameter counts (millions) across datasets. Each violin shows the range, density, and median parameter count of models sampled from supernets searched without constraints on the respective dataset.
  • Figure 3: Joint distribution of model accuracy (%) and parameter count (millions) for different NAS methods. Each point represents a specific model structure; the marginal plots show the density of accuracy (top) and parameter count (right).
  • Figure 4: Violin plots of the distribution of model parameter counts (millions) across datasets. Each violin shows the range, density, and median parameter count of models sampled from supernets searched under three levels of constraints on the respective dataset. Blue and red lines mark the upper and lower bounds of the constraints during the search.
  • Figure 5: Probability rank variation of one edge during the search procedure. Lines with different colors represent different operations.
  • ...and 2 more figures