Efficient NAS with FaDE on Hierarchical Spaces

Simon Neumeyer, Julian Stier, Michael Granitzer

TL;DR

This work tackles neural architecture search in hierarchical, open-ended spaces by introducing FaDE, a fast DARTS-based estimator that derives FaDE-ranks: relative performance indicators for finite regions of a hyper-architecture. These ranks enable a memory-less outer search using a pseudo-gradient, batch-wise approach that scales linearly with depth and avoids proxy architectures. Empirical results on CIFAR-10 show a strong rank correlation (~0.8) between FaDE-ranks and actual performance, and demonstrate that FaDE-guided outer searches can improve architectures over iterations, compared with random search and Bayesian optimization. The method offers a generalizable framework for open-ended NAS, with potential extensions to richer graph spaces and alternative outer-search strategies.
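To illustrate how a rank correlation between predicted and evaluated ranks can be measured, here is a minimal Python sketch of Spearman's rho. The summary does not state which rank-correlation coefficient the authors report, so Spearman is an assumption; the function names and the numbers in the usage example are hypothetical and are not results from the paper.

```python
def rank_positions(values):
    """Return 1-based ranks (1 = smallest value).

    Assumption: no ties; tied values would simply be ranked in input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(predicted, evaluated):
    """Spearman rank correlation via rho = 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(predicted)
    if n != len(evaluated) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rp, re_ = rank_positions(predicted), rank_positions(evaluated)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, re_))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical predicted scores and evaluated accuracies for five architectures
pred = [0.31, 0.12, 0.45, 0.27, 0.05]
evals = [0.91, 0.88, 0.93, 0.86, 0.84]
print(spearman_rho(pred, evals))  # → 0.9
```

A value near 1 means the predicted ordering largely agrees with the evaluated ordering; only the orderings matter, which suits relative scores such as FaDE-ranks.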

Abstract

Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow cheap evaluations of neural network submodules to serve as surrogates for architecture evaluations. Yet the hierarchy is sometimes too restrictive, or the surrogate fails to generalize. We present FaDE, which uses differentiable architecture search to obtain relative performance predictions on finite regions of a hierarchical NAS space. The relative nature of these ranks calls for a memory-less, batch-wise outer search algorithm, for which we use an evolutionary algorithm with pseudo-gradient descent. FaDE is especially suited to deep hierarchical (i.e., multi-cell) search spaces, which it can explore at linear rather than exponential cost, thereby eliminating the need for a proxy search space. Our experiments show, first, that FaDE-ranks on finite regions of the search space correlate with the corresponding architecture performances and, second, that the ranks can drive a pseudo-gradient evolutionary search on the complete neural architecture search space.
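The abstract pairs relative, batch-wise ranks with an evolutionary search using pseudo-gradient descent. The sketch below shows one generic way to build such a step: a rank-based evolution-strategy update in a continuous embedding space (compare Figure 3, where a point in embedding space selects a graph bucket). It is an illustration under assumptions, not the authors' algorithm: the embedding, the `rank_batch` callback, and the step-size constants are hypothetical, and `toy_rank_batch` merely stands in for training a hyper-architecture and reading off FaDE-ranks.

```python
import random


def pseudo_gradient_step(center, rank_batch, batch_size=8, sigma=0.5, lr=0.5, rng=random):
    """One memory-less outer-search step (sketch, hypothetical interface).

    center: current point in a hypothetical cell-embedding space.
    rank_batch: maps a list of candidate points to relative ranks (0 = best);
        only the ordering within this one batch is used.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be >= 2 to form relative ranks")
    dim = len(center)
    noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch_size)]
    candidates = [[c + sigma * z for c, z in zip(center, noise)] for noise in noises]
    ranks = rank_batch(candidates)
    # Centered utilities: best rank -> +0.5, worst rank -> -0.5
    utils = [0.5 - r / (batch_size - 1) for r in ranks]
    # Rank-weighted average of perturbations acts as a pseudo-gradient
    grad = [
        sum(u * noise[d] for u, noise in zip(utils, noises)) / (batch_size * sigma)
        for d in range(dim)
    ]
    # Nothing but the new center is carried over to the next step
    return [c + lr * g for c, g in zip(center, grad)]


def toy_rank_batch(points, target=(0.0, 0.0)):
    """Hypothetical stand-in for FaDE: rank 0 = closest to a hidden target."""
    dists = [sum((p - t) ** 2 for p, t in zip(pt, target)) for pt in points]
    order = sorted(range(len(points)), key=dists.__getitem__)
    ranks = [0] * len(points)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


rng = random.Random(0)
center = [3.0, 3.0]
for _ in range(200):
    center = pseudo_gradient_step(center, toy_rank_batch, rng=rng)
print(center)  # drifts toward the toy target at the origin
```

Because the update uses only the ordering inside the current batch, scores from different batches never need to be comparable, which matches the relative nature of the ranks described above.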

Paper Structure

This paper contains 13 sections, 6 equations, 8 figures.

Figures (8)

  • Figure 1: (left) Discrete architecture $BBD \in \mathcal{S}^3$ composed of cell architectures $B, D \in \mathcal{S}$. (middle) BBD contained in a hyper-architecture $H \in \mathcal{H}_{3,4}(\mathcal{S})$ that allows several cell architectures per row. Relative FaDE-ranks are obtained from the trained hyper-architecture by factorizing the architecture parameters along the corresponding path. (right) Each step of the outer NAS optimization discovers new cell architectures per row.
  • Figure 2: Regularization. We apply two modes, cell-dependent and cell-independent regularization, by means of a regularization factor $r_i \in \mathbb{R}$ (shown here between -1 and 1) over the training epochs (shown up to 50, as used in the experiments). Cell-independent regularization applies a single factor $r$ that decreases linearly with the number of epochs (the straight line in the middle). Cell-dependent regularization, in contrast, applies a differently regularized loss to each cell $i$: $r_i$ decreases faster for smaller $i$.
  • Figure 3: Graph generation: a sample in embedding space determines a corresponding bucket from which a graph is drawn.
  • Figure 4: Density of softmaxed architecture parameters: predicted ranks based on the architecture parameter averaged per graph architecture and per cell (dark = deep, light = shallow).
  • Figure 5: Correlation between predicted and evaluated ranks.
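Figure 1 describes obtaining FaDE-ranks by factorizing architecture parameters along a path of the trained hyper-architecture. One plausible reading, sketched here as an assumption rather than the authors' implementation, is to softmax each row's DARTS parameters and multiply the weights of the cells a discrete architecture picks, e.g. B, B, D for BBD in a region of $\mathcal{H}_{3,4}(\mathcal{S})$. The parameter values and the helper names (`softmax_dict`, `fade_score`, `fade_ranks`) are hypothetical.

```python
import math
from itertools import product


def softmax_dict(row):
    """Softmax over one row's architecture parameters {cell_name: alpha}."""
    m = max(row.values())
    exps = {cell: math.exp(a - m) for cell, a in row.items()}
    total = sum(exps.values())
    return {cell: e / total for cell, e in exps.items()}


def fade_score(alphas, path):
    """Factorized score of one discrete path (one cell per row).

    Multiplies the softmaxed weight of the chosen cell in every row,
    so scoring a single path costs O(depth).
    """
    return math.prod(softmax_dict(row)[cell] for row, cell in zip(alphas, path))


def fade_ranks(alphas):
    """Relative ranks (0 = best) of all paths through a small finite region.

    Assumes single-letter cell names so a path can be keyed as e.g. "BBD".
    """
    paths = product(*(list(row) for row in alphas))
    ordered = sorted(paths, key=lambda p: fade_score(alphas, p), reverse=True)
    return {"".join(p): rank for rank, p in enumerate(ordered)}


# Hypothetical trained parameters: depth 3, four candidate cells per row
alphas = [
    {"A": 0.0, "B": 2.0, "C": 0.0, "D": 0.0},
    {"A": 0.0, "B": 1.5, "C": 0.0, "D": 0.0},
    {"A": 0.0, "B": 0.0, "C": 0.0, "D": 2.5},
]
ranks = fade_ranks(alphas)
print(ranks["BBD"])  # → 0
```

Scoring one path is linear in depth; the exhaustive enumeration in `fade_ranks` is practical here only because the region is small (4^3 = 64 paths).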