PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong

TL;DR

PC-DARTS addresses the memory bottleneck of differentiable architecture search by performing operation selection on a randomly sampled subset of channels and bypassing the rest through a shortcut, paired with edge normalization to stabilize edge selection. This partial-channel approach allows much larger batch sizes, faster search, and improved stability, achieving 2.57% error on CIFAR-10 with about 0.1 GPU-days of search and 24.2% top-1 error on ImageNet (mobile setting) with 3.8 GPU-days of search, plus successful direct search on ImageNet and transfer to object detection. Ablation studies identify sampling 1/4 of the channels as a sweet spot and show that edge normalization provides regularization, with the combination of both components yielding the best performance. The results demonstrate practical, memory-efficient NAS with strong performance and transferability across tasks and datasets.

Abstract

Differentiable architecture search (DARTS) provided a fast solution in finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-network and searching for an optimal architecture. In this paper, we present a novel approach, namely, Partially-Connected DARTS, by sampling a small part of the super-network to reduce the redundancy in exploring the network space, thereby performing a more efficient search without compromising the performance. In particular, we perform operation search in a subset of channels while bypassing the held-out part in a shortcut. This strategy may suffer from an undesired inconsistency in selecting the edges of the super-net caused by sampling different channels. We alleviate it using edge normalization, which adds a new set of edge-level parameters to reduce uncertainty in search. Thanks to the reduced memory cost, PC-DARTS can be trained with a larger batch size and, consequently, enjoys both faster speed and higher training stability. Experimental results demonstrate the effectiveness of the proposed method. Specifically, we achieve an error rate of 2.57% on CIFAR10 with merely 0.1 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.2% on ImageNet (under the mobile setting) using 3.8 GPU-days for search. Our code has been made available at: https://github.com/yuhuixu1993/PC-DARTS.
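
The partial channel connection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the candidate operations in `TOY_OPS`, the helper `partial_channel_mixed_op`, and the parameter name `k` are hypothetical stand-ins chosen for illustration, and each "channel" is just a short list of floats. The sketch only shows the idea that a softmax-weighted mixture of operations is applied to a random 1/k of the channels while the remaining channels are passed through unchanged.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy operations on one channel (a list of floats).
# They stand in for real candidates such as convolutions or pooling.
TOY_OPS = {
    "identity": lambda ch: list(ch),
    "zero": lambda ch: [0.0 for _ in ch],
    "double": lambda ch: [2.0 * v for v in ch],
}


def partial_channel_mixed_op(channels, alpha, k, rng):
    """Mix the candidate ops on a random 1/k of the channels; bypass the rest."""
    num_sampled = len(channels) // k
    sampled = set(rng.sample(range(len(channels)), num_sampled))
    weights = softmax([alpha[name] for name in TOY_OPS])
    out = []
    for idx, ch in enumerate(channels):
        if idx in sampled:
            # Sampled channel: softmax-weighted sum of every candidate op.
            results = [op(ch) for op in TOY_OPS.values()]
            mixed = [
                sum(w * r[p] for w, r in zip(weights, results))
                for p in range(len(ch))
            ]
            out.append(mixed)
        else:
            # Held-out channel: shortcut, copied through unchanged.
            out.append(list(ch))
    return out, sampled


rng = random.Random(0)
channels = [[float(c)] * 4 for c in range(1, 9)]  # 8 channels, 4 values each
# Architecture logits chosen so the softmax weights are 0.25, 0.25, 0.5.
alpha = {"identity": 0.0, "zero": 0.0, "double": math.log(2.0)}
out, sampled = partial_channel_mixed_op(channels, alpha, k=4, rng=rng)
```

With `k=4`, only 2 of the 8 channels go through all candidate operations, so the intermediate results that must be kept for the mixture shrink accordingly; this is the source of the memory saving that permits a larger batch size.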

Paper Structure

This paper contains 17 sections, 2 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Illustration of the proposed approach (best viewed in color), partially-connected DARTS (PC-DARTS). As an example, we investigate how information is propagated to node $\#3$, i.e., ${j}={3}$. There are two sets of hyper-parameters during search, namely, $\left\{\alpha_{i,j}^o\right\}$ and $\left\{\beta_{i,j}\right\}$, where ${0}\leqslant{i}<{j}$ and ${o}\in{\mathcal{O}}$. To determine $\left\{\alpha_{i,j}^o\right\}$, we sample only a $1/K$ subset of the channels and connect them to the next stage, so that the memory consumption is reduced by a factor of $K$. To minimize the uncertainty incurred by sampling, we add $\left\{\beta_{i,j}\right\}$ as extra edge-level parameters.
  • Figure 2: Cells found on CIFAR10 and ImageNet. Searching on ImageNet makes the normal cell more complex (deeper), although the reduction cell is very similar to that found on CIFAR10.
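
The edge-level parameters $\left\{\beta_{i,j}\right\}$ from the Figure 1 caption can be sketched in the same toy style. The sketch assumes that the $\beta_{i,j}$ of all edges entering node $j$ are normalized with a softmax and used to weight the edge outputs before they are summed; that normalization is an assumption not spelled out in the text above, and the function name `aggregate_node_input` and the sample values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_node_input(edge_outputs, beta):
    """Weight each incoming edge's output by softmax(beta) and sum them."""
    weights = softmax(beta)
    length = len(edge_outputs[0])
    return [
        sum(w * out[p] for w, out in zip(weights, edge_outputs))
        for p in range(length)
    ]


# Node j = 3 receives one output from each predecessor i = 0, 1, 2.
edge_outputs = [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]
beta = [0.0, 0.0, 0.0]  # equal logits give every edge the same weight
edge_weights = softmax(beta)
node_input = aggregate_node_input(edge_outputs, beta)
```

Because the $\beta_{i,j}$ are shared across all channels of an edge, they do not depend on which channels happened to be sampled in a given step, which is one way to read the caption's remark about minimizing the uncertainty incurred by sampling.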