ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun

TL;DR

This work argues that FLOPs alone are insufficient to gauge CNN efficiency and that speed must be measured on the target hardware, accounting for memory access and parallelism. It derives four practical guidelines for platform-aware design and introduces ShuffleNet V2, a two-branch channel-split architecture that preserves equal channel widths and reduces overhead from fragmentation and element-wise operations. Extensive GPU and ARM experiments show ShuffleNet V2 achieves superior speed-accuracy tradeoffs across mobile-scale FLOPs and generalizes to larger models and detection tasks, outperforming ShuffleNet v1, MobileNet v2, and others. The results advocate for speed-centric design principles and demonstrate tangible gains in real-world deployment, including COCO object detection and scalable deep architectures.
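As a rough illustration of the two-branch channel-split idea, the sketch below tracks only channel indices rather than real feature maps. The function names, the `transform` placeholder standing in for the convolutional branch, and the two-group shuffle applied after concatenation are illustrative assumptions; this summary does not spell out the unit in detail.

```python
def channel_split(channels):
    """Split a list of channels into two equal-width halves."""
    half = len(channels) // 2
    return channels[:half], channels[half:]


def channel_shuffle(channels, groups=2):
    """Interleave channels across groups (view as groups x per_group, transpose, flatten)."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def unit(channels, transform=lambda branch: branch):
    """Hypothetical two-branch unit: one identity branch, one transformed branch."""
    left, right = channel_split(channels)
    right = transform(right)   # placeholder for the convolutional branch
    merged = left + right      # concatenation keeps the total channel count unchanged
    return channel_shuffle(merged, groups=2)


out = unit(list(range(8)))
print(out)
```

Because the two halves are concatenated rather than added, the unit keeps input and output widths equal without an element-wise addition, which is the kind of overhead the summary says the design tries to reduce.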

Abstract

Currently, neural network architecture design is mostly guided by the indirect metric of computation complexity, i.e., FLOPs. However, the direct metric, e.g., speed, also depends on other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. Based on a series of controlled experiments, this work derives several practical guidelines for efficient network design. Accordingly, a new architecture is presented, called ShuffleNet V2. Comprehensive ablation experiments verify that our model is the state of the art in terms of the speed-accuracy tradeoff.
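To make the gap between FLOPs and memory access cost concrete, here is a minimal sketch using a simplified cost model for a 1x1 convolution. The model is an assumption for illustration: it counts memory access as reading the input feature map, writing the output feature map, and reading the weights, and it ignores caching and other platform effects. The function name and the example shapes are hypothetical.

```python
def conv1x1_costs(h, w, c_in, c_out):
    """Return (flops, mac) for a 1x1 convolution under a simplified cost model."""
    flops = h * w * c_in * c_out                   # multiply-adds
    mac = h * w * (c_in + c_out) + c_in * c_out    # input + output feature maps + weights
    return flops, mac


# Two layers with identical FLOPs but different channel ratios.
balanced = conv1x1_costs(56, 56, 128, 128)
skewed = conv1x1_costs(56, 56, 32, 512)

print("balanced:", balanced)
print("skewed:  ", skewed)
```

Under this model both layers perform the same number of multiply-adds, yet the skewed layer touches roughly twice as much memory, which is one way a FLOPs count can fail to predict measured speed.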

Paper Structure

This paper contains 11 sections, 2 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Measurement of accuracy (ImageNet classification on the validation set), speed, and FLOPs of four network architectures on two hardware platforms at four different levels of computation complexity (see text for details). (a, c) GPU results, batch size = 8. (b, d) ARM results, batch size = 1. The best-performing algorithm, our proposed ShuffleNet v2, is in the top-right region in all cases.
  • Figure 1: Building blocks used in experiments for guideline 3. (a) 1-fragment. (b) 2-fragment-series. (c) 4-fragment-series. (d) 2-fragment-parallel. (e) 4-fragment-parallel.
  • Figure 2: Run time decomposition on two representative state-of-the-art network architectures, ShuffleNet v1 [zhang2017shufflenet] (1$\times$, $g=3$) and MobileNet v2 [sandler2018inverted] (1$\times$).
  • Figure 2: Building blocks of ShuffleNet v2 with SE/residual. (a) ShuffleNet v2 with residual. (b) ShuffleNet v2 with SE. (c) ShuffleNet v2 with SE and residual.
  • Figure 3: Building blocks of ShuffleNet v1 [zhang2017shufflenet] and this work. (a) the basic ShuffleNet unit; (b) the ShuffleNet unit for spatial down sampling ($2\times$); (c) our basic unit; (d) our unit for spatial down sampling ($2\times$). DWConv: depthwise convolution. GConv: group convolution.
  • ...and 1 more figures