ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun
TL;DR
This work argues that FLOPs alone are insufficient to gauge CNN efficiency and that speed must be measured on the target hardware, accounting for memory access cost and degree of parallelism. It derives four practical guidelines for platform-aware design and introduces ShuffleNet V2, a two-branch channel-split architecture that keeps input and output channel widths equal and reduces overhead from network fragmentation and element-wise operations. Experiments on GPU and ARM show that ShuffleNet V2 achieves better speed-accuracy tradeoffs than ShuffleNet v1, MobileNet v2, and other baselines across mobile-scale FLOP budgets, and that it generalizes to larger models and to COCO object detection. The results support designing for measured speed rather than for FLOPs alone.
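The two-branch channel-split unit can be pictured with a minimal sketch that tracks channel labels instead of real tensors. The names `channel_split`, `channel_shuffle`, and `shufflenet_v2_unit` are hypothetical helpers chosen for illustration, and `branch` is a placeholder for the convolutional branch, assumed only to preserve the channel count.

```python
def channel_split(channels):
    """Split a list of channels into two halves of equal width."""
    half = len(channels) // 2
    return channels[:half], channels[half:]


def channel_shuffle(channels, groups=2):
    """Interleave channels across groups so information mixes between branches."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def shufflenet_v2_unit(channels, branch):
    """Sketch of one unit: split, transform one half, concatenate, shuffle.

    `branch` is a placeholder (assumption) for the convolutional branch;
    it must return as many channels as it receives.
    """
    left, right = channel_split(channels)   # left half is passed through unchanged
    right = branch(right)                   # right half is transformed
    if len(right) != len(left):
        raise ValueError("branch must preserve the channel count")
    return channel_shuffle(left + right, groups=2)


if __name__ == "__main__":
    labels = [f"c{i}" for i in range(8)]
    out = shufflenet_v2_unit(labels, lambda xs: [f"f({x})" for x in xs])
    print(out)
```

Because one half of the channels bypasses the branch entirely, the unit needs no element-wise addition, and the shuffle at the end lets the untouched half be processed in the next unit.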
Abstract
Currently, neural network architecture design is mostly guided by the \emph{indirect} metric of computation complexity, i.e., FLOPs. However, the \emph{direct} metric, e.g., speed, also depends on other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, rather than considering FLOPs alone. Based on a series of controlled experiments, this work derives several practical \emph{guidelines} for efficient network design. Accordingly, a new architecture is presented, called \emph{ShuffleNet V2}. Comprehensive ablation experiments verify that our model is state of the art in terms of the speed-accuracy tradeoff.
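To make the gap between the indirect and direct metrics concrete, the sketch below compares 1x1 convolutions that have identical FLOPs but different input/output channel ratios. The cost model is a simplifying assumption: memory access cost is counted as input reads plus output writes plus weight reads, ignoring caches and other platform effects. The function names and the example shapes are hypothetical, not taken from the paper's experiments.

```python
def conv1x1_flops(h, w, c_in, c_out):
    """Multiply-accumulate count of a 1x1 convolution on an h x w feature map."""
    return h * w * c_in * c_out


def conv1x1_mac(h, w, c_in, c_out):
    """Simplified memory access cost (assumption): input + output + weights."""
    return h * w * (c_in + c_out) + c_in * c_out


if __name__ == "__main__":
    h = w = 56  # illustrative feature map size
    # All three layers have c_in * c_out = 16384, hence equal FLOPs.
    for c_in, c_out in [(128, 128), (64, 256), (32, 512)]:
        flops = conv1x1_flops(h, w, c_in, c_out)
        mac = conv1x1_mac(h, w, c_in, c_out)
        print(f"c_in={c_in:4d} c_out={c_out:4d} FLOPs={flops} MAC={mac}")
```

Under this model the balanced layer has the smallest memory access cost even though all three layers report the same FLOPs, which is one reason a FLOPs-only comparison can mislead and why measurement on the target platform is still needed.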
