Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
Yihui He, Jianing Qian, Jianren Wang, Cindy X. Le, Congrui Hetang, Qi Lyu, Wenping Wang, Tianwei Yue
TL;DR
The paper addresses the high inference latency of very deep CNNs by introducing depth-wise decomposition, an SVD/GSVD-based method that converts regular convolutions into depth-wise separable convolutions without significant accuracy loss. It describes a data-driven formulation, per-channel and inter-channel compensation via GSVD, and an extension to multi-layer networks, followed by fine-tuning. Experiments on ImageNet with ShuffleNet V2 and Xception show that the approach outperforms channel decomposition and improves Top-1 accuracy by approximately 2% at similar compute. The results suggest the method is applicable to other modern lightweight architectures in resource-constrained settings.
Abstract
Very deep convolutional neural networks (CNNs) have been firmly established as the primary methods for many computer vision tasks. However, most state-of-the-art CNNs are large, which results in high inference latency. Recently, depth-wise separable convolution has been proposed for image recognition tasks on computationally limited platforms such as robots and self-driving cars. Though it is much faster than regular convolution, it sacrifices accuracy. In this paper, we propose a novel decomposition approach based on SVD, namely depth-wise decomposition, for expanding regular convolutions into depth-wise separable convolutions while maintaining high accuracy. We show that our approach can be further generalized to the multi-channel and multi-layer cases, based on Generalized Singular Value Decomposition (GSVD) [59]. We conduct thorough experiments with the latest ShuffleNet V2 model [47] on both a randomly synthesized dataset and a large-scale image recognition dataset, ImageNet [10]. Our approach outperforms channel decomposition [73] on both datasets. More importantly, our approach improves the Top-1 accuracy of ShuffleNet V2 by ~2%.
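To make the core idea concrete, the following is a minimal sketch of an SVD-based split of a regular convolution into a depth-wise kernel and a point-wise (1x1) kernel. It is an illustration under stated assumptions, not the paper's implementation: it works directly on the weights (no data-driven term), applies a rank-1 truncation to each input-channel slice, and omits the GSVD-based inter-channel compensation and fine-tuning. The function names `depthwise_decompose` and `reconstruct` and the weight layout `(n_out, c_in, kh, kw)` are hypothetical choices made for this example.

```python
import numpy as np


def depthwise_decompose(weight):
    """Rank-1 SVD split of a regular convolution weight (illustrative sketch).

    weight: array of shape (n_out, c_in, kh, kw) (assumed layout).
    Returns a depth-wise kernel of shape (c_in, kh, kw) and a point-wise
    kernel of shape (n_out, c_in).
    """
    n_out, c_in, kh, kw = weight.shape
    depthwise = np.empty((c_in, kh, kw))
    pointwise = np.empty((n_out, c_in))
    for i in range(c_in):
        # All filters that read input channel i, flattened to (n_out, kh*kw).
        slice_i = weight[:, i].reshape(n_out, kh * kw)
        u, s, vt = np.linalg.svd(slice_i, full_matrices=False)
        # The leading right singular vector becomes the spatial (depth-wise)
        # filter; the scaled leading left singular vector becomes the
        # per-output mixing weights (point-wise filter).
        depthwise[i] = vt[0].reshape(kh, kw)
        pointwise[:, i] = s[0] * u[:, 0]
    return depthwise, pointwise


def reconstruct(depthwise, pointwise):
    """Regular-convolution weight implied by the separable pair."""
    return pointwise[:, :, None, None] * depthwise[None, :, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((16, 8, 3, 3))
    dw, pw = depthwise_decompose(w)
    rel_err = np.linalg.norm(reconstruct(dw, pw) - w) / np.linalg.norm(w)
    print(f"relative weight error of the rank-1 split: {rel_err:.3f}")
    # Multiply-adds per output position: regular vs. depth-wise + point-wise.
    n_out, c_in, kh, kw = w.shape
    print("regular:", n_out * c_in * kh * kw)
    print("separable:", c_in * kh * kw + n_out * c_in)
```

Because the output of a regular convolution is a sum over input channels of per-channel filter responses, replacing each channel's slice by an outer product lets the spatial filtering run once per input channel (depth-wise) before a 1x1 convolution mixes channels. The split is exact only when every slice is already rank 1; for general weights the truncation leaves a residual error, which is the kind of gap that compensation and fine-tuning are meant to reduce.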
