Efficient and Accurate Approximations of Nonlinear Convolutional Networks
Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun
TL;DR
The paper tackles the practical problem of accelerating test-time CNN inference by explicitly accounting for nonlinear activations in the approximation process. It introduces a nonlinear-aware, low-rank framework that approximates layer responses, uses an asymmetric reconstruction strategy across layers to limit error accumulation, and applies a rank-selection scheme to choose per-layer complexity under a target speedup. Key contributions include a GSVD/PCA-based solution for the nonlinear case and a demonstrated whole-model speedup of about 4x on ImageNet with only a modest increase in top-5 error (≈0.9%). The method compares favorably to prior spatial-decomposition methods, and the accelerated model runs about as fast as AlexNet while being 4.7% more accurate. The approach offers a practical path to deploying high-accuracy CNNs in resource-constrained environments, with potential as a regularizer for filters and applicability to other nonlinearities.
Abstract
This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint that helps to reduce the complexity of the filters. We develop an effective solution to this constrained nonlinear optimization problem. An algorithm is also presented for reducing the accumulated error when multiple layers are approximated. A whole-model speedup ratio of 4x is demonstrated on a large network trained for ImageNet, while the top-5 error rate increases by only 0.9%. Our accelerated model is comparably fast to AlexNet, but is 4.7% more accurate.
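To make the low-rank idea concrete, here is a minimal NumPy sketch of the simpler linear-response case only: it projects a layer's responses onto their top principal directions, which splits one layer into a smaller filter bank followed by a cheap linear map. It is not the paper's nonlinear solver and does not include the asymmetric multi-layer reconstruction. The function names, the matrix shapes, and the per-position cost model (d filters of size k x k x c replaced by d' such filters plus d filters of size 1 x 1 x d') are assumptions chosen for illustration.

```python
import numpy as np

def low_rank_response_approx(W, X, rank):
    """Illustrative linear-case sketch (hypothetical helper, not the paper's code).

    W: (d, n) filter matrix, one flattened filter per row.
    X: (n, m) sampled input patches, one per column.
    Returns (W_small, P, b) such that W @ X ~= P @ (W_small @ X) + b.
    """
    Y = W @ X                                # (d, m) linear responses
    mean = Y.mean(axis=1, keepdims=True)     # (d, 1) mean response
    Yc = Y - mean                            # centered responses
    # Eigenvectors of the response covariance; eigh returns ascending order.
    _, eigvecs = np.linalg.eigh(Yc @ Yc.T)
    U = eigvecs[:, ::-1][:, :rank]           # (d, rank) top principal directions
    W_small = U.T @ W                        # (rank, n) reduced filter bank
    P = U                                    # (d, rank) acts like 1x1 filters
    b = mean - U @ (U.T @ mean)              # (d, 1) bias absorbing the mean
    return W_small, P, b

def mults_per_position(d, k, c, d_prime):
    """Multiplications per output position before and after the split
    (assumed cost model: d*k*k*c versus d_prime*k*k*c + d*d_prime)."""
    original = d * k * k * c
    approximated = d_prime * k * k * c + d * d_prime
    return original, approximated

rng = np.random.default_rng(0)
d, n, m = 8, 20, 200
W = rng.standard_normal((d, n))
X = rng.standard_normal((n, m))
Y = W @ X

# Reconstruction error shrinks (or stays equal) as the kept rank grows.
for r in (2, 4, 8):
    W_small, P, b = low_rank_response_approx(W, X, r)
    err = np.linalg.norm(Y - (P @ (W_small @ X) + b))
    print(f"rank={r}: reconstruction error {err:.4f}")

orig, approx = mults_per_position(d=256, k=3, c=128, d_prime=64)
print(f"cost ratio: {orig / approx:.2f}")
```

The smaller the retained rank, the cheaper the layer and the larger the reconstruction error, which is the trade-off that a per-layer rank-selection scheme has to balance against a target speedup.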
