
Efficient and Accurate Approximations of Nonlinear Convolutional Networks

Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun

TL;DR

The paper tackles the practical problem of accelerating test-time CNN inference by explicitly accounting for nonlinear activations in the approximation process. It introduces a nonlinearity-aware, low-rank framework that decomposes each layer and approximates its responses. It also uses an asymmetric reconstruction strategy across layers to limit error accumulation, and applies a rank-selection scheme that chooses per-layer ranks under a target speedup. Key contributions include a GSVD/PCA-based solution for the nonlinear case, a whole-model speedup of about 4x on ImageNet with only a 0.9% increase in top-5 error, and favorable comparisons to prior spatial-decomposition methods. The accelerated model also runs about as fast as AlexNet while being 4.7% more accurate. The approach offers a practical path to deploying high-accuracy CNNs in resource-constrained environments. The low-rank constraint may also act as a regularizer on filters, and the method could extend to other nonlinearities.
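The rank-selection step can be illustrated with a small greedy sketch. It relies on several assumptions that are not details given above. Each layer's retained PCA energy is taken to be the sum of its kept eigenvalues, and the quantity to preserve is the product of these energies over all layers. A layer that keeps d'_l of its d_l filters is assumed to cost (d'_l/d_l)·C_l, which ignores the small 1x1 term. At each step the sketch drops the single eigenvalue whose removal loses the least relative energy per unit of complexity saved. The name `select_ranks` is hypothetical, and the loop illustrates the idea rather than reproducing the paper's exact procedure.

```python
from typing import List


def select_ranks(eigs: List[List[float]], costs: List[float], target: float) -> List[int]:
    """Greedy per-layer rank selection (illustrative sketch, assumed objective).

    eigs[l]  : PCA eigenvalues of layer l's responses, sorted in descending order
    costs[l] : original complexity C_l of layer l
    target   : allowed total complexity; layer l is assumed to cost (d'_l / d_l) * C_l
    """
    ranks = [len(e) for e in eigs]        # start every layer at full rank d_l
    energy = [sum(e) for e in eigs]       # retained energy per layer
    total = float(sum(costs))
    while total > target:
        best, best_score = None, float("inf")
        for l, e in enumerate(eigs):
            r = ranks[l]
            if r <= 1:                    # keep at least one component per layer
                continue
            rel_loss = e[r - 1] / energy[l]   # relative drop of this layer's energy factor
            saved = costs[l] / len(e)         # complexity saved by removing one filter
            score = rel_loss / saved
            if score < best_score:
                best, best_score = l, score
        if best is None:                  # nothing left to remove
            break
        r = ranks[best]
        energy[best] -= eigs[best][r - 1]
        ranks[best] = r - 1
        total -= costs[best] / len(eigs[best])
    return ranks


# Toy example with made-up eigenvalues: two layers, total cost 14, target 10.
print(select_ranks([[4.0, 2.0, 1.0, 1.0], [5.0, 3.0, 2.0]], [8.0, 6.0], 10.0))  # [2, 3]
```

The relative-loss criterion follows from the product objective. Removing one eigenvalue from layer l scales the product by (1 - rel_loss) and leaves the other factors unchanged.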

Abstract

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We minimize the reconstruction error of the nonlinear responses, subject to a low-rank constraint which helps to reduce the complexity of filters. We develop an effective solution to this constrained nonlinear optimization problem. An algorithm is also presented for reducing the accumulated error when multiple layers are approximated. A whole-model speedup ratio of 4x is demonstrated on a large network trained for ImageNet, while the top-5 error rate increases by only 0.9%. Our accelerated model runs about as fast as "AlexNet" but is 4.7% more accurate.
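The constrained problem in the abstract builds on a linear, PCA-based low-rank approximation of a layer's responses. The sketch below shows only that linear step and rests on several assumptions. Responses are collected as `W @ X` from sampled input patches. Each filter is flattened into one row of `W`. ReLU is the nonlinearity, and its reconstruction error is only measured here, not minimized. The paper's GSVD-based nonlinear solver and the asymmetric multi-layer variant are not reproduced. The names `pca_low_rank` and `nonlinear_error` are hypothetical.

```python
import numpy as np


def pca_low_rank(W, X, d_prime):
    """Approximate a linear layer's responses with a rank-d' PCA projection.

    W : (d, m) array, one flattened k*k*c filter per row (assumed layout)
    X : (m, n) array, one sampled input patch per column
    Returns (P, W_reduced, b, energy) with W @ x ~= P @ (W_reduced @ x) + b.
    """
    Y = W @ X                                   # (d, n) linear responses
    y_mean = Y.mean(axis=1, keepdims=True)      # (d, 1) mean response
    Yc = Y - y_mean                             # centered responses
    cov = Yc @ Yc.T / Y.shape[1]                # (d, d) response covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    P = eigvecs[:, order[:d_prime]]             # (d, d') top principal directions
    W_reduced = P.T @ W                         # (d', m): the d' cheaper filters
    b = (y_mean - P @ (P.T @ y_mean)).ravel()   # (d,) bias of the d'->d 1x1 layer
    energy = eigvals[order[:d_prime]].sum() / eigvals.sum()  # retained PCA energy
    return P, W_reduced, b, float(energy)


def relu(z):
    return np.maximum(z, 0.0)


def nonlinear_error(W, X, P, W_reduced, b):
    """Mean squared error between ReLU responses of the original and approximated layer."""
    Y = relu(W @ X)
    Y_hat = relu(P @ (W_reduced @ X) + b[:, None])
    return float(np.mean((Y - Y_hat) ** 2))


# Synthetic check: filters of rank 3 (d = 8) are recovered exactly with d' = 3.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 20))
X = rng.standard_normal((20, 500))
P, Wr, b, energy = pca_low_rank(W, X, d_prime=3)
print(P.shape, Wr.shape, energy, nonlinear_error(W, X, P, Wr, b))
```

Here `W_reduced` plays the role of the d' filters and `P` plus `b` form the d' to d 1x1 layer. With real responses the energy is below 1, and the ReLU error is the quantity a nonlinear solver would then try to reduce.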

Paper Structure

This paper contains 13 sections, 13 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Illustration of the approximation. (a) An original layer with complexity $O(dk^2c)$. (b) An approximated layer with complexity reduced to $O(d'k^2c)+O(dd')$.
  • Figure 2: PCA accumulative energy of the responses in each layer, presented as the sum of the largest $d'$ eigenvalues (relative to the total energy when $d'=d$). Here the filter number $d$ is 96 for Conv1, 256 for Conv2, and 512 for Conv3-7 (detailed in the architecture table).
  • Figure 3: PCA accumulative energy and the accuracy rates (top-5). Here the accuracy is evaluated using the linear solution (the nonlinear solution has a similar trend). Each layer is evaluated independently, with other layers not approximated. The accuracy is shown as the difference from the model without approximation.
  • Figure 4: Linear vs. Nonlinear: single-layer performance of accelerating Conv1 to Conv7. The speedup ratios are computed from the theoretical complexity, but are nearly the same as the actual speedup ratios in our CPU/GPU implementation. The error rates are top-5 single-view, and are shown as the increase of error rates compared with no approximation (smaller is better).
  • Figure 5: Symmetric vs. Asymmetric: the cases of 2-layer and 3-layer approximation. The speedup is computed from the complexity of the approximated layers. (a) Approximation of Conv6 & 7. (b) Approximation of Conv2, 3 & 4. (c) Approximation of Conv5, 6 & 7.
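The complexity counts in Figure 1 translate directly into a theoretical speedup estimate. The sketch below assumes per-location multiply-add counts of d·k²·c for the original layer and d'·k²·c + d·d' for the approximated one. The function names and the example sizes (d = c = 512, k = 3, d' = 128) are illustrative assumptions, not values quoted from the paper's tables.

```python
def original_cost(d: int, k: int, c: int) -> int:
    """Multiply-adds per output position of a k x k conv: d filters over c input channels."""
    return d * k * k * c


def approximated_cost(d: int, d_prime: int, k: int, c: int) -> int:
    """Decomposed layer: d' k x k filters, then a 1x1 layer mapping d' channels back to d."""
    return d_prime * k * k * c + d * d_prime


def speedup(d: int, d_prime: int, k: int, c: int) -> float:
    return original_cost(d, k, c) / approximated_cost(d, d_prime, k, c)


# Illustrative (assumed) sizes: a 3x3 layer with 512 input and 512 output channels, rank 128.
print(original_cost(512, 3, 512))            # 2359296
print(approximated_cost(512, 128, 3, 512))   # 655360
print(round(speedup(512, 128, 3, 512), 2))   # 3.6
```

The extra d·d' term explains why the estimate (3.6x) falls short of the naive d/d' = 4x. Its relative weight is d/(k²c), so it matters less for layers with large k²c.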