Convolutional neural networks with low-rank regularization

Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E

TL;DR

The paper addresses the high storage and computation cost of large CNNs by applying a low-rank tensor decomposition to the convolution kernels. The decomposition is computed in closed form from an SVD, which gives an exact global optimizer, and can be followed by discriminative fine-tuning. The paper also introduces a way to train low-rank constrained CNNs from scratch, using batch normalization to stabilize training so that very deep architectures remain trainable. Empirically, the method achieves substantial speedups on CIFAR-10 and ILSVRC12 for AlexNet, NIN, VGG, and GoogLeNet, with accuracy that is comparable to, and in some cases better than, the unconstrained models. The results suggest that low-rank decompositions are a practical tool for compressing and accelerating large CNNs, particularly in mobile and other resource-constrained settings.

Abstract

Large CNNs have delivered impressive performance in various computer vision applications. However, their storage and computation requirements make them problematic to deploy on mobile devices. Recently, tensor decompositions have been used for speeding up CNNs. In this paper, we further develop the tensor decomposition technique. We propose a new algorithm for computing the low-rank tensor decomposition that removes the redundancy in the convolution kernels. The algorithm finds the exact global optimizer of the decomposition and is more effective than iterative methods. Based on the decomposition, we further propose a new method for training low-rank constrained CNNs from scratch. Interestingly, while achieving a significant speedup, the low-rank constrained CNNs sometimes deliver significantly better performance than their non-constrained counterparts. On the CIFAR-10 dataset, the proposed low-rank NIN model achieves $91.31\%$ accuracy (without data augmentation), which also improves upon the state-of-the-art result. We evaluated the proposed method on the CIFAR-10 and ILSVRC12 datasets for a variety of modern CNNs, including AlexNet, NIN, VGG, and GoogLeNet, with success. For example, the forward time of VGG-16 is reduced by half while the performance remains comparable. This empirical success suggests that low-rank tensor decompositions can be a very useful tool for speeding up large CNNs.

Paper Structure

This paper contains 10 sections, 1 theorem, 17 equations, 4 figures, 5 tables.

Key Result

Theorem 1

Define the following bijection that maps a tensor to a matrix, $\mathcal{T}:\mathbb{R}^{C\times d\times d\times N} \mapsto \mathbb{R}^{Cd\times dN}$: tensor element $(i_1,i_2,i_3,i_4)$ maps to $(j_1,j_2)$, where

$$j_1 = (i_1-1)d + i_2, \qquad j_2 = (i_4-1)d + i_3.$$

Define $W := \mathcal{T}[\mathcal{W}]$. Let $W=UDQ^T$ be the Singular Value Decomposition (SVD) of $W$. Let

$$\hat{\mathcal{V}}_k^c(j) = U_{(c-1)d+j,\,k}\sqrt{D_{k,k}}, \qquad \hat{\mathcal{H}}_n^k(j) = Q_{(n-1)d+j,\,k}\sqrt{D_{k,k}},$$

then $(\hat{\mathcal{H}},\hat{\mathcal{V}})$ is a solution to $(P1)$.
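
The construction in Theorem 1 can be sketched in a few lines of NumPy. This is an illustration, not the authors' implementation, and it rests on several assumptions: the kernel is stored as an array of shape `(C, d, d, N)`; $\mathcal{T}$ groups indices $(i_1,i_2)$ into rows and $(i_4,i_3)$ into columns; each factor absorbs the square root of the corresponding singular value; and $(P1)$ is a Frobenius-norm approximation problem with $K$ terms. The function names `low_rank_decompose` and `reconstruct` are hypothetical.

```python
import numpy as np


def low_rank_decompose(W, K):
    """Split a (C, d, d, N) kernel into K vertical and K horizontal 1-D filters.

    Returns H with shape (N, K, d) and V with shape (K, C, d).
    Assumed index convention (0-based): row = c*d + i2, column = n*d + i3.
    """
    C, d, d2, N = W.shape
    assert d == d2, "expected square d x d kernels"

    # Apply the bijection T: move N ahead of the second spatial axis, then flatten.
    M = W.transpose(0, 1, 3, 2).reshape(C * d, N * d)

    # Thin SVD; singular values come back sorted in descending order.
    U, s, Qt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(s[:K])

    # V[k, c, j] = U[c*d + j, k] * sqrt(s[k])
    V = (U[:, :K] * root).reshape(C, d, K).transpose(2, 0, 1)
    # H[n, k, j] = Q[n*d + j, k] * sqrt(s[k])
    H = (Qt[:K, :].T * root).reshape(N, d, K).transpose(0, 2, 1)
    return H, V


def reconstruct(H, V):
    """Rebuild the rank-K kernel: W_hat[c, i, j, n] = sum_k V[k, c, i] * H[n, k, j]."""
    return np.einsum("kci,nkj->cijn", V, H)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 5, 5, 8))  # C=3, d=5, N=8 (arbitrary sizes)
    H, V = low_rank_decompose(W, K=4)
    print("relative error:", np.linalg.norm(W - reconstruct(H, V)) / np.linalg.norm(W))
```

With $K$ equal to the full rank of $W$ the reconstruction is exact. For smaller $K$ the Frobenius error equals the root of the sum of the squared discarded singular values, which is the optimum for a rank-$K$ approximation of the matrix $W$ by the Eckart-Young theorem.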

Figures (4)

  • Figure 1: (a) Filters in the first layer of AlexNet. (b) Low-rank approximation using the proposed scheme with $K=8$, corresponding to a $3.67\times$ speedup for this layer. Note that the low-rank approximation captures most of the information, including the directionality of the original filters. (c) Low-rank filters trained from scratch with $K=8$.
  • Figure 2: The proposed parametrization for low-rank regularization. Left: the original convolutional layer. Right: the low-rank constrained convolutional layer with rank $K$.
  • Figure 3: Performance with respect to the theoretical layer speedup. Only the conv1-conv5 layers of AlexNet are shown.
  • Figure 4: Performance with respect to the fine-tuning epoch when using the proposed closed-form solution as initialization.
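
The $3.67\times$ figure quoted in the caption of Figure 1 can be reproduced with a simple operation count. The sketch below relies on a cost model that this summary does not state, so treat it as an assumption. A full $d\times d$ layer is taken to cost $d^2CN$ multiply-adds per output position. The factored layer applies $K$ vertical $d\times 1$ filters and then $N$ horizontal $1\times d$ filters, with each pass applying the stride $s$ along its own direction only, so the first pass is evaluated at $s$ times as many positions as the final output. The layer sizes used ($d=11$, $C=3$, $N=96$, stride 4) are those of the standard AlexNet first layer; the function name `theoretical_speedup` is hypothetical.

```python
def theoretical_speedup(d, C, N, K, stride=1):
    """Ratio of multiply-adds, full d x d convolution vs. rank-K factored version.

    Costs are counted per final output position (assumed cost model).
    """
    # Full convolution: N filters, each covering d * d * C inputs.
    original = d * d * C * N

    # Vertical pass: K filters of size d x 1 over C channels, evaluated at
    # `stride` times more positions because only the vertical stride has
    # been applied so far.
    vertical = stride * d * C * K
    # Horizontal pass: N filters of size 1 x d over K channels.
    horizontal = d * K * N

    return original / (vertical + horizontal)


if __name__ == "__main__":
    # AlexNet conv1 with K = 8, as in the Figure 1 caption.
    print(round(theoretical_speedup(d=11, C=3, N=96, K=8, stride=4), 2))
```

Under this model the ratio simplifies to $dCN / (K(sC + N))$, which is $3168/864 \approx 3.67$ for the values above. With stride 1 the same formula gives $4.0$.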

Theorems & Definitions (2)

  • Theorem 1
  • proof