Compressing Neural Networks Using Tensor Networks with Exponentially Fewer Variational Parameters
Yong Qing, Ke Li, Peng-Fei Zhou, Shi-Ju Ran
TL;DR
The paper addresses the challenge of massive variational parameters in neural networks by introducing automatically differentiable tensor networks (ADTN) that encode layer parameters into deep tensor-network contractions, reducing the parameter count from an exponential scale to a near-linear scale while preserving or enhancing accuracy on standard benchmarks. The method combines brick-wall ADTN constructions with a two-stage optimization (Euclidean pre-training followed by task-specific fine-tuning). On VGG-16, two linear layers with ~$10^7$ parameters are compressed to two ADTNs with 424 parameters, while CIFAR-10 testing accuracy rises from $90.17\%$ to $91.74\%$. Key insights include the importance of backward compression order, the beneficial role of deeper ADTN layers for high compression, and the superior representational power of deep TNs over shallow tensor decompositions. The approach offers a practical, scalable framework for compressing neural networks and suggests deep tensor networks as a potent basis for representing variational parameters in modern architectures.
Abstract
A neural network (NN) designed for a challenging machine learning task is in general a highly nonlinear mapping that contains a massive number of variational parameters. The high complexity of an NN, if unbounded or unconstrained, can unpredictably cause severe issues, including overfitting, loss of generalization power, and prohibitive hardware cost. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NNs, regardless of their specific types (linear, convolutional, etc.), by encoding them into deep automatically differentiable tensor networks (ADTNs) that contain exponentially fewer free parameters. The superior compression performance of our scheme is demonstrated on several widely recognized NNs (FC-2, LeNet-5, AlexNet, ZFNet, and VGG-16) and datasets (MNIST, CIFAR-10, and CIFAR-100). For instance, we compress two linear layers in VGG-16 with approximately $10^{7}$ parameters to two ADTNs with just 424 parameters, improving the testing accuracy on CIFAR-10 from $90.17\%$ to $91.74\%$. We argue that the deep structure of the ADTN is an essential reason for its remarkable compression performance, compared to existing compression schemes that are mainly based on tensor decompositions/factorizations and shallow tensor networks. Our work suggests the deep TN as an exceptionally efficient mathematical structure for representing the variational parameters of NNs, one that exhibits superior compressibility over the commonly used matrices and multi-way arrays.
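To make the idea of encoding a weight tensor into a deep tensor network concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical layout: a brick-wall arrangement of 4-index tensors with bond dimension 2, applied to a fixed product state of Q two-dimensional indices, so that the full contraction yields a tensor with $2^Q$ entries that can be reshaped into a weight matrix. The function names (`init_gates`, `brickwall_to_tensor`, `count_params`), the product-state starting point, and the near-identity initialization are all illustrative assumptions. In practice the contraction would be written in an automatic-differentiation framework so that the small tensors can be trained by gradient descent.

```python
import numpy as np


def init_gates(Q, n_layers, rng):
    """Create a brick-wall stack of 4-index tensors (assumed layout).

    Even layers act on index pairs (0,1), (2,3), ...; odd layers act on
    (1,2), (3,4), ...  Each tensor has shape (2, 2, 2, 2), i.e. 16 parameters.
    """
    gates = []
    for layer in range(n_layers):
        offset = layer % 2
        n_gates = (Q - offset) // 2
        gates.append([
            # near-identity initialization (an illustrative choice)
            np.eye(4).reshape(2, 2, 2, 2)
            + 0.1 * rng.standard_normal((2, 2, 2, 2))
            for _ in range(n_gates)
        ])
    return gates


def brickwall_to_tensor(gates, Q):
    """Contract the brick-wall network into a tensor with Q indices of size 2."""
    # Start from a fixed product state: a single 1 at index (0, 0, ..., 0).
    state = np.zeros((2,) * Q)
    state[(0,) * Q] = 1.0
    for layer_idx, layer in enumerate(gates):
        offset = layer_idx % 2
        for k, g in enumerate(layer):
            i = offset + 2 * k  # this tensor acts on indices i and i + 1
            # g[a, b, c, d]: (c, d) are contracted with the state, (a, b) are new
            state = np.tensordot(g, state, axes=([2, 3], [i, i + 1]))
            # tensordot places the new indices first; move them back in place
            state = np.moveaxis(state, [0, 1], [i, i + 1])
    return state


def count_params(gates):
    """Total number of free parameters stored in the small tensors."""
    return sum(g.size for layer in gates for g in layer)


Q, n_layers = 10, 4
gates = init_gates(Q, n_layers, np.random.default_rng(0))

# The contracted tensor has 2**Q entries; reshape it into a 32 x 32 weight matrix.
W = brickwall_to_tensor(gates, Q).reshape(2 ** (Q // 2), 2 ** (Q // 2))

print("entries in W:", W.size)
print("parameters in the tensor network:", count_params(gates))
```

In this sketch the number of stored parameters grows linearly with Q and with the number of layers (16 per tensor), whereas the number of entries in the represented tensor grows as $2^Q$. The gap is modest at Q = 10 but widens rapidly for larger Q.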
