Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks
Alper Kalle, Theo Rudkiewicz, Mohamed-Oumar Ouerfelli, Mohamed Tamaazousti
TL;DR
The paper addresses the challenge of compressing convolutional neural networks by shifting from weight-space error minimization to a data-distribution–aware, function-space criterion. It introduces the Sigma-norm, a covariance-informed metric, and develops CP-ALS-Sigma and Tucker2-ALS-Sigma to optimize this norm for convolutional kernels, enabling competitive accuracy with little to no fine-tuning. A key finding is the transferability of the input covariance statistics across related datasets, supporting data-free or data-limited compression scenarios. Empirical evaluations on ResNet-18/50 and GoogLeNet across ImageNet, CIFAR-10/100, and FGVC datasets demonstrate improved reconstruction quality and higher accuracy compared with Frobenius-based baselines and tensor-deflation methods, with added benefits when combined with quantization. The proposed framework offers practical robustness to dataset changes and limited data access, suggesting a valuable path for real-world, privacy-preserving model compression of CNNs.
Abstract
Neural networks are widely used for image-related tasks but typically demand considerable computing power. Once a network has been trained, however, its memory and compute footprint can be reduced by compression. In this work, we focus on compression through tensorization and low-rank representations. Whereas classical approaches search for a low-rank approximation by minimizing an isotropic norm such as the Frobenius norm in weight space, we use data-informed norms that measure the error in function space. Concretely, we minimize the change in the layer's output distribution, which can be expressed as $\lVert (W - \widetilde{W}) \Sigma^{1/2}\rVert_F$, where $\Sigma^{1/2}$ is the square root of the covariance matrix of the layer's input and $W$, $\widetilde{W}$ are the original and compressed weights. We propose new alternating least squares algorithms for the two most common tensor decompositions (Tucker-2 and CPD) that directly optimize the new norm. Unlike conventional compression pipelines, which almost always require post-compression fine-tuning, our data-informed approach often achieves competitive accuracy without any fine-tuning. We further show that the same covariance-based norm can be transferred from one dataset to another with only a minor accuracy drop, enabling compression even when the original training dataset is unavailable. Experiments on several CNN architectures (ResNet-18/50 and GoogLeNet) and datasets (ImageNet, FGVC-Aircraft, CIFAR-10, and CIFAR-100) confirm the advantages of the proposed method.
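To make the norm $\lVert (W - \widetilde{W}) \Sigma^{1/2}\rVert_F$ concrete, the following minimal sketch evaluates it on a plain matrix (fully connected) layer with synthetic data. It is not the CP-ALS-Sigma or Tucker2-ALS-Sigma algorithm of the paper: as a stand-in it uses a whitened truncated SVD, the matrix analogue of a low-rank fit under this norm, and it assumes centered inputs and an invertible empirical covariance. All function and variable names (`sigma_norm`, `psd_sqrt`, `truncated_svd`, the layer sizes) are hypothetical and chosen for illustration only.

```python
import numpy as np


def sigma_norm(W, W_tilde, Sigma_sqrt):
    """Frobenius norm of (W - W_tilde) @ Sigma^{1/2}."""
    return np.linalg.norm((W - W_tilde) @ Sigma_sqrt, ord="fro")


def psd_sqrt(Sigma):
    """Symmetric square root of a positive semi-definite matrix."""
    evals, evecs = np.linalg.eigh(Sigma)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative eigenvalues
    return (evecs * np.sqrt(evals)) @ evecs.T


def truncated_svd(M, rank):
    """Best rank-`rank` approximation of M in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


rng = np.random.default_rng(0)
n_out, n_in, n_samples, rank = 8, 16, 5000, 4  # illustrative sizes (assumption)

# Synthetic anisotropic inputs, centered so that Sigma is a covariance matrix.
A = rng.normal(size=(n_in, n_in)) * np.linspace(0.1, 3.0, n_in)
X = rng.normal(size=(n_samples, n_in)) @ A.T
X -= X.mean(axis=0)
Sigma = X.T @ X / n_samples
Sigma_sqrt = psd_sqrt(Sigma)

W = rng.normal(size=(n_out, n_in))  # stand-in for trained layer weights

# Weight-space baseline: truncate W directly (isotropic Frobenius criterion).
W_frob = truncated_svd(W, rank)

# Function-space variant: truncate W @ Sigma^{1/2}, then map back to weights.
W_sigma = truncated_svd(W @ Sigma_sqrt, rank) @ np.linalg.pinv(Sigma_sqrt)

err_frob = sigma_norm(W, W_frob, Sigma_sqrt)
err_sigma = sigma_norm(W, W_sigma, Sigma_sqrt)

# The squared Sigma-norm equals the mean squared change of the layer output
# over the (centered) samples that were used to estimate Sigma.
empirical_sq_change = np.mean(np.sum((X @ (W - W_frob).T) ** 2, axis=1))

print("Sigma-norm error, Frobenius truncation:", err_frob)
print("Sigma-norm error, whitened truncation: ", err_sigma)
print("mean squared output change (Frobenius):", empirical_sq_change)
```

In this matrix setting the whitened truncation cannot have a larger Sigma-norm error than the plain truncation at the same rank, which is the intuition behind optimizing the data-informed norm directly. For convolutional kernels the decomposition acts on a higher-order tensor, so the closed-form SVD step no longer applies; that is the case the alternating least squares algorithms in this work are designed for.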
