
Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks

Alper Kalle, Theo Rudkiewicz, Mohamed-Oumar Ouerfelli, Mohamed Tamaazousti

TL;DR

The paper addresses the challenge of compressing convolutional neural networks by shifting from weight-space error minimization to a data-distribution–aware, function-space criterion. It introduces the Sigma-norm, a covariance-informed metric, and develops CP-ALS-Sigma and Tucker2-ALS-Sigma to optimize this norm for convolutional kernels, enabling competitive accuracy with little to no fine-tuning. A key finding is the transferability of the input covariance statistics across related datasets, supporting data-free or data-limited compression scenarios. Empirical evaluations on ResNet-18/50 and GoogLeNet across ImageNet, CIFAR-10/100, and FGVC datasets demonstrate improved reconstruction quality and higher accuracy compared with Frobenius-based baselines and tensor-deflation methods, with added benefits when combined with quantization. The proposed framework offers practical robustness to dataset changes and limited data access, suggesting a valuable path for real-world, privacy-preserving model compression of CNNs.
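To see why optimizing a covariance-weighted norm can beat a Frobenius fit, consider the simpler case of a single weight matrix $W$ reshaped from a layer. Minimizing $\lVert (W - \widetilde{W})\Sigma^{1/2}\rVert_F$ over rank-$r$ matrices then has a closed form when $\Sigma$ is positive definite: take the truncated SVD of $W\Sigma^{1/2}$ and multiply back by $\Sigma^{-1/2}$ (Eckart-Young applied in the transformed space). The sketch below is this matrix analogue only. It is not the paper's CP-ALS-Sigma or Tucker2-ALS-Sigma, which handle tensor decompositions with alternating least squares. The helper names (`sqrtm_psd`, `sigma_norm`, `sigma_lowrank`) and the synthetic data are hypothetical.

```python
import numpy as np


def sqrtm_psd(sigma):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(sigma)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def sigma_norm(delta, sigma):
    """||delta @ Sigma^{1/2}||_F for a weight difference delta."""
    return np.linalg.norm(delta @ sqrtm_psd(sigma), "fro")


def truncated_svd(M, r):
    """Best rank-r approximation of M in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def sigma_lowrank(W, sigma, r):
    """Rank-r W~ minimising ||(W - W~) Sigma^{1/2}||_F; assumes Sigma is positive definite."""
    S = sqrtm_psd(sigma)
    M_r = truncated_svd(W @ S, r)       # best rank-r approximation of W Sigma^{1/2}
    return np.linalg.solve(S, M_r.T).T  # M_r Sigma^{-1/2} (S is symmetric)


rng = np.random.default_rng(0)
d_out, d_in, n, r = 32, 64, 5000, 8
X = rng.standard_normal((n, d_in)) @ rng.standard_normal((d_in, d_in))  # correlated inputs
sigma = X.T @ X / n                     # input second-moment matrix
W = rng.standard_normal((d_out, d_in))

W_frob = truncated_svd(W, r)            # classical weight-space baseline
W_sig = sigma_lowrank(W, sigma, r)      # data-aware solution
print(sigma_norm(W - W_frob, sigma), sigma_norm(W - W_sig, sigma))
```

With $\Sigma = I$ the two solutions coincide. With anisotropic inputs, the data-aware solution spends its rank budget on directions that the inputs actually excite.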

Abstract

Neural networks are widely used for image-related tasks but typically demand considerable computing power. Once a network has been trained, however, its memory- and compute-footprint can be reduced by compression. In this work, we focus on compression through tensorization and low-rank representations. Whereas classical approaches search for a low-rank approximation by minimizing an isotropic norm such as the Frobenius norm in weight-space, we use data-informed norms that measure the error in function space. Concretely, we minimize the change in the layer's output distribution, which can be expressed as $\lVert (W - \widetilde{W}) \Sigma^{1/2}\rVert_F$, where $\Sigma^{1/2}$ is the square root of the covariance matrix of the layer's input and $W$, $\widetilde{W}$ are the original and compressed weights. We propose new alternating least squares algorithms for the two most common tensor decompositions (Tucker-2 and CPD) that directly optimize the new norm. Unlike conventional compression pipelines, which almost always require post-compression fine-tuning, our data-informed approach often achieves competitive accuracy without any fine-tuning. We further show that the same covariance-based norm can be transferred from one dataset to another with only a minor accuracy drop, enabling compression even when the original training dataset is unavailable. Experiments on several CNN architectures (ResNet-18/50 and GoogLeNet) and datasets (ImageNet, FGVC-Aircraft, CIFAR-10, and CIFAR-100) confirm the advantages of the proposed method.
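The claim that $\lVert (W - \widetilde{W})\Sigma^{1/2}\rVert_F$ measures the change in the layer's output follows from $\mathbb{E}\lVert (W - \widetilde{W})x\rVert^2 = \operatorname{tr}\big((W - \widetilde{W})\,\mathbb{E}[xx^\top]\,(W - \widetilde{W})^\top\big) = \lVert (W - \widetilde{W})\Sigma^{1/2}\rVert_F^2$. The minimal NumPy check below assumes $\Sigma$ is the uncentred second moment $\mathbb{E}[xx^\top]$, which equals the covariance for zero-mean inputs. This summary does not say how the paper centres or normalizes $\Sigma$. The layer sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sqrtm_psd(sigma):
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(sigma)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def sigma_norm(W, W_tilde, sigma):
    """||(W - W_tilde) Sigma^{1/2}||_F."""
    return np.linalg.norm((W - W_tilde) @ sqrtm_psd(sigma), "fro")


# Hypothetical layer with 8 outputs and 16 correlated (non-isotropic) inputs.
n, d_in, d_out = 10_000, 16, 8
A = rng.standard_normal((d_in, d_in))
X = rng.standard_normal((n, d_in)) @ A.T          # rows are input samples x
W = rng.standard_normal((d_out, d_in))
W_tilde = W + 0.1 * rng.standard_normal((d_out, d_in))  # stand-in for compressed weights

sigma = X.T @ X / n                               # empirical E[x x^T]
lhs = sigma_norm(W, W_tilde, sigma) ** 2
rhs = np.mean(np.sum(((W - W_tilde) @ X.T) ** 2, axis=0))  # mean of ||(W - W~) x||^2
print(lhs, rhs)
```

With $\Sigma = I$ the norm reduces to the ordinary Frobenius norm, which is the isotropic weight-space criterion the abstract contrasts against.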

Paper Structure

This paper contains 57 sections, 2 theorems, 37 equations, 7 figures, 16 tables, 2 algorithms.

Key Result

Proposition 1

Consider a distribution $\mathcal{D}$, a partial neural network $p$, and two convolutions $\mathbf{Conv}_{\mathcal{K}}$ and $\mathbf{Conv}_{\widetilde{\mathcal{K}}}$ parametrized by the kernel tensors $\mathcal{K}\in \mathbb{R}^{T \times S \times H \times W}$ and $\widetilde{\mathcal{K}}$. Under reaso… where $\left(\cdot\right)_{(1)}$ is the reshaping of the convolution kernel into $(T, S \times H \times W)$…
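The statement of Proposition 1 is truncated above. The reshaping $(\cdot)_{(1)}$ into $(T, S \times H \times W)$ is what lets the matrix norm apply to convolutions, because a convolution is a matrix product between $\mathcal{K}_{(1)}$ and the unfolded ("im2col") input patches. The sketch below checks two things: that this product matches a direct convolution, and that the mean squared output error over spatial positions equals $\lVert (\mathcal{K} - \widetilde{\mathcal{K}})_{(1)}\Sigma^{1/2}\rVert_F^2$ when $\Sigma$ is the second moment of the unfolded patches. It assumes stride 1, no padding, and a single input image. The helpers `im2col` and `conv2d_direct` are hypothetical and written only for this illustration. The proposition's exact assumptions are not recoverable here.

```python
import numpy as np


def im2col(x, kh, kw):
    """Unfold x (S, Hin, Win) into patch columns (S*kh*kw, L); stride 1, no padding."""
    S, Hin, Win = x.shape
    Ho, Wo = Hin - kh + 1, Win - kw + 1
    cols = np.empty((S, kh, kw, Ho, Wo))
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + Ho, j:j + Wo]
    return cols.reshape(S * kh * kw, Ho * Wo), (Ho, Wo)


def conv2d_direct(x, K):
    """Reference cross-correlation (as in most deep-learning libraries), stride 1, no padding."""
    T, S, kh, kw = K.shape
    _, Hin, Win = x.shape
    Ho, Wo = Hin - kh + 1, Win - kw + 1
    out = np.zeros((T, Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(K, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
T, S, kh, kw = 6, 3, 3, 3
x = rng.standard_normal((S, 10, 10))
K = rng.standard_normal((T, S, kh, kw))
K_tilde = K + 0.05 * rng.standard_normal(K.shape)  # stand-in for a compressed kernel

P, (Ho, Wo) = im2col(x, kh, kw)
K1 = K.reshape(T, S * kh * kw)                     # unfolding K_(1): (T, S*H*W)
out_mat = (K1 @ P).reshape(T, Ho, Wo)              # convolution as a matrix product

sigma = P @ P.T / P.shape[1]                       # second moment of unfolded patches
D = (K - K_tilde).reshape(T, -1)                   # (K - K~)_(1)
lhs = np.trace(D @ sigma @ D.T)                    # equals ||D Sigma^{1/2}||_F^2
diff = conv2d_direct(x, K) - conv2d_direct(x, K_tilde)
rhs = np.mean(np.sum(diff ** 2, axis=0))           # mean over positions of squared output error
print(np.allclose(out_mat, conv2d_direct(x, K)), np.isclose(lhs, rhs))
```

In practice $\Sigma$ would be accumulated over many images and positions from the layer's actual inputs, that is, from the outputs of the partial network $p$ under $\mathcal{D}$.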

Figures (7)

  • Figure 1: Accuracy comparison of decomposed models obtained with the Tucker2-ALS-Sigma and Tucker2-ALS algorithms, also including results for the decomposed model (obtained with Tucker2-ALS) after fine-tuning on a subset of the ImageNet training set. ($rX$ denotes the compression ratio, calculated by dividing the number of parameters of the original model by that of the compressed model.)
  • Figure 2: Accuracy comparison of the Tucker2-ALS-Sigma and Tucker2-ALS algorithms, including results for the model fine-tuned after compression with Tucker2-ALS, using the CIFAR-10 dataset.
  • Figure 3: Accuracy comparison of decomposed models obtained with the CP-ALS-Sigma and CP-ALS algorithms, also including results for the decomposed model (obtained with CP-ALS) after fine-tuning on a subset of the ImageNet training set.
  • Figure 4: Comparison of the CP-ALS-Sigma and CP-ALS algorithms, including the model fine-tuned after compression with CP-ALS, using the CIFAR-10 dataset.
  • Figure 5: Accuracy comparison of decomposed models obtained using the Tucker2-ALS-Sigma and Tucker2-ALS algorithms across ResNet18, GoogLeNet, and ResNet50 architectures. The results also include fine-tuned decomposed models obtained via Tucker2-ALS, where fine-tuning was performed on the CIFAR-100 training dataset.
  • ...and 2 more figures
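The compression ratio $rX$ used in the figures divides the original parameter count by the compressed one. For a single convolution layer, the sketch below uses the standard parameter counts for the two decompositions, ignoring biases. Tucker-2 with ranks $(R_T, R_S)$ becomes a $1\times1$ convolution $S \to R_S$, an $H\times W$ core convolution $R_S \to R_T$, and a $1\times1$ convolution $R_T \to T$. A rank-$R$ CP decomposition stores one factor matrix per mode. The layer shape and ranks below are hypothetical. The figures report whole-model ratios, and the paper's rank choices are not given in this summary.

```python
def conv_params(T, S, H, W):
    """Parameters of a dense T x S x H x W convolution kernel (no bias)."""
    return T * S * H * W


def tucker2_params(T, S, H, W, r_out, r_in):
    """1x1 conv S->r_in, HxW core conv r_in->r_out, 1x1 conv r_out->T."""
    return S * r_in + r_in * r_out * H * W + r_out * T


def cp_params(T, S, H, W, rank):
    """One factor matrix per mode: (T x R), (S x R), (H x R), (W x R)."""
    return rank * (T + S + H + W)


def compression_ratio(original, compressed):
    return original / compressed


# Hypothetical ResNet-style 3x3 layer with 256 input and 256 output channels.
orig = conv_params(256, 256, 3, 3)
tk = tucker2_params(256, 256, 3, 3, r_out=64, r_in=64)
cp = cp_params(256, 256, 3, 3, rank=256)
print(orig, tk, cp)                               # 589824 69632 132608
print(f"Tucker-2: r{compression_ratio(orig, tk):.2f}")  # Tucker-2: r8.47
print(f"CP:       r{compression_ratio(orig, cp):.2f}")  # CP:       r4.45
```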

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 1
  • proof