Data-Free Dynamic Compression of CNNs for Tractable Efficiency

Lukas Meiner, Jens Mehnert, Alexandru Paul Condurache

TL;DR

The paper addresses the high computational cost of CNNs by proposing HASTE, a data-free, plug-and-play module that dynamically compresses channel depth at test time using locality-sensitive hashing with sparse random projections. HASTE groups similar latent channels and exploits the distributive property of convolution to merge redundant input channels together with their corresponding filter weights. This yields substantial FLOP reductions without any training or data access. The trade-off between accuracy and efficiency is controlled by a small set of hyperparameters, most notably the number of hash hyperplanes $L$. The method achieves strong results on CIFAR-10 (e.g., a 46.72% FLOPs reduction with a 1.25% accuracy loss for ResNet34) and ImageNet (up to a 31.54% FLOPs reduction for WideResNet101), and it scales to deeper and wider models. Because it needs neither retraining nor access to data, this dynamic pruning is practically relevant for edge devices and federated settings, where model complexity can be adjusted in real time.
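The hashing step can be illustrated with a small pure-Python sketch of sign-based LSH over sparse random projections. This is not the authors' implementation. The Achlioptas-style hyperplane distribution, the mean-centering of each patch, and the names `sparse_hyperplanes`, `hash_code`, and `group_channels` are assumptions made for illustration. Each channel patch is projected onto $L$ sparse hyperplanes, and the signs of the projections form an $L$-bit hash code. Channels with identical codes then fall into the same bucket. The default of 14 planes mirrors the $L = 14$ used for ResNets on CIFAR-10.

```python
import random
from collections import defaultdict


def sparse_hyperplanes(num_planes, dim, seed=0):
    """Draw num_planes sparse hyperplane normals of length dim.

    Assumption: entries are +1, 0 or -1 with probabilities 1/6, 2/3, 1/6
    (Achlioptas-style). The usual sqrt(3) scale is dropped because only the
    sign of each projection is used.
    """
    rng = random.Random(seed)
    return [rng.choices([1, 0, -1], weights=[1, 4, 1], k=dim)
            for _ in range(num_planes)]


def hash_code(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its positive side, else 0."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) > 0)
                 for plane in planes)


def group_channels(channels, num_planes=14, seed=0):
    """Bucket channel patches (flat lists of floats) by identical hash codes.

    Returns lists of channel indices that together partition
    range(len(channels)).
    """
    dim = len(channels[0])
    planes = sparse_hyperplanes(num_planes, dim, seed)
    buckets = defaultdict(list)
    for idx, patch in enumerate(channels):
        # Assumption: center each patch so the code reflects its spatial
        # pattern rather than its (typically non-negative) mean level.
        mean = sum(patch) / dim
        centered = [x - mean for x in patch]
        buckets[hash_code(centered, planes)].append(idx)
    return list(buckets.values())


# Demo: four channels of a flattened 4x4 patch.
base = [float(i) for i in range(16)]
channels = [base, list(base), [2 * x for x in base], [-x for x in base]]
print(group_channels(channels))
```

In the demo, channels 0 to 2 differ only by a positive scale factor, so they always receive the same code. The sign-flipped channel 3 lands in a different bucket unless every projection happens to be exactly zero. Adding hyperplanes makes the buckets finer, which is the accuracy/efficiency knob that $L$ controls.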

Abstract

To reduce the computational cost of convolutional neural networks (CNNs) on resource-constrained devices, structured pruning approaches have shown promise in lowering floating-point operations (FLOPs) without substantial drops in accuracy. However, most methods require fine-tuning or specific training procedures to achieve a reasonable trade-off between retained accuracy and reduction in FLOPs, adding computational overhead and requiring training data to be available. To this end, we propose HASTE (Hashing for Tractable Efficiency), a data-free, plug-and-play convolution module that instantly reduces a network's test-time inference cost without training or fine-tuning. Our approach utilizes locality-sensitive hashing (LSH) to detect redundancies in the channel dimension of latent feature maps, compressing similar channels to reduce input and filter depth simultaneously, resulting in cheaper convolutions. We demonstrate our approach on the popular vision benchmarks CIFAR-10 and ImageNet, where we achieve a 46.72% reduction in FLOPs with only a 1.25% loss in accuracy by swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
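The merging step relies on the fact that a convolution over several input channels is a sum of per-channel convolutions. Writing $*$ for single-channel convolution, a hash group $g$ satisfies $\sum_{c \in g} x_c * w_c \approx \bar{x}_g * \sum_{c \in g} w_c$, with equality when all $x_c$ in the group coincide. The sketch below is a hypothetical pure-Python illustration rather than the paper's code. It averages the inputs within each group and sums the matching filter slices, so a group of $n$ channels costs one convolution instead of $n$. The helper names are assumptions.

```python
def conv2d(x, w):
    """'Valid' 2D cross-correlation of one H x W channel with one k x k kernel."""
    k = len(w)
    h_out, w_out = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + a][j + b] * w[a][b] for a in range(k) for b in range(k))
             for j in range(w_out)]
            for i in range(h_out)]


def sum_maps(maps):
    """Element-wise sum of equally sized 2D lists."""
    return [[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))]


def mean_maps(maps):
    """Element-wise mean of equally sized 2D lists."""
    n = len(maps)
    return [[v / n for v in row] for row in sum_maps(maps)]


def conv_multi(xs, ws):
    """One output channel of a conv layer: sum of per-input-channel convolutions."""
    return sum_maps([conv2d(x, w) for x, w in zip(xs, ws)])


def merge_channels(xs, ws, groups):
    """Average the inputs of each group and sum the matching filter slices.

    By distributivity, sum_c conv(x_c, w_c) == conv(mean_x, sum_c w_c) when all
    x_c in a group are identical; for merely similar x_c it is an approximation.
    """
    merged_x = [mean_maps([xs[c] for c in g]) for g in groups]
    merged_w = [sum_maps([ws[c] for c in g]) for g in groups]
    return merged_x, merged_w


# Demo: channels 0 and 1 are identical, so they can share one convolution.
x_a = [[1, 2, 0], [0, 1, 3], [2, 1, 1]]
x_b = [[0, 1, 1], [1, 0, 2], [3, 2, 0]]
xs = [x_a, x_a, x_b]
ws = [[[1, 0], [0, 1]], [[0, 2], [1, 0]], [[1, 1], [1, 1]]]
mx, mw = merge_channels(xs, ws, [[0, 1], [2]])
assert conv_multi(mx, mw) == conv_multi(xs, ws)  # 3 convolutions -> 2
```

The inputs are averaged while the filters are summed. This keeps the merged term on the same scale as the original sum of per-channel outputs.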

Paper Structure (11 sections, 4 equations, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: Overview of related pruning approaches. Training-based methods require specialized training procedures. Methods based on fine-tuning need retraining to compensate for the accuracy lost in the pruning step. Our method instantly reduces network FLOPs and maintains high accuracy entirely without training or fine-tuning.
  • Figure 2: Overview of our proposed HASTE module. Each patch of the input feature map is processed to find redundant channels. Detected redundancies are then merged together, dynamically reducing the depth of each patch and the convolutional filters.
  • Figure 3: Visualization of the input channel compression performed by the HASTE module in a ResNet18 model on CIFAR-10. One observed patch is marked as a red square on the input feature maps. All 64 channels of this patch are then plotted in an $8 \times 8$ grid. Patches with identical hash codes receive identical outline colors and are averaged. Patches with no matching hash code are left unchanged. Here, the input channel dimension is reduced from $64$ to $24$, giving a compression ratio of $r = 1 - 24/64 = 62.50\%$.
  • Figure 4: Results of our method on the CIFAR-10 dataset. (a) shows the achieved FLOPs reduction for all tested models, using $L = 14$ for ResNets and $L = 20$ for VGG-BN models. (b) depicts the influence of the chosen number of hyperplanes $L$ (shown in gray) on compression rates and accuracy.
  • Figure 5: Visualization of results on the ImageNet dataset. (a) depicts the relation of FLOPs reduction to number of parameters for all tested architectures. Results are shown with $L=16$ for basic ResNet models, $L=28$ for bottleneck ResNets, $L=32$ for WideResNets, and $L=20$ for VGG-BN models. (b) shows the achieved compression rate per convolution module in a ResNet50, starting from the second bottleneck layer.
  • Figures 6–8: not listed in this summary.
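The compression ratio quoted in Figure 3 follows directly from the channel counts: $r = 1 - 24/64 = 62.50\%$. The short sketch below reproduces that arithmetic. It also shows that, under simplifying assumptions, a dense convolution's multiply-accumulate count shrinks by the same fraction. The helper names and the example layer shape are hypothetical. The count ignores the cost of hashing and merging, and it treats the feature map as a single patch, whereas HASTE compresses each patch independently.

```python
def compression_ratio(c_in, c_merged):
    """Share of input channels removed by merging: r = 1 - c_merged / c_in."""
    return 1.0 - c_merged / c_in


def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k * h_out * w_out


# Figure 3 example: 64 input channels merged down to 24.
r = compression_ratio(64, 24)
print(f"r = {r:.2%}")  # r = 62.50%

# Hypothetical 3x3 layer with 64 output channels on a 32x32 output map.
# The MAC count scales linearly with input depth, so it drops by r as well
# (hashing/merging overhead not included; whole map treated as one patch).
full = conv_macs(64, 64, 3, 32, 32)
reduced = conv_macs(24, 64, 3, 32, 32)
print(f"MAC reduction = {1 - reduced / full:.2%}")  # MAC reduction = 62.50%
```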