Neural expressiveness for beyond importance model compression

Angelos-Christos Maroudis, Sotirios Xydis

TL;DR

This paper introduces Neural Expressiveness (NEXP), a data-agnostic pruning criterion based on activation overlap to assess a neuron's or filter's ability to redistribute information. By contrasting expressiveness with traditional weight-based importance, the authors develop a theoretically grounded framework and demonstrate that NEXP can be accurately estimated with limited data and at initialization. Empirically, NEXP-based pruning achieves substantial compression with minimal accuracy loss across CIFAR-10, ImageNet-1k, and YOLOv8, and, when combined with importance, yields additional gains. These results suggest that activation-centric criteria can significantly enhance model compression while preserving performance, offering a complementary or even superior alternative to existing pruning methods.

Abstract

Neural network pruning has been established as a driving force in the exploration of memory- and energy-efficient solutions with high throughput, both during training and at test time. In this paper, we introduce a novel criterion for model compression, named "Expressiveness". Unlike existing pruning methods that rely on the inherent "Importance" of neurons' and filters' weights, "Expressiveness" emphasizes the ability of a neuron or group of neurons to redistribute informational resources effectively, based on the overlap of activations. This characteristic is strongly correlated with a network's initialization state, which makes the criterion independent of the learning state (stateless) and thus sets a new fundamental basis for expanding compression strategies with regard to the "When to Prune" question. We show that expressiveness is effectively approximated with arbitrary data or with a limited set of representative samples from the dataset, paving the way for the exploration of Data-Agnostic strategies. Our work also facilitates a "hybrid" formulation of expressiveness- and importance-based pruning strategies, illustrating their complementary benefits and delivering up to 10x extra gains over weight-based approaches in parameter compression ratios, with an average performance degradation of 1%. We also show that employing expressiveness independently for pruning leads to an improvement over top-performing and foundational methods in terms of compression efficiency. Finally, on YOLOv8, we achieve a 46.1% MACs reduction by removing 55.4% of the parameters, with an increase of 3% in the mean Average Precision ($mAP_{50-95}$) for object detection on the COCO dataset.
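
To make the idea of scoring a filter by the overlap of its activations more concrete, the following is a minimal sketch. It is an illustrative proxy and not the paper's definition of expressiveness: the function name `overlap_score`, the binarization at zero, and the use of a normalized Hamming distance are all assumptions made for this example. A filter whose activation patterns differ strongly across input samples (low overlap) receives a high score.

```python
from itertools import combinations


def overlap_score(activations):
    """Mean pairwise normalized Hamming distance between binarized activations.

    `activations` holds one flattened feature map per input sample, all for
    the same filter. Higher values mean the filter responds differently to
    different inputs (less overlap). Hypothetical proxy, not the paper's
    exact criterion.
    """
    # Binarize: a unit counts as "active" when its activation is positive.
    patterns = [[1 if v > 0 else 0 for v in fmap] for fmap in activations]
    pairs = list(combinations(patterns, 2))
    if not pairs:
        return 0.0  # a single sample gives no overlap information
    total = 0.0
    for a, b in pairs:
        # Fraction of positions where the two samples disagree.
        total += sum(x != y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


# Made-up activations for two filters over three input samples.
redundant_filter = [[0.9, 0.0, 0.4], [0.7, 0.0, 0.2], [0.8, 0.0, 0.1]]
varied_filter = [[0.9, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.3]]

print(overlap_score(redundant_filter))
print(overlap_score(varied_filter))
```

Under this proxy, the first filter fires in the same positions for every sample and would be a natural pruning candidate, while the second produces distinct patterns per sample. Because the score depends only on activations, it can be computed on an untrained network or on arbitrary inputs, which is the property the abstract highlights.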

Paper Structure

This paper contains 37 sections, 11 equations, 7 figures, 9 tables, 1 algorithm.

Figures (7)

  • Figure 1: Expressiveness statistics of feature maps from different convolutional layers and architectures on CIFAR-10.
  • Figure 2: NEXP Pruning Algorithm
  • Figure 2: Analytical Comparison of Importance-based solutions and Expressiveness on ImageNet-1k using ResNet-50 (He et al., 2016).
  • Figure 3: Linear exploration of the combinatorial space between importance and expressiveness.
  • Figure 4: Expressiveness statistics of feature maps from different convolutional layers and architectures on CIFAR-10 (Extended). For each architecture, we show the expressiveness distribution for both an untrained instance of the model (PaI) and a converged one (PaT). The x-axis represents the indices of the convolutional layers and the y-axis those of the feature maps in each layer. To maintain consistency across the y-axis, we have interpolated each layer's feature maps (pixel-wise) to match the layer with the most feature maps. Columns denote different sampling strategies, and colors denote expressiveness values (the higher the value, the more expressive the feature map). For the reference expressiveness score of each element, denoted as "non-approx", we used 25% of the dataset's samples (not 100%, due to memory limitations) while maintaining the label distribution. As can be seen, the rank of each feature map (column of the sub-figure) is almost unchanged (the same color) regardless of the image batches. Hence, even a small number of images can effectively estimate the average rank of each feature map in different architectures.
  • ...and 2 more figures
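
The Figure 4 caption argues that feature-map ranks stay nearly unchanged across image batches, so a small batch suffices to estimate them. One way to check such a claim numerically is a rank correlation between scores from a small batch and scores from a larger reference sample. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the score values are made up for the example, and the Spearman formula used here assumes there are no tied scores.

```python
def ranks(scores):
    """Return the 0-based rank of each score (assumes no ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation between two equally long score lists.

    Uses the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is
    only valid when neither list contains tied values.
    """
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Made-up per-feature-map scores: one estimate from a small batch,
# one from a larger reference sample of the dataset.
small_batch_scores = [0.12, 0.40, 0.33, 0.05]
reference_scores = [0.10, 0.45, 0.30, 0.07]

print(spearman(small_batch_scores, reference_scores))
```

A correlation close to 1 would indicate that the small batch orders the feature maps the same way as the reference sample, which is what matters for pruning, since the decision depends on rank rather than on absolute score.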