Rapid Deployment of DNNs for Edge Computing via Structured Pruning at Initialization

Bailey J. Eccles, Leon Wong, Blesson Varghese

TL;DR

Edge deployment of DNNs demands large speedups and memory reductions without sacrificing accuracy. Reconvene applies structured pruning at initialization (SPaI) selectively, guided by per-layer pruning sensitivity, to generate edge-ready models within seconds. It introduces a Pruning Sensitivity Evaluator (PSE) and a Resilient Layer Rectifier (RLR) to identify and prune the layers least sensitive to structured pruning while reinitializing affected layers, achieving up to $16.21\times$ compression and roughly $2\times$ speedups while matching the accuracy of an unstructured PaI (UPaI) counterpart. Across CIFAR-10 and Tiny ImageNet, with models such as VGG-16 and ResNet variants, Reconvene demonstrates faster training, accuracy comparable to or better than UPaI, and better edge efficiency than SPaI and neural architecture search (NAS) baselines, establishing a practical pathway for structured PaI in heterogeneous edge environments.

Abstract

Edge machine learning (ML) enables localized processing of data on devices and is underpinned by deep neural networks (DNNs). However, DNNs cannot be easily run on devices due to their substantial computing, memory and energy requirements for delivering performance that is comparable to cloud-based ML. Therefore, model compression techniques, such as pruning, have been considered. Existing pruning methods are problematic for edge ML since they: (1) Create compressed models that have limited runtime performance benefits (using unstructured pruning) or compromise the final model accuracy (using structured pruning), and (2) Require substantial compute resources and time for identifying a suitable compressed DNN model (using neural architecture search). In this paper, we explore a new avenue, referred to as Pruning-at-Initialization (PaI), using structured pruning to mitigate the above problems. We develop Reconvene, a system for rapidly generating pruned models suited for edge deployments using structured PaI. Reconvene systematically identifies and prunes DNN convolution layers that are least sensitive to structured pruning. Reconvene rapidly creates pruned DNNs within seconds that are up to 16.21x smaller and 2x faster while maintaining the same accuracy as an unstructured PaI counterpart.
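
The contrast the abstract draws between unstructured and structured pruning can be illustrated with a toy example. The following is a minimal sketch, not Reconvene's method: it assumes a convolution layer represented as a list of flattened filters, and it uses weight magnitude and filter L1 norm as stand-in pruning criteria. The function names (`init_layer`, `unstructured_prune`, `structured_prune`) are hypothetical and exist only for illustration. It shows why unstructured pruning leaves the layer shape unchanged (limited runtime benefit on dense hardware), whereas structured pruning yields a smaller dense layer.

```python
import random

def init_layer(out_channels, weights_per_filter, seed=0):
    """Randomly initialise a toy conv layer as a list of flattened filters."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(weights_per_filter)]
            for _ in range(out_channels)]

def unstructured_prune(layer, sparsity):
    """Zero the smallest-magnitude individual weights (assumed criterion).

    The number of filters and weights per filter stay the same; only the
    values change, so the layer is sparse but not physically smaller.
    """
    magnitudes = sorted(abs(w) for f in layer for w in f)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [list(f) for f in layer]
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in f] for f in layer]

def structured_prune(layer, sparsity):
    """Drop whole filters with the smallest L1 norm (assumed criterion).

    The result is a smaller layer made only of dense filters.
    """
    n_keep = max(1, len(layer) - int(len(layer) * sparsity))
    ranked = sorted(range(len(layer)),
                    key=lambda i: sum(abs(w) for w in layer[i]),
                    reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve original filter order
    return [list(layer[i]) for i in keep]

# Toy layer: 8 filters, each 3 input channels x 3x3 kernel = 27 weights.
layer = init_layer(out_channels=8, weights_per_filter=27)
sparse = unstructured_prune(layer, 0.5)
small = structured_prune(layer, 0.5)

# Unstructured: same filter count, roughly half the weights are zero.
print(len(sparse), sum(w != 0.0 for f in sparse for w in f))
# Structured: half the filters remain, all of them dense.
print(len(small), sum(len(f) for f in small))
```

In this sketch the unstructured result still stores all 8 filters, so a dense convolution kernel does the same amount of work, while the structured result has only 4 filters and therefore fewer multiply-accumulate operations. How Reconvene selects layers and reinitializes parameters is described in the paper itself and is not reproduced here.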

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: Evaluating model compression methods to reduce the parameter count of VGG-16 (on the CIFAR-10 dataset) by $50\times$. Dashed lines are the baseline values of an uncompressed dense VGG-16. The bar for neural architecture search includes the discovery time for generating a range of compressed models with different levels of compression and accuracy.
  • Figure 2: Different pruning at initialization (PaI) methods applied to a convolutional layer. UPaI prunes and then reinitializes the remaining parameters. SPaI redistributes the parameters such that a smaller layer of only dense channels is created.
  • Figure 3: Pruning at initialization (PaI) methods for VGG-16 (CIFAR-10). UPaI maintains model accuracy without improving runtime performance, while SPaI improves performance but reduces accuracy.
  • Figure 4: System overview of Reconvene.
  • Figure 5: VGG-11 (CIFAR-10, $p=0.95$) before and after Reconvene. Layers to the left of the vertical dashed line are considered to be sensitive to pruning and, thereby, do not undergo structured pruning in the RLR.
  • ...and 2 more figures