Complexity-Driven CNN Compression for Resource-constrained Edge AI

Muhammad Zawish, Steven Davy, Lizy Abraham

TL;DR

This work tackles the challenge of running large CNNs on resource-constrained edge devices by introducing complexity-driven pruning, a single-pass, structured pruning approach that directly trains the pruned model. It selects layers for filter pruning based on layer-level complexity, using weighted random sampling, and offers three modes—PA, FA, and MA—to trade accuracy for parameters, FLOPs, or memory. The method avoids expensive ranking and fine-tuning steps and demonstrates competitive accuracy with substantial reductions in FLOPs, memory, and parameters across networks like VGG-16, MobileNetV2, AlexNet, and ResNet-50 on CIFAR-10/100, as well as improved training efficiency on GPUs and edge devices. The approach generalizes to low- and high-level vision tasks and provides a practical framework for developers to tailor CNN compression to available resources, latency constraints, and energy budgets.
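The layer-selection step can be sketched as weighted random sampling. This is a minimal illustration, not the authors' code. It assumes that each layer's complexity score has already been computed in the chosen mode and that every draw removes a fixed 10% of the sampled layer's filters. The helper names, the ratio, and the example layer sizes are all hypothetical.

```python
import random
from typing import Dict, List

def sample_layers(complexity: Dict[str, float], num_draws: int, seed: int = 0) -> List[str]:
    """Draw layer names with probability proportional to their complexity score.

    `complexity` maps a layer name to its score in the chosen mode
    (parameters, FLOPs, or memory). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    names = list(complexity)
    weights = [complexity[name] for name in names]
    # random.choices performs weighted sampling with replacement.
    return rng.choices(names, weights=weights, k=num_draws)

def prune_filters(filters: Dict[str, int], complexity: Dict[str, float],
                  num_draws: int, ratio: float = 0.1, seed: int = 0) -> Dict[str, int]:
    """Return a new filter budget; each draw removes `ratio` of the sampled layer's filters.

    The fixed per-draw ratio is an assumption, not the paper's schedule.
    """
    pruned = dict(filters)  # leave the caller's configuration untouched
    for name in sample_layers(complexity, num_draws, seed):
        # Keep at least one filter so the layer stays valid.
        pruned[name] = max(1, int(pruned[name] * (1 - ratio)))
    return pruned

if __name__ == "__main__":
    # Hypothetical 3x3 conv stack; scores are weight counts (k*k*C_in*C_out).
    filters = {"conv1": 64, "conv2": 128, "conv3": 256}
    params = {"conv1": 1728, "conv2": 73728, "conv3": 294912}
    print(prune_filters(filters, params, num_draws=10))
```

In this sketch, the returned filter counts define the smaller architecture that would then be trained directly. No filter-ranking or fine-tuning pass is involved, which mirrors the single-pass idea described above.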

Abstract

Recent advances in Artificial Intelligence (AI) at the Internet of Things (IoT)-enabled network edge have realized edge intelligence in several applications, such as smart agriculture, smart hospitals, and smart factories, by enabling low latency and computational efficiency. However, deploying state-of-the-art Convolutional Neural Networks (CNNs) such as VGG-16 and ResNets on resource-constrained edge devices is practically infeasible due to their large number of parameters and floating-point operations (FLOPs). Thus, network pruning, a form of model compression, is gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, do not consider the different natures of complexity exhibited by convolutional layers. They also follow a training-pruning-retraining pipeline, which incurs additional computational overhead. In this work, we propose a novel and computationally efficient pruning pipeline that exploits the inherent layer-level complexities of CNNs. Unlike typical methods, our complexity-driven algorithm selects a particular layer for filter pruning based on its contribution to overall network complexity. We follow a procedure that directly trains the pruned model and avoids the computationally expensive ranking and fine-tuning steps. Moreover, we define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to enable versatile compression of CNNs. Our results show that our approach achieves competitive accuracy and acceleration. Lastly, we present a trade-off between different resources and accuracy that can help developers make the right decisions in resource-constrained IoT environments.
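A layer's "contribution to overall network complexity" can be illustrated as its share of a per-mode total. This sketch assumes that PA, FA, and MA each rank layers by a single metric: parameters, FLOPs, or memory. The mapping, function name, and example statistics are hypothetical. The example is a small 3-layer CNN on a 32x32 input with 3x3 kernels. FLOPs are counted as multiply-accumulates and memory as output-activation elements.

```python
from typing import Dict

# Assumed mapping from pruning mode to the per-layer metric it targets.
MODE_METRIC = {"PA": "params", "FA": "flops", "MA": "memory"}

def layer_contributions(stats: Dict[str, Dict[str, int]], mode: str) -> Dict[str, float]:
    """Return each layer's fraction of total network complexity for the given mode."""
    metric = MODE_METRIC[mode]
    total = sum(layer[metric] for layer in stats.values())
    return {name: layer[metric] / total for name, layer in stats.items()}

# Hypothetical 3-layer CNN: 3 -> 64 -> 128 -> 256 filters, 'same' padding,
# spatial size 32x32 for conv1, then halved before conv2 and again before conv3.
EXAMPLE_STATS = {
    "conv1": {"params": 1728,   "flops": 1769472,  "memory": 65536},
    "conv2": {"params": 73728,  "flops": 18874368, "memory": 32768},
    "conv3": {"params": 294912, "flops": 18874368, "memory": 16384},
}

if __name__ == "__main__":
    for mode in MODE_METRIC:
        shares = layer_contributions(EXAMPLE_STATS, mode)
        print(mode, {name: round(share, 3) for name, share in shares.items()})
```

In this toy network, the dominant layer changes with the mode. conv3 holds about 80% of the parameters. conv2 and conv3 split the FLOPs almost evenly, at about 48% each. conv1 holds 4/7 of the activation memory. This is why a mode-specific choice of metric can lead to pruning different layers.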

Paper Structure

This paper contains 25 sections, 8 equations, 5 figures, 6 tables, and 1 algorithm.

Figures (5)

  • Figure 1: The VGG-16 architecture consists of 13 convolutional layers and 3 fully-connected layers. Each layer has a different ratio and nature of complexity, i.e., number of parameters, FLOP count, and memory size. The size of each circle represents parameter complexity, while the x-axis and y-axis show FLOP-based (GFLOPs) and memory-based complexity, respectively.
  • Figure 2: A comparison between a typical structured pruning pipeline and the proposed complexity-driven approach. (a) shows a three-stage approach involving computationally intensive ranking and fine-tuning steps, and (b) shows the proposed complexity-driven approach, which skips the ranking and fine-tuning steps.
  • Figure 3: An example of a CNN with 3 conv layers and an uneven distribution of filters, showing the calculation of FLOPs, parameters, and memory. Each layer exhibits a different number of parameters, FLOPs, and memory requirements as we move from upper to lower layers. The spatial dimensions are calculated using $(\text{Input size} - \text{Filter size})/\text{stride} + 1$. Note that we have considered $padding = 0$ at this stage.
  • Figure 4: Trade-off among accuracy, latency, energy consumption, CPU, and memory utilisation for AlexNet on CIFAR-100.
  • Figure 5: Trade-off among accuracy, latency, energy consumption, CPU, and memory utilisation for VGG-16 on CIFAR-10.
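The kind of per-layer calculation illustrated in Figure 3 can be sketched as follows. The sketch makes several assumptions: square inputs and kernels, zero padding, and a stride of at least 1. It counts FLOPs as one multiply-accumulate per weight per output position and ignores biases. Memory is counted as output-activation elements, so multiply by 4 for float32 bytes. These conventions, the class and function names, and the example layer sizes are hypothetical and may differ from those used in the figure.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ConvLayer:
    in_channels: int
    out_channels: int
    kernel: int      # square kernel size
    stride: int = 1

def output_size(input_size: int, kernel: int, stride: int) -> int:
    """(Input size - Filter size) / stride + 1, assuming zero padding."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return (input_size - kernel) // stride + 1

def layer_complexity(layer: ConvLayer, input_size: int) -> Dict[str, int]:
    out = output_size(input_size, layer.kernel, layer.stride)
    # Weight count only; biases are ignored in this sketch.
    params = layer.kernel * layer.kernel * layer.in_channels * layer.out_channels
    # One multiply-accumulate per weight at every output position.
    flops = params * out * out
    # Number of output activation elements produced by the layer.
    memory = layer.out_channels * out * out
    return {"out_size": out, "params": params, "flops": flops, "memory": memory}

def network_complexity(layers: List[ConvLayer], input_size: int) -> List[Dict[str, int]]:
    """Propagate the spatial size through the stack and collect per-layer stats."""
    stats, size = [], input_size
    for layer in layers:
        layer_stats = layer_complexity(layer, size)
        stats.append(layer_stats)
        size = layer_stats["out_size"]
    return stats

if __name__ == "__main__":
    # Hypothetical stack with an uneven filter distribution on a 32x32 input.
    layers = [ConvLayer(3, 16, 3), ConvLayer(16, 32, 3), ConvLayer(32, 64, 3)]
    for i, s in enumerate(network_complexity(layers, 32), start=1):
        print(f"conv{i}", s)
```

For this example, the output sizes are 30, 28, and 26, and the parameter counts are 432, 4608, and 18432. Because no pooling is applied, the deeper layers dominate every metric here. Inserting pooling between layers would shift the memory share back toward the early layers.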