Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks

Samer Francy; Raghubir Singh

TL;DR

This study tackles the challenge of deploying CNNs on resource-constrained edge devices by systematically evaluating model compression techniques on ConvNeXt using CIFAR-10. It compares structured pruning (OTOV3), unstructured pruning, and dynamic quantization, as well as the combination of OTOV3 with dynamic quantization, across cloud and edge deployments. Key findings include up to 75% reduction in model size with structured pruning and up to 95% reduction in the number of parameters with dynamic quantization. Combining OTOV3 with dynamic quantization yields further gains: an 89.7% reduction in size, a 95% reduction in parameters and MACs, and a 3.8% increase in accuracy. The results demonstrate practical edge-ready compression workflows that preserve accuracy while enabling fast on-device inference: the final compressed model reaches 92.5% accuracy and 20 ms inference time on an edge device with ConvNeXt Small.

Abstract

This work evaluates compression techniques on ConvNeXt models in image classification tasks using the CIFAR-10 dataset. Structured pruning, unstructured pruning, and dynamic quantization are evaluated for their ability to reduce model size and computational complexity while maintaining accuracy. The experiments, conducted on cloud-based platforms and an edge device, assess the performance of these techniques. Results show significant reductions in model size, with up to 75% reduction achieved using structured pruning techniques. Additionally, dynamic quantization achieves a reduction of up to 95% in the number of parameters. Fine-tuned models exhibit improved compression performance, indicating the benefits of pre-training in conjunction with compression techniques. Unstructured pruning methods reveal trends in accuracy and compression, with limited reductions in computational complexity. The combination of OTOV3 pruning and dynamic quantization further enhances compression performance, resulting in an 89.7% reduction in size, a 95% reduction in the number of parameters and MACs, and a 3.8% increase in accuracy. Deploying the final compressed model on an edge device yields high accuracy (92.5%) and low inference time (20 ms), validating the effectiveness of compression techniques for real-world edge computing applications.
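To make two of the techniques named in the abstract concrete, the following is a minimal, library-free sketch of magnitude-based unstructured pruning and symmetric int8 weight quantization applied to a toy weight list. It is an illustration only, not the authors' pipeline: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`, `size_reduction`), the toy weights, and the 50% sparsity are all assumptions made for this example, and OTOV3 structured pruning is not reproduced here.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the `sparsity` fraction of weights with the smallest |w|."""
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    # Indices ordered by magnitude; the first k are the pruning candidates.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:k])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


def size_reduction(bits_before, bits_after):
    """Fraction of per-weight storage saved when changing the bit width."""
    return 1.0 - bits_after / bits_before


# Hypothetical toy weights, chosen only for illustration.
weights = [0.8, -0.05, 0.3, -1.2, 0.02, 0.6, -0.4, 0.01]

pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(pruned, restored))

print("pruned:   ", pruned)
print("int8 codes:", q)
print("max error: ", max_err)
print("per-weight storage saved, 32 -> 8 bits:", size_reduction(32, 8))
```

In this sketch the zeroed weights still occupy a slot in the dense list, so pruning alone does not shrink dense storage or the multiply-accumulate count; savings require a sparse format or removal of whole structures. The 32-to-8-bit storage figure is plain arithmetic on bit widths and is not a measurement taken from the paper.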

Paper Structure (48 sections, 15 figures, 6 tables)

This paper contains 48 sections, 15 figures, 6 tables.

Introduction
Overview of Edge AI
Convolutional Neural Networks (CNNs)
CNN Architecture
Computation and Memory Demands
Key CNN Architectures
CNN on Edge
Related Work
Pruning
Pruning For Fully Connected Layer
Pruning For Convolutional Layer
Quantization
Low-Rank Decomposition/Factorization
Knowledge Distillation (KD)
Mixed Techniques
...and 33 more sections

Figures (15)

Figure 1: Unbalanced Demand For Computation (Left) and Memory (Right) in AlexNet [zhang_optimized_2019].
Figure 2: Evolution of Key CNN Architectures Over Time.
Figure 3: Weight Pruning (a) and Neuron Pruning (b). x: input, w: weight [choudhary_comprehensive_2020].
Figure 4: Block modifications and resulting specifications. (a) is a ResNeXt block; in (b) an inverted bottleneck block is created; and in (c) the position of the spatial depthwise conv layer is moved up [liu_convnet_2022].
Figure 5: Block designs for a ResNet, a Swin Transformer, and a ConvNeXt. Swin Transformer’s block is more sophisticated due to the presence of multiple specialized modules and two residual connections [liu_convnet_2022].
...and 10 more figures
