DANCE: Dynamic 3D CNN Pruning: Joint Frame, Channel, and Feature Adaptation for Energy Efficiency on the Edge

Mohamed Mejri, Ashiqur Rasul, Abhijit Chatterjee

Abstract

Modern convolutional neural networks (CNNs) are workhorses for video and image processing, but they do not dynamically adapt their computation to the complexity of each input sample, and so miss opportunities to reduce energy consumption. In this research, we propose DANCE, a fine-grained, input-aware, dynamic pruning framework for 3D CNNs that maximizes power efficiency with negligible to zero impact on performance. The approach has two steps. In the first step, activation variability amplification (AVA), the 3D CNN model is retrained to increase the variance of the magnitude of neuron activations across the network, which facilitates pruning decisions across diverse CNN input scenarios. In the second step, adaptive activation pruning (AAP), a lightweight activation controller network is trained to dynamically prune frames, channels, and features of the 3D convolutional layers of the network (differently for each layer), based on statistics of the outputs of the first layer of the network. Our method achieves substantial savings in multiply-accumulate (MAC) operations and memory accesses by introducing sparsity within convolutional layers. Hardware validation on the NVIDIA Jetson Nano GPU and the Qualcomm Snapdragon 8 Gen 1 platform demonstrates speedups of 1.37X and 2.22X, respectively, and up to 1.47X higher energy efficiency compared to the state of the art.
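The three pruning levels named in the abstract (frames, channels, features) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the tensor layout `x[channel][frame][row][col]`, the use of mean absolute magnitude as the pruning statistic, and the fixed thresholds `frame_thr`, `channel_thr`, and `feature_thr` are all assumptions made for illustration. In DANCE the pruning decisions come from a trained controller network, which this sketch replaces with hand-picked constants.

```python
from statistics import fmean


def flatten(nested):
    """Yield every scalar in an arbitrarily nested list."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def mean_abs(nested):
    """Mean absolute magnitude over all scalars in a nested list."""
    return fmean(abs(v) for v in flatten(nested))


def prune_activation(x, frame_thr, channel_thr, feature_thr):
    """Zero weak frames, then weak channels, then weak features.

    x is indexed as x[channel][frame][row][col] (assumed layout).
    The thresholds are hypothetical stand-ins for controller outputs.
    The input is not modified; a pruned copy is returned.
    """
    C, T = len(x), len(x[0])
    out = [[[row[:] for row in x[c][t]] for t in range(T)] for c in range(C)]

    def zero(c, t):
        out[c][t] = [[0.0] * len(row) for row in out[c][t]]

    # 1) Frame level: drop frame t in every channel if it is weak overall.
    for t in range(T):
        if mean_abs([out[c][t] for c in range(C)]) < frame_thr:
            for c in range(C):
                zero(c, t)

    # 2) Channel level: drop a whole channel if it is weak overall.
    for c in range(C):
        if mean_abs(out[c]) < channel_thr:
            for t in range(T):
                zero(c, t)

    # 3) Feature level: zero individual activations below the threshold.
    for c in range(C):
        for t in range(T):
            out[c][t] = [
                [v if abs(v) >= feature_thr else 0.0 for v in row]
                for row in out[c][t]
            ]
    return out


def sparsity(x):
    """Fraction of entries that are exactly zero."""
    vals = list(flatten(x))
    return sum(1 for v in vals if v == 0.0) / len(vals)


# Toy tensor: 2 channels, 2 frames, 1x2 spatial map per frame.
x = [
    [[[0.9, 0.05]], [[0.01, 0.02]]],   # channel 0: frame 0, frame 1
    [[[0.02, 0.03]], [[0.01, 0.0]]],   # channel 1: frame 0, frame 1
]
pruned = prune_activation(x, frame_thr=0.05, channel_thr=0.1, feature_thr=0.1)
print(sparsity(x), sparsity(pruned))
```

In this toy case frame 1 is removed at the frame level, channel 1 at the channel level, and one weak activation at the feature level. Zeros of this kind only reduce MAC operations and memory accesses when the underlying kernel or hardware actually skips them.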

Paper Structure (7 sections, 1 theorem, 7 equations, 10 figures, 4 tables)

Key Result

Lemma 1

On the probability simplex $\Delta^{D-1}$, the standard deviation $\operatorname{std}(\mathbf{x})$ and the Hoyer measure $H(\mathbf{x})$ are both strictly increasing functions of $\|\mathbf{x}\|_2$. Consequently, $\arg\max_{\mathbf{x} \in \Delta^{D-1}} \operatorname{std}(\mathbf{x}) = \arg\max_{\mat
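The monotonicity claim in Lemma 1 can be checked numerically with a short sketch. Two assumptions are made here because the excerpt does not define the terms: $\operatorname{std}$ is taken to be the population standard deviation, and $H$ is taken to be the standard Hoyer sparsity measure $H(\mathbf{x}) = (\sqrt{D} - \|\mathbf{x}\|_1 / \|\mathbf{x}\|_2) / (\sqrt{D} - 1)$. Under those assumptions, any point on the simplex has $\|\mathbf{x}\|_1 = 1$ and mean $1/D$, so $\operatorname{std}(\mathbf{x}) = \sqrt{\|\mathbf{x}\|_2^2 / D - 1/D^2}$ and $H(\mathbf{x}) = (\sqrt{D} - 1/\|\mathbf{x}\|_2) / (\sqrt{D} - 1)$, both of which increase with $\|\mathbf{x}\|_2$. The helper names below are hypothetical.

```python
import math
import random


def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))


def population_std(x):
    """Population standard deviation (divides by D, not D - 1)."""
    d = len(x)
    mean = sum(x) / d
    return math.sqrt(sum((v - mean) ** 2 for v in x) / d)


def hoyer(x):
    """Hoyer sparsity measure (assumed definition, see lead-in)."""
    d = len(x)
    l1 = sum(abs(v) for v in x)
    return (math.sqrt(d) - l1 / l2_norm(x)) / (math.sqrt(d) - 1)


def std_from_l2(l2, d):
    """Closed form for std on the simplex, as a function of the L2 norm."""
    # max(..., 0) guards against a tiny negative value from rounding.
    return math.sqrt(max(l2 * l2 / d - 1.0 / d ** 2, 0.0))


def hoyer_from_l2(l2, d):
    """Closed form for the Hoyer measure on the simplex (L1 norm is 1)."""
    return (math.sqrt(d) - 1.0 / l2) / (math.sqrt(d) - 1)


def random_simplex_point(d, rng):
    """Nonnegative weights summing to 1 (normalized exponential draws)."""
    w = [rng.expovariate(1.0) for _ in range(d)]
    total = sum(w)
    return [v / total for v in w]


rng = random.Random(0)
D = 8
points = sorted((random_simplex_point(D, rng) for _ in range(200)), key=l2_norm)

# Largest gap between the direct computation and the closed form.
max_std_err = max(abs(population_std(p) - std_from_l2(l2_norm(p), D)) for p in points)
max_hoyer_err = max(abs(hoyer(p) - hoyer_from_l2(l2_norm(p), D)) for p in points)

# With points sorted by L2 norm, both measures should be nondecreasing.
stds = [population_std(p) for p in points]
hoyers = [hoyer(p) for p in points]
std_monotone = all(b >= a - 1e-12 for a, b in zip(stds, stds[1:]))
hoyer_monotone = all(b >= a - 1e-12 for a, b in zip(hoyers, hoyers[1:]))
print(std_monotone, hoyer_monotone)
```

Because both quantities depend on $\mathbf{x}$ only through $\|\mathbf{x}\|_2$ on the simplex, ranking points by one measure gives the same ranking as the other.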

Figures (10)

  • Figure 1: Illustration of 4D tensor pruning in intermediate convolution layers. Sparsity in the activation tensor is introduced at three levels by dynamic pruning: for frames (dotted in red), channels (shaded in green), and features (lined in black and white).
  • Figure 2: Overview of the Activation Variability Amplification (AVA) mechanism. Variances along the frame ($L_{FR}$), channel ($L_{CH}$), and feature ($L_{FE}$) dimensions are aggregated to determine the total variance $\sigma^2_{AVA}$, which is used as a parameter in the loss function to train the model to boost variance within activations.
  • Figure 3: Adaptive Activation Pruning (AAP): The AAP function is applied sequentially on frames, channels, and features (depicted from top to bottom) to produce a structured sparsity pattern in the activation tensor fed into the subsequent convolutional layer.
  • Figure 4: Overhead of the 3D CNN, AVA modules, and AAP controller
  • Figure 5: Visualization of frame, channel, and feature magnitude distribution before and after AVA
  • ...and 5 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof