Optimization of Layer Skipping and Frequency Scaling for Convolutional Neural Networks under Latency Constraint

Minh David Thao Chan, Ruoyu Zhao, Yukuan Jia, Ruiqing Mao, Sheng Zhou

TL;DR

This work tackles the energy and latency challenges of deploying CNNs on resource-limited devices by jointly applying Proportional Layer Skipping (PLS) and Frequency Scaling (FS). PLS partitions a network into groups and uses a gating mechanism to retain a proportion r of the layers, skipping the rest. FS tunes the processor frequency to balance energy against latency, with the goal of minimizing the Energy-Delay Product (EDP). Experiments on ResNet-152 with CIFAR-10 show substantial energy savings and latency reductions at a modest accuracy cost. The results are validated on both GPU and CPU hardware and examined through ablation and hardware-specific studies. The findings point to practical routes to energy-efficient real-time CNN inference across heterogeneous platforms and clarify the trade-offs among layer skipping, frequency scaling, accuracy, and latency.
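The PLS mechanism described above (a gate per layer that retains a proportion r of the layers and bypasses the rest) can be sketched in plain Python. Everything here is an illustrative assumption rather than the authors' implementation: the helper names `pls_gates` and `gated_residual` are hypothetical, the four ResNet-152 stages stand in for the paper's groups, and r is read as the proportion of layers kept, following the Figure 2 caption.

```python
import math
from typing import Callable, List, Sequence

# Bottleneck-block counts of the four ResNet-152 stages (3 + 8 + 36 + 3 = 50 blocks).
# Treating each stage as one PLS "group" is an illustrative assumption.
RESNET152_STAGES = [3, 8, 36, 3]


def pls_gates(group_sizes: Sequence[int], r: float) -> List[List[int]]:
    """Hypothetical helper: per-block gates (1 = execute, 0 = skip) for keep ratio r.

    Assumptions, not taken from the paper:
      * the first block of each group is always executed, because in ResNet it
        changes the tensor shape and cannot be replaced by an identity;
      * the remaining retained blocks are spread evenly over the rest of the group.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    gates = []
    for n in group_sizes:
        keep = max(1, math.ceil(r * n))  # number of blocks executed in this group
        group = [0] * n
        group[0] = 1                     # shape-changing first block is kept
        extra, slots = keep - 1, n - 1   # place extra kept blocks among positions 1..n-1
        for i in range(extra):
            # slots / extra >= 1, so these indices are distinct and stay in 1..n-1
            group[1 + (i * slots) // extra] = 1
        gates.append(group)
    return gates


def gated_residual(x: float, residual_fn: Callable[[float], float], gate: int) -> float:
    """Residual block with a binary gate: y = x + F(x) if executed, y = x if skipped."""
    return x + residual_fn(x) if gate else x
```

For example, `pls_gates(RESNET152_STAGES, 0.5)` executes 2, 4, 18, and 2 blocks in the four stages, i.e. 26 of the 50 residual blocks. Always keeping the first block of each stage is a design choice that keeps the identity path shape-compatible; a real model would apply the gates to network modules rather than to scalar functions.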

Abstract

The energy consumption of Convolutional Neural Networks (CNNs) is a critical factor in deploying deep learning models on resource-limited equipment such as mobile devices and autonomous vehicles. We propose an approach that combines Proportional Layer Skipping (PLS) and Frequency Scaling (FS). Layer skipping reduces computational complexity by selectively bypassing network layers, whereas frequency scaling adjusts the processor frequency to optimize energy use under latency constraints. Experiments applying PLS and FS to ResNet-152 on the CIFAR-10 dataset demonstrated significant reductions in computational demand and energy consumption with minimal accuracy loss. This study offers practical solutions for improving real-time processing in resource-limited settings and provides insights into balancing computational efficiency and model performance.
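Choosing how many layers to skip and how fast to clock the processor, under a latency constraint and with EDP as the objective, can be expressed in its simplest form as an exhaustive search over profiled (r, frequency) pairs. The sketch below shows that idea only; the `Config` type, the `best_config` helper, the accuracy-loss cap, and every number in `PROFILE` are hypothetical, and the paper's own optimization procedure may differ.

```python
from typing import NamedTuple, Optional, Sequence


class Config(NamedTuple):
    r: float           # PLS keep ratio (proportion of layers retained)
    freq_mhz: int      # processor frequency chosen by FS
    latency_s: float   # measured inference time per frame (s)
    energy_j: float    # measured energy per frame (J)
    acc_loss: float    # accuracy drop vs. the full model (percentage points)


def edp(c: Config) -> float:
    """Energy-Delay Product: energy per frame (J) times latency per frame (s)."""
    return c.energy_j * c.latency_s


def best_config(configs: Sequence[Config], latency_budget_s: float,
                max_acc_loss: float) -> Optional[Config]:
    """Lowest-EDP configuration that meets both constraints, or None if none does."""
    feasible = [c for c in configs
                if c.latency_s <= latency_budget_s and c.acc_loss <= max_acc_loss]
    return min(feasible, key=edp, default=None)


# Hypothetical profiling table, NOT measurements from the paper.
PROFILE = [
    Config(1.0, 900, 0.060, 0.90, 0.0),
    Config(1.0, 1500, 0.038, 1.05, 0.0),
    Config(0.8, 900, 0.049, 0.74, 0.5),
    Config(0.8, 1500, 0.031, 0.86, 0.5),
    Config(0.6, 900, 0.037, 0.57, 2.0),
    Config(0.6, 1500, 0.024, 0.66, 2.0),
]
```

With these made-up numbers, a 40 ms budget and a 1-point accuracy cap select r = 0.8 at 1500 MHz; relaxing the cap to 3 points selects r = 0.6 at 1500 MHz; a 30 ms budget with no accuracy loss allowed has no feasible configuration. Exhaustive search is only practical because the grid of ratios and frequencies is small.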

Paper Structure

This paper contains 11 sections, 6 equations, 7 figures, 1 table, and 2 algorithms.

Figures (7)

  • Figure 1: Architectures of AlexNet (a) and ResNet (b)
  • Figure 2: ResNet Architecture Visualization with Residual Connections and Layer Skipping. The ResNet architecture is shown with residual connections, where green spheres represent the skip connections. Layer skipping is shown with $r = 0.6$, where $r$ is the proportion of layers retained. The path through the convolutional layers (orange) and global average pooling (red) to the output layer (grey) is highlighted.
  • Figure 3: Impact of Layer Skipping and Dataset Size on ResNet-152. (a) Accuracy vs. Parameters: accuracy loss as layer skipping reduces the parameter count. (b) Accuracy vs. Inference Time: the trade-off between faster inference and accuracy. (c) Inference Time vs. Complexity: inference time decreases linearly as complexity is reduced.
  • Figure 4: Impact of Frequency Scaling and Layer Skipping on GPU Performance. (a) Trade-off between Accuracy Loss and Energy per Frame. (b) Trade-off between Accuracy Loss and Inference Time per Frame. (c) Relationship between Energy per Frame and Inference Time per Frame under Frequency Scaling. (d) Energy-Delay Product Analysis across Different Layer Skipping Ratios.
  • Figure 5: CPU Analysis of Frequency Scaling and Layer Skipping Effects. (a) Accuracy Loss vs. Energy per Frame. (b) Accuracy Loss vs. Inference Time per Frame. (c) Energy per Frame vs. Inference Time per Frame across CPU frequencies. (d) Energy per Frame vs. Inference Time per Frame for different layer skipping ratios.
  • ...and 2 more figures
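Figure 3(c) reports an approximately linear relationship between model complexity and inference time. If that relationship holds on a target device, a straight-line fit over a few profiled configurations could estimate the latency of other layer-skipping settings before they are measured. This is an illustrative use, not a procedure described in the paper; the `fit_line` helper and the data points are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical (complexity in GFLOPs, latency in ms) pairs, NOT data from the paper.
gflops = [1.0, 2.0, 3.0, 4.0]
latency_ms = [3.5, 6.0, 8.5, 11.0]
slope, intercept = fit_line(gflops, latency_ms)
predicted = slope * 2.5 + intercept  # estimated latency of a 2.5-GFLOP configuration
```

On these toy points the fit gives a slope of 2.5 ms per GFLOP and an intercept of 1.0 ms, so a 2.5-GFLOP configuration is predicted at 7.25 ms. The intercept absorbs fixed per-frame overhead that does not shrink when layers are skipped.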