Optimization of Layer Skipping and Frequency Scaling for Convolutional Neural Networks under Latency Constraint
Minh David Thao Chan, Ruoyu Zhao, Yukuan Jia, Ruiqing Mao, Sheng Zhou
TL;DR
This work tackles the energy and latency challenges of deploying CNNs on resource-limited devices by jointly applying Proportional Layer Skipping (PLS) and Frequency Scaling (FS). PLS partitions a network into groups and skips a proportion r of layers via a gating mechanism, while FS tunes processor frequency to balance energy and latency, aiming to minimize the Energy-Delay Product (EDP). Experiments on ResNet-152 with CIFAR-10 show substantial energy savings and latency reductions with modest accuracy loss, validated on both GPU and CPU hardware and analyzed through ablation and hardware-specific studies. The findings demonstrate practical pathways for energy-efficient real-time CNN inference across heterogeneous platforms, with clear guidance on the trade-offs between skipping, frequency scaling, accuracy, and latency.
Abstract
The energy consumption of Convolutional Neural Networks (CNNs) is a critical factor in deploying deep learning models on resource-limited platforms such as mobile devices and autonomous vehicles. We propose an approach that combines Proportional Layer Skipping (PLS) and Frequency Scaling (FS). Layer skipping reduces computational complexity by selectively bypassing network layers, whereas frequency scaling adjusts the processor frequency to optimize energy use under latency constraints. Experiments with PLS and FS on ResNet-152 using the CIFAR-10 dataset demonstrate significant reductions in computational demand and energy consumption with minimal accuracy loss. This study offers practical solutions for improving real-time processing in resource-limited settings and provides insights into balancing computational efficiency and model performance.
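To make the interplay between the skipping proportion r, processor frequency, and the latency constraint concrete, the following is a minimal sketch of choosing a frequency that minimizes the Energy-Delay Product (EDP) for a given r. It is not the authors' implementation. The cost model is a toy assumption: work scales with the executed fraction (1 - r), latency is cycles divided by frequency, and power is a static term plus a cubic dynamic term. The names `evaluate` and `best_frequency` and the constants `gcycles_full`, `p_static_w`, and `k_dyn` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    freq_ghz: float
    latency_s: float
    energy_j: float

    @property
    def edp(self) -> float:
        # Energy-Delay Product: energy multiplied by latency.
        return self.energy_j * self.latency_s


def evaluate(freq_ghz, skip_ratio, gcycles_full=3.0, p_static_w=1.0, k_dyn=0.5):
    """Toy cost model (assumed, not measured): returns latency and energy."""
    # Skipping a proportion r of layers leaves (1 - r) of the work.
    gcycles = gcycles_full * (1.0 - skip_ratio)
    latency_s = gcycles / freq_ghz
    # Assumed power model: static power plus dynamic power growing as f^3.
    power_w = p_static_w + k_dyn * freq_ghz ** 3
    return Candidate(freq_ghz, latency_s, power_w * latency_s)


def best_frequency(freqs_ghz, skip_ratio, latency_budget_s):
    """Return the feasible candidate with the lowest EDP, or None."""
    candidates = (evaluate(f, skip_ratio) for f in freqs_ghz)
    feasible = [c for c in candidates if c.latency_s <= latency_budget_s]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.edp)


if __name__ == "__main__":
    freqs = [0.5, 1.0, 1.5, 2.0, 2.5]
    for r in (0.0, 0.5):
        choice = best_frequency(freqs, skip_ratio=r, latency_budget_s=1.3)
        print(r, choice)
```

Under this toy model, skipping half the layers lets a lower frequency meet the same latency budget, which is the kind of trade-off the joint optimization targets. A real deployment would replace `evaluate` with measured latency and energy on the target GPU or CPU, and would also account for the accuracy change at each r.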
