E4: Energy-Efficient DNN Inference for Edge Video Analytics Via Early-Exit and DVFS

Ziyang Zhang, Yang Zhao, Ming-Ching Chang, Changyao Lin, Jie Liu

TL;DR

E4 tackles the energy-efficiency bottleneck of DNN inference for edge video analytics by coupling an attention-guided early-exit mechanism with a Just-In-Time (JIT) profiler that uses coordinate descent to co-optimize per-layer CPU/GPU frequencies ahead of exit points. The approach accounts for varying video frame complexity and DNN model diversity, enabling dynamic, frame-aware exit decisions and fine-grained power management. Across five edge devices and two DNN backbones on ActivityNet-v1.3 and Mini-Kinetics, E4 achieves up to 2.8× speedup and 26% average energy savings while maintaining accuracy, with notable memory reductions and stronger gains on higher-end hardware. This work demonstrates that integrating frame-aware early exits with fine-grained DVFS, guided by an efficient JIT profiler, can substantially improve practical edge inference for video analytics.

Abstract

Deep neural network (DNN) models are increasingly popular in edge video analytics applications. However, the compute-intensive nature of DNN models poses challenges for energy-efficient inference on resource-constrained edge devices. Most existing solutions focus on optimizing DNN inference latency and accuracy, often overlooking energy efficiency. They also fail to account for the varying complexity of video frames, leading to sub-optimal performance in edge video analytics. In this paper, we propose an Energy-Efficient Early-Exit (E4) framework that enhances DNN inference efficiency for edge video analytics by integrating a novel early-exit mechanism with dynamic voltage and frequency scaling (DVFS) governors. It employs an attention-based cascade module to analyze video frame diversity and automatically determine optimal DNN exit points. Additionally, E4 features a just-in-time (JIT) profiler that uses coordinate descent search to co-optimize CPU and GPU clock frequencies for each layer before the DNN exit points. Extensive evaluations demonstrate that E4 outperforms current state-of-the-art methods, achieving up to 2.8x speedup and 26% average energy saving while maintaining high accuracy.
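
The coordinate descent search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' profiler: it searches a single CPU/GPU frequency pair rather than per-layer settings, and the frequency grids, the 30 ms latency budget, and the `toy_energy` cost model are all hypothetical stand-ins for measurements that a real profiler would take on the device.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical frequency grids in MHz; real devices expose their own steps.
CPU_FREQS_MHZ = [600, 900, 1200, 1500, 1900]
GPU_FREQS_MHZ = [300, 500, 700, 900, 1100]


def coordinate_descent(
    cpu_freqs: Sequence[int],
    gpu_freqs: Sequence[int],
    cost: Callable[[int, int], float],
    max_rounds: int = 10,
) -> Tuple[int, int, float]:
    """Alternately minimize `cost` along one frequency axis with the other fixed."""
    cpu, gpu = cpu_freqs[-1], gpu_freqs[-1]  # start from the highest frequencies
    best = cost(cpu, gpu)
    for _ in range(max_rounds):
        improved = False
        # Sweep the CPU axis while the GPU frequency is held fixed.
        for f in cpu_freqs:
            c = cost(f, gpu)
            if c < best:
                cpu, best, improved = f, c, True
        # Sweep the GPU axis while the CPU frequency is held fixed.
        for f in gpu_freqs:
            c = cost(cpu, f)
            if c < best:
                gpu, best, improved = f, c, True
        if not improved:  # neither axis can be improved: coordinate-wise minimum
            break
    return cpu, gpu, best


def toy_energy(cpu_mhz: int, gpu_mhz: int) -> float:
    """Hypothetical cost: power times latency, infeasible above a latency budget."""
    latency_ms = 4000.0 / cpu_mhz + 9000.0 / gpu_mhz
    power_w = 2.0 + 1e-6 * cpu_mhz**2 + 2e-6 * gpu_mhz**2
    if latency_ms > 30.0:  # assumed latency budget, not taken from the paper
        return float("inf")
    return power_w * latency_ms


if __name__ == "__main__":
    print(coordinate_descent(CPU_FREQS_MHZ, GPU_FREQS_MHZ, toy_energy))
```

Each round costs one sweep per axis instead of one evaluation per grid cell, which is why the technique suits just-in-time profiling. The price is that the result is only guaranteed to be a coordinate-wise minimum, not the global one.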

Paper Structure

This paper contains 14 sections, 8 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: The impact of CPU and GPU clock frequencies on (a) inference latency (ms) and (b) energy consumption (W), based on running EfficientNet-B0 on an Nvidia Xavier NX edge GPU with 8GB DRAM.
  • Figure 2: Overview of the proposed E4 efficient edge DNN video analytic inference framework. Given a video input, we sample $T$ frames with varying complexities, such as different numbers of detectable objects. The feature extractor processes each frame and aggregates these features to assess video frame complexity. An attention module and its corresponding gate are trained to determine DNN early exit points. The Just-In-Time (JIT) Profiler and DVFS Governor are then employed to search and scale CPU and GPU clock frequencies for each layer before the DNN exit points.
  • Figure 3: Comparison of energy consumption and inference latency for E4 vs. baseline approaches (EENet, zTT, and Ring).
  • Figure 4: Effect of input frame rate vs. inference accuracy on (a,b) ActivityNet-v1.3 and (c,d) Mini-Kinetics datasets.
  • Figure 5: Comparison of memory usage between E4 and other approaches using two DNN models for edge video analysis on the ActivityNet-v1.3 and Mini-Kinetics datasets.
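
The early-exit flow outlined in the Figure 2 caption can be sketched generically as below. This is an assumption-laden illustration, not E4's attention-based gate: it swaps in a plain confidence threshold on the softmax output of each exit head, and the names `early_exit_inference`, `stages`, `heads`, and `threshold` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_inference(
    x: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run stages in order and stop at the first confident exit head.

    Returns (predicted_class, exit_index). Assumes at least one stage and
    one exit head per stage. The confidence gate is a stand-in for a
    learned gate such as the one the paper trains.
    """
    probs: Vector = []
    exit_index = 0
    for exit_index, (stage, head) in enumerate(zip(stages, heads)):
        x = stage(x)               # backbone block up to this exit point
        probs = softmax(head(x))   # lightweight classifier at the exit
        if max(probs) >= threshold:
            break                  # confident enough: skip the remaining stages
    return probs.index(max(probs)), exit_index


if __name__ == "__main__":
    # Toy stages that double the features, with identity exit heads.
    double = lambda v: [2.0 * a for a in v]
    identity = lambda v: list(v)
    print(early_exit_inference([0.5, 0.0], [double] * 4, [identity] * 4))
```

Frames that reach the threshold early skip the remaining stages, which is where the latency and energy savings of early-exit schemes come from.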