Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations

Motahare Mounesan, Xiaojie Zhang, Saptarshi Debroy

TL;DR

Infer-EDGE addresses the problem of balancing latency, accuracy, and device energy for just-in-time edge-DNN inference. It formulates the optimization as a Markov decision process (MDP) and solves it with an Advantage Actor-Critic (A2C) RL agent that selects both the DNN version and the partition cut-point for each of multiple UAV devices collaborating with a resource-limited edge server. The framework is benchmarked on real DNNs and a hardware testbed, demonstrating latency and energy improvements of up to 77% and 92%, respectively, while preserving accuracy. These results suggest that adaptive, multi-version, partition-aware edge inference can enable reliable, energy-efficient real-time AI in resource-constrained, dynamic environments such as UAV-enabled missions.

Abstract

Balancing mutually conflicting performance metrics, such as end-to-end latency, accuracy, and device energy consumption, is challenging for deep neural network (DNN) inference in Just-in-Time edge environments, which are inherently resource-constrained and loosely coupled. In this paper, we design and develop the Infer-EDGE framework, which seeks to strike such a balance for latency-sensitive video processing applications. First, using comprehensive benchmarking experiments, we develop intuitions about the trade-off characteristics. The framework then uses these intuitions to develop an Advantage Actor-Critic (A2C) Reinforcement Learning (RL) approach that chooses optimal run-time DNN inference parameters, aligning the performance metrics with the application requirements. Using real-world DNNs and a hardware testbed, we evaluate the benefits of the Infer-EDGE framework in terms of device energy savings, inference accuracy improvement, and end-to-end inference latency reduction.
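
To make the decision problem concrete, the following is a minimal Python sketch of a joint action space (DNN version paired with a partition cut-point), a scalar reward that trades accuracy against latency and energy, and the one-step advantage estimate that A2C methods commonly use. The names `Outcome`, `action_space`, `reward`, and `advantage`, the weights, and the normalization references are all illustrative assumptions; the summary above does not state the paper's actual state, action, or reward definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical measurement of one inference under a (version, cut_point) choice."""
    latency_s: float   # end-to-end latency in seconds
    accuracy: float    # inference accuracy in [0, 1]
    energy_j: float    # device energy in joules


def action_space(versions, cut_points):
    """Enumerate joint actions: each DNN version paired with each of its cut-points."""
    return [(v, c) for v in versions for c in cut_points[v]]


def reward(o, w_acc=1.0, w_lat=1.0, w_en=1.0, lat_ref=1.0, en_ref=1.0):
    """Assumed scalar reward: favor accuracy, penalize normalized latency and energy."""
    return (w_acc * o.accuracy
            - w_lat * (o.latency_s / lat_ref)
            - w_en * (o.energy_j / en_ref))


def advantage(r, value_s, value_next, gamma=0.99):
    """One-step advantage estimate: A = r + gamma * V(s') - V(s)."""
    return r + gamma * value_next - value_s


# Example with made-up numbers: two VGG versions, each with candidate cut-points.
actions = action_space(["vgg11", "vgg19"], {"vgg11": [0, 3], "vgg19": [0, 5, 8]})
r = reward(Outcome(latency_s=0.5, accuracy=0.9, energy_j=2.0), en_ref=4.0)
a = advantage(r, value_s=0.5, value_next=1.0, gamma=0.9)
```

In a design like this, the weights would encode the application requirements: a latency-critical mission would raise `w_lat`, while a battery-constrained UAV would raise `w_en`.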

Paper Structure

This paper contains 25 sections, 8 equations, 11 figures, 6 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Layer-wise latency, output data size, and energy consumption comparisons of different versions of VGG
  • Figure 2: End-to-End latency comparison of different versions of VGG model
  • Figure 3: Energy consumption comparison of different versions of VGG model
  • Figure 4: 'Just-in-time' edge-AI system model for Infer-EDGE framework
  • Figure 5: Infer-EDGE framework with centralized controller implementing the proposed A2C algorithm
  • ...and 6 more figures