Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations

Motahare Mounesan; Xiaojie Zhang; Saptarshi Debroy

TL;DR

Infer-EDGE addresses the problem of balancing latency, accuracy, and device energy for just-in-time edge DNN inference. It formulates the optimization as a Markov decision process (MDP) and solves it with an Advantage Actor-Critic (A2C) RL agent that selects both DNN versions and partition cut-points for multiple UAV devices collaborating with a resource-limited edge server. The framework is benchmarked on real DNNs and a hardware testbed, demonstrating latency and energy improvements of up to 77% and 92%, respectively, while preserving accuracy. These results suggest that adaptive, multi-version, partition-aware edge inference can enable reliable, energy-efficient real-time AI in resource-constrained, dynamic environments such as UAV-enabled missions.

Abstract

Balancing mutually diverging performance metrics, such as end-to-end latency, accuracy, and device energy consumption, is a challenging undertaking for deep neural network (DNN) inference in just-in-time edge environments that are inherently resource-constrained and loosely coupled. In this paper, we design and develop the Infer-EDGE framework, which seeks to strike such a balance for latency-sensitive video processing applications. First, using comprehensive benchmarking experiments, we develop intuitions about the trade-off characteristics. The framework then uses these intuitions to build an Advantage Actor-Critic (A2C) Reinforcement Learning (RL) approach that chooses optimal run-time DNN inference parameters, aligning the performance metrics with the application requirements. Using real-world DNNs and a hardware testbed, we evaluate the benefits of the Infer-EDGE framework in terms of device energy savings, inference accuracy improvement, and end-to-end inference latency reduction.
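
To make the decision problem concrete, the following is a minimal Python sketch of how an action space over (DNN version, cut-point) pairs, a weighted reward, and a one-step advantage estimate might look. It is not the formulation used in the paper: the names `Outcome`, `action_space`, `reward`, and `advantage`, the weights, and the normalization bounds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Measured result of one inference step (hypothetical fields)."""
    latency_s: float   # end-to-end latency in seconds
    accuracy: float    # inference accuracy in [0, 1]
    energy_j: float    # device energy in joules


def action_space(versions, cut_points):
    """Enumerate every (version, cut-point) pair the agent could pick.

    `cut_points` maps each version name to its valid partition layers.
    """
    return [(v, c) for v in versions for c in cut_points[v]]


def reward(o, w_lat=1.0, w_acc=1.0, w_en=1.0,
           max_latency_s=1.0, max_energy_j=10.0):
    """Assumed reward: reward accuracy, penalize normalized latency and energy.

    The weights and normalization bounds are illustrative assumptions.
    """
    lat = min(o.latency_s / max_latency_s, 1.0)
    en = min(o.energy_j / max_energy_j, 1.0)
    return w_acc * o.accuracy - w_lat * lat - w_en * en


def advantage(r, value_s, value_next, gamma=0.99):
    """One-step advantage estimate used by actor-critic methods:
    A(s, a) = r + gamma * V(s') - V(s)."""
    return r + gamma * value_next - value_s


actions = action_space(["small", "large"],
                       {"small": [0, 1], "large": [0, 1, 2]})
r = reward(Outcome(latency_s=0.5, accuracy=0.8, energy_j=5.0))
adv = advantage(r, value_s=0.1, value_next=0.0)
```

In an actual A2C agent, the critic would supply the value estimates and the actor would be updated in the direction of the advantage. Changing the weights shifts the trade-off toward whichever metric the application prioritizes.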
