DynaSplit: A Hardware-Software Co-Design Framework for Energy-Aware Inference on Edge

Daniel May, Alessandro Tundo, Shashikant Ilager, Ivona Brandic

TL;DR

DynaSplit is a two-phase framework that dynamically configures both software and hardware parameters. It reduces energy consumption by up to 72% compared to cloud-only computation while meeting the latency threshold of ~90% of user requests.

Abstract

The deployment of ML models on edge devices is challenged by limited computational resources and energy availability. While split computing enables the decomposition of large neural networks (NNs) and allows partial computation on both edge and cloud devices, identifying the most suitable split layer and hardware configurations is a non-trivial task. This process is hindered by the large configuration space, the non-linear dependencies between software and hardware parameters, the heterogeneous hardware and energy characteristics, and the dynamic workload conditions. To overcome this challenge, we propose DynaSplit, a two-phase framework that dynamically configures parameters across both software (i.e., split layer) and hardware (e.g., accelerator usage, CPU frequency). During the Offline Phase, we solve a multi-objective optimization problem with a meta-heuristic approach to discover optimal settings. During the Online Phase, a scheduling algorithm identifies the most suitable settings for an incoming inference request and configures the system accordingly. We evaluate DynaSplit using popular pre-trained NNs on a real-world testbed. Experimental results show a reduction in energy consumption of up to 72% compared to cloud-only computation, while meeting the latency threshold of ~90% of user requests.
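
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Config` fields, the function names, and all numbers are hypothetical. The Offline Phase is replaced here by an exhaustive non-dominated filter over already-measured configurations, whereas the paper uses a meta-heuristic search; only latency and energy are treated as objectives. The Online Phase rule (lowest energy among configurations that meet the request's latency threshold, otherwise the fastest one) is an assumed policy for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    split_layer: int       # assumed meaning: layers before this index run on the edge
    use_accelerator: bool  # hardware parameter: accelerator on or off
    cpu_freq_mhz: int      # hardware parameter: CPU frequency
    latency_ms: float      # latency measured offline for this configuration
    energy_j: float        # energy measured offline for this configuration


def pareto_front(configs):
    """Offline stand-in: keep configurations not dominated in (latency, energy)."""
    front = []
    for c in configs:
        dominated = any(
            o.latency_ms <= c.latency_ms
            and o.energy_j <= c.energy_j
            and (o.latency_ms < c.latency_ms or o.energy_j < c.energy_j)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front


def select_config(front, latency_threshold_ms):
    """Online stand-in: lowest-energy feasible configuration, else the fastest."""
    feasible = [c for c in front if c.latency_ms <= latency_threshold_ms]
    if feasible:
        return min(feasible, key=lambda c: c.energy_j)
    # No configuration meets the threshold: fall back to the lowest latency.
    return min(front, key=lambda c: c.latency_ms)


# Made-up measurements, for illustration only.
measured = [
    Config(0, False, 1800, 120.0, 9.0),
    Config(10, True, 1800, 80.0, 5.0),
    Config(22, True, 1200, 200.0, 2.5),
    Config(22, False, 600, 400.0, 4.0),
]
front = pareto_front(measured)
chosen = select_config(front, latency_threshold_ms=250.0)
```

With these made-up measurements, a request that tolerates 250 ms is served by the slower but more frugal configuration, while a tighter threshold pushes the choice toward the faster, more energy-hungry one.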

Paper Structure

This paper contains 40 sections, 3 equations, 15 figures, 2 tables, 1 algorithm.

Figures (15)

  • Figure 1: The illustration depicts our motivational scenario in which users send inference requests to an edge application capable of splitting inference between edge and cloud.
  • Figure 2: Impact of different configuration parameters on inference latency, energy consumption, and accuracy for the VGG16 network. The reported results are averaged over 1,000 inferences.
  • Figure 3: An overview of the DynaSplit framework.
  • Figure 4: The testbed used to run the empirical evaluation.
  • Figure 5: Inference time request distributions for VGG16 and ViT networks.
  • ...and 10 more figures