Joint Optimization of Offloading, Batching and DVFS for Multiuser Co-Inference

Yaodan Xu, Sheng Zhou, Zhisheng Niu

TL;DR

This work tackles energy-efficient multi-user DNN inference in an edge environment by jointly optimizing device DVFS, offloading decisions, and edge batching. The authors introduce J-DOB, a low-complexity algorithm that enforces identical offloading points and greedy batching within groups of users, while using GPU frequency scaling to manage edge computation. The approach decomposes the problem into an outer grouping by deadlines and an inner optimization that yields near-optimal energy consumption under hard latency constraints, with per-user device frequencies and edge-batching decisions derived efficiently. Experiments on MobileNetV2 show energy savings relative to local computing of up to 51.30% under identical deadlines and up to 45.27% under different deadlines, underscoring the practical impact for energy-constrained edge deployments.

Abstract

With the growing integration of artificial intelligence in mobile applications, a substantial number of deep neural network (DNN) inference requests are generated daily by mobile devices. Serving these requests presents significant challenges due to limited device resources and strict latency requirements. Therefore, edge-device co-inference has emerged as an effective paradigm to address these issues. In this study, we focus on a scenario where multiple mobile devices offload inference tasks to an edge server equipped with a graphics processing unit (GPU). For finer control over offloading and scheduling, inference tasks are partitioned into smaller sub-tasks. Additionally, GPU batch processing is employed to boost throughput and improve energy efficiency. This work investigates the problem of minimizing total energy consumption while meeting hard latency constraints. We propose a low-complexity Joint DVFS, Offloading, and Batching strategy (J-DOB) to solve this problem. The effectiveness of the proposed algorithm is validated through extensive experiments across varying user numbers and deadline constraints. Results show that J-DOB can reduce energy consumption by up to 51.30% and 45.27% under identical and different deadlines, respectively, compared to local computing.

Paper Structure

This paper contains 13 sections, 11 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: A depiction of edge-device co-inference. The DNN inference task consists of a sequence of sub-tasks. Device 1 offloads sub-tasks 2 and 3 to the edge, while device 2 offloads sub-task 3. The edge server processes sub-task 2 with a batch size of one and sub-task 3 with a batch size of two.
  • Figure 2: The architecture and partition points of MobileNetV2 used in the experiments. Conv, B, and CLS abbreviate the convolution, bottleneck, and classification modules, respectively. The architecture of the seventh bottleneck module of MobileNetV2 is illustrated; details of the other modules can be found in the original MobileNetV2 paper (Sandler et al., 2018). The shape of each sub-task's output data is also shown.
  • Figure 3: Profiling results for inference latency and energy consumption of MobileNetV2 w.r.t. batch size on an NVIDIA RTX 3090.
  • Figure 4: Average energy consumption per user vs. the number of users under identical deadlines. Panels (a) and (b) show the results for two different identical deadline values, corresponding to $\beta=2.13$ and $\beta=30.25$, respectively.
  • Figure 5: Average energy consumption per user vs. the range of $\beta$ (corresponding to the range of deadlines). Panels (a) and (b) show the results under $M=10$ and $M=20$, respectively.
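
The batching scenario described in the Figure 1 caption can be reproduced with a short sketch. It is illustrative only and is not the J-DOB algorithm: the function name `edge_batch_sizes`, the device labels, and the choice of exactly three sub-tasks are assumptions made for this example. It also assumes that once a device offloads a sub-task, all later sub-tasks run at the edge, and that every request reaching the same sub-task is batched together.

```python
from collections import Counter


def edge_batch_sizes(first_offloaded, num_subtasks):
    """Return {sub-task index: batch size at the edge}.

    first_offloaded maps a (hypothetical) device label to the first
    sub-task it offloads, 1-indexed. Assumption: every sub-task from
    that point onward is executed at the edge for that device.
    """
    sizes = Counter()
    for start in first_offloaded.values():
        for k in range(start, num_subtasks + 1):
            sizes[k] += 1  # one more request joins the batch for sub-task k
    return {k: sizes[k] for k in range(1, num_subtasks + 1)}


# Figure 1 scenario (three sub-tasks assumed): device 1 offloads from
# sub-task 2 onward, device 2 offloads only sub-task 3.
batches = edge_batch_sizes({"device1": 2, "device2": 3}, num_subtasks=3)
print(batches)  # {1: 0, 2: 1, 3: 2}
```

The result matches the caption: sub-task 2 is processed with a batch size of one and sub-task 3 with a batch size of two, while sub-task 1 stays on the devices.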