Joint Optimization of Offloading, Batching and DVFS for Multiuser Co-Inference
Yaodan Xu, Sheng Zhou, Zhisheng Niu
TL;DR
This work addresses energy-efficient multi-user DNN inference at the edge by jointly optimizing device DVFS, offloading decisions, and edge batching. The authors introduce J-DOB, a low-complexity algorithm that enforces identical offloading points and greedy batching within each group of users, and uses GPU frequency scaling to manage edge computation. The problem is decomposed into an outer step that groups users by deadline and an inner optimization that efficiently derives per-user device frequencies and edge-batching decisions, yielding near-optimal energy consumption under hard latency constraints. Experiments on MobileNetV2 show energy savings of up to 51.30% under identical deadlines and up to 45.27% under different deadlines, compared with local computing.
Abstract
With the growing integration of artificial intelligence in mobile applications, a substantial number of deep neural network (DNN) inference requests are generated daily by mobile devices. Serving these requests presents significant challenges due to limited device resources and strict latency requirements. Therefore, edge-device co-inference has emerged as an effective paradigm to address these issues. In this study, we focus on a scenario where multiple mobile devices offload inference tasks to an edge server equipped with a graphics processing unit (GPU). For finer control over offloading and scheduling, inference tasks are partitioned into smaller sub-tasks. Additionally, GPU batch processing is employed to boost throughput and improve energy efficiency. This work investigates the problem of minimizing total energy consumption while meeting hard latency constraints. We propose a low-complexity Joint DVFS, Offloading, and Batching strategy (J-DOB) to solve this problem. The effectiveness of the proposed algorithm is validated through extensive experiments across varying user numbers and deadline constraints. Results show that J-DOB can reduce energy consumption by up to 51.30% and 45.27% under identical and different deadlines, respectively, compared to local computing.
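To illustrate why DVFS can reduce energy under a hard latency constraint, the following minimal Python sketch assumes the widely used CMOS model in which dynamic energy per cycle grows with the square of the clock frequency. The energy model, the constant `kappa`, and both function names are assumptions made for illustration; they are not taken from the paper, and the sketch is not the J-DOB algorithm. It covers only a single device computing locally.

```python
from typing import Optional


def lowest_feasible_frequency(
    cycles: float, deadline_s: float, f_min_hz: float, f_max_hz: float
) -> Optional[float]:
    """Return the lowest clock frequency that finishes `cycles` within the deadline.

    Returns None when even f_max_hz cannot meet the deadline.
    """
    if deadline_s <= 0:
        return None
    needed_hz = cycles / deadline_s  # latency = cycles / frequency
    if needed_hz > f_max_hz:
        return None
    # The hardware cannot run below f_min_hz, so clamp from below.
    return max(needed_hz, f_min_hz)


def dynamic_energy_j(cycles: float, freq_hz: float, kappa: float) -> float:
    """Assumed model: energy = kappa * cycles * frequency^2 (hypothetical kappa)."""
    return kappa * cycles * freq_hz ** 2


if __name__ == "__main__":
    cycles, kappa = 1e9, 1e-27  # illustrative values only
    f = lowest_feasible_frequency(cycles, deadline_s=1.0, f_min_hz=0.5e9, f_max_hz=2e9)
    print("chosen frequency (Hz):", f)
    print("energy at chosen frequency (J):", dynamic_energy_j(cycles, f, kappa))
    print("energy at maximum frequency (J):", dynamic_energy_j(cycles, 2e9, kappa))
```

Under this assumed model, energy increases monotonically with frequency, so running at the lowest frequency that still meets the deadline minimizes device energy. A joint scheme has more to balance, because time spent on offloading and edge batching shrinks the latency budget left for local computation.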
