Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing
Rui Li, Tao Ouyang, Liekang Zeng, Guocheng Liao, Zhi Zhou, Xu Chen
TL;DR
This work tackles joint workload allocation and routing (JOWR) in Collaborative Edge Computing when task utilities are unknown. It formulates the problem as a Network Utility Maximization (NUM) problem and develops a cross-layer online optimization stack: a gradient-sampling based outer loop for workload allocation ($GS$-OMA), a distributed online mirror descent inner loop for routing (OMD-RT), and a faster single-loop variant (OMAD). The authors prove concavity of the outer problem and convexity of the routing subproblem, and provide convergence guarantees with explicit rates. Extensive simulations across realistic edge topologies and scenarios show faster convergence and lower overhead than baselines. The proposed online framework enables scalable, distributed control that adapts to unknown utilities, improving DNN inference efficiency and resource utilization in dynamic edge environments.
Abstract
Collaborative Edge Computing (CEC) is an emerging paradigm that pools heterogeneous edge devices as a shared resource to compute DNN inference tasks in proximity, such as edge video analytics. Existing works, however, mainly focus on workload routing strategies among edge devices, a key knob for improving network utility in CEC, with the aim of minimizing routing cost; joint workload allocation and routing optimization from a system perspective remains an open question. To this end, this paper presents a holistic, learning-based optimization for CEC that maximizes the total network utility in an online manner, even though the utility functions of task input rates are unknown a priori. In particular, we characterize the CEC system with a flow model and formulate an online learning problem in the form of a cross-layer optimization. We propose a nested-loop algorithm that solves workload allocation and distributed routing iteratively, using the tools of gradient sampling and online mirror descent. To improve the convergence rate over the nested-loop version, we further devise a single-loop algorithm. Rigorous analysis establishes the inherent convexity of the problem, as well as the efficient convergence and optimality of the algorithms. Finally, extensive numerical simulations demonstrate the superior performance of our solutions.
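For readers unfamiliar with the two tools named above, the following is a minimal generic sketch, not the paper's GS-OMA or OMD-RT. It assumes an entropic mirror map (so the mirror descent step becomes a multiplicative update on routing fractions) and a two-point random-direction estimator as one possible form of gradient sampling. The function names, the linear path costs, and the quadratic stand-in utility are all hypothetical choices made for illustration.

```python
import numpy as np

def omd_simplex_step(phi, grad, eta):
    """One entropic mirror descent step that keeps phi on the probability simplex."""
    # Shifting by grad.min() leaves the normalized result unchanged and avoids overflow.
    w = phi * np.exp(-eta * (grad - grad.min()))
    return w / w.sum()

def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations only."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)  # random unit direction
    slope = (f(x + delta * u) - f(x - delta * u)) / (2 * delta)
    return x.size * slope * u

# Routing sketch: split one flow over three paths with hypothetical linear costs.
costs = np.array([3.0, 1.0, 2.0])
phi = np.ones(3) / 3  # start from an even split
for _ in range(100):
    phi = omd_simplex_step(phi, costs, eta=0.5)  # gradient of costs @ phi is costs

# Allocation sketch: ascend a utility that is only observable through its values.
target = np.array([2.0, 1.0])  # hypothetical maximizer, unknown to the learner

def utility(x):
    return -np.sum((x - target) ** 2)  # concave stand-in utility

rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(500):
    x = x + 0.05 * two_point_gradient(utility, x, delta=1e-2, rng=rng)

print("routing fractions:", phi)
print("allocated rates:", x)
```

In this sketch the routing fractions concentrate on the cheapest path and the allocation approaches the maximizer of the stand-in utility. The paper's algorithms additionally couple the two updates, since the allocation determines the flow that the routing layer must carry.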
