Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing

Rui Li; Tao Ouyang; Liekang Zeng; Guocheng Liao; Zhi Zhou; Xu Chen

TL;DR

This work tackles joint workload allocation and routing (JOWR) in Collaborative Edge Computing when task utilities are unknown. It casts the problem in a Network Utility Maximization (NUM) framework and develops a cross-layer online optimization stack: a gradient-sampling-based outer loop for workload allocation (GS-OMA), a distributed online mirror descent inner loop for routing (OMD-RT), and a faster single-loop variant (OMAD). The authors prove concavity of the outer problem and convexity of the routing subproblem, and they provide convergence guarantees with explicit rates. Extensive simulations across realistic edge topologies show faster convergence and lower overhead than the baselines. The proposed online framework enables scalable, distributed control that adapts to unknown utilities, improving DNN inference efficiency and resource utilization in dynamic edge environments.
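The idea of learning an allocation when the utility is only observable, not known in closed form, can be illustrated with a sampled gradient. The sketch below is a minimal illustration, assuming a central-difference estimator; the summary does not state the paper's exact sampling scheme, and `sampled_gradient`, `toy_utility`, and the step size `delta` are hypothetical names chosen for this example.

```python
import math

def sampled_gradient(utility, rates, delta=1e-3):
    """Central-difference estimate of the gradient of a black-box utility.

    Only function evaluations of `utility` are used, which mimics the
    setting where the utility function is unknown a priori.
    """
    grad = []
    for w in range(len(rates)):
        up = list(rates)
        down = list(rates)
        up[w] += delta      # probe slightly above the current rate
        down[w] -= delta    # probe slightly below the current rate
        grad.append((utility(up) - utility(down)) / (2.0 * delta))
    return grad

def toy_utility(rates):
    # Hypothetical concave utility, used only for illustration.
    return sum(math.log1p(r) for r in rates)

estimate = sampled_gradient(toy_utility, [1.0, 3.0])
# Closed-form gradient of the toy utility, 1 / (1 + rate), for comparison.
exact = [1.0 / (1.0 + 1.0), 1.0 / (1.0 + 3.0)]
```

For this toy utility the estimate agrees closely with the closed-form gradient, which is what an outer allocation loop would need in order to take an ascent step.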

Abstract

Collaborative Edge Computing (CEC) is an emerging paradigm that pools heterogeneous edge devices into a shared resource to compute DNN inference tasks in proximity, as in edge video analytics. Existing works treat workload routing among edge devices as the key knob for improving network utility in CEC and mainly aim to minimize the routing cost, leaving joint workload allocation and routing optimization from a system perspective an open question. To this end, this paper presents a holistic, learned optimization framework for CEC that maximizes the total network utility in an online manner, even though the utility functions of task input rates are unknown a priori. In particular, we characterize the CEC system with a flow model and formulate an online learning problem in the form of a cross-layer optimization. We propose a nested-loop algorithm that solves workload allocation and distributed routing iteratively, using the tools of gradient sampling and online mirror descent. To improve the convergence rate over the nested-loop version, we further devise a single-loop algorithm. Rigorous analysis establishes the problem's inherent convexity and the algorithms' efficient convergence and optimality. Finally, extensive numerical simulations demonstrate the superior performance of our solutions.
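To make the mirror descent ingredient concrete, the following is a minimal sketch of one common instance: mirror descent with a negative-entropy mirror map, which keeps routing fractions on the probability simplex through a multiplicative update. It is not the paper's OMD-RT algorithm. The two-link setup, the link cost $F/(c-F)$, and the names `omd_simplex_step`, `marginal_costs`, `RATE`, and `CAPACITIES` are all assumptions made for illustration.

```python
import math

def omd_simplex_step(phi, grad, eta):
    """One entropic mirror descent step on the probability simplex."""
    # Multiplicative update followed by normalization keeps phi a
    # valid vector of routing fractions (non-negative, summing to one).
    weights = [p * math.exp(-eta * g) for p, g in zip(phi, grad)]
    total = sum(weights)
    return [w / total for w in weights]

def marginal_costs(phi, rate, capacities):
    # Assumed link cost F / (c - F) with flow F = rate * phi_j.
    # Its derivative with respect to phi_j is rate * c / (c - F)^2.
    return [rate * c / (c - rate * p) ** 2 for p, c in zip(phi, capacities)]

RATE = 2.0                # hypothetical session rate
CAPACITIES = (3.0, 5.0)   # hypothetical capacities of two outgoing links

phi = [0.5, 0.5]          # start from an even split
for _ in range(300):
    grad = marginal_costs(phi, RATE, CAPACITIES)
    phi = omd_simplex_step(phi, grad, eta=0.5)

final_marginals = marginal_costs(phi, RATE, CAPACITIES)
```

In this toy setting the iterates shift traffic toward the higher-capacity link until the marginal costs of the two links are nearly equal, which is the usual optimality condition for this kind of convex routing problem.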

Paper Structure (21 sections, 5 theorems, 71 equations, 15 figures, 2 tables, 3 algorithms)

Introduction
Problem Formulation
Network and Computation Models
Inference Task Utility Model
Traffic Model
Network Cost Model
Optimization Problem
Cross-layer Online Optimization Algorithms
Optimal Workload Allocation Algorithm
Optimal Distributed Routing Algorithm
Improving Convergence Rate for JOWR
Performance Evaluation
Related Work
Conclusion
Appendix
...and 6 more sections

Key Result

Theorem 1

If Assumption 4 holds, problem $\mathcal{P}1$ has a unique solution $\Lambda^{*}$. The necessary and sufficient condition of optimality is $\frac{\partial U}{\partial \lambda_1^{*}} = \cdots = \frac{\partial U}{\partial \lambda_w^{*}} = \cdots = \frac{\partial U}{\partial \lambda_W^{*}} = \alpha^{*}$, where $\alpha^{*}$ …
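
As a worked illustration of the equal-marginal-utility condition, consider a toy instance that is not taken from the paper: assume the allocation must satisfy $\sum_{w} \lambda_w = \lambda$ and take the hypothetical utility $U(\Lambda) = \sum_{w} a_w \log(1 + \lambda_w)$ with $W = 2$, $a = (1, 2)$, and $\lambda = 3$. Setting every marginal utility to a common value $\alpha$ gives

$$
\frac{a_w}{1 + \lambda_w} = \alpha
\;\Rightarrow\;
\lambda_w = \frac{a_w}{\alpha} - 1,
\qquad
\sum_{w} \lambda_w = \lambda
\;\Rightarrow\;
\alpha = \frac{\sum_{w} a_w}{\lambda + W} = \frac{3}{5}.
$$

Hence $\lambda_1 = \frac{1}{0.6} - 1 = \frac{2}{3}$ and $\lambda_2 = \frac{2}{0.6} - 1 = \frac{7}{3}$, which sum to $3$, and both marginal utilities equal $0.6$. Under these assumptions, the session with the larger weight receives the larger share, up to the point where shifting any rate between sessions would no longer raise the total utility.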

Figures (15)

Figure 1: Illustration of CEC system topology. The edge devices are interconnected via LANs, and each device deploys a given version of a DNN model to perform a specific computing task. These devices can pool their computation and communication resources to serve a specific application and maximize total network utility.
Figure 2: Sessions 1, 2, and 3, with rates $\lambda_1, \lambda_2, \lambda_3$, all originate at the virtual source node $S$ and are destined for virtual nodes $D_1, D_2, D_3$, respectively. Node $i$ routes session 2 to $D_2$ through $j$ and $k$, and routes session 3 to $D_3$ through $j$, $k$, and $l$. The total flow rate on link $(i,j)$ is the sum of the flow rates of all sessions passing through it. Finally, the total incoming rate at each virtual node $D_w$ must equal the session rate $\lambda_w$.
Figure 3: Abilene Topology.
Figure 4: Tree Topology.
Figure 5: Fog Topology.
...and 10 more figures

Theorems & Definitions (9)

Theorem 1
Remark 1
Remark 2
Theorem 2
Theorem 3
Remark 3
Remark 4
Theorem 4
Theorem 5
