Edge AI as a Service with Coordinated Deep Neural Networks

Alireza Maleki; Hamed Shah-Mansouri; Babak H. Khalaj

TL;DR

The paper addresses the challenge of scalable AI as a Service (AIaaS) at the edge by enabling coordination among distributed DNN services. It introduces CoDE, which partitions models into blocks, freezes their parameters, and adds learnable cross- and skip-connections to form new inference paths that can use host models without sharing full parameters. Path selection is guided by a reward $F(p)=\xi(A^{av}_p)\,\zeta(Th_p)$, where $A^{av}_p$ is the average accuracy of path $p$ and $Th_p=Th_p^l+Th_p^h$ is its total throughput, the sum of the local and host throughput. A multi-stage optimization predicts path performance and trains the most promising path, reducing search complexity from $O(2^{|\mathcal{P}|})$. Experiments with AlexNet and MobileNet show up to a $40\%$ throughput increase with about a $2.3\%$ drop in accuracy, outperforming the edge-early method and demonstrating effective cross-architecture cooperation. The approach preserves model privacy and avoids replication, offering practical benefits for scalable edge AIaaS with heterogeneous DNNs.
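To make the reward concrete, the following is a minimal sketch of reward-based path selection under stated assumptions. The summary does not specify the functions $\xi$ and $\zeta$, so the sketch defaults both to the identity; the `Path` class, the `reward` and `best_path` helpers, and all numbers are hypothetical illustrations and are not taken from the paper's experiments.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Path:
    """A candidate inference path p (hypothetical container)."""
    name: str
    avg_accuracy: float      # A^{av}_p, as a fraction in [0, 1]
    local_throughput: float  # Th_p^l, e.g. samples per second
    host_throughput: float   # Th_p^h, e.g. samples per second

    @property
    def throughput(self) -> float:
        # Th_p = Th_p^l + Th_p^h
        return self.local_throughput + self.host_throughput


def reward(
    path: Path,
    xi: Callable[[float], float] = lambda a: a,    # assumption: identity
    zeta: Callable[[float], float] = lambda t: t,  # assumption: identity
) -> float:
    """F(p) = xi(A^{av}_p) * zeta(Th_p)."""
    return xi(path.avg_accuracy) * zeta(path.throughput)


def best_path(paths: Iterable[Path]) -> Path:
    """Return the candidate with the highest reward."""
    return max(paths, key=reward)


# Made-up candidates, for illustration only.
candidates = [
    Path("local_only", avg_accuracy=0.90, local_throughput=100.0, host_throughput=0.0),
    Path("via_host", avg_accuracy=0.88, local_throughput=100.0, host_throughput=40.0),
]
chosen = best_path(candidates)
print(chosen.name, reward(chosen))
```

With identity weightings, a path that gives up a little accuracy for a large throughput gain scores higher. Other choices of $\xi$ and $\zeta$ would shift that trade-off, which is why they are left as parameters here.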

Abstract

As artificial intelligence (AI) applications continue to expand in next-generation networks, there is a growing need for deep neural network (DNN) models. Although DNN models deployed at the edge are promising for providing AI as a service with low latency, their cooperation is yet to be explored. In this paper, we consider that DNN service providers share their computing resources as well as their models' parameters and allow other DNNs to offload their computations without mirroring. We propose a novel algorithm called coordinated DNNs on edge (CoDE) that facilitates coordination among DNN services by establishing new inference paths. CoDE aims to find the optimal path, which is the path with the highest possible reward, by creating multi-task DNNs from individual models. The reward reflects the inference throughput and model accuracy. With CoDE, DNN models can make new paths for inference by using their own or other models' parameters. We then evaluate the performance of CoDE through numerical experiments. The results demonstrate a $40\%$ increase in the inference throughput while degrading the average accuracy by only $2.3\%$. Experiments show that CoDE enhances the inference throughput and achieves higher precision than an existing state-of-the-art method.

Paper Structure (12 sections, 5 equations, 8 figures, 1 algorithm)

This paper contains 12 sections, 5 equations, 8 figures, 1 algorithm.

Introduction
Related Work
Motivation and Contributions
Coordinated DNN on Edge (CoDE)
System Model
Linking Blocks
Experiments
Experiment 1: AlexNet - AlexNet
Experiment 2: AlexNet - Skip-connection
Experiment 3: AlexNet - MobileNet
Experiment 4: Selecting paths
Conclusion

Figures (8)

Figure 1: An SP provides one or multiple DNN services on its server, where each service offers one DNN application (i.e., model).
Figure 2: (a) We consider that SP1 provides its DNN model (i.e., APP1) on server 1. We divide it into a number of manageable blocks. By freezing the model's parameters, SP1 can keep its model integrity through any further training. (b) In this scenario, SP1 and SP2 provide their services on server 1 and server 2, respectively. SP1 aims to offload its tasks to server 2, and SP2 generates the relative links (i.e., small NN modules) between its blocks. SP1 does not add any links unless it uses skip-connections.
Figure 3: The host service reserves $s$ samples of its batch. The total throughput is the sum of the local and host throughput (i.e., $Th_p =Th_p^l + Th_p^h$).
Figure 4: A sample of a partitioned AlexNet model with $N=6$.
Figure 5: (a) The accuracy of generated paths. We set $lout_p=0$ and $hin_p=1$. The local model is AlexNet, optimized for CIFAR-10, with an accuracy of $86.7\%$. The host model is also AlexNet, optimized for ImageNet and Food-101. We also measure the performance of a random model to compare with pre-trained models. (b) The number of parameters when we add a new path. The number of parameters associated with the links is relatively low, but the numbers of local-skipped and host-added parameters (the sum of the links' and host's parameters) are higher and change according to the paths. (c) These DNN models show the related structure for each connection.
...and 3 more figures
