Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC)

Seyed Nima Omidsajedi; Rekha Reddy; Jianming Yi; Jan Herbst; Christoph Lipps; Hans Dieter Schotten

TL;DR

This work tackles latency-sensitive AI at the mobile edge by implementing a DNN accelerator on an embedded FPGA MPSoC (ZCU102) using the Xilinx DPU IP and a step-by-step edge workflow (quantization, xmodel generation, and DMA-mediated data transfer) to run ResNet50 on-device. It provides a detailed edge-versus-cloud evaluation, showing that on-device edge inference achieves substantially lower latency and better energy efficiency than cloud GPU inference, while cloud setups deliver higher raw throughput at much higher power. The study demonstrates the feasibility and benefits of edge AI on MPSoC-FPGAs for real-time applications, and suggests future directions including multi-DNN co-implementation on MPSoCs and the use of newer devices with dedicated AI engines, such as Versal. The findings support edge-centric architectures for latency-constrained scenarios in 6G, autonomous systems, and other bandwidth-constrained environments, and can inform future hardware-software co-design for AI at the edge.
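The quantization step named in the workflow can be illustrated with a minimal sketch. This is generic symmetric 8-bit post-training quantization in plain Python, assumed here for illustration only; it is not the Xilinx toolchain's quantizer, and the function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit codes.

    Illustrative only: a generic scheme, not the vendor tool's algorithm.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any positive scale is valid.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The rounding error of each value is bounded by half the scale, which is why 8-bit inference can stay close to the floating-point model while using much cheaper integer arithmetic on the accelerator.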

Abstract

In almost every computation-heavy application, from 6G communication systems to autonomous driving platforms, a large portion of the computing should take place close to the client. Edge computing (AI at the Edge) on mobile devices is one approach to meeting this requirement. This work therefore examines the possibilities and challenges of implementing a low-latency, power-optimized smart mobile system. Using Field Programmable Gate Array (FPGA) based solutions at the edge leads to bandwidth-optimized designs and can consequently improve computational efficiency within system-level deadlines. Moreover, various performance aspects and the implementation feasibility of Neural Networks (NNs) on both embedded FPGA edge devices (using a Xilinx Multiprocessor System on Chip (MPSoC)) and the Cloud are discussed. The main goal of this work is to demonstrate a hybrid system that uses the deep learning programmable engine developed by Xilinx Inc. as the main component of the hardware accelerator. Based on this design, an efficient system for mobile edge computing is then presented using an embedded solution.
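The trade-off between latency, power, and throughput discussed here can be made concrete with a small sketch. It assumes a simplified model in which end-to-end latency is compute time plus data-transfer time, and energy per inference is average power divided by throughput. The `Deployment` class, its fields, and every number below are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Simplified description of one inference setup (illustrative model)."""
    name: str
    compute_ms: float      # time spent running the network on the accelerator
    transfer_ms: float     # data movement: on-chip DMA (edge) or network round trip (cloud)
    power_w: float         # average power drawn while inferring
    throughput_fps: float  # sustained inferences per second

    def latency_ms(self) -> float:
        # End-to-end latency seen by the client for a single request.
        return self.compute_ms + self.transfer_ms

    def energy_per_inference_j(self) -> float:
        # Watts divided by inferences per second gives joules per inference.
        return self.power_w / self.throughput_fps


# Placeholder values only; they are NOT results reported by the authors.
edge = Deployment("edge", compute_ms=12.0, transfer_ms=1.0,
                  power_w=20.0, throughput_fps=80.0)
cloud = Deployment("cloud", compute_ms=3.0, transfer_ms=40.0,
                   power_w=250.0, throughput_fps=500.0)

for d in (edge, cloud):
    print(f"{d.name}: {d.latency_ms():.1f} ms, "
          f"{d.energy_per_inference_j():.2f} J per inference")
```

With these placeholder values the cloud setup has the higher throughput, yet the edge setup has the lower per-request latency and the lower energy per inference, which is the shape of the trade-off the evaluation examines.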

Paper Structure (12 sections, 4 figures, 4 tables)

This paper contains 12 sections, 4 figures, 4 tables.

Introduction: Edge computing platforms
Related works: Using Embedded FPGAs as Edge computing systems
Proposed system: Hardware accelerator on embedded FPGA for DNNs
Performance results of DNN: On Edge versus on Cloud
Deep Neural Network Implementation over Edge
Physical Infrastructure
DNN Implementation over Edge device
Deep Neural Network Implementation over Cloud
Physical Infrastructure
DNN Implementation over GPU Cluster
Comparison results of DNN Implementation over Edge versus GPU Cluster
Conclusion & Future Work

Figures (4)

Figure 1: Overall workflow for implementing NNs on the target SoC/MPSoC board
Figure 2: The hardware accelerator design for implementing a DNN on the target MPSoC
Figure 3: Latency comparison between the Edge and Cloud implementations
Figure 4: Latency sources in Cloud implementations
