Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC)

Seyed Nima Omidsajedi, Rekha Reddy, Jianming Yi, Jan Herbst, Christoph Lipps, Hans Dieter Schotten

TL;DR

This work tackles latency-sensitive AI at the mobile edge by implementing a DNN accelerator on an embedded FPGA MPSoC (ZCU102) using the Xilinx DPU IP and a step-by-step edge workflow (quantization, xmodel generation, and DMA-mediated data transfer) to run ResNet50 on-device. It provides a detailed edge-versus-cloud evaluation, showing that on-device inference achieves substantially lower latency and better energy efficiency than cloud GPU inference, while cloud setups deliver higher raw throughput at much higher power. The study demonstrates the feasibility and benefits of edge AI on MPSoC-FPGAs for real-time applications and suggests future directions, including co-implementing multiple DNNs on MPSoCs and leveraging newer platforms with AI engines such as Versal. The findings support edge-centric architectures for latency-constrained scenarios in 6G, autonomous systems, and other bandwidth-constrained environments, and can inform future hardware-software co-design for AI at the edge.

Abstract

In almost every computation-intensive application, from 6G communication systems to autonomous driving platforms, a large portion of the computing should take place close to the client side. Edge computing (AI at the Edge) on mobile devices is one of the optimized approaches for addressing this requirement. Therefore, this work examines the possibilities and challenges of implementing a low-latency, power-optimized smart mobile system. Utilizing Field Programmable Gate Array (FPGA) based solutions at the edge leads to bandwidth-optimized designs and, as a consequence, can boost computational effectiveness under system-level deadlines. Moreover, various performance aspects and the implementation feasibility of Neural Networks (NNs) on both embedded FPGA edge devices (using a Xilinx Multiprocessor System on Chip (MPSoC)) and the Cloud are discussed throughout this research. The main goal of this work is to demonstrate a hybrid system that uses the deep learning programmable engine developed by Xilinx Inc. as the main component of the hardware accelerator. Based on this design, an efficient system for mobile edge computing is then presented using an embedded solution.

Paper Structure (12 sections, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Overall workflow for implementing NNs on the target SoC/MPSoC board
  • Figure 2: The hardware accelerator design for implementing a DNN on the target MPSoC
  • Figure 3: Latency comparison between the Edge and Cloud implementations
  • Figure 4: Latency sources in Cloud implementations