Orchestrating Multimodal DNN Workloads in Wireless Neural Processing

Sai Xu, Kai-Kit Wong, Yanan Du, Hyundong Shin

TL;DR

Simulation results demonstrate that PACS significantly outperforms RTFS under high modality heterogeneity by better masking wireless latency through communication-computation overlap, thereby highlighting the effectiveness of communication-computation pipelining in accelerating multimodal DNN execution in WNP.

Abstract

In edge inference, wireless resource allocation and accelerator-level deep neural network (DNN) scheduling have yet to be co-optimized in an end-to-end manner. The lack of coordination between wireless transmission and accelerator-level DNN execution prevents efficient overlap, leading to higher end-to-end inference latency. To address this issue, this paper investigates multimodal DNN workload orchestration in wireless neural processing (WNP), a paradigm that integrates wireless transmission and multi-core accelerator execution into a unified end-to-end pipeline. First, we develop a unified communication-computation model for multimodal DNN execution and formulate the corresponding optimization problem. Second, we propose O-WiN, a framework that orchestrates DNN workloads in WNP through two tightly coupled stages: simulation-based optimization and runtime execution. Third, we develop two algorithms, RTFS and PACS. RTFS schedules communication and computation sequentially, whereas PACS interleaves them to enable pipeline parallelism by overlapping wireless data transfer with accelerator-level DNN execution. Simulation results demonstrate that PACS significantly outperforms RTFS under high modality heterogeneity by better masking wireless latency through communication-computation overlap, thereby highlighting the effectiveness of communication-computation pipelining in accelerating multimodal DNN execution in WNP.
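
To make the contrast between sequential and interleaved scheduling concrete, the following is a minimal toy sketch, not the paper's RTFS or PACS algorithms. It assumes a single wireless link and a single compute resource, and it ignores multi-core mapping, NoC bandwidth sharing, and cross-modal fusion. The function names and the per-modality (transmit, compute) times are hypothetical values chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical (transmit_ms, compute_ms) pairs, one per modality.
# These numbers are illustrative assumptions, not results from the paper.
Workload = List[Tuple[float, float]]


def sequential_latency(workload: Workload) -> float:
    """Toy sequential schedule: every transfer finishes before any compute starts."""
    total_transmit = sum(t for t, _ in workload)
    total_compute = sum(c for _, c in workload)
    return total_transmit + total_compute


def pipelined_latency(workload: Workload) -> float:
    """Toy two-stage pipeline: a modality is computed while the next one is in flight."""
    link_free = 0.0   # time at which the wireless link finishes its current transfer
    accel_free = 0.0  # time at which the accelerator finishes its current modality
    for transmit, compute in workload:
        link_free += transmit
        # Compute starts once the data has arrived and the accelerator is idle.
        accel_free = max(link_free, accel_free) + compute
    return accel_free


workload = [(4.0, 6.0), (10.0, 3.0), (2.0, 5.0)]
print(sequential_latency(workload))  # 30.0
print(pipelined_latency(workload))   # 22.0
```

In this toy setting the pipelined schedule hides part of the transfer time behind computation, so its latency never exceeds the sequential one. How much is hidden depends on how the per-modality transfer and compute times are balanced.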

Paper Structure (26 sections, 30 equations, 6 figures, 3 tables, 4 algorithms)

This paper contains 26 sections, 30 equations, 6 figures, 3 tables, and 4 algorithms.

Figures (6)

  • Figure 1: A system architecture for multimodal DNN execution in WNP.
  • Figure 2: O-WiN: A modular and scalable orchestration framework that decomposes the unified communication–computation pipeline into interface-defined functional modules for decoupled integration and systematic cross-module coordination.
  • Figure 3: Network architecture of multimodal DNN workload.
  • Figure 4: The end-to-end execution timelines of RTFS and PACS.
  • Figure 5: NoC bandwidth sharing on the multi-core accelerator of RTFS and PACS.
  • ...and 1 more figures