Optimizing Multi-DNN Inference on Mobile Devices through Heterogeneous Processor Co-Execution

Yunquan Gao, Zhiguo Zhang, Praveen Kumar Donta, Chinmaya Kumar Dehury, Xiujun Wang, Dusit Niyato, Qiyang Zhang

TL;DR

ADMS tackles the challenge of running multiple DNNs concurrently on mobile devices with heterogeneous processors by combining offline subgraph partitioning and online processor-state-aware scheduling. It introduces a Model Analyzer that creates hardware-friendly subgraphs using a window size parameter, a Hardware Monitor that feeds real-time device status, and a Scheduler that optimizes task assignment via a multi-factor priority model. Empirical results on Redmi K50 Pro and Huawei P20 show up to 4.04x latency reduction compared with TFLite and a 24.2% improvement in energy efficiency over Band, along with enhanced thermal stability and robustness under stress. The work demonstrates that fine-grained subgraph scheduling coupled with dynamic, hardware-aware coordination can unlock substantial performance gains for real-world multi-DNN mobile workloads.

Abstract

Deep Neural Networks (DNNs) are increasingly deployed across diverse industries, driving demand for mobile device support. However, existing mobile inference frameworks often rely on a single processor per model, limiting hardware utilization and causing suboptimal performance and energy efficiency. Expanding DNN accessibility on mobile platforms requires adaptive, resource-efficient solutions to meet rising computational needs without compromising functionality. Parallel inference of multiple DNNs on heterogeneous processors remains challenging. Some works partition DNN operations into subgraphs for parallel execution across processors, but these often create excessive subgraphs based only on hardware compatibility, increasing scheduling complexity and memory overhead. To address this, we propose an Advanced Multi-DNN Model Scheduling (ADMS) strategy for optimizing multi-DNN inference on mobile heterogeneous processors. ADMS constructs an optimal subgraph partitioning strategy offline, balancing hardware operation support and scheduling granularity, and uses a processor-state-aware algorithm to dynamically adjust workloads based on real-time conditions. This ensures efficient workload distribution and maximizes processor utilization. Experiments show that ADMS reduces multi-DNN inference latency by up to 4.04× compared to vanilla frameworks.
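To make the trade-off between hardware operation support and scheduling granularity concrete, the following is a minimal Python sketch, not the ADMS implementation. It first splits a linear chain of operations into groups purely by processor compatibility (the approach the abstract says produces excessive subgraphs), then folds small groups into their predecessor. The names `Op`, `partition_by_support`, and `merge_small_groups`, the processor sets, and the merge rule itself are all assumptions for illustration; how ADMS actually uses its window size parameter is not specified in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record: a name plus the processors that can run it."""
    name: str
    supported: frozenset


def partition_by_support(ops):
    """Group consecutive ops that share the same set of supporting processors."""
    groups = []
    for op in ops:
        if groups and groups[-1]["supported"] == op.supported:
            groups[-1]["ops"].append(op.name)
        else:
            groups.append({"ops": [op.name], "supported": op.supported})
    return groups


def merge_small_groups(groups, window_size):
    """Assumed merge rule: if a group or its predecessor has fewer than
    window_size ops, fold the group into the predecessor, provided at least
    one processor supports every op in the merged result."""
    merged = []
    for g in groups:
        if merged and (len(g["ops"]) < window_size
                       or len(merged[-1]["ops"]) < window_size):
            common = merged[-1]["supported"] & g["supported"]
            if common:
                merged[-1]["ops"].extend(g["ops"])
                merged[-1]["supported"] = common
                continue
        merged.append({"ops": list(g["ops"]), "supported": g["supported"]})
    return merged


# Illustrative model: processor sets are made up, not measured on any device.
ALL = frozenset({"CPU", "GPU", "NPU"})
ops = [
    Op("conv1", ALL), Op("conv2", ALL),
    Op("custom1", frozenset({"CPU"})),
    Op("conv3", ALL), Op("conv4", ALL), Op("conv5", ALL),
    Op("softmax", frozenset({"CPU", "GPU"})),
]

fine = partition_by_support(ops)                    # compatibility-only split
coarse = merge_small_groups(fine, window_size=2)    # fewer, larger subgraphs
```

In this toy example the compatibility-only split yields four groups, while merging with a window of 2 yields two. The cost is visible in the first merged group: absorbing the CPU-only operation restricts the whole group to the CPU, which is the kind of tension between granularity and processor choice that an offline partitioner has to balance.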

Paper Structure

This paper contains 24 sections, 4 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: Subgraph partitioning process for processor-specific execution: original network (left), grouped ops by processor compatibility (middle), and merged processor-specific subgraphs (right).
  • Figure 2: Support for different operation types by various processors on the Redmi K50 Pro.
  • Figure 3: Average latency of DNN inference on single and multi-processors on the Android platform using Kirin 970 and Dimensity 9000 Chipsets.
  • Figure 4: The overview of ADMS.
  • Figure 5: Sample diagram for subgraph generation.
  • ...and 7 more figures