IMMSched: Interruptible Multi-DNN Scheduling via Parallel Multi-Particle Optimizing Subgraph Isomorphism

Boran Zhao; Hetian Liu; Zihang Yuan; Yanbin Hu; Wenzhe Zhao; Tian Xia; Pengju Ren

Abstract

The growing demand for multi-DNN workloads with unpredictable task arrival times has highlighted the need for interruptible scheduling on edge accelerators. However, existing preemptive frameworks typically assume known task arrival times and rely on CPU-based offline scheduling, which incurs heavy runtime overhead and struggles to handle unpredictable task arrivals. Even worse, prior studies have shown that multi-DNN scheduling requires solving an NP-hard subgraph isomorphism problem on large directed acyclic graphs within limited time, which is extremely challenging. To tackle this, we propose IMMSched, a parallel subgraph isomorphism method that combines Multi-Particle Optimization with the Ullmann algorithm based on a probabilistic continuous-relaxation scheme, eliminating the serial data dependencies of previous works. In addition, a quantized scheduling scheme and a global controller in the hardware architecture combine the multi-particle results for consensus-guided exploration. Evaluations demonstrate that IMMSched achieves orders-of-magnitude reductions in scheduling latency and energy consumption, enabling real-time execution of unpredictable DNN tasks on edge accelerators.

Paper Structure (21 sections, 1 equation, 8 figures, 2 tables, 1 algorithm)

Introduction
Background and Motivation
Problems of Previous Works
Inspiration of Subgraph Isomorphism
Challenge of Interruptible Scheduling
IMMSched
Preliminary of Formalization
Proposed Continuous-Relaxation Modeling
Proposed Parallel Subgraph Matching Algorithm
Algorithm Quantization and Hardware Improvement
Experiment and Analysis
Experimental Setups
Hardware Modeling
Workloads
Baseline
...and 6 more sections

Figures (8)

Figure 1: (a) Previous methods target deterministic scenarios, performed offline using CPU-based serial strategies, with tasks executed periodically on the DNN accelerator (i.e., NPU). (b) In open-ended scenarios, previous methods require significant time to schedule uncertain tasks, reducing execution time and often causing timeouts. (c) The proposed interruptible scheduling method leverages the DNN accelerator to parallelize scheduling, ensuring real-time execution of uncertain tasks.
Figure 2: (a) Comparison of execution time and scheduling time on the Cloud platform using MoCA. Scenario A uses the medium workload (i.e., UNet), and Scenario B uses the complex workload (i.e., Qwen). (b) Improved search stability of Particle Swarm Optimization after applying the continuous relaxation mechanism.
Figure 3: Comparison between Layer Temporal Scheduling (LTS) and Tile Spatial Scheduling (TSS): TSS utilizes on-chip links to avoid the energy and latency overheads associated with DRAM accesses in LTS [isosched].
Figure 4: IMMSched schedules by preempting engines based on the "single-core preemption ratio," prioritizing low-priority tasks (e.g., Tasks B and D) according to execution time slack, while not interrupting high-priority tasks (e.g., Task A) by default.
Figure 5: Hardware architecture supporting IMMSched, based on a typical DNN accelerator.
...and 3 more figures
