Exploiting Dependency and Parallelism: Real-Time Scheduling and Analysis for GPU Tasks

Yuanhai Zhang; Songyang He; Ruizhe Gou; Mingyue Cui; Boyang Li; Shuai Zhao; Kai Huang

TL;DR

The proposed scheduling method scales kernel-level parallelism and establishes inter-kernel dependencies to provide a reduced and predictable DAG response time, and the accompanying analysis yields a safe yet non-pessimistic makespan bound without any assumption on kernel priorities.

Abstract

With the rapid advancement of Artificial Intelligence, the Graphics Processing Unit (GPU) has become essential in a growing number of safety-critical application domains. GPUs are indispensable for parallel computing; however, complex data dependencies and resource contention among the kernels of a GPU task can delay its execution unpredictably. To address these problems, this paper presents a scheduling and analysis method for GPU tasks structured as Directed Acyclic Graphs (DAGs). Given a DAG representation, the proposed scheduling scales kernel-level parallelism and establishes inter-kernel dependencies to provide a reduced and predictable DAG response time. The corresponding timing analysis yields a safe yet non-pessimistic makespan bound without any assumption on kernel priorities. The proposed method is implemented using the standard CUDA API and requires no additional software or hardware support. Experimental results on synthetic and real-world benchmarks show that, compared to existing methods, the proposed approach reduces the worst-case makespan by up to 32.8% and the measured task execution time by up to 21.3%.
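To make the notions of a DAG-structured task and a priority-agnostic makespan bound concrete, the sketch below models kernels as nodes with worst-case execution times and computes the classical Graham bound, len(G) + (vol(G) - len(G)) / M, for any work-conserving schedule on M identical processing units. This is not the bound derived in the paper: it is a well-known baseline, and it assumes each node occupies exactly one unit, which real GPU kernels generally do not. The names `wcet`, `edges`, `critical_path_length`, and `graham_bound` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def topo_order(nodes, edges):
    """Return a topological order and successor map (Kahn's algorithm)."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(indeg):
        raise ValueError("graph is not acyclic")
    return order, succ


def critical_path_length(wcet, edges):
    """Length of the longest path in the DAG, weighted by node WCET."""
    order, succ = topo_order(wcet, edges)
    finish = dict(wcet)  # longest path ending at (and including) each node
    for u in order:
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + wcet[v])
    return max(finish.values())


def graham_bound(wcet, edges, m):
    """Classical makespan bound for work-conserving scheduling on m units.

    Illustrative baseline only; NOT the analysis proposed in the paper.
    """
    length = critical_path_length(wcet, edges)
    volume = sum(wcet.values())
    return length + (volume - length) / m


# Hypothetical four-kernel diamond: k1 -> {k2, k3} -> k4.
wcet = {"k1": 2, "k2": 3, "k3": 4, "k4": 1}
edges = [("k1", "k2"), ("k1", "k3"), ("k2", "k4"), ("k3", "k4")]
bound = graham_bound(wcet, edges, m=2)
```

For this example the critical path is k1, k3, k4 with length 7 and the total volume is 10, so the bound on two units is 7 + 3/2 = 8.5. Because the bound holds for any work-conserving order, it needs no priority assignment, which is the property the abstract also claims for its own, tighter analysis.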

Paper Structure (15 sections, 2 theorems, 9 equations, 6 figures, 2 tables, 2 algorithms)

Introduction
System Model
GPU Task Model
Execution Model
Sub-graph Division
Block Formulation
Balanced Groups Construction
Scheduling and Analysis
Scheduling
Response Time Analysis
Evaluation
Experimental Setup
Large-scale Experiment
Case Study
Conclusion

Key Result

Lemma 1

For any balanced group $\pi_j \in \Pi$, its relative response time $R(\pi_j)$, defined as the interval from the completion of $\pi_{j-1}$ to the completion of $\pi_j$, is upper-bounded as:

Figures (6)

Figure 1: The execution of a GPU task in DAG representation
Figure 2: Example of proposed schedule
Figure 3: Example of timing analysis
Figure 4: Makespan with various $M$ when $\hat{C}_{\text{avg}} = 20$
Figure 5: Makespan with various $P$ when $M=32,\hat{C}_{\text{avg}} = 20$
...and 1 more figure

Theorems & Definitions (4)

Lemma 1
Proof
Theorem 1
Proof
