Exploiting Dependency and Parallelism: Real-Time Scheduling and Analysis for GPU Tasks

Yuanhai Zhang, Songyang He, Ruizhe Gou, Mingyue Cui, Boyang Li, Shuai Zhao, Kai Huang

TL;DR

The proposed scheduling method scales kernel-level parallelism and establishes inter-kernel dependencies to reduce the DAG response time and make it predictable. The accompanying analysis yields a safe yet non-pessimistic makespan bound without any assumption on kernel priorities.

Abstract

With the rapid advancement of Artificial Intelligence, the Graphics Processing Unit (GPU) has become essential in a growing number of safety-critical application domains. GPUs are indispensable for parallel computing; however, complex data dependencies and resource contention among the kernels of a GPU task may delay its execution unpredictably. To address these problems, this paper presents a scheduling and analysis method for GPU tasks structured as Directed Acyclic Graphs (DAGs). Given a DAG representation, the proposed scheduling scales the kernel-level parallelism and establishes inter-kernel dependencies to provide a reduced and predictable DAG response time. The corresponding timing analysis yields a safe yet non-pessimistic makespan bound without any assumption on kernel priorities. The proposed method is implemented using the standard CUDA API and requires no additional software or hardware support. Experimental results on synthetic and real-world benchmarks show that, compared to existing methods, the proposed approach reduces the worst-case makespan by up to 32.8% and the measured task execution time by up to 21.3%.
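
To make the notion of a priority-agnostic makespan bound concrete, the sketch below computes the classical Graham bound for a DAG, len(G) + (vol(G) - len(G)) / M, where len(G) is the critical-path length and vol(G) the total work. This is a well-known baseline and not the analysis proposed in this paper; it also assumes, as a simplification, that each node occupies exactly one of M identical processing units, which does not hold for GPU kernels that span several units. The function name, the example DAG, and its execution times are hypothetical.

```python
from graphlib import TopologicalSorter


def graham_bound(wcet, preds, m):
    """Classical Graham bound on DAG makespan for m identical units.

    Baseline illustration only (not the paper's bound).
    wcet:  dict node -> worst-case execution time
    preds: dict node -> iterable of predecessor nodes
    """
    # Longest path to the end of each node, swept in topological order.
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        finish[v] = wcet[v] + max((finish[p] for p in preds.get(v, ())), default=0)
    length = max(finish.values())   # critical-path length len(G)
    volume = sum(wcet.values())     # total work vol(G)
    return length + (volume - length) / m


# Hypothetical four-kernel diamond DAG: k0 -> {k1, k2} -> k3
wcet = {"k0": 2, "k1": 4, "k2": 3, "k3": 1}
preds = {"k0": [], "k1": ["k0"], "k2": ["k0"], "k3": ["k1", "k2"]}
print(graham_bound(wcet, preds, m=2))
```

With a single unit (`m=1`) the bound collapses to the total work, and as `m` grows it approaches the critical-path length; neither limit depends on the order in which ready nodes are picked, which is what "no assumption on priorities" means for this baseline.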

Paper Structure

This paper contains 15 sections, 2 theorems, 9 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

For any balanced group $\pi_j \in \Pi$, its relative response time $R(\pi_j)$, defined as the interval from the completion of $\pi_{j-1}$ to the completion of $\pi_j$, is upper-bounded as:

Figures (6)

  • Figure 1: The execution of a GPU task in DAG representation
  • Figure 2: Example of proposed schedule
  • Figure 3: Example of timing analysis
  • Figure 4: Makespan with various $M$ when $\hat{C}_{\text{avg}} = 20$
  • Figure 5: Makespan with various $P$ when $M=32,\hat{C}_{\text{avg}} = 20$
  • ...and 1 more figure

Theorems & Definitions (4)

  • Lemma 1
  • Proof
  • Theorem 1
  • Proof