GCAPS: GPU Context-Aware Preemptive Priority-based Scheduling for Real-Time Tasks

Yidi Wang, Cong Liu, Daniel Wong, Hyoseung Kim

TL;DR

This work tackles real-time scheduling of GPU-using tasks on multi-core systems by addressing the lack of controllable GPU preemption in commercial drivers. It introduces GCAPS, a device-driver–level, priority-based preemptive GPU context scheduler that uses two user-space macros to delineate GPU segments and maintain a runlist, enabling higher-priority tasks to preempt lower-priority GPU work. The authors derive comprehensive end-to-end response-time analyses for both the default Nvidia Tegra round-robin driver and GCAPS, accounting for busy-waiting and self-suspending GPU segments as well as GPU-context-switch overhead, including a runlist-update cost ε. Empirical results show GCAPS substantially improves task-set schedulability (up to 40% gains) and predictability on Nvidia Jetson platforms, with case studies on Xavier NX and Orin Nano confirming practical viability and real-time performance benefits.

Abstract

Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While the real-time research community has studied this problem extensively, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach and the default Nvidia GPU driver scheduling, which follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.
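
The idea of marking GPU segment boundaries so that a runlist can be updated by priority can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's driver code: the class `RunlistModel` and the functions `gpu_seg_begin` and `gpu_seg_end` are hypothetical names standing in for the two macros, a larger number is assumed to mean higher priority, and the runlist is assumed to hold only the single highest-priority task with an active GPU segment.

```python
class RunlistModel:
    """Toy model of a priority-driven GPU runlist (hypothetical, for illustration)."""

    def __init__(self):
        self.active = {}   # task name -> priority of tasks inside a GPU segment
        self.runlist = []  # tasks currently allowed to execute on the GPU

    def _update_runlist(self):
        # Assumption: only the highest-priority active task stays on the runlist.
        if self.active:
            top = max(self.active, key=self.active.get)
            self.runlist = [top]
        else:
            self.runlist = []

    def gpu_seg_begin(self, task, priority):
        # Stand-in for the macro placed at the start of a GPU segment.
        self.active[task] = priority
        self._update_runlist()

    def gpu_seg_end(self, task):
        # Stand-in for the macro placed at the end of a GPU segment.
        self.active.pop(task, None)
        self._update_runlist()


m = RunlistModel()
m.gpu_seg_begin("tau3", 1)
print(m.runlist)  # ['tau3']
m.gpu_seg_begin("tau1", 3)
print(m.runlist)  # ['tau1'], tau3 is preempted
m.gpu_seg_end("tau1")
print(m.runlist)  # ['tau3'], tau3 resumes
```

In this model every call to `_update_runlist` corresponds to one runlist update, which is where a cost such as the runlist-update delay ε mentioned in the TL;DR would be charged in an analysis. A round-robin model of the default driver would instead keep all active tasks on the runlist and time-slice among them.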

Paper Structure (23 sections, 15 theorems, 18 equations, 13 figures, 5 tables, 1 algorithm)

Key Result

Lemma 1

Under the default Tegra GPU driver, the worst-case interference from GPU interleaved execution for a task $\tau_i$ is bounded by:

Figures (13)

  • Figure 1: Runlist and time-sliced GPU scheduling
  • Figure 2: Task model
  • Figure 3: Example schedule of three tasks under different approaches (priority $\tau_1 > \tau_2 > \tau_3$)
  • Figure 4: Preemption by GPU segments and GPU context switching
  • Figure 5: Example schedule of assigning separate GPU priority under self-suspension mode
  • ...and 8 more figures

Theorems & Definitions (22)

  • Definition 1: GPU context switch overhead
  • Example 1: Motivational example
  • Definition 2: Runlist update delay
  • Example 2: Effect of separate priority for GPU segment
  • Example 3: Indirect delay
  • Lemma 1: GPU interleaved execution
  • Lemma 2: GPU direct preemption
  • Lemma 3: CPU blocking time
  • Lemma 4: GPU indirect delay
  • Lemma 5: CPU preemption
  • ...and 12 more