GCAPS: GPU Context-Aware Preemptive Priority-based Scheduling for Real-Time Tasks

Yidi Wang; Cong Liu; Daniel Wong; Hyoseung Kim

TL;DR

This work tackles real-time scheduling of GPU-using tasks on multi-core systems, where commercial GPU drivers offer no controllable preemption. It introduces GCAPS, a device-driver-level, priority-based preemptive GPU context scheduler. Two user-space macros mark the boundaries of each GPU segment, and the driver uses them to maintain the runlist so that higher-priority tasks can preempt lower-priority GPU work. The authors derive end-to-end response-time analyses for both the default Nvidia Tegra round-robin driver and GCAPS. The analyses cover busy-waiting and self-suspending GPU segments and account for GPU context-switch overhead, including a runlist-update cost ε. Empirical results on Nvidia Jetson platforms show that GCAPS improves task-set schedulability by up to 40% and yields predictable worst-case behavior, with case studies on Xavier NX and Orin Nano confirming its practical viability.

Abstract

Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach and the default Nvidia GPU driver scheduling, which follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approach in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.
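To make the idea of priority-based preemption at GPU segment boundaries concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the authors' driver code: the class `RunlistModel` and the methods `seg_begin` and `seg_end` are hypothetical names standing in for the two macros and the driver-side bookkeeping, and the model assumes that only the highest-priority task with an active GPU segment is placed on the runlist. It counts runlist changes because the analysis described above charges a runlist-update cost ε.

```python
class RunlistModel:
    """Toy model of a priority-based preemptive GPU context scheduler.

    Hypothetical illustration only: names and policy details are assumptions,
    not the GCAPS driver interface.
    """

    def __init__(self):
        self.active = {}          # task name -> priority (larger = higher)
        self.running = None       # task whose GPU context is on the runlist
        self.runlist_updates = 0  # number of runlist changes (each would cost epsilon)

    def _update_runlist(self):
        # Select the highest-priority task that has an active GPU segment.
        new_running = max(self.active, key=self.active.get) if self.active else None
        if new_running != self.running:
            self.running = new_running
            self.runlist_updates += 1
        return self.running

    def seg_begin(self, task, priority):
        """Stand-in for the macro placed at the start of a GPU segment."""
        self.active[task] = priority
        return self._update_runlist()

    def seg_end(self, task):
        """Stand-in for the macro placed at the end of a GPU segment."""
        self.active.pop(task, None)
        return self._update_runlist()


if __name__ == "__main__":
    model = RunlistModel()
    print(model.seg_begin("tau3", 1))  # lowest-priority task starts GPU work
    print(model.seg_begin("tau1", 3))  # higher-priority task preempts it
    print(model.seg_end("tau1"))       # preempted task resumes
    print(model.runlist_updates)
```

In this model a lower-priority GPU segment that starts while a higher-priority one is active does not change the runlist, which is the behavior a preemptive priority policy is meant to provide, in contrast to time-sliced round-robin interleaving.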

Paper Structure

This paper contains 23 sections, 15 theorems, 18 equations, 13 figures, 5 tables, and 1 algorithm.

Introduction
Background on Tegra GPU Scheduling
Related Work
System Model
GCAPS: Priority-based Preemptive GPU Context Scheduling
GCAPS Algorithm
GPU Context Switching Details
Separate Priority for GPU Segments
End-to-End Response Time Analysis
Response Time Breakdown
Analysis for Default Round-Robin TSG Scheduling
Busy-Waiting Mode
Self-Suspension Mode
Analysis for Proposed GPU Context Scheduling
Busy-Waiting Mode
...and 8 more sections

Key Result

Lemma 1

Under the default Tegra GPU driver, the worst-case interference from GPU interleaved execution for a task $\tau_i$ is bounded by:

Figures (13)

Figure 1: Runlist and time-sliced GPU scheduling
Figure 2: Task model
Figure 3: Example schedule of three tasks under different approaches (priority $\tau_1 > \tau_2 > \tau_3$)
Figure 4: Preemption by GPU segments and GPU context switching
Figure 5: Example schedule of assigning separate GPU priority under self-suspension mode
...and 8 more figures

Theorems & Definitions (22)

Definition 1: GPU context switch overhead
Example 1: Motivational example
Definition 2: Runlist update delay
Example 2: Effect of separate priority for GPU segment
Example 3: Indirect delay
Lemma 1: GPU interleaved execution
Lemma 2: GPU direct preemption
Lemma 3: CPU blocking time
Lemma 4: GPU indirect delay
Lemma 5: CPU preemption
...and 12 more
