Unleashing the Power of Preemptive Priority-based Scheduling for Real-Time GPU Tasks

Yidi Wang; Cong Liu; Daniel Wong; Hyoseung Kim

TL;DR

This work addresses the challenge of guaranteeing real-time performance for tasks that use GPUs by introducing two driver-level approaches, one kernel thread-based and one IOCTL-based, that enable preemptive, priority-driven GPU scheduling on Nvidia Tegra GPUs. It provides formal end-to-end response-time analyses for both approaches, including an Audsley-based GPU priority assignment mechanism and a reduced-pessimism enhancement that accounts for overlaps between CPU and GPU segments. Through simulations and real-hardware case studies on Jetson Xavier NX and Orin Nano, the authors demonstrate schedulability improvements of up to 40% and more predictable worst-case behavior than traditional synchronization-based strategies. The results underscore the practical value of device driver-level control for real-time GPU tasks on embedded platforms, and the work is released as open source.
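For readers unfamiliar with Audsley-style priority assignment, the following is a minimal sketch of the general idea, not the paper's GPU priority assignment itself. It assigns priority levels from lowest to highest, and at each level picks any task that passes a schedulability test when all other unassigned tasks are assumed to have higher priority. The schedulability test used here is the classical CPU-only response-time iteration, swapped in as a stand-in for the paper's end-to-end analysis; the `(C, T, D)` tuple layout and the function names are assumptions made for illustration.

```python
from math import ceil


def response_time(task, higher, limit):
    """Classical fixed-point response-time iteration (a stand-in test, not the paper's).

    task and each entry of higher are (C, T, D) tuples: execution time,
    period, deadline. Returns the response time, or None if it exceeds limit.
    """
    c, _, _ = task
    r = c
    while r <= limit:
        nxt = c + sum(ceil(r / th) * ch for (ch, th, _) in higher)
        if nxt == r:
            return r
        r = nxt
    return None


def audsley(tasks):
    """Return tasks ordered from highest to lowest priority, or None if infeasible."""
    unassigned = list(range(len(tasks)))
    lowest_first = []
    while unassigned:
        for idx in unassigned:
            # Assume every other unassigned task has higher priority than idx.
            others = [tasks[j] for j in unassigned if j != idx]
            if response_time(tasks[idx], others, tasks[idx][2]) is not None:
                lowest_first.append(idx)
                unassigned.remove(idx)
                break
        else:
            # No task is schedulable at this priority level.
            return None
    return [tasks[i] for i in reversed(lowest_first)]


example = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
ordering = audsley(example)
```

The sketch takes the first feasible candidate at each level, so the resulting order need not match rate-monotonic order even when both are schedulable.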

Abstract

Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose two novel techniques, namely the kernel thread and IOCTL-based approaches, to enable preemptive priority-based scheduling for real-time GPU tasks. Our approaches exert control over GPU context scheduling at the device driver level and enable preemptive GPU scheduling based on task priorities. The kernel thread-based approach achieves this without requiring modifications to user-level programs, while the IOCTL-based approach needs only a single macro at the boundaries of GPU access segments. In addition, we provide a comprehensive response time analysis that takes into account overlaps between different task segments, mitigating pessimism in worst-case estimates. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and timeliness of real-time tasks. The results highlight significant improvements over prior work, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.

Paper Structure (21 sections, 7 theorems, 15 equations, 18 figures, 5 tables, 2 algorithms)

Introduction
Background on Tegra GPU Scheduling
Related Work
System Model
Priority-based Preemptive GPU Scheduling
Kernel Thread Approach
IOCTL-based Approach
GPU Segment Priority Assignment
End-to-End Response Time Analysis
Baseline Analysis
Preemptive GPU under Kernel Thread Approach
Preemptive GPU under IOCTL-based Approach
Analysis for GPU Priority Assignment
Analysis with Reduced Pessimism
Evaluation
...and 6 more sections

Key Result

Lemma 1

The runlist update delay from the kernel thread for a job of task $\tau_i$ is upper-bounded by: [equation not preserved in the source] where $\epsilon$ is the runlist update time (see the Kernel Thread Approach section), $R_i$ is the worst-case response time of $\tau_i$, $hp(\tau_i)$ is the set of all tasks with higher priority than $\tau_i$ in the system, and $J_h = R_h - (C_h + G_h)$ is the release jitter that captures the carry-in effect.
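As a small worked example of the release-jitter term defined in the lemma, the sketch below evaluates $J_h = R_h - (C_h + G_h)$ for one higher-priority task. It assumes that $C_h$ and $G_h$ denote the task's worst-case CPU and GPU execution demands, which this extract does not spell out; the numeric values and the function name are purely illustrative.

```python
def release_jitter(r_h, c_h, g_h):
    """Release jitter J_h = R_h - (C_h + G_h) for a higher-priority task.

    r_h: worst-case response time of the task
    c_h, g_h: assumed to be its worst-case CPU and GPU execution demands
    """
    if r_h < c_h + g_h:
        raise ValueError("response time cannot be smaller than total execution demand")
    return r_h - (c_h + g_h)


# Illustrative values only (same time unit for all three arguments).
jitter = release_jitter(r_h=20, c_h=5, g_h=8)
```

Intuitively, the jitter is the part of the response time during which the task is not executing, so a task that always runs without interference has zero jitter.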

Figures (18)

Figure 1: Runlist and time-sliced GPU scheduling
Figure 2: Task model example
Figure 3: Example schedule of three tasks under different approaches (priority $\tau_1 > \tau_2 > \tau_3$)
Figure 4: Example schedule of three tasks with runlist update delay (task priority: $\tau_1 > \tau_2 > \tau_3$)
Figure 5: Preemption by GPU segments on CPU tasks under busy-waiting mode, kernel thread approach as an example
...and 13 more figures

Theorems & Definitions (16)

Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Lemma 4
proof
Definition 1: Completion time
Definition 2: Full overlap
...and 6 more
