GCAPS: GPU Context-Aware Preemptive Priority-based Scheduling for Real-Time Tasks
Yidi Wang, Cong Liu, Daniel Wong, Hyoseung Kim
TL;DR
This work addresses real-time scheduling of GPU-using tasks on multi-core systems, where commercial GPU drivers lack controllable preemption. It introduces GCAPS, a priority-based preemptive GPU context scheduler implemented at the device-driver level. Two user-space macros delineate each GPU segment and maintain a runlist, so that higher-priority tasks can preempt lower-priority GPU work. The authors derive end-to-end response-time analyses for both the default Nvidia Tegra round-robin driver and GCAPS, covering busy-waiting and self-suspending GPU segments as well as GPU context-switch overhead, including a runlist-update cost ε. Empirical results on Nvidia Jetson platforms show that GCAPS achieves up to 40% higher task-set schedulability and more predictable behavior, and case studies on Xavier NX and Orin Nano confirm its practical viability.
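The runlist idea can be illustrated with a minimal sketch. It is a toy user-space model, not the GCAPS driver code: the class `Runlist` and the methods `seg_begin` and `seg_end` are hypothetical names standing in for the two macros, and the model assumes that only the highest-priority task currently inside a GPU segment is allowed to run.

```python
class Runlist:
    """Toy model of a priority-driven GPU runlist (larger number = higher priority)."""

    def __init__(self):
        # Tasks currently inside a GPU segment: task name -> priority.
        self.active = {}

    def seg_begin(self, task, priority):
        # Hypothetical stand-in for the macro placed at the start of a GPU segment.
        self.active[task] = priority
        return self.running()

    def seg_end(self, task):
        # Hypothetical stand-in for the macro placed at the end of a GPU segment.
        self.active.pop(task, None)
        return self.running()

    def running(self):
        # The highest-priority active task owns the GPU; None means the GPU is idle.
        if not self.active:
            return None
        return max(self.active, key=self.active.get)


rl = Runlist()
trace = [
    rl.seg_begin("low", 1),   # only "low" is active
    rl.seg_begin("high", 5),  # "high" preempts "low"
    rl.seg_end("high"),       # "low" resumes
    rl.seg_end("low"),        # GPU becomes idle
]
print(trace)
```

Each call returns the task that would hold the GPU after the update, so the trace shows the preemption and the later resumption of the lower-priority task. The sketch ignores the runlist-update cost ε and all context-switch overhead.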
Abstract
Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. Although the real-time community has studied this problem extensively, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach and the default Nvidia GPU driver scheduling, which follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving task-set schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.
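To see why the scheduling policy matters for a high-priority task, the following sketch compares completion times under a work-conserving round-robin policy and under preemptive priority scheduling. It is a deliberately simplified model and not the paper's response-time analysis: all jobs are assumed to arrive at time 0, time is counted in abstract units, context-switch overhead is ignored, and the function names, task names, and demand values are illustrative assumptions.

```python
def round_robin_finish(demands, slice_len=1):
    """Finish time of each job under work-conserving, time-sliced round-robin."""
    remaining = dict(demands)
    finish, t = {}, 0
    while remaining:
        # Visit every unfinished job once per round, in insertion order.
        for name in list(remaining):
            run = min(slice_len, remaining[name])
            t += run
            remaining[name] -= run
            if remaining[name] == 0:
                finish[name] = t
                del remaining[name]
    return finish


def priority_finish(demands, priorities):
    """Finish time of each job when the highest-priority job always runs first."""
    finish, t = {}, 0
    for name in sorted(demands, key=lambda n: priorities[n], reverse=True):
        t += demands[name]
        finish[name] = t
    return finish


# Illustrative GPU demands (time units) and priorities (larger = higher).
demands = {"low_a": 4, "low_b": 4, "high": 2}
priorities = {"low_a": 1, "low_b": 2, "high": 3}

rr = round_robin_finish(demands)
pf = priority_finish(demands, priorities)
print("round-robin:", rr)
print("priority:   ", pf)
```

In this toy setting the high-priority job finishes at time 6 under round-robin, because it shares the GPU with both lower-priority jobs, and at time 2 under priority scheduling. The total work is the same in both cases; only the order in which it completes changes.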
