COOK Access Control on an embedded Volta GPU
Benjamin Lesage, Frédéric Boniol, Claire Pagetti
TL;DR
The paper addresses timing variability caused by interference among concurrent GPU workloads on the Jetson AGX Xavier. It introduces COOK, a software-based hookable access controller that defers and serializes GPU operations by intercepting CUDA Runtime calls. COOK offers three strategies (Host Callback, Synced, Deferred Worker) that achieve temporal isolation with different tradeoffs. Across CUDA_mmult and ONNX-based benchmarks, COOK reduces extreme slowdowns and interference-related variability, though some workloads incur overhead due to synchronization and deferral. Synced and Deferred Worker show the strongest isolation, while Host Callback may introduce more variability in some cases. The work demonstrates a practical, software-only approach to predictability and transparency on a constrained embedded GPU platform, with implications for platform-level resource management and future enhancements.
Abstract
The last decade has seen the emergence of a new generation of multi-core platforms in response to advances in machine learning, and in particular Deep Neural Network (DNN) training and inference tasks. These platforms, like the JETSON AGX XAVIER, embed several cores and accelerators in a SWaP-efficient (Size, Weight and Power) package with a limited set of resources. However, concurrent applications tend to interfere on shared resources, resulting in high execution time variability compared to their behaviour in isolation. Access control techniques aim to selectively restrict the flow of operations executed by a resource. To reduce the impact of interference on the JETSON Volta GPU, we specify and implement an access control technique that ensures each GPU operation executes in isolation, thereby reducing its timing variability. We implement the controller using three different strategies and assess their complexity and impact on application performance. Our evaluation shows the benefits of adding the access control: its transparency to applications, reduced timing variability, isolation between GPU operations, and small code complexity. However, the strategies may cause slowdowns for applications, even in isolation, although these remain reasonable.
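The idea of an access controller that defers operations and runs them one at a time can be illustrated with a minimal sketch. This is not the COOK implementation: COOK intercepts CUDA Runtime calls, whereas the sketch below stands in plain Python callables for GPU operations. The names `DeferredWorkerController`, `ConcurrencyProbe`, `fake_kernel` and `run_clients` are hypothetical and exist only for illustration.

```python
import queue
import threading
import time
from concurrent.futures import Future


class DeferredWorkerController:
    """Toy access controller (hypothetical): a single worker thread
    executes submitted operations strictly one after another."""

    def __init__(self):
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, operation, *args):
        """Defer an operation; the returned Future resolves once it has run."""
        done = Future()
        self._pending.put((operation, args, done))
        return done

    def _run(self):
        # Only this thread ever executes operations, so they cannot overlap.
        while True:
            operation, args, done = self._pending.get()
            try:
                done.set_result(operation(*args))
            except Exception as exc:
                done.set_exception(exc)


class ConcurrencyProbe:
    """Records how many simulated operations were ever active at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0

    def fake_kernel(self, duration_s):
        # Stand-in for a GPU operation (assumption: a real one would be a CUDA call).
        with self._lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
        time.sleep(duration_s)
        with self._lock:
            self._active -= 1
        return duration_s


def run_clients(controller, probe, n_clients=4, ops_per_client=3):
    """Simulate concurrent applications issuing operations through the controller."""
    completed = []
    completed_lock = threading.Lock()

    def client():
        for _ in range(ops_per_client):
            result = controller.submit(probe.fake_kernel, 0.001).result()
            with completed_lock:
                completed.append(result)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)
```

In this sketch, each client blocks on its own `Future` while the controller decides when the operation actually runs, so clients need no knowledge of one another. The cost of that isolation is the queueing delay each operation may incur, which is the kind of slowdown the abstract mentions.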
