PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2
Daniel Enright, Yecheng Xiang, Hyunjong Choi, Hyoseung Kim
TL;DR
PAAM tackles the challenge of predictable accelerator access in ROS 2 by introducing a dedicated accelerator resource server that arbitrates requests with chain-centric priorities. The framework adds a two-level hierarchical queueing model and device-aware execution paths for GPUs and TPUs, enabling per-segment and per-chain worst-case bounds while supporting both local and remote accelerator usage. Empirical results on Jetson hardware show significant reductions in critical-chain latency (up to about 91% in some scenarios) and robust adherence to analytical bounds, with overhead primarily attributed to DDS data transport. By enabling granular, priority-driven scheduling at the application layer without modifying accelerator drivers, PAAM offers a practical pathway to real-time, safety-critical robotic systems that leverage heterogeneous accelerators.
Abstract
This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without changing the underlying accelerator device drivers. PAAM also comes with a theoretical analysis that upper-bounds the worst-case response time of safety-critical callback chains that require accelerator access. Finally, the paper demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in the end-to-end response time of their critical callback chains.
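
To make the arbitration idea concrete, the following is a minimal sketch of a resource server that orders pending accelerator requests by the priority of the chain that issued them, with first-come-first-served order among requests of equal priority. It is an illustration only and is not PAAM's implementation: the names `AcceleratorServer`, `Request`, `submit`, and `run_all` are hypothetical, the single flat queue does not reproduce the framework's two-level hierarchical queueing model, and real requests would arrive over ROS 2 topics or services rather than as Python callables.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Request:
    # Ordering key: (negated chain priority, arrival sequence number).
    sort_key: Tuple[int, int]
    chain: str = field(compare=False)
    work: Callable[[], object] = field(compare=False)


class AcceleratorServer:
    """Hypothetical application-layer arbiter for one accelerator."""

    def __init__(self) -> None:
        self._queue: List[Request] = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, chain: str, priority: int, work: Callable[[], object]) -> None:
        # Assumption: a larger number means a more critical chain.
        # heapq is a min-heap, so the priority is negated.
        key = (-priority, next(self._seq))
        heapq.heappush(self._queue, Request(key, chain, work))

    def run_all(self) -> List[str]:
        # Serve requests one at a time, highest chain priority first,
        # and return the order in which chains were served.
        served: List[str] = []
        while self._queue:
            req = heapq.heappop(self._queue)
            req.work()  # stand-in for launching the accelerator workload
            served.append(req.chain)
        return served
```

In this sketch, a request from a critical chain that is submitted after several lower-priority requests is still served before them, which is the behavior that lets a chain-level response-time bound depend on higher-priority work rather than on everything already queued.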
