Kino-PAX: Highly Parallel Kinodynamic Sampling-based Planner
Nicolas Perrault, Qi Heng Ho, Morteza Lahijanian
TL;DR
Kino-PAX addresses the challenge of real-time kinodynamic motion planning in high-dimensional spaces by introducing a highly parallel sampling-based motion planner (SBMP) designed for GPU architectures. It decomposes the iterative tree-growth process into three parallel subroutines and organizes sampling with a region-based decomposition, using three disjoint sets ($V_U$, $V_O$, $V_E$) and a region partition $\mathcal{R}$ to guide exploration. The paper proves probabilistic completeness, analyzes scalability with hardware improvements, and demonstrates millisecond-scale planning times ($<8$ ms for 6D and $<25$ ms for 12D) on GPUs, including embedded devices, with substantial speedups over CPU baselines. Key contributions include the Kino-PAX algorithm, hyperparameter and decomposition guidance, probabilistic-completeness analysis, and extensive benchmarks across multiple dynamical systems. These results suggest that real-time, high-dimensional kinodynamic planning is practical on modern parallel hardware.
Abstract
Sampling-based motion planners (SBMPs) are effective for planning with complex kinodynamic constraints in high-dimensional spaces, but they still struggle to achieve real-time performance, mainly because of their serial computation design. We present Kinodynamic Parallel Accelerated eXpansion (Kino-PAX), a novel highly parallel kinodynamic SBMP designed for parallel devices such as GPUs. Kino-PAX grows a tree of trajectory segments directly in parallel. Our key insight is a decomposition of the iterative tree growth process into three massively parallel subroutines. Kino-PAX is designed to align with the execution hierarchies of parallel devices by ensuring that threads are largely independent, share equal workloads, and take advantage of low-latency resources while minimizing high-latency data transfers and process synchronization. This design results in a very efficient GPU implementation. We prove that Kino-PAX is probabilistically complete and analyze its scalability with compute hardware improvements. Empirical evaluations demonstrate solutions on the order of 10 ms on a desktop GPU and on the order of 100 ms on an embedded GPU, representing up to a 1000-fold improvement over coarse-grained CPU parallelization of state-of-the-art sequential algorithms across a range of complex environments and systems.
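
The abstract does not name the three subroutines, so the following is only a minimal sketch of the general idea of a data-parallel expansion step, not the authors' implementation. It assumes a 2D double-integrator system, a uniform grid as the region partition, and hypothetical helper names (`propagate`, `expand_batch`, `region_index`, `count_per_region`). Each loop iteration is independent of the others and does the same amount of work, which is the property that would let it map to one GPU thread.

```python
import random

def propagate(state, control, dt):
    """Integrate a 2D double integrator exactly under constant acceleration.
    state = (x, y, vx, vy), control = (ax, ay). Assumed dynamics, for illustration."""
    x, y, vx, vy = state
    ax, ay = control
    return (x + vx * dt + 0.5 * ax * dt * dt,
            y + vy * dt + 0.5 * ay * dt * dt,
            vx + ax * dt,
            vy + ay * dt)

def expand_batch(frontier, rng, dt=0.1, a_max=1.0):
    """Extend every frontier node once with a randomly sampled control.
    Iterations share no data, so each could run as its own thread."""
    children = []
    for parent_id, state in enumerate(frontier):
        control = (rng.uniform(-a_max, a_max), rng.uniform(-a_max, a_max))
        children.append((parent_id, propagate(state, control, dt)))
    return children

def region_index(position, lower, upper, cells_per_dim):
    """Map a position to the flat index of its cell in a uniform grid
    (a hypothetical stand-in for the region partition)."""
    idx = 0
    for p, lo, hi, n in zip(position, lower, upper, cells_per_dim):
        cell = int((p - lo) / (hi - lo) * n)
        cell = max(0, min(cell, n - 1))  # clamp so boundary points stay in the grid
        idx = idx * n + cell
    return idx

def count_per_region(children, lower, upper, cells_per_dim):
    """Count how many new nodes landed in each region."""
    counts = {}
    for _, state in children:
        r = region_index(state[:2], lower, upper, cells_per_dim)
        counts[r] = counts.get(r, 0) + 1
    return counts

if __name__ == "__main__":
    rng = random.Random(0)
    frontier = [(0.5, 0.5, 0.0, 0.0)] * 8
    children = expand_batch(frontier, rng)
    print(count_per_region(children, (0.0, 0.0), (1.0, 1.0), (4, 4)))
```

Per-region counts of this kind are one plausible signal for steering exploration toward sparsely covered regions. On a GPU the per-node loop would become a kernel launch and the counting step an atomic or reduction operation.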
