Vec-QMDP: Vectorized POMDP Planning on CPUs for Real-Time Autonomous Driving
Xuanjin Jin, Yanxin Dong, Bin Sun, Huan Xu, Zhihui Hao, XianPeng Lang, Panpan Cai
TL;DR
Vec-QMDP is a CPU-native, SIMD-optimized POMDP planner for real-time autonomous driving. It uses the QMDP approximation to decompose the belief tree into independent sub-trees, and applies data-oriented design to enable both global and local vectorization. Multi-threading across CPU cores is coupled with wide SIMD kernels, and a load-balancing UCB aligns expansion depth across scenario trees, yielding substantial speedups over serial planners. The approach also includes vectorized belief-space trajectory optimization with cross-scenario evaluation, supported by efficient collision checking via STR-trees and two-stage SIMD kernels. Evaluations on the nuPlan benchmark demonstrate millisecond-level planning and up to 1073× throughput gains in dense traffic, establishing CPUs as a high-performance platform for large-scale planning under uncertainty in autonomous driving.
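For reference, the standard QMDP approximation scores an action under a belief by weighting fully observable action values. It is given here in its textbook form; the symbols $b$, $S$, $A$, and $Q_{\mathrm{MDP}}$ are notation assumed for illustration and may differ from the paper's:

$$Q_{\mathrm{QMDP}}(b, a) = \sum_{s \in S} b(s)\, Q_{\mathrm{MDP}}(s, a), \qquad a^{*} = \arg\max_{a \in A} Q_{\mathrm{QMDP}}(b, a)$$

Each $Q_{\mathrm{MDP}}(s, a)$ is computed as if the state $s$ were fully observed, so the per-state terms do not depend on one another. This independence is presumably what allows one sub-tree per sampled scenario to be searched separately and combined only through the belief-weighted sum.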
Abstract
Planning under uncertainty for real-world robotics tasks, such as autonomous driving, requires reasoning in enormous high-dimensional belief spaces, which makes the problem computationally intensive. While parallelization offers scalability, existing hybrid CPU-GPU solvers face critical bottlenecks from host-device synchronization latency and branch divergence on SIMT architectures, which limit their utility for real-time planning and hinder deployment on real robots. We present Vec-QMDP, a CPU-native parallel planner that aligns POMDP search with the SIMD architecture of modern CPUs, achieving $227\times$--$1073\times$ speedup over state-of-the-art serial planners. Vec-QMDP adopts a Data-Oriented Design (DOD), refactoring scattered, pointer-based data structures into contiguous, cache-efficient memory layouts. We further introduce a hierarchical parallelism scheme that distributes sub-trees across independent CPU cores and SIMD lanes, enabling fully vectorized tree expansion and collision checking. UCB load balancing across trees and a vectorized STR-tree for coarse-level collision checking further improve efficiency. Evaluated on large-scale autonomous driving benchmarks, Vec-QMDP achieves state-of-the-art planning performance with millisecond-level latency, establishing CPUs as a high-performance computing platform for large-scale planning under uncertainty.
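The data-oriented layout described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned bounding boxes as the coarse collision primitive, uses NumPy array operations as a stand-in for hand-written SIMD kernels, and all names (`ObstacleBoxes`, `overlaps`) are hypothetical. The point is only that storing each field in its own contiguous array lets one comparison run over many obstacles at once.

```python
import numpy as np


class ObstacleBoxes:
    """Hypothetical structure-of-arrays container: one contiguous array per
    field, instead of one heap-allocated object per obstacle."""

    def __init__(self, min_x, min_y, max_x, max_y):
        self.min_x = np.ascontiguousarray(min_x, dtype=np.float32)
        self.min_y = np.ascontiguousarray(min_y, dtype=np.float32)
        self.max_x = np.ascontiguousarray(max_x, dtype=np.float32)
        self.max_y = np.ascontiguousarray(max_y, dtype=np.float32)


def overlaps(boxes, ego_min_x, ego_min_y, ego_max_x, ego_max_y):
    """Return a boolean mask marking boxes whose bounding box overlaps the
    ego bounding box. Each comparison is applied to a whole array at once,
    with no per-obstacle branching."""
    return (
        (boxes.min_x <= ego_max_x)
        & (boxes.max_x >= ego_min_x)
        & (boxes.min_y <= ego_max_y)
        & (boxes.max_y >= ego_min_y)
    )


# Illustrative data only: three obstacles and one ego box spanning [1, 6] x [1, 6].
boxes = ObstacleBoxes(
    min_x=[0.0, 5.0, 10.0],
    min_y=[0.0, 5.0, 0.0],
    max_x=[2.0, 7.0, 12.0],
    max_y=[2.0, 7.0, 2.0],
)
mask = overlaps(boxes, 1.0, 1.0, 6.0, 6.0)
print(mask.tolist())
```

In a pointer-based layout, the same check would dereference one object per obstacle and branch on each result. With separate contiguous arrays, the four comparisons reduce to branch-free elementwise operations, which is the access pattern that wide SIMD lanes favor.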
