Warm-Starting Optimization-Based Motion Planning for Robotic Manipulators via Point Cloud-Conditioned Flow Matching
Sibo Tian, Minghui Zheng, Xiao Liang
TL;DR
The paper tackles real-time motion planning for robotic manipulators under partial observability. It introduces a Flow Matching-based neural initializer, conditioned on a single-view point cloud, that generates multiple near-optimal seeds. A batched, GPU-accelerated trajectory optimizer, cuRobo, then rapidly refines these seeds into collision-free trajectories without prior knowledge of obstacle geometry. The model uses an SE-Transformer architecture with PointNet++-based point-cloud features and is trained with a Flow Matching loss to learn the velocity field that guides trajectory samples. It achieves higher success rates and faster convergence than deterministic or diffusion-based baselines and generalizes well to unseen environments. By integrating perception, learning, and optimization in a partially observable setting, the work advances fast replanning for dynamic human-robot collaboration and cluttered manufacturing environments.
Abstract
Rapid robot motion generation is critical in Human-Robot Collaboration (HRC) systems, as robots need to respond to dynamic environments in real time by continuously observing their surroundings and replanning their motions to ensure both safe interactions and efficient task execution. Current sampling-based motion planners face challenges in scaling to high-dimensional configuration spaces and often require post-processing to interpolate and smooth the generated paths, resulting in time inefficiency in complex environments. Optimization-based planners, on the other hand, can incorporate multiple constraints and generate smooth trajectories directly, making them potentially more time-efficient. However, optimization-based planners are sensitive to initialization and may get stuck in local minima. In this work, we present a novel learning-based method that utilizes a Flow Matching model conditioned on a single-view point cloud to learn near-optimal solutions for optimization initialization. Our method does not require prior knowledge of the environment, such as obstacle locations and geometries, and can generate feasible trajectories directly from single-view depth camera input. Simulation studies on a UR5e robotic manipulator in cluttered workspaces demonstrate that the proposed generative initializer achieves a high success rate on its own, significantly improves the success rate of trajectory optimization compared with traditional and learning-based benchmark initializers, requires fewer optimization iterations, and exhibits strong generalization to unseen environments.
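To make the Flow Matching idea concrete, the following is a minimal NumPy sketch of a conditional Flow Matching training loss and of drawing several seed trajectories by integrating a velocity field. It is an illustration under stated assumptions, not the authors' implementation: the linear interpolation path, the Euler integrator, the function names, and the array shapes are all choices made here, and `oracle_velocity` is a toy stand-in for the learned, point-cloud-conditioned network.

```python
import numpy as np

def cfm_training_pair(x1, rng):
    """Sample (x_t, t, target velocity) for a batch of target trajectories x1.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I),
    whose conditional target velocity is x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)
    # One time value per batch element, broadcast over the remaining axes.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    xt = (1.0 - t) * x0 + t * x1
    return xt, t, x1 - x0

def cfm_loss(velocity_fn, x1, cond, rng):
    """Mean squared error between predicted and target velocities."""
    xt, t, target = cfm_training_pair(x1, rng)
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - target) ** 2))

def sample_seeds(velocity_fn, cond, num_seeds, horizon, dof, steps, rng):
    """Draw num_seeds trajectories by Euler-integrating dx/dt = v(x, t, cond)
    from Gaussian noise at t = 0 up to t = 1."""
    x = rng.standard_normal((num_seeds, horizon, dof))
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((num_seeds, 1, 1), k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x

def oracle_velocity(x, t, cond):
    """Hypothetical stand-in for the trained network. Here `cond` is a single
    known target trajectory, for which the exact velocity field under the
    linear path is (cond - x) / (1 - t). In the paper's setting, the
    conditioning would instead be features of the single-view point cloud."""
    return (cond - x) / (1.0 - t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.linspace(0.0, 1.0, 5 * 6).reshape(5, 6)  # 5 waypoints, 6 joints
    seeds = sample_seeds(oracle_velocity, target, num_seeds=4,
                         horizon=5, dof=6, steps=10, rng=rng)
    print(seeds.shape)
```

In this toy setup every seed collapses onto the single target trajectory, because the oracle field has only one mode. With a learned model, different noise draws would be expected to yield different seeds, and the batch could then be handed to a trajectory optimizer for refinement.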
