Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics
Yixuan Huang, Christopher Agia, Jimmy Wu, Tucker Hermans, Jeannette Bohg
TL;DR
Points2Plans tackles long-horizon robotic manipulation from partial-view point clouds by unifying symbolic and geometric reasoning through a transformer-based relational dynamics model trained only on single-step transitions. A hybrid latent-geometric rollout of this model, paired with an LLM-guided task planner, enables efficient planning of manipulation sequences, while a sampling-based parameter planner selects continuous parameters that satisfy predicate constraints. The framework generalizes to unseen tasks and solves over 85% of evaluated real-world tasks, compared with 50% for the next best baseline. Because it requires no multi-step demonstrations for training, the approach is a step toward scalable, language-driven planning in complex, occluded environments.
Abstract
We present Points2Plans, a framework for composable planning with a relational dynamics model that enables robots to solve long-horizon manipulation tasks from partial-view point clouds. Given a language instruction and a point cloud of the scene, our framework initiates a hierarchical planning procedure, whereby a language model generates a high-level plan and a sampling-based planner produces constraint-satisfying continuous parameters for manipulation primitives sequenced according to the high-level plan. Key to our approach is the use of a relational dynamics model as a unifying interface between the continuous and symbolic representations of states and actions, thus facilitating language-driven planning from high-dimensional perceptual input such as point clouds. Whereas previous relational dynamics models require training on datasets of multi-step manipulation scenarios that align with the intended test scenarios, Points2Plans uses only single-step simulated training data while generalizing zero-shot to a variable number of steps during real-world evaluations. We evaluate our approach on tasks involving geometric reasoning, multi-object interactions, and occluded object reasoning in both simulated and real-world settings. Results demonstrate that Points2Plans offers strong generalization to unseen long-horizon tasks in the real world, where it solves over 85% of evaluated tasks while the next best baseline solves only 50%.
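The hierarchical procedure described above can be illustrated with a minimal Python sketch. Every component is a hypothetical stand-in, not the authors' implementation: `llm_high_level_plan` returns a fixed plan in place of a language model, `step_model` is a hand-written single-step transition standing in for the learned relational dynamics model (which in Points2Plans operates on point-cloud inputs), and `holds` is a toy threshold test for an `on` predicate. The parameter planner is plain random shooting: it samples continuous placement parameters for every step, composes the single-step model over a plan of any length, and keeps the first sample whose predicted final state satisfies the goal predicates.

```python
import random
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]     # (x, y, z) in metres (toy convention)
State = Dict[str, Position]               # object name -> position
Step = Tuple[str, str, str]               # (primitive, object to move, reference object)
Predicate = Tuple[str, str, str]          # (relation, subject, object)


def llm_high_level_plan(instruction: str) -> List[Step]:
    """Hypothetical stub for the language-model task planner (fixed output)."""
    return [("place_on", "red", "box"), ("place_on", "blue", "red")]


def step_model(state: State, obj: str, pos: Position) -> State:
    """Toy stand-in for a learned single-step dynamics model: moves one object."""
    nxt = dict(state)
    nxt[obj] = pos
    return nxt


def holds(state: State, pred: Predicate, tol: float = 0.03) -> bool:
    """Toy symbolic readout from continuous state (hypothetical thresholds)."""
    name, a, b = pred
    (ax, ay, az), (bx, by, bz) = state[a], state[b]
    if name == "on":
        # a is roughly centred over b in x/y and sits higher than b.
        return abs(ax - bx) < tol and abs(ay - by) < tol and az > bz
    raise ValueError(f"unknown predicate: {name}")


def plan_parameters(state: State, plan: List[Step], goal: List[Predicate],
                    n_samples: int = 2000, seed: int = 0) -> Optional[List[Position]]:
    """Random-shooting parameter planner over a fixed high-level plan."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        s, params = state, []
        # The toy model treats every primitive as a pick-and-place.
        for _primitive, obj, ref in plan:
            rx, ry, rz = s[ref]  # reference pose in the *predicted* state
            pos = (rx + rng.uniform(-0.1, 0.1), ry + rng.uniform(-0.1, 0.1), rz + 0.05)
            s = step_model(s, obj, pos)  # compose single-step predictions
            params.append(pos)
        if all(holds(s, p) for p in goal):
            return params  # first sample satisfying every goal predicate
    return None  # no feasible parameters found within the sample budget


if __name__ == "__main__":
    start = {"red": (0.3, -0.2, 0.02), "blue": (0.3, 0.2, 0.02), "box": (0.6, 0.0, 0.05)}
    plan = llm_high_level_plan("stack the blue block on the red block on the box")
    goal = [("on", "red", "box"), ("on", "blue", "red")]
    print(plan_parameters(start, plan, goal))
```

Because the planner only ever calls the single-step model, the same code handles plans of any length; an unsatisfiable goal such as `("on", "box", "red")` simply exhausts the sample budget and returns `None`.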
