Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics

Yixuan Huang; Christopher Agia; Jimmy Wu; Tucker Hermans; Jeannette Bohg

TL;DR

Points2Plans tackles long-horizon robotic manipulation from partial-view point clouds by unifying symbolic and geometric reasoning through a transformer-based relational dynamics model trained only on single-step transitions. An LLM-guided task planner proposes the sequence of manipulation primitives, a hybrid latent-geometric rollout predicts the outcome of that sequence, and a sampling-based planner selects continuous parameters that satisfy the predicate constraints. The framework generalizes to unseen long-horizon tasks: in real-world evaluations it solves over 85% of tasks, while the next best baseline solves only 50%. Because training requires no multi-step demonstrations, the approach offers a scalable route to language-driven planning in complex, occluded environments.

Abstract

We present Points2Plans, a framework for composable planning with a relational dynamics model that enables robots to solve long-horizon manipulation tasks from partial-view point clouds. Given a language instruction and a point cloud of the scene, our framework initiates a hierarchical planning procedure, whereby a language model generates a high-level plan and a sampling-based planner produces constraint-satisfying continuous parameters for manipulation primitives sequenced according to the high-level plan. Key to our approach is the use of a relational dynamics model as a unifying interface between the continuous and symbolic representations of states and actions, thus facilitating language-driven planning from high-dimensional perceptual input such as point clouds. Whereas previous relational dynamics models require training on datasets of multi-step manipulation scenarios that align with the intended test scenarios, Points2Plans uses only single-step simulated training data while generalizing zero-shot to a variable number of steps during real-world evaluations. We evaluate our approach on tasks involving geometric reasoning, multi-object interactions, and occluded object reasoning in both simulated and real-world settings. Results demonstrate that Points2Plans offers strong generalization to unseen long-horizon tasks in the real world, where it solves over 85% of evaluated tasks while the next best baseline solves only 50%.
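
The hierarchical procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy 1-D workspace, replaces the learned relational dynamics model with a hand-written single-step function, takes the task plan as given (standing in for the language model's output), and uses plain random shooting as the sampling-based planner. All names (`step_dynamics`, `is_feasible`, `goal_score`, `plan_parameters`) and constants are hypothetical. What it does show is the structure of the idea: a single-step model is composed over a variable number of steps, infeasible rollouts are rejected, and the parameters whose final state best satisfies the goal predicates are returned.

```python
import random
from typing import Dict, List, Optional, Tuple

State = Dict[str, float]   # object name -> 1-D position (toy stand-in for a latent state)

OBJECT_WIDTH = 0.1         # hypothetical object footprint used by the collision check
SHELF = (0.0, 1.0)         # hypothetical workspace bounds for sampled placements


def step_dynamics(state: State, obj: str, x: float) -> State:
    """Toy single-step model: placing `obj` at `x` moves only that object."""
    nxt = dict(state)
    nxt[obj] = x
    return nxt


def is_feasible(state: State) -> bool:
    """Reject states in which any two objects overlap (stand-in for collision predicates)."""
    xs = sorted(state.values())
    return all(b - a >= OBJECT_WIDTH for a, b in zip(xs, xs[1:]))


def goal_score(state: State, goal: List[Tuple[str, str]]) -> float:
    """Fraction of left_of(a, b) goal predicates that hold in `state`."""
    return sum(state[a] < state[b] for a, b in goal) / len(goal)


def plan_parameters(
    state: State,
    task_plan: List[str],            # objects to place, in order (assumed given)
    goal: List[Tuple[str, str]],
    num_samples: int = 2000,
    seed: int = 0,
) -> Tuple[Optional[List[float]], float]:
    """Random-shooting search over continuous placement parameters."""
    rng = random.Random(seed)
    best_params, best_score = None, -1.0
    for _ in range(num_samples):
        params = [rng.uniform(*SHELF) for _ in task_plan]
        s, feasible = state, True
        # Compose the single-step model once per step of the task plan.
        for obj, x in zip(task_plan, params):
            s = step_dynamics(s, obj, x)
            if not is_feasible(s):   # reject the whole candidate on any collision
                feasible = False
                break
        if feasible:
            score = goal_score(s, goal)
            if score > best_score:
                best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    initial = {"a": 0.8, "b": 0.5, "c": 0.2}
    # Goal: a left of b, and b left of c; the plan moves a first, then c.
    params, score = plan_parameters(initial, ["a", "c"], [("a", "b"), ("b", "c")])
    print(params, score)
```

Because the rollout loop only ever calls the single-step function, the same code handles task plans of any length, which mirrors the abstract's point about composing single-step dynamics at test time.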

Paper Structure (28 sections, 7 equations, 10 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Problem Setup
Proposed Approach: Points2Plans
Composable Relational Dynamics
Hybrid Rollout Strategy
Planning Action Sequences with Relational Dynamics
Experiments
Results
Conclusion
Acknowledgements
Appendix
Generative Model and Problem Formulation
Approximate Sampling Distributions
Planning and Optimization Details
...and 13 more sections

Figures (10)

Figure 2: Overview of Points2Plans. A partial-view segmented point cloud $\mathbf{o}_1$ is first encoded into the (object-centric) latent state $\mathbf{z}_1$. The latent state $\mathbf{z}_1$ is then decoded into predicates that serve as environment context for the task planning and goal prediction module (e.g., an LLM), from which a task plan $\phi_{1:H}$ and a symbolic goal $\mathcal{G}$ are sampled. Points2Plans then invokes a sampling-based planning procedure to compute continuous parameters $a_{1:H}$ for the manipulation primitives in the task plan $\phi_{1:H}$. Infeasible plans (e.g., collisions) are rejected, and the plan that maximizes the goal likelihood in the final state $\mathbf{z}_{H+1}$ is returned.
Figure 3: Points2Plans hybrid rollout strategy.
Figure 4: Simulation and real-world results for the Constrained Packing (a-d) and Constrained Retrieval (e-f) tasks. As task complexity increases, Points2Plans significantly outperforms baselines in terms of planning success rate (a-b), position prediction error (c), and predicate classification accuracy (d). Interfacing Points2Plans with an LLM task planner increases planning efficiency (e) and correctness (f). Planning time is shown on a logarithmic scale. Error bars denote standard deviations across 500 trials.
Figure 5: Points2Plans generalizes to unseen long-horizon tasks, whereas the baselines struggle to find collision-free plans.
Figure 6: A causal Bayes net used to derive Eq. \ref{eq:planning-objective}. $\mathcal{G}$ represents the goal predicates, $l$ is the language instruction, $\mathbf{o}_1$ is the initial observation, $\phi_{1:H}$ is the task plan, $a_{1:H}$ are the continuous parameters, and $\mathbf{x}_{1:H}$ represent world states (including predicates $\mathbf{r}_{1:H}$ and positions $\mathbf{p}_{1:H}$). Shaded nodes represent observed variables.
...and 5 more figures
