Composable Part-Based Manipulation

Weiyu Liu; Jiayuan Mao; Joy Hsu; Tucker Hermans; Animesh Garg; Jiajun Wu

TL;DR

CPM addresses generalization in robotic manipulation by decomposing objects into functional parts and learning part-part correspondences as probabilistic constraints over $SE(3)$ pose trajectories. It trains a collection of conditional diffusion models, one per correspondence, and performs inference-time composition to sample trajectories that satisfy all constraints. The approach is validated on pouring and safe-placing tasks with both simulated data from PartNet/ShapeNetSem and zero-shot real-world transfer, outperforming several baselines. The results show strong cross-category generalization and robustness to geometric variation, suggesting a scalable route to general-purpose, composable manipulation skills.

Abstract

In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional correspondences between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of different correspondence constraints. CPM comprises a collection of composable diffusion models, where each model captures a different inter-object correspondence. These diffusion models can generate parameters for manipulation skills based on the specific object parts. Leveraging part-based correspondences coupled with the task decomposition into distinct constraints enables strong generalization to novel objects and object categories. We validate our approach in both simulated and real-world scenarios, demonstrating its effectiveness in achieving robust and generalized manipulation capabilities.
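The idea of composing several diffusion models at inference time can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the "pose" is a plain 2-D vector rather than an $SE(3)$ transformation, each primitive model is a hypothetical analytic Gaussian noise predictor (`gaussian_primitive`) standing in for a learned network, and the composition rule is an unweighted sum of noise predictions, which is one common way to combine diffusion models.

```python
import numpy as np


def make_schedule(num_steps=500, beta_min=1e-4, beta_max=0.02):
    """Linear DDPM noise schedule (an illustrative choice, not from the paper)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def gaussian_primitive(mean, std):
    """Hypothetical stand-in for one learned primitive diffusion model.

    For data drawn from N(mean, diag(std^2)), the expected noise given a
    noised sample x_t has a closed form, so no training is needed here.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(std, dtype=float) ** 2

    def predict_noise(x_t, alpha_bar):
        marginal_var = alpha_bar * var + (1.0 - alpha_bar)
        return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * mean) / marginal_var

    return predict_noise


def compose_noise(primitives, x_t, alpha_bar, weights=None):
    """Combine primitive noise predictions by a (weighted) sum."""
    if weights is None:
        weights = np.ones(len(primitives))
    return sum(w * p(x_t, alpha_bar) for w, p in zip(weights, primitives))


def sample(primitives, num_samples=256, dim=2, seed=0):
    """Ancestral DDPM sampling driven by the composed noise prediction."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule()
    x = rng.standard_normal((num_samples, dim))
    for t in reversed(range(len(betas))):
        eps = compose_noise(primitives, x, alpha_bars[t])
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x


# Two toy "correspondence constraints": one is tight along x, the other along y.
along_x = gaussian_primitive(mean=[1.0, 0.0], std=[0.1, 3.0])
along_y = gaussian_primitive(mean=[0.0, -2.0], std=[3.0, 0.1])
samples = sample([along_x, along_y])
print(samples.mean(axis=0))
```

In this sketch each primitive constrains only one coordinate tightly, so samples drawn with the composed prediction should concentrate where both constraints agree. A learned CPM primitive would instead condition on the point clouds of the relevant object parts.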

Paper Structure (19 sections, 3 equations, 5 figures, 8 tables)

This paper contains 19 sections, 3 equations, 5 figures, 8 tables.

Introduction
Related Work
Composable Part-Based Manipulation
Action as Part-Based Functional Correspondences
Generative Modeling of Functional Correspondences with Diffusion Models
Inference-Time Composition of Diffusion Models
Data Collection
Experiments
Experimental Setup
Compared Methods
Simulation Results
Real-World Transfer
Limitations and Conclusion
Network Architecture
Implementation Details for Baselines
...and 4 more sections

Figures (5)

Figure 1: CPM composes part-based diffusion models to predict target object poses directly from point clouds. In this example, we show that the "pouring" action is decomposed into three part-based correspondences, which generalize manipulation across object categories, and from simulation to the real world.
Figure 2: (a) Given a task, the partial point clouds of the anchor and function objects, and their parts extracted from a learned segmentation model $g_\phi$, we sample a sequence of transformations from a learned distribution $p_\theta$ to parameterize the function object's trajectory. (b) CPM can be generalized to novel object categories because it decomposes each action to a collection of functional correspondences between object parts. To sample the target transformations that satisfy all functional correspondences, CPM combines the noise predictions from a collection of primitive diffusion models at inference time. (c) Each primitive diffusion model learns a target pose distribution that satisfies a particular part-part correspondence, based on the point clouds of the object parts.
Figure 3: We generate task demonstrations using the PartNet and ShapeNetSem datasets for the "pouring" and "safe placing" tasks. We create demonstrations for a variety of function and anchor object combinations.
Figure 4: We illustrate the learned distribution of each primitive diffusion model, which generates diverse samples conforming to the specified constraints, as well as the distribution from the combined full CPM model. The highest-ranked sample is highlighted.
Figure 5: We show sampled frames from trajectories of CPM's policy. The model is trained only on demonstrations with pans, bowls, and wine glasses in simulation and generalizes to mugs in the real world.
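
The statement in Figure 2(b) that noise predictions are combined at inference time can be read through the usual product-of-distributions argument. The notation below is assumed for illustration and may differ from the paper's: $\mathcal{T}$ is the target transformation, $c_k$ is the pair of part point clouds for the $k$-th correspondence, and $\epsilon_{\theta_k}$ is the $k$-th primitive model.

$$
p(\mathcal{T} \mid c_1, \dots, c_K) \;\propto\; \prod_{k=1}^{K} p_{\theta_k}(\mathcal{T} \mid c_k)
\quad\Longrightarrow\quad
\nabla_{\mathcal{T}} \log p(\mathcal{T} \mid c_1, \dots, c_K) \;=\; \sum_{k=1}^{K} \nabla_{\mathcal{T}} \log p_{\theta_k}(\mathcal{T} \mid c_k)
$$

Because a diffusion model's noise prediction is proportional to the negative score of the noised distribution, this suggests the composed prediction

$$
\hat{\epsilon}(\mathcal{T}_t, t) \;=\; \sum_{k=1}^{K} \epsilon_{\theta_k}(\mathcal{T}_t, t \mid c_k).
$$

This is an approximation: at noise level $t > 0$, the sum of the noised scores is in general not the exact score of the noised product distribution. Whether CPM weights the terms or uses a different combination rule is not stated in this summary.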
