DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability

Xiaolin Fang, Caelan Reed Garrett, Clemens Eppner, Tomás Lozano-Pérez, Leslie Pack Kaelbling, Dieter Fox

TL;DR

This work composes diffusion models using a TAMP system, combining classical TAMP, generative modeling, and latent embeddings to enable multi-step constraint-based reasoning.

Abstract

Generative models, such as diffusion models, excel at capturing high-dimensional distributions with diverse input modalities, e.g., robot trajectories, but are less effective at multi-step constraint reasoning. Task and Motion Planning (TAMP) approaches are suited for planning multi-step autonomous robot manipulation. However, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by composing diffusion models using a TAMP system. We use the learned components for constraints and samplers that are difficult to engineer in the planning model, and use a TAMP solver to search for the task plan with constraint-satisfying action parameter values. To tractably make predictions for unseen objects in the environment, we define the learned samplers and TAMP operators on learned latent embeddings of changing object states. We evaluate our approach in a simulated articulated object manipulation domain and show how the combination of classical TAMP, generative modeling, and latent embedding enables multi-step constraint-based reasoning. We also apply the learned sampler in the real world. Website: https://sites.google.com/view/dimsam-tamp
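
The division of labor described in the abstract (a planner searches over task plans while samplers propose constraint-satisfying parameter values) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-variable state `(door_angle, stowed)`, the functions `sample_close` and `sample_stow` standing in for learned diffusion samplers, and the brute-force skeleton enumeration in `plan`. It is not the paper's TAMP solver or its learned models.

```python
import itertools
import random

# Hypothetical toy domain: the state is (door_angle, stowed).

def sample_close(rng, state):
    """Stand-in for a learned push sampler: proposes a door-closing trajectory."""
    angle, stowed = state
    if angle <= 0.0:
        return None  # door already closed, no trajectory to propose
    traj = [angle * (1 - t / 4) for t in range(5)]  # angle decreases to 0
    return traj, (0.0, stowed)

def sample_stow(rng, state):
    """Stand-in for a learned stow sampler: proposes a placement offset."""
    angle, stowed = state
    if stowed or angle < 0.5:  # toy constraint: the door must be open enough
        return None
    placement = rng.uniform(-0.1, 0.1)
    return placement, (angle, True)

SAMPLERS = {"close": sample_close, "stow": sample_stow}

def plan(initial, goal_test, rng, max_len=3, attempts=10):
    """Enumerate action skeletons by length and fill in parameters by sampling."""
    for length in range(1, max_len + 1):
        for skeleton in itertools.product(SAMPLERS, repeat=length):
            for _ in range(attempts):
                state, params = initial, []
                for name in skeleton:
                    out = SAMPLERS[name](rng, state)
                    if out is None:  # sampler found no feasible value
                        break
                    value, state = out
                    params.append((name, value))
                else:
                    if goal_test(state):
                        return params
    return None

# Goal: object stowed and door closed, starting with the door fully open.
result = plan((1.0, False), lambda s: s[0] == 0.0 and s[1], random.Random(0))
print([name for name, _ in result])
```

In this toy domain the only feasible ordering is to stow before closing, because `sample_stow` fails once the door angle drops below its threshold. The sketch shows only the interface between discrete search and continuous sampling, not an efficient search strategy.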

Paper Structure

This paper contains 25 sections, 9 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: DiMSam composes diffusion models using a TAMP system towards solving multi-step manipulation problems. The planner searches for the task plan, while learned diffusion samplers find constraint-satisfying continuous values. The bottom shows a sampling procedure for finding a microwave door closing trajectory and a collision-free object stowing trajectory. A diffusion model samples a trajectory of latent microwave states $z_1, ..., z_T$ and robot configurations $q_1, ..., q_T$ that reaches a door-closed state $z_T$.
  • Figure 2: The push action description.
  • Figure 3: Samples from the DiffPush model. (a) No condition except for known $z_1$. (b) Checking collision with the purple obstacle using classifier PairwiseCollision. Rejected samples are colored red. (c), (d) Classifier-guided conditional sampling with DoorOpen and DoorClosed.
  • Figure 4: One sampled trajectory of the microwave from the initial state (left figure) to the "fully closed" state. Sampled latent state $z$ is decoded into a point cloud and rotated for better visualization.
  • Figure 5: The initial states in the (a) Close, (b) Stow-Close, (c) Stow-Close-B simulated tasks.
  • ...and 1 more figures
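
The rejection step in Figure 3(b), where sampled trajectories are checked by a collision classifier and the colliding ones are discarded, can be sketched as follows. This is a minimal illustration under stated assumptions: the one-dimensional state, the random-walk `sample_push_trajectory`, and the interval test in `pairwise_collision` are hypothetical stand-ins, not the DiffPush model or the PairwiseCollision classifier from the paper.

```python
import random

def sample_push_trajectory(rng, z1, steps=5):
    """Hypothetical stand-in for a learned sampler: a random walk starting at z1."""
    traj = [z1]
    for _ in range(steps - 1):
        traj.append(traj[-1] + rng.uniform(-0.3, 0.1))
    return traj

def pairwise_collision(traj, obstacle=(-0.8, -0.6)):
    """Toy classifier: True if any state lies inside the obstacle interval."""
    lo, hi = obstacle
    return any(lo <= z <= hi for z in traj)

def rejection_sample(rng, z1, num_samples=100):
    """Draw samples conditioned on z1 and split them by the collision check."""
    accepted, rejected = [], []
    for _ in range(num_samples):
        traj = sample_push_trajectory(rng, z1)
        if pairwise_collision(traj):
            rejected.append(traj)
        else:
            accepted.append(traj)
    return accepted, rejected

rng = random.Random(0)
accepted, rejected = rejection_sample(rng, z1=0.0)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Rejection leaves the sampler untouched and only filters its outputs. The classifier-guided variants in panels (c) and (d) instead steer the sampling process itself, which this sketch does not attempt.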