NOD-TAMP: Generalizable Long-Horizon Planning with Neural Object Descriptors

Shuo Cheng, Caelan Garrett, Ajay Mandlekar, Danfei Xu

TL;DR

NOD-TAMP is a TAMP-based framework that extracts short manipulation trajectories from a handful of human demonstrations, adapts these trajectories using NOD features, and composes them to solve broad long-horizon, contact-rich tasks.

Abstract

Solving complex manipulation tasks in household and factory settings remains challenging due to long-horizon reasoning, fine-grained interactions, and broad object and scene diversity. Learning skills from demonstrations can be an effective strategy, but such methods often have limited generalizability beyond training data and struggle to solve long-horizon tasks. To overcome this, we propose to synergistically combine two paradigms: Neural Object Descriptors (NODs) that produce generalizable object-centric features and Task and Motion Planning (TAMP) frameworks that chain short-horizon skills to solve multi-step tasks. We introduce NOD-TAMP, a TAMP-based framework that extracts short manipulation trajectories from a handful of human demonstrations, adapts these trajectories using NOD features, and composes them to solve broad long-horizon, contact-rich tasks. NOD-TAMP solves existing manipulation benchmarks with a handful of demonstrations and significantly outperforms prior NOD-based approaches on new tabletop manipulation tasks that require diverse generalization. Finally, we deploy NOD-TAMP on a number of real-world tasks, including tool-use and high-precision insertion. For more details, please visit https://nodtamp.github.io/.
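The abstract says demonstration trajectories are adapted to new scenes using NOD features. The snippet below is a minimal sketch of one common way such an adaptation can be realized, not the authors' implementation. It assumes the descriptor step has already produced matched 3D points on the demonstration object and on the current object, fits a single rigid transform to them with the Kabsch (SVD) method, and re-expresses every demonstrated gripper pose through that transform. The Kabsch fit is a swapped-in stand-in for whatever descriptor-based pose optimization NOD-TAMP actually uses, and the function names `kabsch` and `adapt_trajectory` are hypothetical.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares rigid transform T (4x4) such that dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding points (assumed to come from
    matching neural object descriptors; that matching is not shown here).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip one axis if SVD gives a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def adapt_trajectory(demo_poses, demo_points, new_points):
    """Map demonstrated 4x4 gripper poses into the current scene.

    Hypothetical helper: assumes the object moved rigidly between the
    demonstration and the current scene, so every waypoint moves with it.
    """
    T = kabsch(demo_points, new_points)
    return [T @ pose for pose in demo_poses]


# Usage: object rotated 90 degrees about z and shifted.
rng = np.random.default_rng(0)
demo_pts = rng.normal(size=(10, 3))
c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
new_pts = demo_pts @ R_true.T + t_true
adapted = adapt_trajectory([np.eye(4)], demo_pts, new_pts)
```

Representing each waypoint as a full 4x4 pose keeps gripper orientation consistent with the moved object; a real system would also need collision checking and per-part (rather than whole-object) correspondences for articulated or deformable cases.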

Paper Structure (31 sections, 2 equations, 20 figures, 1 table, 2 algorithms)

Figures (20)

  • Figure 1: Overview. NOD-TAMP is a TAMP-based framework that adapts demonstration trajectories to new situations to accomplish long-horizon, fine-grained tasks.
  • Figure 2: NOD-TAMP Pipeline. Given a goal specification, a task planner plans a sequence of skill types. Then, a skill reasoner searches for the combination of skill demonstrations that maximizes compatibility. Using learned neural object descriptors (e.g., NDFs), each selected skill demonstration is adapted to the current scene. Finally, the adapted skills are executed in sequence.
  • Figure 3: Customized tasks. Examples of initial states and goal states (goal states shown in green bounding boxes).
  • Figure 4: Success rates on LIBERO tasks. MimicGen+, Ours/MP, and Ours/SR are abbreviated as M+, O/MP, and O/SR.
  • Figure 5: Success rates on customized tabletop tasks. MimicGen+, Ours/MP, and Ours/SR are abbreviated as M+, O/MP, and O/SR.
  • ...and 15 more figures
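The Figure 2 caption describes a skill reasoner that, given the sequence of skill types from the task planner, searches for the combination of skill demonstrations that maximizes compatibility. Below is a minimal sketch of that search under stated assumptions: compatibility is modeled as a hypothetical pairwise score between consecutive demonstrations (for example, whether the grasp used in one demo suits the next), the total score is the sum over consecutive pairs, and the search is a plain exhaustive enumeration. The names `select_demos` and `compat` are illustrative only; the paper's actual compatibility criterion and search strategy may differ.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def select_demos(
    skill_plan: Sequence[str],
    demos: Dict[str, List[str]],
    compat: Callable[[str, str], float],
) -> Tuple[Tuple[str, ...], float]:
    """Pick one demo per planned skill, maximizing summed pairwise compatibility.

    skill_plan: skill types in execution order (output of a task planner).
    demos: candidate demonstration IDs for each skill type.
    compat: hypothetical score for executing demo b right after demo a.
    """
    candidates = [demos[skill] for skill in skill_plan]
    best_combo: Tuple[str, ...] = ()
    best_score = float("-inf")
    # Exhaustive search: cost grows as the product of candidate counts.
    for combo in product(*candidates):
        score = sum(compat(a, b) for a, b in zip(combo, combo[1:]))
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


# Usage: an insertion demo is only compatible with a pick demo that
# grasped the object at the same place (toy rule, for illustration).
def grasp_match(a: str, b: str) -> float:
    return 1.0 if a.split("_")[1] == b.split("_")[-1] else 0.0


plan = ["pick", "insert"]
library = {
    "pick": ["pick_handle", "pick_rim"],
    "insert": ["insert_from_handle", "insert_from_rim"],
}
best, score = select_demos(plan, library, grasp_match)
```

Because the score here only couples neighboring skills, the same objective could be optimized in linear time with dynamic programming; exhaustive enumeration is used only to keep the sketch short.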