NOD-TAMP: Generalizable Long-Horizon Planning with Neural Object Descriptors

Shuo Cheng; Caelan Garrett; Ajay Mandlekar; Danfei Xu

TL;DR

NOD-TAMP is a TAMP-based framework that extracts short manipulation trajectories from a handful of human demonstrations, adapts them using NOD features, and composes them to solve a broad range of long-horizon, contact-rich tasks.

Abstract

Solving complex manipulation tasks in household and factory settings remains challenging due to long-horizon reasoning, fine-grained interactions, and broad object and scene diversity. Learning skills from demonstrations can be an effective strategy, but such methods often have limited generalizability beyond training data and struggle to solve long-horizon tasks. To overcome this, we propose to synergistically combine two paradigms: Neural Object Descriptors (NODs) that produce generalizable object-centric features and Task and Motion Planning (TAMP) frameworks that chain short-horizon skills to solve multi-step tasks. We introduce NOD-TAMP, a TAMP-based framework that extracts short manipulation trajectories from a handful of human demonstrations, adapts these trajectories using NOD features, and composes them to solve broad long-horizon, contact-rich tasks. NOD-TAMP solves existing manipulation benchmarks with a handful of demonstrations and significantly outperforms prior NOD-based approaches on new tabletop manipulation tasks that require diverse generalization. Finally, we deploy NOD-TAMP on a number of real-world tasks, including tool-use and high-precision insertion. For more details, please visit https://nodtamp.github.io/.

Paper Structure (31 sections, 2 equations, 20 figures, 1 table, 2 algorithms)

Introduction
Related Work
Problem Setup and Background
Problem Setup
Skill Representation
NOD-TAMP
Skill Adaptation
Skill Planning
Transit & Transfer Motion
Experiments
Experimental Setup
Baselines
Evaluation on the LIBERO Benchmark
Evaluation on Customized Tabletop Tasks
Real-world Evaluation
...and 16 more sections

Figures (20)

Figure 1: Overview. NOD-TAMP is a TAMP-based framework that adapts demonstration trajectories to new situations to accomplish long-horizon, fine-grained tasks.
Figure 2: NOD-TAMP Pipeline. Given a goal specification, a task planner plans a sequence of skill types. Then, a skill reasoner searches for the combination of skill demonstrations that maximizes compatibility. Using learned neural object descriptors (e.g., NDFs), each selected skill demonstration is adapted to the current scene. Finally, the adapted skills are executed in sequence.
Figure 3: Customized tasks. Examples of initial state and goal state (in green bounding box).
Figure 4: Success rates on LIBERO tasks. MimicGen$^{+}$, Ours/MP, and Ours/SR are abbreviated as M$^{+}$, O/MP, and O/SR.
Figure 5: Success rates on customized tabletop tasks. MimicGen$^{+}$, Ours/MP, and Ours/SR are abbreviated as M$^{+}$, O/MP, and O/SR.
...and 15 more figures
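
The skill-reasoner step in the Figure 2 caption (searching for the combination of skill demonstrations that maximizes compatibility) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the demo records, the 2-D feature tuples standing in for NOD features, and the `compatibility` score (negative distance between where one demo ends and the next begins) are hypothetical and are not the paper's definitions. The search is plain exhaustive enumeration.

```python
from itertools import product
import math

# Hypothetical demo records: (name, start_feature, end_feature).
# The 2-D tuples stand in for NOD features; the values are made up.
DEMOS = {
    "pick": [
        ("pick_a", (0.0, 0.0), (1.0, 0.0)),
        ("pick_b", (0.0, 0.0), (0.0, 1.0)),
    ],
    "place": [
        ("place_a", (0.9, 0.1), (2.0, 0.0)),
        ("place_b", (0.0, 3.0), (2.0, 0.0)),
    ],
}


def compatibility(prev_demo, next_demo):
    """Assumed score: negative distance between the end feature of one
    demo and the start feature of the next (higher is more compatible)."""
    return -math.dist(prev_demo[2], next_demo[1])


def select_demos(skill_types, demos=DEMOS):
    """Pick one demo per skill type, maximizing the summed compatibility
    of consecutive demos, by enumerating every combination."""
    best_combo, best_score = None, -math.inf
    for combo in product(*(demos[t] for t in skill_types)):
        score = sum(compatibility(a, b) for a, b in zip(combo, combo[1:]))
        if score > best_score:
            best_combo, best_score = combo, score
    return [d[0] for d in best_combo], best_score


# A task planner is assumed to have produced this sequence of skill types.
names, score = select_demos(["pick", "place"])
print(names, round(score, 3))
```

With these made-up features the search selects `pick_a` followed by `place_a`, because that pair has the smallest gap between the end of the first demo and the start of the second. Exhaustive enumeration grows exponentially with plan length, so it is only suitable for a toy example like this one.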
