Logic Learning from Demonstrations for Multi-step Manipulation Tasks in Dynamic Environments

Yan Zhang; Teng Xue; Amirreza Razmjoo; Sylvain Calinon

TL;DR

Logic-LfD combines Task and Motion Planning (TAMP) with an optimal control formulation of Dynamic Movement Primitives (DMP), allowing it to incorporate motion-level via-point specifications and to handle task-level variations or disturbances in dynamic environments.

Abstract

Learning from Demonstration (LfD) is an efficient framework for imparting human-like skills to robots. Nevertheless, designing an LfD framework that can imitate, generalize, and react to disturbances in long-horizon manipulation tasks in dynamic environments remains a challenge. To tackle this challenge, we present Logic Dynamic Movement Primitives (Logic-DMP), which combines Task and Motion Planning (TAMP) with an optimal control formulation of DMP, allowing us to incorporate motion-level via-point specifications and to handle task-level variations or disturbances in dynamic environments. We compare our approach against several baselines, evaluating its generalization ability and reactivity across three long-horizon manipulation tasks. Our experiments demonstrate the fast generalization and reactivity of Logic-DMP in handling task-level variants and disturbances in long-horizon manipulation tasks.

Paper Structure (21 sections, 5 equations, 4 figures, 2 tables, 1 algorithm)

Introduction
Related Work
LfD for long-horizon manipulation tasks
Reactive Task and Motion Planning
Preliminary
Planning Domain Definition Language (PDDL)
Method
LQT-CP: an optimal control formulation of DMP
Logic-LfD
Reactive TAMP with Logic-LfD
Experiments
Benchmarks
Tower Construction (B1)
Workspace Reach (B2)
Dual-arm Block Transportation (B3)
...and 6 more sections

Figures (4)

Figure 1: Overview of Logic-LfD. Arrows refer to action primitives encoded with the proposed DMP variant (LQT-CP). The template task starts from $\bm{\mathcal{L}_{0}}$ and ends at goal $\bm{G}$. $\bm{\mathcal{L}_{0}} \rightarrow \bm{\mathcal{L}_{1}} \rightarrow \bm{\mathcal{L}_{2}} \rightarrow \bm{G}$ illustrates the task-level demonstration. For a new task starting from $\bm{\mathcal{L}_{0}^{\prime}}$, a fixed sequential execution of action primitives encoded by DMPs (green arrows) cannot transition from $\bm{\mathcal{L}_{0}^{\prime}}$ to the goal state $\bm{G}$. Typical TAMP solvers find the action skeleton from $\bm{\mathcal{L}_{0}^{\prime}}$ to $\bm{G}$ from scratch (grey dashed long arrow). Instead, Logic-LfD tries to reach both the task goal $\bm{G}$ and all other states in the demonstration (blue arrows) in parallel, to find a feasible plan $\bm{\mathcal{L}_{0}^{\prime}} \rightarrow \bm{\mathcal{L}_{1}}$ that connects $\bm{\mathcal{L}_{0}^{\prime}}$ to the demonstration trajectory in minimum time. It then merges the new plan with the corresponding segment of the demonstration, $\bm{\mathcal{L}_{1}} \rightarrow \bm{\mathcal{L}_{2}} \rightarrow \bm{G}$, thus accomplishing the new task faster than classical TAMP solvers.
Figure 2: Experimental setups for the three benchmarks. B1: Tower Construction, B2: Workspace Reach, B3: Dual-arm Box Transportation. Each sub-figure illustrates the demonstrated task, with the initial and task goal states depicted in grey and white color, respectively.
Figure 3: Comparison between LQT-CP and standard DMP for two sub-tasks in the Workspace Reach benchmark (B2). Top: pick the hook. Bottom: pull cube A with the hook. The trajectories generated by LQT-CP are shown as blue lines in the left figures, and those generated by DMP as red lines in the right figures. In the pulling task, the hook must pass through two crucial via-points, the top and bottom-left corners of cube A, to accomplish the task. Only the via-points for pulling cube A to the blue goal are shown, as transparent hooks in the bottom figures.
Figure 4: Closed-loop Logic-LfD under extreme task-level disturbance (placing the red block on cube B after stacking cube C on D) in a real-world four-block stacking task. Logic-LfD reacts to the disturbance by unstacking the red block from cube B and placing it on the table, then continuing the original plan for achieving the goal state, as shown in the last figure.
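
The plan-merging idea described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the symbolic states are plain strings, the toy transition table and the names `connect_to_demo`, `merge_with_demo`, `TRANSITIONS`, and `L0_new` are hypothetical, and the parallel search over demonstration states is replaced here by a single breadth-first search that stops at the first demonstration state it reaches. Such a search returns the shortest connection to the demonstration, which is not necessarily the shortest overall plan.

```python
from collections import deque

def connect_to_demo(start, demo_states, successors):
    """Breadth-first search from `start` until any demonstration state is reached.

    Returns (connecting_actions, index_of_reached_demo_state), or None if no
    demonstration state is reachable.
    """
    index_of = {state: i for i, state in enumerate(demo_states)}
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state in index_of:
            return actions, index_of[state]
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

def merge_with_demo(start, demo_states, demo_actions, successors):
    """Connect `start` to the demonstration, then reuse the rest of the demonstration.

    demo_actions[i] is the action leading from demo_states[i] to demo_states[i + 1].
    """
    result = connect_to_demo(start, demo_states, successors)
    if result is None:
        return None
    prefix, idx = result
    return prefix + demo_actions[idx:]

# Hypothetical toy domain mirroring Figure 1: L0 -> L1 -> L2 -> G is the demonstration.
demo_states = ["L0", "L1", "L2", "G"]
demo_actions = ["a0", "a1", "a2"]

# Assumed transitions: the new start L0_new can join the demonstration at L1 (one step)
# or reach G directly through a longer detour X1 -> X2 -> G.
TRANSITIONS = {
    "L0": [("a0", "L1")],
    "L1": [("a1", "L2")],
    "L2": [("a2", "G")],
    "L0_new": [("b0", "L1"), ("c0", "X1")],
    "X1": [("c1", "X2")],
    "X2": [("c2", "G")],
}

def successors(state):
    return TRANSITIONS.get(state, [])

plan = merge_with_demo("L0_new", demo_states, demo_actions, successors)
print(plan)
```

In this toy domain the search joins the demonstration at L1 after one new action and reuses the demonstrated actions from there, instead of searching all the way to G. In the framework described above, each symbolic action would additionally be realized at the motion level by an LQT-CP primitive, which this sketch does not model.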
