Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations

Yanwei Wang, Nadia Figueroa, Shen Li, Ankit Shah, Julie Shah

TL;DR

This work proves that a continuous policy learned from demonstrations can simulate any discrete plan specified by a linear temporal logic (LTL) formula. The resulting imitator is robust to both task- and motion-level perturbations and is guaranteed to achieve task success.

Abstract

Learning from demonstration (LfD) has succeeded in tasks featuring a long time horizon. However, when the problem complexity also includes human-in-the-loop perturbations, state-of-the-art approaches do not guarantee the successful reproduction of a task. In this work, we identify the roots of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the demonstration. By utilizing modes (rather than subgoals) as the discrete abstraction and motion policies with both mode invariance and goal reachability properties, we prove our learned continuous policy can simulate any discrete plan specified by a linear temporal logic (LTL) formula. Consequently, an imitator is robust to both task- and motion-level perturbations and guaranteed to achieve task success. Project page: https://yanweiw.github.io/tli/

Paper Structure

This paper contains 30 sections, 7 theorems, 11 equations, 14 figures, 1 table.

Key Result

Theorem 1

(Key Contribution 1) A nonlinear dynamical system (DS) defined by Eq. eq:ds_eq, learned from demonstrations, and modulated by cutting planes as described in Section sec:invariance with the reference point $x^r$ set at the attractor $x^*$, will never penetrate the cuts and is globally asymptotically stable (G.A.S.) at $x^*$. Proof: See Appendix sec:pr
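The non-penetration half of this statement can be illustrated with a small numerical sketch. The code below is not the modulation defined in the paper (it does not use the reference point $x^r$, and it carries no stability guarantee). It is a simplified stand-in that scales the outward normal component of the velocity so that it vanishes at the cut. The spiral DS matrix `A`, the cut location, the start state, and the helper names `signed_distance`, `modulate`, and `rollout` are all hypothetical choices made for illustration.

```python
import numpy as np

def signed_distance(x, cut_point, cut_normal):
    """Signed distance to the cut; non-negative on the allowed side."""
    n = cut_normal / np.linalg.norm(cut_normal)
    return float(n @ (x - cut_point))

def modulate(x, velocity, cut_point, cut_normal):
    """Simplified stand-in for cutting-plane modulation (an assumption,
    not the paper's method): shrink the outward normal velocity
    component so that it is zero on the cut itself."""
    n = cut_normal / np.linalg.norm(cut_normal)
    v_n = float(n @ velocity)
    if v_n >= 0.0:
        return velocity  # flow heading to the allowed side is left unchanged
    d = max(signed_distance(x, cut_point, cut_normal), 0.0)
    scale = d / (1.0 + d)  # 0 on the cut, approaches 1 far from it
    return velocity - (1.0 - scale) * v_n * n

def rollout(x0, ds, steps, dt, cut=None):
    """Forward-Euler rollout of the DS, optionally modulated by one cut."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        v = ds(x)
        if cut is not None:
            v = modulate(x, v, *cut)
        x = x + dt * v
        path.append(x.copy())
    return np.array(path)

# Hypothetical nominal DS: a stable clockwise spiral around the attractor.
A = np.array([[-1.0, 3.0], [-3.0, -1.0]])
x_star = np.zeros(2)
ds = lambda x: A @ (x - x_star)

# Hypothetical cut: the allowed side is the half-plane y >= -0.5.
cut = (np.array([0.0, -0.5]), np.array([0.0, 1.0]))

nominal = rollout([2.0, 0.0], ds, steps=1000, dt=0.01)
modulated = rollout([2.0, 0.0], ds, steps=1000, dt=0.01, cut=cut)

nominal_min = min(signed_distance(x, *cut) for x in nominal)
modulated_min = min(signed_distance(x, *cut) for x in modulated)
print("nominal crosses the cut:", nominal_min < 0.0)
print("modulated stays inside:", modulated_min > -1e-9)
```

In this sketch the nominal spiral dips below the cut, while the modulated rollout keeps a non-negative signed distance as long as the Euler step is small relative to the flow speed. The G.A.S. part of the theorem relies on the paper's choice of reference point at the attractor, which this simplification does not reproduce.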

Figures (14)

  • Figure 1: (a) A successful replay of the scooping task. The robot (b) is robust to motion-level perturbations; (c) experiences an invariance failure (i.e., drops material) after a task-level perturbation; and (d) re-scoops after a task-level perturbation, avoiding failure after DS motion policy modulation.
  • Figure 2: Mode abstraction of a 2D soup-scooping task: $x_1$ and $x_2$ denote the spoon's orientation and distance to the soup. (a) Task: To move the spoon's configuration from the white region (spoon without soup) $\Rightarrow$ yellow region (spoon in contact with soup) $\Rightarrow$ pink region (spoon holding soup) $\Rightarrow$ green region (soup at target). (Note that transitions (white $\Rightarrow$ pink) and (white $\Rightarrow$ green) are not physically realizable.) Black curves denote successful demonstrations. (b) Learning DS policies [figueroa2018physically] over unsegmented data can result in successful task replay (blue trajectories), but lacks a guarantee due to invalid transitions (red trajectories). (c) Trajectories are segmented into three colored regions (modes) with orange attractors. (d-f) Learning DSs on segments may still result in invariance failures (i.e., traveling outside of modes as depicted by red trajectories).
  • Figure 3: (a) Task automaton for a scooping task LTL. Modes $a$, $b$, $c$, and $d$ are the reaching, scooping, transporting, and done modes, respectively. Atomic propositions $r$, $s$, and $t$ denote sensing the spoon reaching the soup, soup on the spoon, and task success, respectively. During successful demonstrations, only the mode transitions in black, $a \Rightarrow b \Rightarrow c \Rightarrow d$, are observed. Additional valid transitions in gray, $b \Rightarrow a$, $c \Rightarrow a$, and $c \Rightarrow b$, are given by the LTL to help recover from unexpected mode transitions. (b) System flowchart of LTL-DS.
  • Figure 4: An illustration of iterative estimation of a mode boundary with cutting planes. A system enters a mode with an unknown boundary (dashed line) at the black circle, and is attracted to the goal at the orange circle. The trajectory in black shows the original policy rollout, and the trajectory in red is driven by perturbations. After the system exits the mode and before it eventually re-enters the same mode through replanning, a cut is placed at the last in-mode state (yellow circle) to bound the mode from the failure state (red cross). When the system is inside the cuts, it experiences modulated DS and never moves out of the cuts (flows moving into the cuts are not modulated); when the system is outside the cuts but inside the mode, it follows the nominal DS. Note only mode exits in black are invariance failures in need of modulation (green circles); mode exits in red are driven by perturbations to illustrate that more cuts lead to better boundary approximation.
  • Figure 5: Policy rollouts from different starting states for a randomly generated convex mode. The top row shows BC results, and the bottom row depicts DS results. The left column visualizes the nominal policies learned from two demonstrations (black trajectories) reaching the orange attractor. The middle columns add different levels of Gaussian noise to the initial states sampled from the demonstration distribution. Blue trajectories successfully reach the attractor, while red trajectories fail due to either invariance failures or reachability failures. (Note that these failures only occur at locations without data coverage.) The right columns show that cutting planes (blue lines) separate failures (red crosses) from last-visited in-mode states (yellow circles) and consequently bound both policies to be mode-invariant. Applying cutting planes to BC policies without a stability guarantee cannot correct reachability failures within the mode. More results in Appendix sec:single-mode.
  • ...and 9 more figures
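
The mode transitions listed in the Figure 3 caption can be written down as a small lookup structure. The sketch below encodes only the edges named in that caption (black: demonstrated; gray: recovery) and checks whether an observed mode sequence respects them. The helper names `first_invalid_transition` and `is_successful`, and the choice to treat staying in the same mode as always allowed, are assumptions made for illustration and are not taken from the paper.

```python
# Edges read off the Figure 3 caption.
DEMONSTRATED = {("a", "b"), ("b", "c"), ("c", "d")}  # black edges
RECOVERY = {("b", "a"), ("c", "a"), ("c", "b")}      # gray edges
VALID = DEMONSTRATED | RECOVERY

MODE_NAMES = {"a": "reaching", "b": "scooping", "c": "transporting", "d": "done"}

def first_invalid_transition(modes):
    """Return the first (previous, next) mode pair that is not an edge
    of the automaton, or None if every transition is valid.
    Staying in the same mode is treated as allowed (an assumption)."""
    for prev, nxt in zip(modes, modes[1:]):
        if prev != nxt and (prev, nxt) not in VALID:
            return (prev, nxt)
    return None

def is_successful(modes):
    """A sequence succeeds if all transitions are valid and it ends in 'done'."""
    return bool(modes) and first_invalid_transition(modes) is None and modes[-1] == "d"

# A clean replay, a replay with recoveries, and a sequence with a skipped mode.
clean = ["a", "b", "c", "d"]
recovered = ["a", "b", "a", "b", "c", "b", "c", "d"]
skipped = ["a", "c", "d"]

print(is_successful(clean))               # True
print(is_successful(recovered))           # True
print(first_invalid_transition(skipped))  # ('a', 'c')
```

A sequence such as `skipped` is flagged because $a \Rightarrow c$ is not among the transitions the caption lists. The recovery edges are what allow `recovered` to count as a success despite falling back to earlier modes.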

Theorems & Definitions (7)

  • Theorem 1
  • Theorem 2
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 2