Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks

Alexandre St-Aubin, Amin Abyaneh, Hsiu-Chin Lin

TL;DR

The paper addresses learning long-horizon robotic manipulation safely and with stability guarantees. It decomposes a demonstration into segments delimited by subgoals and trains a globally stable dynamical policy for each segment. AWE (Automatic Waypoint Extraction) selects informative waypoints within each segment, and a high-level cascade controller chains the segment policies to reproduce the full trajectory. Each per-segment SNDS (Stable Neural Dynamical System) policy is Lyapunov-stable, which makes execution robust to sensory noise and disturbances, and the approach transfers from simulation to real robots from a single demonstration. Experiments in deterministic and perturbed settings, plus zero-shot sim-to-real tests, show substantial improvements over baselines and demonstrate one-shot, data-efficient learning for long-horizon tasks.

Abstract

Mastering complex sequential tasks continues to pose a significant challenge in robotics. While there has been progress in learning long-horizon manipulation tasks, most existing approaches lack rigorous mathematical guarantees for ensuring reliable and successful execution. In this paper, we extend previous work on learning long-horizon tasks and stable policies, focusing on improving task success rates while reducing the amount of training data needed. Our approach introduces a novel method that (1) segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals, and (2) learns globally stable dynamical system policies to guide the robot to each subgoal, even in the face of sensory noise and random disturbances. We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms. Code is available at https://github.com/Alestaubin/stable-imitation-policy-with-waypoints

Paper Structure (18 sections, 1 theorem, 7 equations, 8 figures, 1 table)

Key Result

Proposition III-C.1

The high-level policy outlined in Eq. is globally stable at the last subgoal, $\mathbf{g}^{\mathit{K}}$.
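
The proposition above can be illustrated with a minimal sketch of a cascade of stable segment policies. This is not the authors' implementation. The linear attractor below is a hypothetical stand-in for a learned SNDS policy, and the names `segment_policy` and `cascade_rollout`, the switching tolerance `tol`, and the one-off `push` disturbance are all assumptions made for illustration. The sketch only shows the switching logic: follow the current segment's policy until its subgoal is reached, then hand over to the next one, so the rollout ends at the last subgoal even after a perturbation.

```python
import numpy as np


def segment_policy(x, goal, gain=2.0):
    """Hypothetical stand-in for a learned stable policy.

    A linear attractor x_dot = -gain * (x - goal); V(x) = ||x - goal||^2
    decreases along its trajectories, so it converges to `goal` from any state.
    """
    return -gain * (x - goal)


def cascade_rollout(x0, subgoals, dt=0.01, tol=1e-2, max_steps=10_000,
                    push_step=None, push=None):
    """Chain per-segment policies and return (path, index of active subgoal).

    `push_step` / `push` optionally apply a single state disturbance
    (an assumption used here to mimic an external perturbation).
    """
    x = np.asarray(x0, dtype=float)
    subgoals = np.asarray(subgoals, dtype=float)
    k = 0  # index of the currently active segment / subgoal
    path = [x.copy()]
    for step in range(max_steps):
        if np.linalg.norm(x - subgoals[k]) < tol:
            if k == len(subgoals) - 1:
                break  # last subgoal reached: task complete
            k += 1  # switch to the next segment's policy
            continue
        # Forward-Euler step of the active segment's dynamics.
        x = x + dt * segment_policy(x, subgoals[k])
        if push_step is not None and step == push_step:
            x = x + np.asarray(push, dtype=float)
        path.append(x.copy())
    return np.array(path), k
```

Calling `cascade_rollout([0.0, 0.0], [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])` returns the visited states and the index of the final active subgoal. Passing `push_step` and `push` shows the rollout recovering from a disturbance. Because each segment's policy converges to its own subgoal from any state, the switch to the next segment always happens eventually, which is the intuition behind stability at the last subgoal.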

Figures (8)

  • Figure 1: Overview of our approach: Long-horizon demonstrations (1) are first segmented into subgoals (2). Low-level stable dynamical policies are then learned to robustly reach each subgoal, even in the presence of perturbations (3). Finally, a high-level policy orchestrates a cascade of these stable policies for each segment, replicating the long-horizon expert demonstrations (4).
  • Figure 2: An example of (a) a single expert demonstration in the robot's task space, (b) three subgoals selected, and (c) outcomes of the automatic waypoint selections in each segment.
  • Figure 3: Stable dynamical policy rollout by an optimized SNDS model (left), and its Lyapunov candidate (right). The learned Lyapunov candidate ensures the induced trajectories always move toward the lowest energy point, regardless of the initial state or perturbations.
  • Figure 4: Our framework for learning stable policies for long-horizon manipulation tasks.
  • Figure 5: Examples of tasks in Robosuite (top) and an overview of the task demonstrations (bottom).
  • ...and 3 more figures
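
For reference, the stability claim in the Figure 3 caption rests on the standard Lyapunov conditions, restated here in generic notation. The symbols are chosen for illustration and are not taken from the paper: $\dot{\mathbf{x}} = f(\mathbf{x})$ denotes a segment policy, $\mathbf{g}$ its subgoal, and $V$ the Lyapunov candidate. If all of the following hold, every trajectory converges to $\mathbf{g}$ regardless of the initial state:

```latex
\begin{align}
  V(\mathbf{g}) &= 0, \\
  V(\mathbf{x}) &> 0 \quad \forall\, \mathbf{x} \neq \mathbf{g}, \\
  \dot{V}(\mathbf{x}) = \nabla V(\mathbf{x})^{\top} f(\mathbf{x}) &< 0 \quad \forall\, \mathbf{x} \neq \mathbf{g}, \\
  V(\mathbf{x}) &\to \infty \quad \text{as } \lVert \mathbf{x} \rVert \to \infty .
\end{align}
```

The last condition (radial unboundedness) is what makes the guarantee global rather than local to a neighborhood of the subgoal.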

Theorems & Definitions (3)

  • Remark III-A.1
  • Remark III-B.1
  • Proposition III-C.1