WayEx: Waypoint Exploration using a Single Demonstration

Mara Levy, Nirat Saini, Abhinav Shrivastava

TL;DR

The paper tackles the challenge of learning goal-conditioned robotics tasks with minimal supervision by proposing WayEx, a framework that learns from a single demonstration and does not need information about the actions taken during that demonstration. It introduces a proximal waypoint reward mechanism and a knowledge expansion strategy to generalize from the demonstrated trajectory to unseen start and goal states, operating as a wrapper around standard RL algorithms with a sparse reward: $R(s,a)=0$ if $s=g$ and $-1$ otherwise. Empirically, WayEx accelerates learning by guiding exploration toward waypoints and performs strongly across six tasks, matching or exceeding baselines even when those baselines receive $100$ demonstrations. The approach reduces the data and computation required for robotic learning and improves robustness to sparse rewards; future work includes nonlinear state representations and image-based inputs.
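
The sparse reward quoted above can be written down directly. The following is a minimal sketch, not the authors' code: the function names `sparse_reward` and `episode_return` and the tolerance `tol` are assumptions introduced here for illustration (the summary states exact equality $s=g$; a tolerance is a common practical stand-in for continuous states).

```python
import math

def sparse_reward(state, goal, tol=1e-6):
    """Return 0.0 when the state matches the goal, otherwise -1.0.

    `tol` is an assumed tolerance; the text only states the condition s = g.
    """
    return 0.0 if math.dist(state, goal) <= tol else -1.0

def episode_return(states, goal):
    """Undiscounted sum of sparse rewards over the visited states."""
    return sum(sparse_reward(s, goal) for s in states)

# Hypothetical three-step trajectory that reaches the goal on its last state.
trajectory = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(episode_return(trajectory, goal=(1.0, 1.0)))
```

Under this reward, every step spent away from the goal costs $-1$, so an agent that maximizes return is pushed toward reaching the goal in as few steps as possible.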

Abstract

We propose WayEx, a new method for learning complex goal-conditioned robotics tasks from a single demonstration. Our approach distinguishes itself from existing imitation learning methods by demanding fewer expert examples and eliminating the need for information about the actions taken during the demonstration. This is accomplished by introducing a new reward function and employing a knowledge expansion technique. We demonstrate the effectiveness of WayEx, our waypoint exploration strategy, across six diverse tasks, showcasing its applicability in various environments. Notably, our method significantly reduces training time by 50% as compared to traditional reinforcement learning methods. WayEx obtains a higher reward than existing imitation learning methods given only a single demonstration. Furthermore, we demonstrate its success in tackling complex environments where standard approaches fall short. More information is available at: https://waypoint-ex.github.io.

Paper Structure

This paper contains 17 sections, 5 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: A comparison of our approach to general imitation learning techniques. (a) Traditional imitation learning approaches require multiple expert trajectories with a known action space for training (4 shown here). (b) Our proposed method, WayEx, uses only one expert trajectory and expands knowledge from this one trajectory to learn how to solve the task. During training with a single initial state ($s_0$) and a single goal state ($g_0$), our model learns to navigate back to the expert trajectory from points that are not part of the trajectory (all dotted states, which can be a combination of the 4 expert trajectories shown on the top). This enables the model to successfully reach the goal state. We further introduce additional start and goal states ($[s_1, g_1]$, $[s_2, g_2]$, $[s_3, g_3]$).
  • Figure 2: Visualization of the Grid World toy example. (a) shows the environment setup with a sparse reward. (b) shows the reward for each state once the entire environment has been solved using the Bellman equation [bellman]. (c) represents WayEx, where, with a single demonstration, we compute a close approximation of the reward for each state along the path to the goal.
  • Figure 3: The environments that we experimented on with WayEx. We show results on 4 different tasks: (a) pick and place, (b) peg assembly, (c) open door and (d) peg insertion. These tasks are ideal because they have a clear definition of success and therefore a clear sparse reward. However, most of these tasks cannot be solved with sparse rewards alone.
  • Figure 4: (a,b,c) show the results on the Meta World [metaworld] environments when trained using SAC [SAC] with a batch size of 2048: (a) Open Door task, (b) Peg Insertion task, (c) Peg Assembly task. (d) The Pick and Place task is from OpenAI Fetch.
  • Figure 5: Results of several baselines when they are given one expert demonstration. (a,b,c) show the results on the Meta World [metaworld] environments when trained using AWAC [nair2021awac], MCAC [MCAC], SAC + RB, and AWAC + MCAC. (d,e) show the results of these same baselines on the robosuite [robosuite2020] tasks.
  • ...and 1 more figures
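
The grid-world idea in the Figure 2 caption can be illustrated with a small sketch. This is not the paper's implementation: the grid size, the deterministic four-way moves, the undiscounted setting, and the function names `bellman_values` and `demo_values` are all assumptions made here. The sketch solves the whole grid with value iteration under the sparse reward (0 at the goal, $-1$ elsewhere) and compares the result with values assigned only to the states of one demonstration, namely the negative number of steps remaining to the goal.

```python
from itertools import product

def bellman_values(width, height, goal, gamma=1.0, sweeps=100):
    """Value iteration with a terminal goal of value 0 and -1 per step elsewhere."""
    states = list(product(range(width), range(height)))
    values = {s: 0.0 for s in states}
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(sweeps):
        for (x, y) in states:
            if (x, y) == goal:
                continue  # the goal is terminal and keeps value 0
            neighbours = [(x + dx, y + dy) for dx, dy in moves
                          if 0 <= x + dx < width and 0 <= y + dy < height]
            # Bellman backup: pay -1 for this step, then act greedily.
            values[(x, y)] = -1.0 + gamma * max(values[n] for n in neighbours)
    return values

def demo_values(demo):
    """Assumed approximation: value = -(steps remaining along the demonstration)."""
    last = len(demo) - 1
    return {state: -float(last - i) for i, state in enumerate(demo)}

# Hypothetical 4x4 grid and a single shortest-path demonstration to the goal.
goal = (3, 3)
demo = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
full = bellman_values(4, 4, goal)
approx = demo_values(demo)
print(all(full[s] == approx[s] for s in demo))
```

When the demonstration is a shortest path, the two sets of values agree on the demonstrated states; for a longer demonstration the step-count estimate would be a lower bound on the solved values. The point of the comparison is that the demonstration-based estimate needs only the ordered list of visited states, with no actions and no sweep over the full state space.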