HILONet: Hierarchical Imitation Learning from Non-Aligned Observations

Shanqi Liu, Junjie Cao, Wenzhou Chen, Licheng Wen, Yong Liu

TL;DR

HILONet (Hierarchical Imitation Learning from Observation) is a new imitation learning approach that uses a hierarchical structure to dynamically choose feasible sub-goals from demonstrated observations. By achieving these sub-goals, it can solve tasks whether or not they have a single goal position.

Abstract

Learning from demonstrated observation-only trajectories in a non-time-aligned environment is challenging because most imitation learning methods aim to imitate experts by following the demonstration step by step. However, aligned demonstrations are seldom obtainable in real-world scenarios. In this work, we propose a new imitation learning approach called Hierarchical Imitation Learning from Observation (HILONet), which adopts a hierarchical structure to dynamically choose feasible sub-goals from demonstrated observations. By achieving these sub-goals, our method can solve tasks whether or not they have a single goal position. We also present three different ways to increase sample efficiency in the hierarchical structure. We conduct extensive experiments in several environments. The results show improvements in both performance and learning efficiency.
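
As a rough illustration of the hierarchical structure described in the abstract, the sketch below runs a two-level rollout on a toy 1-D line. It is not the authors' implementation: the function names (`choose_subgoal`, `low_level_step`, `rollout`), the "furthest reachable observation" heuristic, and all parameters (`reach`, `step_size`, `k`) are assumptions made for illustration. In HILONet both policies are learned, whereas here they are hand-written stand-ins.

```python
from typing import List, Sequence


def choose_subgoal(obs: float, demo_obs: Sequence[float], reach: float) -> float:
    """Stand-in for the high-level policy (assumption, not the learned policy):
    pick the furthest demonstrated observation within `reach` of `obs`."""
    reachable = [d for d in demo_obs if abs(d - obs) <= reach]
    if not reachable:
        # Nothing is reachable: fall back to the closest demonstrated observation.
        return min(demo_obs, key=lambda d: abs(d - obs))
    return max(reachable)


def low_level_step(obs: float, subgoal: float, step_size: float) -> float:
    """Stand-in for the low-level policy: move toward the sub-goal,
    covering at most `step_size` per step."""
    delta = max(-step_size, min(step_size, subgoal - obs))
    return obs + delta


def rollout(start: float, demo_obs: Sequence[float], reach: float = 1.0,
            step_size: float = 0.25, k: int = 4, max_high_steps: int = 20,
            tol: float = 1e-6) -> List[float]:
    """Alternate between choosing a sub-goal and spending up to k low-level steps on it."""
    obs = start
    visited = [obs]
    final_goal = max(demo_obs)  # toy assumption: the task ends at the largest observation
    for _ in range(max_high_steps):
        subgoal = choose_subgoal(obs, demo_obs, reach)
        for _ in range(k):
            obs = low_level_step(obs, subgoal, step_size)
            visited.append(obs)
            if abs(obs - subgoal) <= tol:
                break
        if abs(obs - final_goal) <= tol:
            break
    return visited


# The demonstrated observations are used as a pool of candidate sub-goals,
# not as a step-by-step schedule the agent must follow in time.
trajectory = rollout(start=0.0, demo_obs=[0.5, 1.0, 1.5, 2.0, 3.0])
```

Because the sub-goal is re-selected from the current observation each time, the agent does not need to stay in lockstep with the demonstration's timing, which is the property the abstract emphasizes.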

Paper Structure

This paper contains 25 sections, 17 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: The structure of our overall policy. The high-level policy is responsible for choosing, from all demonstrated trajectories, a sub-goal that is reachable from the current observation. The low-level policy is responsible for achieving that sub-goal within a specific number of steps. The rollout is shown in Algorithm 1.
  • Figure 2: All environments used in the experiments. From left to right: MountainCar; LunarLander; Reacher, in which a robot arm tries to reach the goal position; 3Dball, in which a platform robot tries to balance a ball; Swimmer.
  • Figure 3: Hindsight transitions. When the low-level policy reaches an observation from the expert trajectories that is not the original sub-goal, we can use hindsight methods to replace the sub-goal (the high-level policy's action) with the one actually reached.
  • Figure 4: Performance in the MountainCar environment.
  • Figure 5: Performance in the LunarLander environment.
  • ...and 7 more figures
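
The hindsight relabeling described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's code: the `HighLevelTransition` type, the `hindsight_relabel` function, the scalar observations, and the distance tolerance `tol` are all hypothetical. The sketch only relabels the high-level action and leaves any reward recomputation to the caller, since the caption does not specify it.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class HighLevelTransition:
    obs: float        # observation when the sub-goal was chosen
    subgoal_idx: int  # high-level action: index into the demonstrated observations
    next_obs: float   # observation after the low-level policy finished


def hindsight_relabel(t: HighLevelTransition, demo_obs: Sequence[float],
                      tol: float = 0.05) -> Optional[HighLevelTransition]:
    """Return a relabeled copy of `t` if the agent ended on a demonstrated
    observation other than the intended sub-goal; otherwise return None."""
    if not demo_obs:
        return None
    # Find the demonstrated observation closest to where the agent ended up.
    achieved_idx = min(range(len(demo_obs)),
                       key=lambda i: abs(demo_obs[i] - t.next_obs))
    if abs(demo_obs[achieved_idx] - t.next_obs) > tol:
        return None  # the agent did not land on any demonstrated observation
    if achieved_idx == t.subgoal_idx:
        return None  # the original sub-goal was achieved; nothing to relabel
    # Replace the high-level action with the sub-goal that was actually reached.
    return replace(t, subgoal_idx=achieved_idx)


demo = [0.0, 1.0, 2.0]
missed = HighLevelTransition(obs=0.0, subgoal_idx=2, next_obs=1.02)
relabeled = hindsight_relabel(missed, demo)
```

In this example the agent aimed for the observation at index 2 but ended near the one at index 1, so the stored transition is rewritten as if index 1 had been the chosen sub-goal.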