KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance

Jingxian Lu; Wenke Xia; Dong Wang; Zhigang Wang; Bin Zhao; Di Hu; Xuelong Li

TL;DR

This work introduces the hybrid Key-state guided Online Imitation (KOI) learning method, which integrates semantic and motion key states as guidance for reward estimation and refines the trajectory-matching reward computation.

Abstract

Online Imitation Learning struggles with the gap between the extensive online exploration space and the limited expert trajectories, which leads to inaccurate reward estimation and hinders efficient exploration. Inspired by findings from cognitive neuroscience, we hypothesize that an agent could estimate precise task-aware rewards for efficient online exploration by decomposing the target task into the objectives of "what to do" and the mechanisms of "how to do". In this work, we introduce the hybrid Key-state guided Online Imitation (KOI) learning method, which integrates semantic and motion key states as guidance for reward estimation. First, we use visual-language models to extract semantic key states from expert trajectories, indicating the objectives of "what to do". Within the intervals between semantic key states, optical flow is employed to capture motion key states, which reflect the mechanisms of "how to do". With these hybrid key states, we refine the trajectory-matching reward computation, accelerating online imitation learning with task-aware exploration. We evaluate not only the task success rate in the Meta-World and LIBERO environments, but also the trend of variance during online imitation learning, showing that our method is more sample-efficient. We also conduct real-world robotic manipulation experiments to validate the efficacy and practical applicability of KOI. Videos and code are available at https://gewu-lab.github.io/Keystate_Online_Imitation/.
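
To make the idea of a key-state-weighted trajectory-matching reward concrete, the following is a minimal sketch, not the authors' implementation. It assumes that agent and expert trajectories are given as per-step feature vectors, that matching uses entropic optimal transport (Sinkhorn iterations) with a cosine cost, and that key states raise the expert-side marginal through Gaussian bumps. The function names, the bump shape, and the parameters `boost`, `width`, `eps`, and `n_iters` are all hypothetical choices made for illustration.

```python
import numpy as np

def cosine_cost(agent, expert):
    """Pairwise cosine distance between agent and expert step features."""
    a = agent / (np.linalg.norm(agent, axis=1, keepdims=True) + 1e-8)
    e = expert / (np.linalg.norm(expert, axis=1, keepdims=True) + 1e-8)
    return 1.0 - a @ e.T

def sinkhorn(cost, mu, nu, eps=0.05, n_iters=200):
    """Entropic OT plan with row marginal mu (column marginal approaches nu)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]

def keystate_weights(n_expert, key_idx, boost=3.0, width=2.0):
    """Assumed weighting: uniform mass plus a Gaussian bump at each key state."""
    t = np.arange(n_expert)
    w = np.ones(n_expert)
    for k in key_idx:
        w += boost * np.exp(-0.5 * ((t - k) / width) ** 2)
    return w / w.sum()

def ot_rewards(agent, expert, key_idx):
    """Per-step reward: negative transport cost charged to each agent step."""
    cost = cosine_cost(agent, expert)
    mu = np.full(len(agent), 1.0 / len(agent))   # uniform mass over agent steps
    nu = keystate_weights(len(expert), key_idx)  # more mass near key states
    plan = sinkhorn(cost, mu, nu)
    return -(plan * cost).sum(axis=1)

# Toy usage: 20 expert steps with orthogonal features, key states at steps 5 and 15.
expert = np.eye(20, 21)
rewards = ot_rewards(expert.copy(), expert, key_idx=[5, 15])
```

Under this sketch, expert steps near a key state carry more transport mass, so an agent trajectory that fails to reach those states pays a larger share of the matching cost. How KOI itself derives and adapts the importance weights is described in the paper's Importance Weight Adaption section.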

Paper Structure (24 sections, 9 equations, 12 figures, 3 tables)

Introduction
Related Work
Method
Background
Semantic Key-state Extraction
Motion Key-state Identification
Importance Weight Adaption
Learning Paradigm
Experiments
Experiments Setting
Comparison Experiments
Ablation Experiments
Qualitative Analysis
Real-world Results
Conclusion and Limitation
...and 9 more sections

Figures (12)

Figure 1: The pipeline of our hybrid Key-state guided Online Imitation (KOI) learning method. We first extract semantic key states with the Semantic Decomposition Module. Within the intervals between semantic key states, the Motion Capture Module identifies motion key states. We then adjust the importance weights in OT-based reward estimation with these hybrid key states to enable task-aware exploration for efficient online imitation learning. The color intensity of the OT matrix represents the value of the estimated reward.
Figure 2: A subset of experimental results on the Meta-World and LIBERO suites, with $5\times10^5$ and $2\times10^5$ exploration timesteps, respectively. The shaded region represents $\pm$ 1 standard deviation across 3 seeds. The results indicate that KOI is more sample-efficient than the compared methods.
Figure 3: The trend of variance during online imitation learning. The variance of KOI decreases continuously over time.
Figure 4: Ablation experiments of our proposed Semantic Decomposition Module (SDM) and Motion Capture Module (MCM). The shaded region represents $\pm$ 1 standard deviation across 3 seeds.
Figure 5: Qualitative analysis of our method. (a) shows the selected semantic and motion key states. (b)-(d) illustrate three representative cases and their estimated reward values in the "bin picking" task. The triangular and pentagonal icons represent the two objectives of this task, with the filling denoting completion status; the corresponding estimated rewards are linked by dashed lines.
...and 7 more figures
