Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning

Zengjie Zhang; Jayden Hong; Amir Soufi Enayati; Homayoun Najjaran

TL;DR

The paper tackles slow training and limited generalization in reinforcement-learning-based robot motion planning by proposing an IBC-DMP RL framework that combines implicit behavior cloning (IBC) with a multi-DoF dynamic movement primitive (DMP). It introduces a dual-buffer training pipeline, a reshaped actor loss based on energy-based IBC, and a refined critic loss that exploit human demonstrations without overfitting to them. Simulations and hardware experiments show faster convergence, better generalization, and reliable collision avoidance, including a real-robot assembly task. The results indicate that incorporating motion primitives and human demonstrations via IBC can improve the efficiency and robustness of RL for robot motion planning, with practical implications for industrial automation.
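To make the idea of an energy-based IBC loss concrete, the sketch below computes the standard InfoNCE-style objective commonly used for implicit behavior cloning: the demonstrated action should receive lower energy than sampled counter-example actions. This is an assumption for illustration only, not the reshaped actor loss of the paper; the function name `info_nce_loss` and the array shapes are hypothetical.

```python
import numpy as np

def info_nce_loss(energy_demo, energy_negatives):
    """InfoNCE-style loss over energies (illustrative, not the paper's loss).

    energy_demo:      shape (B,), energy of the demonstrated action per state.
    energy_negatives: shape (B, N), energies of N sampled counter-examples.
    """
    # Lower energy means more likely, so the logits are negated energies.
    logits = -np.concatenate([energy_demo[:, None], energy_negatives], axis=1)
    # Subtract the row-wise maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob_demo = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    # Minimizing the negative log-probability pushes demo energies down.
    return float(-log_prob_demo.mean())
```

With uninformative (equal) energies the loss equals the log of the number of candidates, and it decreases as the demonstrated action's energy drops below the counter-examples.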

Abstract

Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from slow training and poor generalizability. In this paper, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent. IBC uses human demonstration data to accelerate RL training, and DMP serves as a heuristic model that transfers motion planning into a simpler planning space. To support this, we also create a human demonstration dataset from a pick-and-place experiment, which can be reused in similar studies. Comparison studies in simulation show that the proposed method trains faster and achieves higher scores than conventional RL agents. A real-robot experiment indicates the applicability of the proposed method to a simple assembly task. Our work provides a novel perspective on using motion primitives and human demonstration to improve the performance of RL for robot applications.
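As a rough illustration of how a DMP moves planning into a simpler space, the sketch below integrates the textbook single-DoF discrete DMP: a critically damped spring toward the goal plus a phase-gated forcing term, so that planning reduces to choosing the forcing function. It is an assumption for illustration, not the multi-DoF model of the paper; the function name `rollout_dmp`, the gains, and the step size are hypothetical.

```python
import numpy as np

def rollout_dmp(y0, goal, forcing, tau=1.0, alpha=25.0, beta=6.25,
                alpha_x=3.0, dt=0.001, n_steps=2000):
    """Integrate a single-DoF discrete DMP with explicit Euler steps.

    forcing: callable mapping the phase x in (0, 1] to a scalar.
    Returns the position trajectory as an array of length n_steps + 1.
    """
    y, z, x = float(y0), 0.0, 1.0  # position, scaled velocity, phase
    traj = [y]
    for _ in range(n_steps):
        # The forcing term is gated by the phase, so it vanishes as x -> 0.
        f = forcing(x) * x * (goal - y0)
        # Transformation system: spring-damper toward the goal plus forcing.
        z += dt / tau * (alpha * (beta * (goal - y) - z) + f)
        y += dt / tau * z
        # Canonical system: the phase decays exponentially from 1 to 0.
        x += dt / tau * (-alpha_x * x)
        traj.append(y)
    return np.array(traj)

# Zero forcing gives a plain point-to-point motion; a bump in the forcing
# reshapes the path (e.g. to detour around an obstacle) without changing
# where the motion ends.
straight = rollout_dmp(0.0, 1.0, lambda x: 0.0)
shaped = rollout_dmp(0.0, 1.0, lambda x: 200.0 * np.exp(-50.0 * (x - 0.5) ** 2))
```

Because the forcing term decays with the phase, both rollouts settle at the goal; only the intermediate shape differs, which is the part a learning agent would be left to decide.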

Paper Structure (40 sections, 18 equations, 12 figures, 7 tables, 2 algorithms)

Introduction
Preliminaries and Related Work
Dynamic Movement Primitive (DMP)
Off-policy Reinforcement Learning (RL)
Behavior Cloning (BC)
Framework and Problem Statement
Motion Planning Using Multi-DoF DMP
Motion Planning Problem with Shaped Reward
Human Demonstration Collection
Human Data Acquisition
Data Preprocessing: Normalization
Translating Hand Trajectories Using DMP
Training of an IBC-DMP agent
The Training Overview
The Training Details
...and 25 more sections

Figures (12)

Figure 1: Motion planning is a core problem underlying more complicated robotic tasks, e.g., grasping and manipulation.
Figure 2: The IBC-DMP RL framework, where $\mathcal{X}_t$ is the comprehensive state of the multi-DoF DMP model, $\tilde{\mathbf{f}}(\mathcal{X}_t)$ and $\mathbf{f}(\mathcal{X}_t)$ are the virtual force functions provided by the human demonstration and the IBC-DMP agent, respectively, and $s_t$ and $a_t$ are the state and action data stored in the buffers. The solid arrows denote interactions, the dotted arrows indicate data storage, and the dashed arrow represents agent training. The left column (in blue) is prepared only once before training, while the right column (in red) is updated repeatedly during training.
Figure 3: The top view of the data recording experiment, where $\mathcal{I}$, $\mathcal{O}$, and $\mathcal{G}$ are the initial, obstacle, and goal positions, respectively.
Figure 4: The demonstration recording process. (a) Starting from the initial position. (b) Avoiding the obstacle. (c) Reaching the goal.
Figure 5: The recorded hand trajectories, where the blue dot denotes the starting position and the orange stars are the goal positions.
...and 7 more figures
