Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning
Zengjie Zhang, Jayden Hong, Amir Soufi Enayati, Homayoun Najjaran
TL;DR
The paper tackles slow training and limited generalization in reinforcement-learning–based robot motion planning by proposing an IBC-DMP RL framework that fuses implicit behavior cloning (IBC) with a multi-DoF dynamic movement primitive (DMP). It introduces a dual-buffer training pipeline, a reshaped actor loss based on energy-based IBC, and a refined critic loss to exploit human demonstrations without overfitting. Simulations and hardware experiments show faster convergence, better generalization, and reliable collision avoidance, including a real-robot assembly task. The work demonstrates that incorporating motion primitives and human demonstrations via IBC can markedly improve the efficiency and robustness of RL for robot motion planning, with practical implications for industrial automation.
Abstract
Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from low efficiency, namely slow training and poor generalizability. In this paper, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and a dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent. IBC uses human demonstration data to accelerate RL training, and DMP serves as a heuristic model that maps the motion planning problem into a simpler planning space. To support this, we also create a human demonstration dataset from a pick-and-place experiment, which can be reused in similar studies. Comparison studies in simulation show that the proposed method trains faster and achieves higher scores than conventional RL agents. A real-robot experiment indicates the applicability of the proposed method to a simple assembly task. Our work provides a novel perspective on using motion primitives and human demonstrations to improve the performance of RL for robot applications.
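
To make the idea of a "simpler planning space" concrete, the following is a minimal sketch of the textbook single-DoF discrete DMP, not the multi-DoF formulation of the paper. The function name `rollout_dmp`, the gains `alpha_z` and `alpha_x`, the basis-width rule, and the Euler integration step are all illustrative assumptions. The point it shows is that a whole trajectory is determined by a start, a goal, and a short weight vector, so a planner only needs to choose those few quantities.

```python
import math


def rollout_dmp(y0, goal, weights, tau=1.0, dt=0.001, alpha_z=25.0, alpha_x=4.0):
    """Integrate a one-DoF discrete DMP with explicit Euler steps.

    All parameter names and default values are illustrative assumptions.
    """
    beta_z = alpha_z / 4.0  # makes the spring-damper critically damped
    n = len(weights)
    # Basis centers placed along the decaying phase variable x in (0, 1].
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    # Heuristic widths (an assumption): narrower bases where x changes slowly.
    widths = [n ** 1.5 / c / alpha_x for c in centers]

    y, v, x = y0, 0.0, 1.0
    trajectory = [y]
    for _ in range(int(round(tau / dt))):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        total = sum(psi)
        forcing = 0.0
        if total > 1e-10:
            # Normalized weighted sum, gated by the phase so it fades over time.
            weighted = sum(p * w for p, w in zip(psi, weights))
            forcing = weighted / total * x * (goal - y0)
        dv = (alpha_z * (beta_z * (goal - y) - v) + forcing) / tau
        dy = v / tau
        dx = -alpha_x * x / tau
        v += dv * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory


# With zero weights the DMP reduces to a damped spring pulling y toward the goal.
path = rollout_dmp(y0=0.0, goal=1.0, weights=[0.0] * 10)
```

Changing `weights` reshapes the path between start and goal while the spring-damper term still pulls the state toward `goal` as the phase decays, which is why DMP parameters are a convenient low-dimensional target for a learning agent.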
