MoRe-ERL: Learning Motion Residuals using Episodic Reinforcement Learning
Xi Huang, Hongyi Zhou, Ge Li, Yucheng Tang, Weiran Liao, Björn Hein, Tamim Asfour, Rudolf Lioutikov
TL;DR
MoRe-ERL addresses reactive, safe, and efficient robotic motion in dynamic environments by combining Episodic Reinforcement Learning (ERL) with residual learning. Refinements are parameterized as B-spline Movement Primitives (BMPs) and applied to segments of a preplanned reference trajectory. The method identifies the start and end times $\alpha_s$ and $\alpha_e$ of each residual segment, uses BMPs to enforce boundary conditions so that the residual joins the reference smoothly, and learns the residuals and their timing jointly to adapt to changing task contexts. In multi-box and dual-arm experiments, MoRe-ERL achieves better sample efficiency and task performance than the baselines, and policies trained in simulation transfer to real hardware with a minimal sim-to-real gap. The framework is agnostic to the base planner and can be integrated with arbitrary ERL methods and motion generators, offering a general way to refine trajectories without discarding valuable prior maneuver knowledge.
Abstract
We propose MoRe-ERL, a framework that combines Episodic Reinforcement Learning (ERL) and residual learning to refine preplanned reference trajectories into safe, feasible, and efficient task-specific trajectories. The framework is general enough to be integrated seamlessly with arbitrary ERL methods and motion generators. MoRe-ERL identifies the trajectory segments that require modification while preserving critical task-related maneuvers. It then generates smooth residual adjustments using B-spline-based movement primitives, which ensures both adaptability to dynamic task contexts and smoothness of the refined trajectory. Experimental results demonstrate that residual learning significantly outperforms training from scratch with ERL methods, achieving superior sample efficiency and task performance. Hardware evaluations further validate the framework, showing that policies trained in simulation can be deployed directly on real-world systems with a minimal sim-to-real gap.
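To make the idea of a boundary-conditioned residual concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional trajectory over normalized time, a clamped cubic B-spline for the residual, and a residual that is nonzero only between $\alpha_s$ and $\alpha_e$. Setting the first two and last two control points to zero is one standard way to make a clamped B-spline vanish in position and velocity at both ends; the paper's BMP formulation may condition the boundaries differently. All function and parameter names (`clamped_knots`, `bspline_basis`, `residual`, `refined`, `free_ctrl`) are hypothetical, and in MoRe-ERL the timing and the free control points would come from a learned policy rather than being fixed by hand.

```python
from typing import Callable, List, Sequence


def clamped_knots(n_ctrl: int, degree: int) -> List[float]:
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def bspline_basis(i: int, k: int, u: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion for the i-th basis function of degree k."""
    if k == 0:
        # Half-open knot spans; the last non-empty span is closed at u == 1.
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + k] > knots[i]:
        left = ((u - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, u, knots))
    if knots[i + k + 1] > knots[i + 1]:
        right = ((knots[i + k + 1] - u) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, u, knots))
    return left + right


def residual(t: float, alpha_s: float, alpha_e: float,
             free_ctrl: Sequence[float], degree: int = 3) -> float:
    """Residual at time t; zero outside the open interval (alpha_s, alpha_e)."""
    if not (alpha_s < t < alpha_e):
        return 0.0
    # Assumption: two zero control points at each end give zero residual
    # position and velocity where the segment joins the reference.
    ctrl = [0.0, 0.0] + list(free_ctrl) + [0.0, 0.0]
    knots = clamped_knots(len(ctrl), degree)
    u = (t - alpha_s) / (alpha_e - alpha_s)  # local phase in (0, 1)
    return sum(c * bspline_basis(i, degree, u, knots)
               for i, c in enumerate(ctrl))


def refined(t: float, reference: Callable[[float], float],
            alpha_s: float, alpha_e: float,
            free_ctrl: Sequence[float]) -> float:
    """Refined trajectory: preplanned reference plus the learned residual."""
    return reference(t) + residual(t, alpha_s, alpha_e, free_ctrl)


if __name__ == "__main__":
    reference = lambda t: 2.0 * t          # stand-in for a preplanned motion
    for t in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(t, refined(t, reference, 0.3, 0.7, [0.3, -0.2]))
```

Outside the interval the refined trajectory equals the reference exactly, which mirrors the goal of preserving task-related maneuvers while modifying only the segments that need it. Matching accelerations at the boundaries as well would require a third zero control point at each end in this construction.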
