MoRe-ERL: Learning Motion Residuals using Episodic Reinforcement Learning

Xi Huang, Hongyi Zhou, Ge Li, Yucheng Tang, Weiran Liao, Björn Hein, Tamim Asfour, Rudolf Lioutikov

TL;DR

MoRe-ERL addresses reactive, safe, and efficient robotic motion in dynamic environments by combining Episodic Reinforcement Learning (ERL) with residual learning, parameterizing refinements as B-spline Movement Primitives (BMPs) on segments of a preplanned reference trajectory. The method identifies the start and end times $\alpha_s$ and $\alpha_e$ of each residual, uses BMPs to guarantee smooth transitions and boundary conditioning, and jointly learns residuals and timing to adapt to changing task contexts. Across multi-box and dual-arm experiments, MoRe-ERL achieves better sample efficiency and task performance than the baselines, and its policies transfer from simulation to hardware with a minimal sim-to-real gap. The framework is base-planner-agnostic and can integrate with arbitrary ERL methods and motion generators, offering a general approach to refining trajectories without discarding valuable prior maneuver knowledge.

Abstract

We propose MoRe-ERL, a framework that combines Episodic Reinforcement Learning (ERL) and residual learning to refine preplanned reference trajectories into safe, feasible, and efficient task-specific trajectories. The framework is general enough to be incorporated seamlessly into arbitrary ERL methods and motion generators. MoRe-ERL identifies trajectory segments requiring modification while preserving critical task-related maneuvers. It then generates smooth residual adjustments using B-spline-based movement primitives, which keeps the refined trajectory smooth and lets it adapt to dynamic task contexts. Experimental results demonstrate that residual learning significantly outperforms training from scratch using ERL methods, achieving superior sample efficiency and task performance. Hardware evaluations further validate the framework, showing that policies trained in simulation can be deployed directly on real-world systems with a minimal sim-to-real gap.
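
The segment-wise residual idea summarized above can be illustrated with a small sketch. It is not the authors' implementation: the function names (`clamped_knots`, `de_boor`, `apply_residual`), the single-joint trajectory, the hand-picked weights, and the choice to zero-pad the control points so that the residual and its slope vanish at $\alpha_s$ and $\alpha_e$ are all assumptions made for illustration. In MoRe-ERL the weights and timing would come from a learned policy.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def de_boor(u, knots, ctrl, degree):
    """Evaluate a scalar B-spline at u in [0, 1] with de Boor's algorithm."""
    n = len(ctrl)
    # Find the knot span k with knots[k] <= u < knots[k + 1];
    # u = 1 falls into the last non-empty span.
    k = degree
    while k < n - 1 and u >= knots[k + 1]:
        k += 1
    d = [ctrl[j + k - degree] for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            a = (u - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = (1.0 - a) * d[j - 1] + a * d[j]
    return d[degree]


def apply_residual(reference, alpha_s, alpha_e, weights, degree=3):
    """Add a B-spline residual to reference[alpha_s..alpha_e] (inclusive).

    Assumption (hypothetical boundary handling): two zero control points
    are added on each side, so the residual and its first derivative are
    zero where it joins the reference trajectory.
    """
    if not 0 <= alpha_s < alpha_e < len(reference):
        raise ValueError("need 0 <= alpha_s < alpha_e < len(reference)")
    ctrl = [0.0, 0.0] + list(weights) + [0.0, 0.0]
    knots = clamped_knots(len(ctrl), degree)
    adjusted = list(reference)
    for t in range(alpha_s, alpha_e + 1):
        u = (t - alpha_s) / (alpha_e - alpha_s)  # phase within the segment
        adjusted[t] += de_boor(u, knots, ctrl, degree)
    return adjusted


# Hypothetical example: a 101-step single-joint reference, modified
# between steps 20 and 70 with four hand-picked residual weights.
reference = [0.01 * t for t in range(101)]
adjusted = apply_residual(reference, 20, 70, [0.3, 0.5, 0.5, 0.3])
print(adjusted[20] - reference[20], adjusted[45] - reference[45])
```

Outside the interval between $\alpha_s$ and $\alpha_e$ the reference is left untouched, which reflects the point of keeping prior maneuver knowledge: only the selected segment is reshaped, and the zero boundary control points make the adjusted trajectory rejoin the reference without a jump.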

Paper Structure

This paper contains 17 sections, 13 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: MoRe-ERL applies residuals to adjust the trajectory of a KUKA iiwa robot into a safe and feasible motion, effectively avoiding the moving UR5 robot. Snapshots of the KUKA iiwa executing the adjusted motion are outlined in green, while the red marker highlights the frame on the reference trajectory where the robots would have collided.
  • Figure 2: Illustration of the MoRe-ERL pipeline. The robot follows the reference trajectory provided by a motion generator based on the task context; the figure distinguishes the segment already executed from the remainder. When the task context changes, such as the appearance of new obstacles, MoRe-ERL identifies critical segments on the remaining reference trajectory using learned parameters $\bm \alpha = [\alpha_s, \alpha_e]^\top$ and parameterizes residuals $f(\bm{w})$ for the selected segments using B-spline-based movement primitives. The adjusted trajectory, after applying these residuals, is shown by the solid blue-green curve.
  • Figure 3: Illustration of BMPs: (a) A clamped B-spline curve in 2D parameterized with 6 control points. (b) Basis functions of different orders computed with the recursive formulation, where $\Phi_0^p$ denotes the basis function of $p^{\mathrm{th}}$ order for the $0^{\mathrm{th}}$ control point. The knots $u$ represent the change of time.
  • Figure 4: MoRe-ERL residuals and two ablation variants. The reference trajectory is shown in green, with bold solid points indicating the timing variables $\alpha_s$ and $\alpha_e$. Cyan sections show learned residuals or replacements, and the solid blue-green curve denotes the adjusted trajectory.
  • Figure 5: Random roll-out using MoRe-ERL and a step-based residual method. In the demonstrated case, the trajectory with MoRe-ERL residuals deviates from the reference trajectory at $\alpha_s = 20$ and converges back at $\alpha_e = 70$.
  • ...and 2 more figures
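
The recursive formulation of the basis functions mentioned in the Figure 3 caption is commonly written as the Cox-de Boor recursion. The sketch below is a generic textbook version, not code from the paper: the function name `basis`, the use of "degree" for the caption's order $p$, and the knot vector (6 control points, cubic, clamped on $[0, 1]$) are assumptions chosen to match the setting of Figure 3(a).

```python
def basis(i, p, u, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree p at u."""
    if p == 0:
        # Degree 0 is an indicator on the half-open span [knots[i], knots[i+1]).
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u == knots[-1] is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    # Terms with a zero-width support are dropped (the usual 0/0 := 0 rule).
    left = 0.0 if left_den == 0 else (
        (u - knots[i]) / left_den * basis(i, p - 1, u, knots))
    right = 0.0 if right_den == 0 else (
        (knots[i + p + 1] - u) / right_den * basis(i + 1, p - 1, u, knots))
    return left + right


# Assumed setup: 6 control points, degree 3, clamped knots on [0, 1].
degree, n_ctrl = 3, 6
knots = [0.0] * 4 + [1 / 3, 2 / 3] + [1.0] * 4

values = [basis(i, degree, 0.5, knots) for i in range(n_ctrl)]
print(values, sum(values))
```

Two properties are worth checking on such a sketch: the basis functions sum to one at every $u$ in the domain, and with a clamped knot vector only the first basis function is nonzero at $u = 0$ and only the last at $u = 1$, which is why a clamped curve starts and ends exactly at its first and last control points.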