Data Informed Residual Reinforcement Learning for High-Dimensional Robotic Tracking Control

Cong Li, Fangzhou Liu, Yongchao Wang, Martin Buss

TL;DR

This work tackles the data inefficiency of reinforcement learning for high-dimensional robotic tracking by introducing DR-RL, a data-informed residual RL framework. DR-RL decouples a robot into low-dimensional subsystems and leverages one-step backward (OSBK) data to form incremental subsystems that admit model-free, parallel learning and rigorous analysis. By combining an incremental base policy with an incremental residual policy and learning the residual on top of a guided base, the method achieves improved sample efficiency and robustness to dynamic environments, with stability and weight-convergence guarantees. The approach is validated numerically on a 7-DoF KUKA iiwa and experimentally on a 3-DoF manipulator, showing superior performance and scalability compared to baselines, and demonstrating practical applicability for high-dimensional robotic tracking.

Abstract

The learning inefficiency of reinforcement learning (RL) from scratch hinders its practical application to continuous robotic tracking control, especially for high-dimensional robots. This work proposes a data-informed residual reinforcement learning (DR-RL) based robotic tracking control scheme applicable to robots with high dimensionality. The proposed DR-RL methodology outperforms common RL methods in sample efficiency and scalability. Specifically, we first decouple the original robot into low-dimensional robotic subsystems, and then use one-step backward (OSBK) data to construct incremental subsystems that are equivalent model-free representations of these decoupled subsystems. The formulated incremental subsystems allow for parallel learning, which relieves the computational load, and provide mathematical descriptions of the robot's motion for theoretical analysis. Then, we apply DR-RL to learn the tracking control policy, a combination of an incremental base policy and an incremental residual policy, under a parallel learning architecture. The incremental residual policy uses the guidance from the incremental base policy as its learning initialization and further learns from interactions with the environment, which makes the tracking control policy adaptable to dynamically changing environments. The proposed DR-RL based tracking control scheme is developed with rigorous theoretical analysis of system stability and weight convergence. The effectiveness of the proposed method is validated numerically on a 7-DoF KUKA iiwa robot manipulator and experimentally on a 3-DoF robot manipulator, settings in which counterpart RL methods would fail.
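
The idea of building an incremental, model-free description from one-step backward data and then adding a residual increment on top of a base increment can be illustrated on a toy problem. The sketch below is not the paper's formulation: it assumes a hypothetical 1-DoF plant, a classical time-delay style base increment, a rough input-gain guess `g_bar`, and a placeholder linear residual whose weights stay at zero, since the learning rule that would update them is not given in this section. All function and parameter names are illustrative.

```python
import math


def simulate(steps=4000, dt=0.001, g_bar=1.0, residual_weights=(0.0, 0.0)):
    """Track q_d(t) = sin(t) on a toy 1-DoF plant with an incremental controller."""

    # Toy plant, treated as unknown by the controller (an assumption for this sketch):
    # q_ddot = f(q, q_dot) + g * u, with g = 1.2.
    def plant_acc(q, q_dot, u):
        return -2.0 * math.sin(q) - 0.5 * q_dot + 1.2 * u

    kp, kd = 100.0, 20.0           # gains of the desired error dynamics
    q, q_dot = 0.5, 0.0            # initial state, away from the reference
    u_prev = 0.0                   # one-step-backward control input
    acc_prev = plant_acc(q, q_dot, u_prev)  # one-step-backward acceleration
    errors = []

    for k in range(steps):
        t = k * dt
        q_d, q_d_dot, q_d_ddot = math.sin(t), math.cos(t), -math.sin(t)
        e, e_dot = q - q_d, q_dot - q_d_dot
        errors.append(abs(e))

        # Desired acceleration that would impose e_ddot + kd*e_dot + kp*e = 0.
        v = q_d_ddot - kd * e_dot - kp * e

        # Incremental model from one-step-backward data:
        # acc_k is approximated by acc_prev + g_bar * (u_k - u_prev).
        # Base increment: solve that relation for the input change.
        du_base = (v - acc_prev) / g_bar

        # Residual increment: placeholder linear term (weights not trained here).
        w1, w2 = residual_weights
        du_res = w1 * e + w2 * e_dot

        u = u_prev + du_base + du_res

        # Advance the plant one step (semi-implicit Euler) and store backward data.
        acc = plant_acc(q, q_dot, u)
        q_dot += acc * dt
        q += q_dot * dt
        u_prev, acc_prev = u, acc

    return errors


if __name__ == "__main__":
    errs = simulate()
    print(f"initial |e| = {errs[0]:.3f}, final |e| = {errs[-1]:.2e}")
```

In this toy setting the base increment alone already drives the tracking error down without any knowledge of `f`, which is why it can serve as a starting point for a residual term. A learned residual would replace the fixed `residual_weights` placeholder.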

Paper Structure

This paper contains 28 sections, 4 theorems, 75 equations, 7 figures, 4 tables.

Key Result

Lemma 1

Given a sufficiently high sampling rate, there exists $\bar{\xi}_i \in \mathbb{R}^+$ such that $\left\|\xi_i\right\| \leq \bar{\xi}_i$.

Figures (7)

  • Figure 1: Schematic of the DR-RL based robotic tracking control policy. The original high-dimensional robotic tracking task is decoupled into subtasks of incremental subsystems for efficient parallel implementation. The incremental policies, a combination of the incremental base policy and the incremental residual policy, are learned in parallel to solve the associated subtasks. The incremental base policy provides a policy initialization for the subsequent incremental residual policy learning process.
  • Figure 2: The numerical validation results on the 7-DoF KUKA iiwa robot manipulator. Top: the evolution trajectories of the $i$th subsystem's tracking errors $e_{i_{1}}$, $e_{i_{2}}$ and the learned incremental residual policy (IRP) $\Delta \hat{u}_{i_{r}}$, $i = 1,2,\cdots,7$; Bottom: the comparative simulation results focusing on the 2nd subsystem, including the associated error trajectories of both the incremental base policy (IBP) and DR-RL cases, the evolution trajectories of the IBP $\Delta u_{i_{b}}$ and the IRP $\Delta \hat{u}_{i_{r}}$, and the weight convergence result.
  • Figure 3: The comparative numerical simulation results for the task-space task of the 2-DoF robot manipulator. Top: the evolution trajectories of the $i$th subsystem's tracking errors $e_{i_{1}}$, $e_{i_{2}}$ based on the DF-RL and DR-RL methods, $i = 1,2$; Bottom: the evolution trajectories of the $i$th subsystem's 4-D estimated NN weight $\hat{W}_i = [\hat{W}_{i_{1}},\cdots,\hat{W}_{i_{4}}]^{\top}$ for the DR-RL method, and the $i$th subsystem's 10-D estimated NN weight $\hat{W}_i = [\hat{W}_{i_{1}},\cdots,\hat{W}_{i_{10}}]^{\top}$ for the DF-RL method, $i = 1,2$.
  • Figure 4: The experimental validation results on the 3-DoF robot manipulator. Top: the evolution trajectories of the $i$th subsystem's tracking error $e_{i_{1}}$ under different payloads, $i = 1,2,3$; Bottom: the evolution trajectories of the $i$th subsystem's 4-D estimated weight $\hat{W}_i = [\hat{W}_{i1},\cdots,\hat{W}_{i4}]^{\top}$, $i = 1,2,3$, for the 500 g payload case.
  • Figure 5: The numerical validation results on the 7-DoF KUKA iiwa robot manipulator. The simulation results for subsystems 1 through 7 are displayed from bottom to top. In each row, the subsystem's evolution trajectories of the angle error $e_{i1}$, the angular velocity error $e_{i2}$, the incremental control inputs $\Delta u_{i_{b}}$ and $\Delta u_{i_{r}}$, and the weight $\hat{W}_{i}$ are presented from left to right.
  • ...and 2 more figures

Theorems & Definitions (19)

  • Example 1
  • Remark 1
  • Remark 2
  • Example 2
  • Example 3
  • Remark 3
  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • ...and 9 more