SRL-VIC: A Variable Stiffness-Based Safe Reinforcement Learning for Contact-Rich Robotic Tasks

Heng Zhang, Gokhan Solak, Gustavo J. G. Lahr, Arash Ajoudani

TL;DR

SRL-VIC is proposed: a model-free safe RL framework combined with a variable impedance controller (VIC) that can be deployed on a physical robot without fine-tuning, achieving successful task completion with robustness and generalization.

Abstract

Reinforcement learning (RL) has emerged as a promising paradigm for complex and continuous robotic tasks; however, safe exploration remains one of the main challenges, especially in contact-rich manipulation tasks in unstructured environments. To address this issue, we propose SRL-VIC: a model-free safe RL framework combined with a variable impedance controller (VIC). Specifically, safety critic and recovery policy networks are pre-trained: the safety critic evaluates the safety of the next action using a risk value before it is executed, and the recovery policy suggests a corrective action if the risk value is high. Furthermore, the policies are updated online, where the task policy not only achieves the task but also modulates the stiffness parameters to keep a safe and compliant profile. A set of experiments in contact-rich maze tasks demonstrates that our framework outperforms the baselines (without the recovery mechanism and without the VIC), yielding a good trade-off between efficient task accomplishment and safety guarantee. We show that our policy trained in simulation can be deployed on a physical robot without fine-tuning, achieving successful task completion with robustness and generalization. The video is available at https://youtu.be/ksWXR3vByoQ.
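
The gating between the task policy and the recovery policy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_action`, the strict `risk > eps_risk` comparison, the toy wall at x = 1.0, and all numeric values are assumptions made for illustration. The action layout (relative position change plus a stiffness pair) follows the description of the framework figure.

```python
from typing import Callable, Tuple

State = Tuple[float, float]                  # hypothetical: end-effector (x, y)
Action = Tuple[float, float, float, float]   # (dx, dy, K_x, K_y)


def select_action(
    state: State,
    task_policy: Callable[[State], Action],
    recovery_policy: Callable[[State], Action],
    safety_critic: Callable[[State, Action], float],
    eps_risk: float,
) -> Tuple[Action, bool]:
    """Return the action to execute and whether recovery was triggered.

    Assumption: the recovery policy takes over when the critic's risk
    estimate for the proposed task action exceeds the threshold eps_risk.
    """
    proposed = task_policy(state)
    risk = safety_critic(state, proposed)
    if risk > eps_risk:
        return recovery_policy(state), True
    return proposed, False


# Toy stand-ins for the learned networks (illustrative only).
def toy_task_policy(state: State) -> Action:
    return (0.2, 0.0, 300.0, 300.0)          # always push in +x


def toy_recovery_policy(state: State) -> Action:
    return (-0.1, 0.0, 100.0, 100.0)         # back off with low stiffness


def toy_safety_critic(state: State, action: Action) -> float:
    # Risk is 1.0 if the commanded step would cross a wall at x = 1.0.
    return 1.0 if state[0] + action[0] > 1.0 else 0.0


safe_action, safe_recovered = select_action(
    (0.0, 0.0), toy_task_policy, toy_recovery_policy, toy_safety_critic, 0.5
)
risky_action, risky_recovered = select_action(
    (0.9, 0.0), toy_task_policy, toy_recovery_policy, toy_safety_critic, 0.5
)
```

In this toy setup the first call keeps the task action, while the second call, which starts close to the wall, is overridden by the recovery action. The key point is that the check happens before the action is executed.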

Paper Structure (21 sections, 5 equations, 9 figures)

Figures (9)

  • Figure 1: Maze exploration is an unstructured contact-rich task with practical applications such as cable laying in walls. It also resembles search missions in dark and narrow environments. The agent has no access to vision, so it must navigate using only contact information.
  • Figure 2: The proposed framework: we combine the recovery-based safe RL approach (Thananjeyan et al., 2021) with VIC to solve the contact-rich maze exploration task. We first use an automated procedure to collect the offline data and pre-train our safety critic and recovery policy. Then, we train all learning components using online data. The risk value $\epsilon_{risk}$ is used to activate either the task or recovery policy. The action $a_t$ is chosen by the activated component, and it is fed to the VIC. Our action includes a relative position change and desired stiffness vector $\{K_x, K_y\}$.
  • Figure 3: Trajectories of the end-effector during different stages of training, in top-down view.
  • Figure 4: Learning curves for maze exploration. Our framework (SRL-VIC) outperforms the others, particularly in terms of the ratio of successes/violations and cumulative violations, i.e., our framework is safer than the others. Furthermore, it learns faster, achieving success at an earlier stage. The large deviation area in the ratio of successes/violations comes from the division operation; the cumulative successes and violations do not exhibit large variation.
  • Figure 5: Different behaviours when encountering obstacles at training episode 1000. Top (SRL-K300): The robot cannot move forward and fails due to low stiffness. Bottom (SRL-VIC): The VIC approach switches to high stiffness when obstacles are in the way. The method learns to safely push the obstacles without exceeding the force threshold.
  • ...and 4 more figures
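
The captions above describe an action made of a relative position change and a stiffness pair $\{K_x, K_y\}$ that is fed to the VIC. How such an action could translate into a commanded force can be sketched with a simple planar spring-damper law. This is an assumption for illustration only: the page does not give the controller equations, and the function name `vic_force`, the unit-mass damping rule $D = 2\zeta\sqrt{K}$, and the numeric values are hypothetical.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]
Action = Tuple[float, float, float, float]   # (dx, dy, K_x, K_y)


def vic_force(pos: Vec2, vel: Vec2, action: Action,
              damping_ratio: float = 1.0) -> Vec2:
    """Planar impedance law: F = K * (x_target - x) - D * x_dot, per axis.

    Assumptions (not from the source): the target is the current position
    plus the commanded relative change, and damping is derived from the
    stiffness as D = 2 * damping_ratio * sqrt(K) for a unit apparent mass.
    """
    dx, dy, kx, ky = action
    target = (pos[0] + dx, pos[1] + dy)
    forces = []
    for p, v, t, k in zip(pos, vel, target, (kx, ky)):
        d = 2.0 * damping_ratio * math.sqrt(k)
        forces.append(k * (t - p) - d * v)
    return (forces[0], forces[1])


# Same 1 cm step in x, commanded with low and with high stiffness.
soft = vic_force((0.0, 0.0), (0.0, 0.0), (0.01, 0.0, 100.0, 100.0))
stiff = vic_force((0.0, 0.0), (0.0, 0.0), (0.01, 0.0, 400.0, 100.0))
```

For the same position step, the higher stiffness produces a proportionally larger force along x. This is consistent with the behaviour described for Figure 5, where raising stiffness lets the robot push obstacles that a low fixed stiffness cannot move.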