Bresa: Bio-inspired Reflexive Safe Reinforcement Learning for Contact-Rich Robotic Tasks

Heng Zhang, Gokhan Solak, Arash Ajoudani

TL;DR

The paper tackles safety in reinforcement learning for contact-rich robotics by addressing low-level execution risk often overlooked by high-level safe-RL methods. It introduces Bresa, a bio-inspired hierarchical framework where a high-frequency safety critic can reflexively intervene at the low-level control loop, while a slower task policy handles planning, and a variable-impedance trajectory controller enables compliant execution. The key contributions include decoupling task and safety learning, a reflex mechanism that triggers a recovery policy via a risk critic $Q_{\text{risk}}$, and integration with Cartesian impedance control to maintain safety in dynamic interactions, validated across 2D navigation and 3D maze tasks with both simulation and real-world experiments. Results show substantial improvements in the safety-to-task-success trade-off, faster learning, and robust performance under disturbances, highlighting practical impact for real robots operating in unstructured, contact-rich environments. The work opens avenues for extending reflexive safety with multi-modal sensing and broader task domains, bridging planning and low-level control in a safe, responsive manner.

Abstract

Ensuring safety in reinforcement learning (RL)-based robotic systems is a critical challenge, especially in contact-rich tasks within unstructured environments. While state-of-the-art safe RL approaches mitigate risks through safe exploration or high-level recovery mechanisms, they often overlook low-level execution safety, where reflexive responses to potential hazards are crucial. Similarly, variable impedance control (VIC) enhances safety by adjusting the robot's mechanical response, yet lacks a systematic way to adapt parameters such as stiffness and damping throughout the task. In this paper, we propose Bresa, a Bio-inspired Reflexive Hierarchical Safe RL method inspired by biological reflexes. Our method decouples task learning from safety learning, incorporating a safety critic network that evaluates action risks and operates at a higher frequency than the task solver. Unlike existing recovery-based methods, our safety critic functions at the low-level control layer, allowing real-time intervention when unsafe conditions arise. The task-solving RL policy, running at a lower frequency, focuses on high-level planning (decision-making), while the safety critic ensures instantaneous safety corrections. We validate Bresa on multiple tasks, including a contact-rich robotic task, demonstrating its reflexive ability to enhance safety and its adaptability in unforeseen dynamic environments. Our results show that Bresa outperforms the baseline, providing a robust and reflexive safety mechanism that bridges the gap between high-level planning and low-level execution. Real-world experiments and supplementary material are available at the project website: https://jack-sherman01.github.io/Bresa.

Paper Structure

This paper contains 18 sections, 4 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: a) Bresa framework. The RL agent operates at the decision loop, planning the high-level action $\mathbf{a}$ that is executed by the trajectory controller. The controller operates at a high-frequency control loop, executing the low-level action $\hat{\mathbf{a}}$ based on the state feedback $\hat{\mathbf{s}}$ at each control step. The reflex mechanism gives the system a quick reaction capability by interrupting the control loop in the case of high risk. b) A simplified illustration of the human central nervous system. While high-level decisions are made in the brain, safety-related reflexes are managed by the spinal cord, allowing for faster responses that override slower, more complex decision-making processes.
  • Figure 2: a) Reflex mechanism on an obstacle avoidance scenario. Even when the high-level state-action pair $(\mathbf{s}, \mathbf{a})$ is evaluated to be safe, an intermediate state-action pair $(\hat{\mathbf{s}},\hat{\mathbf{a}})$ may entail high risk ($\epsilon_\text{risk} > \epsilon_\text{safe}$) and trigger the reflex mechanism. The stochasticity of the environment leads to a drift in the outcomes of minor actions $\hat{\mathbf{a}}$. b) Flowchart of the Bresa algorithm. We color-coded the decision loop, control loop, and reflex for comparison with Fig. 1a. We reuse $\mathbf{s}$ instead of showing $\hat{\mathbf{s}}$ to simplify the structure; the two are equivalent in the control loop. c) Maze exploration environment in the MuJoCo simulator. The robot physically interacts with the maze walls and the obstacles through an end-effector flange equipped with a force/torque (F/T) sensor.
  • Figure 3: Offline data collection locations in both tasks. Left: navigation task. The green and yellow circles indicate start and goal points, and red dots indicate the sampled start positions. Upper right: histogram of exponentially sampled action sizes in the maze exploration task. Lower right: sampled action locations on the maze.
  • Figure 4: Overall performance of Bresa in terms of success, violation, and the ratio between these two.
  • Figure 5: Reflex mechanism in the navigation task during training. The risk value during exploration is plotted as a colormap, showing fine-grained risk prediction in our method, whereas it is coarse-grained in the baseline. Blue annotations show single high-level actions. Note that the figures show only part of the task space.
  • ...and 4 more figures
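
The two-rate structure described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the function `run_high_level_action`, the equal splitting of a high-level action $\mathbf{a}$ into low-level increments $\hat{\mathbf{a}}$, the hand-written `toy_risk_critic` (standing in for the learned risk critic), the fixed `toy_recovery_policy`, the deterministic point-mass dynamics, and the choice to hand control back to the decision loop after a reflex are all assumptions made for illustration. Only the trigger condition $\epsilon_\text{risk} > \epsilon_\text{safe}$, checked at every control step, is taken from the captions.

```python
from typing import Callable, Tuple

Vec = Tuple[float, float]

WALL_X = 1.0  # hypothetical wall position in a toy 2D workspace


def run_high_level_action(
    state: Vec,
    action: Vec,
    risk_critic: Callable[[Vec, Vec], float],
    recovery_policy: Callable[[Vec], Vec],
    n_control_steps: int = 10,
    eps_safe: float = 0.5,
) -> Tuple[Vec, bool]:
    """Execute one decision-loop action as several control-loop steps.

    Returns the final state and whether the reflex fired.
    """
    # Assumption: the controller splits the high-level action evenly.
    a_hat = (action[0] / n_control_steps, action[1] / n_control_steps)
    for _ in range(n_control_steps):
        # The risk critic is queried at every control step, not once per decision.
        eps_risk = risk_critic(state, a_hat)
        if eps_risk > eps_safe:
            # Reflex: interrupt the control loop and apply a recovery action.
            r = recovery_policy(state)
            state = (state[0] + r[0], state[1] + r[1])
            # Assumption: control then returns to the decision loop.
            return state, True
        # Toy deterministic dynamics: the low-level action is a displacement.
        state = (state[0] + a_hat[0], state[1] + a_hat[1])
    return state, False


def toy_risk_critic(s_hat: Vec, a_hat: Vec) -> float:
    """Hand-written stand-in for a learned critic: risk grows near the wall."""
    distance = WALL_X - (s_hat[0] + a_hat[0])
    return min(1.0, max(0.0, 1.0 - distance / 0.3))


def toy_recovery_policy(s_hat: Vec) -> Vec:
    """Hypothetical recovery action: back away from the wall."""
    return (-0.05, 0.0)


if __name__ == "__main__":
    # A long action toward the wall is cut short by the reflex.
    print(run_high_level_action((0.0, 0.0), (1.5, 0.0),
                                toy_risk_critic, toy_recovery_policy))
    # A short action stays in the low-risk region and completes normally.
    print(run_high_level_action((0.0, 0.0), (0.5, 0.0),
                                toy_risk_critic, toy_recovery_policy))
```

The sketch is meant to show why per-step evaluation matters: a high-level action judged once at the decision loop would only be checked at its start, whereas checking each intermediate pair $(\hat{\mathbf{s}}, \hat{\mathbf{a}})$ lets the reflex stop the motion before the risky region is entered.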