Bresa: Bio-inspired Reflexive Safe Reinforcement Learning for Contact-Rich Robotic Tasks
Heng Zhang, Gokhan Solak, Arash Ajoudani
TL;DR
The paper tackles safety in reinforcement learning for contact-rich robotics by addressing low-level execution risk, which high-level safe-RL methods often overlook. It introduces Bresa, a bio-inspired hierarchical framework in which a high-frequency safety critic can intervene reflexively in the low-level control loop, while a slower task policy handles planning and a variable-impedance trajectory controller provides compliant execution. The key contributions are: decoupling task learning from safety learning; a reflex mechanism in which a risk critic $Q_{\text{risk}}$ triggers a recovery policy; and integration with Cartesian impedance control to maintain safety during dynamic interactions. The method is validated on 2D navigation and 3D maze tasks, in both simulation and real-world experiments. Results show substantial improvements in the trade-off between safety and task success, faster learning, and robust performance under disturbances, which is of practical importance for real robots operating in unstructured, contact-rich environments. The work opens avenues for extending reflexive safety with multi-modal sensing and to broader task domains, bridging planning and low-level control in a safe, responsive manner.
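The reflex mechanism summarized above can be illustrated with a minimal 1-D Python sketch. It assumes a single-integrator plant, a hand-crafted proxy in place of the learned risk critic $Q_{\text{risk}}$, a fixed threshold `EPS_RISK`, and a 10:1 ratio between the control and task rates; none of these names or numbers come from the paper, and the impedance controller is omitted. What it shows is only the structure: the task action is refreshed slowly and held, while the critic screens it at every control tick and swaps in a recovery action when the estimated risk exceeds the threshold.

```python
# Toy two-rate reflex loop (hypothetical sketch, not the Bresa implementation).
# All constants and function names below are illustrative assumptions.

DT = 0.002          # assumed 500 Hz low-level control tick (s)
TASK_PERIOD = 10    # assumed: task policy acts once every 10 ticks (50 Hz)
EPS_RISK = 0.3      # hypothetical intervention threshold on Q_risk
GOAL = 1.0          # 1-D goal position (m)
HAZARD = 0.6        # toy assumption: positions x >= HAZARD are unsafe (m)


def clip(value, low, high):
    return max(low, min(high, value))


def task_policy(x):
    """Slow policy: velocity command toward the goal.

    Stand-in for a trained RL task policy; it is deliberately unaware of the hazard.
    """
    return clip(2.0 * (GOAL - x), -0.5, 0.5)


def q_risk(x, u, horizon=0.1):
    """Hand-crafted stand-in for a learned risk critic Q_risk(s, a).

    Predicts where command u would take the robot after `horizon` seconds and
    ramps from 0 (at least 5 cm clear of the hazard) to 1 (at or past it).
    """
    margin = HAZARD - (x + u * horizon)
    return clip(1.0 - margin / 0.05, 0.0, 1.0)


def recovery_policy(x):
    """Reflexive recovery: retreat from the hazard at a fixed speed."""
    return -0.2


def run_episode(n_steps=2000, x0=0.0):
    x = x0
    u_task = 0.0
    task_calls = 0
    reflex_count = 0
    trajectory = []
    for t in range(n_steps):
        # Slow loop: the task action is refreshed only every TASK_PERIOD ticks
        # and held in between.
        if t % TASK_PERIOD == 0:
            u_task = task_policy(x)
            task_calls += 1
        # Fast loop: the critic screens the held task action at every tick and
        # hands control to the recovery policy when the risk is too high.
        if q_risk(x, u_task) > EPS_RISK:
            u = recovery_policy(x)
            reflex_count += 1
        else:
            u = u_task
        x += DT * u  # toy single-integrator plant, not an impedance-controlled arm
        trajectory.append(x)
    return trajectory, task_calls, reflex_count


if __name__ == "__main__":
    traj, calls, reflexes = run_episode()
    print(f"task calls: {calls}, reflex interventions: {reflexes}, max x: {max(traj):.3f}")
```

Because this toy task policy ignores the hazard, the reflex repeatedly overrides it near the point where the proxy risk crosses `EPS_RISK`, so the position stays below `HAZARD` but the goal is never reached. In a learned system the critic would be trained from data and the task policy would learn to avoid such states; the sketch only illustrates the gating between the two rates.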
Abstract
Ensuring safety in reinforcement learning (RL)-based robotic systems is a critical challenge, especially in contact-rich tasks within unstructured environments. While state-of-the-art safe RL approaches mitigate risks through safe exploration or high-level recovery mechanisms, they often overlook low-level execution safety, where reflexive responses to potential hazards are crucial. Similarly, variable impedance control (VIC) enhances safety by adjusting the robot's mechanical response, yet it lacks a systematic way to adapt parameters such as stiffness and damping throughout the task. In this paper, we propose Bresa, a Bio-inspired Reflexive Hierarchical Safe RL method modeled on biological reflexes. Our method decouples task learning from safety learning, incorporating a safety critic network that evaluates action risks and operates at a higher frequency than the task solver. Unlike existing recovery-based methods, our safety critic operates in the low-level control layer, allowing real-time intervention when unsafe conditions arise. The task-solving RL policy, running at a lower frequency, focuses on high-level planning (decision-making), while the safety critic ensures instantaneous safety corrections. We validate Bresa on multiple tasks, including a contact-rich robotic task, demonstrating its reflexive ability to enhance safety and its adaptability to unforeseen dynamic environments. Our results show that Bresa outperforms the baseline, providing a robust, reflexive safety mechanism that bridges the gap between high-level planning and low-level execution. Real-world experiments and supplementary material are available on the project website: https://jack-sherman01.github.io/Bresa.
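Why adapting stiffness and damping matters for safety can be made concrete with the generic Cartesian impedance law below. This is the standard textbook form, assumed here for illustration and not necessarily the exact controller used in Bresa: $x_{d,t}$ is the desired pose from the trajectory, $x_t$ the measured pose, and $K_t$, $D_t$ the time-varying stiffness and damping matrices that a VIC policy would adapt.

$$
F_t = K_t\,(x_{d,t} - x_t) + D_t\,(\dot{x}_{d,t} - \dot{x}_t)
$$

If the desired pose is placed a distance $\delta$ inside a rigid surface and the robot comes to rest against it ($\dot{x}_t = \dot{x}_{d,t} = 0$, $x_{d,t} - x_t = \delta$), the damping term vanishes and the controller settles at a contact force of $F = K_t\,\delta$. Lowering the stiffness along the contact direction therefore directly caps the steady-state contact force, which is why adapting $K_t$ and $D_t$ during the task is a lever for execution safety.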
