Contact-Safe Reinforcement Learning with ProMP Reparameterization and Energy Awareness

Bingkun Huang; Yuhe Gong; Zewen Yang; Tianyu Ren; Luis Figueredo

TL;DR

This paper addresses safe and robust control for contact-rich robotic manipulation. It learns in task space with a ProMP-based trajectory prior that is refined through PPO, and it enforces energy-aware, passivity-based safety using an energy-tank layer and Cartesian impedance control. The key idea is to refine smooth, compliant trajectories in weight space, condition them on via-points, and gate execution so that energy and power limits are respected during contact. The authors report that this PPT framework yields higher task success, smoother trajectories, and safer interactions in both simulation (box pushing and maze sliding) and real-world experiments, with strong sim-to-real transfer and generalization to unseen contact conditions. The work offers a practical, safety-conscious pathway for deploying learning-based manipulation policies on real robots without task-specific retuning, potentially enabling more reliable deployment in cluttered, contact-rich environments.
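
To make the weight-space and via-point ideas concrete, the following is a minimal sketch of a single-degree-of-freedom ProMP. It is not the authors' implementation: the basis count, the basis width, the zero-mean unit-covariance prior, and the function names `rbf_features` and `condition_on_via_point` are all assumptions chosen for illustration. It shows how a trajectory is a linear function of a weight vector and how a Gaussian weight distribution can be conditioned so that the mean trajectory passes through a via-point.

```python
import numpy as np

def rbf_features(t, n_basis=8, width=0.02):
    """Normalized Gaussian basis functions at phases t in [0, 1].

    n_basis and width are illustrative values, not taken from the paper.
    Returns an array of shape (len(t), n_basis).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-0.5 * (t[:, None] - centers[None, :]) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)

def condition_on_via_point(mu_w, sigma_w, t_via, y_via, obs_var=1e-6):
    """Condition the Gaussian weight distribution N(mu_w, sigma_w)
    on the observation y(t_via) = y_via with small noise obs_var."""
    phi = rbf_features(t_via, n_basis=mu_w.shape[0])[0]
    gain = sigma_w @ phi / (obs_var + phi @ sigma_w @ phi)
    mu_new = mu_w + gain * (y_via - phi @ mu_w)
    sigma_new = sigma_w - np.outer(gain, phi @ sigma_w)
    return mu_new, sigma_new

# Hypothetical prior over weights (an RL policy would refine these weights).
n_basis = 8
mu_w = np.zeros(n_basis)
sigma_w = np.eye(n_basis)

# Force the trajectory through y = 0.3 at the middle of the motion.
mu_c, sigma_c = condition_on_via_point(mu_w, sigma_w, t_via=0.5, y_via=0.3)

# The mean trajectory is a linear map of the conditioned weights.
t = np.linspace(0.0, 1.0, 101)
mean_traj = rbf_features(t, n_basis) @ mu_c
```

Because the trajectory is linear in the weights, a policy that outputs a weight vector produces a smooth trajectory for a whole episode, which is the sense in which exploration happens in weight space rather than per time step.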

Abstract

Reinforcement learning (RL) approaches based on Markov Decision Processes (MDPs) are predominantly applied in the robot joint space, often relying on limited task-specific information and partial awareness of the 3D environment. In contrast, episodic RL has demonstrated advantages over traditional MDP-based methods in terms of trajectory consistency, task awareness, and overall performance in complex robotic tasks. However, both step-wise and episodic RL methods often neglect the contact-rich information inherent in task-space manipulation, particularly with respect to contact safety and robustness. In this work, contact-rich manipulation tasks are tackled using a task-space, energy-safe framework in which reliable and safe task-space trajectories are generated by combining Proximal Policy Optimization (PPO) with movement primitives. Furthermore, an energy-aware Cartesian Impedance Controller objective is incorporated within the proposed framework to ensure safe interactions between the robot and the environment. Our experimental results demonstrate that the proposed framework outperforms existing methods in handling tasks on various types of surfaces in 3D environments, achieving high success rates as well as smooth trajectories and energy-safe interactions.
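
As a rough illustration of what energy-aware gating of a controller command can look like, the sketch below uses a simple energy tank. This is an assumption about the general technique, not the controller described in the paper: the class name `EnergyTank`, its fields, the numeric limits, and the scaling rule are hypothetical. The tank stores a budget of energy, the controller drains it whenever it injects positive power into the robot, and the commanded force is scaled down when a power limit or the remaining budget would be exceeded.

```python
from dataclasses import dataclass

@dataclass
class EnergyTank:
    """Toy energy tank; all default values are illustrative assumptions."""
    energy: float = 2.0       # stored energy budget in joules
    min_energy: float = 0.1   # reserve that must never be spent, in joules
    max_power: float = 5.0    # allowed injected power in watts

    def gate(self, force, velocity, dt):
        """Return a scale in [0, 1] for the commanded force and
        drain the tank by the energy actually injected during dt."""
        # Power the controller injects into the robot: P = f . v
        power = sum(f * v for f, v in zip(force, velocity))
        if power <= 0.0:
            # The controller is dissipating energy, so the command is
            # already passive; this sketch leaves the tank unchanged.
            return 1.0
        # Respect the power limit first.
        scale = min(1.0, self.max_power / power)
        # Then respect the remaining energy budget above the reserve.
        available = max(self.energy - self.min_energy, 0.0)
        scale = min(scale, available / (power * dt))
        self.energy -= scale * power * dt
        return scale

# Usage: scale a Cartesian force command before sending it to the robot.
tank = EnergyTank()
force = [10.0, 0.0, 0.0]      # N, hypothetical impedance-controller output
velocity = [1.0, 0.0, 0.0]    # m/s, hypothetical end-effector velocity
scale = tank.gate(force, velocity, dt=0.01)
safe_force = [scale * f for f in force]
```

Scaling the force scales the injected power linearly at a fixed velocity, so the gated command stays within both limits in this toy model. A real passivity-based design would also define how the tank is refilled by dissipated energy, which is omitted here.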
