Contact Energy Based Hindsight Experience Prioritization

Erdi Sayar; Zhenshan Bing; Carlo D'Eramo; Ozgur S. Oguz; Alois Knoll

TL;DR

Sparse rewards present a major challenge in multi-goal RL for robotic manipulation, and uniform hindsight goal sampling can be inefficient. The authors introduce Contact Energy Based Prioritization (CEBP), which computes a continuous contact energy from gripper tactile forces and object displacement, smooths it with a sigmoid $\sigma(x)=\frac{k}{1+e^{-xT}}$, and samples replay episodes with probability $p_{episode}(e) \propto \sum_t \tilde{c}(e,t)$, where $\tilde{c}(e,t)$ is the smoothed contact energy of episode $e$ at step $t$. Importance-sampling weights $w_i=(N\cdot p_{episode}(e))^{-\beta}$ correct the bias introduced by non-uniform sampling. Empirically, CEBP outperforms or matches CPER, PER, MEP, EBP, and HER on three Fetch tasks, with an ablation showing that the sigmoid temperature $T$ affects both learning speed and final performance. A Sim2Real demonstration transfers the trained policy to a Franka robot for pick-and-place, underscoring the practical impact of tactile-guided replay prioritization for contact-rich robotics.
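
The sampling rule in the summary above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the raw per-step contact energies are taken as a given array (how they are derived from tactile forces and object displacement is not specified here), $N$ is assumed to be the number of episodes in the buffer, and the function names and default values of `k`, `T`, and `beta` are hypothetical.

```python
import numpy as np


def smooth_contact_energy(raw_energy, k=1.0, T=1.0):
    """Apply sigma(x) = k / (1 + exp(-x * T)) elementwise."""
    raw_energy = np.asarray(raw_energy, dtype=float)
    return k / (1.0 + np.exp(-raw_energy * T))


def episode_probabilities(raw_energy, k=1.0, T=1.0):
    """Turn per-step energies of shape (num_episodes, horizon) into p_episode.

    Each episode's priority is the sum over time of its smoothed energy;
    priorities are then normalized so they sum to one.
    """
    smoothed = smooth_contact_energy(raw_energy, k=k, T=T)
    priority = smoothed.sum(axis=1)
    return priority / priority.sum()


def importance_weights(p_episode, beta=0.5):
    """w = (N * p_episode)^(-beta), with N assumed to be the episode count."""
    p_episode = np.asarray(p_episode, dtype=float)
    n = len(p_episode)
    return (n * p_episode) ** (-beta)


# Hypothetical raw energies: episode 0 has no contact, episode 1 has contact.
raw = np.array([[0.0, 0.0, 0.0],
                [1.0, 2.0, 3.0]])
p = episode_probabilities(raw, k=1.0, T=1.0)
w = importance_weights(p, beta=0.5)

# Draw a batch of episode indices according to p.
rng = np.random.default_rng(0)
batch = rng.choice(len(p), size=4, p=p)
```

Because $\sigma(0)=k/2$, an episode with zero raw energy still receives a nonzero priority under this formula, so contact-free episodes remain reachable by the sampler. Episodes sampled more often than uniform receive weights below one, which is how the weights offset the sampling bias.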

Abstract

Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms because successful experiences are collected inefficiently. Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by taking advantage of failed trajectories: the desired goal is replaced with one of the achieved states, so that any failed trajectory can contribute to learning. However, HER samples failed trajectories uniformly, without taking into account which ones might be the most valuable for learning. In this paper, we address this problem and propose a novel approach, Contact Energy Based Prioritization (CEBP), which selects samples from the replay buffer based on the rich information provided by contact, leveraging the touch sensors in the robot's gripper and the object displacement. Our prioritization scheme favors sampling of contact-rich experiences, which are arguably the ones providing the largest amount of information. We evaluate our proposed approach on various sparse-reward robotic tasks and compare it with state-of-the-art methods. We show that our method surpasses or performs on par with those methods on robot manipulation tasks. Finally, we deploy the policy trained with our method on a real Franka robot for a pick-and-place task and observe that the robot solves the task successfully. The videos and code are publicly available at: https://erdiphd.github.io/HER_force
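
The hindsight relabeling step described in the abstract can be illustrated with a short sketch. It rests on assumptions that are not stated in the text: the substitute goal is drawn from later steps of the same episode (one common HER strategy), the sparse reward is 0 within a distance tolerance and -1 otherwise, and the names `sparse_reward`, `relabel_with_hindsight`, and `tol` are hypothetical.

```python
import numpy as np


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0 if the goal is reached within tol, else -1."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0


def relabel_with_hindsight(achieved_goals, t, rng):
    """Replace the desired goal of transition t with a later achieved state.

    achieved_goals has shape (horizon + 1, goal_dim); achieved_goals[t + 1]
    is the state achieved after taking the action at step t.
    """
    horizon = len(achieved_goals) - 1
    # Upper bound is exclusive, so the index lies in t + 1 .. horizon.
    future = rng.integers(t + 1, horizon + 1)
    new_goal = achieved_goals[future]
    reward = sparse_reward(achieved_goals[t + 1], new_goal)
    return new_goal, reward


# Hypothetical failed episode: the object moves along one axis in 5 steps.
achieved = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
rng = np.random.default_rng(0)
last_goal, last_reward = relabel_with_hindsight(achieved, t=4, rng=rng)
first_goal, first_reward = relabel_with_hindsight(achieved, t=0, rng=rng)
```

For the final transition the only available substitute goal is the state it actually reached, so the relabeled reward is a success signal even though the original episode failed. Uniform HER applies this relabeling to uniformly sampled episodes; CEBP changes which episodes are sampled, not the relabeling itself.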

Paper Structure (14 sections, 6 equations, 5 figures, 1 algorithm)

This paper contains 14 sections, 6 equations, 5 figures, 1 algorithm.

INTRODUCTION
RELATED WORK
Tactile Feedback
Energy-Based Hindsight Experience Prioritization
Prioritized Experience Replay
Maximum Entropy-based Prioritization
Hindsight Experience Replay
BACKGROUND
METHODOLOGY
Contact Energy Prioritization
EXPERIMENT
Sim2Real
Conclusions
Acknowledgements

Figures (5)

Figure 1: Overview of the robotic manipulation benchmark tasks.
Figure 2: Median success rate for all three Fetch tasks. The median success rate (line) and interquartile range (shaded) are shown over 5 random training seeds.
Figure 3: Ablation study of the impact of the sigmoid temperature parameter $T$ on the success rate. The median success rate (line) and interquartile range (shaded) are shown over 5 random training seeds.
Figure 4: Franka robot setups
Figure 5: The camera view, showing the identified ArUco markers and detected object.
