HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents
Dániel Horváth, Jesús Bujalance Martín, Ferenc Gábor Erdős, Zoltán Istenes, Fabien Moutarde
TL;DR
This work tackles the difficulty of training off-policy reinforcement learning agents for robotics in continuous, high-dimensional, and sparse-reward environments without demonstrations. It introduces HiER, which adds a secondary highlight replay buffer to store and emphasize the most relevant experiences, and HiER+, which integrates a data-collection curriculum method (E2H-ISE) to further boost learning. Empirical results across 8 tasks on Panda-Gym, Fetch, and PointMaze benchmarks show that HiER and HiER+ consistently outperform strong baselines and even state-of-the-art variants, reducing the likelihood of getting stuck in local minima and enabling more reliable task success. The proposed approach provides a versatile, generalizable improvement to off-policy RL in robotics, with potential for broader applicability and future exploration of more sophisticated curriculum strategies.
Abstract
Even though reinforcement-learning-based algorithms have achieved superhuman performance in many domains, the field of robotics poses significant challenges, as the state and action spaces are continuous and the reward function is predominantly sparse. Furthermore, in many cases, the agent has no access to any form of demonstration. Inspired by human learning, we propose a method named highlight experience replay (HiER) that creates a secondary highlight replay buffer for the most relevant experiences. For the weight updates, transitions are sampled from both the standard and the highlight experience replay buffers. HiER can be applied with or without hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves on state-of-the-art performance, as validated on 8 tasks from three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+, in which HiER is enhanced with an arbitrary data collection curriculum learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/.
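To make the two-buffer idea concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say how "most relevant" experiences are selected or how the two buffers are mixed, so the class name `DualReplayBuffer`, the episode-return threshold `return_threshold`, and the fixed mixing ratio `highlight_ratio` are all assumptions made for illustration.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Standard replay buffer plus a secondary 'highlight' buffer (illustrative only)."""

    def __init__(self, capacity=100_000, highlight_capacity=10_000,
                 return_threshold=0.0, highlight_ratio=0.25, seed=None):
        self.standard = deque(maxlen=capacity)
        self.highlight = deque(maxlen=highlight_capacity)
        # Assumption: an episode is "relevant" if its return exceeds this threshold.
        self.return_threshold = return_threshold
        # Assumption: a fixed fraction of each batch comes from the highlight buffer.
        self.highlight_ratio = highlight_ratio
        self.rng = random.Random(seed)

    def add_episode(self, transitions):
        """Store one episode of (state, action, reward, next_state, done) tuples."""
        self.standard.extend(transitions)
        episode_return = sum(t[2] for t in transitions)
        if episode_return > self.return_threshold:
            # Highlighted episodes are kept in both buffers.
            self.highlight.extend(transitions)

    def sample(self, batch_size):
        """Draw a mixed batch (with replacement) from both buffers."""
        if not self.standard:
            raise ValueError("cannot sample from an empty buffer")
        # Fall back to the standard buffer alone until a highlight exists.
        n_high = round(batch_size * self.highlight_ratio) if self.highlight else 0
        batch = self.rng.choices(self.standard, k=batch_size - n_high)
        if n_high:
            batch += self.rng.choices(self.highlight, k=n_high)
        return batch
```

An off-policy learner such as SAC or TD3 would call `add_episode` at the end of each rollout and `sample` before each gradient step. HER relabeling or PER weighting, where used, would apply to the standard buffer independently of this sketch.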
