Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
Kaiyan Zhao, Yiming Wang, Yuyang Chen, Yan Li, Leong Hou U, Xiaoguang Niu
TL;DR
This paper tackles the low efficiency of experience replay (ER) in deep reinforcement learning with high-dimensional state spaces and sparse rewards. It introduces Efficient Diversity-based Experience Replay (EDER), which uses determinantal point processes to quantify trajectory diversity and prioritize diverse replay samples, with Cholesky decomposition and rejection sampling to reduce computation. Across Habitat HM3D, Atari, and MuJoCo tasks, EDER learns faster and reaches higher final performance than strong baselines. By enabling more data-efficient learning in realistic, high-dimensional environments, EDER has practical implications for robotics, vision-based navigation, and complex control tasks. The diversity score of a trajectory $\tau$ is $d_{\tau}=\det(L_{\tau})$ with $L_{\tau}=M^\top M$; the Cholesky factorization $L_{\tau}=L_C L_C^\top$ gives $\det(L_{\tau})=\prod_{i=1}^{b} l_{ii}^2$, where $l_{ii}$ are the diagonal entries of $L_C$, which keeps the computation scalable.
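To make the diversity score concrete, here is a minimal sketch of $d_{\tau}=\det(M^\top M)$ computed through the diagonal of a Cholesky factor. It is illustrative only and not the authors' implementation. It assumes that the columns of $M$ are feature vectors of the $b$ states in a trajectory segment (for example, unit-normalized state embeddings). The function names `gram_matrix`, `cholesky_lower`, and `diversity_score` are hypothetical.

```python
import math


def gram_matrix(M):
    """Return L = M^T M for a d x b matrix M given as a list of d rows."""
    d, b = len(M), len(M[0])
    return [[sum(M[k][i] * M[k][j] for k in range(d)) for j in range(b)]
            for i in range(b)]


def cholesky_lower(L, eps=1e-12):
    """Return the lower-triangular C with L = C C^T.

    Raises ValueError when a pivot is not strictly positive, i.e. when L is
    not (numerically) positive definite.
    """
    b = len(L)
    C = [[0.0] * b for _ in range(b)]
    for i in range(b):
        for j in range(i + 1):
            s = L[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= eps:
                    raise ValueError("matrix is not positive definite")
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return C


def diversity_score(M):
    """Return det(M^T M) as the product of squared Cholesky diagonal entries."""
    try:
        C = cholesky_lower(gram_matrix(M))
    except ValueError:
        # Assumption of this sketch: linearly dependent columns span zero
        # volume, so the score is reported as 0 instead of failing.
        return 0.0
    return math.prod(C[i][i] ** 2 for i in range(len(C)))


# Columns of each matrix are two 2-D state features.
orthogonal = [[1.0, 0.0], [0.0, 1.0]]           # perpendicular states
parallel = [[1.0, 1.0], [0.0, 0.0]]             # identical states
skewed = [[1.0, 0.5], [0.0, math.sqrt(3) / 2]]  # unit vectors 60 degrees apart
```

In this toy setting, perpendicular states score highest, identical states score zero, and the 60-degree pair falls in between, which matches the intuition that the determinant measures the volume spanned by the state features.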
Abstract
Experience replay is widely used to improve learning efficiency in reinforcement learning by leveraging past experiences. However, existing experience replay methods, whether based on uniform or prioritized sampling, often suffer from low efficiency, particularly in real-world scenarios with high-dimensional state spaces. To address this limitation, we propose a novel approach, Efficient Diversity-based Experience Replay (EDER). EDER employs a determinantal point process to model the diversity between samples and prioritizes replay accordingly. To further enhance learning efficiency, we incorporate Cholesky decomposition for handling large state spaces in realistic environments. Additionally, rejection sampling is applied to select samples with higher diversity, thereby improving overall learning efficacy. Extensive experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results demonstrate that our approach not only significantly improves learning efficiency but also achieves superior performance in high-dimensional, realistic environments.
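The rejection-sampling step can be sketched as follows. This is one plausible reading, not the procedure from the paper: it assumes each stored trajectory already has a non-negative diversity score, proposes trajectories uniformly at random, and accepts each proposal with probability equal to its score divided by the largest score. The name `rejection_sample` and the parameters `k` and `max_tries` are hypothetical.

```python
import random


def rejection_sample(scores, k, rng=None, max_tries=10_000):
    """Draw up to k indices (with replacement), favouring high scores.

    Proposals are uniform over indices; a proposal i is accepted with
    probability scores[i] / max(scores). Sampling stops after max_tries
    proposals even if fewer than k indices were accepted.
    """
    rng = rng or random.Random()
    d_max = max(scores)
    if d_max <= 0:
        raise ValueError("at least one score must be positive")
    chosen = []
    tries = 0
    while len(chosen) < k and tries < max_tries:
        tries += 1
        i = rng.randrange(len(scores))        # uniform proposal
        if rng.random() < scores[i] / d_max:  # accept in proportion to score
            chosen.append(i)
    return chosen


# Toy replay buffer with three trajectories; only the second is diverse.
batch = rejection_sample([0.0, 1.0, 0.0], k=5, rng=random.Random(0))
```

Accepted indices follow a distribution proportional to the scores, so no normalizing constant over the whole buffer is needed; the price is wasted proposals when most scores are far below the maximum.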
