Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Kaiyan Zhao, Yiming Wang, Yuyang Chen, Yan Li, Leong Hou U, Xiaoguang Niu

TL;DR

This paper tackles the low efficiency of experience replay (ER) in deep reinforcement learning with high-dimensional state spaces and sparse rewards. It introduces Efficient Diversity-based Experience Replay (EDER), which uses determinantal point processes (DPPs) to quantify trajectory diversity and prioritizes diverse samples for replay. Cholesky decomposition and rejection sampling keep the computation tractable. Across Habitat HM3D, Atari, and MuJoCo tasks, EDER learns faster and reaches higher final performance than strong baselines. By enabling more data-efficient learning in realistic, high-dimensional environments, EDER has practical implications for robotics, vision-based navigation, and complex control tasks. The diversity score of a trajectory $\tau$ is $d_{\tau}=\det(L_{\tau})$ with kernel matrix $L_{\tau}=M^\top M$. The Cholesky factorization $L_{\tau}=L_C L_C^\top$ gives $\det(L_{\tau})=\prod_{i=1}^{b} l_{ii}^2$, where $l_{ii}$ are the diagonal entries of $L_C$, which makes the score scalable to compute.
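The diversity score can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. A trajectory is taken to be a hypothetical (b, d) array with one state per row. `diversity_score` and the jitter `eps` are illustrative names, and the jitter is added only so the Cholesky factorization succeeds when states are exactly redundant.

```python
import numpy as np

def diversity_score(states: np.ndarray, eps: float = 1e-10) -> float:
    """Sketch of d_tau = det(L_tau) computed through a Cholesky factor.

    states: hypothetical (b, d) array, one state vector per row.
    """
    # Columns of M are the l2-normalized states, so M has shape (d, b).
    M = (states / np.linalg.norm(states, axis=1, keepdims=True)).T
    L = M.T @ M                       # kernel matrix L_tau, shape (b, b)
    L = L + eps * np.eye(L.shape[0])  # assumed jitter so Cholesky succeeds if L is singular
    Lc = np.linalg.cholesky(L)        # L = Lc @ Lc.T with Lc lower-triangular
    # det(L) = prod_i Lc[i, i]^2
    return float(np.prod(np.diag(Lc) ** 2))

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 64))     # 5 random states in 64 dimensions
print(diversity_score(states))
```

For long trajectories the product of b squared diagonal entries can underflow; summing `2 * np.log(np.diag(Lc))` instead yields the log-determinant, which preserves the ranking of trajectories.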

Abstract

Experience replay is widely used to improve learning efficiency in reinforcement learning by leveraging past experiences. However, existing experience replay methods, whether based on uniform or prioritized sampling, often suffer from low efficiency, particularly in real-world scenarios with high-dimensional state spaces. To address this limitation, we propose a novel approach, Efficient Diversity-based Experience Replay (EDER). EDER employs a determinantal point process to model the diversity between samples and prioritizes replay accordingly. To further enhance learning efficiency, we incorporate Cholesky decomposition for handling large state spaces in realistic environments. Additionally, rejection sampling is applied to select samples with higher diversity, thereby improving overall learning efficacy. Extensive experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results demonstrate that our approach not only significantly improves learning efficiency but also achieves superior performance in high-dimensional, realistic environments.

Paper Structure

This paper contains 24 sections, 4 theorems, 27 equations, 10 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Let $M \in \mathbb{R}^{d \times b}$ be a matrix whose columns are the $\ell_2$-normalized state vectors $\hat{s}$ of trajectory $\tau_j$. The determinant $\det(L_{\tau_j})$ of the kernel matrix $L_{\tau_j} = M^\top M$ reaches its maximum value when the state vectors are mutually orthogonal, indicating maximal diversity among the states of $\tau_j$.
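Theorem 1 can be checked numerically. In this illustrative sketch (the sizes $d = 8$ and $b = 3$ and the helper `gram_det` are our own choices), orthonormal states give $M^\top M = I$ and a determinant of 1, while nearly identical states give a determinant close to 0. For unit-norm columns, Hadamard's inequality bounds the determinant by 1, so mutual orthogonality attains the maximum; this requires $b \le d$.

```python
import numpy as np

def gram_det(states: np.ndarray) -> float:
    """det(M^T M), where the columns of M are the l2-normalized rows of `states`."""
    M = (states / np.linalg.norm(states, axis=1, keepdims=True)).T
    return float(np.linalg.det(M.T @ M))

d, b = 8, 3
orthogonal = np.eye(d)[:b]                         # b mutually orthogonal unit states
rng = np.random.default_rng(1)
base = rng.normal(size=d)
redundant = base + 0.01 * rng.normal(size=(b, d))  # b nearly identical states

print(gram_det(orthogonal))  # M^T M is the identity here, so this is the maximum value 1
print(gram_det(redundant))   # near-parallel states give a determinant close to 0
```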

Figures (10)

  • Figure 1: Sample distribution comparison of the replay buffer. Left: Uniform sampling results in an imbalanced distribution, with some data types overrepresented and others underrepresented. Right: Our method achieves a more balanced and diverse selection of samples, enhancing overall diversity and improving learning efficiency.
  • Figure 2: In the EDER framework, we leverage the Determinantal Point Process (DPP) to compute diversity scores for trajectories via Cholesky decomposition, enhancing the sampling process. Specifically, our method first uses these diversity scores to select the top $m$ most diverse trajectories. Next, we apply a rejection sampling technique to choose a subset of these trajectories for policy updates. The resulting diverse samples facilitate more efficient learning, particularly in high-dimensional environments.
  • Figure 3: Habitat scene.
  • Figure 4: Success rates of EDER compared with other baselines.
  • Figure 5: Trajectories of policies trained with different exploration algorithms in the Habitat environment.
  • ...and 5 more figures
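The two-stage selection described for Figure 2 (top-$m$ filtering, then rejection sampling) might look roughly like the following. The caption does not specify the acceptance rule, so accepting a proposed trajectory with probability proportional to its diversity score (normalized by the largest score among the top $m$) is our assumption. The names `select_diverse_batch`, `m`, `k`, and the Cholesky jitter are likewise hypothetical; this is a sketch, not the authors' algorithm.

```python
import numpy as np

def diversity_score(states: np.ndarray, eps: float = 1e-10) -> float:
    """det(M^T M) via Cholesky; rows of `states` are the states of one trajectory."""
    M = (states / np.linalg.norm(states, axis=1, keepdims=True)).T
    Lc = np.linalg.cholesky(M.T @ M + eps * np.eye(M.shape[1]))  # jitter is an assumption
    return float(np.prod(np.diag(Lc) ** 2))

def select_diverse_batch(trajectories, m: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical selection: keep the top-m trajectories, then draw k of them by rejection sampling."""
    m = min(m, len(trajectories))
    if k > m:
        raise ValueError("k must not exceed the number of candidate trajectories")
    scores = np.array([diversity_score(t) for t in trajectories])
    top_m = np.argsort(scores)[::-1][:m]       # indices of the m highest-scoring trajectories
    top_scores = scores[top_m]
    # Assumed acceptance rule: probability proportional to the score,
    # scaled so the most diverse candidate is always accepted.
    accept_p = top_scores / top_scores.max() if top_scores.max() > 0 else np.ones(m)
    chosen = []
    while len(chosen) < k:
        i = int(rng.integers(m))               # propose a candidate uniformly among the top m
        if i not in chosen and rng.random() < accept_p[i]:
            chosen.append(i)                   # accept, sampling without replacement
    return top_m[chosen]

rng = np.random.default_rng(0)
trajs = [rng.normal(size=(6, 16)) for _ in range(10)]
trajs.append(np.tile(rng.normal(size=16), (6, 1)))  # a fully redundant trajectory
print(select_diverse_batch(trajs, m=5, k=3, rng=rng))
```

The redundant trajectory receives a near-zero score and is filtered out in the top-$m$ stage; rejection sampling then favors, but does not force, the most diverse of the remaining candidates.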

Theorems & Definitions (6)

  • Theorem 1: Correlation between Determinant and Diversity
  • Theorem 2: Time Complexity of EDER
  • Theorem 1: Correlation between Determinant and Diversity
  • Proof
  • Theorem 2: Time Complexity of EDER
  • Proof