Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal
TL;DR
This work tackles the exploration problem in deep reinforcement learning by introducing Random Latent Exploration (RLE), which conditions the policy on a random latent vector $\boldsymbol{z}$ drawn from a fixed distribution $P_{\boldsymbol{z}}$. The agent receives a state-dependent randomized reward $F(s,\boldsymbol{z})=\phi(s)\cdot \boldsymbol{z}$, and both the policy $\pi(\cdot \mid s,\boldsymbol{z})$ and the value function $V^{\pi}(s,\boldsymbol{z})$ are conditioned on $\boldsymbol{z}$, which is resampled at the start of each trajectory. The method is a simple plug-in for PPO and yields deeper exploration on Atari and Isaac Gym benchmarks, as evidenced by higher aggregated scores and more diverse trajectories, while ablations indicate robustness to the choice of latent distribution and latent vector dimension. Although Montezuma’s Revenge remains challenging, the results indicate that random latent rewards can outperform traditional noise-based strategies and some bonus-based strategies on a wide range of tasks, offering a scalable, general approach to exploration in deep RL.
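The reward $F(s,\boldsymbol{z})=\phi(s)\cdot \boldsymbol{z}$ and the per-trajectory resampling of $\boldsymbol{z}$ can be illustrated with a minimal sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: $P_{\boldsymbol{z}}$ is taken to be uniform on the unit sphere, $\phi$ is a hypothetical fixed random linear map followed by `tanh`, the environment transition is a random placeholder, and the names `sample_latent`, `make_phi`, `rle_reward`, `policy_input`, and `run_episode` are invented here.

```python
import numpy as np

def sample_latent(dim, rng):
    # Assumption: P_z is uniform on the unit sphere (normalized Gaussian draw).
    z = rng.standard_normal(dim)
    return z / np.linalg.norm(z)

def make_phi(state_dim, latent_dim, rng):
    # Hypothetical feature map phi: fixed random projection followed by tanh.
    W = rng.standard_normal((latent_dim, state_dim)) / np.sqrt(state_dim)
    def phi(s):
        return np.tanh(W @ s)
    return phi

def rle_reward(phi, s, z):
    # Randomized reward F(s, z) = phi(s) . z
    return float(np.dot(phi(s), z))

def policy_input(s, z):
    # The policy and value function both see the state together with z.
    return np.concatenate([s, z])

def run_episode(phi, state_dim, latent_dim, horizon, rng):
    # z is sampled once at the start of the trajectory and then held fixed.
    z = sample_latent(latent_dim, rng)
    s = rng.standard_normal(state_dim)  # placeholder initial state
    rewards = []
    for _ in range(horizon):
        _ = policy_input(s, z)  # would be fed to pi(. | s, z) and V(s, z)
        rewards.append(rle_reward(phi, s, z))
        s = rng.standard_normal(state_dim)  # placeholder transition
    return z, rewards
```

In a PPO-style loop, one would presumably combine this randomized reward with the task reward and call `run_episode` (or its real-environment equivalent) once per trajectory, so that each trajectory pursues a different random direction in the feature space.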
Abstract
We introduce Random Latent Exploration (RLE), a simple yet effective exploration strategy in reinforcement learning (RL). On average, RLE outperforms noise-based methods, which perturb the agent's actions, and bonus-based exploration, which rewards the agent for attempting novel behaviors. The core idea of RLE is to encourage the agent to explore different parts of the environment by pursuing randomly sampled goals in a latent space. RLE is as simple as noise-based methods because it avoids complex bonus calculations, yet it retains the deep exploration benefits of bonus-based methods. Our experiments show that RLE improves performance on average in both discrete control tasks (e.g., Atari) and continuous control tasks (e.g., Isaac Gym), enhancing exploration while remaining a simple and general plug-in for existing RL algorithms. Project website and code: https://srinathm1359.github.io/random-latent-exploration
