DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay
Daniel Perkins, Oscar J. Escobar, Luke Green
TL;DR
This work analyzes how epsilon-greedy exploration schedules and replay memory affect Deep Q-Network (DQN) training in a finite, deterministic CartPole setting. It contrasts classic Q-learning with DQN, examines exponential and other decaying epsilon schedules, and compares uniform experience replay with prioritized experience replay (PER). Key findings show that fast, super-linear epsilon decay can improve cumulative reward, while PER often improves sample efficiency but can add runtime cost and performance variability, depending on the task and schedule. The study highlights the interplay between exploration, memory management, and function approximation, and offers practical guidance for robust deep RL in resource-constrained environments. The results suggest that PER is beneficial in more complex domains, whereas uniform replay can be sufficient for simple environments like CartPole; in either case, hyperparameter tuning is crucial for optimal performance.
Abstract
We present a detailed study of Deep Q-Networks in finite environments, emphasizing the impact of epsilon-greedy exploration schedules and prioritized experience replay. Through systematic experimentation, we evaluate how variations in the epsilon decay schedule affect learning efficiency, convergence behavior, and reward optimization. We investigate whether prioritized experience replay leads to faster convergence and higher returns, and report empirical results comparing no replay, uniform replay, and prioritized replay across multiple simulations. Our findings illuminate the trade-offs and interactions between exploration strategies and memory management in DQN training, offering practical recommendations for robust reinforcement learning in resource-constrained settings.
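For readers new to the two mechanisms studied here, the following is a minimal sketch of an exponentially decaying epsilon schedule, epsilon-greedy action selection, and proportional prioritized sampling probabilities. It is illustrative only and is not the implementation used in the experiments: the function names, the default values (`eps_start`, `eps_end`, `decay_rate`, `alpha`, `eps`), and the choice of the proportional variant of prioritization are all assumptions.

```python
import math
import random


def exponential_epsilon(step, eps_start=1.0, eps_end=0.05, decay_rate=0.001):
    """Decay epsilon exponentially from eps_start toward eps_end.

    All default values are hypothetical, chosen only for illustration.
    """
    return eps_end + (eps_start - eps_end) * math.exp(-decay_rate * step)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def priority_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization (assumed variant).

    P(i) = p_i**alpha / sum_k p_k**alpha, with p_i = |td_error_i| + eps.
    alpha = 0 recovers uniform sampling.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

In this sketch, a transition with a larger absolute TD error is sampled more often, and setting `alpha=0` reduces prioritized replay to uniform replay, which is the comparison the study draws.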
