Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning
Sooyoung Jang, Hyung-Il Kim
TL;DR
This work examines how the initial policy entropy affects exploration in discrete-action deep reinforcement learning. It shows that low initial entropy correlates with learning failures and that entropy distributions are biased toward low values, varying by task and initialization. The authors propose entropy-aware model initialization, which repeatedly reinitializes the model until the mean entropy across actors and steps exceeds a threshold h_th, yielding a well-initialized starting point for any RL algorithm. Empirical results on Pong and Breakout demonstrate reduced failures, substantial reward improvements, and faster learning, with modest initialization overhead that scales favorably with task complexity.
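The reinitialization loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear softmax policy, the fixed observation batch of shape (actors, steps, obs_dim) standing in for environment rollouts, and the names `entropy_aware_init`, `init_policy`, `scale`, and `max_tries` are all assumptions made for illustration. Only the threshold `h_th` and the idea of averaging entropy across actors and steps come from the text.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last (action) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_entropy(probs):
    # Shannon entropy (in nats) of each action distribution,
    # averaged over all leading axes (here: actors and steps).
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())


def init_policy(rng, obs_dim, n_actions, scale):
    # Hypothetical stand-in for a network initializer:
    # one Gaussian weight matrix for a linear softmax policy.
    return rng.normal(0.0, scale, size=(obs_dim, n_actions))


def entropy_aware_init(rng, observations, n_actions, h_th,
                       scale=1.0, max_tries=1000):
    """Reinitialize until the mean entropy exceeds h_th.

    observations: array of shape (actors, steps, obs_dim); an assumed
    fixed batch used in place of real environment rollouts.
    Returns (weights, mean_entropy, number_of_attempts).
    """
    obs_dim = observations.shape[-1]
    for attempt in range(1, max_tries + 1):
        weights = init_policy(rng, obs_dim, n_actions, scale)
        probs = softmax(observations @ weights)
        h = mean_entropy(probs)
        if h > h_th:
            return weights, h, attempt
    # max_tries is a safety cap added for this sketch.
    raise RuntimeError("no initialization exceeded h_th within max_tries")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(4, 8, 16))  # 4 actors, 8 steps, 16 features
    w, h, tries = entropy_aware_init(rng, obs, n_actions=6, h_th=1.0, scale=0.1)
    print(f"accepted after {tries} attempt(s), mean entropy {h:.3f}")
```

The accepted weights would then be handed to whatever RL algorithm is used for training. The cap on attempts is a design choice of this sketch, so that an unreachable threshold (for example, one above the maximum entropy log(n_actions)) fails loudly rather than looping forever.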
Abstract
Encouraging exploration is a critical issue in deep reinforcement learning. We investigate the effect of initial entropy, which significantly influences exploration, especially in the early stage of learning. Our main observations are as follows: 1) low initial entropy increases the probability of learning failure, and 2) the initial entropy is biased towards low values that inhibit exploration. Motivated by these observations, we devise entropy-aware model initialization, a simple yet powerful learning strategy for effective exploration. Experiments show that the devised strategy significantly reduces learning failures and improves performance, stability, and learning speed.
