Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning

Sooyoung Jang, Hyung-Il Kim

TL;DR

This work examines how the initial policy entropy affects exploration in discrete-action deep reinforcement learning. It shows that low initial entropy correlates with learning failures and that entropy distributions are biased toward low values, varying by task and initialization. The authors propose entropy-aware model initialization, which repeatedly reinitializes the model until the mean entropy across actors and steps exceeds a threshold h_th, yielding a well-initialized starting point for any RL algorithm. Empirical results on Pong and Breakout demonstrate reduced failures, substantial reward improvements, and faster learning, with modest initialization overhead that scales favorably with task complexity.
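The reinitialize-until-threshold loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear softmax policy, the function names (`init_policy`, `mean_entropy`, `entropy_aware_init`), the `scale` and `max_tries` parameters, and the use of a fixed batch of observations in place of rollouts collected by actors over several steps are all hypothetical stand-ins.

```python
import numpy as np


def init_policy(rng, obs_dim, n_actions, scale):
    """Hypothetical model initialization: weights of a linear softmax policy."""
    return rng.normal(0.0, scale, size=(obs_dim, n_actions))


def mean_entropy(weights, observations):
    """Mean Shannon entropy (in nats) of the policy over a batch of observations.

    The batch stands in for the states visited by all actors over all steps.
    """
    logits = observations @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return float(entropy.mean())


def entropy_aware_init(observations, n_actions, h_th, scale=1.0, seed=0, max_tries=1000):
    """Reinitialize the policy until its mean entropy exceeds the threshold h_th.

    Returns the accepted weights, their mean entropy, and the number of tries.
    max_tries is an added safeguard (an assumption) against unreachable thresholds.
    """
    rng = np.random.default_rng(seed)
    obs_dim = observations.shape[1]
    for attempt in range(1, max_tries + 1):
        weights = init_policy(rng, obs_dim, n_actions, scale)
        h = mean_entropy(weights, observations)
        if h > h_th:
            return weights, h, attempt
    raise RuntimeError(f"no initialization exceeded h_th={h_th} in {max_tries} tries")
```

The accepted weights would then be handed to whichever RL algorithm is being trained. Note that the entropy of a policy over n actions can never exceed log(n), so any threshold above that value is unreachable and the sketch raises an error instead of looping forever.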

Abstract

Encouraging exploration is a critical issue in deep reinforcement learning. We investigate the effect of the initial entropy, which significantly influences exploration, especially in the early stage of learning. Our main observations are as follows: 1) low initial entropy increases the probability of learning failure, and 2) the initial entropy is biased towards low values, which inhibits exploration. Motivated by these observations, we devise entropy-aware model initialization, a simple yet powerful learning strategy for effective exploration. Experiments show that the devised strategy significantly reduces learning failures and enhances performance, stability, and learning speed.

Paper Structure

This paper contains 5 sections, 2 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: The reward depending on the initial entropy (left: Pong, right: Breakout), where 30 models were generated to investigate the effect of the initial entropy on the performance.
  • Figure 2: The histograms of the initial entropy with 1,000 models generated with different random seeds for two tasks (left: Pong, right: Breakout).
  • Figure 3: Comparison of the entropy-aware model initialization-based DRL with the conventional DRL for two tasks, (left) Pong and (right) Breakout.