LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework

Woojun Kim; Jeonghye Kim; Youngchul Sung

Abstract

In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
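As a rough illustration of how an option model can switch among exploration strategies, the following is a minimal sketch of generic call-and-return option execution. It is not the authors' implementation: the function name `run_option_behavior`, the integer state and action types, and the callables passed in are all hypothetical, and in the paper the option selection policy and termination functions are learned rather than fixed.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

State = int   # assumption: states are plain integers for illustration
Action = int  # assumption: actions are plain integers for illustration


def run_option_behavior(
    states: Sequence[State],
    intra_policies: List[Callable[[State], Action]],
    terminations: List[Callable[[State], float]],
    select_option: Callable[[State], int],
    rng: random.Random,
) -> List[Tuple[int, Action]]:
    """Generic call-and-return option loop (hypothetical helper).

    Each option pairs an intra-policy (here standing in for one
    exploration strategy) with a termination function that returns the
    probability of ending the option in the given state.
    """
    option: Optional[int] = None
    trace: List[Tuple[int, Action]] = []
    for s in states:
        # Re-select an option at the start, or when the current option
        # terminates according to its termination probability.
        if option is None or rng.random() < terminations[option](s):
            option = select_option(s)
        # Act with the intra-policy of the currently active option.
        trace.append((option, intra_policies[option](s)))
    return trace
```

With termination probabilities fixed at 0 the first selected strategy is kept for the whole trajectory, and with probabilities fixed at 1 a strategy is re-selected at every step; learned termination functions sit between these two extremes.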

Paper Structure (27 sections, 11 equations, 16 figures, 2 tables, 1 algorithm)

Introduction
Background and Related Works
Methodology
Target Policy
Behavior Policy Construction via Option Model
Learning the Option Model
Experiments
Performance Comparison
Analysis
Conclusion
Environment Specifications
MiniGrid
Atari 2600
Implementation Details
Architecture of Neural Networks
...and 12 more sections

Figures (16)

Figure 1: Overall diagram of LESSON: The blue box shows the behavior policy realized by the proposed option model. The option selection policy $\pi_\Omega$ selects an intra-policy and the corresponding termination function. The target policy denoted by the red box is trained using the samples generated by the behavior policy.
Figure 2: Performance comparison on the MiniGrid tasks. More results are provided in Appendix \ref{sec:appx-experimental-results}.
Figure 3: Performance comparison on the Atari 2600 tasks.
Figure 4: Comparison of LESSON with the baselines in the Empty-16x16 environment with the goal at the lower-right corner: (a) the view of the environment, (b) performance comparison, (c) the termination probabilities $\beta_\omega$ over time for LESSON, and (d) state visitation frequency. (Fig. 4(a) was obtained by rendering the MiniGrid Empty-16x16 environment while training gym_MiniGrid.)
Figure 5: Option selection policy and termination probability during training.
...and 11 more figures
