AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning

Renye Yan; Yaozhong Gan; You Wu; Junliang Xing; Ling Liangn; Yeshang Zhu; Yimao Cai

TL;DR

AdaMemento is an adaptive memory-enhanced RL framework that exploits both positive and negative experiences by learning to predict known locally optimal policies from real-time states. The authors also theoretically prove the superiority of its new intrinsic motivation and ensemble mechanism.

Abstract

In sparse-reward reinforcement learning (RL), memory mechanisms provide promising shortcuts to policy optimization by reflecting on past experiences, as humans do. However, current memory-based RL methods simply store and reuse high-value policies; they lack deeper refinement and filtering of diverse past experiences, which limits the capability of memory. In this paper, we propose AdaMemento, an adaptive memory-enhanced RL framework. Instead of memorizing only positive past experiences, we design a memory-reflection module that exploits both positive and negative experiences by learning to predict known locally optimal policies from real-time states. To gather informative trajectories for the memory, we further introduce a fine-grained intrinsic motivation paradigm in which nuances between similar states are precisely distinguished to guide exploration. The exploitation of past experiences and the exploration of new policies are then adaptively coordinated by ensemble learning to approach the global optimum. Furthermore, we theoretically prove the superiority of our new intrinsic motivation and ensemble mechanism. Across 59 quantitative and visualization experiments, we confirm that AdaMemento can distinguish subtle states for better exploration and effectively exploit past experiences in memory, achieving significant improvement over previous methods.
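To make the idea of keeping both positive and negative experiences concrete, here is a minimal sketch of a memory that retains the k highest-return and the k lowest-return trajectories. It is an illustration only and not the paper's memory-reflection module: the class name TwoSidedMemory, the ranking by scalar return, and the parameter k are all assumptions introduced for this example.

```python
import heapq
from itertools import count


class TwoSidedMemory:
    """Hypothetical buffer: keeps the k best and k worst trajectories by return."""

    def __init__(self, k):
        self.k = k
        self._best = []   # min-heap on return; root is the weakest of the best
        self._worst = []  # min-heap on negated return; root is the mildest of the worst
        self._tie = count()  # unique tie-breaker so trajectories are never compared

    def add(self, trajectory, ret):
        # Insert into both heaps; heappushpop evicts the least extreme entry.
        for heap, key in ((self._best, ret), (self._worst, -ret)):
            item = (key, next(self._tie), trajectory)
            if len(heap) < self.k:
                heapq.heappush(heap, item)
            else:
                heapq.heappushpop(heap, item)

    def positives(self):
        """Stored high-return trajectories, highest return first."""
        return [t for _, _, t in sorted(self._best, reverse=True)]

    def negatives(self):
        """Stored low-return trajectories, lowest return first."""
        return [t for _, _, t in sorted(self._worst, reverse=True)]


mem = TwoSidedMemory(k=2)
for name, ret in [("a", 1.0), ("b", 5.0), ("c", -3.0), ("d", 2.0)]:
    mem.add(name, ret)
print(mem.positives(), mem.negatives())
```

In this sketch the positive set could supply targets to imitate and the negative set could supply examples to avoid; how AdaMemento actually refines and filters its memory is described in the paper's Method section, not here.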

Paper Structure (48 sections, 2 theorems, 19 equations, 15 figures, 1 table, 1 algorithm)

Introduction
Related work
Memory-based RL
Intrinsic motivation
Method
Memory-reflection module
Prediction network
Reflection network
Coarse-fine distinction module
Intrinsic reward design
Proof of the retaining of policy optimality
Ensemble learning for exploration-exploitation balance
Proof of memory effectiveness
Experiments
Visualization Analysis
...and 33 more sections

Key Result

Theorem 3.2

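After $k$ updates of the coarse-fine distinction network, under the paper's assumption (labeled assu1 in the source), the optimal action remains the same after adding the intrinsic rewards. That is, for any state $s$, we have $\arg\max_a Q^*_1(s, a) = \arg\max_a Q^*(s, a)$, where $Q^*_1$ is the optimal $Q$ function after adding the intrinsic rewards and $Q^*$ denotes the optimal $Q$ function without them.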
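The statement can be illustrated numerically on a toy problem. The sketch below does not reproduce the paper's assumption or proof. It relies instead on a standard, simpler condition chosen for this example: if every intrinsic reward is bounded by eps in absolute value, the optimal Q function shifts by at most eps / (1 - gamma), so the greedy action is preserved whenever the smallest action gap exceeds 2 * eps / (1 - gamma). The random MDP, the function optimal_q, and all variable names are hypothetical.

```python
import numpy as np


def optimal_q(P, R, gamma, iters=2000):
    """Q-value iteration. P: (S, A, S) transition probabilities, R: (S, A) rewards."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        # Bellman optimality backup: expected next-state value under greedy play.
        Q = R + gamma * (P @ Q.max(axis=1))
    return Q


rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into probability distributions
R = rng.random((S, A))             # extrinsic rewards of the toy MDP

Q_star = optimal_q(P, R, gamma)
top2 = np.sort(Q_star, axis=1)[:, -2:]
gap = (top2[:, 1] - top2[:, 0]).min()  # smallest best-vs-second-best margin

# Assumed bound for this example: 2 * eps / (1 - gamma) equals half the gap.
eps = 0.25 * gap * (1 - gamma)
R_int = rng.uniform(-eps, eps, size=(S, A))  # stand-in intrinsic rewards
Q_one = optimal_q(P, R + R_int, gamma)

same_greedy = np.array_equal(Q_star.argmax(axis=1), Q_one.argmax(axis=1))
print(same_greedy)
```

The bound on the intrinsic reward is what carries the argument in this sketch; the paper's theorem ties the guarantee to updates of the coarse-fine distinction network rather than to a fixed eps.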

Figures (15)

Figure 1: Different granularity of state discrimination. (a) versus (b) represents fine-grained distinction, where the state images look similar but are of completely different importance, which is not well addressed in previous research.
Figure 2: The left side shows past experience trajectories stored in the memory buffer. AdaMemento learns to avoid danger and keeps updating the current optimal strategy by synthesizing and reflecting on the commonalities in these trajectories. The right side shows the updated strategy.
Figure 3: AdaMemento's framework. We evaluate each sub-module in (a) and parameters in (b) and (c).
Figure 4: Comparison in Montezuma's Revenge Environment. (a) illustrates a comparison between baseline methods before and after integration with our AdaMemento; (b) presents a performance comparison to other advanced baseline models.
Figure 5: Generalization experiments in discrete-space environments (Atari). The x-axis shows timesteps in units of 10 million.
...and 10 more figures

Theorems & Definitions (4)

Theorem 3.2
Theorem 3.3
proof
proof
