Proximal Policy Optimization with Adaptive Exploration

Andrei Lixandru

TL;DR

Proximal Policy Optimization with Adaptive Exploration (axPPO) tackles the exploration-exploitation tradeoff in reinforcement learning by making the entropy bonus in PPO dynamic and driven by recent performance. It defines $G_{recent}$ as a normalized moving average of past returns and incorporates it into the PPO objective as a scaling of the entropy term. In experiments on CartPole-v1, axPPO achieves competitive or superior returns across a range of entropy coefficients, demonstrating robustness to the initial exploration level. These results suggest that performance-driven adaptive exploration can improve learning efficiency, and they motivate broader testing in richer domains.

Abstract

This paper introduces Proximal Policy Optimization with Adaptive Exploration (axPPO), a novel learning algorithm. It investigates the exploration-exploitation tradeoff in reinforcement learning and aims to contribute new insights into reinforcement learning algorithm design. The proposed adaptive exploration framework dynamically adjusts the exploration magnitude during training based on the agent's recent performance. The method outperforms standard PPO in learning efficiency, particularly when significant exploratory behavior is needed at the beginning of the learning process.
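As a rough illustration of the idea summarized above, the following Python sketch tracks a normalized moving average of episode returns and uses it to scale an entropy bonus. It is only a sketch under stated assumptions, not the paper's implementation: the names `RecentReturn` and `entropy_bonus`, the window length, the normalization by a maximum return of 500 (the episode-return cap of CartPole-v1), and in particular the mapping `1 - g_recent` (less exploration as recent returns rise) are all hypothetical choices that the text above does not specify.

```python
from collections import deque


class RecentReturn:
    """Hypothetical tracker for a normalized moving average of returns."""

    def __init__(self, window: int = 10, max_return: float = 500.0):
        # Assumption: window size and max_return are illustrative only;
        # 500 is the episode-return cap of CartPole-v1.
        self.returns = deque(maxlen=window)
        self.max_return = max_return

    def update(self, episode_return: float) -> None:
        # Oldest return is dropped automatically once the window is full.
        self.returns.append(episode_return)

    def value(self) -> float:
        # With no history yet, report 0.0 (i.e. "no recent performance").
        if not self.returns:
            return 0.0
        mean = sum(self.returns) / len(self.returns)
        # Clamp to [0, 1] so the value can be used as a scale factor.
        return min(max(mean / self.max_return, 0.0), 1.0)


def entropy_bonus(entropy: float, coef: float, g_recent: float) -> float:
    # Assumed mapping (not confirmed by the text): shrink the entropy
    # bonus as recent performance improves.
    return (1.0 - g_recent) * coef * entropy


tracker = RecentReturn(window=2)
for episode_return in (100.0, 300.0, 500.0):
    tracker.update(episode_return)
bonus = entropy_bonus(entropy=0.5, coef=0.01, g_recent=tracker.value())
```

In a PPO training loop, a term like `bonus` would take the place of the fixed entropy term in the objective, so the exploration pressure changes as the moving average of returns changes.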

Paper Structure (7 sections, 3 equations, 1 table)

This paper contains 7 sections, 3 equations, 1 table.

Introduction
Methods
Algorithm
Experiments
Results
Discussion
Conclusion
