Enhancing Policy Gradient with the Polyak Step-Size Adaption
Yunxiang Li, Rui Yuan, Chen Fan, Mark Schmidt, Samuel Horváth, Robert M. Gower, Martin Takáč
TL;DR
This work tackles the sensitivity of policy-gradient methods to learning-rate choices by introducing a Polyak step-size variant tailored for RL. The approach removes the need for problem-specific constants via a stochastic SPS_max-like update and addresses unknown optimal values $V^*$ with a twin-model method, complemented by an entropy-regularized loss to prevent explosive updates. The proposed algorithm combines twin-model-based $V^*$ estimation with GPOMDP gradient estimation under an adaptive step-size, and it is empirically shown to yield faster convergence and more stable policies than Adam on standard control tasks. Overall, the method provides a practical, hyper-parameter-free adaptive learning-rate mechanism that improves sample efficiency and stability in policy-gradient RL.
Abstract
Policy gradient is a widely used, foundational algorithm in reinforcement learning (RL). Although it is known for its convergence guarantees and its stability relative to other RL algorithms, its practical application is often hindered by sensitivity to hyper-parameters, particularly the step-size. In this paper, we integrate the Polyak step-size into RL, which adjusts the step-size automatically without prior knowledge. To adapt the method to RL settings, we address several issues, including the unknown optimal value $f^*$ that the Polyak step-size requires. Experiments show that the Polyak step-size yields faster convergence and more stable policies in RL.
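
To make the step-size rule concrete, the following is a minimal sketch of a capped Polyak step-size in the style of SPS_max, applied to plain gradient descent on a toy quadratic rather than to a policy-gradient objective. It is not the authors' algorithm: the function names, the constants `c` and `gamma_max`, the small `eps` guard, and the test problem are all illustrative assumptions. Here $f^*$ is known exactly (it is zero), whereas in RL the corresponding optimal value is unknown and must be estimated.

```python
import numpy as np


def polyak_step_size(loss, loss_star, grad, c=1.0, gamma_max=1.0, eps=1e-12):
    """Capped Polyak step-size: min((f(x) - f*) / (c * ||g||^2), gamma_max).

    `loss_star` stands in for f*; `c`, `gamma_max`, and `eps` are illustrative
    choices, not values taken from the paper.
    """
    grad_sq_norm = float(np.dot(grad, grad))
    # Clip the gap at zero so an over-optimistic estimate of f* cannot
    # produce a negative step.
    gap = max(float(loss) - float(loss_star), 0.0)
    return min(gap / (c * grad_sq_norm + eps), gamma_max)


def run_polyak_gd(x0, steps=1000):
    """Gradient descent on f(x) = 0.5 * x^T A x (so f* = 0) with Polyak steps."""
    A = np.diag([1.0, 10.0])  # hypothetical, mildly ill-conditioned toy problem
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        loss = 0.5 * x @ A @ x
        grad = A @ x
        gamma = polyak_step_size(loss, 0.0, grad)
        x = x - gamma * grad
    return x
```

No learning rate is tuned by hand in this sketch: the step grows when the current loss is far from $f^*$ relative to the gradient norm and shrinks as the gap closes. The cap `gamma_max` bounds the step when the gradient is very small, which matters most when $f^*$ is only an estimate.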
