Deterministic Policy Gradient for Reinforcement Learning with Continuous Time and State

Ziheng Cheng; Xin Guo; Yufei Zhang

Abstract

The theory of continuous-time reinforcement learning (RL) has progressed rapidly in recent years. While the ultimate objective of RL is typically to learn deterministic control policies, most existing continuous-time RL methods rely on stochastic policies. Such approaches often require sampling actions at very high frequencies and involve computationally expensive expectations over continuous action spaces, resulting in high-variance gradient estimates and slow convergence. In this paper, we introduce and develop deterministic policy gradient (DPG) methods for continuous-time RL. We derive a continuous-time policy gradient formula expressed as the expected gradient of an advantage rate function and establish a martingale characterization for both the value function and the advantage rate. These theoretical results provide tractable estimators for deterministic policy gradients in continuous-time RL. Building on this foundation, we propose a model-free continuous-time Deep Deterministic Policy Gradient (CT-DDPG) algorithm that enables stable learning for general RL problems with continuous time and state. Numerical experiments show that CT-DDPG achieves superior stability and faster convergence compared to existing stochastic-policy methods across a wide range of learning tasks with varying time discretizations and noise levels.
