Reinforcement Learning for Discounted and Ergodic Control of Diffusion Processes

Erhan Bayraktar; Ali D. Kara; Somnath Pradhan; Serdar Yuksel

Abstract

This paper develops a quantized Q-learning algorithm for the optimal control of diffusion processes on $\mathbb{R}^d$ under both discounted and ergodic (average) cost criteria. We first establish near-optimality of finite-state MDP approximations to discrete-time discretizations of the diffusion, then introduce a quantized Q-learning scheme and prove its almost-sure convergence to near-optimal policies for the finite MDP. These policies, when interpolated to continuous time, are shown to be near-optimal for the original diffusion model under discounted costs and, via a vanishing-discount argument, also under ergodic costs for sufficiently small discount factors. The analysis applies under mild conditions (Lipschitz dynamics, non-degeneracy, bounded continuous costs, and Lyapunov stability for the ergodic case) and requires neither prior knowledge of the system dynamics nor restrictions on control policies beyond admissibility. Our results complement recent work on continuous-time reinforcement learning for diffusions by providing explicit near-optimality rates and by extending rigorous guarantees to both discounted and ergodic cost criteria for diffusions with unbounded state space.
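The pipeline described in the abstract (discretize time, quantize the state, run tabular Q-learning, then hold the learned action constant over each sampling interval) can be illustrated with a minimal sketch. This is not the paper's algorithm as stated: the one-dimensional double-well drift `x - x**3 + u`, the capped quadratic cost, the uniform quantizer, the uniform exploration policy, and all parameter values and function names below are assumptions chosen for illustration only.

```python
import math
import random


def quantize(x, lo=-2.0, hi=2.0, n_bins=21):
    """Map a real state to a uniform bin index; states outside [lo, hi] fall in the edge bins."""
    if x <= lo:
        return 0
    if x >= hi:
        return n_bins - 1
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)


def euler_step(x, u, h, sigma, rng):
    """One Euler-Maruyama step of the assumed SDE dX = (X - X^3 + u) dt + sigma dW."""
    drift = x - x ** 3 + u
    return x + drift * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)


def quantized_q_learning(n_steps=200_000, h=0.05, alpha=1.0, sigma=0.5,
                         actions=(-1.0, 0.0, 1.0), n_bins=21, seed=0):
    """Tabular Q-learning (cost minimization) on the quantized, time-discretized diffusion."""
    rng = random.Random(seed)
    beta = math.exp(-alpha * h)  # discrete-time discount factor for discount rate alpha
    q = [[0.0] * len(actions) for _ in range(n_bins)]
    visits = [[0] * len(actions) for _ in range(n_bins)]
    x = 0.0
    for _ in range(n_steps):
        s = quantize(x, n_bins=n_bins)
        a = rng.randrange(len(actions))  # uniform exploration (an assumption)
        u = actions[a]
        # Bounded running cost, scaled by the sampling period h.
        cost = h * min(x * x + 0.1 * u * u, 10.0)
        x_next = euler_step(x, u, h, sigma, rng)
        s_next = quantize(x_next, n_bins=n_bins)
        visits[s][a] += 1
        lr = 1.0 / visits[s][a]  # step size decaying with the visit count
        q[s][a] += lr * (cost + beta * min(q[s_next]) - q[s][a])
        x = x_next
    # Greedy (cost-minimizing) action for each quantization bin.
    policy = [actions[min(range(len(actions)), key=lambda i: q[s][i])]
              for s in range(n_bins)]
    return q, policy
```

To use the result in continuous time, one would apply `policy[quantize(x)]` at each sampling instant and hold that action constant until the next one. Under the vanishing-discount idea mentioned in the abstract, rerunning the sketch with a small `alpha` is the natural way to target the ergodic cost, although this toy example carries none of the paper's guarantees.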

Paper Structure (29 sections, 14 theorems, 106 equations, 5 figures, 1 table, 2 algorithms)

Introduction
Problem Setup, Policies, and Cost Criteria
Main Results and the Learning Algorithms
Planning I: Discretizing Time and Near Optimality of Discrete-Time Approximate Solutions
Discounted Cost
Ergodic Cost
Planning II: Discretization of Space and Finite State MDP Construction
Regularity Properties for Near Optimality of Finite Model Approximations
Finite MDP Construction
Discounted Cost
Near Optimal Approximation of the Diffusion Process by a Finite MDP
Analysis of Term (\ref{term1})
Analysis of Term (\ref{term2})
Analysis of Term (\ref{term3})
Near Optimality of Finite State MDP Construction for Ergodic Cost
...and 14 more sections

Key Result

Theorem 1

(pradhan2025discrete) Suppose that Assumptions A1--A2 hold. Then we have

Figures (5)

Figure 1: Average cost evaluation under policies for double-well SDE
Figure 2: Average cost evaluation under different discount factors for double-well SDE
Figure 3: Expected cost versus control discretization $h$ with 95% confidence interval for the sample mean.
Figure 4: Average state trajectories $x_t$ under the learned policies for different control discretizations $h$. The control pulls the state toward $K/2=0.5$, against the natural growth toward $K=1$.
Figure 5: Average cost evaluation under policies learned under different discount factors

Theorems & Definitions (17)

Remark 1
Theorem 1
Theorem 2
Theorem 3
Theorem 4
Proposition 1
Theorem 5
Theorem 6
Lemma 1
Lemma 2
...and 7 more
