Convergence Analysis for Entropy-Regularized Control Problems: A Probabilistic Approach

Jin Ma, Gaozhan Wang, Jianfeng Zhang

TL;DR

The paper analyzes the Policy Iteration Algorithm (PIA) for entropy-regularized stochastic control in finite- and infinite-horizon settings, using probabilistic representations (the Feynman-Kac and Bismut-Elworthy-Li formulae) to establish convergence without relying on heavy PDE estimates. It proves a super-exponential convergence rate in the finite-horizon case and shows the analogous rate in the infinite-horizon case when the discount factor is large, with convergence on compacts in the general case. The study extends to the one-dimensional diffusion-control setting, obtaining $C^2$ convergence of the value function and Lipschitz convergence of the optimal policy, and provides rate results in a further special case under additional assumptions. Overall, the work gives a simple probabilistic route to the convergence analysis of entropy-regularized PIA, yields sharp rates, and may inform implementable algorithms for reinforcement learning under model uncertainty.

Abstract

In this paper we investigate the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems. In particular, instead of employing sophisticated PDE estimates for the iterative PDEs involved in the algorithm (see, e.g., Huang-Wang-Zhou (2025)), we provide a simple proof from scratch for the convergence of the PIA. Our approach builds on probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the finite horizon model and in the infinite horizon model with large discount factor, similar arguments lead to a super-exponential rate of convergence without additional difficulty. Finally, with some extra effort we show that our approach can be extended to the diffusion control case in the one-dimensional setting, also with a super-exponential rate of convergence.
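
For readers who want a concrete picture of the iteration, the following is a minimal sketch of entropy-regularized policy iteration in a discrete-time, finite-state analog. It is not the continuous-time setting of the paper, and it does not reproduce the paper's proofs or rates. The names `soft_policy_iteration` and `random_mdp`, the discount `gamma`, and the temperature `lam` are all assumptions introduced for illustration. Each step evaluates the current policy by solving a linear system, then updates it to the Gibbs (softmax) policy induced by the resulting state-action values.

```python
import numpy as np


def soft_policy_iteration(P, r, gamma, lam, n_iter):
    """Entropy-regularized policy iteration on a finite MDP (illustrative analog).

    P: array (S, A, S) of transition probabilities, r: array (S, A) of rewards,
    gamma: discount in (0, 1), lam: entropy temperature > 0.
    Returns the final policy and the list of evaluated value functions.
    """
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    values = []
    for _ in range(n_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi + lam * entropy(pi).
        P_pi = np.einsum("sa,sat->st", pi, P)
        entropy = -np.sum(pi * np.log(pi), axis=1)
        reward = np.sum(pi * r, axis=1) + lam * entropy
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, reward)
        values.append(v)
        # Policy improvement: Gibbs policy pi(a|s) proportional to exp(q(s, a) / lam).
        q = r + gamma * (P @ v)
        z = (q - q.max(axis=1, keepdims=True)) / lam  # shift for numerical stability
        pi = np.exp(z)
        pi /= pi.sum(axis=1, keepdims=True)
    return pi, values


def random_mdp(S, A, seed=0):
    """Hypothetical test problem: random transitions and rewards in [0, 1]."""
    rng = np.random.default_rng(seed)
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    return P, r


if __name__ == "__main__":
    P, r = random_mdp(5, 3)
    pi, values = soft_policy_iteration(P, r, gamma=0.5, lam=0.5, n_iter=10)
    # Print the sup-norm gap between successive value functions.
    for k in range(1, len(values)):
        print(k, np.max(np.abs(values[k] - values[k - 1])))
```

Running the script prints the sup-norm gap between successive iterates, which gives a rough numerical feel for how quickly the values settle in this toy analog. In this discrete setting the evaluated values are non-decreasing from one iteration to the next, a standard property of soft policy improvement.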

Paper Structure

This paper contains 7 sections, 11 theorems, 158 equations.

Key Result

Lemma 2.2

Let Assumption assum-finite hold. (i) $H$ is twice continuously differentiable in $(x, z)$, with the following estimates: (ii) The PDE (HJBu) has a unique classical solution $u$ with $\|u\|_{1,2}\le Ce^{CT}$.

Theorems & Definitions (16)

  • Lemma 2.2
  • Proposition 2.3
  • Theorem 2.4
  • Lemma 2.5
  • Remark 2.6: Dependence of the estimates on $\lambda$
  • Proposition 3.1
  • Theorem 3.2
  • Remark 3.3
  • Remark 3.4
  • Example 3.5
  • ...and 6 more