Convergence Analysis for Entropy-Regularized Control Problems: A Probabilistic Approach
Jin Ma, Gaozhan Wang, Jianfeng Zhang
TL;DR
The paper analyzes the Policy Iteration Algorithm (PIA) for entropy-regularized stochastic control in finite- and infinite-horizon settings. It uses probabilistic representations (the Feynman-Kac and Bismut-Elworthy-Li formulas) to establish convergence without relying on heavy PDE estimates. It proves a super-exponential convergence rate in the finite-horizon case, and in the infinite-horizon case when the discount factor is large. For a general discount factor in the infinite-horizon case, it proves convergence on compact sets. The study extends to the one-dimensional diffusion-control setting, where it obtains $C^2$ convergence of the value function and Lipschitz convergence of the optimal policy, and it provides rate results in a further special case under additional assumptions. Overall, the work offers a simple, probabilistic route to the convergence analysis of entropy-regularized PIA. The resulting sharp rates may inform implementable reinforcement learning algorithms under model uncertainty.
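The TL;DR credits the Feynman-Kac and Bismut-Elworthy-Li formulas. The sketch below shows what these representations provide in the simplest possible case, assuming uncontrolled one-dimensional Brownian dynamics $X_t = x + W_t$ with no discounting. It estimates both $u(0,x) = \mathbb{E}[g(x+W_T)]$ and its spatial derivative by Monte Carlo. The derivative uses the Bismut-Elworthy-Li weight $W_T/T$, so $g$ itself is never differentiated. The function name `feynman_kac_and_bel` and its parameters are hypothetical illustrations. The paper applies such formulas to the iterative PDEs of the PIA, not to this toy heat equation.

```python
import numpy as np

def feynman_kac_and_bel(g, x, T, n_samples=200_000, seed=0):
    """Monte Carlo estimates of u(0, x) and u_x(0, x) for the backward heat equation
    u_t + 0.5 * u_xx = 0 with terminal condition u(T, .) = g.

    Assumed dynamics: X_t = x + W_t (unit volatility, no drift, no control),
    so the first-variation process is identically 1 and the Bismut-Elworthy-Li
    weight reduces to W_T / T.
    """
    rng = np.random.default_rng(seed)
    w_T = rng.normal(0.0, np.sqrt(T), size=n_samples)  # samples of W_T ~ N(0, T)
    payoff = g(x + w_T)                                 # g evaluated at X_T
    value = payoff.mean()                # Feynman-Kac: u(0, x) = E[g(X_T)]
    grad = (payoff * w_T / T).mean()     # Bismut-Elworthy-Li: u_x(0, x) = E[g(X_T) W_T / T]
    return value, grad

# Hypothetical test case: g(y) = y**2, x = 1, T = 1.
value, grad = feynman_kac_and_bel(lambda y: y**2, x=1.0, T=1.0)
print(value, grad)
```

For $g(y) = y^2$ the exact values are $u(0,x) = x^2 + T$ and $u_x(0,x) = 2x$, so both printed estimates should be close to 2. The gradient estimator uses only evaluations of $g$, which is why such formulas also apply to terminal data that are merely bounded and measurable.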
Abstract
In this paper we investigate the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems. In particular, instead of employing sophisticated PDE estimates for the iterative PDEs involved in the algorithm (see, e.g., Huang-Wang-Zhou (2025)), we provide a simple, self-contained proof of the convergence of the PIA. Our approach builds on probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the finite horizon model and in the infinite horizon model with a large discount factor, similar arguments lead to a super-exponential rate of convergence without tears. Finally, with some extra effort, we show that our approach extends to the diffusion control case in the one-dimensional setting, again with a super-exponential rate of convergence.
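The abstract does not spell out the iteration itself. The sketch below is a hedged illustration only: a discrete-time, finite-state analog of entropy-regularized policy iteration, not the continuous-time algorithm analyzed in the paper. Each round evaluates the current policy exactly, including the entropy bonus, and then replaces it with the Gibbs (softmax) policy $\pi(a\mid s) \propto \exp(Q(s,a)/\lambda)$. The randomly generated MDP, the names `evaluate`, `improve`, and `soft_policy_iteration`, and the parameters `gamma` (discount) and `lam` (entropy temperature $\lambda$) are all hypothetical.

```python
import numpy as np

def evaluate(policy, P, r, gamma, lam):
    """Exact policy evaluation: solve V = r_pi + gamma * P_pi V, where r_pi
    includes the entropy bonus -lam * sum_a pi(a|s) log pi(a|s)."""
    n_states = r.shape[0]
    r_pi = np.sum(policy * (r - lam * np.log(policy)), axis=1)
    P_pi = np.einsum("sa,sat->st", policy, P)  # state-to-state transitions under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def improve(V, P, r, gamma, lam):
    """Gibbs improvement step: pi(a|s) proportional to exp(Q(s,a) / lam)."""
    Q = r + gamma * np.einsum("sat,t->sa", P, V)
    logits = Q / lam
    logits -= logits.max(axis=1, keepdims=True)  # shift for a stable exponential
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

def soft_policy_iteration(P, r, gamma=0.9, lam=0.5, n_iter=8):
    """Alternate evaluation and Gibbs improvement, starting from the uniform policy."""
    n_states, n_actions = r.shape
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    values = []
    for _ in range(n_iter):
        V = evaluate(policy, P, r, gamma, lam)
        values.append(V)
        policy = improve(V, P, r, gamma, lam)
    return policy, values

# Hypothetical random MDP with 5 states and 3 actions.
rng = np.random.default_rng(1)
n_s, n_a = 5, 3
P = rng.random((n_s, n_a, n_s))
P /= P.sum(axis=2, keepdims=True)  # normalize so each P[s, a, :] is a distribution
r = rng.random((n_s, n_a))
policy, values = soft_policy_iteration(P, r)
for k, V in enumerate(values):
    print(k, np.max(np.abs(V - values[-1])))  # sup-norm gap to the last iterate
```

The printed gaps typically shrink very quickly. In this discrete setting that behavior is consistent with a separate, well-known fact: each round of soft policy iteration is a Newton step on the soft Bellman equation $V = \lambda \log \sum_a \exp\big((r + \gamma P V)/\lambda\big)$. This is offered as intuition only and is not the probabilistic argument used in the paper.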
