On Bellman equations for continuous-time policy evaluation I: discretization and approximation
Wenlong Mou, Yuhua Zhu
TL;DR
The paper introduces high-order, model-free discretization schemes for estimating the value function $f^*$ of a continuous-time diffusion from discretely observed trajectories. The schemes combine high-order Bellman operators $\mathcal{T}^{(n)}$ and high-order generators $\mathcal{A}^{(n)}$ with projection onto a function-approximation class. By exploiting the elliptic structure of the underlying diffusion, the authors derive uniformly bounded approximation factors and high-order error bounds in both $\mathbb{L}^\infty$ and $\mathbb{H}^1$ norms, under suitable smoothness and ellipticity assumptions. They also give data-driven implementations based on empirical estimates over trajectories and extend the guarantees to discounted occupancy measures. Numerical simulations illustrate the gains of second-order and higher schemes over naive discretization. Overall, the work provides a framework for continuous-time policy evaluation from discrete-time data that is compatible with model-free RL with function approximation and comes with provable error control.
Abstract
We study the problem of computing the value function from a discretely observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems, where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor using the underlying elliptic structure, even if the effective horizon diverges to infinity.
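
To make the idea of discretized policy evaluation concrete, the following is a minimal sketch on a toy problem, not the paper's algorithm. It assumes an Ornstein-Uhlenbeck diffusion $dX_t = -\theta X_t\,dt + \sigma\,dW_t$, reward $r(x) = x^2$, and value function $f^*(x) = \mathbb{E}\big[\int_0^\infty e^{-\beta t} r(X_t)\,dt \mid X_0 = x\big]$; all parameter values, the basis $\{1, x^2\}$, and the helper names `features`, `reward`, and `lstd` are hypothetical choices made for illustration. The second-order variant uses a trapezoidal rule for the one-step reward integral, which is one simple higher-order quadrature and should not be read as the definition of $\mathcal{T}^{(n)}$, which this section does not reproduce.

```python
import numpy as np

# Illustrative setup (all values are assumptions, not taken from the paper):
# OU diffusion dX = -theta * X dt + sigma dW, reward r(x) = x^2,
# discount rate beta, observation step eta.
theta, sigma, beta, eta = 1.0, 1.0, 1.0, 0.1
n_paths, n_steps = 2000, 500
gamma = np.exp(-beta * eta)  # one-step discount factor
rng = np.random.default_rng(0)

# Simulate discretely observed trajectories with the exact OU transition, so
# the only error studied below comes from discretizing the Bellman equation.
decay = np.exp(-theta * eta)
step_std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * theta))
x = np.empty((n_steps + 1, n_paths))
x[0] = rng.normal(0.0, sigma / np.sqrt(2.0 * theta), size=n_paths)  # stationary start
for k in range(n_steps):
    x[k + 1] = decay * x[k] + step_std * rng.normal(size=n_paths)

# Consecutive observation pairs (X_k, X_{k+1}) pooled over all paths.
x_now, x_next = x[:-1].ravel(), x[1:].ravel()


def features(z):
    """Basis {1, z^2}; it contains the true value function of this toy problem."""
    return np.stack([np.ones_like(z), z**2], axis=1)


def reward(z):
    return z**2


def lstd(step_reward):
    """Solve the empirical projected Bellman equation
    mean[phi_k (phi_k - gamma * phi_{k+1})^T] w = mean[phi_k * rho_k]."""
    phi, phi_next = features(x_now), features(x_next)
    a_mat = phi.T @ (phi - gamma * phi_next) / len(x_now)
    b_vec = phi.T @ step_reward / len(x_now)
    return np.linalg.solve(a_mat, b_vec)


# First-order (naive) scheme: left-endpoint rule for the reward over one step.
w_naive = lstd(eta * reward(x_now))
# Second-order variant (assumed here): trapezoidal rule for the same integral.
w_trap = lstd(0.5 * eta * (reward(x_now) + gamma * reward(x_next)))

# Closed form for comparison: f*(x) = a_star * x^2 + const solves
# beta * f + theta * x * f' - (sigma^2 / 2) * f'' = x^2.
a_star = 1.0 / (beta + 2.0 * theta)
err_naive = abs(w_naive[1] - a_star)
err_trap = abs(w_trap[1] - a_star)
print(f"x^2 coefficient error: naive={err_naive:.4f}, trapezoidal={err_trap:.4f}")
```

In this example the basis is closed under the OU transition, so each estimate converges to the exact fixed point of its discretized Bellman equation. The population-level bias in the $x^2$ coefficient is roughly 0.052 for the naive scheme and 0.0025 for the trapezoidal one at $\eta = 0.1$; the printed errors also include sampling noise, so they will differ slightly from these values.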
