Recovery of the optimal control value function in reproducing kernel Hilbert spaces from verification conditions
Tobias Ehring, Behzad Azmi, Bernard Haasdonk
TL;DR
The paper develops a verification-based framework in reproducing kernel Hilbert spaces (RKHSs) to recover the infinite-horizon optimal value function $v^*$ of nonlinear autonomous optimal control problems (OCPs) by enforcing Hamilton-Jacobi-Bellman verification conditions. It recasts the problem as nonlinear optimal recovery in an RKHS and shows that it reduces to finite-dimensional problems, which yields practical algorithms; the Gauss-Newton interpretation of the resulting scheme is algorithmically equivalent to policy iteration (RKHS–PI). Convergence is established in two regimes: global convergence when $v^*$ is real-analytic (via a Gaussian RKHS), and local convergence near the origin when $v^*$ satisfies two-sided quadratic bounds. Numerical experiments on toy, mechanical, and PDE-inspired models (including a 50D linear heat equation) demonstrate rapid convergence and the effectiveness of structure-aware kernels in high-dimensional settings.
Abstract
Approximating the optimal value function $v^*$ for infinite-horizon, nonlinear, autonomous optimal control problems is both challenging and essential for synthesizing real-time optimal feedback. We develop an abstract optimal recovery framework in reproducing kernel Hilbert spaces (RKHS) for reconstructing unknown target functions from mixed equality and inequality functional constraints. Within this framework, the approximation of $v^*$ is cast as a collocation-type problem derived from verification conditions for optimality, most prominently the Hamilton-Jacobi-Bellman (HJB) equation, that uniquely characterize $v^*$. As the set of collocation points becomes dense in the ambient domain $\Omega$, we establish convergence of the RKHS approximants to $v^*$: globally on $\Omega$ in the RKHS norm when $v^*$ is analytic, and locally (in a neighborhood of the origin) in the RKHS norm when $v^*$ is bounded from above and below by quadratic functions. Furthermore, we show that a practical numerical realization of the abstract scheme reduces to the classical policy iteration algorithm. Numerical experiments support the effectiveness of the proposed approach.
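To make the link between kernel collocation and policy iteration concrete, the following minimal sketch runs classical policy iteration on a scalar linear-quadratic problem and performs each policy evaluation step by collocation with a kernel ansatz. It is an illustration only and not the authors' implementation: the test problem (`a`, `b`, `q`, `r`), the quadratic kernel `k(x, y) = (x*y)^2`, the single kernel center, the initial coefficient `c0`, and all function names are assumptions introduced here. For this toy problem the exact value function is $v^*(x) = p x^2$, with $p$ the positive root of the scalar Riccati equation, so the result can be checked directly.

```python
import numpy as np

# Hypothetical scalar test problem (an assumption, not taken from the paper):
#   dynamics x' = a*x + b*u, running cost q*x^2 + r*u^2.
a, b, q, r = 1.0, 1.0, 1.0, 1.0


def dkernel_dx(x, y):
    """Derivative in x of the assumed quadratic kernel k(x, y) = (x*y)^2."""
    return 2.0 * x * y**2


def policy_iteration(points, center=1.0, c0=2.0, iters=8):
    """Policy iteration with the ansatz v(x) = c * k(x, center) = c * x^2 * center^2.

    c0 must give a stabilizing initial policy (here: c0 > a*r/b^2 for center=1).
    """
    c = c0
    history = [c]
    for _ in range(iters):
        # Policy improvement: u(x) = -(b / (2 r)) * v'(x) for the current v.
        u = -b / (2.0 * r) * c * dkernel_dx(points, center)
        f_closed = a * points + b * u  # closed-loop dynamics at the points
        # Policy evaluation: enforce v'(x) * f_closed(x) + q x^2 + r u(x)^2 = 0
        # at all collocation points; this is linear in the new coefficient.
        A = (dkernel_dx(points, center) * f_closed)[:, None]
        rhs = -(q * points**2 + r * u**2)
        c = np.linalg.lstsq(A, rhs, rcond=None)[0][0]
        history.append(c)
    return c, history


points = np.linspace(-1.0, 1.0, 11)  # collocation points in Omega = [-1, 1]
c, history = policy_iteration(points)

# Exact coefficient: positive root of 2*a*p - b^2*p^2/r + q = 0.
p_exact = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
print(f"computed c = {c:.10f}, exact p = {p_exact:.10f}")
```

Because the assumed kernel spans only multiples of $x^2$, each evaluation step collapses to a one-parameter least-squares problem. A richer kernel such as the Gaussian kernel mentioned in the TL;DR would instead require a larger linear system per iteration, and its conditioning would need separate care.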
