Recovery of the optimal control value function in reproducing kernel Hilbert spaces from verification conditions

Tobias Ehring, Behzad Azmi, Bernard Haasdonk

TL;DR

The paper develops a verification-based framework in reproducing kernel Hilbert spaces (RKHSs) to recover the infinite-horizon optimal value function $v^*$ for nonlinear autonomous optimal control problems (OCPs) by enforcing Hamilton-Jacobi-Bellman (HJB) verification conditions. It recasts the problem as nonlinear optimal recovery in RKHSs and shows finite-dimensional reductions that yield practical algorithms, with a Gauss-Newton interpretation that is algorithmically equivalent to policy iteration (RKHS–PI). Convergence is established in two regimes: global convergence when $v^*$ is real-analytic (using a Gaussian RKHS) and local convergence when $v^*$ satisfies two-sided quadratic bounds near the origin. Numerical experiments across toy, mechanical, and PDE-inspired models (including a 50D linear heat equation) demonstrate rapid convergence and the effectiveness of structure-aware kernels in high-dimensional settings.

Abstract

Approximating the optimal value function $v^*$ for infinite-horizon, nonlinear, autonomous optimal control problems is both challenging and essential for synthesizing real-time optimal feedback. We develop an abstract optimal recovery framework in reproducing kernel Hilbert spaces (RKHS) for reconstructing unknown target functions from mixed equality and inequality functional constraints. Within this framework, the approximation of $v^*$ is cast as a collocation-type problem derived from verification conditions for optimality -- most prominently, the Hamilton-Jacobi-Bellman (HJB) equation -- that uniquely characterizes $v^*$. As the set of collocation points becomes dense in the ambient domain $\Omega$, we establish convergence of the RKHS approximants to $v^*$: globally on $\Omega$ in the RKHS norm when $v^*$ is analytic, and locally (in a neighborhood of the origin) in the RKHS norm when $v^*$ is bounded from above and below by quadratic functions. Furthermore, we show that a practical numerical realization of the abstract scheme reduces to the classical policy iteration algorithm. Numerical experiments support the effectiveness of the proposed approach.
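
To make the link between HJB collocation and policy iteration concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. Everything in it is an assumption chosen for illustration: the scalar system $x' = ax + bu$ with running cost $qx^2 + ru^2$, the quadratic kernel $k(x,y) = (xy)^2$ (picked so that the RKHS contains the exact value function $v^*(x) = p\,x^2$), the collocation points, and the names `policy_evaluation` and `dxdy_kernel`. Each policy-evaluation step computes the minimum-norm RKHS function satisfying the linearized HJB equation $v'(x_i)\,(a x_i + b u(x_i)) = -(q x_i^2 + r u(x_i)^2)$ at the collocation points, and the policy update is $u(x) = -b\,v'(x)/(2r)$.

```python
import numpy as np

# Hypothetical scalar test problem: x' = a*x + b*u, cost = integral of q*x^2 + r*u^2.
a, b, q, r = 1.0, 1.0, 1.0, 1.0


def dxdy_kernel(x, y):
    """Mixed derivative d/dx d/dy of the assumed kernel k(x, y) = (x*y)**2."""
    return 4.0 * x * y


def policy_evaluation(policy, centers):
    """Minimum-norm RKHS solution of the linearized HJB equation at `centers`.

    The constraints are lambda_i(v) = f_i * v'(x_i) = -(q*x_i^2 + r*u_i^2),
    where f_i is the closed-loop vector field under the current policy.
    Returns a function evaluating v' on a 1-D array.
    """
    u = policy(centers)
    f = a * centers + b * u
    rhs = -(q * centers**2 + r * u**2)
    # Gram matrix of the functionals lambda_i applied to both kernel arguments.
    gram = f[:, None] * f[None, :] * dxdy_kernel(centers[:, None], centers[None, :])
    # The Gram matrix is rank deficient for this kernel, so use a least-squares
    # solve with an explicit singular-value cutoff.
    coef, *_ = np.linalg.lstsq(gram, rhs, rcond=1e-10)

    def grad_v(x):
        x = np.asarray(x, dtype=float)
        return dxdy_kernel(x[:, None], centers[None, :]) @ (coef * f)

    return grad_v


centers = np.array([-1.0, -0.5, 0.5, 1.0])   # collocation points (origin excluded)
policy = lambda x: -2.0 * x                   # stabilizing initial feedback: a - 2*b < 0

for _ in range(6):
    grad_v = policy_evaluation(policy, centers)
    # Policy improvement: minimizer of the Hamiltonian for control-affine dynamics.
    policy = lambda x, g=grad_v: -b * g(x) / (2.0 * r)

p_est = grad_v(np.array([1.0]))[0] / 2.0      # v(x) = p*x^2 implies v'(1) = 2*p
p_exact = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2   # scalar Riccati solution
print(p_est, p_exact)
```

With this kernel the iteration coincides with the classical Newton-Kleinman recursion for the scalar Riccati equation, which is why a handful of steps suffice here. A Gaussian kernel, as used in the paper's global convergence result, would require the additional constraint $v(0) = 0$ and more care with the conditioning of the Gram matrix.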

Paper Structure

This paper contains 20 sections, 11 theorems, 182 equations, 4 figures, 1 algorithm.

Key Result

Theorem 2.1

Assume that Assumption as:data holds and let $\Omega\subset\mathbb{R}^{N}$ be a domain containing the origin. Suppose the optimal value function (OVF) satisfies $v^*\in C^1(\Omega,\mathbb{R})$. If there exists a candidate $v\in C^{1}(\Omega,\mathbb{R})$ such that then there exists a set $\tilde{\Omega}\subset\Omega$ containing a neighborhood of the origin such that In particular, $u_v$ is an optimal feedback on $\tilde{\Omega}$.

Figures (4)

  • Figure 1: Academic toy example: The training error of the initial RKHS–PI iteration plotted against the number of selected centers (left). The True-Error for the RKHS–PI over the number of iterations (right).
  • Figure 2: Van der Pol oscillator: The training error of the initial RKHS–PI iteration plotted against the number of selected centers (left). The True-Error for the RKHS–PI over the number of iterations (right).
  • Figure 3: Linear heat equation: The training error of the initial RKHS–PI iteration plotted against the number of selected centers (left). The True-Error for the RKHS–PI over the number of iterations (right).
  • Figure 4: Nonlinear heat equation: The training error of the initial RKHS–PI iteration plotted against the number of selected centers (left). The True-Error for the RKHS–PI over the number of iterations (right).

Theorems & Definitions (27)

  • Theorem 2.1: Local verification of optimality
  • proof
  • Corollary 2.2
  • proof
  • Corollary 2.3
  • proof
  • Definition 3.1: Linear optimal recovery in an RKHS
  • Lemma 3.2
  • proof
  • Lemma 3.3
  • ...and 17 more