Data-Driven Stochastic Optimal Control in Reproducing Kernel Hilbert Spaces

Nicolas Hoischen; Petar Bevanda; Stefan Sosnowski; Sandra Hirche; Boris Houska

TL;DR

This work tackles data-driven stochastic optimal control for nonlinear diffusion systems with unknown dynamics $(\mathbf{f},\mathbf{G},\ell)$ but known control penalty $r$ and constraints. It develops an RKHS-based pipeline that embeds state densities, learns finite-rank Markov operators $(\mathsf{A},\mathsf{B})$ from data, and solves a kernelized HJB (KHJB) recursion to recover a globally optimal feedback law $\widehat{\bm{\pi}}^*(\mathbf{x})$. The main contributions are a nonparametric operator-learning method with linear-in-sample complexity and an RKHS-embedded dynamic programming scheme that delivers practical, scalable control for high-dimensional stochastic systems. Both are demonstrated on 1D, 2D, and 4D benchmarks, including depth control for an autonomous underwater vehicle, where the learned policy outperforms an LQR baseline. The approach enables data-driven, model-free optimal control of uncertain nonlinear diffusion processes.

Abstract

This paper proposes a fully data-driven approach for optimal control of nonlinear control-affine systems represented by a stochastic diffusion. The focus is on the scenario where both the nonlinear dynamics and stage cost functions are unknown, while only a control penalty function and constraints are provided. To this end, we embed state probability densities into a reproducing kernel Hilbert space (RKHS) to leverage recent advances in operator regression, thereby identifying Markov transition operators associated with controlled diffusion processes. This operator learning approach integrates naturally with convex operator-theoretic Hamilton-Jacobi-Bellman recursions that scale linearly with state dimensionality, effectively solving a wide range of nonlinear optimal control problems. Numerical results demonstrate its ability to address diverse nonlinear control tasks, including the depth regulation of an autonomous underwater vehicle.
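To make the operator-regression step more concrete, here is a minimal sketch of kernel ridge regression for a conditional mean embedding, assuming an uncontrolled toy system and a Gaussian kernel. It is not the paper's finite-rank estimator: the function names, the toy dynamics, the bandwidth, and the regularization value are all illustrative assumptions, and the dense solve used here does not scale linearly in the number of samples.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def fit_transition_weights(X, reg=1e-5, bandwidth=1.0):
    """Return a function x -> w(x) such that E[g(X_next) | X = x] is
    approximated by sum_i w_i(x) * g(X_next[i]) (kernel ridge regression)."""
    n = X.shape[0]
    K = gaussian_gram(X, X, bandwidth)
    # Regularized inverse of the Gram matrix (dense solve, O(n^3)).
    K_reg_inv = np.linalg.solve(K + n * reg * np.eye(n), np.eye(n))

    def weights(x_query):
        k_query = gaussian_gram(X, np.atleast_2d(x_query), bandwidth)  # (n, m)
        return (K_reg_inv @ k_query).T  # (m, n)

    return weights


# Hypothetical toy data (assumption): X_next = 0.5 * X + small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
X_next = 0.5 * X + 0.01 * rng.standard_normal(X.shape)

weights = fit_transition_weights(X, reg=1e-5, bandwidth=0.5)
x_test = np.array([[1.0]])
# Estimate of E[X_next | X = 1]; the toy dynamics suggest a value near 0.5.
predicted_mean = (weights(x_test) @ X_next).item()
```

The same weights can be applied to any observable evaluated at the successor samples, which is what makes an estimate of this kind usable inside a dynamic programming recursion: a value function sampled at `X_next` is propagated backward by a single matrix-vector product.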

Paper Structure (19 sections, 2 theorems, 35 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Informal Problem Statement
Stochastic Optimal Control
Fokker-Planck-Kolmogorov PDEs
Convex Reformulation
Hamilton-Jacobi-Bellman Equation
Discretization of Time
Optimal Control Meets RKHS Embeddings
Reproducing Kernel Hilbert Spaces
RKHS Embeddings
Data-Driven Operator Learning
Data-Driven Optimal Control
Implementation and Numerical Examples
The KHJB Algorithm
Numerical Examples
...and 4 more sections

Key Result

Proposition 1

If Assumption (ass::blanket) holds, problem (eq::PDEOCP) admits a unique minimizer $(\rho^\star,\bm{\nu}^\star)$. Moreover, the optimal feedback law is given by $\bm{\pi}^\star = \frac{\bm{\nu}^\star}{\rho^\star}$, and $\rho^\star(t)$ is the probability density of the optimal state $\bm{X}_t^\star$.
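The relation $\bm{\pi}^\star = \bm{\nu}^\star/\rho^\star$ can be read as the inverse of a change of variables. The sketch below assumes a controlled diffusion of the form $\mathrm{d}\bm{X}_t = (\mathbf{f} + \mathbf{G}\bm{\pi})\,\mathrm{d}t + \sqrt{2\epsilon}\,\mathrm{d}\bm{W}_t$ and a convex penalty $r$. The noise scaling and the exact form of the cost are assumptions made for illustration and may differ from the paper's eq::PDEOCP.

```latex
\begin{align*}
\partial_t \rho &= -\nabla\cdot\big((\mathbf{f} + \mathbf{G}\bm{\pi})\,\rho\big) + \epsilon\,\Delta\rho
  && \text{(bilinear in } (\rho,\bm{\pi})\text{)} \\
\bm{\nu} := \bm{\pi}\rho \;\Longrightarrow\;
\partial_t \rho &= -\nabla\cdot\big(\mathbf{f}\rho + \mathbf{G}\bm{\nu}\big) + \epsilon\,\Delta\rho
  && \text{(linear in } (\rho,\bm{\nu})\text{)} \\
\text{cost density:}\quad & \ell\,\rho + r\!\left(\tfrac{\bm{\nu}}{\rho}\right)\rho
  && \text{(jointly convex in } (\rho,\bm{\nu}) \text{ for convex } r,\ \rho > 0\text{)}
\end{align*}
```

The last line uses the standard fact that the perspective $(\rho,\bm{\nu}) \mapsto \rho\, r(\bm{\nu}/\rho)$ of a convex function $r$ is jointly convex on $\rho > 0$. A linear constraint with a convex cost gives a convex problem in $(\rho,\bm{\nu})$, and the feedback law is recovered pointwise by division wherever $\rho^\star > 0$.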

Figures (5)

Figure 1: The optimal control laws $\pi_\infty^\star$ for $\ell(x)=x^2$ and $r(u)=u^2$ are unknown to the algorithm but are used by us as ground truth. Our proposed data-driven Kernel HJB approach recovers these optimal control laws reliably and exhibits little variance across different i.i.d. data draws.
Figure 2: Contours of the learned optimal controller for the Van der Pol system, compared to the known optimal feedback law $\pi^\star(\bm{x})=-x_1 x_2$.
Figure 3: Taming an unstable limit cycling system with linearly uncontrollable origin. Left: Open loop $(u{=}0)$ of the original system. Right: Closed loop under our learned control law for the stage cost $\ell(\bm{x}){=}x^2_2$.
Figure 4: Autonomous Underwater Vehicle (AUV) model.
Figure 5: Tracking performance for two depth references $z_{\mathrm{ref}} = (5\,\mathrm{m},\,2\,\mathrm{m})$, switching at $t = (0\,\mathrm{s},\, 25\,\mathrm{s})$, is evaluated over $M=50$ simulation runs of the closed-loop SDE under the learned control law $\widehat{\bm{\pi}}^\star$ with $\epsilon = 0.001$ and compared to LQR.
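An evaluation of the kind described for Figure 5, repeated closed-loop rollouts of an SDE under a fixed feedback law, can be sketched with an Euler-Maruyama scheme. The scalar dynamics $\mathrm{d}X = u\,\mathrm{d}t + \sqrt{2\epsilon}\,\mathrm{d}W$, the stand-in feedback $u = -x$, the step size, and the horizon below are assumptions made for illustration. They are not the AUV model or the learned policy $\widehat{\bm{\pi}}^\star$; only $\epsilon = 0.001$ and the $M = 50$ runs are taken from the caption.

```python
import numpy as np


def simulate_closed_loop(policy, x0, eps=0.001, dt=0.01, n_steps=1000,
                         n_runs=50, seed=0):
    """Euler-Maruyama rollouts of dX = u dt + sqrt(2 * eps) dW with u = policy(X).

    Returns an array of shape (n_steps + 1, n_runs), one column per run.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_runs, float(x0))
    trajectory = np.empty((n_steps + 1, n_runs))
    trajectory[0] = x
    for k in range(n_steps):
        noise = rng.standard_normal(n_runs)
        # Drift step plus Brownian increment with variance 2 * eps * dt.
        x = x + policy(x) * dt + np.sqrt(2.0 * eps * dt) * noise
        trajectory[k + 1] = x
    return trajectory


# Stand-in stabilizing feedback (assumption), not a learned policy.
trajectory = simulate_closed_loop(lambda x: -x, x0=1.0)
mean_path = trajectory.mean(axis=1)    # average over the 50 runs
spread_path = trajectory.std(axis=1)   # run-to-run variability
```

Plotting `mean_path` with a band of `spread_path` around it gives the usual summary of tracking performance across runs; swapping in a different `policy` callable allows a like-for-like comparison under the same random seed.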

Theorems & Definitions (5)

Proposition 1: Houska 2025, Thm. 1
Remark 1
Remark 2
Theorem 1
Proof
