Data-Driven Stochastic Optimal Control in Reproducing Kernel Hilbert Spaces
Nicolas Hoischen, Petar Bevanda, Stefan Sosnowski, Sandra Hirche, Boris Houska
TL;DR
This work tackles data-driven stochastic optimal control for nonlinear control-affine diffusion systems whose dynamics $(\mathbf{f},\mathbf{G})$ and stage cost $\ell$ are unknown, while the control penalty $r$ and the constraints are known. It develops an RKHS-based pipeline that embeds state densities, learns finite-rank Markov operators $(\mathsf{A},\mathsf{B})$ from data, and solves a kernelized HJB (KHJB) recursion to recover a global optimal feedback policy $\widehat{\bm{\pi}}^*(\mathbf{x})$. The main contributions are a nonparametric operator-learning method with linear-in-sample complexity and an RKHS-embedded dynamic programming scheme that yields practical control and scales to higher-dimensional stochastic systems. Both are demonstrated on 1D, 2D, and 4D benchmarks, including depth control of an autonomous underwater vehicle, where the learned policy outperforms an LQR baseline. The approach thus enables optimal control of uncertain nonlinear diffusion processes directly from data, without a prior model of the dynamics.
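To make the operator-learning step more concrete, the sketch below estimates the one-step conditional expectation $(x,u) \mapsto \mathbb{E}[g(x') \mid x, u]$ from samples by kernel ridge regression, i.e. an empirical conditional mean embedding whose weights live in the span of the $n$ training samples (hence finite rank). It assumes, purely for illustration, a 1D toy system, a Gaussian kernel in $x$, and a product kernel that forces the estimate to be affine in $u$, which we read as a hypothetical analogue of the pair $(\mathsf{A},\mathsf{B})$. The function names, kernel, bandwidth, and regularization are our assumptions, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between 1D sample arrays x (m,) and y (n,)."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2.0 * bandwidth**2))


def fit_affine_operator(x, u, x_next, g, bandwidth=0.5, lam=1e-3):
    """Kernel ridge estimate of (x, u) -> E[g(x') | x, u] of the form a(x) + u * b(x).

    The product kernel k_x(x, x') * (1 + u u') only contains functions that are
    affine in u, so the estimate splits into a control-free part a ("A g") and a
    control part b ("B g"). These names are hypothetical, chosen for illustration.
    """
    n = x.size
    gram = gaussian_kernel(x, x, bandwidth) * (1.0 + np.outer(u, u))
    # Regularized least squares: the weights are a finite combination of the n samples.
    alpha = np.linalg.solve(gram + n * lam * np.eye(n), g(x_next))

    def a_part(xq):
        return gaussian_kernel(np.atleast_1d(xq), x, bandwidth) @ alpha

    def b_part(xq):
        return gaussian_kernel(np.atleast_1d(xq), x, bandwidth) @ (u * alpha)

    return a_part, b_part


# Assumed toy system: x' = 0.8 x + 0.5 u + 0.05 * standard normal noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 600)
u = rng.uniform(-1.0, 1.0, 600)
x_next = 0.8 * x + 0.5 * u + 0.05 * rng.standard_normal(600)

# Take g as the identity, so the estimate targets the conditional mean of x'.
a_part, b_part = fit_affine_operator(x, u, x_next, g=lambda s: s)
print(a_part(0.5), b_part(0.5))
```

For this toy system the true values at $x = 0.5$ are $0.8 \cdot 0.5 = 0.4$ for the control-free part and $0.5$ for the control part, so the printed estimates can be compared against them directly.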
Abstract
This paper proposes a fully data-driven approach to the optimal control of nonlinear control-affine systems modeled as stochastic diffusions. The focus is on the setting in which both the nonlinear dynamics and the stage cost function are unknown, and only a control penalty function and the constraints are provided. To this end, we embed state probability densities into a reproducing kernel Hilbert space (RKHS) to leverage recent advances in operator regression, thereby identifying the Markov transition operators associated with controlled diffusion processes. This operator-learning approach integrates naturally with convex operator-theoretic Hamilton-Jacobi-Bellman recursions that scale linearly with the state dimension, which makes it possible to solve a wide range of nonlinear optimal control problems. Numerical results demonstrate its ability to address diverse nonlinear control tasks, including depth regulation of an autonomous underwater vehicle.
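As a rough illustration of how learned transition operators can feed a Bellman recursion, the sketch below runs a finite-horizon, discrete-time value iteration on kernel-estimated conditional expectations. This is a swapped-in simplification, not the paper's convex operator-theoretic HJB recursion. It assumes a 1D system, a Gaussian kernel, an estimate of $\mathbb{E}[V(x') \mid x, u]$ that is affine in $u$, a stage cost that can be evaluated at the sampled states, and a known penalty $r(u) = \tfrac{\rho}{2} u^2$ with constraint $|u| \le u_{\max}$. All names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between 1D sample arrays x (m,) and y (n,)."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2.0 * bandwidth**2))


def kernel_dp_policy(x, u, x_next, stage_cost, rho=0.2, u_max=1.0,
                     horizon=20, bandwidth=0.5, lam=1e-3):
    """Hypothetical finite-horizon Bellman recursion on kernel-estimated expectations.

    Assumes E[V(x') | x, u] ~ a(x) + u * b(x), penalty r(u) = (rho / 2) u^2,
    and the box constraint |u| <= u_max.
    """
    n = x.size
    # Product kernel (Gaussian in x) * (1 + u u') keeps every estimate affine in u.
    reg_gram = (gaussian_kernel(x, x, bandwidth) * (1.0 + np.outer(u, u))
                + n * lam * np.eye(n))
    k_next = gaussian_kernel(x_next, x, bandwidth)  # next states vs. training inputs
    cost_next = stage_cost(x_next)  # assumption: stage cost observable at samples

    def minimise(a, b):
        # (rho/2) u^2 + a + u b is convex in u, so the constrained minimiser
        # is the unconstrained one, -b / rho, clipped to the box.
        u_star = np.clip(-b / rho, -u_max, u_max)
        return u_star, 0.5 * rho * u_star**2 + a + u_star * b

    value_next = cost_next.copy()  # terminal value V_N = stage cost
    for _ in range(horizon):
        beta = np.linalg.solve(reg_gram, value_next)
        _, q = minimise(k_next @ beta, k_next @ (u * beta))
        value_next = cost_next + q  # V_k evaluated at the next-state samples
    beta = np.linalg.solve(reg_gram, value_next)

    def policy(xq):
        kq = gaussian_kernel(np.atleast_1d(xq), x, bandwidth)
        u_star, _ = minimise(kq @ beta, kq @ (u * beta))
        return u_star

    return policy


# Assumed toy data: one Euler-Maruyama step of dx = u dt + 0.1 dW with dt = 0.2.
rng = np.random.default_rng(1)
dt = 0.2
x = rng.uniform(-3.0, 3.0, 500)
u = rng.uniform(-1.0, 1.0, 500)
x_next = x + u * dt + 0.1 * np.sqrt(dt) * rng.standard_normal(500)

policy = kernel_dp_policy(x, u, x_next, stage_cost=lambda s: s**2)
print(policy(np.array([-1.5, 1.5])))
```

Because the estimate is affine in $u$ and the penalty is convex, the inner minimization has the closed form $u^\star = \operatorname{clip}(-b(x)/\rho, -u_{\max}, u_{\max})$, so no search over controls is needed in this toy setting. With the quadratic stage cost $\ell(x) = x^2$, the resulting controls should point toward the origin.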
