Reinforcement Learning Framework For Stochastic Optimal Control Problem Under Model Uncertainty
Jiaxuan Hou, Lifeng Wei
TL;DR
This paper addresses robust stochastic optimal control under model uncertainty by developing a continuous-time, entropy-regularized reinforcement learning framework. It uses Sion's minimax theorem to exchange the order of minimization over policy distributions and maximization over model distributions, reducing the robust problem to a tractable RL problem governed by a Hamilton–Jacobi–Bellman (HJB) equation. The authors derive explicit solutions for linear-quadratic cases with both two-point (Bernoulli-type) and continuous (uniform) parameter distributions, showing that the optimal policies are Gaussian and that the value functions remain quadratic. They prove that solvability of the classical and exploratory formulations is equivalent, quantify the cost of exploration, and show that the solution converges to the classical setting as the exploration weight vanishes, which supports the framework's applicability to robust control tasks.
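Why the optimal policies come out Gaussian can be illustrated with a standard fact about entropy regularization: minimizing ∫ h(u) π(u) du + λ ∫ π(u) ln π(u) du over action densities π gives the Gibbs density π ∝ exp(−h/λ). The Python sketch below assumes a hypothetical Hamiltonian h(u) = ½ a u² + b u, quadratic in the action u with a > 0 (a stand-in, not the paper's actual Hamiltonian), and checks numerically that the resulting policy has mean −b/a and variance λ/a. As the exploration weight λ shrinks, the variance vanishes and the policy collapses onto the deterministic feedback −b/a.

```python
import numpy as np


def gibbs_policy_moments(a, b, lam, grid):
    """Mean and variance of the density pi(u) proportional to exp(-h(u)/lam),
    where h(u) = 0.5*a*u**2 + b*u is a hypothetical quadratic Hamiltonian (a > 0)."""
    h = 0.5 * a * grid**2 + b * grid
    # Shift by min(h) before exponentiating to avoid overflow; normalization removes the shift.
    w = np.exp(-(h - h.min()) / lam)
    pi = w / w.sum()  # discrete probability weights on the uniform grid
    mean = np.sum(grid * pi)
    var = np.sum((grid - mean) ** 2 * pi)
    return mean, var


a, b = 2.0, 1.0
grid = np.linspace(-10.0, 10.0, 20001)  # step 0.001, wide enough to hold all the mass

# Closed form for comparison: mean -b/a = -0.5 for every lam, variance lam/a.
results = {lam: gibbs_policy_moments(a, b, lam, grid) for lam in (1.0, 0.5, 0.1, 0.01)}
for lam, (mean, var) in results.items():
    print(f"lam={lam}: mean={mean:.6f}, var={var:.6f}, lam/a={lam / a:.6f}")
```

Only the Gibbs structure carries over to the paper's setting; there, the quadratic coefficients would come from its HJB equation after averaging over the model distribution, which this sketch does not reproduce.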
Abstract
We develop a continuous-time entropy-regularized reinforcement learning framework for stochastic control under model uncertainty. By applying Sion's minimax theorem, we transform the intractable robust control problem into an equivalent standard entropy-regularized stochastic control problem, to which reinforcement learning algorithms can be applied. We establish sufficient conditions under which the theorem applies and demonstrate our approach on linear-quadratic problems whose uncertain model parameters follow Bernoulli and uniform distributions.
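The role of Sion's theorem can be seen in a toy static problem with a Bernoulli-type model distribution. The Python sketch below is an illustration under simplifying assumptions, not the paper's construction: it uses a deterministic scalar action u in place of a policy distribution, a hypothetical cost (u − θ)², and two hypothetical parameter values θ₁ = −1 and θ₂ = 2, with θ = θ₁ with probability p. The expected cost is convex in u and linear (hence concave) in p, and p ranges over the compact set [0, 1], so the min-max and max-min values should coincide. For contrast, the sketch also computes the max-min when nature may only pick one of the two models outright.

```python
import numpy as np

# Hypothetical two-point model: theta = theta1 with probability p, theta2 with probability 1 - p.
theta1, theta2 = -1.0, 2.0


def cost(u, theta):
    # Illustrative stand-in cost, convex in the action u.
    return (u - theta) ** 2


u = np.linspace(-3.0, 4.0, 1401)  # action grid (step 0.005)
p = np.linspace(0.0, 1.0, 201)    # grid over Bernoulli model distributions (step 0.005)
U, P = np.meshgrid(u, p, indexing="ij")  # rows index u, columns index p

# Expected cost under model distribution p: convex in u, linear in p.
F = P * cost(U, theta1) + (1.0 - P) * cost(U, theta2)

min_max = F.max(axis=1).min()  # min over u of max over p
max_min = F.min(axis=0).max()  # max over p of min over u

# Without randomizing over models, nature can only pick theta1 or theta2.
pure_max_min = max(cost(u, theta1).min(), cost(u, theta2).min())

# Analytic values: min_max = max_min = 2.25 (at u = 0.5, p = 0.5); pure_max_min = 0.
print(min_max, max_min, pure_max_min)
```

The pure-model max-min is 0 while the min-max is 2.25, so the exchange fails there; allowing nature to randomize over models is what makes the objective concave in the maximizing variable and lets the two orders agree.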
