Reinforcement Learning Framework For Stochastic Optimal Control Problem Under Model Uncertainty

Jiaxuan Hou, Lifeng Wei

TL;DR

This paper addresses robust stochastic optimal control under model uncertainty by developing a continuous-time entropy-regularized reinforcement learning framework. It leverages Sion's minimax theorem to exchange the order of minimization over policy distributions and maximization over model distributions, reducing the robust problem to a tractable RL problem governed by a Hamilton–Jacobi–Bellman equation. The authors derive explicit solutions for linear-quadratic cases under both two-point (Bernoulli-like) and continuous (uniform) distributions, showing that optimal policies are Gaussian and that the value functions remain quadratic. They prove solvability equivalence between classical and exploratory formulations, quantify the exploration cost, and demonstrate convergence to the classical setup as the exploration weight vanishes, highlighting practical applicability to robust control tasks.

Abstract

We develop a continuous-time entropy-regularized reinforcement learning framework under model uncertainty. By applying Sion's minimax theorem, we transform the intractable robust control problem into an equivalent standard entropy-regularized stochastic control problem, facilitating reinforcement learning algorithms. We establish sufficient conditions for the theorem's validity and demonstrate our approach on linear-quadratic problems with uncertain model parameters following Bernoulli and uniform distributions.
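The minimax exchange described above can be checked numerically on a toy problem. The sketch below is not the paper's model: the cost function, the control interval, and the two-point model family are hypothetical choices made only so that the cost is convex in the control and linear in the model probability, which is the kind of structure Sion's theorem requires. It compares "minimize over the control, then maximize over the model distribution" with the reverse order on a grid.

```python
import numpy as np


def cost(u, p):
    """Hypothetical expected cost of control u when the uncertain model
    parameter equals +1 with probability p and -1 with probability 1 - p.
    Convex in u and linear (hence concave) in p."""
    return p * (u - 1.0) ** 2 + (1.0 - p) * (u + 1.0) ** 2


def min_max_and_max_min(n_u=401, n_p=201):
    """Evaluate both orders of optimization on a grid.

    Returns (min over u of max over p, max over p of min over u).
    """
    u = np.linspace(-2.0, 2.0, n_u)[:, None]   # candidate controls (rows)
    p = np.linspace(0.0, 1.0, n_p)[None, :]    # candidate model weights (columns)
    table = cost(u, p)                          # shape (n_u, n_p) by broadcasting
    min_max = table.max(axis=1).min()           # worst case per control, then best control
    max_min = table.min(axis=0).max()           # best control per model, then worst model
    return float(min_max), float(max_min)


min_max, max_min = min_max_and_max_min()
print(f"min-max = {min_max:.6f}, max-min = {max_min:.6f}")
```

For this particular cost the two values agree up to floating-point error, so the order of optimization does not matter. For a cost that is not convex-concave they can differ, which is why the sufficient conditions for the theorem matter.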

Paper Structure

This paper contains 12 sections, 8 theorems, 106 equations.

Key Result

Lemma 1

Assume that (H2.1)-(H2.6) hold. Then the map $\theta\mapsto X_\theta^\pi(t)$ is continuous in $\theta$.

Theorems & Definitions (14)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • ...and 4 more