Research on Optimal Control Problem Based on Reinforcement Learning under Knightian Uncertainty

Ziyu Li, Chen Fei, Weiyin Fei

TL;DR

This work develops a unified framework for continuous-time reinforcement learning under Knightian uncertainty by integrating sublinear (nonlinear) expectation theory with entropy-regularized relaxed stochastic control. It derives a G-HJB equation and characterizes the optimal randomized control, proving that in the linear-quadratic (LQ) setting the optimal policy is Gaussian with a variance that depends on the Knightian uncertainty bounds. It also establishes a solvability equivalence between the exploratory and non-exploratory problems and gives an explicit exploration cost, $\mathcal{C}^{u^*,\theta^*}(x) = \frac{\lambda}{2\rho}$. The paper further proves a vanishing-exploration property: as $\lambda \to 0$, the Gaussian policy converges to the deterministic optimal control and the exploratory value function converges to its non-exploratory counterpart. A numerical LQ example based on an indoor-temperature-control scenario validates the theoretical predictions and shows how the discount rate $\rho$ and the uncertainty bounds shape the optimal policy and its convergence behavior, with practical implications for designing robust RL algorithms under model uncertainty.
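As a small illustration of the exploration cost $\frac{\lambda}{2\rho}$ quoted above, the following sketch evaluates it for a shrinking $\lambda$. It assumes $\lambda$ is the entropy-regularization weight and $\rho$ the discount rate; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
def exploration_cost(lam: float, rho: float) -> float:
    """Exploration cost lambda / (2 * rho), as quoted in the TL;DR."""
    if lam < 0 or rho <= 0:
        raise ValueError("expected lam >= 0 and rho > 0")
    return lam / (2.0 * rho)


# Hypothetical values: a fixed discount rate and a decreasing weight lambda.
rho = 0.5
lambdas = (1.0, 0.1, 0.01, 0.0)
costs = [exploration_cost(lam, rho) for lam in lambdas]

for lam, cost in zip(lambdas, costs):
    print(f"lambda={lam:<5} cost={cost}")
```

The cost shrinks linearly with $\lambda$ and reaches zero at $\lambda = 0$, in line with the vanishing-exploration property; a larger $\rho$ lowers the cost for the same $\lambda$.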

Abstract

Because reinforcement learning (RL) agents face decision-making environments pervaded by Knightian uncertainty, this paper formulates the exploratory state dynamics under Knightian uncertainty and uses them to study the entropy-regularized relaxed stochastic control problem in that setting. By employing stochastic analysis theory and the dynamic programming principle under nonlinear expectation, we derive the Hamilton-Jacobi-Bellman (HJB) equation and solve for the optimal policy that achieves a trade-off between exploration and exploitation. For the linear-quadratic (LQ) case, we then examine the agent's optimal randomized feedback control under both state-dependent and state-independent reward scenarios, proving that it follows a Gaussian distribution in the LQ framework. We also investigate how the degree of Knightian uncertainty affects the variance of the optimal feedback policy. In addition, we establish the solvability equivalence between non-exploratory and exploratory LQ problems under Knightian uncertainty and analyze the associated exploration cost. Finally, we provide an LQ example and validate the theoretical findings through numerical simulations.

Paper Structure

This paper contains 17 sections, 6 theorems, 107 equations, 5 figures, 4 tables.

Key Result

Theorem 1

Let $v$ be the general unknown solution to the HJB equation, which can equivalently be expressed as a constrained optimization over $\theta$, where $\theta \in \mathcal{P}(U)$ if and only if $\int_U \theta(u)\, du = 1$ and $\theta(u) \geq 0$ a.e. on $U$. The constrained optimization problem in eq:10 admits a unique solution given by the randomized feedback control, where $\Psi(x,u) = r(x,u) + \sigma^2(x,u) \widetilde{G}[v''(x)] + b(x, \ldots$
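The formula for the randomized feedback control does not appear in the statement above. In entropy-regularized relaxed control, the maximizer of this kind of constrained problem is typically a Gibbs (Boltzmann) density proportional to $\exp(\Psi(x,u)/\lambda)$. The sketch below assumes that form, which is an assumption here rather than the paper's stated result. It checks numerically that a concave quadratic $\Psi$ in $u$ then yields a Gaussian policy. The coefficients `a`, `b`, the weight `lam`, and the grid are hypothetical.

```python
import math


def gibbs_density(psi, grid, lam):
    """Normalize exp(psi(u) / lam) on an evenly spaced grid (Riemann sum)."""
    du = grid[1] - grid[0]
    values = [psi(u) for u in grid]
    shift = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - shift) / lam) for v in values]
    total = sum(weights) * du
    return [w / total for w in weights]


# Hypothetical concave quadratic in u at a fixed state x:
# psi(u) = -a * u^2 + b * u, so exp(psi / lam) is an unnormalized
# Gaussian with mean b / (2a) and variance lam / (2a).
a, b, lam = 2.0, 1.0, 0.5


def psi(u):
    return -a * u * u + b * u


n = 4001
grid = [-10.0 + 20.0 * i / (n - 1) for i in range(n)]
du = grid[1] - grid[0]
density = gibbs_density(psi, grid, lam)

mean = sum(u * p for u, p in zip(grid, density)) * du
var = sum((u - mean) ** 2 * p for u, p in zip(grid, density)) * du

print(f"numerical mean={mean:.6f}, closed form={b / (2 * a):.6f}")
print(f"numerical var ={var:.6f}, closed form={lam / (2 * a):.6f}")
```

Under this assumed form the variance scales linearly with `lam`, so the density concentrates at the maximizer of $\Psi$ as `lam` tends to zero. How the Knightian uncertainty bounds enter the quadratic coefficient is determined by the paper's LQ analysis and is not modeled here.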

Figures (5)

  • Figure 1: Optimal policy distribution
  • Figure 2: Normality test results
  • Figure 3: The impact of $\rho$ on the optimal policy
  • Figure 4: Convergence of the optimal policy
  • Figure 5: Convergence of the exploratory value function $v(x)$

Theorems & Definitions (15)

  • Definition 1
  • Remark 1
  • Theorem 1
  • proof
  • Proposition 1
  • proof
  • Remark 2
  • Proposition 2
  • proof
  • Lemma 1
  • ...and 5 more