Operator Splitting, Policy Iteration, and Machine Learning for Stochastic Optimal Control

Alain Bensoussan, Thien P. B. Nguyen, Minh-Binh Tran, Son N. T. Tu

Abstract

We propose a splitting approach to solve the second-order Hamilton--Jacobi equation, reducing it to a heat step and a purely first-order step. The latter is implemented using a gradient value policy iteration algorithm, enabling efficient characteristic-based machine learning methods. We establish convergence rates for the splitting method. In particular, the $L^\infty$ error is bounded below by $\mathcal{O}(h)$ and above by $\mathcal{O}(h^{1/7})$ for Lipschitz initial data; this improves to $\mathcal{O}(h^{1/5})$ for semiconcave data and to $\mathcal{O}(h^{1/3})$ for $C^2$ data. We also prove an upper $L^1$ error estimate of order $\mathcal{O}(h^{1/2})$ in the periodic setting, where $h$ is the splitting step. For the first-order step, we provide a weighted $L^2$ error analysis that shows exponential convergence. Each iteration solves linear characteristic equations and learns the value function by minimizing a weighted value gradient loss. The approach yields stable and accurate numerical results.
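To make the heat-step / first-order-step decomposition concrete, the following is a minimal sketch of a Lie splitting loop, not the authors' implementation. It assumes a one-dimensional periodic model problem $u_t + \tfrac12 |u_x|^2 = \varepsilon u_{xx}$ with a quadratic Hamiltonian, and it assumes the heat step is applied before the first-order step. The paper implements the first-order step with policy iteration and characteristic-based learning; here that step is swapped for a plain Lax--Friedrichs finite-difference solver. All function names (`heat_step`, `hj_step`, `split_solve`, `cole_hopf_reference`) are hypothetical.

```python
import numpy as np

def heat_step(u, eps, h, L):
    """Solve w_t = eps * w_xx exactly for time h on a periodic grid via FFT."""
    n = u.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)  # angular wavenumbers
    return np.real(np.fft.ifft(np.exp(-eps * k**2 * h) * np.fft.fft(u)))

def hj_step(u, h, dx):
    """Advance u_t + 0.5*|u_x|^2 = 0 for time h with Lax-Friedrichs substeps.

    Assumption: this finite-difference solver stands in for the paper's
    policy-iteration / learning-based first-order step.
    """
    t = 0.0
    while t < h - 1e-14:
        up = (np.roll(u, -1) - u) / dx  # forward difference
        um = (u - np.roll(u, 1)) / dx   # backward difference
        # Bound on |H'(p)| = |p|, floored to avoid division by zero.
        alpha = max(np.max(np.abs(up)), np.max(np.abs(um)), 1e-12)
        dt = min(0.5 * dx / alpha, h - t)  # CFL-restricted substep
        p = 0.5 * (up + um)
        u = u - dt * (0.5 * p**2 - 0.5 * alpha * (up - um))
        t += dt
    return u

def split_solve(u0, eps, T, n_steps, L):
    """Lie splitting with step h = T / n_steps: heat step, then first-order step."""
    h = T / n_steps
    dx = L / u0.size
    u = u0.copy()
    for _ in range(n_steps):
        u = heat_step(u, eps, h, L)
        u = hj_step(u, h, dx)
    return u

def cole_hopf_reference(u0, eps, T, L):
    """Reference solution: w = exp(-u / (2*eps)) solves w_t = eps * w_xx."""
    w = heat_step(np.exp(-u0 / (2.0 * eps)), eps, T, L)
    return -2.0 * eps * np.log(w)

if __name__ == "__main__":
    L, n, eps, T = 2.0 * np.pi, 256, 0.1, 0.5
    x = np.linspace(0.0, L, n, endpoint=False)
    u0 = np.sin(x)
    ref = cole_hopf_reference(u0, eps, T, L)
    for n_steps in (10, 20, 40):
        err = np.max(np.abs(split_solve(u0, eps, T, n_steps, L) - ref))
        print(f"h = {T / n_steps:.4f}, sup-norm error = {err:.3e}")
```

The quadratic Hamiltonian is chosen only because the Cole--Hopf transform then gives a reference solution to compare against. The measured error mixes the splitting error in $h$ with the spatial discretization error of the Lax--Friedrichs step, so this sketch illustrates the structure of the scheme rather than the convergence rates stated above.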

Paper Structure (19 sections, 14 theorems, 204 equations, 4 tables, 2 algorithms)

Key Result

Theorem 1.1

Assume (H1)--(H2). Let $u_0 \in W^{1,\infty}(\mathbb{R}^d)$. Let $u$ denote the true solution of (eq:forwardtemp) and $v$ the approximation produced by the splitting scheme (eq:v-split-intro), both with initial data $u_0$.

Theorems & Definitions (28)

  • Theorem 1.1
  • Proposition 1.2
  • Remark 1.3
  • Theorem 1.4
  • Lemma 2.1
  • Proposition 2.2
  • Remark 2.3
  • Proposition 2.4: Commutator Estimates
  • proof
  • Lemma 2.5
  • ...and 18 more