Operator Splitting, Policy Iteration, and Machine Learning for Stochastic Optimal Control

Alain Bensoussan; Thien P. B. Nguyen; Minh-Binh Tran; Son N. T. Tu

Abstract

We propose a splitting approach to solve the second-order Hamilton--Jacobi equation, reducing it to a heat step and a purely first-order step. The latter is implemented using a gradient value policy iteration algorithm, enabling efficient characteristic-based machine learning methods. We establish convergence rates for the splitting method. In particular, the $L^\infty$ error is bounded below by $\mathcal{O}(h)$ and above by $\mathcal{O}(h^{1/7})$ for Lipschitz initial data; this improves to $\mathcal{O}(h^{1/5})$ for semiconcave data and to $\mathcal{O}(h^{1/3})$ for $C^2$ data. We also prove an upper $L^1$ error estimate of order $\mathcal{O}(h^{1/2})$ in the periodic setting, where $h$ is the splitting step. For the first-order step, we provide a weighted $L^2$ error analysis that shows exponential convergence. Each iteration solves linear characteristic equations and learns the value function by minimizing a weighted value gradient loss. The approach yields stable and accurate numerical results.
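To make the heat-step / first-order-step decomposition concrete, here is a minimal sketch on a model problem that is an assumption of this example, not the paper's setting: the one-dimensional periodic equation $u_t + \tfrac12 |u_x|^2 = \varepsilon u_{xx}$ on $[0, 2\pi)$. The first-order step is carried out with the Hopf--Lax formula on a grid, a stand-in for the policy iteration and learning procedure described in the abstract, and the order of the two sub-steps is likewise assumed. All function names are hypothetical. For this quadratic Hamiltonian the Cole--Hopf transform $u = -2\varepsilon \log w$, with $w$ solving the heat equation, gives a reference solution to compare against.

```python
import numpy as np


def heat_step(u, eps, h, k):
    """Exact periodic heat flow u_t = eps * u_xx over time h, computed by FFT."""
    return np.real(np.fft.ifft(np.fft.fft(u) * np.exp(-eps * k**2 * h)))


def hopf_lax_step(u, h, dist2):
    """First-order step u_t + |u_x|^2 / 2 = 0 over time h.

    Uses the Hopf-Lax formula u(x, h) = min_y [u(y) + |x - y|^2 / (2h)],
    with the minimum restricted to grid points (an approximation).
    """
    return np.min(u[None, :] + dist2 / (2.0 * h), axis=1)


def make_grid(n):
    """Uniform grid on [0, 2*pi), angular wavenumbers, squared periodic distances."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=x[1] - x[0])
    diff = x[:, None] - x[None, :]
    diff = (diff + np.pi) % (2.0 * np.pi) - np.pi  # wrap into [-pi, pi)
    return x, k, diff**2


def lie_splitting(u0, eps, h, n_steps, k, dist2):
    """Alternate a first-order step and a heat step, each of length h."""
    u = u0.copy()
    for _ in range(n_steps):
        u = hopf_lax_step(u, h, dist2)
        u = heat_step(u, eps, h, k)
    return u


def cole_hopf_reference(u0, eps, t, k):
    """Reference solution of the model problem via u = -2 * eps * log(w)."""
    w = heat_step(np.exp(-u0 / (2.0 * eps)), eps, t, k)
    return -2.0 * eps * np.log(w)


def splitting_sup_error(n_steps, n=512, eps=0.1, t_final=0.5):
    """Sup-norm gap between the splitting approximation and the reference."""
    x, k, dist2 = make_grid(n)
    u0 = np.sin(x)
    h = t_final / n_steps
    u_split = lie_splitting(u0, eps, h, n_steps, k, dist2)
    u_ref = cole_hopf_reference(u0, eps, t_final, k)
    return float(np.max(np.abs(u_split - u_ref)))


for n_steps in (5, 10, 20):
    print(n_steps, splitting_sup_error(n_steps))
```

The printed errors mix two effects: the splitting error in $h$ that the convergence rates above concern, and the spatial error from minimizing over grid points only, which grows as $h$ shrinks on a fixed grid. Separating the two requires refining the grid together with $h$.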

Paper Structure (19 sections, 14 theorems, 204 equations, 4 tables, 2 algorithms)

Introduction
Approach and contributions
The Splitting Scheme
Policy Iteration Algorithm for first-order Hamilton--Jacobi equation
Main results
Assumptions
Notations
Organization of the paper
Error Analysis of the Splitting Scheme
Basic properties and commutator estimates
Estimates for the viscous Hamilton--Jacobi equation
Error Analysis of the splitting scheme on $L^\infty(\mathbb{R}^d)$
Proof of Theorem \ref{thm:LinftyA} and Proposition \ref{prop:errorTorusTL}
First-order Hamilton--Jacobi equations: PI-$\lambda$ Algorithm
Value-gradient policy iteration--based algorithm
...and 4 more sections

Key Result

Theorem 1.1

Assume \ref{itm:H1}--\ref{itm:H2}. Let $u_0 \in W^{1,\infty}(\mathbb{R}^d)$. Let $u$ denote the exact solution of \eqref{eq:forwardtemp} and $v$ the approximation produced by the splitting scheme \eqref{eq:v-split-intro}, both with initial data $u_0$.

Theorems & Definitions (28)

Theorem 1.1
Proposition 1.2
Remark 1.3
Theorem 1.4
Lemma 2.1
Proposition 2.2
Remark 2.3
Proposition 2.4: Commutator Estimates
proof
Lemma 2.5
...and 18 more
