SOC-MartNet: A Martingale Neural Network for the Hamilton-Jacobi-Bellman Equation without Explicit inf H in Stochastic Optimal Controls

Wei Cai; Shuixin Fang; Tao Zhou

TL;DR

SOC-MartNet tackles high-dimensional Hamilton-Jacobi-Bellman (HJB) equations in stochastic optimal control without requiring an explicit expression for the infimum of the Hamiltonian. It casts the problem in a martingale framework and uses adversarial learning to enforce both the minimum principle and the martingale property, jointly training neural networks for the value function and the optimal control together with a test-function network that enforces the martingale constraint. The method relies on Monte Carlo estimates and Euler-Maruyama dynamics, and it scales to dimensions as large as $10^4$ within a few thousand training iterations and without time-marching recursion. Numerical experiments on linear and semilinear parabolic equations, nondegenerate HJB equations, and stochastic optimal control problems (SOCPs), including shifted targets and perturbations, demonstrate accuracy, robustness to dimension, and favorable computational efficiency, especially on parallel GPU architectures.
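As an illustration of the Monte Carlo and Euler-Maruyama ingredients mentioned above, the following is a minimal sketch on a one-dimensional toy problem, not the paper's implementation. The dynamics $\mathrm{d}X = u\,\mathrm{d}t + \mathrm{d}W$, the running cost $x^2 + u^2$, the terminal cost $x^2$, and the name `simulate_cost` are all assumptions chosen for illustration; only the drift is controlled here, whereas the paper also allows control of the volatility.

```python
import math
import random

def simulate_cost(control, x0=1.0, T=1.0, n_steps=50, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[ int_0^T (X^2 + u^2) dt + X_T^2 ] (toy problem).

    The state follows dX = u dt + dW and is discretized by Euler-Maruyama.
    `control` is a feedback law (t, x) -> u.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for n in range(n_steps):
            t = n * dt
            u = control(t, x)
            cost += (x * x + u * u) * dt        # running cost, left-point rule
            dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
            x += u * dt + dw                    # Euler-Maruyama step
        total += cost + x * x                   # add terminal cost g(x) = x^2
    return total / n_paths

if __name__ == "__main__":
    print("feedback u = -x:", simulate_cost(lambda t, x: -x))
    print("no control     :", simulate_cost(lambda t, x: 0.0))
```

For this toy problem the HJB equation has the closed-form solution $v(t,x) = x^2 + (T - t)$ with optimal feedback $u^*(t,x) = -x$, so the first estimate should be close to $v(0,1) = 2$ and below the cost of the uncontrolled process.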

Abstract

In this paper, we propose a martingale-based neural network, SOC-MartNet, for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations where no explicit expression is needed for the infimum of the Hamiltonian, $\inf_{u \in U} H(t,x,u,z,p)$, and stochastic optimal control problems (SOCPs) with controls on both drift and volatility. We reformulate the HJB equations for the value function by training two neural networks, one for the value function and one for the optimal control, with the help of two stochastic processes: a Hamiltonian process and a cost process. The control and value networks are trained such that the associated Hamiltonian process is minimized to satisfy the minimum principle of a feedback SOCP, and the cost process becomes a martingale, thus ensuring that the value function network is the solution to the corresponding HJB equation. Moreover, to enforce the martingale property for the cost process, we employ an adversarial network and construct a loss function characterizing the projection property of the conditional expectation condition of the martingale. Numerical results show that the proposed SOC-MartNet is effective and efficient for solving HJB-type equations and SOCPs with a dimension up to 10,000 in a small number of training iterations (fewer than 6000).
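To make the martingale condition more concrete, here is a hedged sketch of how a projection-type residual $\mathbb{E}[\rho(X_t)\,(M_{t+\Delta t} - M_t)]$ might be estimated by Monte Carlo for a candidate value function. It assumes a one-dimensional toy problem ($\mathrm{d}X = u\,\mathrm{d}t + \mathrm{d}W$, running cost $x^2 + u^2$, feedback $u = -x$) and a cost process of the form $M_t = v(t, X_t) + \int_0^t c\,\mathrm{d}s$. A few fixed, hand-picked test functions stand in for the trained adversarial network, and all names are hypothetical.

```python
import math
import random

T, N_STEPS = 1.0, 50
DT = T / N_STEPS

def control(t, x):
    return -x  # assumed feedback control for the toy problem

def running_cost(x, u):
    return x * x + u * u  # assumed running cost c(x, u)

def martingale_residual(v, test_fns, n_paths=4000, seed=0):
    """Largest |E[ sum_n rho(X_n) * (M_{n+1} - M_n) ]| over the test functions.

    M is the discretized cost process, M_{n+1} - M_n =
    v(t_{n+1}, X_{n+1}) - v(t_n, X_n) + c(X_n, u_n) * DT.
    A residual near zero is consistent with M being a martingale.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(test_fns)
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)  # random start spreads samples in space
        for n in range(N_STEPS):
            t = n * DT
            u = control(t, x)
            x_next = x + u * DT + rng.gauss(0.0, math.sqrt(DT))  # Euler-Maruyama
            d_m = v(t + DT, x_next) - v(t, x) + running_cost(x, u) * DT
            for k, rho in enumerate(test_fns):
                sums[k] += rho(x) * d_m  # test function evaluated at time t_n
            x = x_next
    return max(abs(s) / n_paths for s in sums)

TEST_FNS = [lambda x: 1.0, math.tanh, math.cos]  # fixed stand-ins for the adversary

def v_true(t, x):
    return x * x + (T - t)  # closed-form value function of the toy problem

def v_wrong(t, x):
    return x * x  # misses the (T - t) term, so M drifts

if __name__ == "__main__":
    print("residual, true v :", martingale_residual(v_true, TEST_FNS))
    print("residual, wrong v:", martingale_residual(v_wrong, TEST_FNS))
```

In an adversarial setup the maximum over a fixed list would be replaced by a maximization over the parameters of a test-function network, while the value and control networks are trained to drive the residual down.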

Paper Structure (17 sections, 3 theorems, 76 equations, 8 figures, 2 tables, 2 algorithms)

This paper contains 17 sections, 3 theorems, 76 equations, 8 figures, 2 tables, 2 algorithms.

Introduction
Dynamic programming and minimum principle, and computational approach
Proposed method
Martingale formulation for HJB-type equations
SOC-MartNet via adversarial learning for control/value functions
Training algorithm
Application to parabolic problems
Numerical tests
Linear parabolic problem
Semilinear parabolic equation
Test on time convergence rate
Non-degenerated HJB equation without using explicit form of inf_u H
Validity of SOC-MartNet solution in space-time region
SOCP with a shifted target
Non-degenerated HJB equation without explicit inf_u H
...and 2 more sections

Key Result

Lemma 3.2

Let $v$ be any Borel measurable function from $[0, T] \times \mathbb{R}^d$ to $\mathbb{R}$ such that [...]. Assume the optimal feedback control exists under $v$, i.e., the following equation (eq_Htuv) admits a solution $u \in \mathcal{U}_{\mathrm{ad}}$: [...] for $(t, \omega) \in [0, T] \times \Omega$, a.e.-$\mathrm{d} t \times \mathbb{P}$. Then, an optimal control $u$ for eq_Htuv can b...
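The idea of finding a control that minimizes the Hamiltonian without a closed-form $\inf_u H$ can be sketched numerically. The example below is an assumption-laden illustration, not the paper's algorithm: it uses a toy Hamiltonian $H(x,u,z,p) = \tfrac12 p + u z + x^2 + u^2$ (for $\mathrm{d}X = u\,\mathrm{d}t + \mathrm{d}W$ with running cost $x^2 + u^2$) and plain pointwise gradient descent on $u$ in place of a trained control network. The function names are hypothetical.

```python
def hamiltonian(x, u, z, p):
    """Toy Hamiltonian (assumed): z plays the role of v_x, p the role of v_xx."""
    return 0.5 * p + u * z + x * x + u * u

def minimize_over_u(x, z, p, u0=0.0, lr=0.1, n_iter=200, eps=1e-6):
    """Approximate argmin_u H(x, u, z, p) by gradient descent.

    The derivative in u is taken by central finite differences, so no
    explicit formula for the minimizer of H is used.
    """
    u = u0
    for _ in range(n_iter):
        grad = (hamiltonian(x, u + eps, z, p)
                - hamiltonian(x, u - eps, z, p)) / (2.0 * eps)
        u -= lr * grad
    return u

if __name__ == "__main__":
    # With v(t, x) = x^2 + (T - t): z = v_x = 2x and p = v_xx = 2.
    x = 1.0
    print(minimize_over_u(x, z=2.0 * x, p=2.0))
```

For this quadratic toy Hamiltonian the exact minimizer is $u = -z/2$, which gives a simple check on the numerical result.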

Figures (8)

Figure 1: Numerical results of SOC-MartNet (\ref{alg_amnet2}) for the linear parabolic problem \ref{eq_simp_proble}. The shaded region represents the mean + $2 \times$ SD of the loss values and relative errors across 5 independent runs. The running times are 37, 112 and 363 seconds for $d=100$, $1000$, and $2000$, respectively.
Figure 2: Numerical results of SOC-MartNet (\ref{alg_amnet2}) for the semilinear parabolic problem \ref{eq_semiparab} with $d=100$ and oscillatory terminal function \ref{eq_oscgx}. (a) - (d) Graphs of the true solutions $s \mapsto v(t, s \boldsymbol{1}_d)$ at $t = 0, T$ with varying $T$. (e) - (h) Numerical solution of $s \mapsto v(0, s \boldsymbol{1}_d)$ given by the SOC-MartNet.
Figure 3: Log-log plot of the relative $L^1$-error $\mathrm{RE}_1$ of SOC-MartNet (\ref{alg_amnet2}) vs the number of time partitions $N$ for the parabolic equation \ref{eq_parab} with parameter setting \ref{eq_linsin} and with $d=100$. The red reference line indicates a first-order convergence rate of $O(N^{-1.01})$.
Figure 4: Graphs of the true solution and the numerical solution of SOC-MartNet for $s \mapsto v(t, s \boldsymbol{1}_d)$ at $t = 0$ given by HJB-1.
Figure 5: Numerical results of SOC-MartNet (\ref{alg_amnet}) for HJB-2 and HJB-3 with $T=1$ and $d = 2000$, $10000$. The shaded region represents the mean + $2 \times$ SD of the plotted values across 5 independent runs.
...and 3 more figures

Theorems & Definitions (10)

Remark 3.1
Lemma 3.2
Proof 1
Remark 3.3
Lemma 3.4
Proof 2
Theorem 3.5
Remark 3.6
Remark 3.7
Remark 3.8
