SOC-MartNet: A Martingale Neural Network for the Hamilton-Jacobi-Bellman Equation without Explicit inf H in Stochastic Optimal Controls
Wei Cai, Shuixin Fang, Tao Zhou
TL;DR
SOC-MartNet solves high-dimensional Hamilton–Jacobi–Bellman (HJB) equations arising in stochastic optimal control without requiring an explicit expression for the infimum of the Hamiltonian. It recasts the problem in a martingale framework and uses adversarial learning to enforce both the minimum principle and the martingale property, jointly training neural networks for the value function and the optimal control together with a test-function network that enforces the martingale constraint. The method relies on Monte Carlo estimates and Euler–Maruyama dynamics, and it scales to dimensions as large as $10^4$ within a few thousand training iterations and without time-marching recursion. Numerical experiments on linear and semilinear parabolic equations, nondegenerate HJB equations, and stochastic optimal control problems (SOCPs), including cases with shifted targets and perturbations, show accuracy, robustness to dimension, and favorable computational efficiency, especially on parallel GPU architectures.
Abstract
In this paper, we propose a martingale-based neural network, SOC-MartNet, for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations where no explicit expression is needed for the infimum of the Hamiltonian, $\inf_{u \in U} H(t,x,u,z,p)$, and for solving stochastic optimal control problems (SOCPs) with controls on both drift and volatility. We reformulate the HJB equation so that it can be solved by training two neural networks, one for the value function and one for the optimal control, with the help of two stochastic processes: a Hamiltonian process and a cost process. The control and value networks are trained so that the associated Hamiltonian process is minimized, which satisfies the minimum principle of a feedback SOCP, and so that the cost process becomes a martingale, which ensures that the value function network solves the corresponding HJB equation. Moreover, to enforce the martingale property of the cost process, we employ an adversarial network and construct a loss function that characterizes the projection property of the conditional expectation defining a martingale. Numerical results show that the proposed SOC-MartNet is effective and efficient for solving HJB-type equations and SOCPs with dimensions up to 10,000 in a small number of training iterations (fewer than 6000).
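To make the martingale idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy uncontrolled problem, $dX_t = dW_t$ with terminal cost $g(x) = |x|^2$ and no running cost, for which the value function is $v(t,x) = |x|^2 + d\,(T-t)$ and the cost process reduces to $M_t = v(t, X_t)$. The test function `rho` is a fixed, hypothetical stand-in for the trained adversarial network, and all names and parameter values are illustrative assumptions. The sketch checks that the Monte Carlo estimate of $\mathbb{E}[\rho(t_n, X_{t_n})\,(M_{t_{n+1}} - M_{t_n})]$ is close to zero for the true value function and clearly nonzero for an incorrect one.

```python
import numpy as np

# Illustrative settings (assumptions, not taken from the paper).
T, d = 1.0, 10                 # time horizon and state dimension
n_steps, n_paths = 20, 20000   # time steps and Monte Carlo sample paths

def simulate_paths(rng):
    """Euler-Maruyama for dX = dW with X_0 = 0; shape (n_steps + 1, n_paths, d)."""
    h = T / n_steps
    dW = rng.normal(0.0, np.sqrt(h), size=(n_steps, n_paths, d))
    X = np.zeros((n_steps + 1, n_paths, d))
    X[1:] = np.cumsum(dW, axis=0)  # X_{n+1} = X_n + dW_n
    return X

def v_true(t, x):
    """Exact value function of the toy problem: E[|X_T|^2 | X_t = x]."""
    return np.sum(x**2, axis=-1) + d * (T - t)

def v_wrong(t, x):
    """Deliberately incorrect candidate that drops the time-dependent term."""
    return np.sum(x**2, axis=-1)

def rho(t, x):
    """Fixed bounded test function (hypothetical stand-in for an adversarial network)."""
    return 1.0 / (1.0 + np.sum(x**2, axis=-1) / d)

def martingale_residual(v, X):
    """Time-averaged squared Monte Carlo estimate of E[rho(t_n, X_n) * (M_{n+1} - M_n)]."""
    t = np.linspace(0.0, T, n_steps + 1)
    squared = []
    for n in range(n_steps):
        dM = v(t[n + 1], X[n + 1]) - v(t[n], X[n])  # increment of the cost process
        squared.append(np.mean(rho(t[n], X[n]) * dM) ** 2)
    return float(np.mean(squared))

rng = np.random.default_rng(0)
X = simulate_paths(rng)
res_true = martingale_residual(v_true, X)
res_wrong = martingale_residual(v_wrong, X)
print(f"residual (true v):  {res_true:.2e}")
print(f"residual (wrong v): {res_wrong:.2e}")
```

In this toy setting the residual for `v_true` is at the level of Monte Carlo noise, while the residual for `v_wrong` stays bounded away from zero. In the method described above, the test function would instead be a trained network chosen adversarially to maximize such a residual, and the value and control networks would be trained to minimize it.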
