Deep Learning for Continuous-time Stochastic Control with Jumps

Patrick Cheridito, Jean-Loup Dupret, Donatien Hainaut

TL;DR

The paper tackles finite-horizon continuous-time stochastic control with jumps by proposing two model-based neural-network algorithms that learn both the value function and the optimal control. The GPI-PINN method uses a physics-informed residual loss with a gradient/Hessian-free trick to solve the HJB equation (a PIDE in the presence of jumps), while GPI-CBU employs an expectation-free Bellman update that avoids jump integrals and higher-order derivatives, greatly improving scalability in high dimensions with jumps. Empirical results on high-dimensional LQR and consumption-investment problems show that GPI-CBU in particular achieves accurate value functions and policies in problems of 50 to 150 dimensions, often outperforming model-free RL baselines in settings with jumps. The work shows how exploiting the known dynamics through model-based neural-network training yields solutions that hold globally over the space-time domain, and it offers a practical path to solving complex stochastic control problems in high dimensions.
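To make the phrase "physics-informed residual loss" concrete, here is a minimal sketch under stated assumptions. It uses a hypothetical one-dimensional LQR problem with jumps (parameters chosen for illustration, not taken from the paper), a fixed feedback policy $a=-x$, and plain Python functions in place of neural networks. The loss is the mean squared residual of the policy-evaluation PIDE at collocation points plus a terminal-condition penalty. Central finite differences stand in for automatic differentiation, and a 3-point Gauss–Hermite rule evaluates the jump expectation. This is not the paper's gradient/Hessian-free GPI-PINN loss, only the generic idea it builds on.

```python
import math

# Hypothetical 1-D LQR with jumps (illustrative parameters, not from the paper):
#   dX = a dt + sigma dW + dJ, J compound Poisson (rate LAM, N(0, S^2) jump sizes)
#   minimize E[ int_0^T (X^2 + a^2) dt + X_T^2 ] under the fixed feedback a = -x
T, SIGMA, LAM, S = 1.0, 0.4, 2.0, 0.3

# 3-point Gauss-Hermite rule for E[h(Z)], Z ~ N(0, 1); exact for polynomials of degree <= 5
GH_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def policy(t, x):
    """Fixed feedback control being evaluated."""
    return -x


def residual(V, t, x, h=1e-3):
    """Interior residual of the policy-evaluation PIDE for a candidate V(t, x)."""
    a = policy(t, x)
    v = V(t, x)
    # Central finite differences stand in for automatic differentiation here
    v_t = (V(t + h, x) - V(t - h, x)) / (2.0 * h)
    v_x = (V(t, x + h) - V(t, x - h)) / (2.0 * h)
    v_xx = (V(t, x + h) - 2.0 * v + V(t, x - h)) / h**2
    # Nonlocal jump term LAM * E[V(t, x + S Z) - V(t, x)], computed by quadrature
    jump = LAM * sum(w * (V(t, x + S * z) - v) for z, w in zip(GH_NODES, GH_WEIGHTS))
    return v_t + a * v_x + 0.5 * SIGMA**2 * v_xx + jump + (x**2 + a**2)


def pinn_loss(V, interior, terminal_xs):
    """Mean squared interior residual plus mean squared terminal mismatch V(T, x) - x^2."""
    interior_part = sum(residual(V, t, x) ** 2 for t, x in interior) / len(interior)
    terminal_part = sum((V(T, x) - x**2) ** 2 for x in terminal_xs) / len(terminal_xs)
    return interior_part + terminal_part


# Collocation points on [0, 0.9] x [-2, 2] and terminal points on [-2, 2]
interior = [(i / 10.0, j / 4.0) for i in range(10) for j in range(-8, 9)]
terminal_xs = [j / 4.0 for j in range(-8, 9)]

c = SIGMA**2 + LAM * S**2
loss_exact = pinn_loss(lambda t, x: x**2 + c * (T - t), interior, terminal_xs)
loss_no_jumps = pinn_loss(lambda t, x: x**2 + SIGMA**2 * (T - t), interior, terminal_xs)
```

The closed-form candidate $x^2+(\sigma^2+\lambda s^2)(T-t)$ drives the loss to rounding level, while a candidate that ignores the jump variance leaves a constant residual of $\lambda s^2 = 0.18$, i.e. a loss of about $0.0324$. In high dimensions the quadrature line has to be replaced by an approximation of the jump integral at every collocation point, which is the kind of cost the expectation-free CBU update is described as avoiding.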

Abstract

In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks.
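The alternating structure described in the abstract, with one model for the value function and one for the policy, follows the classical policy-iteration pattern. The sketch below shows that pattern on a hypothetical one-dimensional LQR problem with jumps in which both "models" are exact parametric families instead of neural networks: linear feedbacks $a=k(t)x$ and quadratic values $V(t,x)=p(t)x^2+c(t)$. Policy evaluation integrates the Feynman–Kac ODEs for $p$ and $c$ backward in time with explicit Euler, and policy improvement minimizes the Hamiltonian pointwise. All names and parameters are illustrative assumptions; the paper instead trains neural networks with its own loss functions.

```python
# Hypothetical toy problem (illustrative parameters, not from the paper): 1-D LQR with jumps,
#   dX = (A X + B a) dt + sigma dW + dJ, J compound Poisson (rate LAM, mean-zero sizes, variance S2)
#   minimize E[ int_0^T (Q X^2 + R a^2) dt + G X_T^2 ]
# Linear feedbacks a = k(t) x have quadratic values V(t, x) = p(t) x^2 + c(t).
A, B, Q, R, G = 0.5, 1.0, 1.0, 1.0, 1.0
SIGMA2, LAM, S2 = 0.16, 2.0, 0.09
T, N = 1.0, 200
DT = T / N


def evaluate(k):
    """Policy evaluation: Euler-integrate the Feynman-Kac ODEs for p and c backward from t = T."""
    p = [0.0] * (N + 1)
    c = [0.0] * (N + 1)
    p[N] = G  # terminal condition V(T, x) = G x^2, so c[N] = 0
    for n in range(N - 1, -1, -1):
        kn, pn = k[n + 1], p[n + 1]
        # -p' = Q + R k^2 + 2 (A + B k) p
        p[n] = pn + DT * (Q + R * kn**2 + 2.0 * (A + B * kn) * pn)
        # -c' = (sigma^2 + lam * s^2) p: diffusion and mean-zero jumps only shift the constant
        c[n] = c[n + 1] + DT * (SIGMA2 + LAM * S2) * pn
    return p, c


def improve(p):
    """Policy improvement: argmin_a [R a^2 + B a dV/dx] with dV/dx = 2 p x gives k = -B p / R."""
    return [-B * pn / R for pn in p]


def policy_iteration(max_iter=100, tol=1e-12):
    k = [0.0] * (N + 1)  # initial policy: zero control
    p_prev = None
    for iteration in range(1, max_iter + 1):
        p, c = evaluate(k)  # value step
        k = improve(p)  # policy step
        if p_prev is not None and max(abs(u - v) for u, v in zip(p, p_prev)) < tol:
            break
        p_prev = p
    return p, c, k, iteration


p, c, k, iterations = policy_iteration()
x0 = 1.0
value_at_x0 = p[0] * x0**2 + c[0]  # V(0, x0) of the converged policy
```

At convergence the recursion for $p$ coincides with the Euler-discretized Riccati equation $-\dot p = q + 2Ap - B^2p^2/r$, so the result can be checked against a direct Riccati solve. The jump intensity and jump variance enter only through $c(t)$, because mean-zero jumps add $\lambda s^2 p(t)$ when the generator acts on $p(t)x^2$.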

Paper Structure

This paper contains 16 sections, 4 theorems, 66 equations, 16 figures, 1 table, 2 algorithms.

Key Result

Theorem 2.1 (Feynman–Kac Formula)

$V^{\alpha}$ satisfies the PIDE
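The statement of the PIDE is cut off on this page. For orientation only, a standard textbook form of the Feynman–Kac PIDE for a fixed Markov policy $\alpha(t,x)$ is written below, under assumed notation that may differ from the paper's: state dynamics $dX_s = b\,ds + \sigma\,dW_s + \int_Z \gamma(s,X_{s-},\alpha_s,z)\,\tilde N(ds,dz)$, with $b$ and $\sigma$ evaluated at $(s,X_s,\alpha(s,X_s))$, a compensated Poisson random measure $\tilde N$ of intensity $\nu(dz)\,ds$, running reward $f$ and terminal reward $g$.

```latex
\begin{aligned}
V^{\alpha}(t,x) &= \mathbb{E}\Big[\int_t^T f\big(s,X_s,\alpha(s,X_s)\big)\,ds + g(X_T) \,\Big|\, X_t = x\Big],\\
0 &= \partial_t V^{\alpha}(t,x) + \mathcal{L}^{\alpha} V^{\alpha}(t,x) + f\big(t,x,\alpha(t,x)\big), \qquad V^{\alpha}(T,x) = g(x),\\
\mathcal{L}^{\alpha}\varphi(t,x) &= b^{\alpha}\cdot\nabla_x\varphi(t,x) + \tfrac{1}{2}\operatorname{tr}\big(\sigma^{\alpha}(\sigma^{\alpha})^{\top}\nabla_x^2\varphi(t,x)\big)\\
&\quad + \int_{Z}\Big[\varphi\big(t,x+\gamma^{\alpha}(z)\big) - \varphi(t,x) - \gamma^{\alpha}(z)\cdot\nabla_x\varphi(t,x)\Big]\,\nu(dz),
\end{aligned}
```

Here $b^{\alpha}$, $\sigma^{\alpha}$ and $\gamma^{\alpha}(z)$ are evaluated at $(t,x,\alpha(t,x))$. The nonlocal integral term is what turns the HJB/Kolmogorov PDE into a PIDE when jumps are present.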

Figures (16)

  • Figure 1: Comparison of ${\rm MAE}_V$ (blue) and runtime in seconds (green) of GPI-PINN (solid line) and GPI-CBU (dashed line) for a 10-dimensional LQR problem without jumps (left) and with jumps (right).
  • Figure 2: Value function $V(t,x)$ (left) and first component of the optimal control $\alpha^*_1(t,x)$ (right) at $t=0$ for $x=(x_1,0, \dots, 0)$ with $x_1 \in [-2.5, 2.5]$ for a 50-dimensional LQR problem with jumps. Orange dotted line: numerical results of GPI-CBU with $\pm 1$ standard deviation given by the orange shaded area. Blue line: analytical solution, Eqs. \ref{analy}--\ref{analy2}.
  • Figure 3: $\log {\rm MAE}_V$ of different deep-learning methods for a 10-dimensional LQR problem with jumps.
  • Figure 4: Losses $\widehat{\mathscr{L}}^{(k)}_1(\theta^{(k+1)})$ (left) and $\widehat{\mathscr{L}}^{(k)}_2(\phi^{(k+1)})$ (right) of GPI-CBU as a function of the epoch $k$. The blue curve in the left plot represents the interior loss of $\widehat{\mathscr{L}}^{(k)}_1$ and the orange curve its boundary part; see Eq. \ref{How3}.
  • Figure 5: DGM architecture for the value neural network with $L=3$ (i.e. 4 hidden layers).
  • ...and 11 more figures
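Figures 1 to 3 measure accuracy by ${\rm MAE}_V$ against an analytical LQR solution. The sketch below illustrates that kind of check on a hypothetical one-dimensional LQR problem with jumps whose value function is known in closed form, $V(t,x)=x^2+(\sigma^2+\lambda s^2)(T-t)$ with optimal control $a^*=-x$. Following the Feynman–Kac representation, $V(0,x)$ is estimated as an expected cost by Euler Monte Carlo simulation, and the absolute error is averaged over a few states. Parameters, helper names and the choice of evaluation points are assumptions; the paper's exact ${\rm MAE}_V$ definition is not reproduced here.

```python
import math
import random

# Hypothetical 1-D LQR with jumps (illustrative parameters, not from the paper):
#   dX = a dt + sigma dW + dJ, J compound Poisson (rate LAM, N(0, S^2) jump sizes)
#   minimize E[ int_0^T (X^2 + a^2) dt + X_T^2 ]
# Closed form: a*(t, x) = -x and V(t, x) = x^2 + (SIGMA^2 + LAM * S^2) (T - t).
T, SIGMA, LAM, S = 1.0, 0.4, 2.0, 0.3


def v_exact(t, x):
    """Analytical value function of the toy problem."""
    return x**2 + (SIGMA**2 + LAM * S**2) * (T - t)


def v_monte_carlo(x0, n_paths=2000, n_steps=50, rng=None):
    """Estimate V(0, x0) under a = -x as an expected cost, using an Euler scheme."""
    rng = rng or random.Random(0)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(n_steps):
            a = -x
            cost += (x**2 + a**2) * dt  # left-point running cost
            x += a * dt + SIGMA * sqrt_dt * rng.gauss(0.0, 1.0)
            # At most one jump per step, with probability LAM * dt (reasonable for small dt)
            if rng.random() < LAM * dt:
                x += rng.gauss(0.0, S)
        total += cost + x**2  # terminal cost g(x) = x^2
    return total / n_paths


def mae_v(xs, estimator):
    """Mean absolute error of an estimate of V(0, .) against the closed form."""
    return sum(abs(estimator(x) - v_exact(0.0, x)) for x in xs) / len(xs)


rng = random.Random(42)
error = mae_v([-1.0, 0.0, 1.0], lambda x: v_monte_carlo(x, rng=rng))
```

Shrinking the time step reduces the Euler bias and more paths reduce the statistical error. The at-most-one-jump-per-step approximation keeps the expected number of jumps per step equal to $\lambda\,\Delta t$, but it is only sensible when $\lambda\,\Delta t$ is small.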

Theorems & Definitions (4)

  • Theorem 2.1: Feynman–Kac Formula
  • Theorem 2.2: Verification Theorem
  • Proposition 3.1
  • Proposition 4.1