Generalization Error Analysis of Deep Backward Dynamic Programming for Solving Nonlinear PDEs

Du Ouyang; Jichang Xiao; Xiaoqun Wang

TL;DR

The paper analyzes how generalization error contributes to the total error in solving high-dimensional nonlinear PDEs via Deep Backward Dynamic Programming (DBDP) and shows that quasi-Monte Carlo (QMC) sampling yields a superior convergence rate for the generalization error compared to Monte Carlo (MC). By formulating PDEs as BSDEs and training neural networks to approximate the solution and its gradient, the authors decompose the total error into scheme, approximation, and generalization components, proving that QMC achieves an $O(m^{-1+\varepsilon})$ rate under suitable conditions. They provide rigorous MC-based bounds and show that, under certain approximation assumptions, MC can break the curse of dimensionality, while QMC further accelerates convergence. Numerical experiments across nonlinear heat, HJB, and nonlinear Black-Scholes equations corroborate that QMC-based training yields smaller errors and lower variance than MC, confirming the practical benefits for high-dimensional nonlinear PDEs.

Abstract

We explore the application of the quasi-Monte Carlo (QMC) method in deep backward dynamic programming (DBDP) (Hure et al. 2020) for numerically solving high-dimensional nonlinear partial differential equations (PDEs). Our study focuses on examining the generalization error as a component of the total error in the DBDP framework, discovering that the rate of convergence for the generalization error is influenced by the choice of sampling methods. Specifically, for a given batch size $m$, the generalization error under QMC methods exhibits a convergence rate of $O(m^{-1+\varepsilon})$, where $\varepsilon$ can be made arbitrarily small. This rate is notably more favorable than that of the traditional Monte Carlo (MC) methods, which is $O(m^{-1/2+\varepsilon})$. Our theoretical analysis shows that the generalization error under QMC methods achieves a higher order of convergence than their MC counterparts. Numerical experiments demonstrate that QMC indeed surpasses MC in delivering solutions that are both more precise and stable.
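The gap between the two rates quoted above can be seen on a much simpler problem than DBDP. The following is a minimal sketch, not the authors' experiment: it estimates a known integral over $[0,1]^5$ with $m$ points drawn either pseudo-randomly (MC) or from a Halton sequence (one common QMC construction, chosen here only because it is easy to write in pure Python; the point set used in the paper may differ). The integrand, the dimension, the seed, and all function names are assumptions made for illustration.

```python
import math
import random

# First five primes: one base per coordinate of the Halton sequence.
PRIMES = [2, 3, 5, 7, 11]


def van_der_corput(n, base):
    """Return the n-th element (n >= 1) of the van der Corput sequence."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def halton(m, dim):
    """First m Halton points in [0, 1)^dim (index 0 is skipped)."""
    return [[van_der_corput(n, PRIMES[j]) for j in range(dim)]
            for n in range(1, m + 1)]


def mc_points(m, dim, rng):
    """m independent uniform points in [0, 1)^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(m)]


def integrand(x):
    """Toy smooth integrand whose exact integral over the unit cube is 1."""
    return math.prod(1.0 + (xi - 0.5) for xi in x)


def mean_of(points):
    """Equal-weight quadrature: the sample mean of the integrand."""
    return sum(integrand(x) for x in points) / len(points)


def compare(m, dim=5, seed=0):
    """Absolute integration errors (MC, QMC) for a batch of m points."""
    rng = random.Random(seed)
    mc_err = abs(mean_of(mc_points(m, dim, rng)) - 1.0)
    qmc_err = abs(mean_of(halton(m, dim)) - 1.0)
    return mc_err, qmc_err


if __name__ == "__main__":
    for m in (256, 1024, 4096):
        mc_err, qmc_err = compare(m)
        print(f"m={m:5d}  MC error={mc_err:.2e}  QMC error={qmc_err:.2e}")
```

A single MC run is noisy, so one seed gives only a rough picture; averaging the MC error over several seeds gives a fairer comparison. The sketch isolates the effect of the sampling scheme on a plain quadrature error and says nothing about the scheme or approximation errors that also enter the total DBDP error.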

Paper Structure (19 sections, 12 theorems, 96 equations, 4 figures, 4 tables)

Introduction
Solving nonlinear PDE by using deep learning
Neural network
DBDP schemes
Total error of DBDP
The analysis of generalization error
Problem specification
Generalization error for Monte Carlo methods
The convergence rate of mean error for MC
Break the curse of dimensionality
Generalization error for quasi-Monte Carlo
Quasi-Monte Carlo methods
The convergence rate of mean error for QMC
Numerical experiments
Nonlinear heat equations
...and 4 more sections

Key Result

Lemma 1

Under Definition 1, suppose that $l_{k+1}=1$. Then for every $\theta_{1},\theta_2 \in \Theta$ and every $x\in \mathbb{R}^{d}$, we have and where $C_{\mathcal{S},R}=|\mathcal{S}|R^{D(\mathcal{S})-1}\left\Vert\mathcal{S}\right\Vert_\infty^{D(\mathcal{S})-1}$ and $B_{\mathcal{S},R}=2R\left\Vert\mathcal{S}\right\Vert_\infty$.

Figures (4)

Figure 1: The density histograms of the pointwise absolute errors for the best and worst trained neural networks in solving the nonlinear heat equation. The batch size is 16,384 and the dimension is 50.
Figure 2: The density histograms of the pointwise absolute errors for the best and worst trained neural networks in solving the HJB equation. The batch size is 16,384 and the dimension is 50.
Figure 3: The density histograms of the pointwise absolute errors for the best and worst trained neural networks in solving the nonlinear Black-Scholes equation with the nonlinear term \ref{eq:f_BS_1}. The batch size is 16,384 and the dimension is 50.
Figure 4: The density histograms of the pointwise absolute errors for the best and worst trained neural networks in solving the nonlinear Black-Scholes equation with the nonlinear term \ref{eq:f_BS_2}. The batch size is 16,384 and the dimension is 50.

Theorems & Definitions (23)

Definition 1: Feedforward Neural Network
Lemma 1
proof
Theorem 1
proof
Corollary 1
Lemma 2
proof
Lemma 3
proof
...and 13 more
