Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method

Yuling Jiao; Yanming Lai; Yang Wang

TL;DR

The paper studies solving second-order elliptic PDEs with the Deep Ritz Method using a three-layer tanh neural network trained by projected gradient descent. It provides a unified, high-probability error analysis that combines approximation, generalization, and optimization errors under an overparameterized regime, and proves global convergence of PGD for the DRM objective. The authors extend Sobolev-space approximation results to $W^{s,p}$, establish Rademacher-based generalization bounds, and derive explicit convergence rates for the Robin problem (with extensions to Neumann and Dirichlet cases). The key result shows a quantitative rate of $\|f_{W_T}-u_R\|_{H^1(\Omega)} \le C \, n^{-1/(288 d^3+4)}$ (up to problem-dependent constants), providing guidance on network depth/width, step size, and iteration count. This work lays a rigorous theoretical foundation for DRM-PGD PDE solvers and motivates exploration of deeper networks and alternative PDE solvers in future research.
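
To make the stated rate $n^{-1/(288 d^3+4)}$ concrete, the following sketch evaluates the exponent for a few dimensions and computes how much the sample size would have to grow to halve the bound. It ignores the constant $C$, and the function names are hypothetical helpers introduced here for illustration only.

```python
from fractions import Fraction


def rate_exponent(d):
    """Exponent 1/(288 d^3 + 4) from the stated H^1 rate n^{-1/(288 d^3 + 4)}."""
    return Fraction(1, 288 * d**3 + 4)


def log2_sample_factor_to_halve(d):
    """log2 of the factor k by which n must grow to halve n^{-exponent}."""
    # (k n)^{-e} = (1/2) n^{-e}  <=>  k = 2^{1/e}  <=>  log2(k) = 1/e.
    return 1 / rate_exponent(d)


for d in (1, 2, 3):
    print(d, rate_exponent(d), log2_sample_factor_to_halve(d))
```

Exact rational arithmetic is used so that the exponents are not blurred by floating-point rounding; the rapid growth of $288 d^3 + 4$ shows how strongly the bound depends on the dimension $d$.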

Abstract

Machine learning is a rapidly advancing field with diverse applications across various domains. One prominent area of research is the use of deep learning techniques for solving partial differential equations (PDEs). In this work, we focus on employing a three-layer tanh neural network within the framework of the deep Ritz method (DRM) to solve second-order elliptic equations with three different types of boundary conditions. We perform projected gradient descent (PGD) to train the three-layer network and establish its global convergence. To the best of our knowledge, we are the first to provide a comprehensive error analysis of using overparameterized networks to solve PDE problems, as our analysis simultaneously includes estimates for the approximation error, generalization error, and optimization error. We present an error bound in terms of the sample size $n$, and our work provides guidance on how to set the network depth, width, step size, and number of iterations for the projected gradient descent algorithm. Importantly, our assumptions in this work are classical and we do not require any additional assumptions on the solution of the equation. This ensures the broad applicability and generality of our results.
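
The training loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 1D model problem $-u'' + u = 1$ on $(0,1)$ with Robin condition $u + \partial u/\partial n = 1$ (exact solution $u \equiv 1$), the width `m = 4`, the projection onto a Euclidean ball around the initialization, and the finite-difference gradients. Only the overall shape (a three-layer tanh network, a Monte Carlo Ritz energy, and PGD run for $T = 1/\eta$ steps) follows the text.

```python
import math
import random


def init_params(m, rng):
    # Flat parameter vector: W1 (m), b1 (m), W2 (m*m), b2 (m), a (m).
    return [rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(4 * m + m * m)]


def net_and_grad(theta, x, m):
    """Return u(x) and du/dx for a three-layer tanh network with scalar input."""
    W1, b1 = theta[0:m], theta[m:2 * m]
    W2 = theta[2 * m:2 * m + m * m]
    b2 = theta[2 * m + m * m:3 * m + m * m]
    a = theta[3 * m + m * m:]
    h1 = [math.tanh(W1[j] * x + b1[j]) for j in range(m)]
    dh1 = [(1.0 - h1[j] ** 2) * W1[j] for j in range(m)]  # chain rule, layer 1
    u = du = 0.0
    for i in range(m):
        z = b2[i] + sum(W2[i * m + j] * h1[j] for j in range(m))
        dz = sum(W2[i * m + j] * dh1[j] for j in range(m))
        h2 = math.tanh(z)
        u += a[i] * h2
        du += a[i] * (1.0 - h2 * h2) * dz  # chain rule, layer 2
    return u, du


def ritz_loss(theta, xs, m):
    # Assumed energy: mean of (u'^2/2 + u^2/2 - u) over interior samples
    # plus (u^2/2 - u) summed over the two boundary points x = 0 and x = 1.
    interior = 0.0
    for x in xs:
        u, du = net_and_grad(theta, x, m)
        interior += 0.5 * du * du + 0.5 * u * u - u
    boundary = 0.0
    for x in (0.0, 1.0):
        u, _ = net_and_grad(theta, x, m)
        boundary += 0.5 * u * u - u
    return interior / len(xs) + boundary


def fd_grad(theta, xs, m, h=1e-5):
    # Central finite differences, used here only to keep the sketch short.
    grad = []
    for k in range(len(theta)):
        tp, tm = list(theta), list(theta)
        tp[k] += h
        tm[k] -= h
        grad.append((ritz_loss(tp, xs, m) - ritz_loss(tm, xs, m)) / (2.0 * h))
    return grad


def project(theta, center, radius):
    # Euclidean projection onto the ball of given radius around `center`.
    diff = [t - c for t, c in zip(theta, center)]
    norm = math.sqrt(sum(v * v for v in diff))
    if norm <= radius:
        return theta
    return [c + v * radius / norm for c, v in zip(center, diff)]


def pgd(theta0, xs, m, eta, radius):
    T = round(1.0 / eta)  # iteration count T = 1/eta
    theta = list(theta0)
    for _ in range(T):
        g = fd_grad(theta, xs, m)
        step = [t - eta * gk for t, gk in zip(theta, g)]
        theta = project(step, theta0, radius)
    return theta


rng = random.Random(0)
m, n, eta, radius = 4, 32, 0.05, 2.0
xs = [rng.random() for _ in range(n)]  # uniform interior samples on (0, 1)
theta0 = init_params(m, rng)
theta_T = pgd(theta0, xs, m, eta, radius)
loss0, lossT = ritz_loss(theta0, xs, m), ritz_loss(theta_T, xs, m)
print(loss0, lossT)
```

For this assumed model problem the energy is bounded below by $-1.5$, attained at $u \equiv 1$, so the printed values give a rough sense of how far the trained network is from the minimizer. A practical implementation would replace the finite-difference gradients with automatic differentiation.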

Paper Structure (19 sections, 41 theorems, 237 equations)

This paper contains 19 sections, 41 theorems, 237 equations.

Introduction
Main Results
Our Contributions
Related Works
Organization of This Paper
Deep Ritz Method
Preliminaries
Sobolev Spaces
Convex Optimization
Function Classes Complexity and Concentration Inequality
Miscellaneous
Error Decomposition
Approximation Error
Generalization Error
Optimization Error
...and 4 more sections

Key Result

Theorem 1

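Let $u_R$ be the weak solution of the Robin problem for the second-order elliptic equation. Let $n$ be the sample size. Assume the paper's overparametrization condition and step-size choice $\eta$ hold (their explicit expressions are not reproduced here), and let the number of iterations be $T=\frac{1}{\eta}$. Let $f_{W_T}$ be the three-layer neural network function trained by PGD after $T$ steps. Then with probability at least $1-\frac{C(d,coe,\Omega)}{n}$, the error satisfies the rate quoted in the TL;DR, $\|f_{W_T}-u_R\|_{H^1(\Omega)} \le C \, n^{-1/(288 d^3+4)}$.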

Theorems & Definitions (86)

Theorem 1: informal version
Lemma 1: jiao2024error, Lemma 3.4
Remark 1
Lemma 2: de2021approximation, Lemma A.7
Lemma 3: guhring2021approximation, Lemma B.5
Lemma 4
proof
Definition 1: trace operator
Lemma 5: trace theorem
proof
...and 76 more
