Convergence Guarantees for Gradient-Based Training of Neural PDE Solvers: From Linear to Nonlinear PDEs

Wei Zhao, Tao Luo

TL;DR

This work develops a unified convergence framework for neural PDE solvers, covering PINNs and the Deep Ritz method, across linear and nonlinear regimes. It combines an NTK-based global convergence theory for broad linear operators with a Łojasiewicz-inequality-based approach to guarantee convergence to critical points for nonlinear PDEs under a random feature model, revealing implicit regularization. The results show that gradient flow and implicit gradient descent converge under coercivity and explain parameter-bounded training trajectories without explicit regularization. Numerical experiments on Burgers', Allen–Cahn, and Fisher–KPP equations validate the theory, highlighting robustness to multiscale dynamics and limitations of NTK in nonlinear settings. The work thus unifies PDE-solver analyses and points to extensions to deeper architectures and SGD regimes as promising directions for future research.

Abstract

We present a unified convergence theory for gradient-based training of neural network methods for partial differential equations (PDEs), covering both physics-informed neural networks (PINNs) and the Deep Ritz method. For linear PDEs, we extend the neural tangent kernel (NTK) framework for PINNs to establish global convergence guarantees for a broad class of linear operators. For nonlinear PDEs, we prove convergence to critical points via the Łojasiewicz inequality under the random feature model, eliminating the need for strong over-parameterization and encompassing both gradient flow and implicit gradient descent dynamics. Our results further reveal that the random feature model exhibits an implicit regularization effect, preventing parameter divergence to infinity. Theoretical findings are corroborated by numerical experiments, providing new insights into the training dynamics and robustness of neural network PDE solvers.
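The implicit gradient descent update named in the abstract, $\theta_{k+1} = \theta_k - \eta \nabla L(\theta_{k+1})$, can be illustrated on a toy problem. The sketch below is not the authors' algorithm or experiment: the logistic ODE $u' = u(1-u)$, $u(0) = 0.5$ (a reaction term of Fisher–KPP type), the feature count, the step size, and the fixed-point inner solver are all assumptions chosen for illustration. It uses a random feature model, so the inner weights `feats` stay fixed and only the outer coefficients `a` are trained on a PINN-style residual loss.

```python
import math
import random

def make_features(m, seed=0):
    """Fixed (untrained) inner weights and biases of the random feature model."""
    rng = random.Random(seed)
    return [(rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)) for _ in range(m)]

def model(a, feats, x):
    """Return u(x) and u'(x) for u(x) = sum_j a_j * tanh(w_j x + b_j)."""
    u = du = 0.0
    for aj, (w, b) in zip(a, feats):
        th = math.tanh(w * x + b)
        u += aj * th
        du += aj * w * (1.0 - th * th)
    return u, du

def loss_and_grad(a, feats, xs, u0=0.5):
    """PINN-style loss for u' = u(1 - u), u(0) = u0, and its gradient in a."""
    n = len(xs)
    ic = model(a, feats, 0.0)[0] - u0          # initial-condition mismatch
    loss = 0.5 * ic * ic
    grad = [ic * math.tanh(b) for (_, b) in feats]
    for x in xs:
        u, du = model(a, feats, x)
        r = du - u * (1.0 - u)                 # ODE residual at x
        loss += 0.5 * r * r / n
        for j, (w, b) in enumerate(feats):
            th = math.tanh(w * x + b)
            # d r / d a_j = phi_j'(x) - (1 - 2u) * phi_j(x)
            dr = w * (1.0 - th * th) - (1.0 - 2.0 * u) * th
            grad[j] += r * dr / n
    return loss, grad

def igd_step(a, feats, xs, eta, inner_iters=10):
    """One implicit step: solve a_new = a - eta * grad L(a_new).

    The inner solve is a plain fixed-point iteration (an assumption of this
    sketch); it contracts only when eta is small relative to the curvature.
    """
    a_new = list(a)
    for _ in range(inner_iters):
        _, g = loss_and_grad(a_new, feats, xs)
        a_new = [ak - eta * gk for ak, gk in zip(a, g)]
    return a_new

def train(m=8, n=21, eta=0.005, steps=100):
    feats = make_features(m)
    xs = [k / (n - 1) for k in range(n)]       # collocation points on [0, 1]
    a = [0.0] * m
    losses = [loss_and_grad(a, feats, xs)[0]]
    for _ in range(steps):
        a = igd_step(a, feats, xs, eta)
        losses.append(loss_and_grad(a, feats, xs)[0])
    return a, losses

if __name__ == "__main__":
    a, losses = train()
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
    print("max |a_j|:   ", max(abs(v) for v in a))
```

Because the residual is quadratic in `a`, the loss is not quadratic and the implicit equation has no closed-form solution, which is why an inner solver is needed. Larger step sizes would call for a more robust inner solver such as Newton's method.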

Paper Structure

This paper contains 51 sections, 20 theorems, 155 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Let $m$ be a positive integer, and let $\alpha, \beta \in \mathbb{R}$ be not both zero. Given real numbers $p_1, \ldots, p_m$ such that $p_i \neq \pm p_j$ for $1 \le i \neq j \le m$, and $q_1, \ldots, q_m \in \mathbb{R}$, the functions $\alpha \tanh(p_1 t + q_1)+\beta \tanh'(p_1 t + q_1), \ldots$
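The lemma statement is cut off here, but its listed title ("Linear independence") suggests that it asserts linear independence of these functions. As a hedged numerical illustration only, not a proof and not taken from the paper, the sketch below samples the functions $\alpha \tanh(p_i t + q_i) + \beta \tanh'(p_i t + q_i)$ on a grid and computes the determinant of their Gram matrix, which vanishes exactly when the sampled functions are linearly dependent. The grid, the interval $[-3, 3]$, and the parameter values are assumptions.

```python
import math

def feature(t, p, q, alpha, beta):
    """alpha * tanh(p t + q) + beta * tanh'(p t + q), with tanh'(x) = 1 - tanh(x)^2."""
    th = math.tanh(p * t + q)
    return alpha * th + beta * (1.0 - th * th)

def gram_determinant(ps, qs, alpha, beta, n=401, lo=-3.0, hi=3.0):
    """Determinant of the Gram matrix of the functions sampled on a uniform grid."""
    ts = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    cols = [[feature(t, p, q, alpha, beta) for t in ts] for p, q in zip(ps, qs)]
    m = len(cols)
    G = [[sum(x * y for x, y in zip(cols[i], cols[j])) / n for j in range(m)]
         for i in range(m)]
    # Gaussian elimination with partial pivoting; det is the signed pivot product.
    det = 1.0
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(G[r][c]))
        if abs(G[piv][c]) < 1e-14:
            return 0.0                          # numerically singular
        if piv != c:
            G[c], G[piv] = G[piv], G[c]
            det = -det
        det *= G[c][c]
        for r in range(c + 1, m):
            f = G[r][c] / G[c][c]
            for k in range(c, m):
                G[r][k] -= f * G[c][k]
    return det

if __name__ == "__main__":
    # Slopes with distinct absolute values: the hypothesis p_i != +/- p_j holds.
    print(gram_determinant([1.0, 2.0, 3.0], [0.0, 0.5, -0.5], alpha=1.0, beta=1.0))
    # p_2 = -p_1 (with q_2 = -q_1 and beta = 0): tanh is odd, so f_2 = -f_1.
    print(gram_determinant([1.0, -1.0], [0.5, -0.5], alpha=1.0, beta=0.0))
```

The second call shows why a condition like $p_i \neq \pm p_j$ is needed: since $\tanh$ is odd and $\tanh'$ is even, flipping the sign of both $p$ and $q$ can produce a function that is a scalar multiple of the original, and the Gram determinant collapses to zero.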

Figures (7)

  • Figure 1: Evolution of relative Frobenius norm for two NTK matrices.
  • Figure 2: Relative Frobenius norm of two NTK matrices for Allen–Cahn equation.
  • Figure 3: Loss curves for IGD and Adam on Allen–Cahn equation.
  • Figure 4: Numerical solutions obtained by IGD at 3 different time points for Allen–Cahn equation.
  • Figure 5: Relative Frobenius norm of two NTK matrices for Fisher–KPP equation.
  • ...and 2 more figures

Theorems & Definitions (47)

  • Lemma 1: Linear independence
  • Remark 1: On the choice of activation functions
  • Definition 1: Admissible linear operators
  • Theorem 1: Convergence for admissible linear PDEs
  • Remark 2
  • Remark 3: Various extensions
  • Definition 2: Coercivity
  • Theorem 2: Łojasiewicz inequality, Theorem 1.1 in haraux2012some
  • Proposition 1: Convergence under coercivity
  • Remark 4
  • ...and 37 more