Convergence Guarantees for Gradient-Based Training of Neural PDE Solvers: From Linear to Nonlinear PDEs
Wei Zhao, Tao Luo
TL;DR
This work develops a unified convergence framework for neural PDE solvers, covering PINNs and the Deep Ritz method in both linear and nonlinear regimes. For linear PDEs, it extends NTK-based global convergence theory to a broad class of linear operators; for nonlinear PDEs, it uses the Łojasiewicz inequality to prove convergence to critical points under a random feature model. The analysis shows that gradient flow and implicit gradient descent converge under coercivity, and that the random feature model acts as an implicit regularizer: parameters stay bounded along the training trajectory without explicit regularization. Numerical experiments on the Burgers', Allen–Cahn, and Fisher–KPP equations support the theory, highlighting robustness to multiscale dynamics and the limitations of NTK analysis in nonlinear settings. Extensions to deeper architectures and SGD are identified as directions for future research.
Abstract
We present a unified convergence theory for gradient-based training of neural network methods for partial differential equations (PDEs), covering both physics-informed neural networks (PINNs) and the Deep Ritz method. For linear PDEs, we extend the neural tangent kernel (NTK) framework for PINNs to establish global convergence guarantees for a broad class of linear operators. For nonlinear PDEs, we prove convergence to critical points via the Łojasiewicz inequality under the random feature model, eliminating the need for strong over-parameterization and encompassing both gradient flow and implicit gradient descent dynamics. Our results further reveal that the random feature model exhibits an implicit regularization effect, preventing parameter divergence to infinity. Theoretical findings are corroborated by numerical experiments, providing new insights into the training dynamics and robustness of neural network PDE solvers.
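To make the objects named in the abstract concrete, the following is a minimal sketch of implicit gradient descent on a PINN-style loss with a random feature model. It is an illustration under stated assumptions, not the authors' implementation: the toy problem (-u'' = π² sin(πx) on (0, 1) with zero boundary values), the tanh features, the feature count, the step size, and all names are hypothetical choices. The toy problem is linear, so each implicit step reduces to a linear solve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random feature model (assumed form): u(x) = sum_k a_k * tanh(w_k * x + b_k).
# The inner weights (w, b) are sampled once and frozen; only a is trained.
m = 50
w = rng.normal(size=m)
b = rng.normal(size=m)

def features(x):
    """Feature matrix with entries tanh(w_k * x_i + b_k)."""
    return np.tanh(np.outer(x, w) + b)

def features_xx(x):
    """Second x-derivative of each feature: -2 w^2 t (1 - t^2), t = tanh(.)."""
    t = np.tanh(np.outer(x, w) + b)
    return -2.0 * w**2 * t * (1.0 - t**2)

# Hypothetical toy problem: -u'' = pi^2 sin(pi x) on (0, 1), u(0) = u(1) = 0.
n = 100
x_in = np.linspace(0.0, 1.0, n + 2)[1:-1]   # interior collocation points
x_bd = np.array([0.0, 1.0])                 # boundary points
f = np.pi**2 * np.sin(np.pi * x_in)

# PINN loss L(a) = 0.5 * ||A a - y||^2 (mean-square residual + boundary terms).
A = np.vstack([-features_xx(x_in) / np.sqrt(n), features(x_bd)])
y = np.concatenate([f / np.sqrt(n), np.zeros(2)])

def loss(a):
    r = A @ a - y
    return 0.5 * float(r @ r)

def implicit_gd(a0, eta, steps):
    """Implicit gradient descent: a_next = a - eta * grad L(a_next).

    For this quadratic loss the implicit equation is the linear system
    (I + eta * A^T A) a_next = a + eta * A^T y.
    """
    M = np.eye(len(a0)) + eta * (A.T @ A)
    a = a0.copy()
    history = [loss(a)]
    for _ in range(steps):
        a = np.linalg.solve(M, a + eta * (A.T @ y))
        history.append(loss(a))
    return a, np.array(history)

a_final, losses = implicit_gd(np.zeros(m), eta=1.0, steps=200)
print(f"initial loss {losses[0]:.3e}, final loss {losses[-1]:.3e}")
print(f"parameter norm after training: {np.linalg.norm(a_final):.3e}")
```

Because each implicit step minimizes L(a) + ||a - a_k||² / (2η), the loss cannot increase for any positive step size, which is why the sketch can use η = 1 without tuning. For a nonlinear PDE the residual is no longer linear in the trained coefficients, so the implicit equation would require an inner nonlinear solve (for example, a few Newton iterations) at every step.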
