Global Convergence of Adjoint-Optimized Neural PDEs
Konstantin Riedl, Justin Sirignano, Konstantinos Spiliopoulos
TL;DR
The paper establishes a rigorous global convergence theory for adjoint-based gradient descent training of neural-network PDE models in the limit of infinite network width and infinite training time, focusing on nonlinear parabolic PDEs with a neural network embedded in the source term. It derives a computationally efficient gradient formula via a forward-adjoint PDE pair, proves well-posedness both at finite width and in the infinite-width limit, and analyzes the limit dynamics through an integro-differential equation driven by a neural tangent kernel (NTK) operator $T_{B_0}$, which is positive definite but lacks a spectral gap. A cycle-of-stopping-times argument, supported by a novel second-level adjoint analysis, yields decay of the loss $\mathcal{J}^*_{\tau}$ and shows that the trained limit solution $u^*_{\tau}$ converges weakly to the ground-truth data $h$, while the adjoint $\widehat{u}^*_{\tau}$ vanishes in the limit. Numerical experiments complement the results, validating the theory and providing insight into the training dynamics of neural-network PDE models in practice.
Abstract
Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks. This requires solving the inverse problem of learning the neural network terms from observed data in order to approximate missing or unresolved physics in the PDE model. The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to the available ground-truth data by optimizing over the PDE using gradient descent, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. These neural PDE models have emerged as an important research area in scientific machine learning. In this paper, we study the convergence of the adjoint gradient descent optimization method for training neural PDE models in the limit where both the number of hidden units and the training time tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer). The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses, for two reasons: (i) in the infinite-width hidden layer limit, the neural network training dynamics involve a non-local neural network kernel operator whose eigenvalues lack a spectral gap, and (ii) the limit PDE system is nonlinear, which leads to an optimization problem that is non-convex in the neural network function even in the infinite-width hidden layer limit (unlike typical neural network training cases, where the optimization problem becomes convex in the large neuron limit). The theoretical results are illustrated and empirically validated by numerical studies.
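To make the adjoint idea concrete, the following is a minimal sketch rather than the scheme used in the paper. It assumes a 1D equation $u_t = \nu u_{xx} + f_\theta(u)$ with homogeneous Dirichlet boundaries, an explicit Euler discretization, a terminal-time squared loss against the data $h$, and a one-hidden-layer tanh network with $1/N$ scaling as the source term. All function names (`laplacian`, `source`, `forward`, `loss`, `adjoint_gradient`) and all of these modeling choices are hypothetical and chosen only for illustration. The point it shows is that one forward solve plus one backward (adjoint) solve gives the gradient with respect to all network parameters at once.

```python
import numpy as np

def laplacian(u, dx):
    # Second-order finite difference; zero padding encodes Dirichlet boundaries.
    up = np.pad(u, 1)
    return (up[2:] - 2.0 * up[1:-1] + up[:-2]) / dx**2

def source(u, c, w, b):
    # Hypothetical network applied pointwise: (1/N) * sum_i c_i * tanh(w_i * u + b_i).
    return np.tanh(np.outer(u, w) + b) @ c / c.size

def forward(params, u0, nu, dx, dt, steps):
    # Explicit Euler time stepping; the full trajectory is kept for the adjoint pass.
    c, w, b = params
    traj = [u0]
    for _ in range(steps):
        u = traj[-1]
        traj.append(u + dt * (nu * laplacian(u, dx) + source(u, c, w, b)))
    return traj

def loss(params, u0, h, nu, dx, dt, steps):
    # Assumed objective: squared mismatch to the data h at the final time.
    u_final = forward(params, u0, nu, dx, dt, steps)[-1]
    return 0.5 * dx * np.sum((u_final - h) ** 2)

def adjoint_gradient(params, u0, h, nu, dx, dt, steps):
    # Discrete adjoint of the Euler scheme: one backward sweep yields dJ/d(c, w, b).
    c, w, b = params
    n_hidden = c.size
    traj = forward(params, u0, nu, dx, dt, steps)
    lam = dx * (traj[-1] - h)                # terminal condition of the adjoint
    gc, gw, gb = np.zeros_like(c), np.zeros_like(w), np.zeros_like(b)
    for k in range(steps - 1, -1, -1):
        u = traj[k]
        act = np.tanh(np.outer(u, w) + b)    # shape (grid points, hidden units)
        dact = 1.0 - act**2                  # derivative of tanh
        # Accumulate parameter sensitivities weighted by the adjoint at step k+1.
        gc += dt * (act.T @ lam) / n_hidden
        gw += dt * c * ((dact * u[:, None]).T @ lam) / n_hidden
        gb += dt * c * (dact.T @ lam) / n_hidden
        # Backward step with the transposed linearized operator; the Dirichlet
        # Laplacian is symmetric and the source Jacobian is diagonal.
        df_du = dact @ (c * w) / n_hidden
        lam = lam + dt * (nu * laplacian(lam, dx) + df_du * lam)
    return gc, gw, gb
```

A gradient descent step under these assumptions would subtract a learning rate times each of `gc`, `gw`, `gb` from the corresponding parameter array. Because the explicit scheme is only conditionally stable, `dt` should satisfy roughly `dt <= dx**2 / (2 * nu)`. Comparing `adjoint_gradient` against central finite differences of `loss` is a simple way to check the backward sweep.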
