The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks

Andrea Bonfanti, Giuseppe Bruno, Cristina Cipriani

TL;DR

This work establishes that the NTK yields a random matrix at initialization that is not constant during training, contrary to conventional belief, and explores the convergence guarantees of second-order optimization methods in both linear and nonlinear cases.

Abstract

The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, unlike the case of linear Partial Differential Equations (PDEs), we show that the NTK perspective falls short in the nonlinear scenario. Specifically, we establish that the NTK yields a random matrix at initialization that is not constant during training, contrary to conventional belief. Another significant difference from the linear regime is that, even in the idealized infinite-width limit, the Hessian does not vanish and hence cannot be disregarded during training. This motivates the adoption of second-order optimization methods. We explore the convergence guarantees of such methods in both linear and nonlinear cases, addressing challenges such as spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we highlight the benefits of second-order methods in benchmark test cases.
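
To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' code, of how an empirical NTK $K = JJ^T$ and its relative change between two parameter states can be computed. Everything specific here is an assumption made for illustration: the one-hidden-layer tanh network with $1/\sqrt{m}$ scaling, the toy residuals $u_x$ (linear) and $u\,u_x$ (nonlinear, Burgers-like), the finite-difference Jacobian, the Frobenius norm, and the random perturbation used as a stand-in for a training step.

```python
import math
import random

def u_and_ux(theta, x, m):
    """Toy network u(x) = m^(-1/2) * sum_k a_k tanh(w_k x + b_k) and its exact x-derivative."""
    w, b, a = theta[:m], theta[m:2 * m], theta[2 * m:]
    t = [math.tanh(w[k] * x + b[k]) for k in range(m)]
    u = sum(a[k] * t[k] for k in range(m)) / math.sqrt(m)
    ux = sum(a[k] * w[k] * (1.0 - t[k] ** 2) for k in range(m)) / math.sqrt(m)
    return u, ux

def residual(theta, x, m, nonlinear):
    """Assumed toy PDE residual: u_x (linear) or u * u_x (nonlinear)."""
    u, ux = u_and_ux(theta, x, m)
    return u * ux if nonlinear else ux

def jacobian(theta, xs, m, nonlinear, eps=1e-6):
    """Central finite-difference Jacobian J[i][p] = d residual(x_i) / d theta_p."""
    J = []
    for x in xs:
        row = []
        for p in range(len(theta)):
            up = list(theta)
            dn = list(theta)
            up[p] += eps
            dn[p] -= eps
            diff = residual(up, x, m, nonlinear) - residual(dn, x, m, nonlinear)
            row.append(diff / (2.0 * eps))
        J.append(row)
    return J

def ntk(theta, xs, m, nonlinear):
    """Empirical kernel K = J J^T on the collocation points xs."""
    J = jacobian(theta, xs, m, nonlinear)
    n = len(J)
    return [[sum(J[i][p] * J[j][p] for p in range(len(theta))) for j in range(n)]
            for i in range(n)]

def relative_change(K0, K1):
    """||K1 - K0|| / ||K0|| in the Frobenius norm (an assumption of this sketch)."""
    num = math.sqrt(sum((a - b) ** 2 for r0, r1 in zip(K0, K1) for a, b in zip(r0, r1)))
    den = math.sqrt(sum(a ** 2 for r0 in K0 for a in r0))
    return num / den

random.seed(0)
m = 16                                   # hidden width (illustrative)
xs = [0.1 * i for i in range(1, 6)]      # collocation points (illustrative)
theta0 = [random.gauss(0.0, 1.0) for _ in range(3 * m)]
# Stand-in for "parameters after some training": a small random perturbation.
theta1 = [p + 0.05 * random.gauss(0.0, 1.0) for p in theta0]

for nonlinear in (False, True):
    K0 = ntk(theta0, xs, m, nonlinear)
    K1 = ntk(theta1, xs, m, nonlinear)
    print("nonlinear" if nonlinear else "linear", relative_change(K0, K1))
```

Repeating this over increasing widths and several seeds would mimic the kind of study described in the caption of Figure 1, although this toy setup is far too small to reproduce the reported behavior.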

Paper Structure

This paper contains 30 sections, 12 theorems, 73 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Lemma 3.2

Given the data (eq:collocation) and the gradient flow (eq:gradient_flow), $u_\theta$ and $r_\theta$ satisfy the following, where $K(t) = J(t)J(t)^T$ and

Figures (8)

  • Figure 1: (a) Mean and standard deviation of the spectral norm of $K(0)$ as a function of the number of neurons $m$ for $10$ independent experiments. Left: linear case. Right: nonlinear case. (b) Mean and standard deviation of $\Delta K(t) := \frac{\|K(t)-K(0)\|}{\|K(0)\|}$ over the network's width $m$, for $10$ independent experiments. Left: linear case. Right: nonlinear case.
  • Figure 2: (a) Left: non-zero components of the Hessian matrix at initialization, shown in yellow (top: linear case; bottom: nonlinear case). Center: mean and standard deviation of the spectral norm of $H_r(0)$ over $m$ in the linear case (for $10$ independent experiments). Right: same as Center, but for a nonlinear PDE. (b) Eigenvalues of $K(0)$ for a first-order optimizer and $D(0)$ for a second-order method applied to Burgers' equation.
  • Figure 3: (a) Poisson equation: median and standard deviation of the relative $L^2$ loss for different optimizers over training iterations (10 independent runs). (b) Convection equation: median and standard deviation of the $L^2$ loss after $1000$ iterations over 5 independent runs, with and without CT, for different values of the convection coefficient $\beta$ (left), and solution obtained with LM (and no other enhancement) after 5000 iterations with $\beta = 100$ (right).
  • Figure 4: (a) Burgers' equation: mean and standard deviation of the relative $L^2$ loss for various optimizers over wall time (repetitions over 10 independent runs). (b) Navier-Stokes equation: mean and standard deviation of the relative $L^2$ loss over the PDE time $\tau$ for PINNs trained with Adam and LM ($10$ independent runs). Both optimization methods are enhanced with causality training.
  • Figure 5: Mean and standard deviation of the relative $L^2$ loss on the test set for the Wave equation with the Adam, L-BFGS and LM optimizers over iterations (repetitions over $10$ independent runs).
  • ...and 3 more figures
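
Several captions above refer to LM, read here as the Levenberg-Marquardt method. As a rough illustration of how such a damped second-order step works, the sketch below fits a two-parameter model $a e^{bx}$ to synthetic data by solving $(J^T J + \lambda I)\,\delta = -J^T r$ at each iteration. The model, the data, the damping schedule (divide or multiply $\lambda$ by 10) and all function names are assumptions chosen for brevity; this is not a PINN and not the paper's algorithm.

```python
import math

def residuals(params, xs, ys):
    """Residual vector r_i = a * exp(b * x_i) - y_i for the toy model."""
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(params, xs):
    """Rows [dr_i/da, dr_i/db] of the residual Jacobian."""
    a, b = params
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]

def loss(params, xs, ys):
    """Sum of squared residuals."""
    return sum(r * r for r in residuals(params, xs, ys))

def lm_step(params, xs, ys, lam):
    """One damped Gauss-Newton step: solve (J^T J + lam * I) delta = -J^T r (2x2 closed form)."""
    r = residuals(params, xs, ys)
    J = jacobian(params, xs)
    g0 = sum(J[i][0] * r[i] for i in range(len(r)))   # (J^T r)_0
    g1 = sum(J[i][1] * r[i] for i in range(len(r)))   # (J^T r)_1
    h00 = sum(row[0] * row[0] for row in J) + lam
    h01 = sum(row[0] * row[1] for row in J)
    h11 = sum(row[1] * row[1] for row in J) + lam
    det = h00 * h11 - h01 * h01
    d0 = (-g0 * h11 + g1 * h01) / det
    d1 = (-g1 * h00 + g0 * h01) / det
    return [params[0] + d0, params[1] + d1]

def fit(params, xs, ys, lam=1e-2, iters=100):
    """Accept a step only if the loss decreases; adapt the damping accordingly."""
    current = loss(params, xs, ys)
    for _ in range(iters):
        trial = lm_step(params, xs, ys, lam)
        trial_loss = loss(trial, xs, ys)
        if trial_loss < current:
            params, current, lam = trial, trial_loss, lam / 10.0  # trust the model more
        else:
            lam *= 10.0                                           # fall back toward gradient descent
    return params

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # synthetic data generated with a = 2, b = 0.5
start = [1.0, 0.0]
fitted = fit(start, xs, ys)
print(fitted, loss(fitted, xs, ys))
```

For large $\lambda$ the step approaches a short gradient-descent step, while for small $\lambda$ it approaches a Gauss-Newton step; in a PINN setting the Jacobian would be that of the PDE and boundary residuals with respect to the network parameters.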

Theorems & Definitions (33)

  • Remark 2.1
  • Remark 3.1
  • Lemma 3.2
  • proof
  • Theorem 3.4
  • proof
  • Proposition 3.5
  • proof
  • Remark 3.6
  • Proposition 3.7
  • ...and 23 more