When and why PINNs fail to train: A neural tangent kernel perspective

Sifan Wang, Xinling Yu, Paris Perdikaris

TL;DR

This work analyzes physics-informed neural networks (PINNs) through the Neural Tangent Kernel (NTK) lens. It derives the PINN NTK, proves that in the infinite-width limit it converges to a deterministic kernel and remains nearly constant during training, and uses this to explain why PINNs exhibit spectral bias and imbalanced convergence across loss terms. A practical adaptive training algorithm is proposed to balance the NTK-driven convergence rates by updating loss weights based on NTK eigenvalues, improving trainability and accuracy in several PDE scenarios. Numerical experiments on 1D Poisson and wave equations validate the theory, demonstrate NTK stability with width, and show substantial improvements when balancing loss components. The results provide a principled pathway to kernel-based analysis and strategy design for robust, provable PINN training.

Abstract

Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations. However, despite their noticeable empirical success, little is known about how such constrained neural networks behave during their training via gradient descent. More importantly, even less is known about why such models sometimes fail to train at all. In this work, we aim to investigate these questions through the lens of the Neural Tangent Kernel (NTK), a kernel that captures the behavior of fully-connected neural networks in the infinite-width limit during training via gradient descent. Specifically, we derive the NTK of PINNs and prove that, under appropriate conditions, it converges to a deterministic kernel that stays constant during training in the infinite-width limit. This allows us to analyze the training dynamics of PINNs through the lens of their limiting NTK and find a remarkable discrepancy in the convergence rate of the different loss components contributing to the total training error. To address this fundamental pathology, we propose a novel gradient descent algorithm that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error. Finally, we perform a series of numerical experiments to verify the correctness of our theory and the practical effectiveness of the proposed algorithms. The data and code accompanying this manuscript are publicly available at https://github.com/PredictiveIntelligenceLab/PINNsNTK.
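To make the idea of calibrating loss weights from NTK eigenvalues concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-hidden-layer tanh network $u(x) = \frac{1}{\sqrt{N}} \sum_k w_{2,k} \tanh(w_{1,k} x + b_{1,k}) + b_2$, the 1D Poisson operator $\mathcal{L}u = u_{xx}$ on $[0, 1]$, and a trace-ratio rule $\lambda_b = \mathrm{Tr}(\bm{K}) / \mathrm{Tr}(\bm{K}_{uu})$, $\lambda_r = \mathrm{Tr}(\bm{K}) / \mathrm{Tr}(\bm{K}_{rr})$. The rule, the helper names `jacobian_rows` and `trace_weights`, and the point counts are all illustrative assumptions. Because the trace of a symmetric matrix equals the sum of its eigenvalues, only the diagonal of each kernel block is needed.

```python
import math
import random

def jacobian_rows(x, w1, b1, w2):
    """Gradients of u(x) and u_xx(x) w.r.t. all parameters, for the assumed
    network u(x) = (1/sqrt(N)) * sum_k w2[k]*tanh(w1[k]*x + b1[k]) + b2."""
    n = len(w1)
    c = 1.0 / math.sqrt(n)
    du, dr = [], []
    for k in range(n):
        s = math.tanh(w1[k] * x + b1[k])
        s1 = 1.0 - s * s                     # tanh'
        s2 = -2.0 * s * s1                   # tanh''
        s3 = -2.0 * s1 * s1 - 2.0 * s * s2   # tanh'''
        # d u / d(w1_k, b1_k, w2_k)
        du += [c * w2[k] * s1 * x, c * w2[k] * s1, c * s]
        # d u_xx / d(w1_k, b1_k, w2_k), with u_xx = c * sum_k w2 * w1^2 * tanh''
        dr += [c * w2[k] * (2.0 * w1[k] * s2 + w1[k] ** 2 * s3 * x),
               c * w2[k] * w1[k] ** 2 * s3,
               c * w1[k] ** 2 * s2]
    du.append(1.0)  # d u / d b2
    dr.append(0.0)  # u_xx does not depend on b2
    return du, dr

def trace_weights(x_b, x_r, w1, b1, w2):
    """Trace-ratio weights (an assumed rule): the diagonal entries of K_uu and
    K_rr are squared norms of the per-point parameter gradients."""
    tr_uu = sum(sum(g * g for g in jacobian_rows(x, w1, b1, w2)[0]) for x in x_b)
    tr_rr = sum(sum(g * g for g in jacobian_rows(x, w1, b1, w2)[1]) for x in x_r)
    tr_k = tr_uu + tr_rr  # trace of the full block kernel K
    return tr_k / tr_uu, tr_k / tr_rr, tr_uu, tr_rr

random.seed(0)
N = 100                                        # hidden width (illustrative)
w1 = [random.gauss(0.0, 1.0) for _ in range(N)]
b1 = [random.gauss(0.0, 1.0) for _ in range(N)]
w2 = [random.gauss(0.0, 1.0) for _ in range(N)]
x_b = [0.0, 1.0]                               # boundary points
x_r = [i / 99.0 for i in range(100)]           # residual (collocation) points

lam_b, lam_r, tr_uu, tr_rr = trace_weights(x_b, x_r, w1, b1, w2)
print("Tr(K_uu) =", tr_uu, " Tr(K_rr) =", tr_rr)
print("lambda_b =", lam_b, " lambda_r =", lam_r)
```

In a training loop, weights of this kind would multiply the boundary and residual loss terms and could be recomputed periodically as the parameters change. The block whose kernel has the smaller trace receives the larger weight, which is the balancing effect the abstract describes.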

Paper Structure

This paper contains 16 sections, 8 theorems, 124 equations, 8 figures, 1 algorithm.

Key Result

Lemma 3.1

Given the data points $\{\bm{x}_b^i, g(\bm{x}_b^i)\}_{i=1}^{N_b}$, $\{\bm{x}_r^i, f(\bm{x}_r^i)\}_{i=1}^{N_r}$ and the gradient flow equation, $u(t)$ and $\mathcal{L}u(t)$ obey the following evolution, where $\bm{K}_{ru}(t) = \bm{K}_{ur}^T(t)$ and $\bm{K}_{uu}(t) \in \mathbb{R}^{N_b \times N_b}$, $\bm{K}_{ur}(t) \in \mathbb{R}^{N_b \times N_r}$, and $\bm{K}_{rr}(t) \in \mathbb{R}^{N_r \times N_r}$ ...
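As a hedged sketch of the evolution this lemma refers to, assume the squared-error loss $\frac{1}{2}\sum_{i=1}^{N_b} |u(\bm{x}_b^i, \bm{\theta}) - g(\bm{x}_b^i)|^2 + \frac{1}{2}\sum_{i=1}^{N_r} |\mathcal{L}u(\bm{x}_r^i, \bm{\theta}) - f(\bm{x}_r^i)|^2$ and the gradient flow $\frac{d\bm{\theta}}{dt} = -\nabla_{\bm{\theta}} \mathcal{L}(\bm{\theta})$. The chain rule then gives a block system consistent with the dimensions stated above. The entry-wise kernel definitions and the shorthand $\bm{r}(t)$ are introduced here for illustration:

$$
\begin{bmatrix} \dfrac{d u(\bm{x}_b, \bm{\theta}(t))}{dt} \\[2mm] \dfrac{d \mathcal{L}u(\bm{x}_r, \bm{\theta}(t))}{dt} \end{bmatrix}
= - \begin{bmatrix} \bm{K}_{uu}(t) & \bm{K}_{ur}(t) \\ \bm{K}_{ru}(t) & \bm{K}_{rr}(t) \end{bmatrix}
\begin{bmatrix} u(\bm{x}_b, \bm{\theta}(t)) - g(\bm{x}_b) \\ \mathcal{L}u(\bm{x}_r, \bm{\theta}(t)) - f(\bm{x}_r) \end{bmatrix},
$$

$$
(\bm{K}_{uu})_{ij}(t) = \left\langle \frac{d u(\bm{x}_b^i, \bm{\theta}(t))}{d\bm{\theta}}, \frac{d u(\bm{x}_b^j, \bm{\theta}(t))}{d\bm{\theta}} \right\rangle, \quad
(\bm{K}_{ur})_{ij}(t) = \left\langle \frac{d u(\bm{x}_b^i, \bm{\theta}(t))}{d\bm{\theta}}, \frac{d \mathcal{L}u(\bm{x}_r^j, \bm{\theta}(t))}{d\bm{\theta}} \right\rangle, \quad
(\bm{K}_{rr})_{ij}(t) = \left\langle \frac{d \mathcal{L}u(\bm{x}_r^i, \bm{\theta}(t))}{d\bm{\theta}}, \frac{d \mathcal{L}u(\bm{x}_r^j, \bm{\theta}(t))}{d\bm{\theta}} \right\rangle.
$$

If the full kernel stays close to a constant symmetric positive semi-definite matrix $\bm{K}^* = \bm{Q} \bm{\Lambda} \bm{Q}^T$, as the infinite-width result in the abstract suggests, the stacked residual $\bm{r}(t)$ (the right-most vector above) solves a linear system:

$$
\frac{d \bm{r}(t)}{dt} \approx -\bm{K}^* \bm{r}(t)
\quad \Longrightarrow \quad
\bm{r}(t) \approx \bm{Q}\, e^{-\bm{\Lambda} t}\, \bm{Q}^T \bm{r}(0).
$$

Each component of the residual along the $i$-th eigenvector then decays at rate $\lambda_i$, so directions with small eigenvalues converge slowly. This is one way to read both the spectral bias and the imbalance between loss terms described in the TL;DR.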

Figures (8)

  • Figure 1: Model problem (1D Poisson equation): The eigenvalues of $\bm{K}, \bm{K}_{uu}$ and $\bm{K}_{rr}$ at initialization in descending order for different fabricated solutions $u(x) = \sin(a \pi x)$ where $a =1,2,4$.
  • Figure 2: Model problem (1D Poisson equation): (a), (b) The relative change of parameters $\bm{\theta}$ and the NTK of PINNs $\bm{K}$ obtained by training a fully-connected neural network with one hidden layer and different widths ($10, 100, 500$) via $10,000$ iterations of full-batch gradient descent with a learning rate of $10^{-5}$. (c) The eigenvalues of the NTK $\bm{K}$ at initialization and at the last step of training ($n = 10,000$).
  • Figure 3: Model problem (1D Poisson equation): (a), (b) The relative change of parameters $\bm{\theta}$ and the NTK of PINNs $\bm{K}$ obtained by training a fully-connected neural network with three hidden layers and different widths ($10, 100, 500$) via $10,000$ iterations of full-batch gradient descent with a learning rate of $10^{-5}$. (c) The eigenvalues of the NTK $\bm{K}$ at initialization and at the last step of training ($n = 10,000$).
  • Figure 4: Model problem (1D Poisson equation): (a) The predicted solution against the exact solution obtained by training a fully-connected neural network of one hidden layer with width $=100$ via $40,000$ iterations of full-batch gradient descent with a learning rate of $10^{-5}$. The relative $L^2$ error is $2.40 \times 10^{-1}$. (b) The predicted solution against the exact solution obtained by training the same neural network using fixed weights $\lambda_b= 100, \lambda_r =1$ via $40,000$ iterations of full-batch gradient descent with a learning rate of $10^{-5}$. The relative $L^2$ error is $1.63 \times 10^{-3}$.
  • Figure 5: Model problem (1D Poisson equation): The relative $L^2$ error of predicted solutions averaged over 10 independent trials by training a fully-connected neural network of one hidden layer with width $=100$ using different fixed weights $\lambda_b \in [1, 500]$ for $40,000$ gradient descent iterations.
  • ...and 3 more figures

Theorems & Definitions (25)

  • Lemma 3.1
  • proof
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Theorem 4.1
  • proof
  • Remark 4.2
  • Theorem 4.3
  • proof
  • ...and 15 more