Architectural Strategies for the optimization of Physics-Informed Neural Networks

Hemanth Saratchandran, Shin-Fang Chng, Simon Lucey

TL;DR

This work analyzes how neural architectures influence Physics-Informed Neural Network (PINN) optimization through the Neural Tangent Kernel (NTK). It establishes that Gaussian activations yield favorable boundary NTK spectra, with a lower bound on the minimum eigenvalue that scales quartically with layer width, suggesting superior training dynamics for PINNs. The authors further propose equilibrated PINNs, using row-equilibrated weights to condition the loss landscape and improve convergence, supported by theoretical intuition and empirical gains. Across Burgers’ equation, Navier–Stokes, and high-frequency diffusion, Gaussian-based and equilibrated PINNs consistently outperform baselines, offering a practical architectural route to mitigate spectral-bias-related training instability in PINNs.

Abstract

Physics-informed neural networks (PINNs) offer a promising avenue for tackling both forward and inverse problems in partial differential equations (PDEs) by incorporating deep learning with fundamental physics principles. Despite their remarkable empirical success, PINNs have garnered a reputation for their notorious training challenges across a spectrum of PDEs. In this work, we delve into the intricacies of PINN optimization from a neural architecture perspective. Leveraging the Neural Tangent Kernel (NTK), our study reveals that Gaussian activations surpass several alternate activations when it comes to effectively training PINNs. Building on insights from numerical linear algebra, we introduce a preconditioned neural architecture, showcasing how such tailored architectures enhance the optimization process. Our theoretical findings are substantiated through rigorous validation against established PDEs within the scientific literature.

Paper Structure (38 sections, 22 theorems, 119 equations, 15 figures, 4 tables)

Key Result

Theorem 3.1

Let $u$ denote a depth $L$ neural network with $\phi(x) = e^{-x^2/s^2}$ as the activation, where $s^2 > 0$ is a fixed variance hyperparameter. Assume the first $L-1$ widths $\{n_1,\ldots,n_{L-1}\}$ are all the same $\overline{N}$. Assume that $n_k \geq N \geq \overline{N}$ for $1 \leq k \leq \ldots$
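To make the activation in Theorem 3.1 concrete, here is a minimal NumPy sketch of a fully connected network using $\phi(x) = e^{-x^2/s^2}$. The helper names (`gaussian_activation`, `init_mlp`, `forward`), the layer widths, and the $1/\sqrt{\text{fan-in}}$ initialization are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def gaussian_activation(x, s=1.0):
    """phi(x) = exp(-x^2 / s^2), with s the fixed scale hyperparameter."""
    return np.exp(-(x ** 2) / s ** 2)

def init_mlp(widths, rng):
    """Random (W, b) pairs for consecutive widths [n_0, n_1, ..., n_L].

    The 1/sqrt(fan_in) scaling is an assumed initialization.
    """
    return [
        (rng.standard_normal((n_out, n_in)) / np.sqrt(n_in), np.zeros(n_out))
        for n_in, n_out in zip(widths[:-1], widths[1:])
    ]

def forward(params, x, s=1.0):
    """Apply the Gaussian activation after every layer except the last."""
    h = x
    for W, b in params[:-1]:
        h = gaussian_activation(h @ W.T + b, s)
    W, b = params[-1]
    return h @ W.T + b

rng = np.random.default_rng(0)
params = init_mlp([2, 64, 64, 1], rng)   # e.g. (x, t) inputs, scalar output
x = rng.uniform(-1.0, 1.0, size=(5, 2))  # five sample collocation points
u = forward(params, x)
print(u.shape)
```

Smaller values of `s` make the activation more sharply peaked around zero; how `s` should be chosen for a given PDE is not specified in this summary.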

Figures (15)

  • Figure 1: Right: the minimum eigenvalue of the empirical NTK for a 2-hidden-layer network. We took $N = 400$, $n_1 = 8N$, and $n_2 = 400$. As predicted by Thm. 3.1, $\lambda_{\min}(K_{uu})$ for a Gaussian-activated network grows much faster than for a Tanh one. Left: zoomed-in view of the Tanh network.
  • Figure 2: Top: Training/testing results for Burgers' equation. Bottom: Reconstruction of the network solution plotted against the exact solution at $t = 0.5$.
  • Figure 3: Top left: $L^2$ training error. All other panels show each network's reconstruction of the pressure field at $t = 1$. The Gaussian-activated PINN performs best.
  • Figure 4: Loss landscape along the two eigenvectors of largest curvature. The numbers at the top of each loss plot are the two largest eigenvalues. Right: condition number of each network at that point. EG-PINN has a much lower condition number than the other two Gaussian-activated networks.
  • Figure 5: An example of a $\chi$ (black curve).
  • ...and 10 more figures
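The quantity plotted in Figure 1, $\lambda_{\min}(K_{uu})$, can be illustrated with a small sketch. It assumes $K_{uu}$ is the Gram matrix of parameter gradients of the network output at a set of points, and for brevity uses a one-hidden-layer network $u(x) = \frac{1}{\sqrt{m}} \sum_k a_k \phi(w_k x + b_k)$ with an analytic Jacobian, not the 2-hidden-layer setting of the figure. The width, sample points, and function names are hypothetical.

```python
import numpy as np

def gaussian(z, s=1.0):
    return np.exp(-z ** 2 / s ** 2)

def gaussian_grad(z, s=1.0):
    return -2.0 * z / s ** 2 * np.exp(-z ** 2 / s ** 2)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def empirical_ntk(x, w, b, a, act, act_grad):
    """K[i, j] = <du(x_i)/dtheta, du(x_j)/dtheta> with theta = (a, w, b)."""
    m = w.shape[0]
    z = np.outer(x, w) + b          # pre-activations, shape (N, m)
    d_a = act(z)                    # du/da_k (before the 1/sqrt(m) factor)
    d_b = a * act_grad(z)           # du/db_k
    d_w = d_b * x[:, None]          # du/dw_k
    J = np.concatenate([d_a, d_w, d_b], axis=1) / np.sqrt(m)  # (N, 3m)
    return J @ J.T

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 8)       # assumed evaluation points
m = 256                             # assumed hidden width
w, b, a = (rng.standard_normal(m) for _ in range(3))

K_gauss = empirical_ntk(x, w, b, a, gaussian, gaussian_grad)
K_tanh = empirical_ntk(x, w, b, a, np.tanh, tanh_grad)

# eigvalsh returns eigenvalues in ascending order, so index 0 is the minimum.
lam_gauss = np.linalg.eigvalsh(K_gauss)[0]
lam_tanh = np.linalg.eigvalsh(K_tanh)[0]
print("lambda_min (Gaussian):", lam_gauss)
print("lambda_min (Tanh):    ", lam_tanh)
```

Repeating the computation over a range of widths `m` gives a rough, small-scale analogue of the comparison in Figure 1; this toy setting is not guaranteed to reproduce the paper's curves.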

Theorems & Definitions (38)

  • Theorem 3.1
  • Theorem 4.1 (van der Sluis, 1969)
  • Proposition 4.2
  • Theorem A.1
  • Lemma A.2
  • Theorem A.3
  • proof: Proof of Theorem \ref{thm;main_ntk_1}
  • Lemma A.4
  • proof
  • Lemma A.5
  • ...and 28 more
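
The TL;DR describes equilibrated PINNs as using row-equilibrated weights to improve conditioning, and Theorem 4.1 cites a classical result on equilibration. As a hedged illustration, the sketch below rescales each row of a matrix to unit Euclidean norm and compares condition numbers before and after. The choice of the 2-norm, the function name `row_equilibrate`, and the test matrix are assumptions; the paper may equilibrate differently or at a different point in training.

```python
import numpy as np

def row_equilibrate(W, eps=1e-12):
    """Rescale every row of W to unit Euclidean norm (assumed norm choice)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.maximum(norms, eps)

# Build a deliberately badly scaled matrix: an orthogonal matrix whose rows
# are multiplied by widely different factors.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
scales = np.array([1.0, 10.0, 100.0, 1000.0])
W = scales[:, None] * Q

W_eq = row_equilibrate(W)

cond_before = np.linalg.cond(W)     # ratio of largest to smallest row scale
cond_after = np.linalg.cond(W_eq)   # rows are orthonormal again
print(f"condition number before: {cond_before:.1f}")
print(f"condition number after:  {cond_after:.1f}")
```

The example is constructed so that poor row scaling is the only source of ill-conditioning, which is why equilibration removes it entirely here. For a general weight matrix the improvement is typically smaller.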