On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks

Sifan Wang, Hanwen Wang, Paris Perdikaris

TL;DR

The paper analyzes why physics-informed neural networks misrepresent high-frequency or multi-scale PDE solutions by viewing training through the Neural Tangent Kernel (NTK) lens and identifying an eigenvector (spectral) bias. It introduces Fourier-feature embeddings to reshape the NTK spectrum and proposes two multi-scale PINN architectures (MFF and ST-MFF) that embed inputs with fixed Fourier bases to enable efficient learning across scales, including spatio-temporal domains. The authors demonstrate substantial improvements on forward and inverse multi-scale PDE benchmarks (Poisson, heat, wave, Gray-Scott) and provide a public codebase for reproducibility. This work offers a principled route to more robust PINNs for complex multi-scale systems and outlines future directions in NTK-based theory and initialization strategies.

Abstract

Physics-informed neural networks (PINNs) are demonstrating remarkable promise in integrating physical models with gappy and noisy observational data, but they still struggle in cases where the target functions to be approximated exhibit high-frequency or multi-scale features. In this work we investigate this limitation through the lens of Neural Tangent Kernel (NTK) theory and elucidate how PINNs are biased towards learning functions along the dominant eigen-directions of their limiting NTK. Using this observation, we construct novel architectures that employ spatio-temporal and multi-scale random Fourier features, and justify how such coordinate embedding layers can lead to robust and accurate PINN models. Numerical examples are presented for several challenging cases where conventional PINN models fail, including wave propagation and reaction-diffusion dynamics, illustrating how the proposed methods can be used to effectively tackle both forward and inverse problems involving partial differential equations with multi-scale behavior. All code and data accompanying this manuscript will be made publicly available at https://github.com/PredictiveIntelligenceLab/MultiscalePINNs.
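The random Fourier feature embedding mentioned in the abstract can be sketched in a few lines. The sketch below assumes the common form $\gamma(\bm{x}) = [\cos(\bm{B}\bm{x}), \sin(\bm{B}\bm{x})]$, with the entries of $\bm{B}$ drawn once from $\mathcal{N}(0, \sigma^2)$ and then held fixed. The function names, the seed handling, and the values in `sigmas` are illustrative assumptions and are not taken from the authors' code.

```python
import math
import random


def sample_frequencies(m, d, sigma, seed=0):
    """Draw an m x d frequency matrix B with i.i.d. N(0, sigma^2) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(m)]


def fourier_features(x, B):
    """Map a d-dimensional point x to the 2m features [cos(Bx), sin(Bx)]."""
    proj = [sum(b_j * x_j for b_j, x_j in zip(row, x)) for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


# Multi-scale variant (assumed setup): one fixed basis per scale sigma.
sigmas = [1.0, 10.0]  # hypothetical scales, chosen only for illustration
m, d = 8, 1
bases = [sample_frequencies(m, d, s, seed=i) for i, s in enumerate(sigmas)]

x = [0.3]
embeddings = [fourier_features(x, B) for B in bases]
for s, e in zip(sigmas, embeddings):
    print(f"sigma={s}: {len(e)} features")
```

A larger `sigma` spreads the sampled frequencies more widely; this is the parameter whose effect on the NTK eigenvectors is examined in Figures 2 to 4.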

Paper Structure

This paper contains 17 sections, 3 theorems, 72 equations, 18 figures, 2 tables.

Key Result

Lemma 3.1

For the kernel $\bm{K}(\bm{x}, \bm{x}') = \frac{1}{m} \sum_{k=1}^m \cos(\bm{b}_k^T (\bm{x} - \bm{x}'))$, the eigenfunction $g(\bm{x})$ corresponding to non-zero eigenvalues satisfying the following equation
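The kernel in Lemma 3.1 can be read as an inner product of Fourier features, via the identity $\cos(a - b) = \cos a \cos b + \sin a \sin b$. The sketch below checks this numerically, assuming the normalized embedding $\gamma(\bm{x}) = \frac{1}{\sqrt{m}} [\cos(\bm{b}_k^T \bm{x}), \sin(\bm{b}_k^T \bm{x})]_{k=1}^m$. The helper names and the sampled frequencies $\bm{b}_k$ are illustrative assumptions only.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def kernel(x, y, B):
    """K(x, y) = (1/m) * sum_k cos(b_k^T (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(math.cos(dot(b, diff)) for b in B) / len(B)


def features(x, B):
    """Normalized Fourier features (1/sqrt(m)) [cos(b_k^T x), sin(b_k^T x)]."""
    scale = 1.0 / math.sqrt(len(B))
    proj = [dot(b, x) for b in B]
    return [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]


rng = random.Random(0)
# Hypothetical frequencies b_k: m = 16 vectors in 2 dimensions.
B = [[rng.gauss(0.0, 10.0) for _ in range(2)] for _ in range(16)]
x, y = [0.2, 0.7], [0.9, 0.1]

k_direct = kernel(x, y, B)
k_features = dot(features(x, B), features(y, B))
print(k_direct, k_features)  # the two values agree up to rounding
```

Because the kernel depends on its arguments only through $\bm{x} - \bm{x}'$, it is shift-invariant, and $\bm{K}(\bm{x}, \bm{x}) = 1$ for every $\bm{x}$.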

Figures (18)

  • Figure 1: 1D Poisson equation: Results obtained by training a conventional physics-informed neural network (5-layer, 200 hidden units, $\tanh$ activations) via $10^7$ iterations of gradient descent. Left: Comparison of the predicted and exact solutions. Middle: Point-wise error between the predicted and the exact solution. Right: Evolution of the residual loss $\mathcal{L}_r$, the boundary loss $\mathcal{L}_b$, as well as the relative $L^2$ error during training.
  • Figure 2: NTK eigen-decomposition of a fully-connected neural network (4-layer, 100 hidden units, $\tanh$ activations) with Fourier features initialized by $\sigma = 1$ on 100 equally spaced points in $[0,1]$: (a): The NTK eigenvalues in descending order. (b): The six leading eigenvectors of the NTK in descending order of corresponding eigenvalues.
  • Figure 3: NTK eigen-decomposition of a fully-connected neural network (4-layer, 100 hidden units, $\tanh$ activations) with Fourier features initialized by $\sigma = 10$ on 100 equally spaced points in $[0,1]$: (a): The NTK eigenvalues in descending order. (b): The six leading eigenvectors of the NTK in descending order of corresponding eigenvalues.
  • Figure 4: Frequency domain analysis of the first leading NTK eigenvector for a fully-connected neural network (4-layer, 100 hidden units, $\tanh$ activations) with Fourier features initialized by different $\sigma \in [1, 50]$, evaluated on 100 equally spaced points in $[0,1]$.
  • Figure 5: Training a network with Fourier features initialized by $\sigma = 10$ to fit the target function $f(x) = \sin(20\pi x) + \sin(2 \pi x)$ for $10{,}000$ epochs: (a): Network prediction (dashed red) against the ground truth (light blue). The network prediction exhibits high frequencies when fitting the data points during training. (b) middle: Relative change of the parameters $\bm{\theta}$ ($\frac{\|\bm{\theta}(t) - \bm{\theta}(0)\|_2}{\|\bm{\theta}(0)\|_2}$) of the network during training. (b) right: Relative $L^2$ training error and test error during training.
  • ...and 13 more figures
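
The two quantities tracked in Figure 5, the relative $L^2$ error and the relative parameter change, are straightforward to compute. The sketch below assumes predictions and flattened parameters stored as plain Python lists. The helper names, the 100-point grid, and the toy "prediction" (the low-frequency component alone) are illustrative assumptions, not results from the paper.

```python
import math


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def relative_l2_error(pred, exact):
    """||pred - exact||_2 / ||exact||_2."""
    return l2_norm([p - e for p, e in zip(pred, exact)]) / l2_norm(exact)


def relative_parameter_change(theta_t, theta_0):
    """||theta(t) - theta(0)||_2 / ||theta(0)||_2."""
    return l2_norm([a - b for a, b in zip(theta_t, theta_0)]) / l2_norm(theta_0)


def target(x):
    """Target function from the Figure 5 caption."""
    return math.sin(20 * math.pi * x) + math.sin(2 * math.pi * x)


xs = [i / 99 for i in range(100)]  # assumed grid: 100 equally spaced points in [0, 1]
exact = [target(x) for x in xs]
# Toy stand-in for a prediction that captures only the low-frequency component.
low_freq_only = [math.sin(2 * math.pi * x) for x in xs]

print(relative_l2_error(low_freq_only, exact))
```

A prediction that misses the $\sin(20\pi x)$ component keeps a large relative error no matter how well it fits the low-frequency part, which is the failure mode the paper attributes to spectral bias.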

Theorems & Definitions (8)

  • Lemma 3.1
  • proof
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • proof
  • proof
  • proof