Scaling Laws and Pathologies of Single-Layer PINNs: Network Width and PDE Nonlinearity

Faris Chaudhry

Abstract

We establish empirical scaling laws for Single-Layer Physics-Informed Neural Networks (PINNs) on canonical nonlinear PDEs. We identify a dual optimization failure: (i) a baseline pathology, where the solution error fails to decrease with network width, even at fixed nonlinearity, falling short of theoretical approximation bounds, and (ii) a compounding pathology, where this failure is exacerbated by nonlinearity. We provide quantitative evidence that a simple separable power law is insufficient, and that the scaling behavior is governed by a more complex, non-separable relationship. This failure is consistent with spectral bias, where networks struggle to learn the high-frequency solution components that intensify with nonlinearity. We show that optimization, not approximation capacity, is the primary bottleneck, and propose a methodology to empirically measure these complex scaling effects.
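
To make the distinction between separable and non-separable scaling concrete, one possible formalization is sketched below. The functional forms and the symbols $E$, $C$, and $\beta$ are illustrative assumptions, not taken from the paper. Only the width $N$, the hardness $\kappa$, and the width exponent $\alpha$ appear in the figure captions.

$$
\text{separable:}\quad E(N,\kappa) \approx C\, N^{-\alpha}\, \kappa^{\beta}
\quad\Longrightarrow\quad
\log E \approx \log C - \alpha \log N + \beta \log \kappa
$$

$$
\text{non-separable:}\quad E(N,\kappa) \approx C(\kappa)\, N^{-\alpha(\kappa)}
$$

Under the first form, the slope of $\log E$ against $\log N$ would be the same at every $\kappa$. An exponent $\alpha$ that changes with $\kappa$ would therefore be evidence against separability.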

Paper Structure (12 sections, 2 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Error vs. Network Width ($N$) for the Poisson PDE. Tanh networks find low-error solutions but exhibit high variance and no clear scaling ($\alpha \approx 0.06 \pm 0.4$). ReLU networks fail to learn ($\alpha \approx 0.01 \pm 0.01$). The gray and red lines give the theoretical error decay rates of $\mathcal{O}(N^{-1/2})$ and $\mathcal{O}(N^{-1})$ respectively. The confidence intervals of the tanh error often overlap the theoretical estimates, but no consistent trend is observed.
  • Figure 2: Scaling law analysis for the Sine-Gordon equation. (a) Width scaling exponent $\alpha$ vs. hardness $\kappa$. Often $\alpha < 0$, implying that increasing network width increases error. (b) Final error vs. hardness $\kappa$ for different network widths $N$. The final error degrades significantly beyond an inflection point in hardness, suggesting a regime shift. (c) A representative example of error vs. width $N$ for the median hardness value $\kappa=2.0$, showing that increased width fails to reduce error. Error bars denote standard error over 5 seeds.
  • Figure 3: Scaling law analysis for the Allen-Cahn equation. (a) Width scaling exponent $\alpha$ vs. hardness $\kappa = 1/D$. For both activations, $\alpha$ is consistently negative, indicating wider networks perform worse. (b) Final error vs. hardness $\kappa$. Tanh networks (dashed lines) achieve low error that is remarkably stable against increasing hardness, while ReLU networks (dotted lines) perform orders of magnitude worse. (c) A representative example of error vs. width $N$ for the median hardness value $\kappa=32.0$, clearly showing the negative scaling trend ($\alpha < 0$). Error bars denote standard error over 5 seeds.
  • Figure 4: Scaling law analysis for the Korteweg-de Vries (KdV) equation. (a) Width scaling exponent $\alpha$ vs. hardness $\kappa = A$. For both ReLU and tanh, $\alpha$ is consistently near or below zero, indicating no performance gain from increasing network width. (b) Final error vs. hardness $\kappa$. Error increases with hardness for both activations, though tanh generally performs better. (c) A representative example of error vs. width $N$ for $\kappa=2.0$, visualizing the flat ($\alpha \approx 0$) or negative scaling. Error bars denote standard error over 5 seeds.
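
The width exponent $\alpha$ reported in the captions above can be estimated with a log-log least-squares fit. The following is a minimal sketch, assuming the convention error $\approx C N^{-\alpha}$ (so that $\alpha < 0$ means wider networks do worse, matching the captions). The function name `fit_width_exponent` and the synthetic data are hypothetical and are not the author's code or results.

```python
import numpy as np


def fit_width_exponent(widths, errors):
    """Estimate (alpha, C) in error ~ C * N**(-alpha) by a linear fit in log-log space.

    Assumed convention: alpha is the NEGATIVE slope of log(error) vs. log(N),
    so alpha > 0 means error decays with width and alpha < 0 means it grows.
    """
    log_n = np.log(np.asarray(widths, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # Degree-1 polynomial fit returns [slope, intercept].
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return float(-slope), float(np.exp(intercept))


# Synthetic illustration only: errors that follow the O(N^{-1/2}) reference rate exactly.
widths = np.array([16.0, 32.0, 64.0, 128.0, 256.0])
errors = 2.0 * widths ** -0.5

alpha, prefactor = fit_width_exponent(widths, errors)
print(f"alpha = {alpha:.3f}, C = {prefactor:.3f}")
```

Repeating this fit at each hardness $\kappa$ would yield a curve of $\alpha$ against $\kappa$ like those in panel (a) of Figures 2 to 4. In practice the fit would be run on errors aggregated over seeds, and the uncertainty on $\alpha$ would need a separate estimate.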