Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations
Nima Hosseini Dashtbayaz, Ghazal Farhani, Boyu Wang, Charles X. Ling
TL;DR
This work analyzes the residual loss of Physics-Informed Neural Networks (PINNs) to determine when it can be globally minimized. It shows that for a two-layer PINN of width $n_1 \ge N$, where $N$ is the number of collocation points, the residual loss attains a global minimum at its critical points, provided the $k$-th derivative of the activation is bijective and certain rank conditions hold. The authors argue that activation functions with well-behaved high-order derivatives, particularly sinusoidal activations and Softplus, are advantageous for solving $k$-th order PDEs, and they give practical guidelines for activation design in PINNs. Experiments on the transport, wave, Helmholtz, and Klein-Gordon equations corroborate the theory: both network width and activation choice significantly affect residual minimization and accuracy. The result is a theoretical framework, with empirical validation, for choosing activation functions and network width for PINNs on linear PDEs, along with insights for nonlinear cases.
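The bijectivity condition on the $k$-th derivative of the activation can be probed numerically. The sketch below is illustrative only and is not taken from the paper or its code: the helper `looks_injective`, the table `ACTIVATION_DERIVATIVES`, and the choice of intervals are assumptions made for this example. It tests strict monotonicity on a sampled interval, which for a continuous function is equivalent to injectivity on that interval, but a finite sample is a sanity check rather than a proof.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tanh_d1(z):
    return 1.0 - math.tanh(z) ** 2


# Closed-form derivatives keyed by (activation name, derivative order k).
# Assumption for this sketch: only k = 1 and k = 2 are tabulated.
ACTIVATION_DERIVATIVES = {
    ("tanh", 1): tanh_d1,
    ("tanh", 2): lambda z: -2.0 * math.tanh(z) * tanh_d1(z),
    ("softplus", 1): sigmoid,
    ("softplus", 2): lambda z: sigmoid(z) * (1.0 - sigmoid(z)),
    ("sin", 1): math.cos,
    ("sin", 2): lambda z: -math.sin(z),
}


def looks_injective(name, k, lo, hi, samples=2001):
    """Return True if the k-th derivative is strictly monotone on the sample grid.

    This is a numerical sanity check on [lo, hi], not a proof of bijectivity.
    """
    f = ACTIVATION_DERIVATIVES[(name, k)]
    zs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    vals = [f(z) for z in zs]
    steps = [b - a for a, b in zip(vals, vals[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


if __name__ == "__main__":
    for name, k in sorted(ACTIVATION_DERIVATIVES):
        print(name, k, looks_injective(name, k, -1.5, 1.5))
```

The interval matters: a derivative that is even about zero, such as the first derivative of tanh, cannot be injective on any interval symmetric about zero, whereas the first derivative of Softplus (the sigmoid) is strictly increasing everywhere.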
Abstract
The residual loss in Physics-Informed Neural Networks (PINNs) alters the simple recursive relation of layers in a feed-forward neural network by applying a differential operator, resulting in a loss landscape that is inherently different from those of common supervised problems. Therefore, relying on the existing theory leads to unjustified design choices and suboptimal performance. In this work, we analyze the residual loss by studying its characteristics at critical points to find the conditions that result in effective training of PINNs. Specifically, we first show that under certain conditions, the residual loss of PINNs can be globally minimized by a wide neural network. Furthermore, our analysis reveals that an activation function with well-behaved high-order derivatives plays a crucial role in minimizing the residual loss. In particular, to solve a $k$-th order PDE, the $k$-th derivative of the activation function should be bijective. The established theory paves the way for designing and choosing effective activation functions for PINNs and explains why periodic activations have shown promising performance in certain cases. Finally, we verify our findings by conducting a set of experiments on several PDEs. Our code is publicly available at https://github.com/nimahsn/pinns_tf2.
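To make concrete how the differential operator brings the activation's derivative into the loss, here is a minimal NumPy sketch of the residual loss for a two-layer network on the transport equation $u_t + c\,u_x = 0$. It is not the authors' TensorFlow implementation: the function names, the random initialization, the sine activation, and the use of hand-written chain-rule derivatives in place of automatic differentiation are all assumptions made for illustration. The width is set to $n_1 = N$, the smallest value satisfying $n_1 \ge N$.

```python
import numpy as np


def init_params(n1, rng):
    """Random parameters of a two-layer network with inputs (x, t). Hypothetical init."""
    W1 = rng.normal(size=(n1, 2))        # hidden weights, one row per neuron
    b1 = rng.normal(size=n1)             # hidden biases
    w2 = rng.normal(size=n1) / np.sqrt(n1)  # output weights
    b2 = 0.0                             # output bias
    return W1, b1, w2, b2


def forward(params, X):
    """u(x, t) = w2 . sin(W1 [x, t] + b1) + b2 for each row of X, shape (N, 2)."""
    W1, b1, w2, b2 = params
    return np.sin(X @ W1.T + b1) @ w2 + b2


def pde_residual(params, X, c=1.0):
    """Residual u_t + c * u_x at each collocation point, via the chain rule."""
    W1, b1, w2, _ = params
    # The first derivative of the activation (cos for sin) appears because
    # the PDE is first order; a k-th order PDE would involve the k-th derivative.
    dsigma = np.cos(X @ W1.T + b1)       # shape (N, n1)
    u_x = (dsigma * W1[:, 0]) @ w2       # du/dx, shape (N,)
    u_t = (dsigma * W1[:, 1]) @ w2       # du/dt, shape (N,)
    return u_t + c * u_x


def residual_loss(params, X, c=1.0):
    """Mean squared PDE residual over the collocation points."""
    return float(np.mean(pde_residual(params, X, c) ** 2))


rng = np.random.default_rng(0)
N = 16                                   # number of collocation points
X = rng.uniform(0.0, 1.0, size=(N, 2))   # columns are (x, t)
params = init_params(n1=N, rng=rng)      # width n1 = N
print(residual_loss(params, X))
```

As a sanity check on the sketch, choosing hidden weights with `W1[:, 1] = -c * W1[:, 0]` makes the network a function of $x - ct$ only, so the residual vanishes at every collocation point.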
