Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations

Nima Hosseini Dashtbayaz, Ghazal Farhani, Boyu Wang, Charles X. Ling

TL;DR

This work analyzes the residual loss in Physics-Informed Neural Networks (PINNs) to understand when the residual can be globally minimized. It shows that a two-layer PINN with width $n_1 \ge N$ (where $N$ is the number of collocation points) can attain a global minimum at critical points, provided the $k$-th derivative of the activation is bijective and certain rank conditions hold. The authors argue that activation functions with well-behaved high-order derivatives, particularly sinusoidal activations and Softplus, are advantageous for solving $k$-th order PDEs, and they provide practical guidelines for activation design in PINNs. Comprehensive experiments on transport, wave, Helmholtz, and Klein-Gordon equations corroborate the theory, showing that width and activation choice significantly affect residual minimization and accuracy. Overall, the paper furnishes a theoretical framework and empirical validation for activation function design and network width that enhance PINN performance on linear PDEs and offer insights for nonlinear cases.
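The width and rank conditions can be illustrated in a deliberately simplified setting. Take a scalar-input two-layer network with sine activation and a frozen first layer, so that the residual of the second-order equation $u'' = f$ is linear in the output weights. This is a hedged sketch of the intuition, not the paper's proof or code. The function name `fit_output_layer`, the source term, the frequency range and the collocation grid are all hypothetical choices. Suppose the $N \times n_1$ matrix $\Phi$ of $k$-th derivative features has rank $N$, which requires $n_1 \ge N$. Then any critical point of the squared residual in the output weights has zero residual, because the gradient $\Phi^\top r$ can only vanish when $r = 0$.

```python
import numpy as np

def fit_output_layer(width, N=12, k=2, seed=0):
    """Hypothetical setup: fit only the output weights a of
    u_hat(x) = sum_j a_j * sin(w_j x + b_j) to the residual of u'' = f."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, N)                  # collocation points
    f = -np.pi**2 * np.sin(np.pi * x)              # u = sin(pi x) solves u'' = f
    w = rng.uniform(-40.0, 40.0, width)            # first layer, kept fixed (assumption)
    b = rng.uniform(-np.pi, np.pi, width)
    # Chain rule: d^k/dx^k sin(w x + b) = w**k * sin(w x + b + k*pi/2),
    # so the residual D[u_hat] - f = Phi @ a - f is linear in a.
    Phi = w**k * np.sin(np.outer(x, w) + b + k * np.pi / 2)
    # Least squares returns a critical point of 0.5 * ||Phi a - f||^2 in a.
    a, *_ = np.linalg.lstsq(Phi, f, rcond=None)
    r = Phi @ a - f
    grad = Phi.T @ r                               # gradient w.r.t. a, ~0 at the critical point
    rank = np.linalg.matrix_rank(Phi)
    rel_res = np.linalg.norm(r) / np.linalg.norm(f)
    return rank, rel_res, np.abs(grad).max()

for width in (6, 48):                              # narrower vs. wider than N = 12
    rank, rel_res, max_grad = fit_output_layer(width)
    print(f"width={width:2d} rank={rank:2d} "
          f"relative residual={rel_res:.2e} max|grad|={max_grad:.1e}")
```

With these assumed settings one would expect the width-48 network to give a rank-12 feature matrix and a residual at round-off level. The width-6 network cannot exceed rank 6, so it stops at a critical point whose residual is clearly nonzero.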

Abstract

The residual loss in Physics-Informed Neural Networks (PINNs) alters the simple recursive relation of layers in a feed-forward neural network by applying a differential operator, resulting in a loss landscape that is inherently different from those of common supervised problems. Therefore, relying on the existing theory leads to unjustified design choices and suboptimal performance. In this work, we analyze the residual loss by studying its characteristics at critical points to find the conditions that result in effective training of PINNs. Specifically, we first show that under certain conditions, the residual loss of PINNs can be globally minimized by a wide neural network. Furthermore, our analysis also reveals that an activation function with well-behaved high-order derivatives plays a crucial role in minimizing the residual loss. In particular, to solve a $k$-th order PDE, the $k$-th derivative of the activation function should be bijective. The established theory paves the way for designing and choosing effective activation functions for PINNs and explains why periodic activations have shown promising performance in certain cases. Finally, we verify our findings by conducting a set of experiments on several PDEs. Our code is publicly available at https://github.com/nimahsn/pinns_tf2.
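To see why the order of the derivative matters for the activation, the same kind of hypothetical frozen-first-layer setup can be repeated for several activations with $k = 2$. This sketch only shows the most extreme failure. ReLU's second derivative is zero wherever it is defined, so its feature matrix is identically zero and no choice of output weights can reduce the residual. The sketch does not test bijectivity itself. The residuals it prints for tanh and Softplus depend on the assumed frequency range and grid, and no claim is made about them here.

```python
import numpy as np

# Illustrative sketch, not the paper's code: how well can the k-th derivative
# features of a two-layer network match a source term, for k = 2?
rng = np.random.default_rng(1)
N, width, k = 12, 48, 2
x = np.linspace(-1.0, 1.0, N)
f = -np.pi**2 * np.sin(np.pi * x)        # u'' = f with u = sin(pi x)
w = rng.uniform(-40.0, 40.0, width)      # hypothetical frozen first layer
b = rng.uniform(-np.pi, np.pi, width)
z = np.outer(x, w) + b                   # pre-activations, shape (N, width)

def sigmoid(t):
    return 0.5 * (1.0 + np.tanh(0.5 * t))

# Closed-form second derivatives of each activation, applied elementwise.
second_derivs = {
    "relu": lambda t: np.zeros_like(t),  # 0 wherever it is defined (t != 0)
    "tanh": lambda t: -2.0 * np.tanh(t) * (1.0 - np.tanh(t) ** 2),
    "softplus": lambda t: sigmoid(t) * (1.0 - sigmoid(t)),
    "sine": lambda t: -np.sin(t),
}

ranks, rel_res = {}, {}
for name, d2 in second_derivs.items():
    Phi = w**k * d2(z)                   # D[u_hat](x_i) = (Phi @ a)_i
    a, *_ = np.linalg.lstsq(Phi, f, rcond=None)
    ranks[name] = np.linalg.matrix_rank(Phi)
    rel_res[name] = np.linalg.norm(Phi @ a - f) / np.linalg.norm(f)
    print(f"{name:8s} rank={ranks[name]:2d}  relative residual={rel_res[name]:.2e}")
```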

Paper Structure (20 sections, 8 theorems, 33 equations, 9 figures, 4 tables)

Key Result

Lemma 1

For a two-layer neural network $\hat{u}$ as defined in the paper, and a $k$-th order differential operator $\mathcal{D}\left[u\right] = \frac{\partial^k u}{\partial x^k}$ of a single independent variable $x$, $\mathcal{D}[\hat{u}]$ is …
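Assume a scalar input and the hypothetical parameterization $\hat{u}(x) = \sum_{j=1}^{n_1} a_j\,\sigma(w_j x + b_j) + c$; the paper's own equation may use matrix notation. Under that assumption, repeated use of the chain rule gives the form such an expression takes. This is a sketch derived here, not a quotation of Lemma 1.

```latex
\frac{\partial^k \hat{u}}{\partial x^k}(x)
  = \sum_{j=1}^{n_1} a_j \, w_j^{k} \, \sigma^{(k)}\!\left(w_j x + b_j\right),
  \qquad k \ge 1,

\mathcal{D}[\hat{u}](x_i) = \sum_{j=1}^{n_1} \Phi_{ij} \, a_j,
  \qquad \Phi_{ij} = w_j^{k} \, \sigma^{(k)}\!\left(w_j x_i + b_j\right),
  \qquad \Phi \in \mathbb{R}^{N \times n_1}.
```

The output bias $c$ drops out for $k \ge 1$. Evaluated at the $N$ collocation points, $\mathcal{D}[\hat{u}]$ is linear in the output weights, and $\operatorname{rank}(\Phi) \le \min(N, n_1)$. A rank-$N$ feature matrix is therefore only possible when $n_1 \ge N$. If $\sigma^{(k)}$ vanishes identically, as ReLU's second derivative does away from zero, $\Phi = 0$.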

Figures (9)

  • Figure 1: Derivatives of most of the common activation functions are not bijective. Here, only Softplus has a bijective first derivative.
  • Figure 2: Distribution of the linear outputs of the layers in Sine networks at initialization.
  • Figure 3: Distribution of the linear outputs of the PINNs' layers. Top row: 1024 neurons wide; bottom row: 256 neurons wide.
  • Figure 4: Transport equation. Top panels: exact solution; middle panels: predicted solution; bottom panels: absolute error.
  • Figure 5: Average residual loss curve for the Transport PINNs with the Softplus activation function and trained with 256 collocation samples.
  • ...and 4 more figures
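The claim in Figure 1 can be checked numerically for a few activations using their closed-form first derivatives: ReLU (a step function), tanh ($1 - \tanh^2$), sine ($\cos$) and Softplus (the logistic sigmoid). Which activations the figure actually plots is not listed here, so this selection is an assumption. Strict monotonicity on a finite grid is only a numerical hint of injectivity, not a proof. The logistic sigmoid is strictly increasing on all of $\mathbb{R}$, so it is a bijection onto its range $(0, 1)$.

```python
import numpy as np

def sigmoid(t):
    return 0.5 * (1.0 + np.tanh(0.5 * t))

# Closed-form first derivatives of a few common activations.
first_derivs = {
    "relu": lambda t: (t > 0).astype(float),
    "tanh": lambda t: 1.0 - np.tanh(t) ** 2,
    "sine": np.cos,
    "softplus": sigmoid,               # d/dt log(1 + e^t) = sigmoid(t)
}

t = np.linspace(-6.0, 6.0, 2001)
injective_on_grid = {}
for name, d1 in first_derivs.items():
    dv = np.diff(d1(t))
    # A strictly monotone continuous function is injective; here we only
    # check the sign of successive differences on the grid.
    injective_on_grid[name] = bool(np.all(dv > 0) or np.all(dv < 0))
    print(f"{name:8s} strictly monotone on grid: {injective_on_grid[name]}")
```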

Theorems & Definitions (15)

  • Lemma 1
  • Lemma 2
  • Remark 1
  • Theorem 1
  • Definition 1: Non-degenerate Critical Point (Nguyen and Hein, 2017)
  • Theorem 2
  • Remark 2
  • Lemma 3: Generalization of Lemma 1
  • Proof
  • Lemma 4: Generalization of the two-layer lemma on gradients with respect to $W_L$
  • ...and 5 more