
What Can One Expect When Solving PDEs Using Shallow Neural Networks?

Roy Y. He, Ying Liang, Hongkai Zhao, Yimin Zhong

TL;DR

The paper analyzes what to expect when solving elliptic PDEs with shallow two-layer networks using the PINN and DRM formulations. It develops a spectral framework that compares the inherent ill-conditioning and frequency bias of the neural representation (especially with ReLU^p activations) against the bias induced by the differential operator. It shows that the network-induced biases dominate the high-frequency components and slow down learning of those frequencies. By deriving the Gram and KKT spectra and studying boundary enforcement strategies (constraints vs. regularization), it demonstrates that scaling and non-homogeneous activations can partly alleviate the ill-conditioning and bias, but cannot provide full adaptivity for nonlinear two-layer networks without effective preconditioning. The work also contrasts linear, random-feature-like representations with fully trained networks. While scaling can improve performance, the computational cost and the lack of robust preconditioning still make traditional FEMs with preconditioners more practical in many cases. The paper highlights significant open questions about deeper networks and preconditioning strategies as future directions.
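The Gram-spectrum claim can be checked numerically. The NumPy sketch below is an illustration, not the paper's code. It assembles $G_{ij}=\int_0^1\sigma(x-b_i)\,\sigma(x-b_j)\,dx$ for $\sigma=\text{ReLU}^p$, with all inner weights equal to one and uniform biases, approximates the integral by a midpoint rule, and fits the log-log slope of the sorted eigenvalues. The basis size, quadrature resolution, and fitting window (`n_basis`, `n_quad`, `k_lo`, `k_hi`) are arbitrary assumptions. The fitted slopes are only meant to be compared with the decay rate $k^{-(2p+2)}$ quoted in the Figure 2 caption, and finite-size effects make them somewhat shallower.

```python
import numpy as np

def relu_power_features(x, b, p):
    """ReLU^p features sigma(x - b_i), all inner weights w_i fixed to 1."""
    return np.maximum(x[:, None] - b[None, :], 0.0) ** p

def gram_spectrum(n_basis=100, p=1, n_quad=4000):
    """Descending eigenvalues of G_ij = int_0^1 sigma(x - b_i) sigma(x - b_j) dx."""
    b = np.arange(n_basis) / n_basis               # uniform biases in [0, 1)
    xq = (np.arange(n_quad) + 0.5) / n_quad        # midpoint quadrature nodes on [0, 1]
    Phi = relu_power_features(xq, b, p)
    G = Phi.T @ Phi / n_quad                       # quadrature approximation of the Gram matrix
    return np.linalg.eigvalsh(G)[::-1]             # eigvalsh sorts ascending; flip it

def loglog_slope(eigs, k_lo=10, k_hi=40):
    """Least-squares slope of log(lambda_k) against log(k) for k_lo <= k <= k_hi."""
    k = np.arange(k_lo, k_hi + 1)
    return np.polyfit(np.log(k), np.log(eigs[k - 1]), 1)[0]

for p in (1, 2):
    slope = loglog_slope(gram_spectrum(p=p))
    print(f"p={p}: fitted slope {slope:.2f}, predicted exponent {-(2 * p + 2)}")
```

The leading eigenvalues are skipped in the fit because the first one is dominated by the all-positive features and does not follow the power law.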

Abstract

We use elliptic partial differential equations (PDEs) as examples to show various properties and behaviors when shallow neural networks (SNNs) are used to represent the solutions. In particular, we study the numerical ill-conditioning, frequency bias, and the balance between the differential operator and the shallow network representation for different formulations of the PDEs and with various activation functions. Our study shows that the performance of Physics-Informed Neural Networks (PINNs) or Deep Ritz Method (DRM) using linear SNNs with power ReLU activation is dominated by their inherent ill-conditioning and spectral bias against high frequencies. Although this can be alleviated by using non-homogeneous activation functions with proper scaling, achieving such adaptivity for nonlinear SNNs remains costly due to ill-conditioning.
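The frequency bias described in the abstract can be illustrated under simplifying assumptions. The sketch trains only the outer weights of a ReLU shallow network with fixed unit inner weights and uniform biases, which is a linear, random-feature-like model, using plain gradient descent on a least-squares fit. This hypothetical setup is not the paper's PINN or DRM experiment. It only shows that target components aligned with small Gram eigenvalues, which here are the high-frequency ones, are reduced much more slowly. The step size $1/\lambda_{\max}$ and the step count are assumptions.

```python
import numpy as np

def relu_features(x, b):
    """ReLU features max(x - b_i, 0) with all inner weights fixed to 1."""
    return np.maximum(x[:, None] - b[None, :], 0.0)

def gd_relative_residual(freq, n_basis=100, n_quad=2000, steps=5000):
    """Fit y = sin(freq * pi * x) on [0, 1] by gradient descent on the outer weights."""
    b = np.arange(n_basis) / n_basis
    xq = (np.arange(n_quad) + 0.5) / n_quad
    Phi = relu_features(xq, b)
    y = np.sin(freq * np.pi * xq)
    G = Phi.T @ Phi / n_quad                # Gram matrix = Hessian of the loss
    rhs = Phi.T @ y / n_quad
    eta = 1.0 / np.linalg.eigvalsh(G)[-1]   # step 1/lambda_max keeps the iteration stable
    a = np.zeros(n_basis)
    for _ in range(steps):
        a -= eta * (G @ a - rhs)            # gradient of 0.5 * mean((Phi a - y)^2)
    return np.linalg.norm(Phi @ a - y) / np.linalg.norm(y)

for freq in (1, 16):
    print(f"k={freq}: relative residual after GD = {gd_relative_residual(freq):.3e}")
```

Each error component along a Gram eigenvector with eigenvalue $\lambda$ shrinks by a factor $(1-\eta\lambda)$ per step. The fast eigenvalue decay of the ReLU Gram matrix therefore translates directly into slow learning of oscillatory targets.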

Paper Structure

This paper contains 23 sections, 11 theorems, 72 equations, 14 figures, and 1 algorithm.

Key Result

Proposition 3.1

Using two-layer NNs (eq_two_layer_NN) with fixed $\mathbf w$ and $\mathbf b$, both PINN and DRM with boundary constraints (eq_constrained_optimization_simple) are equivalent to solving a quadratic minimization problem with linear constraints of the form [equation not shown]. Both PINN and DRM with boundary regularization (eq_regularized_optimization_simple) are equivalent to solving [equation not shown]. In both eq_constrained_quad and eq_r…
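Once $\mathbf w$ and $\mathbf b$ are fixed, the proposition reduces both formulations to linear algebra. The sketch below assumes the generic form $\min_{\mathbf a}\tfrac12\mathbf a^\top A\mathbf a-\mathbf a^\top\mathbf F$ subject to $B\mathbf a=\mathbf g$. This is an illustration, not necessarily the paper's exact matrices. The constrained version is solved through its KKT system, and the regularized version through a penalized system with an assumed weight $\beta=10^6$. The toy problem is hypothetical: DRM for $-u''=\pi^2\sin(\pi x)$ on $(0,1)$ with $u(0)=u(1)=0$, using ReLU neurons with unit inner weights.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def drm_relu_system(n, n_quad=20000):
    """Hypothetical DRM setup: u(x) = sum_i a_i ReLU(x - b_i), inner weights w_i = 1."""
    # b = -1 gives x + 1 on [0, 1]; b = 0, 1/n, ..., (n-1)/n give x and interior kinks.
    b = np.concatenate(([-1.0], np.arange(n) / n))
    # Energy matrix A_ij = int_0^1 H(x - b_i) H(x - b_j) dx = 1 - max(b_i, b_j, 0).
    # A alone is singular (constants have zero energy); boundary rows fix this.
    A = 1.0 - np.maximum(np.maximum.outer(b, b), 0.0)
    # Load vector F_i = int_0^1 f(x) ReLU(x - b_i) dx by the midpoint rule.
    xq = (np.arange(n_quad) + 0.5) / n_quad
    Phi = relu(xq[:, None] - b[None, :])
    F = Phi.T @ (np.pi**2 * np.sin(np.pi * xq)) / n_quad
    # Boundary operator: B a = (u(0), u(1)); Dirichlet data g = (0, 0).
    B = relu(np.array([0.0, 1.0])[:, None] - b[None, :])
    return A, F, B, np.zeros(2), xq, Phi

def solve_constrained(A, F, B, g):
    """Constraints: solve the KKT system [[A, B^T], [B, 0]] [a; lam] = [F; g]."""
    m = B.shape[0]
    K = np.block([[A, B.T], [B, np.zeros((m, m))]])
    return np.linalg.solve(K, np.concatenate((F, g)))[: A.shape[0]]

def solve_regularized(A, F, B, g, beta):
    """Regularization: minimize 0.5 a^T A a - a^T F + 0.5 beta |B a - g|^2."""
    return np.linalg.solve(A + beta * B.T @ B, F + beta * B.T @ g)

def rel_l2_error(a, xq, Phi):
    exact = np.sin(np.pi * xq)
    return np.linalg.norm(Phi @ a - exact) / np.linalg.norm(exact)

A, F, B, g, xq, Phi = drm_relu_system(64)
a_con = solve_constrained(A, F, B, g)
a_reg = solve_regularized(A, F, B, g, beta=1e6)
for name, a in (("constrained", a_con), ("regularized", a_reg)):
    print(f"{name}: rel. L2 error {rel_l2_error(a, xq, Phi):.2e}, "
          f"boundary residual {np.abs(B @ a - g).max():.2e}")
```

The constrained solve meets the boundary conditions to round-off. The penalized solve leaves a boundary residual that shrinks only like $1/\beta$, so larger $\beta$ trades boundary accuracy against a worse-conditioned system.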

Figures (14)

  • Figure 1: Construction of a B-spline finite element basis function of order $p+1$ as a linear combination of ReLU$^p$ bases. (a) $p=1$, (b) $p=2$, (c) $p=3$.
  • Figure 2: Spectrum of the $1000\times1000$ Gram matrices associated with the ReLU$^{p}$ activation function for (a) $p=1$, (b) $p=2$, (c) $p=3$. The red dashed curves show the theoretically predicted decay rate $k^{-(2p+2)}$ for each $p$.
  • Figure 3: Eigenvectors of the Gram matrix $\mathbf G_2$ for shallow networks with $\text{ReLU}^2$ activation, $1000$ uniform biases $\mathbf b$, and all-one weights $\mathbf w$, ordered by descending eigenvalue.
  • Figure 4: Relative $L_2$ error of the solutions computed by FEMsp with linear bases, PINN with $\text{ReLU}^2$, and DRM with $\text{ReLU}$ for different numbers of bases $N$ (corresponding to evenly spaced grid points). In (a)-(c), the underlying solution has a single Fourier mode, $\sin(k_{\max}\pi x-2\pi/3)$. In (d)-(f), it has two Fourier modes, $\sin(2\pi x+3\pi/5)+\sin(k_{\max}\pi x-2\pi/3)$, with increasing $k_{\max}$. The Dirichlet boundary conditions are imposed as constraints, and the linear system is solved with a direct solver.
  • Figure 5: Relative $L_2$ error of the solutions computed by FEMsp with quadratic bases, PINN with $\text{ReLU}^3$, and DRM with $\text{ReLU}^2$ for different numbers of bases $N$ (corresponding to evenly spaced grid points). All other experimental settings are identical to those in Figure 4.
  • ...and 9 more figures
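The construction in Figure 1 can be reproduced with the classical identity that a B-spline of degree $p$ (order $p+1$) on a uniform grid with spacing $h$ is a $(p+1)$-th finite difference of the truncated power $x_+^p$:
$N(x)=\frac{1}{p!\,h^p}\sum_{j=0}^{p+1}(-1)^j\binom{p+1}{j}\,\text{ReLU}(x-jh)^p.$
The sketch below assumes unit inner weights and biases $b_j=jh$. The function name is hypothetical.

```python
import math
import numpy as np

def bspline_from_relu_power(x, p, h=1.0):
    """Degree-p B-spline supported on [0, (p+1)h], built from p+2 shifted ReLU^p
    functions with alternating binomial coefficients ((p+1)-th finite difference)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for j in range(p + 2):
        coeff = (-1) ** j * math.comb(p + 1, j)
        total += coeff * np.maximum(x - j * h, 0.0) ** p
    # Normalization so that the shifted copies form a partition of unity.
    return total / (math.factorial(p) * h**p)

for p in (1, 2, 3):
    x = np.linspace(0.0, p + 1.0, 9)
    print(p, np.round(bspline_from_relu_power(x, p), 4))
```

Every B-spline is thus a fixed, banded combination of ReLU$^p$ functions. One heuristic reading is that the ReLU$^p$ Gram matrix behaves like a well-conditioned B-spline mass matrix transformed by the inverse of this finite-difference operator, which is consistent with the rapid eigenvalue decay shown in Figure 2.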

Theorems & Definitions (25)

  • Proposition 3.1
  • Proposition 3.2
  • Lemma 3.3
  • proof
  • Proposition 3.4
  • proof
  • Theorem 3.5
  • Theorem 3.6
  • proof
  • Theorem 3.7
  • ...and 15 more