What Can One Expect When Solving PDEs Using Shallow Neural Networks?
Roy Y. He, Ying Liang, Hongkai Zhao, Yimin Zhong
TL;DR
The paper analyzes what to expect when solving elliptic PDEs with shallow two-layer networks using the PINN and DRM formulations. It develops a spectral framework that compares the inherent ill-conditioning and frequency bias of the neural representation (especially with ReLU^p activations) against the bias induced by the differential operator, and finds that the network-induced bias dominates at high frequencies and slows the learning of those components. By deriving the Gram/KKT spectra and studying boundary enforcement strategies (constraints versus regularization), it shows that scaling and non-homogeneous activations can alleviate some of the ill-conditioning and bias, but cannot provide full adaptivity for nonlinear two-layer networks without effective preconditioning. The work also contrasts linear, random-feature-like representations with fully trained networks: scaling can improve performance, but the computational cost and the lack of robust preconditioning leave traditional finite element methods (FEMs) with preconditioners more practical in many cases. Deeper networks and preconditioning strategies are highlighted as significant open questions for future work.
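For orientation, the two formulations can be written out for a model problem. This is only a sketch under assumptions not stated in the summary: the Poisson problem with Dirichlet data, the penalty weight \lambda used for the regularized boundary treatment, and the sign convention on the biases b_i are illustrative choices, not necessarily the paper's.

```latex
% Assumed model problem (illustrative): Poisson equation with Dirichlet data
-\Delta u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega.

% Two-layer (shallow) ansatz with ReLU^p activation
u_\theta(x) = \sum_{i=1}^{n} a_i \, \sigma(w_i \cdot x - b_i),
\qquad \sigma(t) = \max(t, 0)^p.

% PINN: least-squares residual of the strong form, boundary term as a penalty
\mathcal{L}_{\mathrm{PINN}}(\theta)
  = \int_\Omega \left| \Delta u_\theta + f \right|^2 dx
  + \lambda \int_{\partial\Omega} \left| u_\theta - g \right|^2 ds.

% DRM: Ritz energy of the weak form, boundary term as a penalty
\mathcal{L}_{\mathrm{DRM}}(\theta)
  = \int_\Omega \left( \tfrac{1}{2} \left| \nabla u_\theta \right|^2 - f \, u_\theta \right) dx
  + \lambda \int_{\partial\Omega} \left| u_\theta - g \right|^2 ds.
```

When the inner parameters w_i and b_i are held fixed (the linear, random-feature-like case), both losses are quadratic in the coefficients a_i, so minimization reduces to a linear system with a Gram-type matrix. Enforcing the boundary condition as a constraint with Lagrange multipliers, instead of the penalty term, leads to a KKT system instead.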
Abstract
We use elliptic partial differential equations (PDEs) as examples to show various properties and behaviors when shallow neural networks (SNNs) are used to represent the solutions. In particular, we study the numerical ill-conditioning, frequency bias, and the balance between the differential operator and the shallow network representation for different formulations of the PDEs and with various activation functions. Our study shows that the performance of Physics-Informed Neural Networks (PINNs) or the Deep Ritz Method (DRM) using linear SNNs with power ReLU activation is dominated by their inherent ill-conditioning and spectral bias against high frequencies. Although this can be alleviated by using non-homogeneous activation functions with proper scaling, achieving such adaptivity for nonlinear SNNs remains costly due to ill-conditioning.
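The ill-conditioning of a linear SNN with power ReLU activation can be observed in a small numerical experiment. The sketch below is not the paper's code: it assumes a 1D domain [-1, 1], unit inner weights, equispaced biases, and Gauss-Legendre quadrature, and the function names `relu_power_gram` and `gram_condition_number` are hypothetical. It assembles the L2 Gram matrix of the features max(x - b_i, 0)^p and reports its condition number.

```python
import numpy as np


def relu_power_gram(n, p, n_quad=400):
    """L2([-1, 1]) Gram matrix of the features max(x - b_i, 0)**p.

    Assumptions (illustrative only): unit inner weights, n equispaced
    biases b_i in [-1, 1), and Gauss-Legendre quadrature with n_quad nodes.
    """
    x, w = np.polynomial.legendre.leggauss(n_quad)  # nodes and weights on [-1, 1]
    b = np.linspace(-1.0, 1.0, n, endpoint=False)   # biases (knot locations)
    phi = np.maximum(x[None, :] - b[:, None], 0.0) ** p  # shape (n, n_quad)
    # G_ij ~ integral of phi_i(x) * phi_j(x) over [-1, 1]
    return (phi * w[None, :]) @ phi.T


def gram_condition_number(n, p):
    """2-norm condition number of the Gram matrix (computed via SVD)."""
    return np.linalg.cond(relu_power_gram(n, p))


# Condition numbers for a few widths n and activation powers p.
for p in (1, 2):
    for n in (8, 16, 32):
        print(f"p={p}, n={n:2d}, cond(G)={gram_condition_number(n, p):.3e}")
```

In this setup the condition number increases as the width n grows, which is the kind of inherent ill-conditioning the abstract refers to. The conditioning observed here depends on the assumed bias distribution and domain, so the numbers should be read as qualitative only.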
