Numerical study on hyper parameter settings for neural network approximation to partial differential equations

Hee Jun Yang, Alexander Heinlein, Hyea Hyun Kim

TL;DR

This study addresses how hyperparameters affect neural network discretizations of PDEs, comparing PINN and Deep Ritz across Poisson, nonlinear, and eigenvalue problems. It systematically evaluates loss formulations, sampling strategies (Monte Carlo vs Gaussian quadrature), boundary enforcement, network architectures, and optimizers, using augmented Lagrangian terms and Fourier features where applicable. The findings show that Deep Ritz with augmented Lagrangian and Gaussian quadrature generally delivers superior accuracy and robustness on complex problems, while PINN remains versatile but more parameter-sensitive. The work offers practical guidelines for choosing hyperparameters based on problem complexity and reformulation, highlighting the practical impact for efficient and accurate PDE solvers in scientific computing.

Abstract

Approximate solutions of partial differential equations (PDEs) obtained by neural networks are highly affected by hyper parameter settings. For instance, the model training strongly depends on the loss function design, including the choice of weight factors for the different terms in the loss function and the sampling set used for numerical integration; other hyper parameters, such as the network architecture and the optimizer settings, also affect the model performance. Moreover, suitable hyper parameter settings are known to differ between model problems, and no universal rule for choosing them is currently known. In this paper, various hyper parameter settings are tested numerically for second order elliptic model problems to provide a practical guide for efficient and accurate neural network approximation. Since a full study of all possible hyper parameter settings is not feasible, we focus on the formulation of the PDE loss, the incorporation of the boundary conditions, the choice of collocation points associated with numerical integration schemes, and various approaches for dealing with loss imbalances. These choices are studied extensively on several model problems: in addition to various Poisson model problems, a nonlinear problem and an eigenvalue problem are considered.
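The effect of the sampling set on numerical integration can be illustrated with a minimal sketch. It is not the authors' implementation: the unit-square domain, the test integrand, and the function names `gauss_points_2d`, `monte_carlo_points_2d`, and `integrate` are assumptions made for illustration. The sketch compares a tensor-product Gauss-Legendre rule with Monte Carlo sampling, using the same number of points, on an integrand whose exact integral is known.

```python
import numpy as np


def gauss_points_2d(n):
    """Tensor-product Gauss-Legendre nodes and weights on the unit square."""
    t, w = np.polynomial.legendre.leggauss(n)  # nodes/weights on [-1, 1]
    x = 0.5 * (t + 1.0)                        # map nodes to [0, 1]
    w = 0.5 * w                                # rescale weights accordingly
    X, Y = np.meshgrid(x, x, indexing="ij")
    W = np.outer(w, w)
    return np.column_stack([X.ravel(), Y.ravel()]), W.ravel()


def monte_carlo_points_2d(m, rng):
    """Uniform random points on the unit square with equal weights 1/m."""
    return rng.random((m, 2)), np.full(m, 1.0 / m)


def integrate(f, points, weights):
    """Approximate the integral of f over the unit square by a weighted sum."""
    return float(np.sum(weights * f(points[:, 0], points[:, 1])))


def integrand(x, y):
    # Hypothetical stand-in for a squared residual; its exact integral is 1/4.
    return np.sin(np.pi * x) ** 2 * np.sin(np.pi * y) ** 2


rng = np.random.default_rng(0)
gauss_estimate = integrate(integrand, *gauss_points_2d(8))           # 64 points
mc_estimate = integrate(integrand, *monte_carlo_points_2d(64, rng))  # 64 points
print(abs(gauss_estimate - 0.25), abs(mc_estimate - 0.25))
```

For a smooth integrand like this one, the quadrature error is many orders of magnitude below the Monte Carlo error at equal cost. In a training loop, either point set would be passed to the network and the weighted sum would form the interior loss term.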

Paper Structure

This paper contains 24 sections, 43 equations, 5 figures, 18 tables.

Figures (5)

  • Figure 1: Examples \ref{ex1}--\ref{ex3}: solution plots for $k=1$ (left), $N=6$ (middle), and $A=100$ and $\varepsilon=0.01$ (right), respectively.
  • Figure 2: Relative $L^2$-error history for $U({\bf x};\theta)$ over training epochs for the model solution \ref{ex1} with $k=1$: $L_{R,G}$ with $w_I=w_B=1$ is used as the loss function to train the neural network solution $U({\bf x};\theta)$. The error is computed on a uniform test grid of $101 \times 101$ points over the problem domain.
  • Figure 3: Study on sampling sets for the example in \ref{ex1} with $k=1$: the average of the relative $L^2$-errors over various $w_B$ choices for the four loss formulations (left: PINN, right: deep Ritz) depending on the sampling approach (solid line: Gaussian quadrature, dashed line: Monte Carlo).
  • Figure 4: Smooth example in \ref{ex1} with $k=1$: absolute error plots for the PINN loss $L_{P,M}$ (top, left) and the deep Ritz loss $L_{R,M}$ (top, right) using Monte Carlo integration, as well as the PINN loss $L_{P,G}$ (bottom, left) and the deep Ritz loss $L_{R,G}$ (bottom, right) using Gaussian quadrature.
  • Figure 5: Loss landscape of Example \ref{ex2}: surface plot (top), contour plot (middle), and contour plot of the relative $L^2$-error for the corresponding $U({\bf x};\widetilde{\theta})$ (bottom), for $J_{P,G}$ (left), $J_{R,G}$ (middle), and $L_{R,G}$ (right).
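
The captions above report relative $L^2$-errors evaluated on a uniform $101 \times 101$ test grid. A minimal sketch of such an error measure follows. The unit-square domain, the reference solution `u_exact`, and the stand-in `u_perturbed` for a trained network are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def relative_l2_error(u_approx, u_exact, n=101):
    """Discrete relative L2 error on a uniform n-by-n grid over [0, 1]^2."""
    x = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    diff = u_approx(X, Y) - u_exact(X, Y)
    return float(np.sqrt(np.sum(diff**2) / np.sum(u_exact(X, Y) ** 2)))


def u_exact(x, y):
    # Hypothetical smooth reference solution, used only for illustration.
    return np.sin(np.pi * x) * np.sin(np.pi * y)


def u_perturbed(x, y):
    # Stand-in for a trained network: the reference scaled by 1.1.
    return 1.1 * u_exact(x, y)


print(relative_l2_error(u_perturbed, u_exact))  # approximately 0.1
```

Because the grid is uniform, the cell area cancels in the ratio, so plain sums over grid values are enough here.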