Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

Chuqi Chen, Qixuan Zhou, Yahong Yang, Yang Xiang, Tao Luo

TL;DR

The paper investigates training dynamics of neural network–based PDE solvers through the lens of the kernel spectrum, introducing the effective rank as a metric for convergence difficulty. It shows that partition of unity and variance scaling initialization increase the kernel’s effective rank, thereby accelerating training across PINN, Deep Ritz, and DeepONet. Theoretical analysis and Random Feature Model experiments, complemented by extensive numerical studies, demonstrate consistent speedups and improved accuracy when using PoU and VS. The findings suggest practical guidelines for initialization and highlight potential avenues for hybrid solvers and architecture-informed design to further enhance convergence in scientific computing with neural nets.

Abstract

Neural network-based methods have emerged as powerful tools for solving partial differential equations (PDEs) in scientific and engineering applications, particularly when handling complex domains or incorporating empirical data. These methods leverage neural networks as basis functions to approximate PDE solutions. However, training such networks can be challenging, often resulting in limited accuracy. In this paper, we investigate the training dynamics of neural network-based PDE solvers with a focus on the impact of initialization techniques. We assess training difficulty by analyzing the eigenvalue distribution of the kernel and apply the concept of effective rank to quantify this difficulty, where a larger effective rank correlates with faster convergence of the training error. Building upon this, we discover through theoretical analysis and numerical experiments that two initialization techniques, partition of unity (PoU) and variance scaling (VS), enhance the effective rank, thereby accelerating the convergence of training error. Furthermore, comprehensive experiments using popular PDE-solving frameworks, such as PINN, Deep Ritz, and the operator learning framework DeepONet, confirm that these initialization techniques consistently speed up convergence, in line with our theoretical findings.
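
To make the notion of effective rank concrete, here is a minimal sketch that computes it from a list of kernel eigenvalues. It assumes the common entropy-based definition, $\exp\left(-\sum_i p_i \log p_i\right)$ with $p_i = \lambda_i / \sum_j \lambda_j$; the paper's Definition 3.1 is not reproduced on this page and may differ in detail. The function name `effective_rank` and the sample spectra are illustrative, not taken from the paper.

```python
import math


def effective_rank(eigenvalues):
    """Entropy-based effective rank of a spectrum (assumed definition).

    p_i = |lambda_i| / sum_j |lambda_j|, result = exp(-sum_i p_i * log(p_i)).
    """
    magnitudes = [abs(v) for v in eigenvalues]
    total = sum(magnitudes)
    if total == 0.0:
        raise ValueError("spectrum must contain at least one nonzero eigenvalue")
    # Zero eigenvalues contribute nothing to the entropy (0 * log 0 := 0).
    probs = [m / total for m in magnitudes if m > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Hypothetical spectra: a flat one and a rapidly decaying one.
flat_spectrum = [1.0, 1.0, 1.0, 1.0]
decaying_spectrum = [1.0, 0.1, 0.01, 0.001]

flat_rank = effective_rank(flat_spectrum)
decaying_rank = effective_rank(decaying_spectrum)
```

Under this definition a flat spectrum of $N$ equal eigenvalues has effective rank $N$, while a spectrum dominated by a single eigenvalue has effective rank close to 1, which matches the abstract's reading that a larger effective rank signals an easier training problem.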

Paper Structure

This paper contains 22 sections, 2 theorems, 50 equations, 18 figures, 7 tables.

Key Result

Theorem 3.2

Let $\lambda_j^\mathrm{L}$ and $\lambda_j^\mathrm{R}$ be the eigenvalues of $\bm{\Phi}_{\mathrm{PoU}}^\mathrm{L}{\bm{\Phi}_{\mathrm{PoU}}^\mathrm{L}}^{\top}$ and $\bm{\Phi}_{\mathrm{PoU}}^\mathrm{R}{\bm{\Phi}_{\mathrm{PoU}}^\mathrm{R}}^{\top}$, respectively. With probability $1-N^2

Figures (18)

  • Figure 1: Construction functions $\{\psi^{b}_n(x),n=1,\dots,M_p\}$. (a) 1D construction functions $\psi_n^{b}(x)$ defined on $[0,8]$, dividing the region into four domains ($M_p = 4$). In this case $x_n = \{1,3,5,7\}$ with corresponding $r_n = \frac{8-0}{4} = 2$. (b) 2D construction functions $\psi_n^{b}(x_1,x_2)$ defined on $[-1,1]^2$, dividing the region into four domains ($M_p = 4$).
  • Figure 1: Results for solving $\bm{A}\bm{x} = \bm{b}$, where $\bm{A} = \text{diag}\{\lambda_1,\lambda_2,\dots,\lambda_N\}$. (a) Eigenvalue distribution of the matrix $\bm{A}$ along with the corresponding effective rank. (b) Convergence curves for each component $x_i$, $i=1,2,\dots,N$. The red line marks the cutoff: the components $x_i$ whose eigenvalues lie before the line have converged after 100 epochs.
  • Figure 1: 2D Helmholtz equation. First row: predictions of the PINN models; second row: pointwise absolute error. From left to right, the panels correspond to the cases listed in Tab. \ref{tab:PINN2DHelmholtz}.
  • Figure 1: The framework of the paper.
  • Figure 2: Loss curves for solving $\bm{A}\bm{x} =\bm{b}$ with mean-square loss, together with the effective rank for each case.
  • ...and 13 more figures
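
The diagonal test problem $\bm{A}\bm{x} = \bm{b}$ in the captions above can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not the paper's experimental setup: it uses the loss $\frac{1}{2}\|\bm{A}\bm{x}-\bm{b}\|^2$ (a mean-square loss differs only by a constant factor), a hypothetical learning rate of 1.0, a hypothetical four-entry spectrum, and 100 plain gradient-descent steps from $\bm{x} = \bm{0}$. For diagonal $\bm{A}$ the error in component $i$ contracts by the factor $1 - \eta\lambda_i^2$ per step, so components tied to small eigenvalues converge slowly.

```python
def gradient_descent_diagonal(eigenvalues, b, lr, steps):
    """Gradient descent on 0.5 * ||A x - b||^2 with A = diag(eigenvalues)."""
    x = [0.0] * len(eigenvalues)
    for _ in range(steps):
        # Gradient in component i is lambda_i * (lambda_i * x_i - b_i).
        x = [xi - lr * lam * (lam * xi - bi)
             for xi, lam, bi in zip(x, eigenvalues, b)]
    return x


def relative_errors(eigenvalues, b, x):
    """Per-component error |x_i - x_i*| / |x_i*|, where x_i* = b_i / lambda_i."""
    return [abs(xi - bi / lam) / abs(bi / lam)
            for xi, lam, bi in zip(x, eigenvalues, b)]


# Hypothetical spectrum and right-hand side (not taken from the paper).
eigenvalues = [1.0, 0.5, 0.1, 0.01]
b = [1.0, 1.0, 1.0, 1.0]

x = gradient_descent_diagonal(eigenvalues, b, lr=1.0, steps=100)
errors = relative_errors(eigenvalues, b, x)

for lam, err in zip(eigenvalues, errors):
    print(f"lambda = {lam:<5} relative error after 100 steps = {err:.3e}")
```

Components with large eigenvalues reach the solution within the step budget, while those with small eigenvalues barely move, which is the behaviour the red cutoff line in the convergence plot is described as marking.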

Theorems & Definitions (4)

  • Definition 3.1: Effective rank
  • Theorem 3.2: Similar eigenvalue distribution
  • Proof 1
  • Corollary 3.3: Similar effective rank