A neuron-wise subspace correction method for the finite neuron method

Jongho Park, Jinchao Xu, Xiaofeng Xu

TL;DR

This work introduces Neuron-wise Parallel Subspace Correction (NPSC) for the finite neuron method, addressing the slow convergence of gradient-based training caused by ill-conditioning in the linear layer. By decomposing the parameter space into a linear part and per-neuron nonlinear parts, NPSC alternates between a linear-layer solve and parallel local neuron optimizations. The linear solve uses a newly designed optimal preconditioner for one-dimensional problems, so its iteration count is uniform with respect to the number of neurons. The nonlinear neuron updates use a Levenberg–Marquardt strategy to find good local minima, with an adjustment step and backtracking to stabilize and accelerate convergence. Across function-approximation and elliptic PDE experiments, NPSC outperforms standard gradient-based methods, and ablations confirm the critical roles of preconditioning, adaptive learning rates, and neuron-wise optimization in achieving higher accuracy and robustness, including in oscillatory and higher-dimensional settings where quadrature costs are nontrivial.

Abstract

In this paper, we propose a novel algorithm called Neuron-wise Parallel Subspace Correction Method (NPSC) for the finite neuron method, which approximates numerical solutions of partial differential equations (PDEs) using neural network functions. Despite extensive research on applying neural networks to numerical PDEs, there is still a serious lack of effective training algorithms that achieve adequate accuracy, even for one-dimensional problems. Based on recent results on the spectral properties of linear layers and landscape analysis for single neuron problems, we develop a special type of subspace correction method that optimizes the linear layer and each neuron in the nonlinear layer separately. An optimal preconditioner that resolves the ill-conditioning of the linear layer is presented for one-dimensional problems, so that the linear layer is trained in a uniform number of iterations with respect to the number of neurons. In each single neuron problem, a good local minimum that avoids flat energy regions is found by a superlinearly convergent algorithm. Numerical experiments on function approximation problems and PDEs demonstrate better performance of the proposed method than other gradient-based methods.
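
The alternation described above (a linear-layer solve, then independent per-neuron corrections combined with a backtracked step size) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a one-dimensional ReLU network $u(x) = \sum_i a_i \,\mathrm{ReLU}(w_i x + b_i)$ on $[0, 1]$, a discrete least-squares fitting energy with trapezoidal quadrature, and a dense least-squares solve standing in for the preconditioned linear solver. The adjustment step is omitted, and all function names (`npsc_like_sketch`, `neuron_step`, and so on) are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def features(w, b, x):
    # Column i holds relu(w_i * x + b_i) evaluated on the quadrature nodes.
    return relu(np.outer(x, w) + b)

def energy(a, w, b, x, f, q):
    # Assumed stand-in energy: 0.5 * sum_k q_k * (u(x_k) - f(x_k))**2.
    r = features(w, b, x) @ a - f
    return 0.5 * np.sum(q * r**2)

def solve_linear_layer(w, b, x, f, q):
    # Linear-layer subproblem: weighted least squares in a, neurons frozen.
    sq = np.sqrt(q)
    a, *_ = np.linalg.lstsq(sq[:, None] * features(w, b, x), sq * f, rcond=None)
    return a

def neuron_step(i, a, w, b, x, f, q, mu=1e-3):
    # One damped Gauss-Newton (Levenberg-Marquardt type) step for neuron i only.
    r = features(w, b, x) @ a - f
    active = (w[i] * x + b[i] > 0).astype(float)
    J = a[i] * np.stack([active * x, active], axis=1)  # d r / d (w_i, b_i)
    H = J.T @ (q[:, None] * J) + mu * np.eye(2)
    g = J.T @ (q * r)
    return -np.linalg.solve(H, g)

def npsc_like_sketch(f_fn, n=16, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 401)
    h = x[1] - x[0]
    q = np.full(x.shape, h)
    q[[0, -1]] = 0.5 * h                       # trapezoidal weights
    f = f_fn(x)
    w = np.ones(n)
    b = -rng.uniform(0.0, 1.0, n)              # random breakpoints in [0, 1)
    a = solve_linear_layer(w, b, x, f, q)
    history = [energy(a, w, b, x, f, q)]
    for _ in range(epochs):
        # The local corrections are independent, so this loop could run in parallel.
        steps = np.array([neuron_step(i, a, w, b, x, f, q) for i in range(n)])
        tau, accepted = 1.0, False
        while tau > 1e-8 and not accepted:     # backtracking on the step size
            w_new = w + tau * steps[:, 0]
            b_new = b + tau * steps[:, 1]
            a_new = solve_linear_layer(w_new, b_new, x, f, q)
            e_new = energy(a_new, w_new, b_new, x, f, q)
            accepted = e_new <= history[-1]
            tau *= 0.5
        if not accepted:
            break                              # no tested step size lowered the energy
        a, w, b = a_new, w_new, b_new
        history.append(e_new)
    return a, w, b, history
```

Calling `npsc_like_sketch(lambda x: np.sin(np.pi * x))` returns the trained parameters and an energy history that is non-increasing by construction, since a combined update is accepted only when it does not raise the energy.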

Paper Structure (22 sections, 6 theorems, 72 equations, 6 figures, 2 tables, 3 algorithms)

Key Result

Proposition 2.1

In \ref{M}, the condition number $\kappa (M)$ of the matrix $M$ satisfies
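
The ill-conditioning that this proposition quantifies can be observed numerically. The sketch below is only an illustration under an assumed setup, since the definition of $M$ is not reproduced here: it takes $M_{ij} = \int_0^1 \mathrm{ReLU}(x - t_i)\,\mathrm{ReLU}(x - t_j)\,dx$ with uniform breakpoints $t_i = (i-1)/n$, and the helper names are hypothetical.

```python
import numpy as np

def relu_mass_matrix(n):
    # Assumed setup: neurons relu(x - t_i) on [0, 1], breakpoints 0, 1/n, ..., (n-1)/n.
    t = np.arange(n) / n
    hi = np.maximum.outer(t, t)
    lo = np.minimum.outer(t, t)
    length = 1.0 - hi
    # Closed form of the integral of (x - lo) * (x - hi) over [hi, 1].
    return length**3 / 3.0 + (hi - lo) * length**2 / 2.0

def condition_numbers(sizes=(4, 8, 16, 32)):
    # 2-norm condition number of the assumed matrix for each neuron count.
    return {n: float(np.linalg.cond(relu_mass_matrix(n))) for n in sizes}

if __name__ == "__main__":
    for n, kappa in condition_numbers().items():
        print(f"n = {n:3d}  kappa(M) = {kappa:.3e}")
```

Under this assumed setup the printed condition numbers grow quickly as $n$ increases, which is the behavior that motivates preconditioning the linear layer.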

Figures (6)

  • Figure 1: Numerical results for the function approximation problem \ref{M}. (a) Decay of the relative energy error $\frac{E_M(a^{(k)}) - E_M(M^{-1}\beta)}{|E_M(M^{-1}\beta)|}$ in the gradient descent method (GD) and Adam, where $k$ denotes the number of iterations. (b, c) Exact solution and its approximations generated by various numbers of iterations of GD and Adam ($n = 2^5$).
  • Figure 1: (a) Space decomposition of the solution space $\Theta$ of \ref{model} into subspaces $A$ and $\{ W_i \}_{i=1}^n$. (b) Subspace correction procedure of NPSC.
  • Figure 1: Numerical results for the function approximation problems (a--c) \ref{Ex1} and (d--f) \ref{Ex2}. (a, d) Decay of the relative energy error $\frac{E(\theta^{(k)}) - E^*}{|E^*|}$ in various training algorithms ($n = 2^5$). (b, e) Exact solution and its approximations ($n = 2^5$, $10^3$ epochs). (c, f) $L^2$-errors with respect to the number of neurons ($10^3$ epochs).
  • Figure 1: Decay of the relative energy error $\frac{E(\theta^{(k)}) - E^*}{|E^*|}$ in various training algorithms for solving \ref{Ex1}. "Backt" denotes the backtracking scheme presented in \ref{Alg:backt}, and $\tau$ denotes the fixed learning rate.
  • Figure 2: Numerical results for the elliptic PDEs (a, b) \ref{Ex3}, (c, d) \ref{Ex4}, (e, f) \ref{Ex5}, and (g, h) \ref{Ex6}. (a, c, e, g) Decay of the relative energy error $\frac{E(\theta^{(k)}) - E^*}{|E^*|}$ in various training algorithms ($n = 2^5$). (b, d, f, h) $L^2$-errors with respect to the number of neurons ($10^3$ epochs).
  • ...and 1 more figure
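
The relative energy error used in these captions can be reproduced for plain gradient descent on a small linear-layer problem. The sketch below is an assumption-laden illustration, not the paper's experiment: it takes the quadratic energy $E_M(a) = \frac{1}{2} a^\top M a - \beta^\top a$ (consistent with the minimizer $M^{-1}\beta$ named in the caption), builds $M$ and $\beta$ from neurons $\mathrm{ReLU}(x - t_i)$ with uniform breakpoints and the target $f(x) = \sin(\pi x)$, and uses the step size $1/\lambda_{\max}(M)$. The function name is hypothetical.

```python
import numpy as np

def gd_relative_energy_error(n=8, iters=200):
    # Assumed setup: neurons relu(x - t_i), t_i = i / n, target sin(pi x) on [0, 1].
    t = np.arange(n) / n
    x = np.linspace(0.0, 1.0, 2001)
    h = x[1] - x[0]
    q = np.full(x.shape, h)
    q[[0, -1]] = 0.5 * h                        # trapezoidal weights
    Phi = np.maximum(x[:, None] - t[None, :], 0.0)
    M = Phi.T @ (q[:, None] * Phi)              # quadrature approximation of M
    beta = Phi.T @ (q * np.sin(np.pi * x))

    def E(a):
        # Assumed quadratic energy whose minimizer is M^{-1} beta.
        return 0.5 * a @ M @ a - beta @ a

    e_star = E(np.linalg.solve(M, beta))
    tau = 1.0 / np.linalg.eigvalsh(M)[-1]       # step size 1 / lambda_max
    a = np.zeros(n)
    errors = []
    for _ in range(iters + 1):
        errors.append((E(a) - e_star) / abs(e_star))
        a = a - tau * (M @ a - beta)            # gradient descent update
    return np.array(errors)
```

Starting from $a^{(0)} = 0$ the relative error begins at 1, and with this step size it decreases monotonically. Plotting the returned array on a logarithmic axis shows how slowly it decays when $M$ is ill-conditioned.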

Theorems & Definitions (11)

  • Proposition 2.1
  • Lemma 3.1
  • Proof 1
  • Theorem 3.2
  • Remark 3.3
  • Proposition 4.1
  • Theorem 4.2
  • Proof 2
  • Proposition 4.3
  • Proof 3
  • ...and 1 more