Convergence Analysis of Block Newton Methods for 1D Shallow Neural Network Approximation

Zhiqiang Cai, Anastassia Doktorova, Robert D. Falgout, César Herrera

TL;DR

This work analyzes local convergence of block Newton (BN) methods for one-dimensional shallow ReLU networks with $n$ neurons, where the parameter vector $θ ∈ ℝ^{2n+1}$ is split into linear parameters $c ∈ ℝ^{n+1}$ and nonlinear parameters $b ∈ ℝ^n$. It develops a 2×2 block BN framework with outer iterations (nonlinear Gauss-Seidel (NL-GS), linear Gauss-Seidel (L-GS), or Jacobi) and inner Newton solves, and analyzes convergence through a fixed-point map $G(θ)$ with Jacobian $J_G(θ^*)$, establishing local convergence when the Hessian $∇^2_θ F(θ^*)$ is symmetric positive definite (SPD) and the block inverses exist. The reduced BN (rBN) method additionally drops neurons that contribute little to the approximation or already sit at nearly optimal locations, shrinking the parameter set while preserving convergence under the same SPD-type conditions. Applications cover 1D diffusion-reaction problems and least-squares function approximation, including a numerical example showing interior-layer recovery; in Figure 1 the relative error $|u-u_n|_1/|u|_1$ drops from 0.988 to 0.173 after 100 iterations with 16 breakpoints.

Abstract

This paper analyzes local convergence of the block Newton (BN) method introduced in [5, 6] for one-dimensional shallow neural network approximation to functions and diffusion-reaction problems. The BN method consists of the 2×2 block nonlinear Gauss-Seidel, linear Gauss-Seidel, or Jacobi method for outer iteration and the Newton method for inner iteration. The blocks correspond to the linear and the nonlinear parameters. Under some reasonable assumptions, we establish local convergence of the BN methods as well as the reduced BN (rBN) method for one-dimensional diffusion-reaction problems and least-squares function approximation. Unlike common optimization methods, the rBN allows for the reduction of the number of parameters during the optimization process when some neurons contribute little to the approximation or are at nearly optimal locations.
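
To make the linear/nonlinear parameter split concrete, here is a minimal Python sketch of a 1D shallow ReLU network. The parameterization $u_n(x) = c_0 + \sum_{i=1}^{n} c_i\,\mathrm{ReLU}(x - b_i)$ is an assumption chosen to match the counts $c ∈ ℝ^{n+1}$ and $b ∈ ℝ^n$; the paper's exact form may differ. The names `relu` and `shallow_relu` are hypothetical.

```python
def relu(t):
    """Rectified linear unit."""
    return max(t, 0.0)


def shallow_relu(x, c, b):
    """Evaluate u_n(x) = c[0] + sum_i c[i + 1] * relu(x - b[i]).

    c holds the n + 1 linear parameters, b the n breakpoints
    (assumed parameterization, see the lead-in).
    """
    if len(c) != len(b) + 1:
        raise ValueError("expected len(c) == len(b) + 1")
    return c[0] + sum(ci * relu(x - bi) for ci, bi in zip(c[1:], b))


# Two neurons: breakpoints at 0.0 and 0.5.
b = [0.0, 0.5]
c1 = [1.0, 2.0, -2.0]
c2 = [0.5, -1.0, 3.0]
c_sum = [p + q for p, q in zip(c1, c2)]

# For fixed breakpoints b the model is linear in c, which is what makes
# the "linear block" of the 2x2 splitting a linear problem.
x = 0.75
lhs = shallow_relu(x, c_sum, b)
rhs = shallow_relu(x, c1, b) + shallow_relu(x, c2, b)
```

With `b` held fixed the output depends linearly on `c`, while moving a breakpoint changes where the kink sits; this is the reason the two blocks are treated differently.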

Paper Structure (11 sections, 8 theorems, 90 equations, 1 figure, 1 algorithm)

This paper contains 11 sections, 8 theorems, 90 equations, 1 figure, 1 algorithm.

Key Result

Theorem 3.1

Suppose that $G : \mathcal{O} \rightarrow \mathbb{R}^{2n+1}$ has a fixed point $\boldsymbol{\theta}^{*} \in \mathcal{O}$ and that the mapping $G$ is differentiable at $\boldsymbol{\theta}^{*}$. Denote by $\|\cdot\|$ a norm in …
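
The statement above is cut off in this listing. The classical Ostrowski theorem says that a fixed-point iteration $\boldsymbol{\theta}_{k+1} = G(\boldsymbol{\theta}_k)$ converges locally when the spectral radius of the Jacobian of $G$ at the fixed point is below one. The Python sketch below illustrates that criterion on a toy problem only: a two-variable quadratic with SPD Hessian, solved by Gauss-Seidel sweeps. The objective and all function names are hypothetical and are not the paper's map $G$.

```python
import cmath

# Toy objective F(x, y) = 0.5 * (4x^2 + 2xy + 3y^2) - x - 2y.
# Its Hessian [[4, 1], [1, 3]] is SPD; the minimizer is (1/11, 7/11).


def gauss_seidel_map(theta):
    """One Gauss-Seidel sweep, viewed as a fixed-point map G."""
    x, y = theta
    x_new = (1.0 - y) / 4.0      # solve dF/dx = 4x + y - 1 = 0 for x
    y_new = (2.0 - x_new) / 3.0  # solve dF/dy = x + 3y - 2 = 0 using x_new
    return (x_new, y_new)


def jacobian_fd(G, theta, h=1e-6):
    """Central-difference Jacobian of a map from R^2 to R^2."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(theta), list(theta)
        plus[j] += h
        minus[j] -= h
        g_plus, g_minus = G(tuple(plus)), G(tuple(minus))
        for i in range(2):
            J[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * h)
    return J


def spectral_radius_2x2(J):
    """Largest eigenvalue modulus of a 2x2 matrix via trace and determinant."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


def iterate(G, theta0, steps):
    """Apply the fixed-point map G a given number of times."""
    theta = theta0
    for _ in range(steps):
        theta = G(theta)
    return theta


theta_star = iterate(gauss_seidel_map, (5.0, -5.0), 50)
rho = spectral_radius_2x2(jacobian_fd(gauss_seidel_map, theta_star))
```

For this toy map the Jacobian is constant with eigenvalues $0$ and $1/12$, so `rho` is about $0.083$ and the iterates contract toward $(1/11, 7/11)$ from any starting point. For a nonlinear $G$ the same check applies only near the fixed point.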

Figures (1)

  • Figure 1: For $\nu = \varepsilon^2 = 10^{-6}$: (a) initial NN model with 16 uniform breakpoints; $\frac{|u-u_n|_{1}}{|u|_1} = 0.988$, (b) optimized NN model with 16 breakpoints, 100 iterations, $\frac{|u-u_n|_{1}}{|u|_1} =0.173$.

Theorems & Definitions (19)

  • Theorem 3.1: Ostrowski
  • Proof 1
  • Lemma 3.2
  • Proof 2
  • Lemma 3.3
  • Proof 3
  • Theorem 3.4
  • Proof 4
  • Remark 3.5
  • Lemma 4.1
  • ...and 9 more