Gauss Newton method for solving variational problems of PDEs with neural network discretizations

Wenrui Hao; Qingguo Hong; Xianlin Jin

TL;DR

The paper solves PDEs with neural-network discretizations in a variational (energy) framework and introduces a Gauss-Newton method tailored to this variational form. It splits the Hessian of the energy as $H(\theta) = J(\theta) + Q(\theta)$, argues that $Q$ is small and can be neglected, and uses $J$ alone to define a Gauss-Newton iteration that converges superlinearly to semiregular zeros. It also proves that, under suitable quadrature and rank conditions, the variational Gauss-Newton method coincides with the Gauss-Newton method for the L2 minimization problem, and it analyzes a randomized Gauss-Newton variant for scalability. Numerical experiments in 1D, 2D, and 5D show Gauss-Newton outperforming gradient-based methods and L-BFGS in accuracy and efficiency, which supports its potential for high-dimensional PDE discretizations.
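To make the split $H = J + Q$ concrete, here is a worked sketch for one model problem. The choice of problem (the energy $E(u) = \tfrac12 a(u,u) - (f,u)$ with $a(u,v) = \int_\Omega \nabla u\cdot\nabla v + uv\,dx$), the symbols $a(\cdot,\cdot)$ and $\partial_i = \partial/\partial\theta_i$, and the use of a pseudo-inverse in the update are assumptions made for illustration; the paper's own setting and notation may differ.

$$
\begin{aligned}
L(\theta) &= \tfrac12\, a\big(u(\cdot,\theta),u(\cdot,\theta)\big) - \big(f,u(\cdot,\theta)\big),\\
\partial_i L(\theta) &= a(u,\partial_i u) - (f,\partial_i u),\\
H_{ij}(\theta) = \partial_i\partial_j L(\theta) &= \underbrace{a(\partial_i u,\partial_j u)}_{J_{ij}(\theta)} + \underbrace{a(u,\partial_i\partial_j u) - (f,\partial_i\partial_j u)}_{Q_{ij}(\theta)},\\
\theta_{k+1} &= \theta_k - J(\theta_k)^{\dagger}\,\nabla L(\theta_k).
\end{aligned}
$$

In this sketch $Q_{ij}$ is the weak-form residual of $u$ tested against $\partial_i\partial_j u$, so it vanishes whenever $u$ solves the weak problem exactly and $\partial_i\partial_j u \in H^1(\Omega)$. This is one way to read the claim that $Q$ can be neglected near a good minimizer, while $J$ stays symmetric positive semidefinite.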

Abstract

The numerical solution of differential equations using machine learning-based approaches has gained significant popularity. Neural network-based discretization has emerged as a powerful tool for solving differential equations by parameterizing a set of functions. Various approaches, such as the deep Ritz method and physics-informed neural networks, have been developed for numerical solutions. Training algorithms, including gradient descent and greedy algorithms, have been proposed to solve the resulting optimization problems. In this paper, we focus on the variational formulation of the problem and propose a Gauss-Newton method for computing the numerical solution. We provide a comprehensive analysis of the superlinear convergence properties of this method, along with a discussion of semiregular zeros of the vanishing gradient. Numerical examples are presented to demonstrate the efficiency of the proposed Gauss-Newton method.
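A minimal sketch of what a Gauss-Newton iteration for a variational problem can look like in practice follows. It is not the authors' implementation. Everything specific here is an assumption chosen for illustration: the 1D model problem $-u'' + u = f$ on $(0,1)$ with natural boundary conditions and exact solution $\cos(\pi x)$, the one-hidden-layer tanh network, Gauss-Legendre quadrature, the least-squares solve with a singular-value cutoff, the simple step-halving rule, and all function names.

```python
import numpy as np

def network(theta, x):
    """One-hidden-layer tanh net u(x) = sum_i a_i tanh(w_i x + b_i).
    Returns u, u' and their Jacobians with respect to theta = (a, w, b)."""
    a, w, b = np.split(theta, 3)
    t = np.tanh(np.outer(x, w) + b)      # (N, n) hidden activations
    s = 1.0 - t**2                       # tanh'(z)
    ds = -2.0 * t * s                    # tanh''(z)
    xc = x[:, None]
    u = t @ a
    du = s @ (a * w)                     # du/dx
    Ju = np.hstack([t, a * s * xc, a * s])
    Jdu = np.hstack([w * s, a * s + a * w * ds * xc, a * w * ds])
    return u, du, Ju, Jdu

def energy(theta, x, wq, f):
    """Quadrature approximation of L = int 1/2 u'^2 + 1/2 u^2 - f u dx."""
    u, du, _, _ = network(theta, x)
    return np.sum(wq * (0.5 * du**2 + 0.5 * u**2 - f * u))

def gauss_newton(theta, x, wq, f, iters=50):
    """Gauss-Newton with step halving; the second-derivative term Q is dropped."""
    history = [energy(theta, x, wq, f)]
    for _ in range(iters):
        u, du, Ju, Jdu = network(theta, x)
        # Gradient of the energy and the Gauss-Newton matrix J (symmetric PSD).
        g = Jdu.T @ (wq * du) + Ju.T @ (wq * (u - f))
        J = Jdu.T @ (wq[:, None] * Jdu) + Ju.T @ (wq[:, None] * Ju)
        # J may be singular, so use a least-squares (pseudo-inverse) solve.
        d = np.linalg.lstsq(J, -g, rcond=1e-10)[0]
        step = 1.0
        while step > 1e-8 and not (energy(theta + step * d, x, wq, f) <= history[-1]):
            step *= 0.5                  # halve until the energy does not increase
        if step <= 1e-8:
            break                        # no acceptable step found
        theta = theta + step * d
        history.append(energy(theta, x, wq, f))
    return theta, history

def solve_model_problem(n_neurons=8, n_quad=64, seed=0):
    """Assumed test case: exact solution cos(pi x), f = (pi^2 + 1) cos(pi x)."""
    t, wt = np.polynomial.legendre.leggauss(n_quad)
    x, wq = 0.5 * (t + 1.0), 0.5 * wt    # map nodes and weights to [0, 1]
    f = (np.pi**2 + 1.0) * np.cos(np.pi * x)
    theta0 = np.random.default_rng(seed).normal(size=3 * n_neurons)
    theta, history = gauss_newton(theta0, x, wq, f)
    u = network(theta, x)[0]
    err = np.sqrt(np.sum(wq * (u - np.cos(np.pi * x))**2))  # discrete L2 error
    return history, err

if __name__ == "__main__":
    history, err = solve_model_problem()
    print(f"energy: {history[0]:.4f} -> {history[-1]:.4f}, L2 error: {err:.2e}")
```

Because a step is accepted only when the energy does not increase, the recorded energies are non-increasing by construction; how quickly they approach the minimum depends on the initialization and network size.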

Paper Structure (11 sections, 13 theorems, 110 equations, 3 figures, 4 tables)

Introduction
Problem setup
Gauss-Newton method for the variational problem
Gauss-Newton method for solving the L2 minimization problem
The consistency between Gauss-Newton methods for L2 minimization and variational problems
Semiregular zeros of $\nabla L(\theta)=0$
Convergence analysis
Gauss-Newton method
Random Gauss-Newton method
Numerical experiments
Conclusions

Key Result

Lemma 1

For every $\epsilon > 0$, there exist $\delta > 0$, $J\in \mathbb{N}^{+}$, and $m\in \mathbb{N}^{+}$ such that $\theta\in \mathbb{R}^m$, $DNN_J\subset H^1(\Omega)$, and, for $\Vert \theta-\theta^* \Vert < \delta$ with $\theta^*=\arg\min L(\theta)$, if $D^2_{\theta} u(x,\theta) \in H^1(\Omega)$, then it

Figures (3)

Figure 1: Testing Errors vs. Iterations. Left: Newton-type training algorithms. The Gauss-Newton method is employed with a back-tracking strategy to determine the learning rate, while L-BFGS is applied with the strong Wolfe condition. Right: Gradient-based training algorithms are applied with an initial learning rate of $1 \times 10^{-3}$, which is then halved every 1000 epochs until it reaches $1 \times 10^{-5}$.
Figure 2: Testing Errors vs. Iterations. Left: Newton-type training algorithms. The Gauss-Newton method is employed with a back-tracking strategy to determine the learning rate, while L-BFGS is applied with the strong Wolfe condition. Right: Gradient-based algorithms. ADAM is applied with an initial learning rate of $1 \times 10^{-3}$, which is then halved every 2000 epochs until it reaches $1 \times 10^{-5}$. SGD is applied with an initial learning rate of $1\times 10^{-2}$ which is halved every 2000 epochs before going below $1\times 10^{-4}$.
Figure 3: Testing Relative Errors vs. Iterations. Left: Newton-type training algorithms. The Gauss-Newton method is employed with a back-tracking strategy to determine the learning rate, while L-BFGS is applied with the strong Wolfe condition. Right: Gradient-based algorithms. Both ADAM and SGD are applied with an initial learning rate of $1 \times 10^{-3}$, which is then halved every 2000 epochs until it reaches $1 \times 10^{-5}$.
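The step-halving learning-rate schedules described in the captions can be sketched as a small helper. The function name and the reading that the rate is clamped at the stated lower bound are assumptions; the captions do not say how the schedule was implemented.

```python
def halving_schedule(epoch, lr0=1e-3, every=1000, floor=1e-5):
    """Learning rate that starts at lr0, halves every `every` epochs,
    and is clamped so it never drops below `floor` (assumed behaviour)."""
    return max(lr0 * 0.5 ** (epoch // every), floor)

if __name__ == "__main__":
    # Figure 1 setting: 1e-3, halved every 1000 epochs, lower bound 1e-5.
    for epoch in (0, 999, 1000, 6000, 7000):
        print(epoch, halving_schedule(epoch))
```

The other settings in the captions correspond to different arguments, for example `halving_schedule(epoch, lr0=1e-2, every=2000, floor=1e-4)` for SGD in Figure 2 under the same clamping assumption.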

Theorems & Definitions (28)

Lemma 1
proof
Definition 1
Definition 2
Lemma 2
proof
Lemma 3
proof
Theorem 1
proof
...and 18 more
