Gauss Newton method for solving variational problems of PDEs with neural network discretizations

Wenrui Hao, Qingguo Hong, Xianlin Jin

TL;DR

The paper solves PDEs with neural-network discretizations in a variational energy framework and introduces a Gauss-Newton method tailored to this variational form. It splits the Hessian of the energy as $H(\theta) = J(\theta) + Q(\theta)$, argues that $Q$ is small and can be neglected, and obtains a Gauss-Newton iteration that converges superlinearly to semi-regular zeros of the gradient. It also proves that, under suitable quadrature and rank conditions, the variational Gauss-Newton method coincides with the L2-Gauss-Newton method, and it analyzes a randomized Gauss-Newton variant for scalability. Numerical experiments in 1D, 2D, and 5D show Gauss-Newton outperforming gradient-based methods and L-BFGS in both accuracy and efficiency, which supports its use for high-dimensional PDE discretizations.

Abstract

The numerical solution of differential equations using machine learning-based approaches has gained significant popularity. Neural network-based discretization has emerged as a powerful tool for solving differential equations by parameterizing a set of functions. Various approaches, such as the deep Ritz method and physics-informed neural networks, have been developed for numerical solutions. Training algorithms, including gradient descent and greedy algorithms, have been proposed to solve the resulting optimization problems. In this paper, we focus on the variational formulation of the problem and propose a Gauss-Newton method for computing the numerical solution. We provide a comprehensive analysis of the superlinear convergence properties of this method, along with a discussion on semi-regular zeros of the vanishing gradient. Numerical examples are presented to demonstrate the efficiency of the proposed Gauss-Newton method.
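
To make the idea of a Gauss-Newton step on a variational energy concrete, the following is a minimal sketch and not the authors' implementation. Everything specific in it is an assumption made for illustration: the 1D model problem $-u'' + u = f$ on $(0,1)$ with Neumann boundary conditions, the energy $L(\theta) = \int_0^1 \tfrac12 (u')^2 + \tfrac12 u^2 - f u \, dx$, the reading of the Gauss-Newton matrix as the Gram matrix $J_{ij} = \int_0^1 \partial_{\theta_i} u' \, \partial_{\theta_j} u' + \partial_{\theta_i} u \, \partial_{\theta_j} u \, dx$ (the Hessian with the second-derivative term $Q$ dropped), the tiny tanh network, Gauss-Legendre quadrature, and the use of `np.linalg.lstsq` as a stand-in for a pseudo-inverse when $J$ is rank deficient. The helper names `features`, `energy`, `grad_and_J`, and `gauss_newton` are hypothetical.

```python
import numpy as np

def features(theta, x):
    """u, u' and their parameter Jacobians for u(x) = sum_i a_i tanh(w_i x + b_i)."""
    a, w, b = np.split(theta, 3)            # theta = [a, w, b]
    t = np.tanh(np.outer(x, w) + b)         # (n_pts, n_hidden)
    s = 1.0 - t**2                          # sech^2(z)
    ds = -2.0 * t * s                       # d(sech^2)/dz
    xc = x[:, None]
    u = t @ a
    du = s @ (a * w)
    Ju = np.hstack([t, a * s * xc, a * s])                            # d u  / d theta
    Jdu = np.hstack([w * s, a * s + a * w * ds * xc, a * w * ds])     # d u' / d theta
    return u, du, Ju, Jdu

def energy(theta, x, wq, f):
    """Quadrature approximation of the assumed energy functional."""
    u, du, _, _ = features(theta, x)
    return np.sum(wq * (0.5 * du**2 + 0.5 * u**2 - f * u))

def grad_and_J(theta, x, wq, f):
    """Energy gradient and Gauss-Newton matrix J (Hessian without the Q term)."""
    u, du, Ju, Jdu = features(theta, x)
    g = Jdu.T @ (wq * du) + Ju.T @ (wq * (u - f))
    J = Jdu.T @ (wq[:, None] * Jdu) + Ju.T @ (wq[:, None] * Ju)
    return g, J

def gauss_newton(theta, x, wq, f, n_iter=50):
    """Gauss-Newton iteration with step-halving backtracking on the energy."""
    energies = [energy(theta, x, wq, f)]
    for _ in range(n_iter):
        g, J = grad_and_J(theta, x, wq, f)
        # Minimum-norm least-squares solve: an assumed stand-in for a pseudo-inverse.
        d = -np.linalg.lstsq(J, g, rcond=None)[0]
        step = 1.0
        for _ in range(30):
            trial = theta + step * d
            e = energy(trial, x, wq, f)
            if e < energies[-1]:            # accept only strict energy decrease
                theta = trial
                energies.append(e)
                break
            step *= 0.5
        else:
            break                           # no decreasing step found; stop
    return theta, energies

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1].
xi, wi = np.polynomial.legendre.leggauss(64)
x, wq = 0.5 * (xi + 1.0), 0.5 * wi
# Right-hand side chosen so that u(x) = cos(pi x) solves the assumed model problem.
f = (np.pi**2 + 1.0) * np.cos(np.pi * x)

rng = np.random.default_rng(0)
theta0 = rng.normal(size=3 * 5)             # 5 hidden neurons
theta, energies = gauss_newton(theta0, x, wq, f)
print(energies[0], energies[-1])
```

Because a step is accepted only when it lowers the energy, the recorded energies are decreasing by construction. The sketch makes no claim about convergence rates.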

Paper Structure (11 sections, 13 theorems, 110 equations, 3 figures, 4 tables)

Key Result

Lemma 1

For every $\epsilon > 0$, there exist $\delta > 0$, $J\in \mathbb{N}^{+}$ and $m\in \mathbb{N}^{+}$, such that $\theta\in \mathbb{R}^m$, $DNN_J\subset H^1(\Omega)$, and for $\Vert \theta-\theta^* \Vert < \delta$, $\theta^*=\arg\min L(\theta)$, if $D^2_{\theta} u(x,\theta) \in H^1(\Omega)$, then it

Figures (3)

  • Figure 1: Testing Errors vs. Iterations. Left: Newton-type training algorithms. The Gauss-Newton method is employed with a back-tracking strategy to determine the learning rate, while L-BFGS is applied with the strong Wolfe condition. Right: Gradient-based training algorithms are applied with an initial learning rate of $1 \times 10^{-3}$, which is then halved every 1000 epochs until it reaches $1 \times 10^{-5}$.
  • Figure 2: Testing Errors vs. Iterations. Left: Newton-type training algorithms. The Gauss-Newton method is employed with a back-tracking strategy to determine the learning rate, while L-BFGS is applied with the strong Wolfe condition. Right: Gradient-based algorithms. ADAM is applied with an initial learning rate of $1 \times 10^{-3}$, which is then halved every 2000 epochs until it reaches $1 \times 10^{-5}$. SGD is applied with an initial learning rate of $1\times 10^{-2}$ which is halved every 2000 epochs before going below $1\times 10^{-4}$.
  • Figure 3: Testing Relative Errors vs. Iterations. Left: Newton-type training algorithms. The Gauss-Newton method is employed with a back-tracking strategy to determine the learning rate, while L-BFGS is applied with the strong Wolfe condition. Right: Gradient-based algorithms. Both ADAM and SGD are applied with an initial learning rate of $1 \times 10^{-3}$, which is then halved every 2000 epochs until it reaches $1 \times 10^{-5}$.
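
The step-size schedule described in the captions for the gradient-based optimizers can be written as a small function. This is a sketch under one assumption: "halved ... until it reaches $1 \times 10^{-5}$" is read here as clamping the learning rate at that floor, which the captions do not state explicitly. The function name `halving_schedule` and its parameter names are hypothetical.

```python
def halving_schedule(epoch, lr0=1e-3, every=1000, lr_min=1e-5):
    """Learning rate halved every `every` epochs, clamped below at `lr_min` (assumed)."""
    return max(lr0 * 0.5 ** (epoch // every), lr_min)

# Figure 1 style: start at 1e-3, halve every 1000 epochs.
print([halving_schedule(e) for e in (0, 1000, 2000, 7000)])
# Figure 2 style for SGD: start at 1e-2, halve every 2000 epochs, floor 1e-4.
print(halving_schedule(4000, lr0=1e-2, every=2000, lr_min=1e-4))
```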

Theorems & Definitions (28)

  • Lemma 1
  • proof
  • Definition 1
  • Definition 2
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 1
  • proof
  • ...and 18 more