Optimizing Variational Physics-Informed Neural Networks Using Least Squares

Carlos Uriarte, Manuela Bastidas, David Pardo, Jamie M. Taylor, Sergio Rojas

TL;DR

The paper tackles slow convergence of variational physics-informed neural networks trained with stochastic optimization by proposing a hybrid least-squares/gradient-descent (LS/GD) scheme that updates the last-layer weights via LS and the hidden-layer weights via GD. It formalizes Robust VPINNs (RVPINNs) through a residual-minimization framework that leverages a Riesz representative in the test space and discretizes both trial and test spaces, enabling efficient LS solves. A central contribution is the cost-aware analysis showing that forward-mode AD or ultraweak formulations (UltraPINNs) can reduce the per-iteration cost of LS/GD to be competitive with conventional GD, with substantial improvements demonstrated on one- and two-dimensional problems, including high-frequency and singular cases. The work highlights practical gains in convergence and speed, outlines implementation strategies in TensorFlow/Keras, and suggests future directions such as extending to other loss forms and integrating with second-order optimizers.

Abstract

Variational Physics-Informed Neural Networks often suffer from poor convergence when using stochastic gradient-descent-based optimizers. By introducing a Least Squares solver for the weights of the last layer of the neural network, we improve the convergence of the loss during training in most practical scenarios. This work analyzes the computational cost of the resulting hybrid Least-Squares/Gradient-Descent optimizer and explains how to implement it efficiently. In particular, we show that a traditional implementation based on backward-mode automatic differentiation leads to a prohibitively expensive algorithm. To remedy this, we propose using either forward-mode automatic differentiation or an ultraweak-type scheme that avoids the differentiation of trial functions in the discrete weak formulation. The proposed alternatives are up to one hundred times faster than the traditional one, recovering a computational cost-per-iteration similar to that of a conventional gradient-descent-based optimizer alone. To support our analysis, we derive computational estimates and conduct numerical experiments in one- and two-dimensional problems.
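
The hybrid optimizer described above can be illustrated with a minimal NumPy sketch. This is not the authors' TensorFlow/Keras implementation: to keep the example short, a plain discrete $L^2$ fitting loss replaces the variational loss, the network has a single hidden layer, and the gradient is written by hand rather than obtained by automatic differentiation. The domain $(0,\pi)$, the width, the learning rate, and all function names (`features`, `ls_last_layer`, `hybrid_step`) are assumptions made for illustration. The target is the manufactured solution from the Figure 4 caption. The structure is the point: each iteration first solves a least-squares problem for the last-layer weights $\boldsymbol{\omega}$ and then takes one gradient-descent step on the hidden-layer parameters $\boldsymbol{\alpha}$.

```python
import numpy as np

def features(x, a, b):
    # Hidden-layer outputs Phi[i, j] = tanh(a[j] * x[i] + b[j]).
    return np.tanh(np.outer(x, a) + b)

def ls_last_layer(Phi, y):
    # Least-squares solve for the last-layer weights omega (hidden layer frozen).
    omega, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return omega

def loss(Phi, omega, y):
    # Mean squared residual; a stand-in for the variational loss (assumption).
    r = Phi @ omega - y
    return float(np.mean(r**2))

def hybrid_step(x, y, a, b, lr=1e-3):
    # (1) LS step: optimal omega for the current hidden parameters.
    Phi = features(x, a, b)
    omega = ls_last_layer(Phi, y)
    # (2) GD step on the hidden parameters with omega held fixed.
    r = Phi @ omega - y
    G = (2.0 / len(x)) * r[:, None] * (1.0 - Phi**2) * omega[None, :]
    grad_a = (G * x[:, None]).sum(axis=0)
    grad_b = G.sum(axis=0)
    return a - lr * grad_a, b - lr * grad_b, omega, loss(Phi, omega, y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, np.pi, 200)          # sample points (assumed domain)
y = np.sin(4 * x) * np.sin(x / 2)         # manufactured solution from Figure 4
a0 = rng.normal(size=16)                  # hidden weights (part of alpha)
b0 = rng.normal(size=16)                  # hidden biases (part of alpha)

a, b = a0.copy(), b0.copy()
n_iters = 200
history = []
for _ in range(n_iters):
    a, b, omega, current_loss = hybrid_step(x, y, a, b)
    history.append(current_loss)
```

Because the model is linear in $\boldsymbol{\omega}$, the LS step returns the best last-layer weights for the current hidden layer, so the loss recorded after it is never larger than the loss for any other choice of $\boldsymbol{\omega}$ with the same features. In the variational setting of the paper, the matrix playing the role of `Phi` involves derivatives of the hidden-layer outputs, which is where the choice between backward-mode and forward-mode automatic differentiation affects the cost.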

Paper Structure (24 sections, 44 equations, 10 figures, 3 tables, 1 algorithm)

Figures (10)

  • Figure 1: Architecture of $u^{\boldsymbol{\alpha},\boldsymbol{\omega}}$. Arrows represent trainable parameters, and graphics within neurons represent activation functions. $\boldsymbol{\alpha}$ stands for the trainable parameters corresponding to all hidden layers, in red, and $\boldsymbol{\omega}$ for the coefficients in the final linear combination (also interpretable as a non-activated and unbiased scalar-valued layer), in blue.
  • Figure 2: Methodology of the hybrid LS/GD optimizer.
  • Figure 3: Costs of computing the gradient of a vector-valued neural network $\mathbb{R}^d\ni x \mapsto \mathbf{u}^{\boldsymbol{\alpha}}(x)\in \mathbb{R}^N$ with five hidden layers of width $2^{10}$. Experimentation is performed for $d, N\in\{2^0,2^2,\ldots,2^9\}$.
  • Figure 4: Results for VPINNs with manufactured solution $u^*(x)=\sin(4x)\sin(x/2)$ using either Adam or LS/Adam. The final relative errors are $2.72\%$ and $0.19\%$ for Adam and LS/Adam, respectively.
  • Figure 5: Results for UltraPINNs with manufactured solution $u^*(x)=\sin(4x)\sin(x/2)$ using either Adam or LS/Adam. The final relative errors are $2.72\%$ and $0.20\%$ for Adam and LS/Adam, respectively.
  • ...and 5 more figures

Theorems & Definitions (9)

  • Example: Poisson's Equation in weak form
  • Remark 1: Imposition of boundary conditions
  • Remark 2: Petrov-Galerkin viewpoint
  • Remark 3: Least-squares regularization
  • Remark 4: Total derivative viewpoint
  • Remark 5: Spectral viewpoint
  • Remark 6: Matrix construction vs. action
  • Remark 7
  • Remark 8: Revisiting total vs. partial derivative viewpoint