Self-Adaptive Physics-Informed Neural Networks using a Soft Attention Mechanism

Levi McClenny, Ulisses Braga-Neto

TL;DR

This work tackles the instability and slow convergence of standard PINNs on stiff PDEs by introducing Self-Adaptive PINNs (SA-PINNs), which assign trainable, per-point weights to the residual, boundary, and initial losses via a soft attention mask. The method jointly optimizes network weights and adaptive weights, uses a Gaussian-Process–based weight map to enable stochastic gradient training, and provides NTK-based insights into how per-point weighting equalizes and smooths the training dynamics. Empirical results across Burgers, Helmholtz, Allen–Cahn, and 2D Burgers demonstrate substantially improved accuracy with fewer epochs, and SGD experiments further illustrate practical benefits of per-point weighting. The NTK analysis offers a theoretical lens suggesting why SA-PINNs stabilize training by balancing eigenvalues across loss components, with potential implications for broader PDE-constrained learning applications.

Abstract

Physics-Informed Neural Networks (PINNs) have emerged recently as a promising application of deep neural networks to the numerical solution of nonlinear partial differential equations (PDEs). However, it has been recognized that adaptive procedures are needed to force the neural network to accurately fit the stubborn spots in the solution of "stiff" PDEs. In this paper, we propose a fundamentally new way to train PINNs adaptively, where the adaptation weights are fully trainable and applied to each training point individually, so the neural network learns autonomously which regions of the solution are difficult and is forced to focus on them. The self-adaptation weights specify a multiplicative soft attention mask, which is reminiscent of similar mechanisms used in computer vision. The basic idea behind these SA-PINNs is to make the weights increase as the corresponding losses increase, which is accomplished by training the network to simultaneously minimize the losses with respect to the network weights and maximize them with respect to the self-adaptation weights. In addition, we show how to build a continuous map of self-adaptive weights using Gaussian Process regression, which allows the use of stochastic gradient descent in problems where conventional gradient descent is not enough to produce accurate solutions. Finally, we derive the Neural Tangent Kernel (NTK) matrix for SA-PINNs and use it to obtain a heuristic understanding of the effect of the self-adaptive weights on the dynamics of training in the limiting case of infinitely-wide PINNs, which suggests that SA-PINNs work by producing a smooth equalization of the eigenvalues of the NTK matrix corresponding to the different loss terms. In numerical experiments with several linear and nonlinear benchmark problems, the SA-PINN outperformed other state-of-the-art PINN algorithms in L2 error, while using a smaller number of training epochs.
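
The simultaneous minimize/maximize training described in the abstract can be illustrated on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: it replaces the neural network with a quadratic polynomial $u(x) = c_0 + c_1 x + c_2 x^2$, solves the ODE $u' + u = 0$ with $u(0) = 1$ on $[0, 1]$, uses the polynomial mask $m(\lambda) = \lambda^2$, and picks step sizes arbitrarily. The names `train_sa_toy`, `lr_c`, and `lr_lam` are hypothetical.

```python
def train_sa_toy(steps=2000, lr_c=0.05, lr_lam=0.005, n_pts=11):
    """Toy self-adaptive training: descent on model params, ascent on per-point weights."""
    xs = [i / (n_pts - 1) for i in range(n_pts)]
    # For u(x) = c0 + c1*x + c2*x^2, the residual r = u' + u is linear in c:
    # r(x) = c0*1 + c1*(1 + x) + c2*(2x + x^2)
    phis = [(1.0, 1.0 + x, 2.0 * x + x * x) for x in xs]

    c = [0.0, 0.0, 0.0]      # stand-in for the network weights (assumption)
    lam_r = [1.0] * n_pts    # one self-adaptive weight per residual point
    lam_b = 1.0              # self-adaptive weight for the boundary point u(0) = 1

    def residuals(c):
        return [sum(p * ck for p, ck in zip(phi, c)) for phi in phis]

    def unweighted_loss(c):
        res = residuals(c)
        return 0.5 * sum(r * r for r in res) / n_pts + 0.5 * (c[0] - 1.0) ** 2

    loss_before = unweighted_loss(c)
    for _ in range(steps):
        res = residuals(c)
        b = c[0] - 1.0
        # Weighted loss: L = 1/(2N) * sum_i lam_i^2 r_i^2 + 1/2 * lam_b^2 b^2
        grad_c = [
            sum(l * l * r * phi[k] for l, r, phi in zip(lam_r, res, phis)) / n_pts
            for k in range(3)
        ]
        grad_c[0] += lam_b ** 2 * b
        # dL/dlam is non-negative, so ascent makes weights grow where errors persist
        grad_lam_r = [l * r * r / n_pts for l, r in zip(lam_r, res)]
        grad_lam_b = lam_b * b * b

        c = [ck - lr_c * g for ck, g in zip(c, grad_c)]               # gradient descent
        lam_r = [l + lr_lam * g for l, g in zip(lam_r, grad_lam_r)]   # gradient ascent
        lam_b = lam_b + lr_lam * grad_lam_b                           # gradient ascent

    return c, lam_r, lam_b, loss_before, unweighted_loss(c)


if __name__ == "__main__":
    c, lam_r, lam_b, loss_before, loss_after = train_sa_toy()
    print("coefficients:", c)
    print("boundary weight:", lam_b)
    print("unweighted loss before/after:", loss_before, loss_after)
```

Because the gradient of the weighted loss with respect to each weight is non-negative under this mask, the weights never decrease, and they grow fastest at points whose residual stays large. In this toy setup that is mostly the boundary point early in training.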

Paper Structure

This paper contains 19 sections, 39 equations, 17 figures, 1 table.

Figures (17)

  • Figure 1: Mask function examples. From the upper left to the bottom right: polynomial mask, $q=2$; polynomial mask, $q=4$; smooth logistic mask; sharp logistic mask.
  • Figure 2: High-fidelity (left) vs. predicted (right) solutions for the viscous Burgers PDE.
  • Figure 3: Top: predicted solution of the viscous Burgers PDE. Middle: Cross-sections of the approximated vs. actual solutions for various x-domain snapshots. Bottom left: Residual $r(x,t)$ across the spatial-temporal domain. Bottom right: Absolute error between prediction and high-fidelity solution across the spatial-temporal domain.
  • Figure 4: Trained weights for residue points across the domain $\Omega$. Larger/brighter colored points correspond to larger weights.
  • Figure 5: Exact (left) vs. predicted (right) solutions for the Helmholtz PDE.
  • ...and 12 more figures