Self-adaptive weights based on balanced residual decay rate for physics-informed neural networks and deep operator networks

Wenqian Chen, Amanda A. Howard, Panos Stinis

TL;DR

This work identifies that plain PINNs and PIDeepONets can fail when residual convergence rates differ dramatically across training points, with the slowest rate dominating the overall solution. It introduces inverse residual decay rate (irdr) and a balanced residual decay rate (BRDR) weighting scheme that proportionally emphasizes slowly converging residuals while enforcing a mean weight of 1, plus an adaptive scaling factor to keep the effective learning rate near its maximum. BRDR is extended to mini-batch training and shown to yield bounded, smooth weights, faster convergence, lower final errors, and reduced training uncertainty across PINN and PIDeepONet benchmarks, including 2D Helmholtz, 1D Allen–Cahn, 1D Burgers, and operator-learning tasks. The approach competes with or outperforms state-of-the-art adaptive weighting methods (SA, RBA, NTK, CK) while incurring modest computational overhead and reduced hyperparameter tuning, with code available for reproducibility.

Abstract

Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy and efficiency. In this work, we demonstrate that the failure of plain physics-informed neural networks arises from the significant discrepancy in the convergence rate of residuals at different training points, where the slowest convergence rate dominates the overall solution convergence. Based on these observations, we propose a pointwise adaptive weighting method that balances the residual decay rate across different training points. The performance of our proposed adaptive weighting method is compared with current state-of-the-art adaptive weighting methods on benchmark problems for both physics-informed neural networks and physics-informed deep operator networks. Through extensive numerical results we demonstrate that our proposed approach of balanced residual decay rates offers several advantages, including bounded weights, high prediction accuracy, fast convergence rate, low training uncertainty, low computational cost, and ease of hyperparameter tuning.
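
The weighting rule summarized in the TL;DR (weights proportional to the inverse residual decay rate, normalized to a mean of 1) can be illustrated with a minimal Python sketch. The summary does not give the formula for irdr, so the estimator below is an assumption made for illustration: the squared residual divided by the square root of a bias-corrected exponential moving average of the fourth power of the residual, with smoothing factor `beta_c`. The function names, the default values, and the two synthetic residual histories are hypothetical, and the adaptive scaling factor and mini-batch handling mentioned in the TL;DR are omitted.

```python
import math


def update_irdr(residual, ema_r4, step, beta_c=0.999, eps=1e-14):
    """Return (irdr, new_ema_r4) for one training point at iteration `step` (1-based).

    ASSUMPTION: irdr = R^2 / sqrt(bias-corrected EMA of R^4). This form is an
    illustrative guess, not a formula taken from the summary above.
    """
    ema_r4 = beta_c * ema_r4 + (1.0 - beta_c) * residual ** 4
    corrected = ema_r4 / (1.0 - beta_c ** step)  # bias correction for early steps
    irdr = residual ** 2 / (math.sqrt(corrected) + eps)
    return irdr, ema_r4


def balanced_weights(irdr_values):
    """Weights proportional to irdr, rescaled so that their mean is 1."""
    mean_irdr = sum(irdr_values) / len(irdr_values)
    return [value / mean_irdr for value in irdr_values]


def demo(num_steps=2000):
    """Two synthetic residuals R = exp(-lambda * n): one fast, one slow."""
    rates = [1e-2, 1e-4]  # hypothetical convergence rates: [fast, slow]
    ema = [0.0, 0.0]
    irdr = [0.0, 0.0]
    for n in range(1, num_steps + 1):
        for i, lam in enumerate(rates):
            residual = math.exp(-lam * n)
            irdr[i], ema[i] = update_irdr(residual, ema[i], n)
    return balanced_weights(irdr)


if __name__ == "__main__":
    # The slowly converging point (index 1) should receive the larger weight.
    print(demo())
```

Under this assumed estimator, a residual that decays quickly falls well below its own running average and gets a small irdr, while a slowly decaying residual keeps an irdr close to 1. Dividing by the mean then shifts weight toward the slow point without letting the weights grow unboundedly.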

Paper Structure

This paper contains 22 sections, 46 equations, 15 figures, 5 tables, 1 algorithm.

Figures (15)

  • Figure 1: The residual decay process for four training points in a 1-dimensional Poisson equation, as described in Section \ref{sec_rdr_diff}.
  • Figure 2: The relationship between the convergence rate $\lambda$ and the inverse residual decay rate ($irdr$) calculated with different smoothing factors $\beta_c$. The left panel shows $irdr$ for a decaying residual with a fixed convergence rate, $R = R_0 \exp(-\lambda n)$. The right panel shows $irdr$ for a decaying residual whose convergence rate is initially $\lambda = 10^{-5}$ and is halved at $n = 100{,}000$.
  • Figure 3: The history of the solution and the average inverse residual decay rate during training for the plain PINN (left) and the BRDR PINN (right) on the 1D Poisson equation. Note that the plain PINN and the BRDR PINN share the same network initialization for the same $k$.
  • Figure 4: PINN for the 2D Helmholtz equation: The history of the $L_2$ relative error, the unweighted loss of each component, and the average weight ratio of BC to PDE for fixed-weight training and adaptive-weight training ("SA", "RBA", "BRDR", "BRDR+"). Note that all cases share the same network architecture and the same random seed for initializing the network parameters.
  • Figure 5: PINN for the 1D Allen–Cahn equation: The history of the $L_2$ relative error, the unweighted loss of each component, and the average weight ratio of IC to PDE for fixed-weight training and adaptive-weight training ("SA", "RBA", "BRDR", "BRDR+"). Note that all cases share the same network architecture and the same random seed for initializing the network parameters.
  • ...and 10 more figures