PDE-aware Optimizer for Physics-informed Neural Networks

Vismay Churiwala, Hardik Shukla, Manurag Khullar

TL;DR

Physics-informed neural networks (PINNs) solve PDEs by embedding physics into the loss, but training can be unstable due to gradient misalignment among loss terms. The authors introduce a PDE-aware optimizer that uses the variance of per-sample PDE residual gradients to adapt updates, which acts as a Hessian-free, gradient-based preconditioner. Across 1D Burgers', Allen-Cahn, and KdV benchmarks, the method yields smoother convergence and lower absolute errors than Adam and SOAP, especially in stiff regions. The work demonstrates a practical, interpretable approach to stabilizing PINN training and identifies scaling to larger architectures and hardware accelerators as future work.

Abstract

Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs) by embedding physical constraints into the loss function. However, standard optimizers such as Adam often struggle to balance competing loss terms, particularly in stiff or ill-conditioned systems. In this work, we propose a PDE-aware optimizer that adapts parameter updates based on the variance of per-sample PDE residual gradients. This method addresses gradient misalignment without incurring the heavy computational costs of second-order optimizers such as SOAP. We benchmark the PDE-aware optimizer against Adam and SOAP on the 1D Burgers', Allen-Cahn, and Korteweg-de Vries (KdV) equations. Across all three PDEs, the PDE-aware optimizer achieves smoother convergence and lower absolute errors, particularly in regions with sharp gradients. Our results demonstrate the effectiveness of PDE residual-aware adaptivity in enhancing the stability of PINN training. While the results are promising, scaling to larger architectures and hardware accelerators remains an important direction for future research.

Paper Structure

This paper contains 19 sections, 10 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: Type I: The irregular green trajectory illustrates how the optimisation struggles when facing two types of gradient conflicts. Type II: The red trajectory shows how appropriate preconditioning through second-order information could mitigate these conflicts by aligning gradients both within and between optimisation steps.
  • Figure 2: Flow chart of the PDE-Aware Optimizer. The first moment $\bm{m}_t$ is built from batch-averaged PDE‐residual gradients; the second moment $\bm{v}_t$ tracks their element-wise variance, so the pre-conditioned step $\bm{w}_{t+1}=\bm{w}_t-\eta\,\bm{m}_t/(\sqrt{\bm{v}_t}+\epsilon)$ automatically shrinks learning rates in stiff regions and enlarges them where the residual is smooth.
  • Figure 3: Validation-loss landscape over the $(\eta,\beta_{1},\beta_{2})$ grid for the Burgers' equation.
  • Figure 4: Heatmap comparison of the Adam, SOAP, and PDE-aware optimizers on the Burgers' equation.
  • Figure 5: Absolute error ($|u_{\text{PINN}} - u_{\text{FDM}}|$) for the Burgers' equation across different optimizers.
  • ...and 7 more figures
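
The update rule in the Figure 2 caption can be illustrated with a minimal sketch. Only the step $\bm{w}_{t+1}=\bm{w}_t-\eta\,\bm{m}_t/(\sqrt{\bm{v}_t}+\epsilon)$ and the roles of $\bm{m}_t$ (batch-averaged residual gradients) and $\bm{v}_t$ (element-wise variance) come from the caption. The exponential-moving-average form of both moments, the absence of bias correction, the default hyperparameter values, and the function name `pde_aware_step` are assumptions made for illustration and may differ from the paper's algorithm.

```python
import math

def pde_aware_step(w, m, v, per_sample_grads,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical PDE-aware update on a flat parameter list.

    per_sample_grads[i][j] is the gradient of the PDE residual loss at
    collocation point i with respect to parameter j.
    """
    n = len(per_sample_grads)
    new_w, new_m, new_v = [], [], []
    for j in range(len(w)):
        column = [g[j] for g in per_sample_grads]
        mean_j = sum(column) / n                            # batch-averaged gradient
        var_j = sum((x - mean_j) ** 2 for x in column) / n  # element-wise variance
        # Assumption: Adam-style moving averages for both moments.
        m_j = beta1 * m[j] + (1.0 - beta1) * mean_j
        v_j = beta2 * v[j] + (1.0 - beta2) * var_j
        # Preconditioned step from the Figure 2 caption.
        new_w.append(w[j] - lr * m_j / (math.sqrt(v_j) + eps))
        new_m.append(m_j)
        new_v.append(v_j)
    return new_w, new_m, new_v

# Two parameters with the same mean gradient (1.0) but different spread
# across collocation points: parameter 0 is "stiff", parameter 1 is "smooth".
grads = [[3.0, 1.1],
         [-1.0, 0.9]]
w_new, m_new, v_new = pde_aware_step([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], grads)
step_sizes = [abs(x) for x in w_new]
```

In this toy batch both parameters see the same averaged gradient, so the first moments match; the high-variance parameter accumulates a larger $\bm{v}_t$ and therefore takes the smaller step, which is the behaviour the caption describes for stiff regions.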