PirateNets: Physics-informed Deep Learning with Residual Adaptive Networks

Sifan Wang, Bowen Li, Yuhan Chen, Paris Perdikaris

TL;DR

This work traces the difficulty of training deep PINNs to poor initialization of the network derivatives and introduces PirateNets, an architecture with adaptive residual connections and physics-informed initialization that starts as a linear, easily trainable model and progressively becomes more nonlinear. By embedding coordinates through random Fourier features and using gated residual blocks initialized as identity mappings, PirateNets stabilize training and enable deep networks to minimize PDE residuals effectively. Across multiple PDE benchmarks (Allen-Cahn, KdV, Gray-Scott, Ginzburg–Landau, and lid-driven cavity flow), PirateNets achieve state-of-the-art accuracy and scale robustly with depth, with ablations confirming the importance of the $\alpha$ initialization and the gating. The approach also allows available data to initialize the final layer via least squares, integrating physical priors into the learning process and suggesting future extensions to neural operators and prior-informed design.

Abstract

While physics-informed neural networks (PINNs) have become a popular deep learning framework for tackling forward and inverse problems governed by partial differential equations (PDEs), their performance is known to degrade when larger and deeper neural network architectures are employed. Our study identifies that the root of this counter-intuitive behavior lies in the use of multi-layer perceptron (MLP) architectures with unsuitable initialization schemes, which result in poor trainability of the network derivatives and ultimately lead to an unstable minimization of the PDE residual loss. To address this, we introduce Physics-informed Residual Adaptive Networks (PirateNets), a novel architecture that is designed to facilitate stable and efficient training of deep PINN models. PirateNets leverage a novel adaptive residual connection, which allows the networks to be initialized as shallow networks that progressively deepen during training. We also show that the proposed initialization scheme allows us to encode appropriate inductive biases corresponding to a given PDE system into the network architecture. We provide comprehensive empirical evidence showing that PirateNets are easier to optimize and can gain accuracy from considerably increased depth, ultimately achieving state-of-the-art results across various benchmarks. All code and data accompanying this manuscript will be made publicly available at https://github.com/PredictiveIntelligenceLab/jaxpi.
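The adaptive residual connection described above can be illustrated with a minimal sketch. It assumes the skip takes the form `x_next = alpha * F(x) + (1 - alpha) * x`, with `F` a small stack of tanh dense layers and `alpha` starting at zero. The helper names `init_block` and `adaptive_block` are hypothetical, and the code uses NumPy for brevity rather than the authors' JAX repository.

```python
import numpy as np

def init_block(rng, width):
    """Glorot-uniform weights, zero biases, and a skip gate alpha = 0."""
    limit = np.sqrt(6.0 / (width + width))
    return {
        "W": [rng.uniform(-limit, limit, (width, width)) for _ in range(3)],
        "b": [np.zeros(width) for _ in range(3)],
        "alpha": 0.0,  # trainable in practice; zero at initialization (assumed form)
    }

def adaptive_block(params, x):
    """Return alpha * F(x) + (1 - alpha) * x, where F is three tanh dense layers."""
    h = x
    for W, b in zip(params["W"], params["b"]):
        h = np.tanh(h @ W + b)
    a = params["alpha"]
    return a * h + (1.0 - a) * x

def forward(blocks, x):
    """Apply the blocks in sequence."""
    for p in blocks:
        x = adaptive_block(p, x)
    return x

rng = np.random.default_rng(0)
width, depth = 16, 8
blocks = [init_block(rng, width) for _ in range(depth)]
x = rng.normal(size=(4, width))

# With every alpha at zero, the whole stack is the identity map,
# no matter how many blocks are stacked.
out_init = forward(blocks, x)
identity_at_init = np.allclose(out_init, x)

# Once alpha moves away from zero, the nonlinear branch starts to contribute.
for p in blocks:
    p["alpha"] = 0.5
out_active = forward(blocks, x)
changed_when_active = not np.allclose(out_active, x)
```

In this sketch, depth does not affect the output at initialization. This is one way to read the statement that the network starts shallow and deepens as training moves each `alpha` away from zero.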

Paper Structure (24 sections, 4 theorems, 58 equations, 15 figures, 8 tables)

Key Result

Proposition 1

Consider the second-order elliptic Dirichlet problem: with Dirichlet boundary condition and $f \in L^2(\Omega)$. Let $u: \Omega \rightarrow \mathbb{R}$ be its solution and $u_{\mathbf{\theta}}$ be a smooth approximation by PINNs. Define the expected loss function: Then, for any compact $V \subset \subset \Omega$, there exists a constant $C$ such that

Figures (15)

  • Figure 1: Allen-Cahn equation: Relative $L^2$ error of training PINNs using MLP, ResNet, and PirateNet backbones of varying depths, averaged over 5 random seeds for each architecture.
  • Figure 2: Regression: Left: Variance of the network derivative for different activations across various network widths at initialization. Middle: Variance of MLP derivatives of different orders across various network widths at initialization. Right: Relative $L^2$ error in approximating $y(x) = \sin(2 \pi x)$ with MLP derivatives of different orders. All statistics are averaged over 5 random seeds.
  • Figure 3: Physics-informed residual adaptive networks (PirateNets): In our model, input coordinates are first projected into a high-dimensional feature space using random Fourier features and then passed through $N$ adaptive residual blocks. Each block consists of three dense layers, augmented with two gating operations that incorporate shallow latent features. The key module of the architecture is the adaptive skip connection with a trainable parameter $\alpha$ initialized at $0$, so that at initialization each block reduces to an identity mapping and the model can be viewed as a linear combination of the coordinate embeddings. This helps to circumvent the issue of pathological initialization in deep PDE residual networks. We propose a physics-informed initialization for the final layer by solving a least squares problem to fit the available data, while all other weights are initialized following the Glorot scheme and biases are set to zero. As training progresses, the depth of the model gradually increases as the nonlinearities become activated, enabling the model to progressively recover its approximation capacity.
  • Figure 4: Allen-Cahn equation: Top: Comparison between the solution predicted by a trained PirateNet and the reference solution. The detailed hyper-parameter settings are presented in Table \ref{tab: ac_config}. Bottom: Convergence of the initial condition loss, the PDE residual loss, and the relative $L^2$ test error during the training of a PirateNet and a Modified MLP backbone, alongside the evolution of nonlinearities in each of the PirateNet residual blocks.
  • Figure 5: Allen-Cahn equation: Left: Relative $L^2$ test errors obtained by a PirateNet with the last layer initialized by the least squares solution for fitting the initial condition and the linearized PDE solution, respectively. Middle: Relative $L^2$ test errors of training a Modified MLP and a PirateNet backbone with or without the physics-informed initialization. Without physics-informed initialization, the final layer defaults to a standard dense layer with weights initialized using the Glorot scheme and biases set to zero. Right: Relative $L^2$ errors of training a Modified MLP and a PirateNet backbone of different depths. Each ablation study is performed under the same hyper-parameter settings, with results averaged over 5 random seeds.
  • ...and 10 more figures
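
The Figure 3 caption notes that, with every $\alpha$ at zero, the model is a linear combination of the coordinate embeddings, and that the final layer can then be initialized by least squares against available data. The sketch below illustrates that idea under stated assumptions: the embedding is taken to be `[cos(xB), sin(xB)]` with Gaussian `B`, the target $\sin(2 \pi x)$ is borrowed from the Figure 2 regression example as stand-in data, and the names `fourier_features`, `B`, `scale`, and `W_out` are hypothetical. It is written in NumPy and is not the authors' implementation.

```python
import numpy as np

def fourier_features(x, B):
    """Map coordinates x of shape (n, d) to [cos(xB), sin(xB)] of shape (n, 2m)."""
    proj = x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
d, m, scale = 1, 64, 5.0                       # scale is an assumed hyper-parameter
B = rng.normal(scale=scale, size=(d, m))       # fixed random frequencies

# Stand-in "available data": samples of y(x) = sin(2 pi x) on [0, 1].
x_data = np.linspace(0.0, 1.0, 200)[:, None]
y_data = np.sin(2.0 * np.pi * x_data)

# With alpha = 0 in every block, the network output is Phi(x) @ W_out,
# so the final-layer weights can be fit directly by linear least squares.
Phi = fourier_features(x_data, B)
W_out, *_ = np.linalg.lstsq(Phi, y_data, rcond=None)

# Relative L2 error of the initialized (still linear) model on the data.
y_init = Phi @ W_out
rel_l2 = np.linalg.norm(y_init - y_data) / np.linalg.norm(y_data)
```

Because the fit is linear in `W_out`, it needs no gradient steps. Training would then start from a model that already matches the data and would only need to reduce the PDE residual as the nonlinear blocks activate.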

Theorems & Definitions (8)

  • Claim 1
  • Proposition 1
  • Proposition 2
  • Corollary 3.1
  • Proposition 3
  • Proof of Proposition \ref{prop: elliptic}
  • Proof of Proposition \ref{prop: parabolic}
  • Proof of Proposition \ref{prop}