PirateNets: Physics-informed Deep Learning with Residual Adaptive Networks
Sifan Wang, Bowen Li, Yuhan Chen, Paris Perdikaris
TL;DR
This work traces the difficulty of training deep PINNs to poor initialization of the network derivatives and introduces PirateNets, an architecture with adaptive residual connections and a physics-informed initialization that starts as a linear, easily trainable model and becomes progressively more nonlinear during training. Input coordinates are embedded with random Fourier features, and the gated residual blocks are initialized as identity maps, which stabilizes training and lets deep networks minimize PDE residuals effectively. Across several PDE benchmarks (Allen-Cahn, KdV, Gray-Scott, Ginzburg–Landau, and lid-driven cavity flow), PirateNets achieve state-of-the-art accuracy and scale robustly with depth, and ablations confirm the importance of the alpha initialization and the gating. The approach also allows available data to initialize the final layer via least squares, which integrates physical priors into the learning process and suggests future extensions to neural operators and prior-informed design.
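Because the model is linear in its embedding at initialization, fitting the final layer to data reduces to ordinary linear regression on the Fourier features. The following JAX sketch illustrates that idea under stated assumptions: the function names `fourier_features` and `fit_last_layer`, the frequency scale, the feature count, and the toy target are all hypothetical choices for illustration and are not taken from the authors' code.

```python
import jax
import jax.numpy as jnp


def fourier_features(x, B):
    """Embed coordinates x of shape (n, d) as [cos(xB), sin(xB)], with B of shape (d, m)."""
    proj = x @ B
    return jnp.concatenate([jnp.cos(proj), jnp.sin(proj)], axis=-1)


def fit_last_layer(features, targets):
    """Return weights W that minimize ||features @ W - targets||^2."""
    W, _, _, _ = jnp.linalg.lstsq(features, targets)
    return W


# Toy usage: fit samples of a sine wave on [0, 1] (assumed stand-in for available data).
key = jax.random.PRNGKey(0)
B = 2.0 * jax.random.normal(key, (1, 16))      # random frequencies; the scale 2.0 is an assumption
x = jnp.linspace(0.0, 1.0, 64)[:, None]        # 64 one-dimensional coordinates
y = jnp.sin(2.0 * jnp.pi * x)                  # data used to initialize the last layer
W = fit_last_layer(fourier_features(x, B), y)  # shape (32, 1)
prediction = fourier_features(x, B) @ W        # linear model output at initialization
```

The fitted `W` would then serve as the starting value of the final linear layer, so that training begins from the best linear fit to the data rather than from a random output.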
Abstract
While physics-informed neural networks (PINNs) have become a popular deep learning framework for tackling forward and inverse problems governed by partial differential equations (PDEs), their performance is known to degrade when larger and deeper neural network architectures are employed. Our study identifies that the root of this counter-intuitive behavior lies in the use of multi-layer perceptron (MLP) architectures with unsuitable initialization schemes, which result in poor trainability of the network derivatives and ultimately lead to an unstable minimization of the PDE residual loss. To address this, we introduce Physics-informed Residual Adaptive Networks (PirateNets), a novel architecture that is designed to facilitate stable and efficient training of deep PINN models. PirateNets leverage a novel adaptive residual connection, which allows the networks to be initialized as shallow networks that progressively deepen during training. We also show that the proposed initialization scheme allows us to encode appropriate inductive biases corresponding to a given PDE system into the network architecture. We provide comprehensive empirical evidence showing that PirateNets are easier to optimize and can gain accuracy from considerably increased depth, ultimately achieving state-of-the-art results across various benchmarks. All code and data accompanying this manuscript will be made publicly available at https://github.com/PredictiveIntelligenceLab/jaxpi.
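The adaptive residual connection can be illustrated with a short JAX sketch. It assumes a simplified block in which a trainable scalar `alpha` blends the block's nonlinear output with its input, so that `alpha = 0` makes the block an identity map. The names `init_block` and `adaptive_residual_block`, the three-layer layout, the tanh activation, and the exact gating form are assumptions made for illustration, not the authors' implementation.

```python
import jax
import jax.numpy as jnp


def init_block(key, dim):
    """Create three dense layers and a trainable scalar alpha, initialized to zero."""
    keys = jax.random.split(key, 3)
    scale = 1.0 / jnp.sqrt(dim)
    layers = [(scale * jax.random.normal(k, (dim, dim)), jnp.zeros(dim)) for k in keys]
    return {"layers": layers, "alpha": jnp.array(0.0)}


def adaptive_residual_block(params, x, u, v):
    """Blend a gated nonlinear path with the input x.

    u and v are two encodings of the embedded coordinates used for gating
    (an assumed form); all arrays have shape (batch, dim).
    """
    h = x
    for i, (w, b) in enumerate(params["layers"]):
        h = jnp.tanh(h @ w + b)
        if i < 2:
            # Gating: interpolate between the two encodings (assumed form).
            h = h * u + (1.0 - h) * v
    a = params["alpha"]
    # With a = 0 the nonlinear path is switched off and the block returns x.
    return a * h + (1.0 - a) * x


# Toy usage: at initialization the block leaves its input unchanged.
params = init_block(jax.random.PRNGKey(0), 8)
x = jnp.ones((4, 8))
u = jnp.full((4, 8), 0.5)
v = jnp.full((4, 8), -0.5)
out = adaptive_residual_block(params, x, u, v)
```

Stacking such blocks gives a network whose effective depth is zero at initialization and grows as the optimizer moves each `alpha` away from zero, which is one way to read "initialized as shallow networks that progressively deepen during training".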
