Pseudo-differential-enhanced physics-informed neural networks

Andrew Gracyk

TL;DR

Pseudo-differential enhanced physics-informed neural networks (PINNs), an extension of gradient enhancement to Fourier space, are shown to improve the spectral eigenvalue decay of the neural tangent kernel (NTK), and so contribute to the learning of high frequencies in early training.

Abstract

We present pseudo-differential enhanced physics-informed neural networks (PINNs), an extension of gradient enhancement to Fourier space. In gradient enhancement of PINNs, the PDE residual is differentiated to a higher order than the PDE prescribes and added to the objective as an augmented term in order to improve training and overall learning fidelity. We propose the same procedure after applying the Fourier transform, since, under suitable decay, differentiation in Fourier space is multiplication by the Fourier wavenumber. Our methods are fast and efficient. They often achieve a lower error between the PINN and the numerical solution in fewer training iterations, potentially pair well with few collocation samples, and can on occasion break plateaus in low-collocation settings. Moreover, our methods are suitable for fractional derivatives. We establish that our methods improve the spectral eigenvalue decay of the neural tangent kernel (NTK), and so they contribute to the learning of high frequencies in early training, mitigating the effects of frequency bias up to the polynomial order and possibly beyond with smooth activations. Our methods accommodate advanced PINN techniques, such as Fourier feature embeddings. A pitfall of discrete Fourier transforms computed with the Fast Fourier Transform (FFT) is mesh subjugation (the FFT requires a uniform grid), and so we demonstrate that our methods are compatible with greater mesh flexibility and invariance on alternative Euclidean and non-Euclidean domains, via Monte Carlo methods and otherwise.
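To make the central identity concrete, here is a minimal NumPy sketch of differentiation as multiplication by the symbol $P(\xi) = 2\pi i \xi$, together with a residual penalty weighted by $|P(\xi)|^{2s}$. It is an illustration under stated assumptions, not the paper's implementation: the function names, the periodic uniform grid on $[0, 1)$, the normalization of the transform, and the fractional exponent `s` are all choices made here for the example.

```python
import numpy as np

def symbol(n, length=1.0):
    """P(xi) = 2*pi*i*xi on the FFT frequency grid (assumes a periodic, uniform grid)."""
    xi = np.fft.fftfreq(n, d=length / n)  # frequencies in cycles per unit length
    return 2j * np.pi * xi

def spectral_derivative(u, length=1.0):
    """Differentiate u by multiplying its DFT by the symbol and transforming back."""
    return np.fft.ifft(symbol(u.size, length) * np.fft.fft(u)).real

def fourier_enhanced_loss(residual, length=1.0, s=1.0):
    """Sum over modes of |P(xi)|^(2s) * |r_hat(xi)|^2 (hypothetical loss form).

    s = 1 penalizes the first derivative of the residual; a non-integer s
    gives a fractional-order penalty, since only |P(xi)| enters the loss.
    """
    r_hat = np.fft.fft(residual) / residual.size  # normalized DFT coefficients
    weight = np.abs(symbol(residual.size, length)) ** (2.0 * s)
    return float(np.sum(weight * np.abs(r_hat) ** 2))

# Usage: a single sine mode, where both quantities are known in closed form.
x = np.arange(64) / 64
u = np.sin(2 * np.pi * x)
du = spectral_derivative(u)        # matches 2*pi*cos(2*pi*x) up to round-off
loss = fourier_enhanced_loss(u)    # by Parseval, the mean of |u'|^2, i.e. 2*pi**2
```

In an actual PINN the residual would come from the network through automatic differentiation, and the transform would need to be differentiable (for example via JAX or PyTorch FFT routines); NumPy is used here only to keep the identity easy to check.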

Paper Structure

This paper contains 27 sections, 126 equations, 24 figures, 1 table, 3 algorithms.

Figures (24)

  • Figure 1: We plot the pointwise error, on a log scale, of (left) our Fourier enhanced PINN solution versus (right) a vanilla PINN on the Allen-Cahn equation. Errors are evaluated on the discretization over three retraining instances, at the two time indices corresponding to $t=0.25, 0.75$. Lower is better.
  • Figure 2: We demonstrate on our Navier-Stokes experiment, with an aggressively tuned Fourier enhanced loss coefficient, that high frequencies are learned faster. We train a Navier-Stokes PINN with and without Fourier enhancement with coefficient $0.5 \times \lambda_{\text{physics}}$ on a vanilla MLP with Fourier feature embedding and the SOAP optimizer for 500 epochs. High error power at high frequencies corresponds to an inability to learn those frequencies. Thus, our methods mitigate spectral bias.
  • Figure 3: We present PDE solution results of our method with (a) the enhanced Fourier loss on the Allen-Cahn equation and (b) a vanilla PINN. The only advanced technique (aside from our Fourier loss) here is the Fourier feature embedding with $\sigma=1.0$ in the architecture. We train for $\sim 150{,}000$ descent iterations with the Adam optimizer and use a batch size of 35 for the physics and boundary loss. Our Fourier symbol here is $P(\xi) = 2\pi i \xi$. We emphasize that the symbol $P(\xi)$ does not have the form $P(\xi) = 1 + \dots$, since this corresponds to exact physics loss in Fourier space.
  • Figure 4: We plot reserved memory usage of a Fourier gradient enhanced PINN versus a traditional gradient enhanced PINN on Burgers' equation with a triangular domain at a single iteration (usage is approximately constant across epochs).
  • Figure 5: We plot the discretized relative $L^2$ error on three instances of Burgers' equation (on a square domain, not a triangular domain) using a Monte Carlo Fourier loss versus a grid FFT Fourier loss over $10{,}000$ training iterations with the Adam optimizer. We choose training coefficient $0.025 \times \mathcal{L}_{\text{enhanced}}$ and 200 uniformly sampled points for the physics loss. Here, we choose a vanilla MLP with $\tanh(\cdot)$ activation and a learning rate of $\gamma = 10^{-3}$. We truncate the modes to 12 in each.
  • ...and 19 more figures
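
The Figure 5 caption contrasts a Monte Carlo Fourier loss with a grid FFT loss. One plausible way to read the Monte Carlo variant is as a sample-mean estimate of a truncated set of Fourier coefficients from scattered points, which removes the need for a uniform mesh. The sketch below assumes that reading; the function names, the 1D domain $[0, 1)$, and the estimator itself are assumptions for illustration and are not taken from the paper. The 200 samples and 12 retained modes mirror the caption.

```python
import numpy as np

def mc_fourier_coefficients(values, points, max_mode=12, length=1.0):
    """Estimate r_hat(k) ~ mean_j values[j] * exp(-2*pi*i*k*points[j]/length)
    for integer modes |k| <= max_mode (hypothetical estimator, no mesh needed)."""
    k = np.arange(-max_mode, max_mode + 1)
    phases = np.exp(-2j * np.pi * np.outer(k, points) / length)  # (modes, samples)
    return k / length, phases @ values / points.size

def mc_fourier_enhanced_loss(values, points, max_mode=12, length=1.0):
    """Truncated sum of |2*pi*xi|^2 * |r_hat(xi)|^2 using the estimate above."""
    xi, r_hat = mc_fourier_coefficients(values, points, max_mode, length)
    return float(np.sum(np.abs(2 * np.pi * xi) ** 2 * np.abs(r_hat) ** 2))

# Usage: 200 uniformly sampled points, modes truncated to 12.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=200)
values = np.sin(2 * np.pi * points)  # stand-in for a PDE residual at the samples
print(mc_fourier_enhanced_loss(values, points))
```

Each estimated coefficient carries sampling noise of order $1/\sqrt{N}$, and the symbol amplifies that noise at high modes, so with few samples the printed value can differ noticeably from the exact value $2\pi^2$ for this test function. On equispaced points the same estimator reduces to the normalized discrete Fourier transform for the retained modes.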