Parametric Encoding with Attention and Convolution Mitigate Spectral Bias of Neural Partial Differential Equation Solvers

Mehdi Shishehbor, Shirin Hosseinmardi, Ramin Bostanabad

TL;DR

PGCAN (Parametric Grid Convolutional Attention Network) addresses spectral bias in neural PDE solvers by combining a trainable grid-based encoder, local convolution that propagates boundary information, and a transformer-style decoder with attention. It achieves a lower relative $L_2$ error ($L_2^{re}$) than baseline methods on the Burgers’, convection, Helmholtz, and lid-driven cavity problems, with gains that grow with problem complexity. The paper also introduces a directional metric based on the power spectral density (PSD) to quantify spectral bias and reports flatter PSDs for PGCAN errors than for the baselines, indicating improved frequency learning. A stated limitation is the uniform grid partitioning; proposed future work includes adaptive domain decomposition and extension to irregular and higher-dimensional domains.
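
For readers unfamiliar with the metric, the following is a minimal sketch of a relative $L_2$ error over sampled points. It assumes the common definition $\|u_{pred} - u_{ref}\|_2 / \|u_{ref}\|_2$; the paper's exact definition of $L_2^{re}$ is not given in this summary, and the function name `relative_l2_error` is hypothetical.

```python
import math


def relative_l2_error(u_pred, u_ref):
    """Relative L2 error ||u_pred - u_ref||_2 / ||u_ref||_2 over sampled points.

    Assumption: this is the common definition, not necessarily the paper's.
    """
    if len(u_pred) != len(u_ref):
        raise ValueError("prediction and reference must have the same length")
    # Euclidean norm of the pointwise error.
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(u_pred, u_ref)))
    # Euclidean norm of the reference solution.
    den = math.sqrt(sum(r ** 2 for r in u_ref))
    if den == 0.0:
        raise ValueError("reference solution has zero norm")
    return num / den
```

Normalizing by the reference norm makes errors comparable across problems whose solutions differ in magnitude.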

Abstract

Deep neural networks (DNNs) are increasingly used to solve partial differential equations (PDEs) that naturally arise in modeling a wide range of systems and physical phenomena. However, the accuracy of such DNNs decreases as the PDE complexity increases, and they suffer from spectral bias because they tend to learn the low-frequency characteristics of the solution. To address these issues, we introduce Parametric Grid Convolutional Attention Networks (PGCANs) that can solve PDE systems without leveraging any labeled data in the domain. The main idea of PGCAN is to parameterize the input space with a grid-based encoder whose parameters are connected to the output via a DNN decoder that leverages attention to prioritize feature training. Our encoder provides a localized learning ability and uses convolution layers to avoid overfitting and to improve the rate at which information propagates from the boundaries to the interior of the domain. We test the performance of PGCAN on a wide range of PDE systems and show that it effectively addresses spectral bias and provides more accurate solutions than competing methods.
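
To make the grid-based encoder idea more concrete, here is a minimal single-level sketch of how learnable vertex features might be interpolated at a query point. It is an illustration under stated assumptions, not the authors' implementation: the features are plain scalars on a uniform grid over $[0,1]^2$, the weight $(1-\cos(\pi t))/2$ is one common form of cosine interpolation, and the names `cosine_weight` and `interpolate_feature` are hypothetical. The convolution, grid shifting, and attention-based decoder are omitted.

```python
import math


def cosine_weight(t):
    """Smooth weight in [0, 1] with zero slope at t = 0 and t = 1."""
    return 0.5 * (1.0 - math.cos(math.pi * t))


def interpolate_feature(grid, x, y):
    """Interpolate scalar vertex features at a query point (x, y) in [0, 1]^2.

    grid[i][j] is the (hypothetical) feature stored at vertex
    (i / (nx - 1), j / (ny - 1)) of a uniform grid.
    """
    nx, ny = len(grid), len(grid[0])
    # Scale to cell units and locate the cell that contains the point.
    sx, sy = x * (nx - 1), y * (ny - 1)
    i = min(int(sx), nx - 2)
    j = min(int(sy), ny - 2)
    # Local coordinates inside the cell, mapped through the cosine weight.
    wx = cosine_weight(sx - i)
    wy = cosine_weight(sy - j)
    # Weighted combination of the four surrounding vertex features.
    return ((1 - wx) * (1 - wy) * grid[i][j]
            + wx * (1 - wy) * grid[i + 1][j]
            + (1 - wx) * wy * grid[i][j + 1]
            + wx * wy * grid[i + 1][j + 1])
```

Because each query point only touches the features of its own cell, a gradient step updates a small, local set of parameters, which is one way to read the "localized learning ability" described above.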

Paper Structure

This paper contains 17 sections, 15 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Vanilla physics-informed neural networks (vPINNs) for solving $1D$ Burgers' equation: The model parameters, collectively denoted by $\boldsymbol{\theta}$, are optimized by minimizing the three-component loss function that encourages the network to satisfy the PDE inside the domain while reproducing the IC/BCs. These loss components are obtained by querying the network on a set of test points that are distributed inside the domain and on its boundaries.
  • Figure 2: Parametric grid encoding with $N_r=2$ in $2D$: The grid encoding has $N_r=2$ levels which have $3\times3$ and $9\times9$ cells at the coarse and fine resolutions. Each vertex at any resolution is endowed with some learnable features which are used to obtain the features of the query point $\boldsymbol{\zeta}$ via interpolation. $\bar{x}$ and $\bar{y}$ denote the local coordinates of $\boldsymbol{\zeta}$ in the cell that contains it.
  • Figure 3: Parametric Grid Convolutional Attention Networks (PGCANs) for solving the Navier-Stokes equations: Encoder-decoder setup (a): The input space is mapped to a structured high-dimensional space parameterized via features $\boldsymbol{F}_0^l \in \mathbb{R}^{n_{rep}\times N_f \times N_v^{l,x} \times N_v^{l,y}}$. Upon convolution and interpolation for a query point, these features are passed to the ensuing NN that implements the projections formulated in \ref{eq: m4-hk}. Feature convolution block (b): Trainable features are arranged in grids covering the domain. These features are convolved with a $3\times3$ kernel, followed by a $\tanh$ activation function. Feature interpolation block (c): The query point is placed in the convolved feature maps. Each of these grid-like maps is diagonally shifted to prevent overfitting. Cosine interpolation is then performed based on the local coordinates of the point in the corresponding unit cells.
  • Figure 4: Directional PSD for analytic signals: Unlike Gaussian and uniform noise, the directional PSD curves of signals with spatially varying frequencies are not flat.
  • Figure 5: Reference solutions and error maps: The first column shows the reference solutions, while the remaining columns show the absolute errors associated with each model. Because the error maps are similar across repetitions, we show only one of the $10$ repetitions.
  • ...and 5 more figures
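
As a rough illustration of what a directional PSD can look like, the sketch below averages one-dimensional periodograms along the rows or columns of a 2D field. This is only one plausible reading of "directional": the paper's actual metric is not defined in this summary, and the names `periodogram` and `directional_psd`, the normalization, and the choice of axis-aligned directions are all assumptions. A naive $O(N^2)$ DFT is used to keep the example dependency-free.

```python
import cmath
import math


def periodogram(signal):
    """One-sided power spectrum |DFT|^2 / N computed with a naive DFT."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * m / n)
                    for m, s in enumerate(signal))
        power.append(abs(coeff) ** 2 / n)
    return power


def directional_psd(field, direction="x"):
    """Average the periodograms of all rows ("x") or columns ("y") of a 2D field.

    Assumption: axis-aligned averaging stands in for the paper's metric.
    """
    if direction == "x":
        lines = field
    elif direction == "y":
        lines = [list(col) for col in zip(*field)]
    else:
        raise ValueError("direction must be 'x' or 'y'")
    spectra = [periodogram(line) for line in lines]
    # Average power per frequency bin across all lines.
    return [sum(vals) / len(spectra) for vals in zip(*spectra)]
```

For example, a field whose rows are `sin(2*pi*3*m/16)` has its `"x"` spectrum concentrated in bin 3, whereas an error map resembling white noise would spread power roughly evenly across bins, which is the sense in which a flatter curve suggests less spectral bias.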