From Frequency Bias to Spectral Balance: Operator-Aware Preconditioners for PINNs

Roy Y. He, Ying Liang, Hongkai Zhao, Yimin Zhong

TL;DR

A simple operator-aware preconditioning strategy is proposed: applying an auxiliary integral operator to the residual rebalances the optimization landscape and the learning dynamics, and substantially improves both convergence and accuracy.

Abstract

When neural networks (NNs) are used as a nonlinear parametric representation to solve partial differential equations (PDEs), they often display frequency-dependent learning dynamics that can differ from those seen in direct function approximation tasks. These dynamics result from a balance between the frequency bias of the NN representation and that of the underlying differential operator. Although many commonly used NNs exhibit a bias towards low-frequency modes in representation, the differential operators in the loss function amplify high-frequency components and can lead to a high-frequency bias. In this work, using second-order elliptic PDEs as an example, we show how these two factors compete and lead to an overall frequency bias in different situations. Once the balance is determined, it is important to design computational strategies that counter the resulting bias and improve training efficiency. We propose a simple operator-aware preconditioning strategy that rebalances the optimization landscape and the learning dynamics by applying an auxiliary integral operator to the residual. The integral kernel can be the Green's function of a reference elliptic operator or an approximation of it, and the strategy integrates easily with common NN solvers for PDEs. Extensive experiments, including multiscale and variable-coefficient problems, show that the approach restores more balanced learning dynamics across modes and substantially improves both convergence and accuracy.
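
The rebalancing effect described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes the 1D model problem $-u'' = f$ on $(0,1)$ with homogeneous Dirichlet conditions, a uniform grid, and trapezoidal quadrature, all chosen here for illustration. The helper names (`greens_matrix`, `trapezoid_weights`, `l2_norm`) are hypothetical. For an error mode $e_k(x) = \sin(k\pi x)$, the plain residual $-e_k''$ is larger than the error by a factor $(k\pi)^2$, whereas applying the Green's function of $-\mathrm{d}^2/\mathrm{d}x^2$ to that residual returns a quantity of the same size as the error for every $k$.

```python
import numpy as np

def greens_matrix(x):
    """Green's function of -d^2/dx^2 on (0,1), zero Dirichlet data, sampled on grid x.

    G(x, y) = x (1 - y) for x <= y and y (1 - x) otherwise.
    """
    X, Y = np.meshgrid(x, x, indexing="ij")
    return np.where(X <= Y, X * (1.0 - Y), Y * (1.0 - X))

def trapezoid_weights(x):
    """Trapezoidal quadrature weights on a uniform grid (illustrative choice)."""
    w = np.full_like(x, x[1] - x[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

def l2_norm(v, w):
    """Discrete L^2 norm with quadrature weights w."""
    return np.sqrt(np.sum(w * v**2))

n = 2001
x = np.linspace(0.0, 1.0, n)
w = trapezoid_weights(x)
G = greens_matrix(x)

ratios = {}
for k in (1, 4, 16):
    e = np.sin(k * np.pi * x)      # error mode in a trial solution
    r = (k * np.pi) ** 2 * e       # its PDE residual -e'' (computed analytically)
    Gr = G @ (w * r)               # quadrature of the integral of G(x, y) r(y) dy
    plain = l2_norm(r, w) / l2_norm(e, w)
    preconditioned = l2_norm(Gr, w) / l2_norm(e, w)
    ratios[k] = (plain, preconditioned)
    print(f"k={k:2d}  residual/error = {plain:10.2f}  "
          f"preconditioned/error = {preconditioned:.5f}")
```

In this sketch the plain residual weights mode $k$ by $(k\pi)^2$, so a residual-based loss is dominated by high frequencies, while the preconditioned residual weights all modes almost equally. In an actual PINN the residual would come from automatic differentiation of the network, and the kernel could be an approximate Green's function of a reference operator, as the abstract states.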

Paper Structure

This paper contains 10 sections, 2 theorems, 45 equations, 15 figures, 1 table.

Key Result

Lemma 4.1

Let $\Phi_d(r) = \log(r)$ for $d=2$ and $\Phi_d(r) = \sqrt{\frac{2}{\pi}}\frac{1}{r}$ for $d=3$. Define $\mathcal{A}_d(\rho, \ell)$, $\ell\in[0, 1]$, as follows: [...] where $J_{\nu}$ is the Bessel function of the first kind of order $\nu$. Then there is an absolute constant $C_{\mathcal{A}} > 0$ such that [...] for all $\rho \ge 0$.

Figures (15)

  • Figure 1: Example 1: Relative $L^2$ error and frequency bias in function approximation using different activation functions, with and without scaling. (a) Relative $L^2$ error during training; (b)-(f) Errors in different modes during training corresponding to activation functions Tanh, Sine, ReLU, scaled Tanh, and scaled Sine, respectively.
  • Figure 2: Example 2: Errors in different modes for a 2-layer (top row), 4-layer (middle row), and 6-layer (bottom row) neural network using $\text{ReLU}$ (left column), Tanh (middle column), and Sine (right column) activation functions.
  • Figure 3: Example 3: Comparison of training performance and selected frequency responses. (a) Neural network outputs at epoch 2,000 for different settings; (b) Approximation using $\text{ReLU}^3$ activation; (c) Approximation using Sine activation; (d) Relative $L^2$ error history; (e) PINN with $\text{ReLU}^3$ activation; (f) PINN with Sine activation.
  • Figure 4: Example 4: Comparison of $L^2$ error and errors in selected frequency modes in training for different models. (a) Training loss for all models; (b) Function approximation; (c) PINN; (d) Preconditioned PINN.
  • Figure 5: Example 5: Comparison of $L^2$ error and errors in selected frequency modes in training for different models. (a) Training loss for all models; (b) Function approximation; (c) PINN; (d) Preconditioned PINN.
  • ...and 10 more figures

Theorems & Definitions (4)

  • Lemma 4.1
  • proof
  • Theorem 4.2
  • proof