Loss Jump During Loss Switch in Solving PDEs with Neural Networks

Zhiwei Wang, Lulu Zhang, Zhongwang Zhang, Zhi-Qin John Xu

TL;DR

This paper investigates how loss function design affects the training dynamics of neural PDE solvers. A stable loss jump is observed when training switches from the data loss to a model loss that contains higher-order derivative information; the paper explains it through frequency-domain analysis and a multi-stage descent. Across the Burgers, heat, diffusion, and wave equations, the study shows that the model loss changes the network's spectral priorities, often weakening the constraint on low frequencies and occasionally favoring higher frequencies. The findings motivate frequency-aware training strategies to improve the robustness and convergence of neural PDE solvers.

Abstract

Using neural networks to solve partial differential equations (PDEs) is gaining popularity as an alternative approach in the scientific computing community. Neural networks can integrate different types of information into the loss function, such as observation data, governing equations, and variational forms. These loss functions can be broadly categorized into two types: the observation data loss directly constrains and measures the model output, while the other loss functions constrain the network's performance only indirectly and can be classified as model loss. However, this alternative approach lacks a thorough understanding of its underlying mechanisms, including theoretical foundations and a rigorous characterization of various phenomena. This work investigates how different loss functions affect the training of neural networks for solving PDEs. We discover a stable loss-jump phenomenon: when the loss function is switched from the data loss to the model loss, which includes derivative information of different orders, the neural network solution immediately deviates significantly from the exact solution. Further experiments reveal that this phenomenon arises from the different frequency preferences of neural networks under different loss functions. We theoretically analyze the frequency preference of neural networks under the model loss. This loss-jump phenomenon provides a valuable perspective for examining the underlying mechanisms of neural networks in solving PDEs.
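The distinction between data loss and model loss can be illustrated with a minimal sketch. The toy problem $u''(x) = f(x)$ with $u(x) = \sin x$, the grid, the perturbation amplitude, and the helper names `data_loss` and `model_loss` are all assumptions made for illustration and are not taken from the paper; finite differences stand in for the automatic differentiation a neural solver would use. The sketch perturbs the exact solution with a single sine mode of frequency $k$ and compares how strongly each loss penalizes that error.

```python
import numpy as np

def data_loss(u_pred, u_true):
    """Mean squared error on the solution values themselves."""
    return float(np.mean((u_pred - u_true) ** 2))

def model_loss(u_pred, f, dx):
    """Mean squared residual of u'' = f, using a central difference
    on interior grid points (a stand-in for automatic differentiation)."""
    u_xx = (u_pred[2:] - 2.0 * u_pred[1:-1] + u_pred[:-2]) / dx**2
    return float(np.mean((u_xx - f[1:-1]) ** 2))

# Hypothetical toy problem: u'' = f with exact solution u = sin(x).
x = np.linspace(0.0, 2.0 * np.pi, 2001)
dx = x[1] - x[0]
u_true = np.sin(x)
f = -np.sin(x)

amp = 1e-2  # amplitude of the artificial error mode
ratios = {}
for k in (1, 4, 16):
    u_pred = u_true + amp * np.sin(k * x)  # error concentrated at frequency k
    ratios[k] = model_loss(u_pred, f, dx) / data_loss(u_pred, u_true)

for k, r in ratios.items():
    print(f"k={k:2d}  model_loss / data_loss = {r:.1f}  (k^4 = {k**4})")
```

For this second-derivative residual the ratio grows like $k^4$, so an error of fixed size is penalized far less at low frequency than at high frequency. Under these assumptions, that is one simple way to see how a derivative-based model loss can weaken the constraint on low-frequency components relative to the data loss.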

Paper Structure

This paper contains 12 sections, 3 theorems, 55 equations, and 7 figures.

Key Result

Theorem 1

The dynamics have the following expression in the frequency domain for all $\phi\in\bm{S}(\mathbb{R}^d)$: Where $v_\rho(x) = v(x)\rho(x) = v(x)\rho_1(x)+v(x)\rho_2(x)$ with empirical density $\rho_j(x)=\sum_{i\in S_j} \delta(x-x_i)$ and

Figures (7)

  • Figure 1: Training process under different learning rates with tanh (left) and ReLU (right) activation functions. The gray line indicates pre-training with the data loss function. The asterisk marks the error at the moment the loss is switched. The colored lines correspond to the different learning rates.
  • Figure 2: Burgers equation training process. The second and third rows show how the network prediction changes during training at time $t=0$ and time $t=1$, respectively, after the loss is switched.
  • Figure 3: Heat equation training process. Upper left: data loss and model loss as training progresses. Upper right: error heatmaps at 50,000 and 100,000 epochs. Bottom: frequency error at different training epochs. The 4 sub-figures show results from different training stages.
  • Figure 4: Diffusion equation (left) and wave equation (right) training process. Top: data loss and model loss as the training progresses. Middle: Heatmap of the analytical solution and the DNN-predicted solution. Bottom: Absolute error between analytical solution and DNN prediction.
  • Figure 5: Frequency error of diffusion equation (top) and wave equation (bottom).
  • ...and 2 more figures
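
The frequency-error panels mentioned in the captions above compare prediction and reference solution per frequency. The exact definition used in the paper is not given here, so the following is only a sketch under stated assumptions: a uniform periodic 1D grid, the absolute difference of discrete Fourier coefficients normalized by the grid size, and a hypothetical helper name `frequency_error`. The signals are synthetic and chosen only to make the behavior easy to check.

```python
import numpy as np

def frequency_error(u_pred, u_true):
    """Absolute difference of the real-FFT coefficients of prediction and
    reference, divided by the number of grid points (assumed definition)."""
    n = u_true.size
    return np.abs(np.fft.rfft(u_pred) - np.fft.rfft(u_true)) / n

# Uniform periodic grid on [0, 2*pi), endpoint excluded.
n = 256
x = np.arange(n) * (2.0 * np.pi / n)

# Synthetic reference with a low (k=1) and a higher (k=5) frequency mode.
u_true = np.sin(x) + 0.5 * np.sin(5 * x)
# Hypothetical prediction that fits k=1 exactly but underestimates k=5.
u_pred = np.sin(x) + 0.3 * np.sin(5 * x)

err = frequency_error(u_pred, u_true)
k_worst = int(np.argmax(err))
print("frequency with largest error:", k_worst)
print("error at k=1:", err[1], " error at k=5:", err[5])
```

Tracking such a per-frequency error at several training epochs, before and after the loss switch, gives one way to see which frequencies a given loss fits first.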

Theorems & Definitions (4)

  • Theorem 1: Dynamics for NN with model loss
  • Lemma 1: Dynamics for $v''$
  • proof
  • Lemma 2: Dynamics for $v$