Multi-scale Deep Neural Networks for Solving High Dimensional PDEs

Wei Cai, Zhi-Qin John Xu

TL;DR

The paper introduces MscaleDNN, a multi-scale neural network that uses radial scaling in Fourier space and compactly supported activations to efficiently learn high-frequency, high-dimensional functions and solve PDEs. It provides two PDE solution frameworks, Ritz variational energy and least-squares residual losses, and demonstrates substantial speedups and accuracy gains across 3D to 25D problems and high-frequency function fitting. The results suggest that multi-scale, wavelet-inspired architectures can mitigate the frequency-principle limitations of standard DNNs, enabling practical solutions to complex, high-dimensional PDEs. The work also outlines future directions toward wavelet-DNN hybrids to further enhance resolution and efficiency.

Abstract

In this paper, we propose radial scaling in the frequency domain and activation functions with compact support to produce a multi-scale DNN (MscaleDNN). The resulting network has the multi-scale capability needed to approximate high-frequency, high-dimensional functions and to speed up the solution of high-dimensional PDEs. Numerical results on high-dimensional function fitting and on solutions of high-dimensional PDEs, using loss functions based on either the Ritz energy or the least-squares PDE residual, validate the increased multi-scale resolution and high-frequency capture of the proposed MscaleDNN.
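The two ingredients named in the abstract, scaled inputs and a compactly supported activation, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The form sReLU(x) = ReLU(x) * ReLU(1 - x), the scale factors 1, 2, 4, 8, the 3-50-50-1 sub-network size, and the initialisation are all assumptions chosen for illustration, and the helper names (`srelu`, `init_subnet`, `mscale_forward`) are hypothetical.

```python
import numpy as np

def srelu(x):
    # Assumed compactly supported activation: ReLU(x) * ReLU(1 - x).
    # It is nonzero only for 0 < x < 1.
    return np.maximum(x, 0.0) * np.maximum(1.0 - x, 0.0)

def init_subnet(rng, sizes):
    # Random weights and zero biases for one fully connected sub-network.
    # The scaling of the normal distribution is an illustrative choice.
    return [
        (rng.normal(0.0, np.sqrt(2.0 / (m + n)), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def subnet_forward(params, x):
    # Hidden layers use srelu; the output layer is linear.
    h = x
    for W, b in params[:-1]:
        h = srelu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

def mscale_forward(all_params, scales, x):
    # Each sub-network sees the input multiplied by its own scale factor,
    # so larger factors target higher-frequency content. Outputs are summed.
    return sum(subnet_forward(p, a * x) for p, a in zip(all_params, scales))

rng = np.random.default_rng(0)
scales = [1.0, 2.0, 4.0, 8.0]            # assumed scale factors
all_params = [init_subnet(rng, [3, 50, 50, 1]) for _ in scales]

x = rng.uniform(-1.0, 1.0, size=(5, 3))  # five sample points in 3D
y = mscale_forward(all_params, scales, x)
print(y.shape)  # (5, 1)
```

Training such a network against a Ritz energy or a least-squares residual loss would additionally need derivatives of the output with respect to `x`, which an automatic-differentiation framework would normally supply; the sketch covers only the forward pass.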

Paper Structure

This paper contains 16 sections, 40 equations, 14 figures.

Figures (14)

  • Figure 1:
  • Figure 2: Illustration of a MscaleDNN.
  • Figure 3: Loss function vs. training epoch. We use a 3-2500-1 network with activation function $\mathrm{ReLU}(x)$ or $\mathrm{sReLU}(x)$, as indicated by the legend. The learning rate is $5\times10^{-5}$ with a decay rate of $2\times10^{-7}$ for each full-batch training step. The training and test datasets each contain $10000$ random samples. Weights are initialized by ${\cal D}_{1}$.
  • Figure 4: Loss function vs. training epoch. We use a 3-500-500-500-500-1 network with activation function $\mathrm{ReLU}(x)$ or $\mathrm{sReLU}(x)$, as indicated by the legend. The learning rate is $3\times10^{-6}$ with a decay rate of $5\times10^{-7}$ for each training step with batch size 1000. The training and test datasets each contain $5000$ random samples. Weights are initialized by ${\cal D}_{2}$.
  • Figure 5: Loss function vs. training epoch. We use a 60-200-200-200-1 network with activation function $\mathrm{sReLU}(x)$ or $\mathrm{ReLU}(x)$. The learning rate is $5\times10^{-5}$ with a decay rate of $2\times10^{-7}$ for each training step with batch size 100. The training and test datasets each contain $10000$ random samples. Weights are initialized by ${\cal D}_{1}$.
  • ...and 9 more figures