Understanding the Expressivity and Trainability of Fourier Neural Operator: A Mean-Field Perspective

Takeshi Koshizuka, Masahiro Fujisawa, Yusuke Tanaka, Issei Sato

TL;DR

A mean-field theory for the Fourier Neural Operator is established, and a connection between expressivity and trainability is identified: the ordered and chaotic phases correspond to regions of vanishing and exploding gradients, respectively.

Abstract

In this paper, we explore the expressivity and trainability of the Fourier Neural Operator (FNO). We establish a mean-field theory for the FNO, analyzing the behavior of the random FNO from an edge-of-chaos perspective. Our investigation into the expressivity of a random FNO involves examining the ordered-chaos phase transition of the network based on the weight distribution. This phase transition demonstrates characteristics unique to the FNO, induced by mode truncation, while also showcasing similarities to those of densely connected networks. Furthermore, we identify a connection between expressivity and trainability: the ordered and chaotic phases correspond to regions of vanishing and exploding gradients, respectively. This finding provides a practical prerequisite for the stable training of the FNO. Our experimental results corroborate our theoretical findings.
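The link between the initialization variance and vanishing or exploding signals can be illustrated on the simpler densely connected case that the abstract compares against. The sketch below is not the FNO analysis of the paper: it assumes a bias-free ReLU dense network with weights drawn from $\mathcal{N}(0, \sigma^2 / N)$, for which the standard mean-field recursion multiplies the mean squared pre-activation (and, under the same approximation, the squared gradient norm) by $\chi = \sigma^2 / 2$ per layer. The function names `predicted_scale` and `simulated_scale`, the width, and the depth are hypothetical choices made for illustration only.

```python
import math
import random


def predicted_scale(sigma2: float, depth: int) -> float:
    """Mean-field prediction for a bias-free ReLU dense network (assumed toy model).

    With weights W_ij ~ N(0, sigma2 / width), the mean squared pre-activation
    is multiplied by chi = sigma2 / 2 at every layer.
    """
    chi = sigma2 / 2.0
    return chi ** depth


def simulated_scale(sigma2: float, depth: int, width: int = 128, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio q^(depth) / q^(0) for one random network."""
    rng = random.Random(seed)
    # Random Gaussian input, treated as the layer-0 pre-activations.
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    q0 = sum(v * v for v in h) / width
    std = math.sqrt(sigma2 / width)
    for _ in range(depth):
        x = [max(v, 0.0) for v in h]  # ReLU
        # Each output unit uses a fresh row of Gaussian weights.
        h = [sum(rng.gauss(0.0, std) * xj for xj in x) for _ in range(width)]
    return (sum(v * v for v in h) / width) / q0


if __name__ == "__main__":
    for sigma2 in (0.5, 2.0, 4.0):
        print(sigma2, predicted_scale(sigma2, 6), simulated_scale(sigma2, 6))
```

In this toy model, $\sigma^2 < 2$ shrinks the signal with depth and $\sigma^2 > 2$ amplifies it, with $\sigma^2 = 2$ as the boundary. The finite-width simulation fluctuates around the prediction, so only the qualitative trend should be read from it.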

Paper Structure (27 sections, 12 theorems, 85 equations, 14 figures, 4 tables)

This paper contains 27 sections, 12 theorems, 85 equations, 14 figures, 4 tables.

Key Result

Lemma 3.0

For all $d \in [D]$, the covariance $\bm{\Sigma}^{(\ell)} \coloneqq \mathbb{E}_{\Theta^{1:\ell},\Xi^{1:\ell}} \left[ \mathbf{H}_{:, d}^{(\ell)} {\mathbf{H}_{:, d}^{(\ell)}}^{\top} \right]$ is obtained recursively by an iterated map $\mathcal{C}$ whose expectation is taken over the pre-activations $\mathbf{H}_{:, d} \sim \mathcal{N}(0, \bm{\Sigma}^{(\ell-1)})$, ...

Figures (14)

  • Figure 1: Average gradient norm $\operatorname{Tr}(\tilde{\bm{\Sigma}}^{(\ell)})/D$ during the backpropagation of several FNOs, plotted as a function of layer $\ell$. Each line corresponds to a different initial value of $\sigma^2$, from $0.5$ to $4.0$ in increments of $0.5$. The x-axis is the layer and the y-axis is the gradient norm on a log scale. Depending on the value of $\sigma^2$, the gradient norm increases or decreases consistently as the gradient propagates to shallower layers.
  • Figure 2: Training loss of FNOs at the last epoch for four distinct PDEs. (a, b): the advection equation, (c, d): the Burgers' equation, (e): Darcy Flow, (f-h): the Navier-Stokes (NS) equation. The heatmaps represent the training loss values for varying depth $L \in \{ 4, 8, 16, 32 \}$ and initial weight parameter $\sigma^2 \in \{ 0.1, 0.5, 1.0, 2.0, 3.0, 4.0 \}$, with lighter colors signifying lower training loss. The presented results are the mean training loss at the last epoch over three different seeds.
  • Figure 3: Illustration of the ordered-chaos phase transition with respect to the weight initialization parameter $\sigma^2$. In the ordered phase, the spatial hidden representations $\mathbf{H}^{(\ell)}$ on the grid converge to a uniform state during forward propagation, and the gradient vanishes during backpropagation. In the chaotic phase, the representations either converge to a distinct state or diverge, and the gradient explodes.
  • Figure 4: Ordered-chaos phase transition diagram for the DCN
  • Figure 5: Training loss curves. (a): training loss curves of the 32-layer original FNOs with varying initial parameters $\sigma^2 \in \{ 0.1, 0.5, 1.0, 2.0, 3.0, 4.0 \}$ on the NS equation with $\nu = 10^{-3}$. (b): training loss curves of the original FNOs with initial parameter $\sigma^2=2.0$ and a varying number of layers $L \in \{4, 8, 16, 32\}$ on the NS equation with $\nu = 10^{-5}$. (c): training loss curves of the simplified FNOs with ReLU activation, initial parameter $\sigma^2=2.0$, and a varying number of layers $L \in \{4, 8, 16, 32\}$ on the Burgers' equation.
  • ...and 9 more figures

Theorems & Definitions (21)

  • Lemma 3.0: Iterated map
  • Lemma 3.0: Existence of fixed points
  • Definition 3.1
  • Theorem 3.2: Exponential expressivity
  • Theorem 3.3: Trainability
  • Theorem A.1: Exponential expressivity
  • proof
  • Lemma A.0: Iterated map
  • proof
  • Lemma A.0: Existence of fixed points
  • ...and 11 more