Some Best Practices in Operator Learning

Dustin Enyeart, Guang Lin

TL;DR

This work addresses the practical problem of choosing robust hyperparameters for neural operator learning, rather than comparing architectures. It evaluates activation functions, dropout, stochastic weight averaging (SWA), and learning rate finders across three architectures (DeepONets, Fourier neural operators, Koopman autoencoders) on five differential-equation benchmarks. The key findings are that the GELU activation consistently outperforms the alternatives, that dropout should be avoided, that SWA improves accuracy when the SWA learning rate is not larger than the original learning rate, and that learning rate finders are not reliable for operator learning. Together, these results provide actionable guidelines to accelerate hyperparameter searches and improve generalization in neural-operator models, with publicly available code for replication.

Abstract

Hyperparameter searches are computationally expensive. This paper studies some general choices of hyperparameters and training methods specifically for operator learning. It considers the architectures DeepONets, Fourier neural operators and Koopman autoencoders for several differential equations to find robust trends. Some options considered are activation functions, dropout and stochastic weight averaging.
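
For readers unfamiliar with stochastic weight averaging, the following is a minimal sketch of its core step: keeping a running mean of the weights visited at several points late in training. It is not the authors' code. The function names `swa_update` and `average_checkpoints` and the flat-list representation of the weights are assumptions made for illustration.

```python
def swa_update(avg, new, n_averaged):
    """Fold one more weight vector into a running mean.

    avg        -- current average (list of floats)
    new        -- weights from the latest checkpoint (same length)
    n_averaged -- how many checkpoints `avg` already contains
    """
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg, new)]


def average_checkpoints(checkpoints):
    """Average a sequence of weight vectors collected during training."""
    avg = list(checkpoints[0])
    for n, weights in enumerate(checkpoints[1:], start=1):
        avg = swa_update(avg, weights, n)
    return avg


# Hypothetical weights saved at three points late in training.
checkpoints = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
swa_weights = average_checkpoints(checkpoints)
```

In practice the averaged weights replace the final weights for evaluation, and the learning rate used while collecting the checkpoints is a separate hyperparameter from the one used in the initial training phase.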

Paper Structure

This paper contains 16 sections, 11 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: The DeepONet architecture: The input $u$ is the input function, and the input $x$ is the point where the output function is evaluated. Their encodings are denoted by $E_u$ and $E_x$, respectively. The output is denoted by $y$.
  • Figure 2: A Fourier neural operator: The input is denoted by $x$. First, a feed-forward neural network is used to increase the channel dimension. Then, a sequence of spectral convolution layers is applied. Then, a feed-forward neural network is used to decrease the channel dimension. The output is denoted by $y$.
  • Figure 3: A spectral convolution layer: The input is denoted by $x$. On the bottom, a linear layer is applied to each channel. On the top, first the fast Fourier transform is applied. Then, the higher modes are dropped. Then, a linear layer is applied to each channel. Then, the inverse fast Fourier transform is applied. Then, the top and bottom are added elementwise, and an activation function is applied. The output is denoted by $y$.
  • Figure 4: Discretization of the Koopman formulation into a numerical scheme: The physical states at successive time points are denoted by $s_0$, $s_1$, $\dots$, $s_{n-1}$ and $s_n$. The encoded states at successive time points are denoted by $e_0$, $e_1$, $\dots$, $e_{n-1}$ and $e_n$. The function $f$ is the true time evolution of the physical state by the time step. The discretized Koopman operator, encoder and decoder are denoted by $K$, $E$ and $R$, respectively.
  • Figure 5: Activation functions: The upper left is the hyperbolic tangent. The upper right is the rectified linear unit. The lower left is the Gaussian error linear unit. The lower right is the exponential linear unit.
  • ...and 4 more figures
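
The steps in the Figure 3 caption can be sketched in NumPy for a one-dimensional signal. This is an illustrative sketch, not the paper's implementation. It assumes the common Fourier-neural-operator convention in which the "linear layer" of the bottom branch mixes channels at each grid point and the one in the top branch mixes channels separately for each retained Fourier mode. The names `spectral_conv_layer`, `w_local` and `w_spectral` and the array shapes are hypothetical.

```python
from math import erf

import numpy as np


def gelu(x):
    """Gaussian error linear unit: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + np.vectorize(erf)(x / np.sqrt(2.0)))


def spectral_conv_layer(x, w_local, w_spectral, activation=gelu):
    """One spectral convolution layer on a 1D grid.

    x          -- real array, shape (c_in, n)
    w_local    -- real array, shape (c_out, c_in), bottom-branch weights
    w_spectral -- complex array, shape (modes, c_out, c_in), top-branch weights;
                  modes must not exceed n // 2 + 1
    """
    n = x.shape[-1]
    modes = w_spectral.shape[0]
    c_out = w_local.shape[0]

    # Bottom branch: the same linear map applied at every grid point.
    bottom = w_local @ x

    # Top branch: FFT, drop the higher modes, linear map per mode, inverse FFT.
    x_hat = np.fft.rfft(x, axis=-1)
    out_hat = np.zeros((c_out, x_hat.shape[-1]), dtype=complex)
    out_hat[:, :modes] = np.einsum("koi,ik->ok", w_spectral, x_hat[:, :modes])
    top = np.fft.irfft(out_hat, n=n, axis=-1)

    # Add the branches elementwise, then apply the activation function.
    return activation(top + bottom)


# Usage with random inputs and weights (values are arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16))
w_local = rng.standard_normal((3, 2))
w_spectral = rng.standard_normal((4, 3, 2)) + 1j * rng.standard_normal((4, 3, 2))
y = spectral_conv_layer(x, w_local, w_spectral)  # shape (3, 16)
```

Keeping only the lowest `modes` frequencies is what makes the number of top-branch parameters independent of the grid size `n`.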