Some Best Practices in Operator Learning
Dustin Enyeart, Guang Lin
TL;DR
This work addresses the practical problem of choosing robust hyperparameters for neural operator learning, rather than comparing architectures. It evaluates activation functions, dropout, stochastic weight averaging (SWA) and learning rate finders across three architectures (DeepONets, Fourier neural operators and Koopman autoencoders) on five differential-equation benchmarks. The key findings are that the GELU activation consistently outperforms the alternatives, that dropout should be avoided, that SWA improves accuracy as long as its learning rate is no larger than the original learning rate, and that learning rate finders are not reliable for operator learning. Together, these results provide actionable guidelines that speed up hyperparameter searches and improve generalization in neural operator models, and the code is publicly available for replication.
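The SWA finding above can be illustrated with a small sketch. It is not the authors' training code. It assumes a toy noisy quadratic objective in place of an operator-learning loss and uses plain SGD. Every name here (`swa_average`, `train_with_swa`, `swa_start`, `swa_lr`) is hypothetical. After `swa_start`, the weights are snapshotted once per step and averaged with equal weight. Following the TL;DR guideline, the sketch rejects an SWA learning rate larger than the original one.

```python
import random


def swa_average(snapshots):
    """Equal-weight running average of parameter snapshots (the core of SWA)."""
    swa = None
    for n, w in enumerate(snapshots):
        if swa is None:
            swa = list(w)
        else:
            # w_swa <- (n * w_swa + w) / (n + 1)
            swa = [(n * a + b) / (n + 1) for a, b in zip(swa, w)]
    return swa


def train_with_swa(target, lr=0.1, swa_lr=0.05, epochs=200, swa_start=100,
                   noise=0.5, seed=0):
    """Toy SGD on 0.5 * ||w - target||^2 with noisy gradients, plus SWA.

    Hypothetical setup for illustration only; not the paper's experiments.
    """
    # Guideline from the TL;DR: the SWA learning rate should not exceed the original.
    if swa_lr > lr:
        raise ValueError("swa_lr should not exceed lr")
    rng = random.Random(seed)
    w = [0.0] * len(target)
    snapshots = []
    for epoch in range(epochs):
        step = lr if epoch < swa_start else swa_lr
        # Noisy gradient of the quadratic loss.
        grad = [wi - ti + rng.gauss(0.0, noise) for wi, ti in zip(w, target)]
        w = [wi - step * gi for wi, gi in zip(w, grad)]
        if epoch >= swa_start:
            snapshots.append(list(w))
    return w, swa_average(snapshots)


last, averaged = train_with_swa([1.0, -2.0])
print("last iterate:", last)
print("SWA average: ", averaged)
```

Averaging the snapshots smooths out the gradient noise that the last iterate still carries. In a real network, SWA also requires recomputing any batch-normalization statistics for the averaged weights. This toy model does not need that step.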
Abstract
Hyperparameter searches are computationally expensive. This paper studies general choices of hyperparameters and training methods specifically for operator learning. To find robust trends, it considers three architectures, DeepONets, Fourier neural operators and Koopman autoencoders, on several differential equations. The options considered include activation functions, dropout and stochastic weight averaging.
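For reference, the GELU activation singled out in the TL;DR is defined as GELU(x) = x Φ(x), where Φ is the standard normal CDF. The sketch below implements the exact form, the common tanh approximation and ReLU for comparison, using only the standard library. It only illustrates the definitions and does not reflect how the paper's models call these functions.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def relu(x: float) -> float:
    """ReLU, shown for comparison."""
    return max(0.0, x)


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}  gelu_tanh={gelu_tanh(x):+.4f}")
```

Unlike ReLU, GELU is smooth everywhere and lets small negative inputs through with a small negative output. This is one commonly cited reason it can behave better in gradient-based training.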
