Discretization Error of Fourier Neural Operators
Samuel Lanthaler, Andrew M. Stuart, Margaret Trautner
TL;DR
This work quantifies the discretization error that arises from grid-based implementations of Fourier Neural Operators (FNOs). It establishes an $O(N^{-s})$ convergence rate, where $N$ is the grid resolution and $s$ is the Sobolev regularity of the input, and shows that this rate persists through the network layers. The total error is decomposed into a discretization error and a model discrepancy, and the discretization component is bounded and analyzed relative to the continuum FNO. The authors validate the theory with numerical experiments, compare smooth and non-smooth activations, and demonstrate that periodic positional encodings and smooth activations preserve regularity and thereby improve convergence. An adaptive subsampling strategy that exploits the decomposition into discretization and model error is proposed to accelerate training. Overall, the paper offers both theoretical and practical guidance for efficiently training FNOs on discretized grids in PDE-related applications.
Abstract
Operator learning is a variant of machine learning that is designed to approximate maps between function spaces from data. The Fourier Neural Operator (FNO) is one of the main model architectures used for operator learning. The FNO combines linear and nonlinear operations in physical space with linear operations in Fourier space, leading to a parameterized map acting between function spaces. Although FNOs are defined as objects in continuous space that perform convolutions on a continuum, in practice they are implemented as discretized objects that perform computations on a grid, which allows efficient implementation via the FFT. Thus, there is a discretization error between the continuum FNO definition and the discretized object used in practice, and this error is separate from other, previously analyzed sources of model error. We examine this discretization error here and obtain algebraic rates of convergence in terms of the grid resolution as a function of the input regularity. Numerical experiments that validate the theory and describe model stability are performed. In addition, an algorithm is presented that leverages the decomposition into discretization error and model error to optimize computational training time.
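
The gap between a continuum Fourier layer and its grid implementation can be illustrated with a minimal NumPy sketch. It is not the authors' code or experimental setup. Everything in it is an assumption made for illustration: the helper names (`sample_input`, `fourier_layer`, `discretization_errors`), the one-dimensional periodic domain, the single layer with a fixed multiplier and `tanh` activation, and the use of a fine reference grid as a stand-in for the continuum. The input has Fourier coefficients decaying like $|k|^{-(s+1/2)}$, so it lies in $H^{s'}$ for every $s' < s$. The same layer is applied to coarse samples of that input, and the result is compared with the reference output at the shared grid points.

```python
import numpy as np


def sample_input(n_ref, s):
    """Sample a real periodic function on n_ref equispaced points in [0, 1).

    Its Fourier coefficients are k^{-(s + 1/2)} (illustrative choice), so the
    function is in H^{s'} for every s' < s. Mean and Nyquist modes are zero.
    """
    k = np.arange(n_ref // 2 + 1, dtype=float)
    coeff = np.zeros(k.size, dtype=complex)
    coeff[1:-1] = k[1:-1] ** (-(s + 0.5))
    # irfft divides by n_ref; undo that so `coeff` are the series coefficients.
    return np.fft.irfft(coeff, n=n_ref) * n_ref


def fourier_layer(v, multiplier, w=0.5, activation=np.tanh):
    """One FNO-style layer on a grid: activation(w * v + K v).

    K multiplies the lowest `multiplier.size` Fourier modes and discards
    the rest; it is evaluated with the FFT on whatever grid `v` lives on.
    """
    v_hat = np.fft.rfft(v)
    out_hat = np.zeros_like(v_hat)
    k_max = multiplier.size
    out_hat[:k_max] = multiplier * v_hat[:k_max]
    return activation(w * v + np.fft.irfft(out_hat, n=v.size))


def discretization_errors(resolutions, n_ref=4096, s=1.5, k_max=8):
    """Relative L2 error of the coarse-grid layer against the fine-grid proxy."""
    modes = np.arange(k_max)
    # Fixed, hypothetical multiplier; real at k = 0 so the output stays real.
    multiplier = (1.0 + 0.5j * np.sign(modes)) / (1.0 + modes)
    v_ref = sample_input(n_ref, s)
    out_ref = fourier_layer(v_ref, multiplier)
    errors = []
    for n in resolutions:
        stride = n_ref // n  # coarse grid points are a subset of the fine grid
        out_n = fourier_layer(v_ref[::stride], multiplier)
        diff = out_n - out_ref[::stride]
        errors.append(np.sqrt(np.mean(diff**2)) / np.sqrt(np.mean(out_ref**2)))
    return np.array(errors)


resolutions = np.array([32, 64, 128, 256, 512])
errors = discretization_errors(resolutions)
# Slope of log(error) against log(N) gives an empirical algebraic rate.
rate = -np.polyfit(np.log(resolutions), np.log(errors), 1)[0]

for n, e in zip(resolutions, errors):
    print(f"N = {int(n):4d}   relative L2 error = {e:.3e}")
print(f"fitted algebraic rate: {rate:.2f}")
```

In this sketch the only source of error is aliasing: sampling on $N$ points folds the input's high-frequency content into the low modes that the multiplier acts on. The fitted slope is specific to this toy input and layer, and it can be compared informally with the algebraic rate in terms of input regularity described above. Changing `s` or swapping `activation` for a non-smooth function is one way to explore the dependence on regularity.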
