Optimal Convergence Rates for Neural Operators
Mike Nguyen, Nicole Mücke
TL;DR
The paper tackles operator learning between function spaces by placing two-layer neural operators in the neural tangent kernel (NTK) regime and deriving generalization guarantees for early-stopped gradient descent. It develops a vector-valued kernel framework, the vvNTK, proves that the associated vector-valued RKHS (vvRKHS) can approximate any target that a neural operator can approximate, and establishes minimax optimal rates under a Hölder source condition and an eigenvalue decay assumption. The resulting bounds specify how many hidden units $M$ and second-stage samples $n_{\mathcal X}$ are needed to reach a target accuracy: the error satisfies $\|G_{\theta_T}-G^*\|_{L^2_{\mu_u}} = \tilde{O}(T^{-r}+M^{-1/2}+n_{\mathcal X}^{-1/2})$, and a suitable choice of the stopping time $T$ yields the minimax rate $O(n_{\mathcal U}^{-r/(2r+b)})$. The analysis decomposes the error into a Taylor approximation error, a random-feature-type error, and a generalization error, and includes a weight stability result ensuring that the network stays near its initialization. Simulations on the Poisson equation are consistent with the theoretical scaling, showing that width and sample requirements scale as $\sqrt{n_{\mathcal U}}$ to realize the optimal rates. Overall, the work provides a principled NTK-based theory for fast-converging, sample-efficient neural operator learning, with practical applications to PDE surrogates.
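One way to read the bound above is to balance each of its three terms against the target rate $n_{\mathcal U}^{-r/(2r+b)}$. The Python sketch below does only this arithmetic. It ignores the constants and logarithmic factors hidden in $\tilde{O}$, and it assumes that $T$ counts gradient descent iterations at a fixed step size. The function name `balanced_choices` and its return keys are hypothetical and do not come from the paper.

```python
import math


def balanced_choices(n_u: int, r: float, b: float) -> dict:
    """Balance T^{-r}, M^{-1/2} and n_X^{-1/2} against n_U^{-r/(2r+b)}.

    Illustrative only: constants and log factors are dropped, so the
    returned values indicate scaling, not exact requirements.
    """
    if n_u <= 0 or r <= 0 or b < 0:
        raise ValueError("need n_u > 0, r > 0 and b >= 0")

    denom = 2.0 * r + b
    target = n_u ** (-r / denom)

    # T^{-r} <= target  <=>  T >= n_u^{1/(2r+b)}
    stopping_time = math.ceil(n_u ** (1.0 / denom))
    # M^{-1/2} <= target  <=>  M >= n_u^{2r/(2r+b)}
    width = math.ceil(n_u ** (2.0 * r / denom))
    # The n_X^{-1/2} term has the same form as the width term.
    second_stage_samples = width

    return {
        "target_rate": target,
        "stopping_time": stopping_time,
        "width": width,
        "second_stage_samples": second_stage_samples,
    }


if __name__ == "__main__":
    print(balanced_choices(n_u=10_000, r=0.5, b=1.0))
```

Under these assumptions, taking $r = 1/2$ and $b = 1$ gives $2r/(2r+b) = 1/2$, so the balanced width and second-stage sample count grow like $\sqrt{n_{\mathcal U}}$, which matches the scaling quoted above for that particular parameter choice.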
Abstract
We introduce the neural tangent kernel (NTK) regime for two-layer neural operators and analyze their generalization properties. For early-stopped gradient descent (GD), we derive fast convergence rates that are known to be minimax optimal within the framework of non-parametric regression in reproducing kernel Hilbert spaces (RKHS). We provide bounds on the number of hidden neurons and the number of second-stage samples necessary for generalization. To justify our NTK regime, we additionally show that any operator approximable by a neural operator can also be approximated by an operator from the RKHS. A key application of neural operators is learning surrogate maps for the solution operators of partial differential equations (PDEs). We consider the standard Poisson equation to illustrate our theoretical findings with simulations.
