Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning

Julius Berner, Miguel Liu-Schiaffini, Jean Kossaifi, Valentin Duruisseaux, Boris Bonev, Kamyar Azizzadenesheli, Anima Anandkumar

TL;DR

This work formalizes neural operators as principled, discretization-agnostic mappings between function spaces, enabling learning of operators rather than functions. It provides a concrete recipe to extend popular neural network architectures (MLP, CNN, GNN, Transformer, and encoder-decoder designs) into neural operators via integral transforms with learnable kernels and quadrature weights, ensuring outputs can be queried at arbitrary coordinates. The authors outline design principles for well-posed operator learning, present diverse building blocks (including pointwise, spectral, and graph-based operators), and offer training strategies that integrate data and physics losses. Empirical results, notably on Navier–Stokes problems, demonstrate cross-resolution generalization and the value of fixed-receptive-field architectures and Fourier-based operators, while also highlighting trade-offs with multi-resolution training and interpolation. Overall, the framework provides a roadmap for practitioners to convert existing architectures into discretization-robust neural operators with practical guidance and open-source tooling.
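
As a rough illustration of the integral-transform idea summarized above, the following sketch discretizes $g(y) = \int k(y, x)\, f(x)\, dx$ using trapezoidal quadrature weights. The Gaussian kernel, its width, the helper names, and the test function are all assumptions made for illustration. In a neural operator the kernel would be learned, and this is not the authors' implementation.

```python
import math

def gaussian_kernel(y, x, width=0.1):
    """Hypothetical fixed kernel k(y, x); a neural operator would learn this."""
    return math.exp(-((y - x) ** 2) / (2 * width ** 2))

def trapezoid_weights(xs):
    """Trapezoidal quadrature weights for sorted 1D sample points."""
    w = [0.0] * len(xs)
    for i in range(len(xs) - 1):
        h = xs[i + 1] - xs[i]
        w[i] += h / 2
        w[i + 1] += h / 2
    return w

def integral_operator(f_vals, xs, ys, kernel=gaussian_kernel):
    """Approximate g(y) = integral of k(y, x) f(x) dx by a weighted sum over xs."""
    w = trapezoid_weights(xs)
    return [
        sum(kernel(y, x) * f * wi for x, f, wi in zip(xs, f_vals, w))
        for y in ys
    ]

def uniform_grid(n):
    """n equally spaced points on [0, 1]."""
    return [i / (n - 1) for i in range(n)]

def f(x):
    """Illustrative input function."""
    return math.sin(2 * math.pi * x)

# Query points are arbitrary and need not lie on either input grid.
queries = [0.13, 0.5, 0.87]

xs_coarse, xs_fine = uniform_grid(64), uniform_grid(512)
coarse = integral_operator([f(x) for x in xs_coarse], xs_coarse, queries)
fine = integral_operator([f(x) for x in xs_fine], xs_fine, queries)

# Largest disagreement between the two discretizations of the same input.
max_gap = max(abs(a - b) for a, b in zip(coarse, fine))
```

Because the weights track the grid spacing, `coarse` and `fine` approximate the same integral, so `max_gap` should be small even though the input resolution changes by a factor of eight.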

Abstract

A wide range of scientific problems, such as those described by continuous-time dynamical systems and partial differential equations (PDEs), are naturally formulated on function spaces. While function spaces are typically infinite-dimensional, deep learning has predominantly advanced through applications in computer vision and natural language processing that focus on mappings between finite-dimensional spaces. Such fundamental disparities in the nature of the data have limited neural networks from achieving a comparable level of success in scientific applications as seen in other fields. Neural operators are a principled way to generalize neural networks to mappings between function spaces, offering a pathway to replicate deep learning's transformative impact on scientific problems. For instance, neural operators can learn solution operators for entire classes of PDEs, e.g., physical systems with different boundary conditions, coefficient functions, and geometries. A key factor in deep learning's success has been the careful engineering of neural architectures through extensive empirical testing. Translating these neural architectures into neural operators allows operator learning to enjoy these same empirical optimizations. However, prior neural operator architectures have often been introduced as standalone models, not directly derived as extensions of existing neural network architectures. In this paper, we identify and distill the key principles for constructing practical implementations of mappings between infinite-dimensional function spaces. Using these principles, we propose a recipe for converting several popular neural architectures into neural operators with minimal modifications. This paper aims to guide practitioners through this process and details the steps to make neural operators work in practice. Our code can be found at https://github.com/neuraloperator/NNs-to-NOs

Paper Structure

This paper contains 67 sections, 44 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Illustration of a neural operator. The input is a function $f\in\mathcal{F}$ that can be given at different discretizations $(x_i)_{i=1}^n$. The output is a function $g\in \mathcal{G}$ that can be queried at different points $(y_j)_{j=1}^m$.
  • Figure 2: Pipeline of converting neural networks to neural operators. Graph neural network (GNN) and convolutional layers can be converted into well-posed neural operator layers through a sequence of simple modifications. GNO refers to the graph neural operator \cite{li2020neural}, "Spec. conv." refers to a spectral convolution as used in Fourier neural operators (FNOs) \cite{li2020fourier}, and Local FNO refers to an FNO supplemented with local integral kernels \cite{liu2024neural}. We denote by UNO the U-shaped neural operator \cite{rahman2022u} and by SFNO the spherical Fourier neural operator \cite{bonev2023spherical}. The overall strategy to convert neural networks into neural operators is outlined in \ref{sec: summary conversion}.
  • Figure 3: Visualizing the need for quadrature weights. Aggregating function values at irregularly-spaced points without proper quadrature weights (e.g., taking the mean, i.e., $\Delta=1/n$, as in the top figure) adds more weight to densely sampled areas. When increasing the resolution, the output depends on the chosen refinement of the discretization and does not have a unique limit (top). The use of quadrature weights leads to consistent approximations for different discretizations that converge to a unique integral as the discretization is refined (bottom).
  • Figure 4: Illustration of collapsing receptive fields with a nearest-neighbors strategy. The figure shows the values of the input function $f$ (blue) that influence the output function $g$ at a point $y$ when using a nearest-neighbors strategy, e.g., as in convolutional and graph neural networks (top), and when using a fixed receptive field, e.g., as in convolutional and graph neural operators (bottom). If the neighborhood is selected using a nearest-neighbors strategy, the receptive field (red) collapses when the discretization is refined (from left to right).
  • Figure 5: Relative $L^2$-errors (on unseen test data) when training different methods on the Navier–Stokes equations (see \ref{sec:experiment_details} for details). Although FNO and OFormer are trained only on resolution $128$, they achieve approximately the same error at higher and lower resolutions. The U-Net, in contrast, performs well only at the training resolution $128$. When the U-Net is trained on mixed resolutions $\{64,128,256\}$ (using each of the $10,\!000$ data points three times with different resolutions in every epoch), it performs well at these resolutions; however, performance still degrades at higher resolutions. Interpolating the convolutional kernels of the U-Net (in both single and mixed-resolution training) also improves generalization across resolutions compared to the baseline U-Net. ViT is included as another common baseline in computer vision, and we observe similar results to those of the U-Net. We optimized the hyperparameters for each method and trained until convergence (see \ref{sec:experiment_details}). Due to memory constraints, the OFormer results could not be computed for the highest resolution.
  • ...and 1 more figure
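
The contrasts described in the captions of Figures 3 and 4 can be sketched numerically in one dimension. The snippet below is a minimal illustration under assumed choices (the test function $f(x) = x$, the warped point sets, the neighbor count `k=5`, the radius `0.25`, and all helper names are hypothetical and not taken from the paper). It first compares mean aggregation with trapezoidal quadrature on two differently clustered discretizations, then compares a nearest-neighbors receptive field with a fixed-radius one under refinement.

```python
def trapezoid_weights(xs):
    """Trapezoidal quadrature weights for sorted 1D sample points."""
    w = [0.0] * len(xs)
    for i in range(len(xs) - 1):
        h = xs[i + 1] - xs[i]
        w[i] += h / 2
        w[i + 1] += h / 2
    return w

def aggregate(f_vals, weights):
    """Weighted sum of function values."""
    return sum(f * w for f, w in zip(f_vals, weights))

def warped_points(n, power):
    """Sorted points in [0, 1]; power < 1 clusters near 1, power > 1 near 0."""
    return [(i / (n - 1)) ** power for i in range(n)]

def f(x):
    """Illustrative input function; its integral over [0, 1] is 0.5."""
    return x

# Figure 3 analogue: mean (Delta = 1/n) versus quadrature weights.
results = {}
for power in (0.5, 2.0):
    xs = warped_points(2001, power)
    vals = [f(x) for x in xs]
    mean_est = aggregate(vals, [1 / len(xs)] * len(xs))
    quad_est = aggregate(vals, trapezoid_weights(xs))
    results[power] = (mean_est, quad_est)

def knn_radius(xs, y, k):
    """Distance from y to its k-th nearest sample (k-NN receptive field)."""
    return sorted(abs(x - y) for x in xs)[k - 1]

def points_in_ball(xs, y, radius):
    """Number of samples inside a fixed-radius receptive field around y."""
    return sum(1 for x in xs if abs(x - y) <= radius)

# Figure 4 analogue: refine a uniform grid and track both receptive fields.
knn_radii, ball_counts = [], []
for n in (11, 101, 1001):
    grid = [i / (n - 1) for i in range(n)]
    knn_radii.append(knn_radius(grid, 0.5, k=5))
    ball_counts.append(points_in_ball(grid, 0.5, radius=0.25))
```

In this sketch the mean estimate drifts toward roughly 2/3 or 1/3 depending on where the points cluster, while the quadrature estimate stays at 0.5 for both point sets. Likewise, `knn_radii` shrinks as the grid is refined, whereas the fixed ball keeps its radius and simply contains more samples.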