RandONet: Shallow-Networks with Random Projections for learning linear and nonlinear operators

Gianluca Fabiani; Ioannis G. Kevrekidis; Constantinos Siettos; Athanasios N. Yannacopoulos

TL;DR

RandONets are proved to be universal approximators of nonlinear operators, and their efficiency in approximating linear and nonlinear evolution operators (right-hand sides, RHS), with a focus on PDEs, is demonstrated.

Abstract

Deep Operator Networks (DeepOnets) have revolutionized the domain of scientific machine learning for the solution of the inverse problem for dynamical systems. However, their implementation necessitates optimizing a high-dimensional space of parameters and hyperparameters. This fact, along with the requirement of substantial computational resources, poses a barrier to achieving high numerical accuracy. Here, inspired by DeepOnets and to address the above challenges, we present Random Projection-based Operator Networks (RandONets): shallow networks with random projections that learn linear and nonlinear operators. The implementation of RandONets involves: (a) incorporating random bases, thus enabling the use of shallow neural networks with a single hidden layer, where the only unknowns are the output weights of the network's weighted inner product; this dramatically reduces the dimensionality of the parameter space; and, based on this, (b) using established least-squares solvers (e.g., Tikhonov regularization and preconditioned QR decomposition) that offer superior numerical approximation properties compared to other optimization techniques used in deep learning. In this work, we prove that RandONets are universal approximators of nonlinear operators and demonstrate their efficiency in approximating linear and nonlinear evolution operators (right-hand sides, RHS), with a focus on PDEs. We show that, for this particular task, RandONets outperform the "vanilla" DeepOnets in terms of both numerical approximation accuracy and computational cost.
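Steps (a) and (b) of the abstract can be illustrated with a minimal NumPy sketch for aligned data on the antiderivative operator. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic trigonometric inputs, the `tanh` trunk features with randomly drawn slopes and centers, the Gaussian Johnson-Lindenstrauss branch matrix `R`, the regularization `lam`, and the closed form `W = T^+ V B^+` with Tikhonov-regularized pseudo-inverses. The helper names `ridge_pinv` and `make_data` are hypothetical.

```python
import numpy as np

def ridge_pinv(A, lam=1e-10):
    """Tikhonov-regularized pseudo-inverse of A, computed from its SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (Vt.T * (s / (s**2 + lam))) @ U.T

def make_data(rng, n_funcs, x, n_modes=3):
    """Random trigonometric inputs u and exact antiderivatives v with v(0) = 0."""
    k = np.arange(1, n_modes + 1)
    a = rng.normal(size=(n_funcs, n_modes))
    b = rng.normal(size=(n_funcs, n_modes))
    kx = np.pi * np.outer(k, x)                       # shape (n_modes, m)
    u = a @ np.sin(kx) + b @ np.cos(kx)
    v = (a @ ((1.0 - np.cos(kx)) / (np.pi * k[:, None]))
         + b @ (np.sin(kx) / (np.pi * k[:, None])))
    return u.T, v.T                                   # columns are functions

rng = np.random.default_rng(0)
m, N, M = 100, 200, 40            # input grid size, trunk size, branch size
x = np.linspace(0.0, 1.0, m)      # aligned data: outputs live on the same grid
U_train, V_train = make_data(rng, 300, x)
U_test, V_test = make_data(rng, 50, x)

# Trunk: random hidden layer in the output location y (assumed tanh features).
xi = rng.uniform(-10.0, 10.0, N)          # random slopes, fixed after sampling
centers = rng.uniform(0.0, 1.0, N)        # random centers, fixed after sampling
T = np.tanh((x[:, None] - centers) * xi)  # shape (m, N)

# Branch: linear Johnson-Lindenstrauss random projection of the sampled input.
R = rng.normal(size=(M, m)) / np.sqrt(M)
B_train = R @ U_train                     # shape (M, n_train)
B_test = R @ U_test

# Only the output weights W are unknown: solve V ~ T W B in the least-squares sense.
W = ridge_pinv(T) @ V_train @ ridge_pinv(B_train)    # shape (N, M)

V_pred = T @ W @ B_test
rel_err = np.linalg.norm(V_pred - V_test) / np.linalg.norm(V_test)
print(f"relative L2 test error: {rel_err:.2e}")
```

No gradient-based optimization appears anywhere in this sketch: the random features are drawn once and the fit reduces to two regularized pseudo-inverses, which is the design point the abstract emphasizes.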

Paper Structure (23 sections, 7 theorems, 61 equations, 6 figures, 5 tables)

Introduction
Description of the problem
Methods
Preliminaries on DeepOnets
Preliminaries on Random Projection Neural Networks
Random Projection-based Operator Networks (RandONets)
RandONets as universal approximators of nonlinear operators
Implementation of RandONets
RandONets for aligned data.
RandONets for unaligned data.
Numerical implementation of the training of RandONets.
Numerical Results
Metrics.
Remark on the DeepOnet architectures used.
Remark on the hardware and software used.
...and 8 more sections

Key Result

Theorem 3.1

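Suppose $K$ is a compact set in $\mathbb{R}^d$, $\mathsf{U}$ is a compact set in $\mathsf{C}(K)$, and $\psi$ is a Tauber-Wiener function. Then for any $\epsilon > 0$ there exist a positive integer $N$, scaling factors $\{\xi_i\}_{i=1}^{N}$ and shifts $\{\theta_i\}_{i=1}^{N}$, both independent of $f$, and coefficients $\{w_i[f]\}_{i=1}^{N}$ depending on $f$, such that $\left| f(x) - \sum_{i=1}^{N} w_i[f]\, \psi(\xi_i \cdot x + \theta_i) \right| < \epsilon$ for all $x \in K$ and all $f \in \mathsf{U}$. Moreover, the coefficients $w_i[f]$ are continuous functionals on $\mathsf{U}$.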
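The structure of this statement (one set of scaling factors and shifts shared by every $f$, with only the coefficients depending on $f$) can be checked numerically in a small sketch. The assumptions here are illustrative and not from the paper: $K = [0, 1]$, $\psi = \tanh$, randomly drawn `xi` and `theta`, two hand-picked test functions, and coefficients obtained by a ridge-regularized least-squares fit. The theorem asserts existence only; random sampling is just one convenient way to pick the shared parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)           # sample points in K = [0, 1]
N = 80

# Scaling factors and shifts are drawn once and reused for every function f.
xi = rng.uniform(-10.0, 10.0, N)
centers = rng.uniform(0.0, 1.0, N)
theta = -xi * centers                     # places each tanh transition inside K
Phi = np.tanh(np.outer(x, xi) + theta)    # shape (200, N), entries psi(xi_i x + theta_i)

# Ridge-regularized pseudo-inverse: a fixed linear map from samples of f to w[f].
lam = 1e-8
Us, s, Vt = np.linalg.svd(Phi, full_matrices=False)
P = (Vt.T * (s / (s**2 + lam))) @ Us.T    # shape (N, 200)

f1 = np.sin(2.0 * np.pi * x)
f2 = x**2
w1, w2 = P @ f1, P @ f2                   # coefficients depend on f only

err1 = np.max(np.abs(Phi @ w1 - f1))      # sup-norm error on the grid
err2 = np.max(np.abs(Phi @ w2 - f2))
lin_gap = np.max(np.abs(P @ (f1 + f2) - (w1 + w2)))
print(err1, err2, lin_gap)
```

Because the coefficients come from a single fixed matrix `P` applied to the samples of $f$, they are linear (hence continuous) in $f$ in this discretized setting, which mirrors the last sentence of the theorem.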

Figures (6)

Figure 1: Schematic of the Random Projection-based Operator Network (RandONet). The RandONet first discretizes the input function ($u$) over a fixed grid of spatial points. Then it separately embeds the space of the spatial locations ($\bm{y}$) into a random hidden layer (e.g., with sigmoidal activation functions) and the space of the discretized functions into a low-distortion kernel embedding (e.g., with Johnson-Lindenstrauss random projections [johnson1984extensions] or Rahimi and Recht random Fourier features [rahimi2007random]). Finally, the output is composed of a weighted ($W$) inner product of the branch ($B$) and trunk ($T$) features. The training can be performed through linear least-squares techniques (e.g., Tikhonov regularization, SVD, and QR decomposition).
Figure 2: Case study 1: Antiderivative operator, in Eq. \ref{eq:antiderivative}. (First row) extensive-data case, $800$ training input functions; (Second row) limited-data case, $150$ training input functions. (a), (d) MSE for the training and test sets, with the DeepOnet with $2$ hidden layers (indicatively) with $40$ neurons each, for both the branch and trunk networks. (b), (c), (e), (f) MSE and $L^2$ error percentiles (median, $5\%-95\%$) of the RandONets, for different sizes $M$ of the branch embedding. The errors are computed w.r.t. only the output functions in the test dataset. Comparison of Johnson-Lindenstrauss (JL) branch embedding with random Fourier features (RFFN) embeddings. We set the size of the trunk network to $N=200$ and the grid of input points to $m=100$. Numerical approximation accuracy vs. (b), (e) the number of neurons $M$ in the hidden layer of the branch network; and (c), (f) the computational time in seconds.
Figure 3: Case study 2: Pendulum with external force, in Eq. \ref{eq:pendulum}. (First row) extensive-data case, $2400$ training input functions; (Second row) limited-data case, $450$ training input functions. (a), (d) Convergence of training and test set MSE of the DeepOnet with two hidden layers (indicatively) with $40$ neurons each, for both the branch and trunk networks. (b), (c), (e), (f) MSE and $L^2$ error percentiles (median, $5\%-95\%$) of the RandONets, for different sizes $M$ of the branch embedding. The errors are computed w.r.t. only the output functions in the test dataset. Comparison of Johnson-Lindenstrauss (JL) branch embedding, in Eq. \ref{eq:JL_embedding}, with Random Fourier Feature Networks (RFFN) embeddings, in Eq. \ref{eq:RFFN_embedding}. We set the size of the trunk network to $N=200$ and the grid of input points to $m=100$. Numerical approximation accuracy vs. (b), (e) the number of neurons $M$ in the hidden layer of the branch network; and (c), (f) the computational time in seconds.
Figure 4: Case study 3: 1D diffusion-advection-reaction linear PDE, in Eq. \ref{eq:LinearPDE}. We use $1600$ training input functions. (a) Convergence of training and test MSE of the DeepOnet with two hidden layers (indicatively) with $40$ neurons each, for both branch and trunk networks. (b), (c) MSE and $L^2$ error percentiles (median, $5\%-95\%$) of the RandONets, for different sizes $M$ of the branch embedding. The errors are computed w.r.t. only the output functions in the test dataset. Comparison of Johnson-Lindenstrauss (JL) random features, in Eq. \ref{eq:JL_embedding}, with random Fourier features (RFFN) embeddings, in Eq. \ref{eq:RFFN_embedding}. We set the size of the trunk network to $N=200$ and the grid of input points to $m=100$. Numerical approximation accuracy vs. (b) the number of neurons $M$ in the hidden layer of the branch network; and (c) the computational time in seconds.
Figure 5: Case study 4: 1D nonlinear Burgers' PDE, in Eq. \ref{eq:burgers}. We use $1600$ training input functions. (a) MSE when using a vanilla DeepOnet with $2$ hidden layers with (indicatively) $40$ neurons each, for both branch and trunk networks. (b), (c) MSE and $L^2$ error percentiles (median, $5\%-95\%$) of the RandONets, for different sizes $M$ of the branch embedding. Comparison of Johnson-Lindenstrauss (JL) random embeddings, in Eq. \ref{eq:JL_embedding}, with random Fourier features (RFFN) embeddings, in Eq. \ref{eq:RFFN_embedding}. We set the size of the trunk network to $N=200$ and the grid of input points to $m=100$. Numerical approximation accuracy vs. (b) the number of neurons $M$ in the hidden layer of the branch network; and (c) the computational time in seconds.
...and 1 more figure
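The captions above compare two branch embeddings, Johnson-Lindenstrauss (JL) random projections and random Fourier features (RFFN). The following sketch shows the standard textbook constructions of both and the property each approximately preserves. It is an illustration under assumptions: the Gaussian matrices, the bandwidth `sigma`, the embedding size `M = 2000`, and the function name `rff` are hypothetical choices and may differ from the paper's equations for the JL and RFFN embeddings.

```python
import numpy as np

rng = np.random.default_rng(2)
m, M, sigma = 100, 2000, 1.0            # input grid size, embedding size, kernel bandwidth

# Two discretized input functions that are close to each other.
u1 = rng.normal(size=m)
u2 = u1 + 0.1 * rng.normal(size=m)

# JL embedding: a scaled Gaussian matrix approximately preserves Euclidean distances.
R = rng.normal(size=(M, m)) / np.sqrt(M)
jl_ratio = np.linalg.norm(R @ u1 - R @ u2) / np.linalg.norm(u1 - u2)

# Random Fourier features (Rahimi and Recht): inner products of the features
# approximate the Gaussian kernel exp(-||u - u'||^2 / (2 sigma^2)).
Wf = rng.normal(size=(M, m)) / sigma
b = rng.uniform(0.0, 2.0 * np.pi, M)

def rff(u):
    """Map a discretized function u to M random Fourier features."""
    return np.sqrt(2.0 / M) * np.cos(Wf @ u + b)

k_rff = rff(u1) @ rff(u2)
k_exact = np.exp(-np.sum((u1 - u2) ** 2) / (2.0 * sigma**2))
print(jl_ratio, k_rff, k_exact)
```

The JL map is linear in the input function, while the RFF map is nonlinear, so the two embeddings are natural candidates for linear and nonlinear operators respectively; which one performs better in a given case study is what the figures report.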

Theorems & Definitions (11)

Definition 3.1: Tauber-Wiener function [chen1995universal]
Theorem 3.1: Universal approximation for functions [chen1995universal]
Theorem 3.2: Universal approximation for operators [chen1995universal]
Theorem 3.3: Low-distortion of kernel-embedding [rahimi2007random]
Theorem 3.4
Proposition 1
proof
Proposition 2: Random Projection Neural Networks (RPNNs) for functionals
proof
Theorem 3.5: RandONet universal approximation for Operators
...and 1 more
