Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study

Hao Liu; Zecheng Zhang; Wenjing Liao; Hayden Schaeffer

TL;DR

This paper characterizes how the approximation and generalization errors of deep operator networks depend on key factors such as network model size and training data size. It also addresses cases where the input functions exhibit low-dimensional structures, which yield tighter error bounds.
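Scaling laws of this kind are usually stated as power laws, error ≈ C · N^(-α), where N is the model size or the number of training samples. The sketch below estimates C and α from (size, error) pairs by a least-squares fit in log-log space. It is a generic illustration, not the paper's method: the paper derives its rates theoretically, and the constants used here (C = 3.0, α = 0.5) are synthetic placeholders, not results from the paper.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit errors ~ C * sizes**(-alpha) by linear least squares on log-log data.

    Returns (C, alpha). Both arguments must be positive.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # np.polyfit with degree 1 returns [slope, intercept]
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return np.exp(intercept), -slope

# Synthetic (made-up) data that follows an exact power law
n = np.array([1e1, 1e2, 1e3, 1e4])
C, alpha = fit_power_law(n, 3.0 * n ** -0.5)  # recovers C = 3.0, alpha = 0.5 up to rounding
```

On real measurements the fitted exponent is only meaningful over the range of sizes where the error is actually dominated by the power-law term.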

Abstract

Neural scaling laws, which describe how the performance of deep neural networks improves as model and data size grow, have been observed in a wide range of tasks. However, a complete theoretical framework for understanding these scaling laws remains underdeveloped. In this paper, we explore neural scaling laws for deep operator networks, which learn mappings between function spaces, with a focus on the Chen and Chen style architecture. These approaches, which include the popular Deep Operator Network (DeepONet), approximate the output functions using a linear combination of learnable basis functions with coefficients that depend on the input functions. We establish a theoretical framework to quantify the neural scaling laws by analyzing the approximation and generalization errors of these networks. We characterize how these errors depend on key factors such as network model size and training data size. Moreover, we address cases where the input functions exhibit low-dimensional structures, which allows us to derive tighter error bounds. These results also hold for deep ReLU networks and other similar structures. Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
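The architecture described above, output functions written as a linear combination of learnable basis functions with input-dependent coefficients, can be sketched as a forward pass. In this minimal, untrained sketch, a "branch" ReLU network maps the discretized input function to coefficients b_k(u), a "trunk" ReLU network maps a query point y to basis values t_k(y), and the prediction is Σ_k b_k(u) t_k(y). The helper names (`init_mlp`, `mlp`, `deeponet`), the layer widths, and the omission of an output bias are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_mlp(sizes, rng):
    """Random (untrained) weights and zero biases for a fully connected network."""
    return [(rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """ReLU activations on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def deeponet(branch, trunk, u_sensors, y):
    """Approximate G(u)(y) by sum_k b_k(u) * t_k(y).

    u_sensors: (batch, m) input functions sampled at m sensor points.
    y: (n_points, d) query points in the output domain.
    Returns an array of shape (batch, n_points).
    """
    coeffs = mlp(branch, u_sensors)  # (batch, p) coefficients b_k(u)
    basis = mlp(trunk, y)            # (n_points, p) basis values t_k(y)
    return coeffs @ basis.T

# Hypothetical sizes: m = 50 sensors, 1-D output domain, p = 20 basis functions
rng = np.random.default_rng(0)
branch = init_mlp([50, 64, 64, 20], rng)
trunk = init_mlp([1, 64, 64, 20], rng)
u = np.sin(2 * np.pi * np.linspace(0, 1, 50))[None, :]  # one input function
y = np.linspace(0, 1, 100)[:, None]                     # 100 query points
out = deeponet(branch, trunk, u, y)                     # shape (1, 100)
```

Splitting the model this way means the basis t_k(y) is shared across all input functions, while only the coefficients change with u.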

Paper Structure (33 sections, 16 theorems, 152 equations, 2 figures, 1 table)

Introduction
Preliminary
Neural Network
Cover and Partition of Unity
Lipschitz Functional
Clipping Operation
Notation
Problem Setup and Deep Operator Learning
Problem Setup and Examples
Deep Operator Learning
Main Results
Assumptions
DeepONet Approximation Error and Model Scaling Law
Generalization Error and Data Scaling Law
Utilizing low-dimensional structures
...and 18 more sections

Key Result

Lemma 1

Let $\{\Omega_k\}_{k=1}^M$ be an open cover of a compact smooth manifold $\mathcal{M}$. There exists a $C^{\infty}$ partition of unity $\{\omega_k\}_{k=1}^M$ subordinate to $\{\Omega_k\}_{k=1}^M$, that is, $\mathrm{supp}(\omega_k)\subset \Omega_k$ for every $k$.
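A concrete instance of Lemma 1 can be built with the standard "normalized bump" construction, sketched here on the unit circle $S^1$ (a compact smooth manifold) as an illustrative assumption. The open cover uses $M$ arcs $\Omega_k$ of half-width $r = 1.5\pi/M$ around equally spaced centers; each $C^\infty$ bump $\psi_k$ is supported on the smaller closed arc of half-width $s = 1.25\pi/M < r$, so $\mathrm{supp}(\psi_k)\subset\Omega_k$. Because every point lies within $\pi/M < s$ of some center, $\sum_j \psi_j > 0$, and $\omega_k = \psi_k / \sum_j \psi_j$ is a smooth partition of unity subordinate to the cover. The function names and radii are hypothetical choices for the sketch.

```python
import numpy as np

def circular_distance(theta, center):
    """Geodesic distance on the unit circle between angles (in radians)."""
    d = np.abs(theta - center) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def bump(t):
    """C^infinity bump: exp(-1 / (1 - t^2)) for |t| < 1, and 0 otherwise."""
    out = np.zeros_like(t, dtype=float)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

def partition_of_unity(theta, M):
    """Evaluate omega_1..omega_M at the angles theta.

    Returns (omega, centers, r), where omega has shape (M, len(theta)) and
    r is the half-width of the open arcs Omega_k forming the cover.
    """
    centers = 2 * np.pi * np.arange(M) / M
    r = 1.5 * np.pi / M   # open arcs Omega_k of half-width r cover the circle
    s = 1.25 * np.pi / M  # supp(psi_k) is the closed arc of half-width s < r
    psi = np.stack([bump(circular_distance(theta, c) / s) for c in centers])
    # Denominator is positive: each point is within pi/M < s of some center
    omega = psi / psi.sum(axis=0)
    return omega, centers, r

theta = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
omega, centers, r = partition_of_unity(theta, M=6)
print(np.allclose(omega.sum(axis=0), 1.0))  # True
```

The same normalization step works on any compact manifold once smooth bumps with supports inside the cover sets are available; the circle only makes the distances explicit.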

Figures (2)

Figure 1: Illustration of the DeepONet architecture. Here $\textbf{u}$ is the discretization of $u\in U$, and $\textbf{y}\in \Omega_V$.
Figure 2: Illustration of the network architecture in Theorem \ref{thm_functional}. Here $\textbf{u}$ is the discretization of $u\in U$.

Theorems & Definitions (22)

Definition 1: Cover
Lemma 1: Theorem 13.7(ii) of tu2011manifolds
Definition 2: Lipschitz functional
Example 1
Example 2
Theorem 1
Lemma 2
Corollary 1
Theorem 2
Theorem 3
...and 12 more