Learning Operators with Stochastic Gradient Descent in General Hilbert Spaces

Lei Shi; Jia-Qi Yang

TL;DR

This work analyzes learning operators between general Hilbert spaces with stochastic gradient descent (SGD). It formulates a mean-squared loss over the space of Hilbert-Schmidt operators and derives nonasymptotic upper bounds and minimax lower bounds for the prediction and estimation errors under weak and strong regularity conditions on the target operator, linking the convergence rates to the spectral decay of the input covariance operator $L_C$. The theory shows that SGD converges to the best linear approximation of a potentially nonlinear target operator, at rates that depend on the regularity of the target, the spectral decay of $L_C$, and the step-size regime. The analysis further extends to biased models, to operator learning in vector-valued and real-valued reproducing kernel Hilbert spaces (RKHSs), and to functional regression, giving a dimension-free perspective on the tractability of operator learning in infinite dimensions. Overall, the results characterize when and how efficiently SGD can learn operators in high- or infinite-dimensional spaces and quantify the fundamental limits through minimax rates.
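The SGD scheme summarized above can be illustrated in a finite-dimensional truncation. The following is a minimal Python sketch under our own assumptions, not the authors' implementation: $\mathcal{H}_1=\mathbb{R}^{10}$ and $\mathcal{H}_2=\mathbb{R}^{3}$, Gaussian inputs whose covariance $L_C$ is diagonal with polynomially decaying eigenvalues $\lambda_k=k^{-2}$, a linear target with small additive noise, and step sizes $\eta_t=\eta_0 t^{-1/2}$. Each step applies the stochastic gradient of the squared loss with respect to the Hilbert-Schmidt inner product, which we assume takes the standard form $A_{t+1}=A_t-\eta_t(A_t x_t-y_t)\otimes x_t$ with $(u\otimes v)f=\langle v,f\rangle u$. The function names and all parameter values are hypothetical.

```python
import math
import random


def sgd_operator_learning(a_star, eigvals, n_steps, eta0=0.2, theta=0.5,
                          noise=0.1, seed=0):
    """Online SGD for y = A* x + noise, with A* stored as a d_out x d_in matrix.

    Hypothetical truncation: H_1 = R^d_in, H_2 = R^d_out, x ~ N(0, L_C) with
    L_C = diag(eigvals). Update: A <- A - eta_t (A x - y) x^T, eta_t = eta0 * t^(-theta).
    """
    rng = random.Random(seed)
    d_out, d_in = len(a_star), len(a_star[0])
    a = [[0.0] * d_in for _ in range(d_out)]  # start from the zero operator
    for t in range(1, n_steps + 1):
        x = [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigvals]
        y = [sum(a_star[j][k] * x[k] for k in range(d_in))
             + noise * rng.gauss(0.0, 1.0) for j in range(d_out)]
        eta = eta0 * t ** (-theta)
        # Residual A x - y; the stochastic gradient of 0.5 * ||A x - y||^2 in the
        # Hilbert-Schmidt inner product is the rank-one operator r x^T.
        r = [sum(a[j][k] * x[k] for k in range(d_in)) - y[j] for j in range(d_out)]
        for j in range(d_out):
            for k in range(d_in):
                a[j][k] -= eta * r[j] * x[k]
    return a


def prediction_error(a, a_star, eigvals):
    """E||(A - A*) x||^2 = sum_{j,k} (A - A*)_{jk}^2 * lambda_k when L_C = diag(eigvals)."""
    return sum((a[j][k] - a_star[j][k]) ** 2 * lam
               for j in range(len(a)) for k, lam in enumerate(eigvals))


# Polynomial spectral decay lambda_k = k^(-2) (an illustrative choice).
eigvals = [k ** -2.0 for k in range(1, 11)]
gen = random.Random(1)
a_star = [[gen.gauss(0.0, 1.0) for _ in range(10)] for _ in range(3)]
a_hat = sgd_operator_learning(a_star, eigvals, n_steps=5000)
err0 = prediction_error([[0.0] * 10 for _ in range(3)], a_star, eigvals)
err = prediction_error(a_hat, a_star, eigvals)
print(f"initial error {err0:.4f}, error after SGD {err:.4f}")
```

Heuristically, directions with small eigenvalues $\lambda_k$ are learned slowly but contribute little to the prediction error $\mathbb{E}\|(A-A^*)x\|^2$, which is one way to see why the rates depend on the spectral decay of $L_C$.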

Abstract

This study investigates the use of stochastic gradient descent (SGD) to learn operators between general Hilbert spaces. We propose weak and strong regularity conditions on the target operator that capture its intrinsic structure and complexity. Under these conditions, we establish upper bounds on the convergence rates of the SGD algorithm and conduct a minimax lower bound analysis, further showing that our convergence analysis and regularity conditions quantitatively characterize the tractability of solving operator learning problems with SGD. Notably, our convergence analysis remains valid for nonlinear operator learning: we show that the SGD estimator converges to the best linear approximation of the nonlinear target operator. Moreover, applying our analysis to operator learning problems based on vector-valued and real-valued reproducing kernel Hilbert spaces yields new convergence results, thereby refining the conclusions of the existing literature.
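The statement that SGD converges to the best linear approximation of a nonlinear target can be checked numerically in a toy case. The sketch below assumes, purely for illustration, a scalar output ($\mathcal{H}_2=\mathbb{R}$), a truncated input space $\mathbb{R}^3$ with independent Gaussian coordinates of variances $(1, 0.5, 0.25)$, and the hypothetical target $y=\sin(x_1)+\cos(x_2)$. Because the coordinates are independent, the best linear approximation has coefficients $a_k=\mathbb{E}[y\,x_k]/\lambda_k$; by Stein's lemma these equal $(e^{-1/2},0,0)$. The function names and parameters are our own, not taken from the paper.

```python
import math
import random


def sgd_linear_fit(target, eigvals, n_steps, eta0=0.1, theta=0.5, seed=0):
    """Online SGD for a linear functional x -> <a, x> fitted to a (possibly nonlinear) target.

    Hypothetical setting: truncated H_1 = R^d, H_2 = R, x ~ N(0, diag(eigvals)).
    Update: a <- a - eta_t (<a, x> - y) x with eta_t = eta0 * t^(-theta).
    """
    rng = random.Random(seed)
    a = [0.0] * len(eigvals)
    for t in range(1, n_steps + 1):
        x = [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigvals]
        y = target(x)
        eta = eta0 * t ** (-theta)
        r = sum(ai * xi for ai, xi in zip(a, x)) - y  # prediction residual
        a = [ai - eta * r * xi for ai, xi in zip(a, x)]
    return a


def target(x):
    """Hypothetical nonlinear target: y = sin(x_1) + cos(x_2)."""
    return math.sin(x[0]) + math.cos(x[1])


eigvals = [1.0, 0.5, 0.25]
a_hat = sgd_linear_fit(target, eigvals, n_steps=50_000)

# Best linear approximation: a_k = E[y x_k] / lambda_k (independent coordinates).
# Stein's lemma: E[x_1 sin(x_1)] = lambda_1 E[cos(x_1)] = lambda_1 exp(-lambda_1 / 2);
# cos(x_2) is even, so it is uncorrelated with every x_k.
a_best = [math.exp(-eigvals[0] / 2), 0.0, 0.0]  # about (0.607, 0, 0)
print("SGD:", [round(v, 3) for v in a_hat], "best linear:", [round(v, 3) for v in a_best])
```

The $\cos(x_2)$ term is invisible to every linear functional, so SGD drives its coefficient toward zero rather than toward any "true" nonlinear component.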

Paper Structure (30 sections, 29 theorems, 237 equations)

Introduction
Main Results
Regularity Assumptions
Upper Bounds on Convergence Rates
Minimax Lower Bounds
Related Work, Extension and Applications
Non-linear Operator Learning with SGD
An Extension of the Linear Model
Application to Learning with Vector-valued RKHS
Application to Learning with Scalar-valued RKHS
Discussion and Future Work
Error Decomposition and Basic Estimates
Convergence Analysis of Upper Bounds under Weak Regularity Condition
Convergence Analysis of Upper Bounds under Strong Regularity Condition
Convergence Analysis of Lower Bounds
...and 15 more sections

Key Result

Proposition 2.1

Assumption a3 is equivalent to the following statement: there exists a constant $c>0$ such that, for any $f\in\mathcal{H}_{1}$, … . Furthermore, Assumption a3 is satisfied if $x$ is strictly sub-Gaussian in $\mathcal{H}_1$.
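The displayed condition in Proposition 2.1 is not shown here. For illustration only, we assume it is a fourth-moment bound of the form $\mathbb{E}\langle x,f\rangle^4\le c\,(\mathbb{E}\langle x,f\rangle^2)^2$; this exact form is our assumption, not a quotation of the paper. A standard fact supports the sub-Gaussian part of the statement: a strictly sub-Gaussian real variable $X$ satisfies $\mathbb{E}X^4\le 3(\mathbb{E}X^2)^2$, with equality for Gaussians. The Monte Carlo sketch below estimates the ratio $\mathbb{E}\langle x,f\rangle^4/(\mathbb{E}\langle x,f\rangle^2)^2$ for a Gaussian input and for a Rademacher-weighted input in a truncated $\mathcal{H}_1=\mathbb{R}^5$; the spectrum, the direction $f$, and the helper names are hypothetical.

```python
import math
import random


def fourth_moment_ratio(sample_x, f, n, seed=0):
    """Monte Carlo estimate of E<x,f>^4 / (E<x,f>^2)^2."""
    rng = random.Random(seed)
    m2 = m4 = 0.0
    for _ in range(n):
        s = sum(xi * fi for xi, fi in zip(sample_x(rng), f))
        m2 += s * s
        m4 += s ** 4
    m2 /= n
    m4 /= n
    return m4 / (m2 * m2)


eigvals = [k ** -2.0 for k in range(1, 6)]  # hypothetical spectrum of L_C
f = [1.0, -2.0, 0.5, 3.0, 1.0]              # hypothetical direction in H_1


def gaussian_x(rng):
    """x = sum_k sqrt(lambda_k) g_k e_k with standard Gaussian g_k."""
    return [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigvals]


def rademacher_x(rng):
    """x = sum_k sqrt(lambda_k) eps_k e_k with random signs eps_k."""
    return [math.sqrt(lam) * rng.choice((-1.0, 1.0)) for lam in eigvals]


ratio_gauss = fourth_moment_ratio(gaussian_x, f, 100_000)
ratio_rad = fourth_moment_ratio(rademacher_x, f, 100_000)

# Exact Rademacher value: 3 - 2 * sum a_k^4 / (sum a_k^2)^2 with a_k = sqrt(lambda_k) f_k.
a2 = [lam * fk * fk for lam, fk in zip(eigvals, f)]
ratio_rad_exact = 3 - 2 * sum(v * v for v in a2) / sum(a2) ** 2
print(f"Gaussian {ratio_gauss:.3f} (exact 3), Rademacher {ratio_rad:.3f} "
      f"(exact {ratio_rad_exact:.3f})")
```

Both inputs are strictly sub-Gaussian, and both ratios stay at or below 3, consistent with the bound holding with $c=3$ under the assumed form of the condition.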

Theorems & Definitions (55)

Proposition 2.1
Theorem 2.2
Theorem 2.3
Theorem 2.4
Theorem 2.5
Theorem 2.6
Theorem 2.7
Theorem 2.8
Theorem 2.9
Theorem 3.1
...and 45 more
