Learning Operators with Stochastic Gradient Descent in General Hilbert Spaces

Lei Shi, Jia-Qi Yang

TL;DR

This work analyzes learning operators between general Hilbert spaces via stochastic gradient descent. It formulates a mean-squared loss in the Hilbert-Schmidt space and derives nonasymptotic upper and minimax lower bounds for prediction and estimation errors under weak and strong regularity of the target operator, linking convergence to the spectral decay of the input covariance $L_C$. The theory shows SGD converges to the best linear approximation of potentially nonlinear target operators, with rates depending on regularity, spectral decay, and step-size regime; strong regularity and vector-/scalar-RKHS settings are explicitly treated. It further extends to nonlinear operator learning, biased models, and RKHS frameworks, including vector-valued and scalar cases, and functional regression, offering a dimension-free perspective on the tractability of operator learning in infinite dimensions. Overall, the results provide rigorous insight into when and how SGD can efficiently learn operators in high- or infinite-dimensional spaces and quantify the fundamental limits via minimax rates.
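
The objects named in this summary can be written out as a short sketch. The notation is assumed for illustration and is not taken from the paper: $(x,y)\in\mathcal{H}_1\times\mathcal{H}_2$ is a random input-output pair, $S$ is a Hilbert-Schmidt operator from $\mathcal{H}_1$ to $\mathcal{H}_2$, $(u\otimes v)f=\langle v,f\rangle u$ is the rank-one operator, and $\eta_t$ is a step size. With the input covariance taken to be $L_C=\mathbb{E}[x\otimes x]$, the mean-squared loss, a plain SGD step on it, and the stationarity condition for its minimizer would read:

```latex
\begin{aligned}
\mathcal{E}(S) &= \mathbb{E}\,\bigl\|Sx - y\bigr\|_{\mathcal{H}_2}^{2}, \\
S_{t+1} &= S_t - \eta_t\,(S_t x_t - y_t)\otimes x_t, \\
\nabla\mathcal{E}(S) = 0 \;&\Longleftrightarrow\; S\,L_C = \mathbb{E}[\,y\otimes x\,].
\end{aligned}
```

The second line uses the per-sample loss $\tfrac12\|Sx_t-y_t\|^2$, whose gradient in $S$ is $(Sx_t-y_t)\otimes x_t$. The third line holds whether or not $y$ depends linearly on $x$, which is one way to read the statement that SGD targets the best linear approximation of a nonlinear operator.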

Abstract

This study investigates leveraging stochastic gradient descent (SGD) to learn operators between general Hilbert spaces. We propose weak and strong regularity conditions for the target operator to depict its intrinsic structure and complexity. Under these conditions, we establish upper bounds for convergence rates of the SGD algorithm and conduct a minimax lower bound analysis, further illustrating that our convergence analysis and regularity conditions quantitatively characterize the tractability of solving operator learning problems using the SGD algorithm. It is crucial to highlight that our convergence analysis is still valid for nonlinear operator learning. We show that the SGD estimator will converge to the best linear approximation of the nonlinear target operator. Moreover, applying our analysis to operator learning problems based on vector-valued and real-valued reproducing kernel Hilbert spaces yields new convergence results, thereby refining the conclusions of existing literature.
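
The claim that SGD converges to the best linear approximation of a nonlinear target can be checked numerically in a finite-dimensional truncation. The following is a minimal sketch, not the authors' algorithm or experiment. The nonlinear target `tanh(A x)`, the covariance eigenvalues `1/k^2`, the step-size schedule `eta0 / t**theta`, and all function names are assumptions made for illustration. The best linear approximation is estimated by least squares on a large independent sample.

```python
import numpy as np

def make_sampler(A, eigvals, noise_std=0.1):
    """Return draw(rng, n) -> (X, Y) for a made-up nonlinear target y = tanh(A x) + noise."""
    def draw(rng, n):
        # Gaussian inputs with covariance C = diag(eigvals)
        X = rng.standard_normal((n, eigvals.size)) * np.sqrt(eigvals)
        Y = np.tanh(X @ A.T) + noise_std * rng.standard_normal((n, A.shape[0]))
        return X, Y
    return draw

def sgd_operator(X, Y, eta0=0.2, theta=0.5):
    """One pass of SGD on the per-sample loss 0.5 * ||S x - y||^2, each sample used once."""
    S = np.zeros((Y.shape[1], X.shape[1]))
    for t, (x, y) in enumerate(zip(X, Y), start=1):
        eta = eta0 / t**theta                 # assumed polynomially decaying step size
        S -= eta * np.outer(S @ x - y, x)     # gradient of the per-sample loss: (Sx - y) x^T
    return S

def excess_prediction_error(S, S_ref, eigvals):
    """E||(S - S_ref) x||^2 = ||(S - S_ref) C^{1/2}||_F^2 for C = diag(eigvals)."""
    return float(np.sum((S - S_ref) ** 2 * eigvals))

rng = np.random.default_rng(0)
d_in, d_out = 5, 3
eigvals = 1.0 / np.arange(1, d_in + 1) ** 2   # polynomial spectral decay of the covariance
A = rng.standard_normal((d_out, d_in))
draw = make_sampler(A, eigvals)

# Monte Carlo estimate of the best linear approximation via least squares
X_ref, Y_ref = draw(rng, 200_000)
S_best = np.linalg.lstsq(X_ref, Y_ref, rcond=None)[0].T

# SGD on a fresh stream of samples
X, Y = draw(rng, 20_000)
S_sgd = sgd_operator(X, Y)

err_sgd = excess_prediction_error(S_sgd, S_best, eigvals)
err_zero = excess_prediction_error(np.zeros_like(S_best), S_best, eigvals)
print(f"excess prediction error: SGD {err_sgd:.2e}, zero operator {err_zero:.2e}")
```

The error is measured in the covariance-weighted norm because directions with small eigenvalues are learned slowly but also contribute little to prediction. This mirrors the distinction between prediction and estimation error drawn in the summary above.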

Paper Structure (30 sections, 29 theorems, 237 equations)

Key Result

Proposition 2.1

Assumption a3 is equivalent to the following statement: there exists a constant $c>0$ such that, for any $f\in\mathcal{H}_{1}$, [displayed inequality missing]. Furthermore, Assumption a3 is satisfied if $x$ is strictly sub-Gaussian in $\mathcal{H}_1$.

Theorems & Definitions (55)

  • Proposition 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Theorem 2.6
  • Theorem 2.7
  • Theorem 2.8
  • Theorem 2.9
  • Theorem 3.1
  • ...and 45 more