Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels

Jia-Qi Yang, Lei Shi

TL;DR

The paper develops a rigorous theory for learning regression operators from a Polish input space to a separable Hilbert output space, using vector-valued RKHSs induced by operator-valued kernels. The authors formulate the problem as regularized SGD in infinite-dimensional spaces and recast the nonlinear operator regression as a linear operator regression via a Hilbert–Schmidt map. This yields dimension-free, near-optimal convergence rates in both the online setting (polynomially decaying step sizes $\eta_t$ and regularization parameters $\lambda_t$) and the finite-horizon setting (constant parameters). The error analysis decomposes the error into approximation, initialization, drift, and sampling components, and establishes bounds both in expectation and with high probability, with explicit rates depending on the regularity parameter $r$ and the capacity parameter $s$. The results advance regularized operator learning by providing probabilistic guarantees in infinite dimensions, and they enable extensions to general kernels, structured prediction, and PCA-based encoder–decoder frameworks, with implications for real-time, discretization-invariant learning of solution operators for parameterized PDEs and related tasks.

Abstract

We consider a class of statistical inverse problems involving the estimation of a regression operator from a Polish space to a separable Hilbert space, where the target lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. To address the associated ill-posedness, we analyze regularized stochastic gradient descent (SGD) algorithms in both online and finite-horizon settings. The former uses polynomially decaying step sizes and regularization parameters, while the latter adopts fixed values. Under suitable structural and distributional assumptions, we establish dimension-independent bounds for prediction and estimation errors. The resulting convergence rates are near-optimal in expectation, and we also derive high-probability estimates that imply almost sure convergence. Our analysis introduces a general technique for obtaining high-probability guarantees in infinite-dimensional settings. Possible extensions to broader kernel classes and encoder-decoder structures are briefly discussed.
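
The two SGD settings in the abstract can be illustrated for the separable kernel $K(x,x^\prime)=\mathcal{K}(x,x^\prime)W$ of Proposition 2.1. The following is a minimal sketch, not the authors' implementation. It assumes the standard regularized update $h_{t+1}=h_t-\eta_t\big(K_{x_t}(h_t(x_t)-y_t)+\lambda_t h_t\big)$, a Gaussian scalar kernel, outputs discretized to $\mathbb{R}^m$, and a hypothetical toy target. The names `SeparableKernelSGD`, `online_schedule`, `finite_horizon_schedule`, and `toy_samples` are illustrative. The schedule exponents and constants are placeholders, not the rates derived in the paper.

```python
import numpy as np


def gaussian_kernel(x, z, width=0.3):
    """Scalar kernel k(x, z) on R^d; an illustrative stand-in, not the paper's kernel."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))


class SeparableKernelSGD:
    """Regularized SGD in the vector-valued RKHS of K(x, x') = k(x, x') W.

    The iterate is stored as h_t = sum_i k(., x_i) W c_i with outputs in R^m,
    so each step rescales the old coefficients and appends one new term.
    """

    def __init__(self, W, kernel=gaussian_kernel):
        self.W = np.asarray(W, dtype=float)
        self.kernel = kernel
        self.centers = []  # sampled inputs x_i
        self.coefs = []    # coefficient vectors c_i in R^m

    def predict(self, x):
        """Evaluate h_t(x) = sum_i k(x, x_i) W c_i."""
        out = np.zeros(self.W.shape[0])
        for xi, ci in zip(self.centers, self.coefs):
            out += self.kernel(x, xi) * (self.W @ ci)
        return out

    def step(self, x, y, eta, lam):
        """One step h_{t+1} = h_t - eta * (K_x (h_t(x) - y) + lam * h_t)."""
        residual = self.predict(x) - y
        # The lam * h_t term shrinks every existing coefficient.
        self.coefs = [(1.0 - eta * lam) * c for c in self.coefs]
        # K_x v = k(., x) W v, so the data term adds the coefficient -eta * residual at x.
        self.centers.append(np.asarray(x, dtype=float))
        self.coefs.append(-eta * residual)


def online_schedule(t, eta1=0.5, theta=0.5, lam1=0.1, alpha=0.5):
    """Polynomially decaying (eta_t, lambda_t); exponents are placeholders."""
    return eta1 * t ** (-theta), lam1 * t ** (-alpha)


def finite_horizon_schedule(eta=0.1, lam=1e-3):
    """Constant (eta, lambda) for a fixed horizon; values are placeholders."""
    return lambda t: (eta, lam)


def train(model, samples, schedule):
    """Run one pass of SGD over the samples using schedule(t) -> (eta_t, lambda_t)."""
    for t, (x, y) in enumerate(samples, start=1):
        eta, lam = schedule(t)
        model.step(x, y, eta, lam)
    return model


def toy_samples(n, rng, noise=0.05):
    """Hypothetical target x -> (sin(pi x), cos(pi x), x) plus Gaussian noise."""
    xs = rng.uniform(-1.0, 1.0, size=(n, 1))
    ys = np.column_stack([np.sin(np.pi * xs[:, 0]), np.cos(np.pi * xs[:, 0]), xs[:, 0]])
    ys += noise * rng.standard_normal(ys.shape)
    return list(zip(xs, ys))
```

For example, `train(SeparableKernelSGD(np.eye(3)), toy_samples(300, np.random.default_rng(0)), online_schedule)` runs the online variant, and passing `finite_horizon_schedule()` instead runs the constant-parameter variant. Storing coefficients rather than operators keeps each step exact in the RKHS. The cost is that evaluating the iterate grows linearly with $t$.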


Paper Structure

This paper contains 21 sections, 38 theorems, 268 equations, 2 figures.

Key Result

Proposition 2.1

The vector-valued RKHS $\mathcal{H}$ associated with the operator-valued kernel $K(x,x^\prime)=\mathcal{K}(x,x^\prime)W$, where $W$ is a positive operator on the output space $\mathcal{Y}$ and $\mathcal{K}$ is a scalar-valued kernel with RKHS $\mathcal{H}_\mathcal{K}$, is isometrically isomorphic to $\mathcal{B}_{\mathrm{HS}}(\mathcal{H}_\mathcal{K},\overline{W^{1/2}(\mathcal{Y})})$: each $h\in\mathcal{H}$ corresponds to a Hilbert–Schmidt operator $H$ with $\|h\|_{\mathcal{H}}=\|H\|_{\mathrm{HS}}$.
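
The norm identity in Proposition 2.1 can be checked numerically in finite dimensions. This sketch assumes a linear scalar kernel $\mathcal{K}(x,z)=x\cdot z$ on $\mathbb{R}^d$, so that $\mathcal{H}_\mathcal{K}\cong\mathbb{R}^d$ and $\mathcal{K}_x=x$. It also assumes the correspondence $h(x)=W^{1/2}H\mathcal{K}_x$, which is one map consistent with the stated isometry; the excerpt above does not spell it out. For $h=\sum_i \mathcal{K}(\cdot,x_i)Wc_i$, it compares $\|h\|_{\mathcal{H}}^2=\sum_{i,j}\mathcal{K}(x_i,x_j)\,c_i^\top W c_j$ with the Frobenius (Hilbert–Schmidt) norm of $H=\sum_i W^{1/2}c_i\otimes\mathcal{K}_{x_i}$. All function names are illustrative.

```python
import numpy as np


def psd_sqrt(W):
    """Square root W^{1/2} of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(W)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def rkhs_norm_sq(X, C, W):
    """||h||_H^2 for h = sum_i k(., x_i) W c_i with linear kernel k(x, z) = x . z."""
    G = X @ X.T  # Gram matrix k(x_i, x_j); rows of X are the x_i
    return float(np.sum(G * (C @ W @ C.T)))  # sum_ij k(x_i, x_j) c_i^T W c_j


def hs_operator(X, C, W):
    """H = sum_i W^{1/2} c_i x_i^T, an m x d matrix since K_x = x for this kernel."""
    return psd_sqrt(W) @ C.T @ X


def evaluate_direct(X, C, W, x):
    """h(x) = sum_i k(x, x_i) W c_i."""
    return W @ (C.T @ (X @ x))


def evaluate_via_hs(H, W, x):
    """h(x) = W^{1/2} H K_x under the assumed correspondence."""
    return psd_sqrt(W) @ H @ x
```

Because $\mathcal{K}_x=x$ here, both the norm and pointwise evaluations reduce to matrix products. For random data $X$, $C$, and $W=AA^\top$, `rkhs_norm_sq(X, C, W)` should agree with `np.linalg.norm(hs_operator(X, C, W), "fro") ** 2` up to rounding.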

Figures (2)

  • Figure 1: Surrogate approach for structured prediction
  • Figure 2: Commutative diagram of PCA encoder-decoder framework

Theorems & Definitions (78)

  • Proposition 2.1
  • Proposition 2.2
  • Theorem 2.3
  • Remark 1
  • Theorem 2.4
  • Theorem 2.5
  • Theorem 2.6
  • Theorem 2.7
  • Corollary 2.8
  • Theorem 2.9
  • ...and 68 more