Convergence analysis of online algorithms for vector-valued kernel regression

Michael Griebel; Peter Oswald

TL;DR

This work develops a convergence theory for online, regularized vector-valued kernel regression in an RKHS framework. By formulating the problem through a feature-map-driven RKHS $H$ and a covariance-like operator $P_\rho$ on a smoothness scale $V_{P_\rho}^s$, it proves an order-optimal decay rate in expectation for the RKHS error: $\mathbb{E}(\|u-u^{(m)}\|_V^2)\le C^2(m+1)^{-s/(2+s)}$ for $0<s\le 1$ under mild, verifiable conditions. The analysis relies on a Schwarz-iteration-inspired update and elementary Hilbert-space techniques, and the rate reflects both the smoothness of the regression function and the noise level. A divergence result shows that key assumptions are necessary, and a special case yields explicit, near-optimal rates in a diagonal coefficient-learning setting. These results generalize prior scalar-valued analyses to the vector-valued case without strong spectral or probabilistic prerequisites, offering a principled foundation for online multitask and functional learning. The study also clarifies limitations regarding $L^2_\rho$ convergence and highlights practical parameter regimes (e.g., $t\approx 2/3$) that optimize decay.
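To make the exponent in the stated rate concrete, it can be evaluated at a few values of $s$. This is plain arithmetic on the formula quoted above and adds no assumptions beyond it:

$$ s=1:\ (m+1)^{-1/3},\qquad s=\tfrac12:\ (m+1)^{-1/5},\qquad s\to 0:\ \frac{s}{2+s}\to 0. $$

The guaranteed decay is therefore at best of order $(m+1)^{-1/3}$, and it flattens as the additional smoothness $s$ shrinks.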

Abstract

We consider the problem of approximating the regression function $f_μ:\, Ω\to Y$ from noisy $μ$-distributed vector-valued data $(ω_m,y_m)\inΩ\times Y$ by an online learning algorithm using a reproducing kernel Hilbert space $H$ (RKHS) as prior. In an online algorithm, i.i.d. samples become available one by one via a random process and are successively processed to build approximations to the regression function. Assuming that the regression function essentially belongs to $H$ (soft learning scenario), we provide estimates for the expected squared error in the RKHS norm of the approximations $f^{(m)}\in H$ obtained by a standard regularized online approximation algorithm. In particular, we show an order-optimal estimate $$ \mathbb{E}(\|ε^{(m)}\|_H^2)\le C (m+1)^{-s/(2+s)},\qquad m=1,2,\ldots, $$ where $ε^{(m)}$ denotes the error term after $m$ processed data, the parameter $0<s\leq 1$ expresses an additional smoothness assumption on the regression function, and the constant $C$ depends on the variance of the input noise, the smoothness of the regression function, and other parameters of the algorithm. The proof, which is inspired by results on Schwarz iterative methods in the noiseless case, uses only elementary Hilbert space techniques and minimal assumptions on the noise, the feature map that defines $H$ and the associated covariance operator.
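The kind of algorithm described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a separable kernel $K(x,x')=k(x,x')\,I$ with a scalar Gaussian $k$, data in $\mathbb{R}^p$ with targets in $\mathbb{R}^d$, and a generic regularized update $f^{(m+1)}=\alpha_m\big(f^{(m)}+\mu_m\,k(\cdot,x_m)\,(y_m-f^{(m)}(x_m))\big)$ with the illustrative choices $\alpha_m=(m+1)/(m+2)$ and $\mu_m=A/(m+1)^t$. The function names and the parameters `A`, `t`, and `width` are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, centers, width=0.5):
    """Scalar Gaussian kernel k(x, c) for every row c of `centers`."""
    diff = centers - x
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * width ** 2))

def online_kernel_regression(samples, targets, t=2.0 / 3.0, A=0.5, width=0.5):
    """Single pass over the data, one sample at a time.

    The iterate is stored as f = sum_j k(., x_j) * coeffs[j], coeffs[j] in R^d.
    Assumed update (illustrative, not the paper's exact scheme):
        f <- alpha_m * (f + mu_m * k(., x_m) * (y_m - f(x_m)))
    with alpha_m = (m+1)/(m+2) acting as shrinkage (regularization)
    and mu_m = A/(m+1)^t as a decaying step size.
    """
    n, d = targets.shape
    coeffs = np.zeros((n, d))
    for m in range(n):
        x, y = samples[m], targets[m]
        if m > 0:
            # Evaluate the current iterate at the new sample point.
            pred = gaussian_kernel(x, samples[:m], width) @ coeffs[:m]
        else:
            pred = np.zeros(d)
        alpha = (m + 1) / (m + 2)
        mu = A / (m + 1) ** t
        # Add a kernel section centered at x_m, weighted by the residual.
        coeffs[m] = mu * (y - pred)
        # Shrink the whole iterate, including the new term.
        coeffs[: m + 1] *= alpha
    return coeffs

def predict(x, samples, coeffs, width=0.5):
    """Evaluate the learned vector-valued function at a single point x."""
    return gaussian_kernel(x, samples, width) @ coeffs
```

Calling `online_kernel_regression(X, Y)` with `X` of shape `(n, p)` and `Y` of shape `(n, d)` returns the coefficient array, and `predict(x, X, coeffs)` evaluates the final iterate at a point `x`. Storing one coefficient vector per processed sample keeps the sketch short, but the cost of each step grows with the number of samples seen so far.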

Paper Structure (9 sections, 3 theorems, 139 equations)

Introduction
Setting and main result
Examples and results related to Theorem 1
Proof of Theorem 1
Further remarks
Comments on Theorem 1
Difficulties with convergence in $L^2_\rho(\Omega,Y)$
A divergence result
A special case

Key Result

Theorem 1

Let $Y,V$ be separable Hilbert spaces, $\Omega$ be a compact metric space, $\mu$ be a Borel probability measure on $\Omega\times Y$, and $\rho$ be the marginal Borel probability measure on $\Omega$ induced by $\mu$. For the feature map $\mathbf{R}=\{R_\omega\}_{\omega\in\Omega}$, we require uniform boundedness (AssR2) and measurability. We also assume that the operator $P_\rho=\mathbb

Theorems & Definitions (3)

Theorem 1
Theorem 2
Theorem 3
