Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data

Xiwei Zhang; Yan Chen; Tao Li

TL;DR

This work addresses online statistical learning in RKHS under non-stationary data by introducing a random Tikhonov regularization path and recasting the problem as solving a random ill-posed inverse problem with time-varying forward operators $T_k$. It develops a two-term error decomposition (multiplicative noise and regularization-path drift) and proves mean-square convergence of the online output to the unknown function when the regularization path drifts slowly and the RKHS persistence of excitation condition holds, even without assuming independent data. A specific case with independent, non-identically distributed data is analyzed, showing that mild drift in the input marginals suffices for consistency. Numerical experiments validate convergence and illustrate advantages over KLMS and NORMA in non-stationary environments. The results provide a rigorous framework for tracking time-varying targets in infinite-dimensional RKHS settings with non-i.i.d. data streams.
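The online recursion itself is not spelled out on this page. As a hedged illustration, the sketch below assumes the common regularized kernel stochastic-gradient form $f_{k+1}=f_k-a_k\big[(f_k(x_k)-y_k)K_{x_k}+\lambda_k f_k\big]$, storing $f_k$ as a kernel expansion. The Gaussian kernel, its width, the target $f^\star(x)=\sin(\pi x)$, the noise level and the drifting input distribution are all hypothetical choices, and the paper's exact algorithm may differ; only the schedules $a_k$ and $\lambda_k$ are taken from the Figure 1 caption.

```python
import numpy as np


def gaussian_kernel(x, centers, width=0.5):
    # K(x, c) = exp(-(x - c)^2 / (2 * width^2)); kernel choice and width are assumptions.
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))


class OnlineRegularizedKernelLearner:
    """Hypothetical sketch of f_{k+1} = f_k - a_k [(f_k(x_k) - y_k) K_{x_k} + lambda_k f_k].

    f_k is stored as the kernel expansion sum_i alphas[i] * K(centers[i], .).
    """

    def __init__(self, width=0.5):
        self.width = width
        self.centers = np.empty(0)
        self.alphas = np.empty(0)

    def predict(self, x):
        # f_k(x); an empty expansion is the zero function.
        return float(self.alphas @ gaussian_kernel(x, self.centers, self.width))

    def update(self, x, y, a_k, lam_k):
        err = self.predict(x) - y
        # Regularization term -a_k * lam_k * f_k shrinks every existing coefficient.
        self.alphas = (1.0 - a_k * lam_k) * self.alphas
        # Data term -a_k * err * K_{x_k} adds one new kernel center at x_k.
        self.centers = np.append(self.centers, x)
        self.alphas = np.append(self.alphas, -a_k * err)


def f_star(x):
    # Hypothetical target function.
    return np.sin(np.pi * x)


def run(n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    learner = OnlineRegularizedKernelLearner()
    for k in range(n_steps):
        a_k = 1.0 / (k + 1) ** 0.7       # step size, as in the Figure 1 caption
        lam_k = 1e-4 / (k + 1) ** 0.15   # regularization, as in the Figure 1 caption
        # Hypothetical non-stationary stream: the input marginal drifts slowly over time.
        x_k = rng.uniform(-1.0, 1.0) + 0.2 * np.sin(k / 500.0)
        y_k = f_star(x_k) + 0.1 * rng.standard_normal()
        learner.update(x_k, y_k, a_k, lam_k)
    # Mean squared error of the final estimate on a grid inside the input range.
    grid = np.linspace(-0.8, 0.8, 50)
    mse = float(np.mean([(learner.predict(g) - f_star(g)) ** 2 for g in grid]))
    return learner, mse


if __name__ == "__main__":
    _, mse = run()
    print(f"grid MSE after 2000 steps: {mse:.4f}")
```

Each update appends one kernel center, so memory grows linearly in $k$; practical variants (such as the truncation used by NORMA) bound this growth.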

Abstract

We study recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with non-stationary online data streams. We introduce the concept of a random Tikhonov regularization path and decompose the error with which the algorithm's output tracks the regularization path into random difference equations in the RKHS. We show that this tracking error vanishes in mean square if the regularization path is slowly time-varying. Then, leveraging the monotonicity of inverse operators and the spectral decomposition of compact operators, and introducing the RKHS persistence of excitation condition, we develop a dominated convergence method to prove the mean square consistency between the regularization path and the unknown function to be learned. In particular, for independent and non-identically distributed data streams, the algorithm's output is mean square consistent with the unknown function if the marginal probability measures of the input data are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
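The two results in the abstract combine through an elementary inequality. Writing $f_k$ for the algorithm's output (an assumed symbol), $f_{\lambda,k}$ for the regularization path as in Proposition 3.1, and $\|\cdot\|_K$ for the norm of $\mathscr H_K$, the bound $\|u+v\|^2\le 2\|u\|^2+2\|v\|^2$ gives:

```latex
\begin{aligned}
\mathbb{E}\,\|f_k - f^{\star}\|_K^2
  &= \mathbb{E}\,\big\|(f_k - f_{\lambda,k}) + (f_{\lambda,k} - f^{\star})\big\|_K^2 \\
  &\le 2\,\mathbb{E}\,\|f_k - f_{\lambda,k}\|_K^2 \;+\; 2\,\mathbb{E}\,\|f_{\lambda,k} - f^{\star}\|_K^2 .
\end{aligned}
```

Hence, if both the tracking error $\mathbb{E}\|f_k-f_{\lambda,k}\|_K^2$ and the path error $\mathbb{E}\|f_{\lambda,k}-f^\star\|_K^2$ tend to zero, the output is mean square consistent.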

Paper Structure

This paper contains 13 sections, 21 theorems, 182 equations, and 2 figures.

Introduction
Statistical learning model in RKHS
Online learning algorithm in RKHS
Random Tikhonov regularization path of the regression function
Online regularized learning algorithm in RKHS
Convergence analysis
Special case with independent and non-identically distributed online data streams
Numerical examples
Conclusions
Proof in Section III
Proofs in Section IV
Proofs in Section V
Theoretical framework of random elements with values in a Banach space and some lemmas and propositions

Key Result

Proposition 3.1

For the statistical learning model, if the model assumptions hold, then […], where $\mathrm{grad}\,J_k:\mathscr H_K\to\mathscr H_K$ is the gradient operator. The optimal solution $f_{\lambda,k}$ of the optimization problem satisfies […], where $I:\mathscr H_K\to\mathscr H_K$ is the identity operator. In particular, if $\lambda_k=0$, then $f_{\lambda,k}=f^{\star}$, and if $\lambda_k>0$, then […].
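The displayed formulas of Proposition 3.1 are not reproduced here. As a finite-dimensional illustration only, the sketch assumes that for $\lambda>0$ the regularized solution has the standard Tikhonov form $f_{\lambda}=(T+\lambda I)^{-1}T f^\star$, with a symmetric positive definite matrix `T` standing in for the forward operator $T_k$; this form is an assumption consistent with the operator viewpoint in the TL;DR, not a quotation of the proposition. The matrix, its eigenvalues and $f^\star$ are hypothetical. The sketch checks that the path approaches $f^\star$ as $\lambda\downarrow 0$ and equals it at $\lambda=0$ when `T` is invertible.

```python
import numpy as np


def tikhonov_path(T, f_star, lam):
    # Assumed form of the regularized solution: f_lam = (T + lam * I)^{-1} T f_star.
    return np.linalg.solve(T + lam * np.eye(T.shape[0]), T @ f_star)


# Hypothetical positive definite "forward operator" with geometrically decaying
# eigenvalues, a finite-dimensional stand-in for a compact operator T_k.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # orthonormal eigenvectors
eigvals = 2.0 ** -np.arange(6)                     # 1, 1/2, ..., 1/32
T = Q @ np.diag(eigvals) @ Q.T
f_star = rng.standard_normal(6)

# Distance between the regularized solution and f_star for decreasing lambda.
lams = [1.0, 1e-1, 1e-2, 1e-4, 0.0]
errors = [np.linalg.norm(tikhonov_path(T, f_star, lam) - f_star) for lam in lams]
for lam, err in zip(lams, errors):
    print(f"lambda = {lam:g}: ||f_lam - f_star|| = {err:.2e}")

# Closed form of the regularization bias: f_lam - f_star = -lam (T + lam I)^{-1} f_star.
lam = 1e-2
bias = -lam * np.linalg.solve(T + lam * np.eye(6), f_star)
bias_gap = float(np.max(np.abs(tikhonov_path(T, f_star, lam) - f_star - bias)))
print(f"max deviation from closed-form bias: {bias_gap:.1e}")
```

The bias identity shows why small eigenvalues matter: along an eigenvector with eigenvalue $\mu$, the component of $f^\star$ is scaled by $\lambda/(\mu+\lambda)$, which is close to 1 when $\mu\ll\lambda$.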

Figures (2)

Figure 1: (a) Mean squared errors with $a_k=\frac{1}{(k+1)^{0.7}}$ and $\lambda_k=\frac{10^{-4}}{(k+1)^{0.15}}$; (b) Mean squared errors after 100000 iterations with $a_k=\frac{1}{(k+1)^{0.7}}$ and $\lambda_k\in\left\{\frac{1}{(k+1)^{0.15}},\ \frac{10^{-1}}{(k+1)^{0.15}},\ \frac{10^{-4}}{(k+1)^{0.15}}\right\}$.
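To make the magnitudes in the Figure 1 caption concrete, the short sketch below simply evaluates the quoted schedules at a few iteration counts; the function names `step_size` and `reg_param` are hypothetical.

```python
def step_size(k):
    # a_k = 1 / (k + 1)^0.7, as quoted in the Figure 1 caption.
    return 1.0 / (k + 1) ** 0.7


def reg_param(k, scale=1e-4):
    # lambda_k = scale / (k + 1)^0.15; the caption uses scale in {1, 1e-1, 1e-4}.
    return scale / (k + 1) ** 0.15


if __name__ == "__main__":
    for k in [0, 10, 1000, 100000]:
        lams = [f"{reg_param(k, s):.3e}" for s in (1.0, 1e-1, 1e-4)]
        print(f"k={k:>6}  a_k={step_size(k):.3e}  lambda_k={lams}")
```

At $k=100000$ the step size is about $3.2\times10^{-4}$, while $\lambda_k$ is still about $0.18$ for the largest scale and about $1.8\times10^{-5}$ for the smallest, since $(k+1)^{0.15}$ grows very slowly.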
Figure 2: (a) Mean squared errors of KLMS; (b) Mean squared errors of NORMA.
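For context on the baselines in Figure 2, the sketch below shows the standard KLMS update and the NORMA update for the squared loss (regularized shrinkage of the current estimate plus truncation to the most recent kernel terms), both stored as kernel expansions. The Gaussian kernel, the fixed step $\eta$, the fixed $\lambda$, the window $\tau$ and the synthetic data are assumed settings; the experimental configuration used in the paper is not given here.

```python
import numpy as np


def rbf(x, centers, width=0.5):
    # Gaussian kernel; the width is an assumed setting.
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))


def klms_step(centers, alphas, x, y, eta):
    """KLMS update: f <- f + eta * e * K(x, .), where e = y - f(x)."""
    e = y - float(alphas @ rbf(x, centers))
    return np.append(centers, x), np.append(alphas, eta * e)


def norma_step(centers, alphas, x, y, eta, lam, tau):
    """NORMA update for squared loss: f <- (1 - eta*lam) f + eta * e * K(x, .),
    followed by truncation to the most recent tau kernel terms."""
    e = y - float(alphas @ rbf(x, centers))
    centers = np.append(centers, x)
    alphas = np.append((1.0 - eta * lam) * alphas, eta * e)
    return centers[-tau:], alphas[-tau:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ck, ak = np.empty(0), np.empty(0)  # KLMS expansion
    cn, an = np.empty(0), np.empty(0)  # NORMA expansion
    for _ in range(500):
        x = rng.uniform(-1.0, 1.0)
        y = np.sin(np.pi * x) + 0.1 * rng.standard_normal()  # hypothetical data
        ck, ak = klms_step(ck, ak, x, y, eta=0.5)
        cn, an = norma_step(cn, an, x, y, eta=0.5, lam=0.01, tau=200)
    print(len(ak), len(an))  # prints 500 200: KLMS keeps every center, NORMA at most tau
```

In this sketch both baselines use fixed $\eta$ and $\lambda$, which is the main structural contrast with schedules such as $a_k$ and $\lambda_k$ in the Figure 1 caption.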

Theorems & Definitions (58)

Definition 2.1 (Theodoridis)
Remark 2.1
Remark 2.2
Proposition 3.1
Proof
Definition 3.1
Remark 3.1
Remark 3.2
Lemma 4.1
Proof
... and 48 more
