Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data

Xiwei Zhang, Yan Chen, Tao Li

TL;DR

This work addresses online statistical learning in RKHS under non-stationary data by introducing a random Tikhonov regularization path and recasting the problem as solving a random ill-posed inverse problem with time-varying forward operators $T_k$. It develops a two-term error decomposition (multiplicative noise and regularization-path drift) and proves mean-square convergence of the online output to the unknown function when the regularization path drifts slowly and the RKHS persistence of excitation condition holds, even without data independence. A specific case with independent non-identically distributed data is analyzed, showing that mild drift in input marginals suffices for consistency. Numerical experiments validate convergence and illustrate advantages over KLMS and NORMA in non-stationary environments. The results provide a rigorous framework for tracking time-varying targets in infinite-dimensional RKHS settings with non-iid data streams.

Abstract

We study recursive regularized learning algorithms in a reproducing kernel Hilbert space (RKHS) with non-stationary online data streams. We introduce the concept of a random Tikhonov regularization path and decompose the error of the algorithm's output in tracking the regularization path into random difference equations in the RKHS. We show that this tracking error vanishes in mean square if the regularization path is slowly time-varying. Then, leveraging the monotonicity of inverse operators and the spectral decomposition of compact operators, and introducing the RKHS persistence of excitation condition, we develop a dominated convergence method to prove the mean square consistency between the regularization path and the unknown function to be learned. In particular, for independent and non-identically distributed data streams, the mean square consistency between the algorithm's output and the unknown function is achieved if the marginal probability measures of the input data are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
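To make the kind of recursion discussed in the abstract concrete, the following is a minimal sketch of an online regularized kernel update of the stochastic-gradient form $f_{k+1}=f_k-a_k\big((f_k(x_k)-y_k)K_{x_k}+\lambda_k f_k\big)$. This update form is an assumption made for illustration and is not taken verbatim from the paper; the Gaussian kernel, its bandwidth, the target function, the noise level, and the drifting input distribution are likewise hypothetical. The gain and regularization schedules are the ones quoted in the caption of Figure 1(a).

```python
import math
import random


def gaussian_kernel(x, z, bandwidth=0.2):
    """Gaussian (RBF) kernel on the real line; the bandwidth is a hypothetical choice."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))


class OnlineRegularizedKernelLearner:
    """Assumed update: f_{k+1} = f_k - a_k * ((f_k(x_k) - y_k) * K(x_k, .) + lambda_k * f_k)."""

    def __init__(self, kernel=gaussian_kernel):
        self.kernel = kernel
        self.centers = []  # inputs x_i seen so far
        self.coeffs = []   # coefficient of K(x_i, .) in the current estimate

    def predict(self, x):
        """Evaluate the current estimate f_k at x (0.0 before any update)."""
        return sum(c * self.kernel(z, x) for z, c in zip(self.centers, self.coeffs))

    def update(self, x, y, gain, reg):
        """Apply one step with gain a_k and regularization lambda_k; return the residual."""
        residual = self.predict(x) - y
        # Regularization term: shrink the whole expansion by (1 - a_k * lambda_k).
        shrink = 1.0 - gain * reg
        self.coeffs = [shrink * c for c in self.coeffs]
        # Data term: add a new kernel section centred at x_k.
        self.centers.append(x)
        self.coeffs.append(-gain * residual)
        return residual


def run_demo(steps=400, seed=0):
    """Run the sketch on a synthetic stream with slowly drifting inputs."""
    rng = random.Random(seed)

    def target(x):
        # Hypothetical unknown function f*.
        return math.sin(2.0 * math.pi * x)

    learner = OnlineRegularizedKernelLearner()
    sq_errors = []
    for k in range(steps):
        # Non-stationary inputs: the sampling window drifts slowly with k.
        shift = 0.1 * math.sin(k / 100.0)
        x = min(1.0, max(0.0, rng.random() + shift))
        y = target(x) + 0.1 * rng.gauss(0.0, 1.0)
        gain = 1.0 / (k + 1) ** 0.7          # a_k from the Figure 1(a) caption
        reg = 1e-4 / (k + 1) ** 0.15         # lambda_k from the Figure 1(a) caption
        residual = learner.update(x, y, gain, reg)
        sq_errors.append(residual ** 2)
    return learner, sq_errors
```

Calling `run_demo()` returns the learner and the sequence of squared one-step prediction residuals. The estimate is stored as a kernel expansion over all past inputs, so each prediction costs time linear in the number of samples seen; this keeps the sketch short but is not meant as an efficient implementation.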

Paper Structure

This paper contains 13 sections, 21 theorems, 182 equations, and 2 figures.

Key Result

Proposition 3.1

For the statistical learning model (model), if Assumptions ass2-ass1 hold, then … where $\mathrm{grad}\,J_k:\mathscr H_K\to\mathscr H_K$ is the gradient operator. The optimal solution $f_{\lambda,k}$ of (youhua) satisfies … where $I:\mathscr H_K\to\mathscr H_K$ is the identity operator. In particular, if $\lambda_k=0$, then $f_{\lambda,k}=f^{\star}$, and if $\lambda_k>0$, then …
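
For orientation, the classical deterministic analogue of a Tikhonov-regularized solution can be written out as follows. This is a sketch under stated assumptions, not the proposition's own formula: $T$ is assumed to be a fixed, self-adjoint, positive semi-definite bounded operator on $\mathscr H_K$, $\lambda>0$ is constant, and the symbols $T$ and $f_\lambda$ are introduced here only for illustration.

$$
(T+\lambda I)\,f_\lambda = T f^{\star}
\quad\Longrightarrow\quad
f_\lambda - f^{\star} = -\lambda\,(T+\lambda I)^{-1} f^{\star}.
$$

The second identity follows by subtracting $(T+\lambda I)f^{\star}$ from both sides of the first and applying $(T+\lambda I)^{-1}$. In this simplified setting the regularization bias tends to zero as $\lambda\to 0$ whenever $f^{\star}$ is orthogonal to the null space of $T$.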

Figures (2)

  • Figure 1: (a) Mean squared errors with $a_{k}=\frac{1}{(k+1)^{0.7}}$ and $\lambda_k=\frac{10^{-4}}{(k+1)^{0.15}}$; (b) Mean squared errors with $a_{k}=\frac{1}{(k+1)^{0.7}}$ and $\lambda_k=\frac{1}{(k+1)^{0.15}}$, $\frac{10^{-1}}{(k+1)^{0.15}}$, $\frac{10^{-4}}{(k+1)^{0.15}}$ after 100000 iterations.
  • Figure 2: (a) Mean squared errors of KLMS; (b) Mean squared errors of NORMA.

Theorems & Definitions (58)

  • Definition 2.1: Theodoridis
  • Remark 2.1
  • Remark 2.2
  • Proposition 3.1
  • proof
  • Definition 3.1
  • Remark 3.1
  • Remark 3.2
  • Lemma 4.1
  • proof
  • ...and 48 more