On the Consistency of Kernel Methods with Dependent Observations

Pierre-François Massiani; Sebastian Trimpe; Friedrich Solowjow

TL;DR

This work addresses learning with dependent observations by introducing empirical weak convergence (EWC), which allows the asymptotic data distribution to be a random measure. It then develops a foundation for kernel methods under EWC, proving that kernel mean embeddings converge to the embedding of the random limit and that SVMs with Hilbert-space outputs remain consistent when the data-generating process satisfies EWC. By leveraging the convergence of random measures in the weak topology, the theory goes beyond i.i.d. and standard mixing assumptions and covers both infinite-dimensional outputs (e.g., CKMEs) and data generated by dynamical systems. The results show that kernel methods can be consistent under broader, less restrictive conditions, while making the trade-offs explicit: the analysis comes with differentiability requirements and provides no explicit convergence rates. Overall, the paper lays a rigorous foundation for learning with dependent data and extends statistical learning theory to Hilbert-space-valued outputs.
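To make the "random limit" concrete, here is a toy sketch in plain Python. It is our own illustration, not a construction from the paper: each path first draws a regime z in {0, 1} once, then runs an irrational rotation confined to [z/2, z/2 + 1/2). The empirical measure of a path converges weakly, but to a limit that depends on z, which is the kind of behaviour EWC is meant to allow. The squared MMD (the squared RKHS distance between kernel mean embeddings) of a path to its own limit shrinks, while its distance to the other regime's limit stays large. The Gaussian kernel, its bandwidth, the rotation angle `ALPHA`, and the grid approximation of the limit in `limit_grid` are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy process (not from the paper): an irrational rotation whose
# orbit is squeezed into [z/2, z/2 + 1/2), where the regime z is drawn once.
ALPHA = (math.sqrt(5) - 1) / 2  # rotation angle: golden-ratio conjugate


def gaussian_kernel(x, y, bandwidth=0.1):
    """Gaussian kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def sample_path(n, z, x0):
    """First n states of one path; dependent and deterministic given (z, x0)."""
    return [(z + (x0 + i * ALPHA) % 1.0) / 2 for i in range(n)]


def limit_grid(z, m=200):
    """Midpoint grid approximating the path's limit: the uniform law on [z/2, z/2 + 1/2)."""
    return [(z + (j + 0.5) / m) / 2 for j in range(m)]


def mmd_squared(xs, ys):
    """Squared RKHS distance between the kernel mean embeddings of two empirical measures."""
    kxx = sum(gaussian_kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


rng = random.Random(0)
z = rng.randint(0, 1)  # random regime, drawn once per path
x0 = rng.random()      # random initial condition
for n in (25, 100, 400):
    path = sample_path(n, z, x0)
    own = mmd_squared(path, limit_grid(z))        # distance to this path's own limit
    other = mmd_squared(path, limit_grid(1 - z))  # distance to the other regime's limit
    print(f"n={n:4d}  own limit: {own:.2e}  other limit: {other:.2e}")
```

Because z is fixed before the path starts, no single deterministic distribution describes the long-run behaviour of every path; averaging over many paths would mix the two regimes, whereas the embedding of a single path settles on one of them.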

Abstract

The consistency of a learning method is usually established under the assumption that the observations are a realization of an independent and identically distributed (i.i.d.) or mixing process. Yet, kernel methods such as support vector machines (SVMs), Gaussian processes, or conditional kernel mean embeddings (CKMEs) all give excellent performance under sampling schemes that are obviously non-i.i.d., such as when data comes from a dynamical system. We propose the new notion of empirical weak convergence (EWC) as a general assumption explaining such phenomena for kernel methods. It assumes the existence of a random asymptotic data distribution and is a strict weakening of previous assumptions in the field. Our main results then establish consistency of SVMs, kernel mean embeddings, and general Hilbert-space-valued empirical expectations with EWC data. Our analysis holds for both finite- and infinite-dimensional outputs, as we extend classical results of statistical learning to the latter case. In particular, it applies to CKMEs. Overall, our results open new classes of processes to statistical learning and can serve as a foundation for a theory of learning beyond i.i.d. and mixing.
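The observation that kernel methods perform well on data from a dynamical system can be probed numerically. Below is a minimal pure-Python sketch, assuming the squared loss, so that the regularized kernel method reduces to kernel ridge regression (a least-squares SVM in the broad sense of regularized risk minimization over an RKHS). The inputs come from a deterministic irrational rotation of the circle, so they are strongly dependent. The target function, noise level, Gaussian bandwidth, sample size `n`, and regularization parameter `lam` are hypothetical choices, not taken from the paper, and a single run illustrates consistency rather than proving it.

```python
import math
import random

# Hypothetical setup (not from the paper): inputs follow a deterministic
# irrational rotation x_{i+1} = x_i + ALPHA mod 1, so they are far from i.i.d.
ALPHA = (math.sqrt(5) - 1) / 2


def kernel(x, y, bandwidth=0.1):
    """Gaussian kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def cholesky_solve(a, b):
    """Solve a x = b for a symmetric positive-definite matrix a (list of lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    z = [0.0] * n  # forward substitution: low z = b
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n  # back substitution: low^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit(xs, ys, lam):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + lam ||f||_H^2 via the representer theorem."""
    n = len(xs)
    gram = [[kernel(a, b) + (n * lam if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    return cholesky_solve(gram, ys)  # coefficients of f = sum_i alpha_i k(x_i, .)


def predict(xs, alpha, t):
    return sum(c * kernel(a, t) for c, a in zip(alpha, xs))


rng = random.Random(1)
n, lam = 150, 1e-3
x0 = rng.random()
xs = [(x0 + i * ALPHA) % 1.0 for i in range(n)]                     # dependent inputs
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]  # noisy targets
alpha = fit(xs, ys, lam)
grid = [(j + 0.5) / 50 for j in range(50)]
mse = sum((predict(xs, alpha, t) - math.sin(2 * math.pi * t)) ** 2 for t in grid) / len(grid)
print(f"mean squared error against sin(2*pi*t) on a grid: {mse:.4f}")
```

The ridge term `n * lam` on the diagonal keeps the Gram system well conditioned, which is why a plain Cholesky solve suffices here; for larger problems a numerical library would be the natural choice.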

Paper Structure (33 sections, 33 theorems, 103 equations, 1 figure)

Introduction
Related work
Consistency under dependent sampling
Learning theory with infinite-dimensional outputs
Preliminaries and notations
Sets and topology
Markov kernels and random elements
Elements of statistical learning theory
Vector-valued RKHSs and SVMs
Empirical weak convergence
Definition and first properties
Examples and connections
Independent, ergodic, and mixing processes
Measure-preserving dynamical systems
Weak convergence
...and 18 more sections

Key Result

Theorem 2.9

Let $\mathcal{H}$ be a $\mathcal{G}$-valued RKHS with kernel $K$. Then, $K$ is Hermitian, positive semi-definite¹ …

¹ Recall that a bivariate function $\phi:\mathcal{X}\times\mathcal{X}\to\mathcal{L}(\mathcal{G})$ is Hermitian if $\phi(x,x^\prime) = \phi(x^\prime, x)^\star$. Further, it is positive semi-definite if …
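The two properties in Theorem 2.9 can be checked numerically in the finite-dimensional case $\mathcal{G} = \mathbb{R}^d$, where $\mathcal{L}(\mathcal{G})$ is the space of $d \times d$ matrices and $\star$ is the transpose. The sketch below assumes the standard notion of positive semi-definiteness for operator-valued kernels, $\sum_{i,j} \langle g_i, \phi(x_i, x_j) g_j \rangle \ge 0$ for all finite families $(x_i)$ and $(g_i)$, which is equivalent to every block Gram matrix being positive semi-definite; since the definition above is truncated, this is our assumption rather than a quote. It uses a separable kernel $K(x, x') = k(x, x') B$ with a Gaussian $k$ on $\mathcal{X} = \mathbb{R}$ and hypothetical matrices $B$. Checking one finite set of points can refute positive semi-definiteness but cannot prove it.

```python
import math


def scalar_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on the real line (input space X = R, an assumption)."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def separable_kernel(b):
    """Matrix-valued kernel K(x, x') = k(x, x') * B, i.e. L(G)-valued with G = R^d."""
    def k_op(x, y):
        s = scalar_kernel(x, y)
        return [[s * entry for entry in row] for row in b]
    return k_op


def block_gram(k_op, xs):
    """(n*d) x (n*d) matrix whose (i, j) block is K(x_i, x_j)."""
    blocks = [[k_op(a, y) for y in xs] for a in xs]
    d = len(blocks[0][0])
    size = len(xs) * d
    return [[blocks[r // d][c // d][r % d][c % d] for c in range(size)] for r in range(size)]


def is_hermitian(m, tol=1e-12):
    """Real case: Hermitian means symmetric, i.e. K(x, x') = K(x', x)^T block by block."""
    return all(abs(m[r][c] - m[c][r]) <= tol for r in range(len(m)) for c in range(len(m)))


def is_psd(m, jitter=1e-8):
    """Jittered Cholesky: returns False iff m + jitter * I is not positive definite
    (up to roundoff), i.e. iff m has an eigenvalue <= -jitter."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] + (jitter if i == j else 0.0)
            s -= sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


points = [0.0, 0.3, 1.1, 2.0]
b_good = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3: PSD
b_bad = [[1.0, 2.0], [2.0, 1.0]]   # eigenvalues -1 and 3: not PSD
gram_good = block_gram(separable_kernel(b_good), points)
gram_bad = block_gram(separable_kernel(b_bad), points)
print(is_hermitian(gram_good), is_psd(gram_good))  # True True
print(is_hermitian(gram_bad), is_psd(gram_bad))    # True False
```

For a separable kernel the block Gram matrix is the Kronecker product of the scalar Gram matrix with $B$, so its eigenvalues are products of their eigenvalues; this is why a symmetric but indefinite $B$ yields a Hermitian kernel that is not positive semi-definite.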

Figures (1)

Figure 1: Relation between the different notions. All implications hold with the same limit measure or with the intensity measure thereof, and both in probability and almost surely when applicable. The implication LLNE $\implies$ AMS is shown in SHS2009.

Theorems & Definitions (85)

Definition 2.1: Markov kernel
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Definition 2.6
Definition 2.7
Definition 2.8
Theorem 2.9
Definition 2.10
...and 75 more
