Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms

Dimitri Meunier; Zikai Shen; Mattes Mollenhauer; Arthur Gretton; Zhu Li

TL;DR

This work analyzes learning with vector-valued outputs in a reproducing kernel Hilbert space framework, treating the problem as a (potentially ill-posed) inverse problem regularized by spectral filter functions. It introduces vector-valued interpolation spaces to quantify target smoothness in both well-specified and misspecified settings and derives minimax-rate results for a broad class of spectral algorithms, including infinite-dimensional outputs. A key finding is the saturation phenomenon for vector-valued kernel ridge regression: while optimal rates are attainable for moderate smoothness ($\beta \in [1,2]$), the KRR upper bound can saturate at $n^{-2/(2+p)}$ for $\beta > 2$, leaving a gap to information-theoretic lower bounds; in contrast, algorithms with infinite qualification (e.g., gradient descent, kernel PCR) can avoid this saturation and attain rates matching the problem's smoothness. The results extend to misspecified settings and rely on an extended representer theorem, enabling practical computation in high- and infinite-dimensional output spaces with strong theoretical guarantees.

Abstract

We study theoretical properties of a broad class of regularized algorithms with vector-valued output. These spectral algorithms include kernel ridge regression, kernel principal component regression, various implementations of gradient descent, and many more. Our contributions are twofold. First, we rigorously confirm the so-called saturation effect for ridge regression with vector-valued output by deriving a novel lower bound on learning rates; this bound is shown to be suboptimal when the smoothness of the regression function exceeds a certain level. Second, we present an upper bound on the finite-sample risk of general vector-valued spectral algorithms, applicable to both well-specified and misspecified scenarios (where the true regression function lies outside the hypothesis space), which is minimax optimal in various regimes. All of our results explicitly allow the case of infinite-dimensional output variables, proving consistency of recent practical applications.
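As a rough illustration of what the spectral algorithms named in the abstract have in common, the sketch below applies three filter functions (ridge, spectral cut-off for kernel PCR, and gradient descent) to the eigenvalues of a normalized Gram matrix and fits a two-dimensional output. It is a minimal NumPy sketch, not the authors' implementation: the Gaussian kernel, the bandwidth, the toy data, the output dimension 2, and all function names (`gaussian_kernel`, `ridge_filter`, `pcr_filter`, `gd_filter`, `fit_spectral`) are assumptions chosen for illustration, and the filter forms are the standard textbook ones rather than formulas quoted from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    # Scalar Gaussian kernel k(a, b); assumed here, any bounded kernel would do.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def ridge_filter(sigma, lam):
    # Tikhonov filter: g(sigma) = 1 / (sigma + lam).
    return 1.0 / (sigma + lam)

def pcr_filter(sigma, lam):
    # Spectral cut-off: invert eigenvalues >= lam, discard the rest.
    out = np.zeros_like(sigma)
    keep = sigma >= lam
    out[keep] = 1.0 / sigma[keep]
    return out

def gd_filter(sigma, lam, step=1.0):
    # Gradient descent run for t = 1/(step*lam) iterations:
    # g(sigma) = (1 - (1 - step*sigma)^t) / sigma, assuming step*sigma <= 1.
    t = max(1, int(round(1.0 / (step * lam))))
    safe = np.where(sigma > 1e-12, sigma, 1.0)
    return np.where(sigma > 1e-12, (1.0 - (1.0 - step * sigma) ** t) / safe, step * t)

def fit_spectral(K, Y, filt, lam):
    # Coefficients alpha = (1/n) g_lam(K/n) Y, one row per training point.
    n = K.shape[0]
    sigma, U = np.linalg.eigh(K / n)
    sigma = np.clip(sigma, 0.0, None)  # remove tiny negative round-off
    return (U * filt(sigma, lam)) @ (U.T @ Y) / n

# Toy data (assumed): scalar input, two-dimensional output.
rng = np.random.default_rng(0)
n, lam = 50, 1e-2
X = rng.uniform(0.0, 1.0, size=(n, 1))
Y = np.hstack([np.sin(2 * np.pi * X), np.cos(2 * np.pi * X)])
Y = Y + 0.1 * rng.normal(size=(n, 2))

K = gaussian_kernel(X, X, bandwidth=0.2)
X_test = np.linspace(0.0, 1.0, 5)[:, None]
K_test = gaussian_kernel(X_test, X, bandwidth=0.2)

alpha_ridge = fit_spectral(K, Y, ridge_filter, lam)
filters = {"ridge": ridge_filter, "pcr": pcr_filter, "gd": gd_filter}
# Prediction at x is sum_i k(x, x_i) alpha_i, a vector in the output space.
preds = {name: K_test @ fit_spectral(K, Y, f, lam) for name, f in filters.items()}
```

Only the filter changes between the three fits. In the ridge case the coefficients reduce to the familiar closed form, the solution of (K + n·lam·I) alpha = Y.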

Paper Structure (21 sections, 38 theorems, 237 equations)

Introduction
Background and Preliminaries
Vector-valued Regression
Vector-valued Interpolation Space and Source Condition
Further Assumptions
Saturation Effect of Kernel Ridge Regression
Consistency and optimal rates for general spectral algorithms
Additional Background
Hilbert spaces and linear operators
RKHS embeddings into L2 and Well-specifiedness
Additional Notations
Saturation Effect with Tikhonov Regularization
Learning rates for spectral algorithms
Fourier expansion
Approximation Error
...and 6 more sections

Key Result

Theorem 1

For every function $F\in \mathcal{G}$ there exists a unique operator $C \in S_2(\mathcal{H}, \mathcal{Y})$ such that $F(\cdot) = C\phi(\cdot) \in \mathcal{Y}$ with $\|C\|_{S_2(\mathcal{H}, \mathcal{Y})} = \|F\|_{\mathcal{G}}$ and vice versa. Hence $\mathcal{G} \simeq S_2(\mathcal{H}, \mathcal{Y})$ a
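The identification of functions with Hilbert-Schmidt operators can be checked numerically in a finite-dimensional toy case. The sketch below is an illustration under stated assumptions, not part of the paper: it assumes the vector-valued kernel has the form k(x, x') times the identity on the output space, takes both the feature space and the output space to be finite-dimensional, and represents F as a finite kernel expansion F(x) = sum_i k(x, x_i) a_i. The names `Phi`, `A`, `C`, and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 6, 4, 3                 # points, dim of feature space H, dim of output space Y

Phi = rng.normal(size=(n, d))     # row i is the feature vector phi(x_i)
A = rng.normal(size=(n, m))       # row i is the coefficient a_i in Y

K = Phi @ Phi.T                   # scalar Gram matrix k(x_i, x_j) = <phi(x_i), phi(x_j)>
C = A.T @ Phi                     # operator C = sum_i a_i (x) phi(x_i), an m-by-d matrix

# Evaluate F at a new point in two ways: kernel expansion and operator form.
phi_new = rng.normal(size=d)
F_kernel = A.T @ (Phi @ phi_new)  # sum_i k(x_new, x_i) a_i
F_operator = C @ phi_new          # C phi(x_new)

# Norm of F in the vector-valued RKHS versus Hilbert-Schmidt (Frobenius) norm of C.
norm_G = np.sqrt(np.trace(A.T @ K @ A))
norm_HS = np.linalg.norm(C, "fro")

print(np.allclose(F_kernel, F_operator), np.isclose(norm_G, norm_HS))
```

Both checks agree because trace(Aᵀ K A) = trace(Aᵀ Φ Φᵀ A) is the squared Frobenius norm of C = Aᵀ Φ, the finite-dimensional counterpart of the norm equality in the theorem.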

Theorems & Definitions (80)

Remark 1: Aubin (2000), Theorem 12.6.1
Remark 2: General multiplicative kernel
Theorem 1: vRKHS isomorphism
Definition 1: Vector-valued interpolation space
Remark 3: Interpolation space inclusions
Remark 4: Well-specified versus misspecified setting
Theorem 2: Upper and lower bounds for KRR in the well-specified regime
Theorem 3: Saturation of KRR
Definition 2: Filter function
Proposition 1: Representer theorem for general spectral filter
...and 70 more