Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms
Dimitri Meunier, Zikai Shen, Mattes Mollenhauer, Arthur Gretton, Zhu Li
TL;DR
This work analyzes learning with vector-valued outputs in a reproducing kernel Hilbert space framework, treating the problem as a (potentially ill-posed) inverse problem regularized by spectral filter functions. It introduces vector-valued interpolation spaces to quantify target smoothness in both well-specified and misspecified settings and derives minimax-rate results for a broad class of spectral algorithms, including infinite-dimensional outputs. A key finding is the saturation phenomenon for vector-valued kernel ridge regression: while optimal rates are attainable for moderate smoothness ($\beta \in [1,2]$), the KRR upper bound can saturate at $n^{-2/(2+p)}$ for $\beta > 2$, leaving a gap to information-theoretic lower bounds; in contrast, algorithms with infinite qualification (e.g., gradient descent, kernel PCR) can avoid this saturation and attain rates matching the problem's smoothness. The results extend to misspecified settings and rely on an extended representer theorem, enabling practical computation in high- and infinite-dimensional output spaces with strong theoretical guarantees.
Abstract
We study theoretical properties of a broad class of regularized algorithms with vector-valued output. These spectral algorithms include kernel ridge regression, kernel principal component regression, various implementations of gradient descent and many more. Our contributions are twofold. First, we rigorously confirm the so-called saturation effect for ridge regression with vector-valued output by deriving a novel lower bound on learning rates; this bound is shown to be suboptimal when the smoothness of the regression function exceeds a certain level. Second, we present an upper bound on the finite-sample risk of general vector-valued spectral algorithms, applicable to both well-specified and misspecified scenarios (where the true regression function lies outside the hypothesis space), which is minimax optimal in various regimes. All of our results explicitly allow the case of infinite-dimensional output variables, proving consistency of recent practical applications.
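As an informal illustration of the spectral algorithms named above, the sketch below writes three textbook filter functions acting on a single eigenvalue sigma of the kernel operator and checks their qualification numerically. It is only a scalar toy, not the vector-valued estimator analyzed in the paper. The function names, the eigenvalue grid, and the parametrization (regularization `lam`, and `lam = 1/steps` for gradient descent) are assumptions made for this example and may differ from the paper's conventions. A filter has qualification at least nu if sigma^nu * |1 - sigma * g(sigma)| stays below a constant times lam^nu uniformly in sigma.

```python
def ridge_filter(sigma, lam):
    """Tikhonov / kernel ridge regression filter: g(sigma) = 1 / (sigma + lam)."""
    return 1.0 / (sigma + lam)


def cutoff_filter(sigma, lam):
    """Spectral cut-off (kernel PCR): invert eigenvalues >= lam, drop the rest."""
    return 1.0 / sigma if sigma >= lam else 0.0


def landweber_filter(sigma, steps, step_size=1.0):
    """Gradient descent (Landweber iteration) after `steps` iterations.

    Closed form of step_size * sum_{k < steps} (1 - step_size * sigma)^k,
    valid for 0 < step_size * sigma <= 1.
    """
    return (1.0 - (1.0 - step_size * sigma) ** steps) / sigma


def residual(filter_fn, sigma, param):
    """Residual 1 - sigma * g(sigma): the part of the signal the filter leaves unfitted."""
    return 1.0 - sigma * filter_fn(sigma, param)


def qualification_ratio(filter_fn, param, lam, nu, grid):
    """Max over the grid of sigma^nu * |residual|, divided by lam^nu.

    If this ratio stays bounded as lam shrinks, the filter has
    qualification at least nu (on this grid).
    """
    worst = max(s ** nu * abs(residual(filter_fn, s, param)) for s in grid)
    return worst / lam ** nu


# Hypothetical eigenvalue grid on [1e-6, 1], log-spaced.
GRID = [10 ** (-6 + 6 * i / 600) for i in range(601)]

if __name__ == "__main__":
    for lam in (1e-2, 1e-3):
        steps = round(1 / lam)  # assumed correspondence lam ~ 1 / steps
        print(
            f"lam={lam:g}",
            "ridge nu=1:", qualification_ratio(ridge_filter, lam, lam, 1, GRID),
            "ridge nu=2:", qualification_ratio(ridge_filter, lam, lam, 2, GRID),
            "cut-off nu=2:", qualification_ratio(cutoff_filter, lam, lam, 2, GRID),
            "GD nu=2:", qualification_ratio(landweber_filter, steps, lam, 2, GRID),
        )
```

On this grid the ridge ratio stays bounded for nu = 1 but grows as `lam` decreases for nu = 2, which is the finite qualification behind the saturation effect. The cut-off and gradient descent ratios remain bounded for nu = 2, consistent with their having no such limit.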
