Error Bounds for Learning with Vector-Valued Random Features
Samuel Lanthaler, Nicholas H. Nelsen
TL;DR
This work develops a comprehensive error analysis for ridge regression with vector-valued random features (RF-RR), covering infinite-dimensional input-output maps and hence operator learning. The analysis works directly with the risk functional, avoiding both the explicit RF-RR solution formula and random-matrix concentration arguments, which makes the guarantees carry over to the vector-valued setting. The paper proves strong consistency under misspecification and minimax-optimal convergence when the target lies in the RKHS: with $N$ labeled samples, $M\simeq\sqrt{N}$ features, and regularization $\lambda\simeq 1/\sqrt{N}$, one attains $O(1/\sqrt{N})$ error, free of logarithmic factors. It also characterizes convergence rates under fractional regularity and demonstrates the method’s practical viability through a Burgers-equation operator-learning experiment with high- or infinite-dimensional outputs. Overall, the results give sharp guarantees for vector-valued RF learning, including operator learning.
Abstract
This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are comparable with Monte Carlo intuition and free from logarithmic factors.
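To make the setup concrete, the following is a minimal NumPy sketch of ridge regression with vector-valued random features in finite dimensions. It is an illustration under stated assumptions, not the authors' implementation: the tanh feature map, the toy target, the $1/M$ averaging of features, and the $\lambda/M$ penalty scaling are all choices made here for the example, and the function names are hypothetical. The only elements taken from the summary above are the scalings $M\simeq\sqrt{N}$ and $\lambda\simeq 1/\sqrt{N}$.

```python
import numpy as np


def sample_features(M, d, p, rng):
    """Draw M random parameters theta_m = (W_m, b_m); a hypothetical choice."""
    W = rng.standard_normal((M, p, d)) / np.sqrt(d)
    b = rng.uniform(-np.pi, np.pi, size=(M, p))
    return W, b


def feature_map(U, theta):
    """Evaluate phi(u_n; theta_m) in R^p for every n and m; shape (N, M, p)."""
    W, b = theta
    return np.tanh(np.einsum("mpd,nd->nmp", W, U) + b[None, :, :])


def predict(U, theta, alpha):
    """Model (1/M) * sum_m alpha_m * phi(u; theta_m) with scalar coefficients."""
    Phi = feature_map(U, theta)
    return np.einsum("nmp,m->np", Phi, alpha) / len(alpha)


def objective(U, Y, theta, alpha, lam):
    """Empirical risk plus the (assumed) ridge penalty (lam / M) * |alpha|^2."""
    resid = Y - predict(U, theta, alpha)
    return np.mean(np.sum(resid**2, axis=1)) + lam / len(alpha) * np.sum(alpha**2)


def fit_rf_ridge(U, Y, theta, lam):
    """Minimize `objective` over alpha by solving its normal equations."""
    N, p = Y.shape
    Phi = feature_map(U, theta)                      # (N, M, p)
    M = Phi.shape[1]
    # Stack the p output components of each sample into rows: (N * p, M).
    B = Phi.transpose(0, 2, 1).reshape(N * p, M) / M
    y = Y.reshape(N * p)
    A = B.T @ B / N + (lam / M) * np.eye(M)          # symmetric positive definite
    rhs = B.T @ y / N
    return np.linalg.solve(A, rhs)


# Toy problem (assumed for illustration): d = 3 inputs, p = 2 outputs.
rng = np.random.default_rng(0)
N, d, p = 400, 3, 2
M = int(np.sqrt(N))          # M ~ sqrt(N) random features
lam = 1.0 / np.sqrt(N)       # lambda ~ 1 / sqrt(N)

U = rng.standard_normal((N, d))
Y = np.stack([np.sin(U.sum(axis=1)), np.cos(U[:, 0])], axis=1)
Y = Y + 0.1 * rng.standard_normal((N, p))

theta = sample_features(M, d, p, rng)
alpha = fit_rf_ridge(U, Y, theta, lam)
Y_hat = predict(U, theta, alpha)
print("regularized training objective:", objective(U, Y, theta, alpha, lam))
```

Each feature here takes values in the output space and is weighted by a single scalar coefficient, so the linear system has size $M \times M$ regardless of the output dimension $p$. That property is what makes this kind of parametrization attractive when outputs are high-dimensional.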
