Error Bounds for Learning with Vector-Valued Random Features

Samuel Lanthaler; Nicholas H. Nelsen

TL;DR

This work develops a comprehensive error analysis for learning with vector-valued random features (RF) in ridge regression, extending the theory to infinite-dimensional input-output mappings and operator learning. A key feature is a direct analysis of the risk functional that avoids both the explicit RF ridge regression (RF-RR) solution formula and random-matrix concentration results, so the guarantees carry over to the vector-valued setting. The paper proves strong consistency under misspecification and minimax-optimal convergence when the target lies in the RKHS: with $M\simeq\sqrt{N}$ features and regularization $\lambda\simeq 1/\sqrt{N}$, one attains $O(1/\sqrt{N})$ squared error from $N$ samples, free of logarithmic factors. It also characterizes convergence rates under fractional regularity and demonstrates practical viability through a Burgers' equation operator-learning experiment with high- or infinite-dimensional outputs. Overall, the results provide sharp guarantees for vector-valued RF-based learning, including operator learning.
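The scaling $M\simeq\sqrt{N}$, $\lambda\simeq 1/\sqrt{N}$ can be tried out on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a finite-dimensional output space $\mathbb{R}^p$, a hypothetical cosine feature map `feature_map`, a hypothetical sampling distribution in `sample_features`, a model of the form $\frac{1}{M}\sum_m \alpha_m\varphi(u;\theta_m)$ with scalar coefficients, and a penalty $\frac{\lambda}{M}\|\alpha\|_2^2$. All function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def sample_features(M, d, p, rng):
    """Draw M iid feature parameters theta_m = (W_m, b_m).
    The Gaussian/uniform choice is an assumption made for illustration."""
    W = rng.normal(size=(M, p, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=(M, p))
    return W, b

def feature_map(U, theta):
    """Evaluate the vector-valued features phi(u_n; theta_m) in R^p.
    Returns an array of shape (N, M, p)."""
    W, b = theta
    return np.cos(np.einsum("mpd,nd->nmp", W, U) + b[None, :, :])

def fit_rf_ridge(U, Y, theta, lam):
    """Minimize (1/N) sum_n ||y_n - (1/M) sum_m alpha_m phi(u_n; theta_m)||^2
    + (lam/M) ||alpha||^2 over alpha in R^M via the normal equations."""
    N = U.shape[0]
    F = feature_map(U, theta)                      # (N, M, p)
    M = F.shape[1]
    A = F.transpose(0, 2, 1).reshape(-1, M) / M    # (N*p, M) design matrix
    y = Y.reshape(-1)                              # (N*p,), same (n, j) ordering as A
    gram = A.T @ A / N + (lam / M) * np.eye(M)
    rhs = A.T @ y / N
    return np.linalg.solve(gram, rhs)

def predict(U, theta, alpha):
    """Evaluate the trained model at the rows of U; returns shape (N, p)."""
    F = feature_map(U, theta)
    return np.einsum("nmp,m->np", F, alpha) / F.shape[1]

rng = np.random.default_rng(0)
N, d, p = 400, 3, 5
M = int(np.sqrt(N))        # number of random features, M ~ sqrt(N)
lam = 1.0 / np.sqrt(N)     # regularization strength, lambda ~ 1/sqrt(N)

# Synthetic data (an assumption): smooth vector-valued target plus noise.
U = rng.normal(size=(N, d))
Y = np.stack([np.sin(U.sum(axis=1) + j) for j in range(p)], axis=1)
Y = Y + 0.1 * rng.normal(size=(N, p))

theta = sample_features(M, d, p, rng)
alpha = fit_rf_ridge(U, Y, theta, lam)
train_risk = np.mean(np.sum((Y - predict(U, theta, alpha)) ** 2, axis=1))
print("M =", M, "lambda =", lam, "empirical risk =", train_risk)
```

Only the $M$ scalar coefficients are trained here, so the linear system is $M\times M$ regardless of the output dimension $p$; this is one reason such a parametrization is attractive for high-dimensional outputs.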

Abstract

This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are comparable with Monte Carlo intuition and free from logarithmic factors.
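The "Monte Carlo intuition" mentioned above can be made concrete with a standard identity. This is a heuristic illustration rather than a statement from the paper; the symbols $Z_m$ and $H$ are introduced here only for the example. For iid random elements $Z_1,\dots,Z_M$ of a Hilbert space $H$ with finite second moment,

```latex
\mathbb{E}\,\Bigl\| \frac{1}{M}\sum_{m=1}^{M} Z_m - \mathbb{E} Z_1 \Bigr\|_{H}^{2}
  = \frac{1}{M}\,\mathbb{E}\,\bigl\| Z_1 - \mathbb{E} Z_1 \bigr\|_{H}^{2},
\qquad\text{so}\qquad
M \simeq \sqrt{N} \;\Longrightarrow\; \frac{1}{M} \simeq \frac{1}{\sqrt{N}}.
```

Averaging $M$ random features therefore contributes a squared error of order $1/M$, which is of the same order as a $1/\sqrt{N}$ rate once $M\simeq\sqrt{N}$.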

Paper Structure (33 sections, 27 theorems, 131 equations, 3 figures, 1 table)

Introduction
Related work.
Contributions.
Outline.
Preliminaries
Notation.
Random features and reproducing kernel Hilbert spaces.
Random feature ridge regression.
Main results
Assumptions
General error bound
Consequences.
Statistical consistency
Convergence rates
Proof outline for the main theorem
...and 18 more sections

Key Result

Theorem 3.4

Suppose that $\mathcal{G} = \rho+\mathcal{G}_{\mathcal{H}}$ satisfies Assumption \ref{ass:misspec}. Fix a failure probability $\delta\in(0,1)$, regularization strength $\lambda\in(0,1)$, and sample size $N$. Let $\{\theta_m\}\sim\mu^{\otimes M}$ be the $M$ random feature parameters and $\{(u_n,y_n)\}\sim$ … with probability at least $1-\delta$, where … is a function of $\rho$, $\lambda$, $\mathcal{G}_{\mathcal{H}}$, …

Figures (3)

Figure 1: Flow chart illustrating the proof of Theorem \ref{thm:error_bound_total_G}.
Figure 2: Squared test error of trained RFM for learning the Burgers' equation solution operator. All shaded bands denote two empirical standard deviations from the empirical mean of the error computed over $10$ different models, each with iid sampling of the features and training data indices.
Figure 3: Squared test error, which empirically approximates the population risk $\mathscr{R}(\widehat{\alpha};\mathcal{G})$, versus discretized output space dimension $p$, where $\mathcal{G}$ is the Burgers' equation solution operator (SM \ref{app:numeric}).

Theorems & Definitions (59)

Definition 2.1: Random feature model
Definition 2.2: Empirical risk
Theorem 3.4: $\mathcal{G}$-population squared error bound
Remark 3.5: Excess risk
Remark 3.6: The $\beta$ factor
Theorem 3.7: Well-specified
Corollary 3.8: $\mathcal{G}_{\mathcal{H}}$-population squared error bound
Example 3.9: Numerical discretization error
Theorem 3.10: Strong consistency
Remark 3.11: Universal RKHS
...and 49 more
