Sample-based almost-sure quasi-optimal approximation in reproducing kernel Hilbert spaces
Nando Hegemann, Anthony Nouy, Philipp Trunschke
TL;DR
The paper studies sample-efficient function approximation in RKHSs by introducing a kernel-based projector $P_{\mathcal{V}_d}^{\boldsymbol{x}}$ whose error is quasi-optimal relative to the best approximation in $\mathcal{V}_d$, with a computable bound involving $\mu(\boldsymbol{x})$ and $\tau(\boldsymbol{x})$. It develops probabilistic sampling strategies, notably continuous volume sampling and the novel subspace-informed volume sampling (SIVS), to select point sets $\boldsymbol{x}$ for which $\mu(\boldsymbol{x})$ is small with high probability, achieving near-linear sample complexity in $d$ under favorable eigenvalue decay. The authors also introduce a greedy subsampling scheme with theoretical (approximate) submodularity properties to reduce sample sizes further, and they extend the framework to noisy evaluations. In experiments on multiple RKHSs and measures, the method improves substantially on classical sampling strategies, suggesting practical applicability to inverse problems and the PBDW framework. Overall, the work provides a theoretically grounded, computationally feasible approach for almost-sure quasi-optimal approximation from few, strategically chosen samples.
Abstract
This paper addresses the problem of approximating an unknown function from point evaluations. When obtaining these point evaluations is costly, minimising the required sample size becomes crucial, and it is unreasonable to reserve a sufficiently large test sample for estimating the approximation accuracy. Therefore, an approximation with a certified quasi-optimality factor is required. This article shows that such an approximation can be obtained when the sought function lies in a reproducing kernel Hilbert space (RKHS) and is to be approximated in a finite-dimensional linear subspace $\mathcal{V}_d$. However, selecting the sample points to minimise the quasi-optimality factor requires optimising over an infinite set of points and computing exact inner products in the RKHS, which is often infeasible in practice. Extending results from optimal sampling for $L^2$ approximation, the present paper proves that random points, drawn independently from the Christoffel sampling distribution associated with $\mathcal{V}_d$, can yield a controllable quasi-optimality factor with high probability. Inspired by this result, a novel sampling scheme, coined subspace-informed volume sampling, is introduced and evaluated in numerical experiments, where it outperforms classical i.i.d. Christoffel sampling and continuous volume sampling. To reduce the size of such a random sample, an additional greedy subsampling scheme with provable suboptimality bounds is introduced. Our presentation is of independent interest to the inverse problems community, as it offers a simpler interpretation of the parametrised background data weak (PBDW) method.
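
To make the i.i.d. Christoffel sampling baseline mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard $L^2$ definition of the Christoffel density, $w(x) = \frac{1}{d}\sum_{j=1}^{d} b_j(x)^2$ for an $L^2(\rho)$-orthonormal basis $b_1, \dots, b_d$ of $\mathcal{V}_d$, and it picks an illustrative setting that is not taken from the paper: $\rho$ is the uniform probability measure on $[-1, 1]$ and $\mathcal{V}_d$ is the space of polynomials of degree less than $d$. The function names `christoffel_density` and `sample_christoffel` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def christoffel_density(x, d):
    """Christoffel density w(x) = (1/d) * sum_j b_j(x)^2 with respect to the
    uniform probability measure on [-1, 1] (illustrative assumption), where
    b_0, ..., b_{d-1} are the orthonormal Legendre polynomials."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Columns are the classical Legendre polynomials P_0, ..., P_{d-1}.
    vander = legendre.legvander(x, d - 1)
    # b_j = sqrt(2j + 1) * P_j is orthonormal for the uniform probability measure.
    basis = vander * np.sqrt(2 * np.arange(d) + 1)
    return np.sum(basis**2, axis=1) / d


def sample_christoffel(n, d, rng):
    """Draw n i.i.d. points from the Christoffel distribution by rejection
    sampling with a uniform proposal on [-1, 1]."""
    # Since |P_j| <= 1 on [-1, 1], the density is bounded by
    # (1/d) * sum_j (2j + 1) = d, which serves as the rejection constant.
    bound = float(d)
    samples = np.empty(0)
    while samples.size < n:
        candidates = rng.uniform(-1.0, 1.0, size=n * d)
        heights = rng.uniform(0.0, bound, size=n * d)
        accepted = candidates[heights <= christoffel_density(candidates, d)]
        samples = np.concatenate([samples, accepted])
    return samples[:n]


rng = np.random.default_rng(0)
points = sample_christoffel(n=50, d=5, rng=rng)
```

The rejection constant $d$ makes the expected acceptance rate $1/d$, which is acceptable for small $d$. The volume sampling schemes discussed in the paper draw points jointly rather than independently and are not covered by this sketch.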
