Optimal sampling for least squares approximation with general dictionaries
Philipp Trunschke, Anthony Nouy
TL;DR
The paper tackles stable least-squares approximation when the dictionary may be overcomplete and the optimal sampling measure depends on an unknown Gramian $G$. It introduces an iterative Gramian refinement method that bootstraps from an initial estimate based on volume sampling and progressively improves the sampling density via inverse Christoffel functions. The method converges in mean and variance and provides high-probability framing guarantees. The theoretical results show that the iterates are unbiased and that their variance decreases. Experiments with unbounded, compact, and singular measures show substantial reductions in the required number of samples compared with naive Monte Carlo sampling. The approach enables practical offline-online use of optimal sampling in nonlinear approximation, where the underlying linear space arises from a local linearisation of a complex model. Overall, the work makes near-optimal, dictionary-based least squares more feasible in challenging, high-dimensional settings.
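For reference, the quantities named above can be written in standard notation (ours; the paper's symbols may differ). For a dictionary $\phi = (\phi_1, \dots, \phi_n)$ in $L^2(\mu)$, the inverse Christoffel function uses the Moore-Penrose pseudo-inverse $G^{+}$, which stays well defined when an overcomplete dictionary makes $G$ singular. It equals $\sum_{j=1}^{d} b_j(x)^2$ for any $L^2(\mu)$-orthonormal basis $b_1, \dots, b_d$ of $\operatorname{span}\phi$ and integrates to $d$, so $\mathfrak{K}/d$ is a probability density with respect to $\mu$. Sampling from any density $\rho$ that is positive wherever $\phi$ is nonzero and weighting by $1/\rho$ gives an unbiased Gramian estimate:

$$
G = \int \phi(x)\,\phi(x)^{\top}\, d\mu(x), \qquad
\mathfrak{K}(x) = \phi(x)^{\top} G^{+} \phi(x), \qquad
\int \mathfrak{K}\, d\mu = \operatorname{tr}(G^{+} G) = \operatorname{rank}(G) = d,
$$

$$
\rho^{*} = \frac{\mathfrak{K}}{d}, \qquad
\hat{G} = \frac{1}{m} \sum_{i=1}^{m} \frac{\phi(x_i)\,\phi(x_i)^{\top}}{\rho(x_i)}, \quad x_i \overset{\text{i.i.d.}}{\sim} \rho\,\mu
\quad \Longrightarrow \quad \mathbb{E}[\hat{G}] = G .
$$

Because $\rho^{*}$ itself depends on the unknown $G$, a practical scheme has to plug in an estimate of $G$; this circularity is what the iterative refinement described in the TL;DR addresses.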
Abstract
We consider the problem of approximating an unknown function from point evaluations. This problem is a key subproblem in many modern (nonlinear) approximation schemes. When obtaining these point evaluations is costly, minimising the required sample size becomes crucial. Recently, increasing attention has been paid to importance sampling strategies for achieving this. For approximation in a $d$-dimensional linear space, an optimal i.i.d. sampling measure achieves a sampling complexity of $\mathcal{O}(d\log(d))$. However, the corresponding sampling measure depends on an orthonormal basis of the linear space, which is rarely known, in particular in the context of nonlinear approximation, where the linear space arises as a local linearisation of a nonlinear model class such as neural networks or tensor networks. Consequently, sampling from these measures is challenging in practice. This manuscript presents a strategy for estimating an orthonormal basis. This strategy can be performed offline and does not require evaluations of the sought function. We establish convergence and illustrate the practical performance through numerical experiments. A comparison with standard Monte Carlo sampling shows a significant reduction in the number of samples required to obtain a good estimate of an orthonormal basis.
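The comparison described in the abstract can be illustrated on a toy problem with the minimal NumPy sketch below. Everything in it is an assumption made for illustration. The uniform measure on $[-1,1]$ is replaced by an equal-weight grid, which turns every normalising constant into a finite sum and so sidesteps the sampling difficulty the paper addresses. `dictionary` is a hypothetical overcomplete set of six functions spanning a five-dimensional space. The refinement loop is a naive fixed-point heuristic, not the authors' algorithm. The sketch estimates the Gramian by importance sampling, either from $\mu$ (plain Monte Carlo) or from the density proportional to $\phi(x)^\top G^{+}\phi(x)$. It measures the error as the spectral distance of the empirical Gramian of an orthonormal basis from the identity.

```python
import numpy as np

# Assumption: mu is the uniform probability measure on [-1, 1], replaced by an
# equal-weight midpoint grid so every "integral" below is an exact finite sum.
N = 2000
x_grid = -1.0 + (2.0 * np.arange(N) + 1.0) / N

def dictionary(x):
    """Hypothetical overcomplete dictionary: 1, x, ..., x^4 plus the redundant
    function 1 + x, so n = 6 functions span a space of dimension d = 5."""
    return np.column_stack([np.vander(x, 5, increasing=True), 1.0 + x])

def inverse_christoffel(G, Phi, rcond=1e-10):
    """Row-wise K(x_i) = phi(x_i)^T G^+ phi(x_i), using the pseudo-inverse of G."""
    G_pinv = np.linalg.pinv(G, rcond=rcond)
    return np.einsum("ij,jk,ik->i", Phi, G_pinv, Phi)

def weighted_gramian(p, m, rng):
    """Importance-sampled Gramian: draw m grid indices with probabilities p and
    weight each sample by w = (1/N) / p, which makes the estimate unbiased."""
    idx = rng.choice(N, size=m, p=p)
    w = 1.0 / (N * p[idx])
    Phi = Phi_grid[idx]
    return (Phi * w[:, None]).T @ Phi / m

def framing_error(G_hat):
    """Spectral distance from the identity of the empirical Gramian of the
    G-orthonormal basis b = T phi, with T = Lambda^{-1/2} U^T from the eigenpairs of G."""
    M = T @ G_hat @ T.T
    return np.linalg.norm(M - np.eye(d), ord=2)

Phi_grid = dictionary(x_grid)
G = Phi_grid.T @ Phi_grid / N                    # exact Gramian of the discretised measure
eigval, eigvec = np.linalg.eigh(G)               # eigenvalues in ascending order
d = int(np.sum(eigval > 1e-10 * eigval.max()))   # numerical rank, here 5
T = (eigvec[:, -d:] / np.sqrt(eigval[-d:])).T    # maps phi to an orthonormal basis

K_exact = inverse_christoffel(G, Phi_grid)       # mean over the grid equals d
p_uniform = np.full(N, 1.0 / N)                  # plain Monte Carlo
p_optimal = K_exact / K_exact.sum()              # optimal density K / d w.r.t. mu

rng = np.random.default_rng(0)
m = 50
err_mc = np.mean([framing_error(weighted_gramian(p_uniform, m, rng)) for _ in range(200)])
err_opt = np.mean([framing_error(weighted_gramian(p_optimal, m, rng)) for _ in range(200)])

# Naive refinement (an assumption, not the authors' scheme): resample from a
# 50/50 mixture of mu and the density induced by the current Gramian estimate.
G_hat = weighted_gramian(p_uniform, m, rng)
refined_errors = []
for _ in range(5):
    K_hat = np.maximum(inverse_christoffel(G_hat, Phi_grid), 0.0)
    p = 0.5 / N + 0.5 * K_hat / K_hat.sum()
    G_hat = weighted_gramian(p / p.sum(), m, rng)
    refined_errors.append(framing_error(G_hat))

print(f"mean framing error, plain Monte Carlo : {err_mc:.3f}")
print(f"mean framing error, optimal density   : {err_opt:.3f}")
print("naive refinement errors:", np.round(refined_errors, 3))
```

Mixing with $\mu$ in the loop keeps the sampling density positive wherever the dictionary is nonzero, so each step remains an unbiased estimate of $G$ even when the current estimate is poor. The printed numbers depend on the random seed and are not results from the paper.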
