Optimal sampling for least squares approximation with general dictionaries

Philipp Trunschke; Anthony Nouy

TL;DR

The paper tackles stable least-squares approximation when the dictionary may be overcomplete and the optimal sampling measure depends on an unknown Gramian $G$. It introduces an iterative Gramian refinement method that bootstraps from an initial vol-sampling-based estimate and progressively improves the sampling density via inverse Christoffel functions. The method converges in mean and variance and comes with high-probability framing guarantees. Theoretical results show that the iterates are unbiased and that their variance decreases. Experiments on unbounded, compact, and singular measures show substantial reductions in the required number of samples compared with naive Monte Carlo. This makes optimal sampling usable in an offline-online fashion for nonlinear approximation, where the underlying linear space arises from a local linearisation of complex models. Overall, the work makes near-optimal, dictionary-based least squares more feasible in challenging, high-dimensional settings.
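The refinement loop described above can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions. The reference measure $\rho$ is replaced by a discrete uniform measure on a fine grid of $[-1, 1]$, so that densities can be normalised and sampled exactly. The dictionary (monomials up to degree 4 plus the redundant element $1 + x$) and the helper names `dictionary`, `pinv_psd` and `refine_gramian` are hypothetical. The first estimate uses plain Monte Carlo instead of vol-sampling. Each later step samples from an equal-weight mixture of $\rho$ and the density proportional to the current inverse Christoffel function $b(x)^\top \hat G^{+} b(x)$. Because each importance-weighted estimate is unbiased for $G$ given the previous steps, their running average is unbiased as well.

```python
import numpy as np


def dictionary(x):
    """Hypothetical overcomplete dictionary: 1, x, ..., x^4 plus the redundant 1 + x."""
    powers = np.stack([x**k for k in range(5)], axis=-1)
    return np.concatenate([powers, (1.0 + x)[:, None]], axis=-1)


def pinv_psd(G, tol=1e-10):
    """Pseudo-inverse of a symmetric PSD matrix, discarding relatively tiny eigenvalues."""
    evals, evecs = np.linalg.eigh(G)
    keep = evals > tol * evals.max()
    return (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T


def refine_gramian(grid, rho, n_samples, n_steps, rng):
    """Running average of importance-weighted Gramian estimates (sketch).

    Assumption: rho is a discrete probability measure on `grid`, so the
    proposal density below can be normalised and sampled exactly.
    """
    B = dictionary(grid)                      # (M, n) dictionary values on the grid
    G_sum = np.zeros((B.shape[1], B.shape[1]))
    G_hat = None
    for step in range(1, n_steps + 1):
        if G_hat is None:
            p = rho                           # first step: plain Monte Carlo from rho
        else:
            # Estimated inverse Christoffel function k(x) = b(x)^T G_hat^+ b(x).
            k_hat = np.einsum("mi,ij,mj->m", B, pinv_psd(G_hat), B)
            # Assumed defensive mixture of rho and the estimated optimal density.
            p = 0.5 * rho + 0.5 * rho * k_hat / np.sum(rho * k_hat)
        idx = rng.choice(len(grid), size=n_samples, p=p)
        w = rho[idx] / p[idx]                 # importance weights d(rho)/d(mu)
        G_step = (B[idx].T * w) @ B[idx] / n_samples   # unbiased estimate of G
        G_sum += G_step
        G_hat = G_sum / step                  # average of all estimates so far
    return G_hat


rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 2001)
rho = np.full(grid.size, 1.0 / grid.size)    # discrete uniform measure on the grid
B = dictionary(grid)
G_true = B.T @ (rho[:, None] * B)            # exact Gramian w.r.t. the grid measure
G_hat = refine_gramian(grid, rho, n_samples=50, n_steps=20, rng=rng)
print(np.linalg.norm(G_hat - G_true) / np.linalg.norm(G_true))
```

Mixing with $\rho$ keeps the importance weights $\mathrm{d}\rho/\mathrm{d}\mu$ bounded by 2, at the cost of spending roughly half of the samples on the reference measure.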

Abstract

We consider the problem of approximating an unknown function from point evaluations. This is a key subproblem in many modern (nonlinear) approximation schemes. When point evaluations are costly, minimising the required sample size becomes crucial, and recent work has increasingly focused on importance sampling strategies to achieve this. For approximation in a $d$-dimensional linear space, an optimal i.i.d. sampling measure achieves a sampling complexity of $\mathcal{O}(d\log (d))$. However, this sampling measure depends on an orthonormal basis of the linear space, which is rarely known, in particular in nonlinear approximation, where the linear space arises as a local linearisation of a nonlinear model class such as neural networks or tensor networks. Consequently, sampling from these measures is challenging in practice. This manuscript presents a strategy for estimating an orthonormal basis. The strategy can be performed offline and does not require evaluations of the sought function. We establish convergence and illustrate the practical performance through numerical experiments. Compared with standard Monte Carlo sampling, the presented approach significantly reduces the number of samples required to obtain a good estimate of an orthonormal basis.
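To make the dependence on an orthonormal basis concrete, consider a (possibly redundant) dictionary $b = (b_1, \dots, b_n)$ with Gramian $G = \mathbb{E}_\rho[b b^\top]$. The inverse Christoffel function $k(x) = b(x)^\top G^{+} b(x)$ equals $\sum_j \varphi_j(x)^2$ for any orthonormal basis $(\varphi_j)$ of the spanned $d$-dimensional space, and the optimal sampling density with respect to $\rho$ is $k(x)/d$. The hypothetical sketch below checks this for the uniform probability measure on $[-1, 1]$ and the redundant dictionary $\{1, x, x^2, 1 + x\}$ ($d = 3$), where the orthonormal Legendre polynomials give $k(1) = 1 + 3 + 5 = 9$. Here $G$ is computed exactly by quadrature, which is precisely what is unavailable in the settings the paper targets. The helper names are illustrative only.

```python
import numpy as np


def dictionary(x):
    """Hypothetical redundant dictionary spanning polynomials of degree <= 2."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2, 1.0 + x], axis=-1)


def pinv_psd(G, tol=1e-10):
    """Pseudo-inverse of a symmetric PSD matrix, discarding relatively tiny eigenvalues."""
    evals, evecs = np.linalg.eigh(G)
    keep = evals > tol * evals.max()
    return (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T


def inverse_christoffel(x, G_pinv):
    """k(x) = b(x)^T G^+ b(x); equals the sum of squares of any orthonormal basis."""
    B = dictionary(x)
    return np.einsum("...i,ij,...j->...", B, G_pinv, B)


# Gauss-Legendre rule rescaled to the uniform probability measure rho on [-1, 1];
# 10 nodes integrate the degree-4 Gramian entries exactly.
nodes, weights = np.polynomial.legendre.leggauss(10)
weights = weights / 2.0
B = dictionary(nodes)
G = B.T @ (weights[:, None] * B)        # singular 4 x 4 Gramian of rank 3
G_pinv = pinv_psd(G)

d = 3                                   # dimension of span{1, x, x^2}
mean_k = weights @ inverse_christoffel(nodes, G_pinv)   # E_rho[k] = d
x = np.array([0.0, 0.5, 1.0])
k_vals = inverse_christoffel(x, G_pinv)
density = k_vals / d                    # optimal sampling density w.r.t. rho
print(mean_k, k_vals)
```

The pseudo-inverse makes the formula insensitive to the redundancy: the null direction of $G$ (coefficients $(1, 1, 0, -1)$) is simply discarded.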

Paper Structure (37 sections, 27 theorems, 174 equations, 6 figures)

Introduction
Contributions and Structure
Setting and related work
Least squares approximation.
Error bounds.
Sample size bounds.
Approximate optimal sampling.
Monte Carlo approach.
Framing approach.
Multi-level framing approach.
Sampling.
Iterative refinement of the Gramian
Initialisation
Iterative refinement
Theory
...and 22 more sections

Key Result

Theorem 2

Suppose the probability of the random event \eqref{eq:rip} is lower bounded by $p$. Then

Figures (6)

Figure 1: Experiment from Section \ref{sec:experiment_hermite}. Equispaced quantiles of the suboptimality factor $\tfrac{C}{c}$ from equation \ref{eq:suboptimality_factor}, plotted against the number of steps $k$.
Figure 2: Experiment from Section \ref{sec:experiment_hermite}. Equispaced quantiles of the suboptimality factor $\tfrac{C}{c}$ from equation \ref{eq:suboptimality_factor}, plotted against the cumulative sample size $kN$.
Figure 3: Experiment from Section \ref{sec:experiment_redundant_monomial}. Equispaced quantiles of the suboptimality factor $\tfrac{C}{c}$ from equation \ref{eq:suboptimality_factor}, plotted against the cumulative sample size $kN$.
Figure 4: Experiment from Section \ref{ex:pw_const}. Equispaced quantiles of the suboptimality factor $\tfrac{C}{c}$ from equation \ref{eq:suboptimality_factor}, plotted against the cumulative sample size $kN$.
Figure 5: Experiment from Section \ref{sec:christoffel-darboux}. Equispaced quantiles of the suboptimality factor $\tfrac{C}{c}$ from equation \ref{eq:suboptimality_factor}, plotted against the cumulative sample size $kN$.
...and 1 more figure
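The figures report the suboptimality factor $\tfrac{C}{c}$. Assuming, as the framing terminology suggests, that $c$ and $C$ are the best constants with $cG \preceq \hat G \preceq CG$ on the range of $G$, the ratio is the quotient of the extreme eigenvalues of $\hat G$ expressed in a $G$-orthonormal coordinate system. If the ranges of $\hat G$ and $G$ coincide, the density built from $\hat G$ then differs from the optimal one by at most a factor $\tfrac{C}{c}$. The function name `framing_ratio` and this definition are assumptions for illustration and need not match equation \ref{eq:suboptimality_factor} exactly.

```python
import numpy as np


def framing_ratio(G_hat, G, tol=1e-10):
    """Ratio C/c of the extreme eigenvalues of G_hat relative to G on range(G).

    Assumption: c and C are the best constants with c*G <= G_hat <= C*G
    (Loewner order) on range(G). If G_hat is singular on range(G), then
    c = 0 and the ratio is not finite.
    """
    evals, evecs = np.linalg.eigh(G)
    keep = evals > tol * evals.max()
    # Whitening map W = V_r diag(lambda_r^{-1/2}): its columns form a G-orthonormal basis of range(G).
    W = evecs[:, keep] / np.sqrt(evals[keep])
    m = np.linalg.eigvalsh(W.T @ G_hat @ W)
    return m.max() / m.min()


rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
G = A @ A.T                    # rank-3 Gramian of a redundant 6-element dictionary
print(framing_ratio(G, G))     # exact Gramian: ratio 1 up to rounding
```

A ratio of 1 means that $\hat G$ agrees with $G$ up to a scalar multiple on the range of $G$, so sampling from the density built from $\hat G$ would be exactly optimal.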

Theorems & Definitions (52)

remark 1
theorem 2: haberstich_2022_boosted Theorem 2.4
theorem 3
lemma 4
lemma 5
proposition 6
proof
proposition 7
proof
theorem 8
...and 42 more