Least squares approximations in linear statistical inverse learning problems

Tapio Helin

TL;DR

The paper addresses recovering an unknown function $f$ from noisy, randomly sampled evaluations linked through an ill-posed forward map by regularizing via finite-dimensional projections. It proves probabilistic convergence rates for the maximum likelihood estimator and, with a norm-based truncation, establishes $L^p$-convergence rates that scale as $a_{N,R_0,\delta} = R_0(\delta/(R_0\sqrt{N}))^{2s/(2s+t+1)}$, where the smoothness $s$ and ill-posedness parameter $t$ govern the rate. A key contribution is showing that these rates are minimax-optimal over a class of statistical models with controlled design measures, thereby matching the best possible performance in this framework. The results hinge on concentration of the empirical normal operator in Hilbert-Schmidt norm and a careful decomposition of error into approximation and variance components, unified under an admissible subspace structure. The work connects regularization-by-projection in inverse problems with statistical learning theory and RKHS methods, offering a principled, data-driven parameter-choice rule and rigorous optimality guarantees that extend existing spectral-regularization insights to random-design inverse learning.
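To get a feel for the stated rate $a_{N,R_0,\delta} = R_0(\delta/(R_0\sqrt{N}))^{2s/(2s+t+1)}$, the following minimal sketch simply evaluates the formula. The function name `convergence_rate` and the sample parameter values are illustrative assumptions, not taken from the paper, and any multiplicative constant in the actual bounds is ignored.

```python
import math


def convergence_rate(N, R0, delta, s, t):
    """Evaluate a_{N,R0,delta} = R0 * (delta / (R0 * sqrt(N)))**(2s / (2s + t + 1)).

    N     : number of random point evaluations
    R0    : radius of the smoothness class containing the unknown f
    delta : noise level
    s, t  : smoothness and ill-posedness parameters (both > 0)
    """
    exponent = 2.0 * s / (2.0 * s + t + 1.0)
    return R0 * (delta / (R0 * math.sqrt(N))) ** exponent


# Illustrative values only: the rate shrinks as N grows and, for fixed N,
# is slower (larger) when the ill-posedness parameter t increases.
for n_samples in (10**2, 10**4, 10**6):
    print(n_samples, convergence_rate(n_samples, R0=1.0, delta=1.0, s=1.0, t=1.0))
```

The exponent $2s/(2s+t+1)$ always lies strictly between 0 and 1, so increasing $t$ with $s$ fixed pushes it toward 0 and slows the decay in $N$.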

Abstract

Statistical inverse learning aims at recovering an unknown function $f$ from randomly scattered and possibly noisy point evaluations of another function $g$, connected to $f$ via an ill-posed mathematical model. In this paper we blend statistical inverse learning theory with the classical regularization strategy of applying finite-dimensional projections. Our key finding is that, by coupling the number of random point evaluations with the choice of projection dimension, one can derive probabilistic convergence rates for the reconstruction error of the maximum likelihood (ML) estimator. Convergence rates in expectation are derived for an ML estimator complemented with a norm-based cut-off operation. Moreover, we prove that the obtained rates are minimax optimal.
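The idea of regularizing by projection can be illustrated with a toy problem. The sketch below is an assumption-laden illustration, not the paper's setting: it takes a cosine basis on $[0,1]$, a hypothetical diagonal forward operator with $A\varphi_k = (k+1)^{-t}\varphi_k$, uniformly random design points, and a simple cut-off that returns zero when the coefficient norm exceeds a threshold $R$ (the paper's exact cut-off operator may differ). Under i.i.d. Gaussian noise, the ML estimator over the subspace is the least squares solution computed here.

```python
import numpy as np


def basis(k, x):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_k = sqrt(2) cos(k pi x)."""
    return np.ones_like(x) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * x)


def forward_design(x, m, t):
    """Matrix with entries (A phi_k)(x_i) for the toy operator A phi_k = (k+1)^(-t) phi_k."""
    return np.column_stack([(k + 1.0) ** (-t) * basis(k, x) for k in range(m)])


def projected_least_squares(x, y, m, t):
    """Least squares fit over span{phi_0, ..., phi_{m-1}}; returns the m coefficients."""
    G = forward_design(x, m, t)
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return coef


def norm_cutoff(coef, R):
    """Hypothetical cut-off: keep the estimate if its norm is at most R, else return zero."""
    return coef if np.linalg.norm(coef) <= R else np.zeros_like(coef)


# Toy experiment with made-up values for N, delta, m and t.
rng = np.random.default_rng(0)
N, delta, m, t = 500, 0.05, 3, 1.0
f_true = np.array([1.0, 0.5, -0.25])           # coefficients of the unknown f
x = rng.uniform(0.0, 1.0, size=N)              # random design points
y = forward_design(x, m, t) @ f_true + delta * rng.standard_normal(N)

f_hat = norm_cutoff(projected_least_squares(x, y, m, t), R=10.0)
print("reconstruction error:", np.linalg.norm(f_hat - f_true))
```

In this toy version the projection dimension `m` is fixed by hand; the point of the paper's analysis is that `m` should be coupled to the number of samples `N`.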


Paper Structure (9 sections, 15 theorems, 125 equations)

Introduction
Literature overview
Mathematical preliminaries
Concentration result
Expected reconstruction error
Minimax optimality
Preliminaries
Proof of strong minimax optimality
Conclusions

Key Result

Theorem 1.2

Let $\{V_m\}_{m=1}^\infty$ be a sequence of admissible subspaces, let $B_\nu$ be a Hilbert-Schmidt operator, and suppose $\nu \in {\mathcal{P}}^{>}(t,D_1) \cap {\mathcal{P}}^\times(D_2)$ and $f^\dagger \in \Theta(s,R_0)$ for some constants $s,t,R_0, D_1, D_2>0$. There exists a constant $C$ depending on $D_j$, $j=1,2$, such that with probability greater than $1-

Theorems & Definitions (25)

Definition 1.1
Theorem 1.2
Theorem 1.3
Theorem 1.4
Remark 1.5
Definition 2.1
Theorem 2.2: pinelis1986remarks
Corollary 2.3: blanchard2018optimal
Lemma 3.1
Proof 1
...and 15 more
