Dimension-free bounds in high-dimensional linear regression via error-in-operator approach
Fedor Noskov, Nikita Puchkin, Vladimir Spokoiny
TL;DR
This work addresses high-dimensional linear regression with random design by introducing an error-in-operator approach that embeds the design covariance into the optimization objective rather than directly estimating $\Sigma$. The authors derive non-asymptotic, dimension-free expansions for the bias and stochastic terms of the error-in-operator estimator $\widehat{\boldsymbol\theta}$, and show that learning an auxiliary operator $A$ does not inflate the effective dimension when the regularization parameter $\mu$ is chosen suitably. A practical alternating-optimization algorithm is developed to compute $\widehat{\boldsymbol\upsilon}$, with theoretical convergence guarantees, and numerical experiments demonstrate improved generalization and reduced double-descent sensitivity compared to ridge methods. The results are underpinned by concentration inequalities and a detailed Hessian analysis, linking risk to the eigenstructure of the design covariance through quantities like $\lambda$, $\mathbf{b}_\lambda$, and the decay function $r_q(k)$. Concretely, the excess prediction risk is $R(\widehat{\boldsymbol\theta})-R(\boldsymbol\theta^\circ)=\|\Sigma^{1/2}(\widehat{\boldsymbol\theta}-\boldsymbol\theta^\circ)\|^2$, its leading bias is captured by $\mathbf{b}_\lambda=-\lambda(\Sigma^2/2+\lambda I_d)^{-1}\boldsymbol\theta^\circ$, and the stochastic term admits a provable bound. Overall, the paper provides a principled, computationally tractable way to achieve dimension-free performance in noisy, high-dimensional settings.
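As a rough numerical illustration of the two formulas quoted in the TL;DR, the Python sketch below evaluates the excess risk $\|\Sigma^{1/2}(\boldsymbol\theta-\boldsymbol\theta^\circ)\|^2$ and the leading bias $\mathbf{b}_\lambda$ on a toy problem. The diagonal covariance with eigenvalues $1/j^2$, the dimension, the value of `lam`, and the function names are assumptions made for illustration only; none of them come from the paper, and the sketch does not implement the error-in-operator estimator itself.

```python
import numpy as np


def excess_risk(theta, theta_star, Sigma):
    """Return ||Sigma^{1/2} (theta - theta_star)||^2 = delta^T Sigma delta."""
    delta = theta - theta_star
    return float(delta @ Sigma @ delta)


def leading_bias(theta_star, Sigma, lam):
    """Return b_lambda = -lam * (Sigma^2 / 2 + lam * I_d)^{-1} theta_star."""
    d = Sigma.shape[0]
    # Solve the linear system instead of forming an explicit inverse.
    return -lam * np.linalg.solve(Sigma @ Sigma / 2.0 + lam * np.eye(d), theta_star)


# Hypothetical toy setup (assumption): polynomially decaying spectrum.
d = 50
eigvals = 1.0 / np.arange(1, d + 1) ** 2
Sigma = np.diag(eigvals)

rng = np.random.default_rng(0)
theta_star = rng.standard_normal(d)
lam = 1e-3  # illustrative value, not a recommendation from the paper

b = leading_bias(theta_star, Sigma, lam)

# Excess risk incurred by the bias alone, i.e. at theta = theta_star + b.
bias_risk = excess_risk(theta_star + b, theta_star, Sigma)
print(f"excess risk from the leading bias: {bias_risk:.3e}")
```

For a diagonal $\Sigma$ the bias acts coordinate-wise: directions with eigenvalue $\sigma_j^2/2 \gg \lambda$ are barely shrunk, while directions with $\sigma_j^2/2 \ll \lambda$ are shrunk almost entirely, which is one way to see why the risk depends on the spectrum of $\Sigma$ rather than on the ambient dimension.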
Abstract
We consider the problem of high-dimensional linear regression with random design. We suggest a novel approach, referred to as error-in-operator, which does not estimate the design covariance $\Sigma$ directly but incorporates it into empirical risk minimization. We provide an expansion of the excess prediction risk and derive non-asymptotic dimension-free bounds on the leading term and the remainder. This helps us show that auxiliary variables do not increase the effective dimension of the problem, provided that the parameters of the procedure are tuned properly. We also discuss computational aspects of our method and illustrate its performance with numerical experiments.
