Dimension-free bounds in high-dimensional linear regression via error-in-operator approach

Fedor Noskov, Nikita Puchkin, Vladimir Spokoiny

TL;DR

This work addresses high-dimensional linear regression with random design by introducing an error-in-operator approach that embeds the design covariance $\Sigma$ into the optimization objective rather than estimating it directly. The authors derive non-asymptotic, dimension-free expansions for the bias and stochastic terms of the error-in-operator estimator $\widehat{\boldsymbol\theta}$, and show that learning an auxiliary operator $A$ does not inflate the effective dimension when the regularization parameter $\mu$ is chosen suitably. A practical alternating-optimization algorithm is developed to compute $\widehat{\boldsymbol\upsilon}$, with theoretical convergence guarantees, and numerical experiments demonstrate improved generalization and reduced double-descent sensitivity compared to ridge methods. The results are underpinned by concentration inequalities and a detailed Hessian analysis, linking risk to the eigenstructure of the design covariance through quantities like $\lambda$, $\mathbf{b}_\lambda$, and the decay function $r_q(k)$. The excess risk is $R(\widehat{\boldsymbol\theta})-R(\boldsymbol\theta^\circ)=\|\Sigma^{1/2}(\widehat{\boldsymbol\theta}-\boldsymbol\theta^\circ)\|^2$, and its leading bias is captured by $\mathbf{b}_\lambda=-\lambda(\Sigma^2/2+\lambda I_d)^{-1}\boldsymbol\theta^\circ$, with a provable bound on the stochastic term. Overall, the paper provides a principled, computationally tractable way to achieve dimension-free performance in noisy, high-dimensional settings.
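To make the two formulas in the TL;DR concrete, here is a minimal NumPy sketch that evaluates the excess risk $\|\Sigma^{1/2}(\widehat{\boldsymbol\theta}-\boldsymbol\theta^\circ)\|^2$ and the vector $\mathbf{b}_\lambda$ exactly as they are written above. It does not implement the error-in-operator estimator itself, whose objective is not stated in this summary; a plain ridge estimator is used as a stand-in for $\widehat{\boldsymbol\theta}$. The function names, the diagonal covariance with a $k^{-2}$ spectrum, the noise level, and the value of $\lambda$ are all assumptions chosen for illustration.

```python
import numpy as np


def excess_risk(sigma, theta_hat, theta_true):
    """Return ||Sigma^{1/2} (theta_hat - theta_true)||^2 as a quadratic form."""
    diff = theta_hat - theta_true
    return float(diff @ sigma @ diff)


def leading_bias(sigma, theta_true, lam):
    """Return b_lambda = -lam * (Sigma^2 / 2 + lam * I)^{-1} theta_true,
    following the formula as written in the TL;DR."""
    d = sigma.shape[0]
    return -lam * np.linalg.solve(sigma @ sigma / 2.0 + lam * np.eye(d), theta_true)


def ridge(X, y, lam):
    """Standard ridge estimator, used here only as a stand-in for theta_hat."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


# Synthetic setup (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
d, n, lam = 50, 200, 1e-3
eigs = 1.0 / np.arange(1, d + 1) ** 2          # assumed decaying spectrum of Sigma
sigma = np.diag(eigs)
theta_true = rng.standard_normal(d)
X = rng.standard_normal((n, d)) * np.sqrt(eigs)  # rows have covariance Sigma
y = X @ theta_true + 0.1 * rng.standard_normal(n)

theta_ridge = ridge(X, y, lam)
b_lam = leading_bias(sigma, theta_true, lam)

print("ridge excess risk:", excess_risk(sigma, theta_ridge, theta_true))
print("||Sigma^{1/2} b_lambda||:", np.sqrt(excess_risk(sigma, theta_true + b_lam, theta_true)))
```

Writing the risk as the quadratic form `diff @ sigma @ diff` avoids computing a matrix square root, and `np.linalg.solve` is used in place of an explicit inverse.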

Abstract

We consider the problem of high-dimensional linear regression with random design. We suggest a novel approach, referred to as error-in-operator, which does not estimate the design covariance $\Sigma$ directly but incorporates it into empirical risk minimization. We provide an expansion of the excess prediction risk and derive non-asymptotic dimension-free bounds on the leading term and the remainder. This helps us to show that auxiliary variables do not increase the effective dimension of the problem, provided that the parameters of the procedure are tuned properly. We also discuss computational aspects of our method and illustrate its performance with numerical experiments.

Paper Structure

This paper contains 46 sections, 40 theorems, 668 equations, 2 figures, 1 algorithm.

Key Result

Theorem 2.3

Let the parameters $\mu$ and $\lambda$ be non-negative. Assume that the following inequalities hold: Then the vector $\boldsymbol \theta^*$ defined in eq:ups_star satisfies

Figures (2)

  • Figure 1: Left: The ratio between the bias component $\Vert \Sigma^{1/2}(\boldsymbol \theta^* - \boldsymbol \theta^\circ)\Vert$ of the risk and its leading term established in Theorem \ref{theorem: bias}. Middle and Right: The ratio between the variance component of the risk $\Vert \Sigma^{1/2}(\widehat{\boldsymbol \theta} - \boldsymbol \theta^*) \Vert$ and its leading term established in Theorem \ref{th:stoch_term}.
  • Figure 2: Numerical studies of Error-in-Operator estimator $\widehat{\boldsymbol \theta}$. Left: The comparison of the Error-in-Operator estimator and the standard ridge estimator. Right: The effect of finite $\mu$ on the double descent curve.

Theorems & Definitions (47)

  • Theorem 2.3
  • Theorem 2.4
  • Theorem 2.5
  • Theorem 2.6
  • Corollary 2.7
  • Remark 2.8
  • Lemma 3.1
  • Remark 3.2
  • Theorem 3.3
  • Lemma 5.1
  • ...and 37 more