Uncertainty quantification for iterative algorithms in linear models with application to early stopping

Pierre C. Bellec, Kai Tan

TL;DR

The paper develops a finite-iteration uncertainty quantification framework for iterates produced by GD-type algorithms in high-dimensional linear models with $p$ comparable to $n$. By modeling each update as a Lipschitz map and introducing the matrices ${\mathcal J}$, ${\mathcal D}$ and the memory matrix ${\widehat{\mathbf A}}$, it derives a data-driven generalization-error estimator $\hat r_t$ that is $\sqrt n$-consistent under Gaussian designs, enabling early stopping via $\hat t = \arg\min_t \hat r_t$. It further provides a debiased, coordinate-wise inference scheme with asymptotic normality and practical confidence intervals for the true coefficients at any finite iteration, along with an oracle inequality for stopping rules. The approach is instantiated for GD, AGD, ISTA, FISTA, and LQA/MCP, and is supported by efficient computation strategies (forward substitution and Hutchinson trace approximation) that scale to large problems. Overall, the method offers a scalable, data-driven way to quantify predictive risk and perform inference along algorithmic trajectories, without requiring knowledge of the design covariance or noise level.
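The Hutchinson trace approximation mentioned in the summary is a standard randomized technique. The sketch below shows the generic estimator only, not the paper's specific use of it: the function name `hutchinson_trace`, the test matrix `A`, and the probe count are all illustrative assumptions.

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_probes=100, seed=0):
    """Estimate trace(A) using only matrix-vector products z -> A @ z."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: independent +/-1 entries, so E[z z^T] = I
        # and E[z^T A z] = trace(A).
        z = rng.choice([-1.0, 1.0], size=dim)
        total += z @ matvec(z)
    return total / num_probes

# Hypothetical symmetric positive semidefinite test matrix (not from the paper).
rng = np.random.default_rng(1)
B = rng.standard_normal((200, 200))
A = B @ B.T / 200

estimate = hutchinson_trace(lambda z: A @ z, dim=200, num_probes=500)
exact = np.trace(A)
```

The estimator never forms the matrix explicitly, which is why it can help when only products with the matrix are cheap to compute. With Rademacher probes the estimate is exact for diagonal matrices, and its variance is driven by the off-diagonal entries.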

Abstract

This paper investigates the iterates $\hat{\boldsymbol b}^1,\dots,\hat{\boldsymbol b}^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA). The paper proposes novel estimators for the generalization error of the iterate $\hat{\boldsymbol b}^t$ for any fixed iteration $t$ along the trajectory. These estimators are proved to be $\sqrt n$-consistent under Gaussian designs. Applications to early stopping are provided: when the generalization error of the iterates is a U-shaped function of the iteration $t$, the estimates make it possible to select from the data an iteration $\hat t$ that achieves the smallest generalization error along the trajectory. Additionally, we provide a technique for developing debiasing corrections and valid confidence intervals for the components of the true coefficient vector from the iterate $\hat{\boldsymbol b}^t$ at any finite iteration $t$. Extensive simulations on synthetic data illustrate the theoretical results.
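To make the early-stopping setting concrete, here is a minimal simulation sketch of GD on least squares with a Gaussian design. It tracks the oracle risk, which requires the true coefficients and noise level and is therefore not the paper's estimator $\hat r_t$. The problem sizes, the identity covariance, the step size, and the risk convention $\lVert \hat{\boldsymbol b}^t - \boldsymbol b^*\rVert^2 + \sigma^2$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, T, sigma = 300, 400, 200, 1.0        # hypothetical sizes with p comparable to n

X = rng.standard_normal((n, p))            # Gaussian design, identity covariance (assumption)
b_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ b_star + sigma * rng.standard_normal(n)

# Step size 1/L for the loss ||y - X b||^2 / (2n), where L = ||X||_op^2 / n.
step = n / np.linalg.norm(X, 2) ** 2

b = np.zeros(p)
# Oracle risk of the initialization b = 0 (needs b_star and sigma, unknown in practice).
risks = [np.sum((b - b_star) ** 2) + sigma ** 2]
for t in range(T):
    b = b + step * X.T @ (y - X @ b) / n   # one gradient descent step
    risks.append(np.sum((b - b_star) ** 2) + sigma ** 2)
risks = np.array(risks)

# Oracle stopping time; a data-driven rule would minimize an estimate of the risk instead.
t_hat = int(np.argmin(risks))
```

In practice the oracle curve `risks` is unavailable, which is the gap a data-driven risk estimate is meant to fill: the same `argmin` is then taken over estimated risks.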

Paper Structure

This paper contains 56 sections, 32 theorems, 267 equations, 15 figures, 1 table.

Key Result

Theorem 2.1

Let Assumptions assu:design, assu:noise, assu:regime, and assu:Lipschitz be fulfilled. For each $t\in [T]$, define the estimate $\hat{r}_t$ of $r_t$ by […], where $\hat{w}_{t,s} = \boldsymbol{e}_t^\top (\boldsymbol{I}_T - {\widehat{\mathbf A}}/n)^{-1}\boldsymbol{e}_s$. We have for any $t\in [T]$, […]. Here ${\rm var}(y_1) = \lVert\boldsymbol{\Sigma}^{1/2}\boldsymbol{b}^*\rVert^2 + \sigma^2$, and $C(\zeta, T, \gamma, \kappa)$ i…
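The weights $\hat{w}_{t,s}$ are the entries of $(\boldsymbol{I}_T - {\widehat{\mathbf A}}/n)^{-1}$, so they can be computed once for all pairs $(t,s)$. The sketch below assumes ${\widehat{\mathbf A}}$ is a $T\times T$ lower-triangular matrix, so that forward substitution applies. That structure, the helper names `forward_substitution` and `risk_weights`, and the random `A_hat` are illustrative assumptions, not taken from the theorem statement.

```python
import numpy as np

def forward_substitution(L, B):
    """Solve L @ W = B for a lower-triangular L with nonzero diagonal."""
    T = L.shape[0]
    W = np.zeros_like(B, dtype=float)
    for i in range(T):
        # Row i of W depends only on the rows already computed.
        W[i] = (B[i] - L[i, :i] @ W[:i]) / L[i, i]
    return W

def risk_weights(A_hat, n):
    """Return W with W[t-1, s-1] = e_t^T (I_T - A_hat/n)^{-1} e_s."""
    T = A_hat.shape[0]
    M = np.eye(T) - A_hat / n
    return forward_substitution(M, np.eye(T))

# Hypothetical lower-triangular A_hat, for illustration only.
rng = np.random.default_rng(0)
T, n = 6, 100
A_hat = np.tril(rng.uniform(0.0, 50.0, size=(T, T)))

W = risk_weights(A_hat, n)
w_3_2 = W[2, 1]   # the weight for t = 3, s = 2 (1-based indices, as in the theorem)
```

Under the lower-triangular assumption the inverse is lower triangular as well, so the weight for iteration $t$ involves only indices $s \le t$.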

Figures (15)

  • Figure 1: Risk curves and qq-plots of z-score of GD for $(n,p)=(1200, 1500)$.
  • Figure 2: Risk curves and qq-plots of z-score of AGD for $(n,p)=(1200, 1500)$.
  • Figure 3: Risk curves and qq-plots of z-score of ISTA for $(n,p)=(1200, 1500)$.
  • Figure 4: Risk curves and qq-plots of z-score of FISTA for $(n,p)=(1200, 1500)$.
  • Figure 5: Risk curves and qq-plots of z-score of LQA for $(n,p)=(1200, 1500)$.
  • ...and 10 more figures

Theorems & Definitions (64)

  • Definition 2.1
  • Theorem 2.1: Estimation of prediction risk
  • Remark 2.1: Risk of initialization
  • Theorem 2.2 (proof given in sec:proof-thm-generalization-error)
  • Corollary 2.3 (proof given in proof:cor:early)
  • Theorem 2.4
  • Corollary 2.5
  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • ...and 54 more