Uncertainty quantification for iterative algorithms in linear models with application to early stopping

Pierre C. Bellec; Kai Tan

TL;DR

The paper develops a finite-iteration uncertainty quantification framework for iterates produced by GD-type algorithms in high-dimensional linear models with $p$ comparable to $n$. By modeling each update as a Lipschitz map and introducing the matrices ${\mathcal J},{\mathcal D}$ and the memory matrix ${\widehat{\mathbf A}}$, it derives a data-driven generalization-error estimator $\hat r_t$ that is $\sqrt n$-consistent under Gaussian designs, enabling early stopping via $\hat t = \arg\min_t \hat r_t$. It further provides a debiased, coordinate-wise inference scheme with asymptotic normality and practical confidence intervals for the true coefficients at any finite iteration, along with an oracle inequality for stopping rules. The approach is instantiated for GD, AGD, ISTA, FISTA, and LQA/MCP, and is supported by efficient computation strategies (forward substitution and Hutchinson trace approximation) that scale to large problems. Overall, the method offers a scalable, data-driven way to quantify predictive risk and perform inference along algorithmic trajectories, without requiring knowledge of the design covariance or noise level.
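The Hutchinson trace approximation mentioned above can be illustrated in a few lines. The following is a minimal sketch of the generic estimator, not the paper's implementation: the function name `hutchinson_trace`, the Rademacher probes, the probe count, and the toy matrix `M` are all assumptions made for illustration.

```python
import numpy as np

def hutchinson_trace(matvec, dim, num_probes=200, seed=0):
    """Estimate tr(M) using only products v -> M v (Hutchinson's estimator)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: independent +/-1 entries, so E[v v^T] = I
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ matvec(v)  # v^T M v is an unbiased estimate of tr(M)
    return total / num_probes

# Toy check on an explicit (hypothetical) positive semidefinite matrix
rng = np.random.default_rng(1)
B = rng.standard_normal((50, 50))
M = B @ B.T / 50
estimate = hutchinson_trace(lambda v: M @ v, dim=50, num_probes=2000)
exact = np.trace(M)
```

The estimator only needs matrix-vector products, which is what makes it attractive when the matrix whose trace is required is too large to form explicitly.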

Abstract

This paper investigates the iterates ${\widehat{\boldsymbol b}}^1,\dots,{\widehat{\boldsymbol b}}^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD, and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA). The paper proposes novel estimators for the generalization error of the iterate ${\widehat{\boldsymbol b}}^t$ for any fixed iteration $t$ along the trajectory. These estimators are proved to be $\sqrt n$-consistent under Gaussian designs. Applications to early stopping are provided: when the generalization error of the iterates is a U-shaped function of the iteration $t$, the estimates make it possible to select from the data an iteration $\hat t$ that achieves the smallest generalization error along the trajectory. Additionally, we provide a technique for developing debiasing corrections and valid confidence intervals for the components of the true coefficient vector from the iterate ${\widehat{\boldsymbol b}}^t$ at any finite iteration $t$. Extensive simulations on synthetic data illustrate the theoretical results.
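To make the early-stopping setting concrete, the sketch below runs plain GD on a synthetic Gaussian design and records the generalization error of each iterate. It is an illustration under stated assumptions, not the paper's estimator: it assumes identity covariance, the loss $\|y - Xb\|^2/(2n)$, and step size $1/L$, and it uses the true coefficient vector `b_star` and noise level `sigma` (unavailable in practice) to compute the oracle risk $\|b^t - b^*\|^2 + \sigma^2$. All variable names and problem sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 200, 300, 100                   # assumed sizes, with p comparable to n
sigma = 1.0                               # assumed noise level

X = rng.standard_normal((n, p))           # isotropic Gaussian design (Sigma = I)
b_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ b_star + sigma * rng.standard_normal(n)

# Step size 1/L, where L = ||X||_2^2 / n is the smoothness constant
# of the loss ||y - X b||^2 / (2n)
step = n / np.linalg.norm(X, 2) ** 2

b = np.zeros(p)
risks = []
for t in range(1, T + 1):
    b = b + step * X.T @ (y - X @ b) / n  # one gradient descent update
    # Oracle risk E[(y_new - x_new^T b)^2] when Sigma = I
    risks.append(np.sum((b - b_star) ** 2) + sigma ** 2)

risks = np.array(risks)
t_oracle = int(np.argmin(risks)) + 1      # best iteration (1-indexed), oracle only
```

The estimators $\hat r_t$ proposed in the paper aim to approximate such a risk curve from the data alone, so that the minimizing iteration can be chosen without access to `b_star` or `sigma`.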

Paper Structure (56 sections, 32 theorems, 267 equations, 15 figures, 1 table)

Introduction
Iterative algorithms
Statistical properties of iterates for small $t$ or when convergence fails
Early stopping
Contributions
Related literature
Notation
Main results
Iterates and derivatives
Proximal gradient descent
Combining two previous iterates
General form: combining all previous iterates
Lipschitz assumption and the chain rule
Memory matrix
Probabilistic assumptions and proportional regime
...and 41 more sections

Key Result

Theorem 2.1

Let Assumptions assu:design, assu:noise, assu:regime and assu:Lipschitz be fulfilled. For each $t\in [T]$, define the estimate $\hat{r}_t$ of $r_t$ by [display equation not preserved in the source], where $\hat{w}_{t,s} = \boldsymbol{e}_t^\top (\boldsymbol{I}_T - {\widehat{\mathbf A}}/n)^{-1}\boldsymbol{e}_s$. We have, for any $t\in [T]$, [display equation not preserved in the source]. Here ${\rm var}(y_1) = \lVert\boldsymbol{\Sigma}^{1/2}\boldsymbol{b}^*\rVert^2 + \sigma^2$, and $C(\zeta, T, \gamma, \kappa)$ i[…]
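The weights $\hat{w}_{t,s}$ in Theorem 2.1 are entries of $(\boldsymbol{I}_T - {\widehat{\mathbf A}}/n)^{-1}$. As a rough sketch, and assuming ${\widehat{\mathbf A}}$ is lower triangular (an assumption made here because iterate $t$ can only depend on earlier iterates), the inverse can be obtained by forward substitution rather than a general solve. The matrix `A_hat` below is a random placeholder, not a memory matrix computed from data, and the function names are hypothetical.

```python
import numpy as np

def forward_substitution(L, B):
    """Solve L @ W = B for a lower-triangular L with nonzero diagonal."""
    T = L.shape[0]
    W = np.zeros_like(B, dtype=float)
    for i in range(T):
        # Row i depends only on the already computed rows 0..i-1
        W[i] = (B[i] - L[i, :i] @ W[:i]) / L[i, i]
    return W

def risk_weights(A_hat, n):
    """Return W with W[t-1, s-1] = e_t^T (I_T - A_hat / n)^{-1} e_s."""
    T = A_hat.shape[0]
    L = np.eye(T) - A_hat / n
    return forward_substitution(L, np.eye(T))

rng = np.random.default_rng(0)
T, n = 5, 100
# Placeholder lower-triangular matrix standing in for the memory matrix
A_hat = np.tril(rng.uniform(0, 20, size=(T, T)))
W = risk_weights(A_hat, n)
```

Under the lower-triangular assumption, `W` is itself lower triangular, so the weight for iteration $t$ only involves iterations $s \le t$.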

Figures (15)

Figure 1: Risk curves and qq-plots of z-score of GD for $(n,p)=(1200, 1500)$.
Figure 2: Risk curves and qq-plots of z-score of AGD for $(n,p)=(1200, 1500)$.
Figure 3: Risk curves and qq-plots of z-score of ISTA for $(n,p)=(1200, 1500)$.
Figure 4: Risk curves and qq-plots of z-score of FISTA for $(n,p)=(1200, 1500)$.
Figure 5: Risk curves and qq-plots of z-score of LQA for $(n,p)=(1200, 1500)$.
...and 10 more figures

Theorems & Definitions (64)

Definition 2.1
Theorem 2.1: Estimation of prediction risk
Remark 2.1: Risk of initialization
Theorem 2.2: Proof is given in \ref{sec:proof-thm-generalization-error}
Corollary 2.3: Proof is given in \ref{proof:cor:early}
Theorem 2.4
Corollary 2.5
Remark 3.1
Remark 3.2
Remark 3.3
...and 54 more
