Comparing regularisation paths of (conjugate) gradient estimators in ridge regression
Laura Hucker, Markus Reiß, Thomas Stark
TL;DR
The paper develops a non-asymptotic framework for comparing the regularisation paths of gradient-based methods for ridge regression under random design. Using an error decomposition based on residual polynomials and a data-dependent time reparametrisation $\tau_t$, it shows that the prediction risk of conjugate gradient (CG) iterates along their path is bounded by that of the gradient flow (GF) path, and hence by that of ridge regression (RR), up to constants depending on the spectrum of $\widehat{\Sigma}$; an oracle-type bound connects CG to GF and RR. Numerical experiments show that CG, GF and RR have almost indistinguishable regularisation paths, with CG reaching its minimal risk much faster, which supports early-stopped CG in high-dimensional settings both in theory and in practice. The results rely on a novel CG-specific error decomposition that extends GF/RR comparisons to the data-dependent Krylov subspaces generated by CG, and they offer guidance for stopping rules and hyperparameter tuning.
Abstract
We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimising a penalised ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent non-linearities and dependencies. On the other hand, standard gradient flow is a linear method with well-known regularising properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice.
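The comparison of regularisation paths described above can be tried out in a small simulation. The following is a minimal sketch under stated assumptions, not the authors' experimental setup: it uses the unpenalised least-squares criterion, a Gaussian design with a diagonal covariance whose eigenvalues decay like $1/j$, and the common heuristic pairing $\lambda = 1/t$ between the ridge penalty and the gradient flow time (the paper's data-dependent reparametrisation $\tau_t$ is not implemented here). All function names (`ridge`, `gradient_flow`, `cg_path`, `excess_risk`, `simulate`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def ridge(S, b, lam):
    """Ridge estimator (S + lam * I)^{-1} b."""
    return np.linalg.solve(S + lam * np.eye(len(b)), b)

def gradient_flow(S, b, t):
    """Gradient flow for the least-squares loss, started at 0, at time t."""
    s, V = np.linalg.eigh(S)
    safe = np.maximum(s, 1e-12)
    # Spectral filter (1 - exp(-t s)) / s, with its limit t as s -> 0.
    filt = np.where(s > 1e-12, -np.expm1(-t * safe) / safe, t)
    return V @ (filt * (V.T @ b))

def cg_path(S, b, max_iter, tol=1e-12):
    """Conjugate gradient iterates for S beta = b, started at 0.

    Row k of the returned array is the k-th iterate."""
    beta = np.zeros_like(b)
    r = b.copy()          # residual b - S beta
    d = r.copy()          # search direction
    path = [beta.copy()]
    for _ in range(max_iter):
        rr = r @ r
        if rr <= tol**2 * (b @ b):   # relative residual small: stop
            break
        Sd = S @ d
        alpha = rr / (d @ Sd)
        beta = beta + alpha * d
        r = r - alpha * Sd
        d = r + ((r @ r) / rr) * d
        path.append(beta.copy())
    return np.array(path)

def excess_risk(beta_hat, beta_star, Sigma):
    """Excess prediction risk (beta_hat - beta*)^T Sigma (beta_hat - beta*)."""
    diff = beta_hat - beta_star
    return float(diff @ Sigma @ diff)

def simulate(n=200, p=10, noise=1.0, seed=0):
    """Assumed toy model: Gaussian design, covariance eigenvalues 1/j."""
    rng = np.random.default_rng(seed)
    eigs = 1.0 / np.arange(1, p + 1)
    Sigma = np.diag(eigs)
    X = rng.standard_normal((n, p)) * np.sqrt(eigs)
    beta_star = np.ones(p)
    y = X @ beta_star + noise * rng.standard_normal(n)
    return X, y, beta_star, Sigma

X, y, beta_star, Sigma = simulate()
n = X.shape[0]
S, b = X.T @ X / n, X.T @ y / n   # empirical covariance and cross-moment

times = np.logspace(-1, 3, 30)    # gradient flow times; ridge uses lam = 1/t
risk_gf = [excess_risk(gradient_flow(S, b, t), beta_star, Sigma) for t in times]
risk_rr = [excess_risk(ridge(S, b, 1.0 / t), beta_star, Sigma) for t in times]
cg = cg_path(S, b, max_iter=50)
risk_cg = [excess_risk(beta, beta_star, Sigma) for beta in cg]

print("min risk along path  GF: %.4f  RR: %.4f  CG: %.4f (iteration %d)"
      % (min(risk_gf), min(risk_rr), min(risk_cg), int(np.argmin(risk_cg))))
```

Plotting `risk_gf` and `risk_rr` against `times`, and `risk_cg` against the iteration index, gives a rough picture of the three paths. Because the CG index is discrete and data-dependent, aligning it with the GF time axis requires a reparametrisation, which this sketch deliberately leaves out.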
