Early-Stopped Mirror Descent for Linear Regression over Convex Bodies
Tobias Wegel, Gil Kur, Patrick Rebeschini
TL;DR
This work establishes that early-stopped mirror descent (ESMD) can match the statistical performance of the constrained least squares estimator (LSE) for high-dimensional linear regression under arbitrary convex shape constraints. By building optimization potentials from the Minkowski functional of the constraint set and analyzing ESMD with localized complexity tools such as the critical radius and the localized Gaussian width, the authors prove a risk bound showing that the risk of ESMD is at most a constant multiple of the LSE’s risk, uniformly over the constraint set and for general design matrices. The framework yields sharp rates for several geometric families, including $\ell_p$-balls with $p\in[1,2)$ and $M$-convex hulls, under both column-normalized and Gaussian designs, and it transfers minimax optimality from the LSE to ESMD. The results give a principled, geometry-driven blueprint for implicit regularization via early stopping across a broad range of convex constraints, with implications for computational efficiency and statistical optimality in overparameterized regimes.
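For orientation, the objects named in this summary can be written out as follows. This is a sketch in standard notation, not the paper's own: the symbols $\theta^\star$, $\xi$, $\sigma$, $K$, $\psi$, $\eta$, and the $1/(2n)$ normalization of the loss are assumptions introduced here, and the Minkowski functional is stated under the usual assumption that $0$ lies in the interior of $K$.

```latex
% Linear model with Gaussian noise and a convex-body constraint on the truth
\[
  y = X\theta^\star + \xi, \qquad
  \xi \sim \mathcal{N}(0, \sigma^2 I_n), \qquad
  \theta^\star \in K \subset \mathbb{R}^d .
\]
% Minkowski functional (gauge) of the convex body K
\[
  \|\theta\|_K = \inf\{\, t > 0 : \theta \in tK \,\}.
\]
% Unconstrained mirror descent on the empirical squared loss with potential \psi
\[
  \nabla\psi(\theta_{t+1}) = \nabla\psi(\theta_t) - \eta\,\nabla L(\theta_t),
  \qquad
  L(\theta) = \frac{1}{2n}\,\|y - X\theta\|_2^2 .
\]
% In-sample mean squared error of an estimator \hat\theta
\[
  \frac{1}{n}\,\|X(\hat\theta - \theta^\star)\|_2^2 .
\]
```

Early stopping means returning $\theta_T$ for some finite $T$ rather than running the iteration to convergence.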
Abstract
Early-stopped iterative optimization methods are widely used as alternatives to explicit regularization, and direct comparisons between early stopping and explicit regularization have been established for many optimization geometries. However, most analyses depend heavily on specific properties of the optimization geometry or on strong convexity of the empirical objective, and it remains unclear whether early stopping could ever be less statistically efficient than explicit regularization for some particular shape constraint, especially in the overparameterized regime. To address this question, we study high-dimensional linear regression under additive Gaussian noise, where the ground truth is assumed to lie in a known convex body and the task is to minimize the in-sample mean squared error. Our main result shows that for any convex body and any design matrix, up to an absolute constant factor, the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body. We achieve this by constructing algorithmic regularizers based on the Minkowski functional of the convex body.
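To make the idea of unconstrained mirror descent with a geometry-dependent potential concrete, here is a minimal sketch in Python. It does not reproduce the paper's construction: it assumes the textbook potential $\psi(\theta) = \tfrac{1}{2}\|\theta\|_p^2$ with $p \in (1, 2]$ as a stand-in for a potential adapted to an $\ell_p$-ball, and the function names, the step-size default, and the loss normalization are all hypothetical choices made for illustration. The case $p = 1$ is not covered by this potential.

```python
import numpy as np


def grad_half_sq_norm(v, r):
    """Gradient of 0.5 * ||v||_r^2; maps between primal and dual coordinates."""
    norm = np.linalg.norm(v, ord=r)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (r - 1) * norm ** (2 - r)


def mirror_descent_path(X, y, p=1.5, step=None, n_iters=200):
    """Run mirror descent on L(theta) = ||y - X theta||^2 / (2n), starting at 0.

    Uses the assumed potential psi = 0.5 * ||.||_p^2 with 1 < p <= 2.
    Returns every iterate so that a stopping time can be chosen afterwards.
    """
    n, d = X.shape
    q = p / (p - 1)  # conjugate exponent: grad psi* = grad of 0.5 * ||.||_q^2
    if step is None:
        # psi is (p-1)-strongly convex w.r.t. ||.||_p and the loss is
        # smooth with constant lambda_max(X^T X / n), so this step size
        # makes the training loss non-increasing.
        smoothness = np.linalg.norm(X, 2) ** 2 / n
        step = (p - 1) / smoothness
    theta = np.zeros(d)
    dual = np.zeros(d)  # dual iterate, equal to grad psi(theta)
    path = [theta.copy()]
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n
        dual = dual - step * grad             # gradient step in dual space
        theta = grad_half_sq_norm(dual, q)    # map back to primal space
        path.append(theta.copy())
    return np.array(path)


def training_loss(X, y, theta):
    """Empirical squared loss with the 1/(2n) normalization used above."""
    return np.sum((y - X @ theta) ** 2) / (2 * X.shape[0])
```

An early-stopped estimate is then a single row `path[T]` of the returned array. How `T` is chosen (for example by a hold-out set or a theoretical stopping rule) is left open in this sketch.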
