Complexity of Minimizing Regularized Convex Quadratic Functions
Daniel Berg Thomsen, Nikita Doikov
TL;DR
This work analyzes the iteration complexity of gradient-based methods for minimizing uniformly convex regularized quadratic functions $f(x)=\frac{1}{2}x^T A x - b^T x + \frac{s}{p}\|x\|^p$ with $A\succeq0$ and $p>2$. It proves that the basic gradient method with a novel step size converges at the rate $O(N^{-p/(p-2)})$ in the functional residual, where $N$ is the number of iterations, and that this rate is tight for one-step first-order methods. For the broader class of multi-step first-order methods, the optimal rate is $O(N^{-2p/(p-2)})$, attained by the Fast Gradient Method. In the special case $p=3$ (cubically regularized convex quadratics), the optimal rate is therefore $O(N^{-6})$. The paper also develops new lower bounds on gradient norms. The lower bounds for general first-order methods are proved with a Krylov-subspace, resisting-oracle framework, and experiments on adversarial and random instances validate the theory, highlighting the subproblems that arise in cubic Newton and trust-region methods. Together, these results sharpen the understanding of first-order complexity for this class of uniformly convex regularized quadratic problems.
Abstract
In this work, we study the iteration complexity of gradient methods for minimizing convex quadratic functions regularized by powers of Euclidean norms. We show that, due to the uniform convexity of the objective, gradient methods have improved convergence rates. In particular, for basic gradient descent with a novel step size, we prove a convergence rate of $O(N^{-p/(p - 2)})$ for the functional residual, where $N$ is the iteration number and $p > 2$ is the power of the regularization term. We also show that this rate is tight by establishing a corresponding lower bound for one-step first-order methods. Then, for the general class of all multi-step methods, we establish that the rate of $O(N^{-2p/(p-2)})$ is optimal, providing a sharp analysis of the minimization of uniformly convex regularized quadratic functions. This rate is achieved by the fast gradient method. A special case of our problem class is $p=3$, which corresponds to the minimization of cubically regularized convex quadratic functions and naturally appears as a subproblem at each iteration of the cubic Newton method. Our theory therefore shows that the rate of $O(N^{-6})$ is optimal in this case. We also establish new lower bounds on minimizing the gradient norm within our framework.
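To make the problem class concrete, the following is a minimal Python sketch of the objective $f(x)=\frac{1}{2}x^T A x - b^T x + \frac{s}{p}\|x\|^p$, its gradient, and a plain gradient method. It rests on several assumptions that are not taken from the paper: the step size is chosen by standard Armijo backtracking as a stand-in (it is not the novel step size analyzed in the paper), the function names `make_problem`, `f`, `grad` and `gradient_descent` are hypothetical, and the random instance is purely illustrative.

```python
import numpy as np


def make_problem(n=20, s=1.0, p=3.0, seed=0):
    """Build an illustrative random instance (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    A = M.T @ M / n              # symmetric positive semidefinite matrix
    b = rng.standard_normal(n)
    return A, b, s, p


def f(x, A, b, s, p):
    """Regularized quadratic: 0.5 x^T A x - b^T x + (s/p) ||x||^p."""
    return 0.5 * x @ A @ x - b @ x + (s / p) * np.linalg.norm(x) ** p


def grad(x, A, b, s, p):
    """Gradient: A x - b + s ||x||^(p-2) x, well defined at x = 0 for p > 2."""
    return A @ x - b + s * np.linalg.norm(x) ** (p - 2) * x


def gradient_descent(A, b, s, p, iters=500, t0=1.0, tol=1e-10):
    """Gradient method with Armijo backtracking (an assumed stand-in step rule)."""
    x = np.zeros_like(b)
    history = [f(x, A, b, s, p)]
    for _ in range(iters):
        g = grad(x, A, b, s, p)
        gg = g @ g
        if np.sqrt(gg) <= tol:   # stop once the gradient norm is tiny
            break
        t, fx = t0, history[-1]
        # Halve the step until the sufficient-decrease condition holds.
        while f(x - t * g, A, b, s, p) > fx - 0.5 * t * gg and t > 1e-16:
            t *= 0.5
        x = x - t * g
        history.append(f(x, A, b, s, p))
    return x, history


if __name__ == "__main__":
    A, b, s, p = make_problem()
    x, history = gradient_descent(A, b, s, p)
    print("final objective:", history[-1])
    print("final gradient norm:", np.linalg.norm(grad(x, A, b, s, p)))
```

Setting `p=3.0` gives the cubically regularized case that arises as the cubic Newton subproblem. The sketch only illustrates the objective and a one-step method; it is not intended to reproduce the rates stated above.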
