Complexity of Minimizing Regularized Convex Quadratic Functions

Daniel Berg Thomsen, Nikita Doikov

TL;DR

This work analyzes the iteration complexity of gradient-based methods for minimizing uniformly convex regularized quadratic functions $f(x)=\frac{1}{2}x^T A x - b^T x + \frac{s}{p}\|x\|^p$ with $A\succeq0$ and $p>2$. It proves that a basic gradient method with a novel step size achieves a convergence rate of $O(N^{-p/(p-2)})$ for the functional residual, and that this rate is tight for one-step methods; for the broader class of multi-step first-order methods, the optimal rate is $O(N^{-2p/(p-2)})$, attained by the Fast Gradient Method. The special case $p=3$ (cubically regularized quadratic) yields an optimal rate of $O(N^{-6})$, and the paper also develops new lower bounds on gradient norms. A Krylov-subspace, resisting-oracle framework proves universal lower bounds for general first-order methods, and experiments validate the theory on adversarial and random instances, highlighting the subproblems arising in cubic-Newton and trust-region contexts. Overall, the paper sharpens the understanding of first-order complexity for this important class of uniformly convex quadratic-like problems.
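As a quick check of the exponents quoted above, the following substitution works out the cubic case and the two limiting regimes. It is only arithmetic on the stated rates, not an additional result from the paper.

$$
p = 3:\quad \frac{p}{p-2} = 3, \qquad \frac{2p}{p-2} = 6,
$$

$$
p \to \infty:\quad \frac{p}{p-2} \to 1, \quad \frac{2p}{p-2} \to 2; \qquad p \to 2^{+}:\quad \frac{p}{p-2} \to \infty .
$$

For $p=3$ this gives $O(N^{-3})$ for the one-step gradient method and $O(N^{-6})$ for the Fast Gradient Method. The multi-step exponent is exactly twice the one-step exponent for every $p>2$.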

Abstract

In this work, we study the iteration complexity of gradient methods for minimizing convex quadratic functions regularized by powers of Euclidean norms. We show that, due to the uniform convexity of the objective, gradient methods have improved convergence rates. Thus, for the basic gradient descent with a novel step size, we prove a convergence rate of $O(N^{-p/(p - 2)})$ for the functional residual, where $N$ is the iteration number and $p > 2$ is the power of the regularization term. We also show that this rate is tight by establishing a corresponding lower bound for one-step first-order methods. Then, for the general class of all multi-step methods, we establish that the rate of $O(N^{-2p/(p-2)})$ is optimal, providing a sharp analysis of the minimization of uniformly convex regularized quadratic functions. This rate is achieved by the fast gradient method. A special case of our problem class is $p=3$, which is the minimization of cubically regularized convex quadratic functions. It naturally appears as a subproblem at each iteration of the cubic Newton method. Therefore, our theory shows that the rate of $O(N^{-6})$ is optimal in this case. We also establish new lower bounds on minimizing the gradient norm within our framework.
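To make the problem class concrete, here is a minimal Python sketch of the objective $f(x)=\frac{1}{2}x^T A x - b^T x + \frac{s}{p}\|x\|^p$, its gradient $\nabla f(x) = Ax - b + s\|x\|^{p-2}x$, and a plain gradient loop. The step size here is chosen by standard Armijo backtracking, which is a stand-in and not the novel step size analyzed in the paper. The function names, the `random_instance` helper, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def objective(x, A, b, s, p):
    """f(x) = 0.5 x^T A x - b^T x + (s/p) ||x||^p."""
    return 0.5 * x @ A @ x - b @ x + (s / p) * np.linalg.norm(x) ** p

def gradient(x, A, b, s, p):
    """grad f(x) = A x - b + s ||x||^(p-2) x (well defined at x = 0 for p > 2)."""
    return A @ x - b + s * np.linalg.norm(x) ** (p - 2) * x

def random_instance(d, rank, seed=0):
    """Hypothetical helper: a rank-deficient PSD matrix A (so mu = 0) and a random b."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((rank, d))
    return M.T @ M / d, rng.standard_normal(d)

def gradient_descent_backtracking(A, b, s, p, num_iters=200, step=1.0):
    """Gradient descent with Armijo backtracking (not the paper's step-size rule)."""
    x = np.zeros_like(b)
    fx = objective(x, A, b, s, p)
    history = [fx]
    for _ in range(num_iters):
        g = gradient(x, A, b, s, p)
        t = step
        x_new = x - t * g
        # Halve the step until the sufficient-decrease condition holds.
        while objective(x_new, A, b, s, p) > fx - 0.5 * t * (g @ g) and t > 1e-12:
            t *= 0.5
            x_new = x - t * g
        f_new = objective(x_new, A, b, s, p)
        if f_new <= fx:  # accept only non-increasing steps
            x, fx = x_new, f_new
        history.append(fx)
    return x, np.array(history)
```

Calling `gradient_descent_backtracking(*random_instance(20, 10), s=0.1, p=3.0)` returns the final iterate and the sequence of objective values, which is non-increasing by construction. Plotting the history against $f(x_\star)$ on a log-log scale is one way to compare the observed decay with the rates stated in the abstract.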

Paper Structure (24 sections, 19 theorems, 139 equations, 8 figures, 1 algorithm)

Key Result

Lemma 1

For $k \geq 0$, let it hold that Then

Figures (8)

  • Figure 1: The functional residual after $N$ steps of running our implementation of a Krylov subspace solver (blue), compared with the gradient method for the same number of steps (red), on adversarially constructed instances (\ref{fig:single_run_adversarial}) and on randomly generated problem instances (\ref{fig:single_run_random}). In \ref{fig:single_run_random} we also highlight the per-iteration lowest functional residual with a solid line to illustrate the worst-case performance across the random instances.
  • Figure 2: Result of running a Krylov subspace solver on our adversarially constructed problem instances with $\pi$ generated using our heuristic (black). On the left graph, we compare its performance with the Gradient Method (red). On the right graph, we compare its performance with that on the problem instance that allocates mass evenly to all components of $\pi$ (red). Each point along the $x$-axis corresponds to a separate construction, and the value along the $y$-axis is the minimal functional residual achieved by the solver after $N$ iterations. The blue dashed line corresponds to the asymptotic trend predicted by our theory when $p=3$. Our theory lower bounds the performance in both cases, and it is likely a tight estimate of the worst-case performance of Krylov subspace solvers, given that the slope matches exactly what the theory predicts.
  • Figure 3: Result of running a Krylov subspace solver on our adversarially constructed problem instances with eigenvalues of $A$ generated according to the construction in blue, and randomly according to a beta distribution in red. \ref{fig:eigenval_experiment_graph} contains the functional residual at every step of the method, and \ref{fig:eigenval_experiment_hist} is a histogram of the eigenvalues. The beta distribution appears to achieve a good approximation of the adversarially placed eigenvalues, as seen in both plots.
  • Figure 4: Grid of functional residuals resulting from running the gradient method using the step size defined in \ref{GradResBound2}, and a Krylov subspace solver on randomly generated problem instances. Different rows correspond to a fixed setting of $L$, and columns to a fixed setting of $s$. The transparent lines are individual problem instances, and the per-iteration lowest functional residual has been highlighted using an opaque line. Note that the initial functional residual is highly dependent on the setting of $L$, and that larger settings of $s$ lead to faster convergence for both methods.
  • Figure 5: Functional residual of the gradient method with our step size. The problem instance has been randomly generated (according to Appendix \ref{app:random_qps}), with $d = 1000$, $\mu = 0$, $L =$, $p = 3$, $s = 0.1$, and $\|x_{\star}\| = 1$. We see that in practice the method can perform better than the corresponding lower bound.
  • ...and 3 more figures

Theorems & Definitions (38)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Remark 4
  • Lemma 5
  • proof
  • Theorem 6
  • ...and 28 more