A simple uniformly optimal method without line search for convex optimization

Tianjiao Li, Guanghui Lan

TL;DR

This work addresses convex optimization with unknown problem parameters by introducing AC-FGM, a parameter-free, line-search-free accelerated gradient method. AC-FGM employs three intertwined sequences to adaptively combine gradient information and proximal updates, achieving the optimal $O(1/k^2)$ rate for smooth convex problems and extending uniformly to Hölder continuous gradients without requiring $L$, $\nu$, or $L_\nu$ a priori. Theoretical results establish uniform optimality across smooth, weakly smooth, and nonsmooth regimes, while extensive numerical experiments on QP, Lasso, square root Lasso, and sparse logistic regression demonstrate practical efficiency and robustness. These findings offer a compelling, scalable alternative for parameter-free first-order optimization with broad applicability in data science and machine learning.
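For reference, the problem classes named above are usually described through a Hölder condition on the gradient, together with the matching optimal iteration complexity for first-order methods. The display below is the standard textbook statement, not a formula copied from the paper; the symbols $x_0$ (initial point), $x^*$ (an optimal solution), $\epsilon$ (target accuracy), and $N(\epsilon)$ (iteration count) are notation assumed here.

$$
\|\nabla f(x) - \nabla f(y)\| \le L_\nu \|x - y\|^{\nu}, \quad \nu \in [0,1], \quad \forall x, y \in X,
$$

$$
N(\epsilon) = \mathcal{O}\!\left( \left( \frac{L_\nu \|x_0 - x^*\|^{1+\nu}}{\epsilon} \right)^{\frac{2}{1+3\nu}} \right).
$$

Setting $\nu = 1$ gives $\mathcal{O}\big(\sqrt{L_1 \|x_0 - x^*\|^2 / \epsilon}\big)$, which is the $O(1/k^2)$ rate of the smooth case, while $\nu = 0$ gives the $\mathcal{O}(1/\epsilon^2)$ complexity of the nonsmooth case.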

Abstract

Line search (or backtracking) procedures have been widely incorporated into first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., the Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called the auto-conditioned fast gradient method (AC-FGM) that achieves the optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring an estimate of a global Lipschitz constant or the use of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes, with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over previously developed parameter-free methods for convex optimization.
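To make the contrast concrete, the kind of backtracking procedure the abstract refers to can be sketched as follows. This is a generic textbook gradient descent with backtracking on the Lipschitz estimate, not AC-FGM and not code from the paper; the function name `gd_backtracking`, the test problem, and all parameter values are illustrative assumptions.

```python
import numpy as np

def gd_backtracking(f, grad, x0, L0=1.0, iters=200, tol=1e-8):
    """Gradient descent whose step size 1/L is found by backtracking on L."""
    x = np.asarray(x0, dtype=float)
    L = L0
    rejected_trials = 0  # trial steps discarded by the backtracking loop
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        fx = f(x)
        # Double L until the standard sufficient-decrease test holds.
        # Every failed test costs one extra function evaluation.
        while f(x - g / L) > fx - g.dot(g) / (2.0 * L):
            L *= 2.0
            rejected_trials += 1
        x = x - g / L
    return x, L, rejected_trials

# Hypothetical test problem: f(x) = 0.5 * ||A x - b||^2 with minimizer (1, 1).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 1.0])
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)

x_hat, L_hat, rejected_trials = gd_backtracking(f, grad, np.zeros(2))
print(x_hat, L_hat, rejected_trials)
```

The inner `while` loop is the overhead in question: each rejected trial costs an additional function evaluation, which is what a line-search-free method avoids.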

Paper Structure (11 sections, 8 theorems, 100 equations, 6 figures, 4 tables, 1 algorithm)

Key Result

Proposition

Assume the parameters $\{\tau_t\}$, $\{\eta_t\}$, and $\{\beta_t\}$ satisfy [conditions omitted], where $L_1$ and $L_t$, $t \geq 2$, are defined in (def_L_1) and (def_L_t), respectively. Then for any $z \in X$, [bound omitted], where for $t \geq 2$, [definition omitted].

Figures (6)

  • Figure 1: Quadratic programming (QP_prob): Comparison between AC-FGM, NS-AGD, NS-FGM, and AdGD in terms of the number of iterations for solving randomly generated instances (column 1) and datasets bodyfat and cadata (column 2).
  • Figure 2: Lasso (QP_prob_with_lasso): Comparison between AC-FGM, NS-AGD, NS-FGM, and AdGD in terms of the number of iterations for datasets gisette (column 1) and rcv1.binary (column 2). Row 1 takes $\lambda = \tfrac{0.01}{m} \|A^{T} b \|_\infty$ and Row 2 takes $\lambda = \tfrac{0.001}{m} \|A^{T} b \|_\infty$.
  • Figure 3: Square root Lasso (square_root_lasso_prob): Comparison between AC-FGM, NS-AGD, NS-FGM, and AdGD in terms of the number of iterations for datasets gisette (column 1) and YearPredictionMSD.test (column 2). Row 1 takes $\lambda = 100\cdot m^{-1/2}\Phi^{-1}(1-0.01/n)$ and Row 2 takes $\lambda = 1000\cdot m^{-1/2}\Phi^{-1}(1-0.01/n)$. We set $\epsilon=10^{-8}$ when implementing the algorithms.
  • Figure 4: Sparse logistic regression (logistic_regression): Comparison between AC-FGM, NS-AGD, NS-FGM, and AdGD in terms of the number of iterations for datasets gisette, rcv1.binary, real-sim, and covtype.binary. Penalty parameter: $\lambda = 0.001\|A^{T} b\|_\infty$.
  • Figure 5: Sparse logistic regression (logistic_regression): Comparison between AC-FGM, NS-AGD, NS-FGM, and AdGD in terms of the number of iterations for datasets gisette, rcv1.binary, real-sim, and covtype.binary. Penalty parameter: $\lambda = 0.005\|A^{T} b\|_\infty$.
  • ...and 1 more figure
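The penalty parameters quoted in the captions for Figures 2 and 3 can be computed as in the minimal sketch below. It assumes that $m$ and $n$ are the numbers of rows and columns of $A$ and that $\Phi$ is the standard normal CDF; the captions do not define these symbols, so both are assumptions. The helper names `lasso_penalty` and `sqrt_lasso_penalty` and the small matrix are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def lasso_penalty(A, b, scale=0.01):
    """lambda = (scale / m) * ||A^T b||_inf, with m assumed to be the row count of A."""
    m = A.shape[0]
    return scale / m * np.linalg.norm(A.T @ b, ord=np.inf)

def sqrt_lasso_penalty(A, c=100.0, alpha=0.01):
    """lambda = c * m^(-1/2) * Phi^(-1)(1 - alpha / n), with (m, n) assumed to be A.shape."""
    m, n = A.shape
    # NormalDist().inv_cdf is the inverse of the standard normal CDF.
    return c / math.sqrt(m) * NormalDist().inv_cdf(1.0 - alpha / n)

# Tiny illustrative data set (3 samples, 2 features).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 0.0, -1.0])

lam_lasso = lasso_penalty(A, b)    # Row 1 setting of Figure 2
lam_sqrt = sqrt_lasso_penalty(A)   # Row 1 setting of Figure 3
print(lam_lasso, lam_sqrt)
```

Passing `scale=0.001` or `c=1000.0` gives the Row 2 settings of the respective figures.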

Theorems & Definitions (16)

  • proposition
  • proof
  • theorem 1
  • proof
  • corollary
  • proof
  • corollary
  • proof
  • lemma
  • proof
  • ...and 6 more