A New Linesearch for Accelerated Composite Minimization

Reza Rahimi Baghbadorani, Sergio Grammatico, Peyman Mohajerin Esfahani

TL;DR

This work addresses the long-standing challenge of selecting stepsizes in first-order convex optimization without relying on a known global smoothness constant. It introduces a novel zero-order linesearch that relies only on function evaluations, applied to both non-accelerated and accelerated gradient methods through a gradient-mapping framework for composite objectives. The authors prove convergence guarantees, achieving O(1/k) for non-accelerated and O(1/k^2) for accelerated schemes, and demonstrate near-optimal performance on smooth, composite, and non-convex problems. The approach is hyperparameter-free for the composite setting and shows strong empirical performance across diverse problem classes, suggesting broad practical impact for large-scale optimization tasks.

Abstract

The choice of the stepsize in first-order convex optimization is typically based on the smoothness constant and plays a crucial role in the performance of algorithms. Recently, there has been a resurgence of interest in adaptive stepsizes that do not explicitly depend on the smoothness constant. In this paper, we propose a novel linesearch stepsize rule based on function evaluations (i.e., zero-order information) that enjoys provable convergence guarantees for both accelerated and non-accelerated gradient descent. We further discuss the similarities and differences between the proposed stepsize regimes and existing stepsize rules (including Polyak and Armijo). We numerically benchmark the performance of our proposed algorithms against state-of-the-art methods across three major problem classes: (1) smooth minimization (logistic regression, quadratic programs, log-sum-exponential, and smooth max-cut relaxation), (2) composite minimization ($\ell_1$-regularized least-squares, $\ell_1$-constrained least-squares, and $\ell_1$-regularized logistic regression), and (3) non-convex minimization (cubic minimization). These classes cover a wide range of operations research and management applications such as portfolio optimization, discrete choice models, sparse classification and feature selection, high-order optimization, and trust-region subproblems.
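For orientation, the two existing rules named in the abstract can be sketched in their textbook forms for smooth minimization. This is only a minimal illustration of the baselines, not the linesearch proposed in the paper. The function names, the parameters `lam0`, `c`, `beta`, and the toy quadratic are all assumptions introduced here.

```python
import numpy as np

def armijo_step(f, x, g, lam0=1.0, c=1e-4, beta=0.5, max_backtracks=50):
    """Textbook Armijo backtracking along the negative gradient direction.

    Shrinks lam by the factor beta until the sufficient-decrease test
    f(x - lam*g) <= f(x) - c*lam*||g||^2 holds (or the budget runs out).
    """
    fx = f(x)
    gg = g @ g
    lam = lam0
    for _ in range(max_backtracks):
        if f(x - lam * g) <= fx - c * lam * gg:
            break
        lam *= beta
    return lam

def polyak_step(fx, f_star, g):
    """Textbook Polyak stepsize; requires the optimal value f_star."""
    return (fx - f_star) / (g @ g)

# Hypothetical test problem: f(x) = 0.5 * x^T Q x, whose minimum value is 0.
Q = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

x = np.array([1.0, 1.0])
g = grad(x)
lam_armijo = armijo_step(f, x, g)
lam_polyak = polyak_step(f(x), 0.0, g)
print(lam_armijo, lam_polyak)
```

Neither rule needs the global smoothness constant, but Armijo spends extra function evaluations on backtracking and Polyak assumes the optimal value is known, which gives some context for the comparison the abstract describes.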

Paper Structure (19 sections, 6 theorems, 57 equations, 6 figures, 2 tables)

Key Result

Lemma 2.2

Let $G^{f}_{\lambda h}(x)$ be the gradient mapping for a smooth convex function $f$, a possibly nonsmooth function $h$, and a positive constant $\lambda \in \mathbb{R}_+$.
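The paper's own definition of the gradient mapping is not reproduced on this page. A minimal sketch, assuming the standard proximal form $G^{f}_{\lambda h}(x) = \frac{1}{\lambda}\big(x - \mathrm{prox}_{\lambda h}(x - \lambda \nabla f(x))\big)$, might look as follows. The helper names, the $\ell_1$ regularizer with weight `mu`, and the small least-squares instance are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def gradient_mapping(x, grad_f, prox_h, lam):
    """Assumed standard form: (x - prox_{lam*h}(x - lam*grad_f(x))) / lam."""
    return (x - prox_h(x - lam * grad_f(x), lam)) / lam

# Hypothetical composite problem: f(x) = 0.5*||A x - b||^2, h(x) = mu*||x||_1.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 2.0])
mu = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_h = lambda v, lam: soft_threshold(v, lam * mu)

x = np.zeros(2)
lam = 0.25
G = gradient_mapping(x, grad_f, prox_h, lam)

# A step along -G reproduces one proximal-gradient update.
x_next = x - lam * G

# With h = 0 the prox is the identity and G reduces to the plain gradient.
G_smooth = gradient_mapping(x, grad_f, lambda v, lam: v, lam)
print(G, x_next, G_smooth)
```

Under this assumed form, the gradient mapping plays the role of the gradient in the composite setting, which is why smooth-case stepsize arguments can carry over to objectives of the form $f + h$.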

Figures (6)

  • Figure 1: Geometric interpretation of different stepsize rules using the function $\phi_k(\lambda)$.
  • Figure 2: Initial choice of $\lambda_0$ at the $({k+1})^{\text{th}}$ iteration.
  • Figure 3: The results for the class (1) smooth minimization. The first row shows the optimality gap, and the second row shows the stepsize behavior.
  • Figure 4: Approximate maximum eigenvalue (regularized max-cut dual). The first row shows the optimality gap, and the second row shows the stepsize behavior.
  • Figure 5: The results for the class (2) composite minimization. The first row shows the optimality gap, and the second row shows the stepsize behavior.
  • ...and 1 more figure

Theorems & Definitions (10)

  • Lemma 2.2: Gradient mapping
  • Theorem 2.3: Non-accelerated adaptive stepsize
  • proof
  • Corollary 2.4: Locally smooth function
  • proof
  • Remark 2.5: Approximate adaptive stepsize rule
  • Corollary 2.6: Convergence of smooth minimization
  • Theorem 3.1: Accelerated adaptive stepsize
  • proof
  • Corollary 3.2: Accelerated smooth minimization