Adaptive Backtracking Line Search

Joao V. Cavalcanti; Laurent Lessard; Ashia C. Wilson

TL;DR

Adaptive Backtracking Line Search (ABLS) replaces the fixed backtracking factor in Armijo and descent-lemma line searches with an online, violation-aware factor $\hat{\rho}(v(\alpha_k))$, speeding up optimization at no additional computational cost. For convex problems, the authors prove that ABLS uses no more function evaluations than standard backtracking. For nonconvex smooth problems, they extend the global convergence guarantees of standard backtracking to ABLS, and they show that it preserves the convergence rates of gradient descent (GD) and accelerated gradient descent (AGD). Experiments on logistic regression, linear inverse problems, Rosenbrock, and matrix factorization show fewer function and gradient evaluations and shorter runtimes. The work offers a broadly applicable template for adaptive line search that can speed up deterministic and proximal optimization without sacrificing theoretical guarantees.
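One way to picture a violation-aware factor is on the descent-lemma criterion for a gradient step $y = x - \alpha\nabla F(x)$. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The violation measure `v`, which satisfies $v \le 1$ exactly when $F(y) \le F(x) + \nabla F(x)^\top (y - x) + \|y - x\|^2/(2\alpha)$, and the adaptive factor `rho / v` are hypothetical choices motivated by a local quadratic model, standing in for the paper's $\hat{\rho}(v(\alpha_k))$. The helper names `violation` and `backtrack` and the test quadratic are made up for illustration.

```python
# Minimal sketch (not the paper's code): regular vs. adaptive backtracking on
# the descent-lemma criterion for a gradient step y = x - alpha * g.
# ASSUMPTIONS: the violation measure v and the adaptive factor rho / v are
# hypothetical choices, not necessarily the paper's rho-hat(v(alpha_k)).
from typing import Callable, List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def violation(f: Callable[[Vec], float], x: Vec, g: Vec, alpha: float) -> float:
    """v = 2*alpha*[F(y) - F(x) - <g, y - x>] / ||y - x||^2, y = x - alpha*g.

    The descent-lemma test F(y) <= F(x) + <g, y - x> + ||y - x||^2 / (2*alpha)
    holds exactly when v <= 1.
    """
    step = [-alpha * gi for gi in g]  # y - x
    sq = dot(step, step)
    if sq == 0.0:
        return 0.0  # zero gradient: the test holds trivially
    y = [xi + si for xi, si in zip(x, step)]
    return 2.0 * alpha * (f(y) - f(x) - dot(g, step)) / sq


def backtrack(f: Callable[[Vec], float], x: Vec, g: Vec, alpha0: float,
              rho: float = 0.5, adaptive: bool = False) -> Tuple[float, int]:
    """Shrink alpha until the descent-lemma test holds; return (alpha, adjustments)."""
    alpha, adjustments = alpha0, 0
    v = violation(f, x, g, alpha)
    while v > 1.0:
        # Regular: constant factor rho. Adaptive (hypothetical): rho / v, which
        # is below rho because v > 1 inside this loop.
        alpha *= (rho / v) if adaptive else rho
        adjustments += 1
        v = violation(f, x, g, alpha)
    return alpha, adjustments


# Usage on a hypothetical ill-conditioned quadratic F(x) = 0.5 * sum(q_i * x_i^2).
q = [1.0, 100.0]


def f(x: Vec) -> float:
    return 0.5 * sum(qi * xi * xi for qi, xi in zip(q, x))


x0 = [1.0, 1.0]
g0 = [qi * xi for qi, xi in zip(q, x0)]
print(backtrack(f, x0, g0, alpha0=1.0))                 # (0.0078125, 7)
print(backtrack(f, x0, g0, alpha0=1.0, adaptive=True))  # one adjustment
```

On a quadratic, `rho / v` lands at exactly `rho` times the largest feasible step, so the adaptive search stops after a single adjustment, whereas repeated halving from `alpha0 = 1` needs seven.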

Abstract

Backtracking line search is foundational in numerical optimization. The basic idea is to adjust the step-size of an algorithm by a constant factor until some chosen criterion (e.g., Armijo, Descent Lemma) is satisfied. We propose a novel way to adjust step-sizes, replacing the constant factor used in regular backtracking with one that takes into account the degree to which the chosen criterion is violated, with no additional computational burden. This lightweight adjustment leads to significantly faster optimization, which we confirm by performing a variety of experiments on over fifteen real-world datasets. For convex problems, we prove adaptive backtracking requires no more adjustments to produce a feasible step-size than regular backtracking does. For nonconvex smooth problems, we prove adaptive backtracking enjoys the same guarantees as regular backtracking. Furthermore, we prove adaptive backtracking preserves the convergence rates of gradient descent and its accelerated variant.
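To make the abstract's comparison concrete, here is a minimal sketch of one Armijo backtracking call in regular and adaptive form, assuming the standard Armijo test $F(x + \alpha d) \le F(x) + c\,\alpha\,\nabla F(x)^\top d$ with a descent direction $d$. The adaptive factor `rho * (1 - c) / (1 - r)`, where `r` is the ratio of actual decrease to first-order predicted decrease, is a hypothetical choice derived from a one-dimensional quadratic model and is not necessarily the paper's $\hat{\rho}$. It never exceeds `rho` while the test fails, which, combined with the monotonicity in Proposition 1, is what keeps its adjustment count no higher than regular backtracking on convex problems. The function names and the toy regularized logistic-regression data are made up.

```python
# Minimal sketch (not the paper's code) of one Armijo backtracking call.
# ASSUMPTIONS: standard Armijo test F(x + a*d) <= F(x) + c*a*<g, d> with a
# descent direction d, and a HYPOTHETICAL adaptive factor rho*(1 - c)/(1 - r)
# standing in for the paper's rho-hat.
import math
from typing import Callable, List, Tuple

Vec = List[float]


def armijo_backtrack(f: Callable[[Vec], float], x: Vec, g: Vec, d: Vec,
                     alpha0: float, c: float = 1e-4, rho: float = 0.5,
                     adaptive: bool = False, max_adjust: int = 100) -> Tuple[float, int]:
    """Return (step, adjustments); requires <g, d> < 0."""
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(g, d))  # directional derivative (< 0)
    alpha = alpha0
    for k in range(max_adjust + 1):
        fy = f([xi + alpha * di for xi, di in zip(x, d)])
        r = (fy - fx) / (alpha * slope)  # actual / predicted decrease; Armijo iff r >= c
        if r >= c:
            return alpha, k
        # Regular: shrink by rho. Adaptive: shrink harder the worse the violation;
        # r < c here, so (1 - c) / (1 - r) < 1 and the factor stays below rho.
        alpha *= rho * (1.0 - c) / (1.0 - r) if adaptive else rho
    raise RuntimeError("no Armijo step found within max_adjust adjustments")


# Hypothetical toy problem: L2-regularized logistic regression on made-up data.
A = [(1.0, 2.0), (-1.0, 0.5), (2.0, -1.0)]
Y = [1.0, -1.0, 1.0]
LAM = 1.0


def softplus(t: float) -> float:
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # stable log(1 + e^t)


def sigmoid(t: float) -> float:
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def loss(w: Vec) -> float:
    data = sum(softplus(-y * (a[0] * w[0] + a[1] * w[1])) for a, y in zip(A, Y))
    return data + 0.5 * LAM * (w[0] ** 2 + w[1] ** 2)


def grad(w: Vec) -> Vec:
    g = [LAM * w[0], LAM * w[1]]
    for a, y in zip(A, Y):
        s = sigmoid(-y * (a[0] * w[0] + a[1] * w[1]))  # derivative of softplus
        g[0] -= y * a[0] * s
        g[1] -= y * a[1] * s
    return g


w0 = [0.0, 0.0]
g0 = grad(w0)
d0 = [-gi for gi in g0]  # steepest-descent direction
print(armijo_backtrack(loss, w0, g0, d0, alpha0=10.0))
print(armijo_backtrack(loss, w0, g0, d0, alpha0=10.0, adaptive=True))
```

Both calls cost one function evaluation per trial step, so the adaptive variant adds no work per adjustment; any savings come only from needing fewer adjustments.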

Paper Structure (39 sections, 17 theorems, 112 equations, 25 figures, 5 tables, 6 algorithms)

Introduction
Contributions.
Adaptive backtracking
Line search: criteria and search procedures
Criteria.
Search procedures.
Adaptive backtracking
Related Work
Case study: Armijo condition
Backtracking and AGD.
Case study: descent lemma
Empirical performance
Convex objective: logistic regression + Armijo
Convex objective: linear inverse problems + descent Lemma
Nonconvex objective: Rosenbrock + Armijo
...and 24 more sections

Key Result

Proposition 1

Let $F$ be convex and differentiable. If a point $x_{k}$, a direction $d_{k}$, and a step-size $\alpha_{k}>0$ satisfy the Armijo condition $F(x_{k}+\alpha_{k}d_{k}) \le F(x_{k}) + c\,\alpha_{k}\nabla F(x_{k})^\top d_{k}$ for some $c$, then $x_{k}$, $d_{k}$, and $\alpha_{k}'$ also satisfy it for any $\alpha_{k}' \in (0,\alpha_{k})$.
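Proposition 1 follows from convexity of $F$ along the search ray. A short derivation, assuming the Armijo condition takes the standard form $F(x_k+\alpha d_k) \le F(x_k) + c\,\alpha\,\nabla F(x_k)^\top d_k$; the notation $\varphi$ and $t$ is introduced here, and this is a sketch rather than necessarily the authors' proof:

```latex
\begin{aligned}
&\text{Let } \varphi(\alpha) = F(x_k + \alpha d_k), \quad t = \alpha_k'/\alpha_k \in (0,1).\\
&\varphi \text{ is convex, so } \varphi(\alpha_k') = \varphi\big((1-t)\cdot 0 + t\,\alpha_k\big)
  \le (1-t)\,\varphi(0) + t\,\varphi(\alpha_k)\\
&\qquad \le (1-t)\,F(x_k) + t\big(F(x_k) + c\,\alpha_k \nabla F(x_k)^\top d_k\big)\\
&\qquad = F(x_k) + c\,\alpha_k' \nabla F(x_k)^\top d_k .
\end{aligned}
```

One way to see why this monotonicity matters: if an adaptive factor never exceeds the constant $\rho$, then after $j$ adjustments the adaptive step (if the search has not already stopped) is at most $\rho^j\alpha_0$, which by Proposition 1 is feasible whenever regular backtracking's $j$-th step is. Hence, in the convex case, adaptive backtracking needs no more adjustments.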

Figures (25)

Figure 1: Baseline: GD with constant $\alpha_{k}=1/\bar{L}$; reg ($\rho, \beta$) and ad ($\rho, \beta$): GD with, respectively, regular and adaptive memoryless BLS parameterized by $\rho$ and $\alpha_{0}=\beta/\bar{L}$.
Figure 2: Baseline: AGD with constant $\alpha_{k}=1/\bar{L}$; reg ($\rho, \beta$) and ad ($\rho, \beta$): AGD with, respectively, regular and adaptive memoryless BLS parameterized by $\rho$ and $\alpha_{0}=\beta/\bar{L}$.
Figure 3: Performance of GD and AGD regular (red) and adaptive (blue) BLS variants on Rosenbrock. "loss" refers to the final loss after 1000 iterations.
Figure 4: MLP trained on MNIST with different algorithms.
Figure 5: Regular backtracking returns the greatest feasible step size after one adjustment.
...and 20 more figures
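The Figure 1 and 2 captions describe memoryless backtracking parameterized by $\rho$ and an initial step $\alpha_0 = \beta/\bar{L}$. A minimal sketch of what "memoryless" means: every GD iteration restarts the search at $\alpha_0$ rather than reusing the previous step. It assumes an Armijo criterion, a hypothetical diagonal quadratic with known $\bar{L}$, and an illustrative adaptive factor `rho * (1 - c) / (1 - r)` (with `r` the ratio of actual to first-order predicted decrease) standing in for $\hat{\rho}$. The function `gd` and its `mode` names are made up.

```python
# Minimal sketch (not the paper's experimental code) of GD with *memoryless*
# backtracking: each iteration restarts the search at alpha0 = beta / L_bar.
# ASSUMPTIONS: Armijo criterion, a hypothetical diagonal quadratic, and an
# illustrative adaptive factor rho*(1 - c)/(1 - r) standing in for rho-hat.
from typing import List, Tuple

Q = [1.0, 100.0]  # hypothetical F(x) = 0.5 * sum(q_i * x_i^2)
L_BAR = max(Q)    # smoothness constant of this quadratic


def F(x: List[float]) -> float:
    return 0.5 * sum(q * xi * xi for q, xi in zip(Q, x))


def grad_F(x: List[float]) -> List[float]:
    return [q * xi for q, xi in zip(Q, x)]


def gd(x: List[float], iters: int = 50, rho: float = 0.5, beta: float = 10.0,
       c: float = 1e-4, mode: str = "adaptive") -> Tuple[List[float], int]:
    """GD with mode 'constant' (alpha = 1/L_bar), 'regular', or 'adaptive'.

    Returns the final iterate and the number of trial-point evaluations of F
    made by the line search."""
    evals = 0
    for _ in range(iters):
        g = grad_F(x)
        gg = sum(gi * gi for gi in g)  # ||g||^2 = -<g, d> for d = -g
        if gg == 0.0:
            break  # stationary point reached
        if mode == "constant":
            x = [xi - gi / L_BAR for xi, gi in zip(x, g)]
            continue
        fx, alpha = F(x), beta / L_BAR  # memoryless: always restart at alpha0
        while True:
            y = [xi - alpha * gi for xi, gi in zip(x, g)]
            evals += 1
            r = (F(y) - fx) / (-alpha * gg)  # Armijo holds iff r >= c
            if r >= c:
                break
            alpha *= rho * (1 - c) / (1 - r) if mode == "adaptive" else rho
        x = y
    return x, evals


for mode in ("constant", "regular", "adaptive"):
    x_final, evals = gd([1.0, 1.0], mode=mode)
    print(mode, F(x_final), evals)
```

Sweeping `beta` and `rho` and comparing `evals` mirrors the reg/ad $(\rho, \beta)$ grid in the captions, though this toy problem says nothing about the results the paper reports.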

Theorems & Definitions (42)

Proposition 1
Definition 1: Compatibility
Proposition 2
Definition 2: Smoothness
Definition 3: Gradient related
Example 1: Fundamental obstacle
Example 2: Fact 1
Example 3: Facts 2 and 3
Proposition 3
proof
...and 32 more
