Adaptive Backtracking Line Search

Joao V. Cavalcanti, Laurent Lessard, Ashia C. Wilson

TL;DR

Adaptive Backtracking Line Search (ABLS) introduces an online, violation-aware step-size factor $\hat{\rho}(v(\alpha_k))$ that replaces the fixed backtracking scale in Armijo and descent-lemma line searches, reaching a feasible step-size sooner with no additional computation. For convex problems, the authors prove that ABLS needs no more step-size adjustments (and hence no more function evaluations) than standard backtracking. For nonconvex smooth problems, they show that ABLS retains the global convergence guarantees of standard backtracking, and that it preserves the convergence rates of gradient descent (GD) and accelerated gradient descent (AGD). Experiments on logistic regression, linear inverse problems, the Rosenbrock function, and matrix factorization show fewer function and gradient evaluations and shorter runtimes. The work offers a broadly applicable template for adaptive line search that can speed up optimization in deterministic and proximal settings without sacrificing theoretical guarantees.

Abstract

Backtracking line search is foundational in numerical optimization. The basic idea is to adjust the step-size of an algorithm by a constant factor until some chosen criterion (e.g., Armijo, Descent Lemma) is satisfied. We propose a novel way to adjust step-sizes, replacing the constant factor used in regular backtracking with one that takes into account the degree to which the chosen criterion is violated, with no additional computational burden. This lightweight adjustment leads to significantly faster optimization, which we confirm by performing a variety of experiments on over fifteen real-world datasets. For convex problems, we prove adaptive backtracking requires no more adjustments to produce a feasible step-size than regular backtracking does. For nonconvex smooth problems, we prove adaptive backtracking enjoys the same guarantees as regular backtracking. Furthermore, we prove adaptive backtracking preserves the convergence rates of gradient descent and its accelerated variant.
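The difference between the two schemes can be sketched in a few lines. The block below is a minimal illustration, not the authors' implementation: the function name `backtracking`, its parameters, and in particular the formula used for the adaptive factor are assumptions chosen for the example (the text above only says that the factor depends on how strongly the criterion is violated). The violation measure `v` is the ratio of the actual decrease to the decrease demanded by the Armijo condition, so `v >= 1` means the step is feasible.

```python
import numpy as np

def backtracking(f, grad, x, d, alpha0=1.0, c=1e-4, rho=0.5,
                 adaptive=False, eps=1e-3, max_adjust=100):
    """Return (alpha, n_adjust) where alpha satisfies the Armijo condition.

    Hypothetical helper for illustration; d is assumed to be a descent direction.
    """
    fx = f(x)
    slope = float(np.dot(grad(x), d))  # directional derivative, assumed negative
    alpha = alpha0
    for n_adjust in range(max_adjust + 1):
        # v >= 1 exactly when f(x + alpha d) <= f(x) + c * alpha * slope
        v = (f(x + alpha * d) - fx) / (c * alpha * slope)
        if v >= 1.0:
            return alpha, n_adjust
        if adaptive:
            # Assumed violation-aware factor: equals rho when v is close to 1
            # and shrinks as the violation grows; eps keeps it positive.
            factor = max(eps, rho * (1.0 - c) / (1.0 - c * v))
        else:
            factor = rho  # regular backtracking: constant factor
        alpha *= factor
    raise RuntimeError("no feasible step-size found")

# Usage on an ill-conditioned quadratic (illustrative data, not from the paper).
A = np.array([1.0, 100.0])
f = lambda x: 0.5 * np.sum(A * x * x)
grad = lambda x: A * x
x0 = np.array([1.0, 1.0])
d0 = -grad(x0)

alpha_reg, n_reg = backtracking(f, grad, x0, d0, adaptive=False)
alpha_ad, n_ad = backtracking(f, grad, x0, d0, adaptive=True)
print("regular :", alpha_reg, n_reg)
print("adaptive:", alpha_ad, n_ad)
```

Both variants evaluate `f` once per candidate step-size, so the adaptive factor reuses a quantity that regular backtracking already computes; that is the sense in which it adds no computational burden in this sketch.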

Paper Structure

This paper contains 39 sections, 17 theorems, 112 equations, 25 figures, 5 tables, 6 algorithms.

Key Result

Proposition 1

Let $F$ be convex and differentiable. If a point $x_{k}$, a direction $d_{k}$, and a step-size $\alpha_{k}>0$ satisfy the Armijo condition for some $c$, then $x_{k}$, $d_{k}$, and $\alpha_{k}'$ also satisfy it for any $\alpha_{k}'\in \mathopen{(}0,\alpha_{k}\mathclose{)}$.
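Proposition 1 can be checked numerically on a small example. The sketch below assumes the standard form of the Armijo condition, $F(x_k+\alpha d_k)\le F(x_k)+c\,\alpha\,\langle\nabla F(x_k),d_k\rangle$, which this summary does not spell out; the test function, the constant $c$, and the helper name `armijo_holds` are illustrative choices. It finds one feasible step-size and then confirms that every smaller step-size on a grid is feasible as well.

```python
import numpy as np

def armijo_holds(f, grad, x, d, alpha, c):
    """True if f(x + alpha d) <= f(x) + c * alpha * <grad f(x), d>."""
    return f(x + alpha * d) <= f(x) + c * alpha * float(np.dot(grad(x), d))

# A convex, differentiable test function (illustrative, not from the paper).
f = lambda x: np.sum(x ** 4) + 0.5 * np.sum(x ** 2)
grad = lambda x: 4.0 * x ** 3 + x

x = np.array([1.0, -2.0])
d = -grad(x)          # steepest-descent direction
c = 0.1
alpha = 0.05          # a step-size that happens to be feasible here

feasible_at_alpha = armijo_holds(f, grad, x, d, alpha, c)

# Every smaller step-size on a grid inside (0, alpha) should also be feasible.
smaller = np.linspace(alpha / 1000.0, alpha, 200, endpoint=False)
all_smaller_feasible = all(armijo_holds(f, grad, x, d, a, c) for a in smaller)

print(feasible_at_alpha, all_smaller_feasible)
```

This monotonicity is what makes a more aggressive shrink factor safe in the convex case: once a step-size is feasible, shrinking further cannot leave the feasible set.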

Figures (25)

  • Figure 1: Baseline: GD with constant $\alpha_{k}=1/\bar{L}$; reg ($\rho, \beta$) and ad ($\rho, \beta$): GD with, respectively, regular and adaptive memoryless BLS parameterized by $\rho$ and $\alpha_{0}=\beta/\bar{L}$.
  • Figure 2: Baseline: AGD with constant $\alpha_{k}=1/\bar{L}$; reg ($\rho, \beta$) and ad ($\rho, \beta$): AGD with, respectively, regular and adaptive memoryless BLS parameterized by $\rho$ and $\alpha_{0}=\beta/\bar{L}$.
  • Figure 3: Performance of GD and AGD regular (red) and adaptive (blue) BLS variants on Rosenbrock. "loss" refers to the final loss after 1000 iterations.
  • Figure 4: MLP trained on MNIST with different algorithms.
  • Figure 5: Regular backtracking returns the greatest feasible step size after one adjustment.
  • ...and 20 more figures

Theorems & Definitions (42)

  • Proposition 1
  • Definition 1: Compatibility
  • Proposition 2
  • Definition 2: Smoothness
  • Definition 3: Gradient related
  • Example 1: Fundamental obstacle
  • Example 2: Fact 1
  • Example 3: Facts 2 and 3
  • Proposition 3
  • proof
  • ...and 32 more