Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search

Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

TL;DR

This work provides explicit, non-asymptotic global convergence rates for BFGS with Armijo-Wolfe line search on $\mu$-strongly convex, $L$-smooth functions, achieving a global linear rate of $\left(1 - \frac{1}{\kappa}\right)^t$ with $\kappa=\frac{L}{\mu}$, and, when the Hessian is Lipschitz, a rate independent of $\kappa$ after sufficient iterations. It also establishes a global non-asymptotic superlinear rate of $\mathcal{O}\left((\frac{C}{t})^t\right)$, where $C$ depends on problem size, condition number, and initialization via $B_0$. The results culminate in a global complexity characterization for BFGS with Armijo-Wolfe, and a log-bisection scheme for efficiently enforcing the line-search conditions. Together, these findings quantitatively connect initialization, line-search parameters, and problem regularity to global convergence behavior, offering practical guidance for deploying BFGS in strongly convex settings.
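For reference, the standard textbook form of the objects named above can be written as follows. The symbols $\alpha$, $\beta$, $\eta_t$, $d_t$, $s_t$, and $y_t$ are notation assumed here for illustration and may differ from the paper's own. With search direction $d_t = -B_t^{-1}\nabla f(x_t)$ and line-search parameters $0 < \alpha < \beta < 1$, a step size $\eta_t > 0$ satisfies the Armijo-Wolfe conditions when

$$
f(x_t + \eta_t d_t) \le f(x_t) + \alpha\, \eta_t\, \nabla f(x_t)^\top d_t,
\qquad
\nabla f(x_t + \eta_t d_t)^\top d_t \ge \beta\, \nabla f(x_t)^\top d_t .
$$

With $s_t = x_{t+1} - x_t$ and $y_t = \nabla f(x_{t+1}) - \nabla f(x_t)$, the BFGS update of the Hessian approximation is

$$
B_{t+1} = B_t - \frac{B_t s_t s_t^\top B_t}{s_t^\top B_t s_t} + \frac{y_t y_t^\top}{s_t^\top y_t} .
$$

The second (curvature) condition gives $s_t^\top y_t \ge (\beta - 1)\,\eta_t \nabla f(x_t)^\top d_t > 0$ whenever $d_t$ is a descent direction, so $B_{t+1}$ remains positive definite whenever $B_t$ is.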

Abstract

In this paper, we present the first explicit and non-asymptotic global convergence rates of the BFGS method when implemented with an inexact line search scheme satisfying the Armijo-Wolfe conditions. We show that BFGS achieves a global linear convergence rate of $\left(1 - \frac{1}{\kappa}\right)^t$ for $\mu$-strongly convex functions with $L$-Lipschitz gradients, where $\kappa = \frac{L}{\mu}$ represents the condition number. Additionally, if the objective function's Hessian is Lipschitz, BFGS with the Armijo-Wolfe line search achieves a linear convergence rate that depends solely on the line search parameters, independent of the condition number. We also establish a global superlinear convergence rate of $\mathcal{O}\left(\left(\frac{1}{t}\right)^t\right)$. These global bounds are all valid for any starting point $x_0$ and any symmetric positive definite initial Hessian approximation matrix $B_0$, though the choice of $B_0$ impacts the number of iterations needed to achieve these rates. By synthesizing these results, we outline the first global complexity characterization of BFGS with the Armijo-Wolfe line search. Additionally, we clearly define a mechanism for selecting the step size to satisfy the Armijo-Wolfe conditions and characterize its overall complexity.
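To make the setting concrete, here is a minimal pure-Python sketch of BFGS with an inexact line search that enforces the Armijo-Wolfe conditions. It is an illustration under stated assumptions, not the paper's algorithm. The step search uses plain doubling and bisection rather than the paper's log-bisection scheme. The code maintains the inverse approximation $H_t = B_t^{-1}$ (so `H0` plays the role of $B_0^{-1}$). The function names `wolfe_step` and `bfgs`, the defaults `alpha=1e-4` and `beta=0.9`, and the test quadratic are all hypothetical choices.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(M: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in M]


def wolfe_step(phi: Callable[[float], float], dphi: Callable[[float], float],
               alpha: float = 1e-4, beta: float = 0.9, max_iter: int = 60) -> float:
    """Search for eta satisfying the Armijo-Wolfe (weak Wolfe) conditions.

    phi(eta) = f(x + eta * d) and dphi is its derivative; requires dphi(0) < 0.
    Plain doubling/bisection is used here (an assumption, not the paper's scheme).
    """
    phi0, dphi0 = phi(0.0), dphi(0.0)
    lo, hi, eta = 0.0, float("inf"), 1.0
    for _ in range(max_iter):
        if phi(eta) > phi0 + alpha * eta * dphi0:
            hi = eta            # sufficient decrease fails: step too long
        elif dphi(eta) < beta * dphi0:
            lo = eta            # curvature condition fails: step too short
        else:
            return eta          # both conditions hold
        eta = 2.0 * lo if hi == float("inf") else 0.5 * (lo + hi)
    return eta                  # fallback: last trial step


def bfgs(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
         x0: Vector, H0: Matrix, tol: float = 1e-6,
         max_iter: int = 100) -> Tuple[Vector, int]:
    """BFGS in inverse form; returns (approximate minimizer, iterations used)."""
    x, H = list(x0), [row[:] for row in H0]
    g, n = grad(x), len(x0)
    for t in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:
            return x, t
        d = [-v for v in matvec(H, g)]          # quasi-Newton direction
        phi = lambda eta: f([xi + eta * di for xi, di in zip(x, d)])
        dphi = lambda eta: dot(grad([xi + eta * di for xi, di in zip(x, d)]), d)
        eta = wolfe_step(phi, dphi)
        s = [eta * di for di in d]
        x_new = [xi + si for xi, si in zip(x, s)]
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)                          # positive under the curvature condition
        Hy = matvec(H, y)
        yHy = dot(y, Hy)
        # Inverse BFGS update: H+ = (I - s y^T / sy) H (I - y s^T / sy) + s s^T / sy
        for i in range(n):
            for j in range(n):
                H[i][j] += (-(Hy[i] * s[j] + s[i] * Hy[j]) / sy
                            + (1.0 + yHy / sy) * s[i] * s[j] / sy)
        x, g = x_new, g_new
    return x, max_iter


# Usage on a strongly convex quadratic with mu = 1, L = 10 (kappa = 10).
a = [1.0, 4.0, 10.0]
f = lambda x: sum(0.5 * ai * xi * xi - xi for ai, xi in zip(a, x))
grad = lambda x: [ai * xi - 1.0 for ai, xi in zip(a, x)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
x_star, iters = bfgs(f, grad, [5.0, -3.0, 2.0], identity)
print(x_star, iters)
```

Passing a different `H0` (for example a scaled identity) is one way to experiment with how initialization changes the number of iterations, which is the dependence on $B_0$ that the abstract highlights.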

Paper Structure

This paper contains 37 sections, 24 theorems, 180 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Lemma 2.1

Consider the BFGS method with Armijo-Wolfe inexact line search, where the step size satisfies the sufficient decrease and curvature conditions. Then, for any initial point $x_0$ and any symmetric positive definite initial Hessian approximation matrix $B_0$, the following results hold fo

Figures (1)

  • Figure 1: Convergence curves of BFGS with inexact line search for different choices of $B_0$, and of gradient descent with backtracking line search.

Theorems & Definitions (36)

  • Lemma 2.1
  • Remark 2.1
  • Proposition 3.1
  • Theorem 4.1
  • Remark 4.1
  • Corollary 4.2
  • Proposition 5.1
  • Theorem 5.2
  • Corollary 5.3
  • Lemma 6.1
  • ...and 26 more