Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search
Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari
TL;DR
This work provides explicit, non-asymptotic global convergence rates for BFGS with the Armijo-Wolfe line search on $\mu$-strongly convex, $L$-smooth functions. It shows a global linear rate of $\left(1 - \frac{1}{\kappa}\right)^t$ with $\kappa=\frac{L}{\mu}$ and, when the Hessian is Lipschitz, a linear rate independent of $\kappa$ after sufficiently many iterations. It also establishes a global non-asymptotic superlinear rate of $\mathcal{O}\left(\left(\frac{C}{t}\right)^t\right)$, where $C$ depends on problem size, condition number, and initialization via $B_0$. These results yield a global complexity characterization for BFGS with the Armijo-Wolfe line search, together with a log-bisection scheme for efficiently enforcing the line-search conditions. Together, these findings quantitatively connect initialization, line-search parameters, and problem regularity to global convergence behavior, offering practical guidance for deploying BFGS in strongly convex settings.
Abstract
In this paper, we present the first explicit and non-asymptotic global convergence rates of the BFGS method when implemented with an inexact line search scheme satisfying the Armijo-Wolfe conditions. We show that BFGS achieves a global linear convergence rate of $\left(1 - \frac{1}{\kappa}\right)^t$ for $\mu$-strongly convex functions with $L$-Lipschitz gradients, where $\kappa = \frac{L}{\mu}$ represents the condition number. Additionally, if the objective function's Hessian is Lipschitz, BFGS with the Armijo-Wolfe line search achieves a linear convergence rate that depends solely on the line search parameters, independent of the condition number. We also establish a global superlinear convergence rate of $\mathcal{O}\left(\left(\frac{1}{t}\right)^t\right)$. These global bounds are all valid for any starting point $x_0$ and any symmetric positive definite initial Hessian approximation matrix $B_0$, though the choice of $B_0$ impacts the number of iterations needed to achieve these rates. By synthesizing these results, we outline the first global complexity characterization of BFGS with the Armijo-Wolfe line search. Additionally, we clearly define a mechanism for selecting the step size to satisfy the Armijo-Wolfe conditions and characterize its overall complexity.
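
To make the setting concrete, the following is a minimal Python sketch of BFGS with a step size chosen to satisfy the Armijo (sufficient decrease) and weak Wolfe (curvature) conditions. It is an illustration under stated assumptions, not the paper's exact procedure: the function names `armijo_wolfe_step` and `bfgs`, the default parameters `alpha=1e-4` and `beta=0.9`, and the bracketing rule (doubling or halving until a bracket is found, then taking the geometric mean, i.e. bisecting in log scale) are hypothetical choices made here for illustration.

```python
import numpy as np


def armijo_wolfe_step(f, grad, x, d, alpha=1e-4, beta=0.9, max_iter=60):
    """Search for a step size eta satisfying the Armijo and weak Wolfe conditions.

    Assumes d is a descent direction (grad(x) @ d < 0) and 0 < alpha < beta < 1.
    The bracketing rule below is an illustrative assumption, not taken from the paper.
    """
    fx = f(x)
    slope = grad(x) @ d  # directional derivative at eta = 0, negative for descent
    lo, hi = 0.0, np.inf
    eta = 1.0
    for _ in range(max_iter):
        if f(x + eta * d) > fx + alpha * eta * slope:
            hi = eta  # Armijo condition fails: step is too long
        elif grad(x + eta * d) @ d < beta * slope:
            lo = eta  # curvature condition fails: step is too short
        else:
            return eta  # both conditions hold
        if np.isinf(hi):
            eta = 2.0 * lo  # no upper bracket yet: expand
        elif lo == 0.0:
            eta = 0.5 * hi  # no lower bracket yet: shrink
        else:
            eta = np.sqrt(lo * hi)  # geometric mean: bisection in log scale
    return eta  # fallback if no acceptable step was found within max_iter trials


def bfgs(f, grad, x0, B0=None, tol=1e-8, max_iter=200):
    """BFGS with the line search above; returns (x, number of iterations used)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    # Maintain the inverse Hessian approximation H = B^{-1}; B0 must be symmetric PD.
    H = np.eye(n) if B0 is None else np.linalg.inv(B0)
    I = np.eye(n)
    g = grad(x)
    for it in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, it
        d = -H @ g  # quasi-Newton direction
        eta = armijo_wolfe_step(f, grad, x, d)
        s = eta * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        sy = s @ y  # positive whenever the curvature condition holds
        if sy > 0.0:
            rho = 1.0 / sy
            # Standard BFGS update of the inverse Hessian approximation.
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x, max_iter
```

As a usage example, one could run `bfgs` on a strongly convex quadratic such as $f(x) = \frac{1}{2}(x - x^*)^\top A (x - x^*)$ with $A = \mathrm{diag}(1, 10, 100)$, so that $\kappa = 100$, and compare the iterate error across different choices of $B_0$. The sketch only illustrates the algorithmic setting; it does not verify the convergence rates stated above.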
