The Essential Best and Average Rate of Convergence of the Exact Line Search Gradient Descent Method
Thomas Yu
TL;DR
This work studies the convergence of the exact line search gradient descent (GD) method on strongly convex quadratic objectives and resolves a long-standing question about its average and essential best-case rates of convergence (ROC) in the ill-conditioned regime. By recasting the exact line search GD iteration as a discrete dynamical system via Akaike's map $T$ and applying the theorem of center and stable manifolds, the paper characterizes when the average ROC tends to zero (the Hessian has only two distinct eigenvalues) and when it approaches the worst-case ROC (an intermediate eigenvalue is present). The 2-D analysis gives explicit intuition: without an intermediate eigenvalue, the average ROC becomes arbitrarily fast as the Hessian becomes more ill-conditioned. In higher dimensions, the essential best-case ROC depends on the spectrum in a more nuanced way. The paper also discusses applications to well-conditioned polynomial optimization problems (POPs) such as phase retrieval, where a tailored exact line search GD costs only 50\% more per iteration than its constant step size counterpart while promising an ROC matched only by the optimally tuned constant step size. Overall, the paper completes Akaike's classical analysis and connects it with modern applications in imaging and data sciences.
Abstract
It is very well known that when the exact line search gradient descent method is applied to a convex quadratic objective, the worst-case rate of convergence (ROC), among all seed vectors, deteriorates as the condition number of the Hessian of the objective grows. By an elegant analysis due to H. Akaike, it is generally believed -- but not proved -- that in the ill-conditioned regime the ROC for almost all initial vectors, and hence also the average ROC, is close to the worst-case ROC. We complete Akaike's analysis by determining the \emph{essential best-case ROC} (defined in a measure-theoretic way) using a dynamical system approach, facilitated by the theorem of center and stable manifolds. Our analysis also makes apparent the effect of an intermediate eigenvalue in the Hessian by establishing the following amusing result: In the absence of an intermediate eigenvalue, the average ROC gets arbitrarily \emph{fast} -- not slow -- as the Hessian gets increasingly ill-conditioned. We discuss in passing some contemporary applications of exact line search GD to well-conditioned polynomial optimization problems (POPs) arising from imaging and data sciences. In particular, we observe that a tailored exact line search GD algorithm for a POP arising from the phase retrieval problem is only 50\% more expensive per iteration than its constant step size counterpart, while promising an ROC matched only by the optimally tuned (constant) step size, which can rarely be achieved in practice.
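
The effect of an intermediate eigenvalue can be explored numerically with a minimal sketch such as the one below. It is an illustration under stated assumptions, not the paper's construction: the Hessian is taken to be diagonal, the condition number is fixed at 100, the function names are hypothetical, and the "empirical rate" is simply the geometric mean of the per-step objective ratios over a finite number of steps, which is only a crude stand-in for the measure-theoretic ROC notions used in the paper. The sketch relies on two standard facts: for $f(x) = \frac{1}{2} x^T A x$ the exact line search step size is $g^T g / g^T A g$ with $g = Ax$, and each step satisfies $f(x_{k+1}) \le ((\kappa - 1)/(\kappa + 1))^2 f(x_k)$, where $\kappa$ is the condition number of $A$.

```python
import numpy as np


def exact_line_search_gd(eigs, x0, n_steps=10):
    """Run exact line search GD on f(x) = 0.5 * sum(eigs * x**2).

    Returns the per-step objective ratios f(x_{k+1}) / f(x_k).
    """
    lam = np.asarray(eigs, dtype=float)
    x = np.asarray(x0, dtype=float)
    f = 0.5 * np.dot(lam * x, x)
    ratios = []
    for _ in range(n_steps):
        if f < 1e-200:  # stop before floating-point underflow
            break
        g = lam * x                                # gradient A x
        alpha = np.dot(g, g) / np.dot(lam * g, g)  # exact minimizer along -g
        x = x - alpha * g
        f_next = 0.5 * np.dot(lam * x, x)
        ratios.append(f_next / f)
        f = f_next
    return ratios


def worst_case_ratio(eigs):
    """Classical per-step bound ((kappa - 1) / (kappa + 1))**2."""
    kappa = max(eigs) / min(eigs)
    return ((kappa - 1.0) / (kappa + 1.0)) ** 2


def empirical_rate(ratios):
    """Geometric mean of the per-step ratios (illustrative definition)."""
    return float(np.prod(ratios) ** (1.0 / len(ratios)))


def mean_rate(eigs, rng, n_seeds=200, n_steps=50):
    """Average the empirical rate over random Gaussian seed vectors."""
    rates = []
    for _ in range(n_seeds):
        x0 = rng.standard_normal(len(eigs))
        rates.append(empirical_rate(exact_line_search_gd(eigs, x0, n_steps)))
    return float(np.mean(rates))


rng = np.random.default_rng(0)
kappa = 100.0  # assumed condition number, chosen only for illustration
two_eigs = [1.0, kappa]                          # no intermediate eigenvalue
three_eigs = [1.0, 0.5 * (1.0 + kappa), kappa]   # one intermediate eigenvalue

print("worst-case per-step ratio:", worst_case_ratio(two_eigs))
print("mean empirical rate, two eigenvalues:  ", mean_rate(two_eigs, rng))
print("mean empirical rate, three eigenvalues:", mean_rate(three_eigs, rng))
```

In two dimensions the seed $x_0 = (\kappa, 1)$ attains the worst-case ratio at every step, so calling `exact_line_search_gd([1.0, kappa], [kappa, 1.0])` gives a quick sanity check of the bound. The finite-step averages printed above only hint at the asymptotic behavior analyzed in the paper.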
