Fast convergence of trust-regions for non-isolated minima via analysis of CG on indefinite matrices
Quentin Rebjock, Nicolas Boumal
TL;DR
This work addresses fast local convergence of trust-region (TR) methods in the presence of non-isolated minima, where the Hessian cannot be positive definite yet the Polyak–Łojasiewicz (PŁ) condition can still hold. It develops a novel analysis of the truncated conjugate gradient (tCG) method as a trust-region subproblem solver, showing that TR-tCG can achieve superlinear convergence under PŁ by effectively filtering out the small, possibly negative Hessian eigenvalues that appear near the minima. The key contributions are a new analysis of CG for indefinite systems, the introduction of a possibly new polynomial family, and a capture theorem and convergence-rate result for TR with tCG, with implications for optimization on quotient manifolds. Overall, the results narrow a gap between theory and practice: they explain why TR-tCG performs well near non-isolated minima and provide convergence guarantees under weak regularity assumptions.
Abstract
Trust-region methods (TR) can converge quadratically to minima where the Hessian is positive definite. However, if the minima are not isolated, then the Hessian there cannot be positive definite. The weaker Polyak–Łojasiewicz (PŁ) condition is compatible with non-isolated minima, and it is enough for many algorithms to preserve good local behavior. Yet, TR with an $\textit{exact}$ subproblem solver lacks even basic features such as a capture theorem under PŁ. In practice, a popular $\textit{inexact}$ subproblem solver is the truncated conjugate gradient method (tCG). Empirically, TR-tCG exhibits super-linear convergence under PŁ. We confirm this theoretically. The main mathematical obstacle is that, under PŁ, at points arbitrarily close to minima, the Hessian has vanishingly small, possibly negative eigenvalues. Thus, tCG is applied to ill-conditioned, indefinite systems. Yet, the core theory underlying tCG is that of CG, which assumes a positive definite operator. Accordingly, we develop new tools to analyze the dynamics of CG in the presence of small eigenvalues of any sign, for the regime of interest to TR-tCG.
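
To make the setting concrete, the following is a minimal sketch of a textbook Steihaug–Toint truncated CG step, not the authors' implementation. The test function f(x, y) = (x^2 + y^2 - 1)^2 is an assumption chosen for illustration: its minima form the unit circle (so they are not isolated), and just inside the circle its Hessian has a small negative eigenvalue along the tangent direction. All function names, the tolerance, and the trust-region radius are hypothetical choices.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(a, x, y):
    """Return a*x + y for plain Python lists."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def to_boundary(s, p, radius):
    """Move from s along p until ||s + tau*p|| = radius, with tau >= 0."""
    a, b, c = dot(p, p), 2.0 * dot(s, p), dot(s, s) - radius ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return axpy(tau, p, s)

def truncated_cg(hess_vec, grad, radius, tol=1e-10, max_iter=50):
    """Approximately minimise <grad, s> + 0.5 <s, H s> subject to ||s|| <= radius."""
    s = [0.0] * len(grad)
    r = list(grad)             # residual H s + grad at s = 0
    p = [-ri for ri in r]      # first search direction: steepest descent
    if math.sqrt(dot(r, r)) <= tol:
        return s, "converged"
    for _ in range(max_iter):
        Hp = hess_vec(p)
        curvature = dot(p, Hp)
        if curvature <= 0.0:
            # Non-positive curvature along p: follow p to the boundary.
            return to_boundary(s, p, radius), "negative curvature"
        alpha = dot(r, r) / curvature
        s_next = axpy(alpha, p, s)
        if math.sqrt(dot(s_next, s_next)) >= radius:
            return to_boundary(s, p, radius), "boundary"
        r_next = axpy(alpha, Hp, r)
        if math.sqrt(dot(r_next, r_next)) <= tol:
            return s_next, "converged"
        beta = dot(r_next, r_next) / dot(r, r)
        p = axpy(beta, p, [-ri for ri in r_next])
        s, r = s_next, r_next
    return s, "max iterations"

# Illustrative function (an assumption, not from the paper):
# f(x, y) = (x^2 + y^2 - 1)^2, minimised on the whole unit circle.
def gradient(x):
    u = dot(x, x) - 1.0
    return [4.0 * u * xi for xi in x]

def make_hess_vec(x):
    u = dot(x, x) - 1.0
    # Hessian is 4*u*I + 8*x*x^T; apply it to v without forming the matrix.
    return lambda v: axpy(8.0 * dot(x, v), x, [4.0 * u * vi for vi in v])

x = [0.9, 0.0]   # inside the circle: the Hessian here is diag(5.72, -0.76)
step, status = truncated_cg(make_hess_vec(x), gradient(x), radius=1.0)
x_new = axpy(1.0, step, x)
print(status, x_new, math.sqrt(dot(x_new, x_new)))
```

In this example the gradient points radially, so CG never explores the tangent direction that carries the negative eigenvalue, and the step moves the iterate closer to the circle of minima. This is only a toy picture of the regime the paper analyzes; in general the gradient has components along the small eigenvalues as well.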
