Fast convergence of trust-regions for non-isolated minima via analysis of CG on indefinite matrices

Quentin Rebjock; Nicolas Boumal

TL;DR

This work addresses fast local convergence of trust-region (TR) methods near non-isolated minima, where the Hessian cannot be positive definite yet the Polyak–Łojasiewicz (PŁ) condition can still hold. It develops a novel analysis of the truncated conjugate gradient (tCG) method as the trust-region subproblem solver, showing that TR-tCG converges superlinearly under PŁ because tCG effectively filters out the small, possibly negative Hessian eigenvalues that arise near the minimum. The key contributions are a new analysis of CG on indefinite systems, a possibly new family of polynomials, and a capture theorem with a convergence rate for TR-tCG, with implications for optimization on quotient manifolds. Overall, the results explain why TR-tCG performs well near non-isolated minima and provide convergence guarantees under weak regularity assumptions.

Abstract

Trust-region methods (TR) can converge quadratically to minima where the Hessian is positive definite. However, if the minima are not isolated, then the Hessian there cannot be positive definite. The weaker Polyak–Łojasiewicz (PŁ) condition is compatible with non-isolated minima, and it is enough for many algorithms to preserve good local behavior. Yet, TR with an exact subproblem solver lacks even basic features such as a capture theorem under PŁ. In practice, a popular inexact subproblem solver is the truncated conjugate gradient method (tCG). Empirically, TR-tCG exhibits super-linear convergence under PŁ. We confirm this theoretically. The main mathematical obstacle is that, under PŁ, at points arbitrarily close to minima, the Hessian has vanishingly small, possibly negative eigenvalues. Thus, tCG is applied to ill-conditioned, indefinite systems. Yet, the core theory underlying tCG is that of CG, which assumes a positive definite operator. Accordingly, we develop new tools to analyze the dynamics of CG in the presence of small eigenvalues of any sign, for the regime of interest to TR-tCG.
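To make the tCG subproblem solver mentioned in the abstract concrete, here is a minimal NumPy sketch of a truncated CG routine of the Steihaug-Toint type. It is an illustration under stated assumptions, not the paper's Algorithm: the function names truncated_cg and _to_boundary are hypothetical, the Hessian is taken to be a dense symmetric matrix H, and the residual stopping rule ||r_n|| <= ||r_0|| min(kappa, ||r_0||^theta) is the common one from the trust-region literature, assumed here to correspond to the parameters $\kappa$ and $\theta$ named in the key result.

```python
import numpy as np

def _to_boundary(s, p, delta):
    # Positive root tau of ||s + tau * p|| = delta (hypothetical helper).
    a = p @ p
    b = 2.0 * (s @ p)
    c = s @ s - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def truncated_cg(H, g, delta, kappa=0.1, theta=0.5, max_iter=None):
    """Approximately minimize m(s) = g.s + 0.5 * s.H.s subject to ||s|| <= delta.

    Returns the step and a string naming the stopping reason.
    The stopping rule is an assumption; see the paragraph above.
    """
    n = g.shape[0]
    max_iter = n if max_iter is None else max_iter
    s = np.zeros(n)
    r = g.copy()                      # r = H s + g, gradient of the model at s
    p = -r                            # first search direction
    r0_norm = np.linalg.norm(r)
    if r0_norm == 0.0:
        return s, "zero gradient"
    tol = r0_norm * min(kappa, r0_norm ** theta)
    for _ in range(max_iter):
        Hp = H @ p
        curvature = p @ Hp
        if curvature <= 0.0:
            # Non-positive curvature: follow p to the trust-region boundary.
            return s + _to_boundary(s, p, delta) * p, "negative curvature"
        alpha = (r @ r) / curvature
        s_next = s + alpha * p
        if np.linalg.norm(s_next) >= delta:
            # The CG step would leave the region: stop on the boundary.
            return s + _to_boundary(s, p, delta) * p, "boundary"
        r_next = r + alpha * Hp
        if np.linalg.norm(r_next) <= tol:
            # Residual is small enough: stop before later iterations can blow up.
            return s_next, "residual"
        beta = (r_next @ r_next) / (r @ r)
        p = -r_next + beta * p
        s, r = s_next, r_next
    return s, "max_iter"

# Usage on a small positive definite model with a large trust region.
H = np.diag([2.0, 1.0])
g = np.array([1.0, 1.0])
step, reason = truncated_cg(H, g, delta=10.0)
print(step, reason)
```

The three early exits (non-positive curvature, boundary, small residual) are what distinguish tCG from plain CG. The residual test is the exit that lets the solver return before small eigenvalues have had time to dominate the iterates.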

Paper Structure (32 sections, 34 theorems, 121 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Sufficient conditions for superlinear local convergence
Why an exact subproblem solver can fail yet tCG succeeds
The Hessian typically has small negative eigenvalues arbitrarily close to $\bar{x}$.
Those eigenvalues defeat the exact subproblem solver.
But tCG automatically filters them out.
Contributions
Related work
Krylov subspace methods.
Krylov and trust-region methods.
PŁ and local convergence.
CG: reminders and a possibly new family of polynomials
Background: CG through the lens of optimal polynomials
Lanczos polynomials.
Connections to the CG algorithm.
...and 17 more sections

Key Result

Theorem 1.2

Suppose assu:hess-lip, assu:hess-lip-like, assu:hess-approx and eq:pl hold around a local minimum $\bar{x}$. We run TR with the tCG subproblem solver (Algorithm alg:tcg) with parameters $\kappa > 0$ and $\theta \in (0, 1)$. Given any neighborhood $\mathcal{U}$ of $\bar{x}$, there exis

Figures (5)

Figure 1: Illustration of the Morse–Bott property. The set of local minima $\mathcal{S}$ is smooth around the point $\bar{x}$. Here it has dimension 1 in the 2-dimensional search space $\mathcal{M} = \mathbb{R}^2$.
Figure 2: Norms of the iterates $\tilde{v}_n$ and residuals $\tilde{r}_n$ of CG on a problem $(\tilde{A}, \tilde{b})$. Here $\tilde{A}$ is diagonal with size $\tilde{d} = 11$. For illustration, there are $d = 10$ eigenvalues close to $1$ and one eigenvalue equal to $0$. The norm of the first $d$ entries of the weight vector $\tilde{b}$ is normalized to $1$, and the entry associated with the zero eigenvalue is $10^{-3}$. Notice how the norm of the iterate $\tilde{v}_n$ explodes only after the residual $\tilde{r}_n$ has become small: this is why tCG can stop before the explosion, with a good solution. For reference, we also plot the same quantities for the well-conditioned problem $(A, b)$ of size $10$, where the zero eigenvalue has been removed.
Figure 3: Lanczos polynomials $\tilde{\pi}_1, \ldots, \tilde{\pi}_7$ associated to the iterates $1$ to $7$ for the problem in Figure 2. The horizontal axis shows the eigenvalues $\lambda_1 \geq \dots \geq \lambda_d > \lambda_{\tilde{d}} = 0$ of $\tilde{A}$. For each iteration $n$ we plot the roots of the Lanczos polynomial $\tilde{\pi}_n$. The lines linking the roots emphasize the interlacement. Most of the weight of $\tilde{b}$ lies in the interval $[\lambda_d, \lambda_1]$. This is why the roots are located in this interval during the first iterations. After the fourth iteration the minimal root rapidly approaches zero. This causes the explosion of the iterates and residuals (see Figure 2), but tCG can stop earlier.
Figure 4: This figure supports the proof of the lemma labeled lemma:bound-roots. The point $z$ may be anywhere between $\gamma_{k + 1}$ and $\gamma_{k - 1}$. Here, it is represented between $\mu_k$ and $\gamma_k$, but the proof considers all scenarios.
Figure 5: The gradient norm exhibits a pattern typical of superlinear convergence.
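
The setup described in the caption of Figure 2 can be sketched in a few lines of NumPy. This is a rough illustration, not the authors' script. The caption only says the $d = 10$ eigenvalues are "close to $1$", so the values np.linspace(0.9, 1.1, 10), the uniform weights, the iteration count of 8, and the function name cg_history are all assumptions made here.

```python
import numpy as np

def cg_history(eigs, b, n_iter):
    """Run plain CG on diag(eigs) v = b from v = 0 and record each iterate
    and residual. Hypothetical helper; the matrix is stored as its diagonal."""
    v = np.zeros_like(b)
    r = b.copy()                     # residual r = b - A v
    p = r.copy()                     # first search direction
    iterates, residuals = [], []
    for _ in range(n_iter):
        Ap = eigs * p                # A is diagonal, so A p is elementwise
        curvature = p @ Ap
        if curvature <= 0.0:         # guard: direction lies in the kernel of A
            break
        alpha = (r @ r) / curvature
        v = v + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        iterates.append(v.copy())
        residuals.append(r.copy())
    return iterates, residuals

d = 10
# Assumed spectrum: 10 eigenvalues near 1 plus one eigenvalue equal to 0.
eigs = np.append(np.linspace(0.9, 1.1, d), 0.0)
# First d entries have norm 1; the entry for the zero eigenvalue is 1e-3.
b = np.append(np.ones(d) / np.sqrt(d), 1e-3)

iterates, residuals = cg_history(eigs, b, n_iter=8)
for n, (v, r) in enumerate(zip(iterates, residuals), start=1):
    print(n, np.linalg.norm(v), np.linalg.norm(r))
```

Two structural facts hold for this construction whatever the exact spectrum. The residual entry attached to the zero eigenvalue never changes, so the residual norm cannot drop below $10^{-3}$. The corresponding entry of the iterate can only grow, which is the component responsible for the blow-up described in the caption.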

Theorems & Definitions (75)

Definition 1.1
Theorem 1.2
Definition 1.3
Definition 2.1
Definition 2.2
Lemma 2.3
proof
Remark 2.4
Lemma 2.5
proof
...and 65 more
