Optimization over bounded-rank matrices through a desingularization enables joint global and local guarantees
Quentin Rebjock, Nicolas Boumal
TL;DR
This work tackles optimization over matrices of bounded rank through a desingularization: a smooth manifold $\mathcal{M}$ together with a map $\varphi$ from $\mathcal{M}$ onto the non-smooth feasible set, so that the cost $f$ lifts to $g=f\circ\varphi$ on $\mathcal{M}$. It develops a Riemannian framework on $\mathcal{M}$, including a family of $\alpha$-weighted metrics, parsimonious tangent representations, several retractions (including second-order ones), and explicit gradient and Hessian expressions, to support general-purpose optimization. The authors prove global convergence for descent-type methods and establish fast local convergence under Polyak–Łojasiewicz (PL) or Morse–Bott-type conditions, even near points of non-maximal rank. Numerical experiments on matrix completion show competitive performance and robustness to rank overestimation, and open-source implementations are provided to ease adoption of the desingularization approach in low-rank optimization tasks.
Abstract
Convergence guarantees for optimization over bounded-rank matrices are delicate to obtain because the feasible set is a non-smooth and non-convex algebraic variety. Existing techniques include projected gradient descent, fixed-rank optimization (over the maximal-rank stratum), and the LR parameterization. They all lack either global guarantees (the ability to accumulate only at critical points) or fast local convergence (e.g., if the limit has non-maximal rank). We seek optimization algorithms that enjoy both. Khrulkov and Oseledets [2018] parameterize the bounded-rank variety via a desingularization to recast the optimization problem onto a smooth manifold. Building on their ideas, we develop a Riemannian geometry for this desingularization, with attention to numerical considerations. We use it to secure global convergence to critical points with fast local rates, for a large range of algorithms. On matrix completion tasks, we find that this approach is comparable to others, while enjoying better general-purpose theoretical guarantees.
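To make the lifted formulation concrete, here is a minimal NumPy sketch. It assumes the desingularization is the set of pairs $(X, P)$ where $X$ is an $m \times n$ matrix, $P$ is an orthogonal projector of rank $n - r$, and $XP = 0$, with $\varphi(X, P) = X$. The function names (`lift`, `phi`, `f`, `g`), the masked least-squares cost, and the problem sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lift(X, r):
    """Pair X (assumed rank <= r) with a rank-(n - r) orthogonal projector P such that X @ P = 0."""
    n = X.shape[1]
    # Rows of Vt are orthonormal right singular vectors, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=True)
    V = Vt[:r].T                # n x r; its span contains the row space of X
    P = np.eye(n) - V @ V.T     # projector onto the orthogonal complement of span(V)
    return X, P

def phi(X, P):
    """Forget the subspace: map a lifted point back to the bounded-rank matrix."""
    return X

def f(X, M, mask):
    """Hypothetical matrix-completion cost: squared error on the observed entries."""
    return 0.5 * np.sum((mask * (X - M)) ** 2)

def g(X, P, M, mask):
    """Lifted cost g = f o phi, evaluated at a point (X, P) of the lifted set."""
    return f(phi(X, P), M, mask)

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, 1)) @ rng.standard_normal((1, n))  # rank 1, below the bound r
M = rng.standard_normal((m, n))                                # synthetic target matrix
mask = rng.random((m, n)) < 0.5                                # synthetic observation pattern

X, P = lift(X, r)
print(np.linalg.norm(X @ P))     # residual of the constraint X P = 0
print(np.trace(P))               # trace of a projector equals its rank
print(g(X, P, M, mask))          # lifted cost at (X, P)
```

In this sketch the matrix $X$ has rank strictly below $r$, so the projector $P$ returned by `lift` is one of many valid choices: any $r$-dimensional subspace containing the row space of $X$ would do. Under the stated assumption, several lifted points map to the same rank-deficient matrix through $\varphi$.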
