A Theory of the NEPv Approach for Optimization On the Stiefel Manifold
Ren-Cang Li
TL;DR
The paper develops two complementary frameworks, NPDo and NEPv, for solving optimization problems on the Stiefel manifold ${\rm St}(k,n)$. Both transform the KKT conditions into a structured nonlinear polar decomposition or nonlinear eigenvalue problem, which is then solved by self-consistent-field (SCF) iterations. Central to both approaches is the notion of atomic functions: trace- or power-based building blocks whose properties ensure monotone ascent of the objective and global convergence to a stationary point under a unifying Ansatz. By proving that common matrix-trace objectives and their convex compositions fit into these atomic classes, the paper establishes provable guarantees for a wide range of problems in machine learning and data analysis. It also discusses acceleration via LOCG and practical considerations, noting that NEPv typically requires weaker conditions and covers more problem classes than NPDo, whereas NPDo often offers implementation advantages. Altogether, the work provides a unified theory that broadens the applicability and reliability of SCF-based optimization on the Stiefel manifold.
Abstract
The NEPv approach has been increasingly used lately for optimization on the Stiefel manifold arising from machine learning. Generally speaking, the approach first turns the first-order optimality condition, also known as the KKT condition, into a nonlinear eigenvalue problem with eigenvector dependency (NEPv) or a nonlinear polar decomposition with orthogonal factor dependency (NPDo), and then solves the nonlinear problem via some variation of the self-consistent-field (SCF) iteration. The difficulty, however, lies in designing a proper SCF iteration so that a maximizer is found at the end. Currently, each use of the approach is very much individualized, especially in the convergence analysis that shows whether or not the approach works. In this paper, a unifying framework is established. The framework is built upon some basic assumptions. If the basic assumptions are satisfied, global convergence to a stationary point is guaranteed, and the objective function increases monotonically during the SCF iteration that leads to the stationary point. Also, a notion of atomic functions is proposed, which includes commonly used matrix traces of linear and quadratic forms as special cases. It is shown that the basic assumptions are satisfied by atomic functions and by convex compositions of atomic functions. Together they provide a large collection of objectives for which the NEPv/NPDo approach is guaranteed to work.
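To make the SCF idea concrete, the following is a minimal sketch of an NPDo-style iteration on one simple objective. It is an illustration under stated assumptions, not the algorithm analyzed in the paper. The objective $f(X)=\tfrac12\,{\rm tr}(X^{\rm T}AX)+{\rm tr}(X^{\rm T}D)$ with $A$ symmetric positive semidefinite, and the names `polar_factor`, `objective`, and `scf_npdo`, are chosen here purely for demonstration. For this $f$, the gradient is $H(X)=AX+D$, the KKT condition reads $H(X)=X\Lambda$ with $\Lambda$ symmetric, and each step replaces $X$ by the orthogonal polar factor of $H(X)$. Because $f$ is convex in $X$ when $A$ is positive semidefinite, this update cannot decrease the objective.

```python
import numpy as np

def polar_factor(H):
    """Orthogonal polar factor of a tall matrix H, computed from its thin SVD."""
    U, _, Vt = np.linalg.svd(H, full_matrices=False)
    return U @ Vt

def objective(X, A, D):
    """Illustrative objective f(X) = 0.5*tr(X^T A X) + tr(X^T D)."""
    return 0.5 * np.trace(X.T @ A @ X) + np.trace(X.T @ D)

def scf_npdo(A, D, X0, max_iter=200, tol=1e-10):
    """NPDo-style SCF sketch: X <- polar factor of H(X) = A X + D.

    Assumes A is symmetric positive semidefinite, so f is convex and
    each step is an ascent step on the Stiefel manifold.
    """
    X = X0
    history = [objective(X, A, D)]
    for _ in range(max_iter):
        H = A @ X + D                      # gradient of f at the current X
        X = polar_factor(H)                # maximizes tr(Y^T H) over Y^T Y = I
        history.append(objective(X, A, D))
        # Stop once the objective gain is negligible relative to its size.
        if history[-1] - history[-2] <= tol * max(1.0, abs(history[-1])):
            break
    return X, history

# Small random test problem (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, k = 30, 4
B = rng.standard_normal((n, n))
A = B @ B.T                                # symmetric positive semidefinite
D = rng.standard_normal((n, k))
X0 = polar_factor(rng.standard_normal((n, k)))   # random point on St(k, n)
X, history = scf_npdo(A, D, X0)
```

The list `history` records the objective value at each iterate, so the monotone increase can be inspected directly. For objectives that are not convex in $X$, this plain polar update carries no such guarantee, which is the kind of situation the paper's basic assumptions are meant to address.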
