
A Riemannian Proximal Newton-CG Method

Wen Huang, Wutao Si

TL;DR

This work addresses the lack of global convergence guarantees for nonsmooth optimization on Riemannian manifolds by combining a truncated conjugate gradient solver with a Riemannian proximal Newton step, yielding the RPN-CG method. It proves global convergence and local superlinear convergence under standard assumptions, and shows that a hybrid variant, RPN-CGH, is robust to the choice of switching parameter. In experiments on sparse PCA, CM, and community detection (CD) problems, the method outperforms state-of-the-art proximal-gradient-type methods. Overall, it provides a scalable second-order algorithm for nonsmooth manifold optimization with strong theoretical guarantees and empirical efficiency.
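The truncated conjugate gradient idea mentioned above can be illustrated with a minimal Euclidean sketch. It assumes the Newton system is a symmetric linear system $H d = -g$ accessed only through Hessian-vector products, and uses a Steihaug-Toint-style truncation rule: stop once the residual falls below a relative tolerance, or as soon as non-positive curvature appears. The names `truncated_cg`, `hess_vec`, and `rel_tol` are hypothetical; the paper's solver works on the Riemannian proximal Newton system and may use different stopping rules.

```python
import numpy as np


def truncated_cg(hess_vec, grad, rel_tol=0.1, max_iter=100):
    """Approximately solve H d = -grad with conjugate gradients (CG).

    hess_vec(p) must return H @ p for a symmetric H. Iteration stops when
    the residual norm falls below rel_tol * ||grad||, after max_iter steps,
    or when a direction with non-positive curvature p^T H p <= 0 is found.
    """
    d = np.zeros_like(grad, dtype=float)
    r = -np.asarray(grad, dtype=float)  # residual of H d = -grad at d = 0
    p = r.copy()                        # first search direction is -grad
    rr = r @ r
    stop = rel_tol * np.sqrt(rr)
    for k in range(max_iter):
        if np.sqrt(rr) <= stop:
            break                       # residual small enough: truncate
        Hp = hess_vec(p)
        curv = p @ Hp
        if curv <= 0:
            # Non-positive curvature: on the first step return -grad,
            # otherwise keep the last CG iterate.
            if k == 0:
                d = p
            break
        alpha = rr / curv
        d = d + alpha * p
        r = r - alpha * Hp
        rr_new = r @ r
        p = r + (rr_new / rr) * p      # standard linear-CG conjugate update
        rr = rr_new
    return d


# Usage on a small symmetric positive definite system.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
d = truncated_cg(lambda p: H @ p, g, rel_tol=1e-10)
print(np.allclose(H @ d, -g))  # -> True: CG solves a 2x2 SPD system in 2 steps
```

Stopping at non-positive curvature is the usual reason truncated CG is paired with Newton-type steps: the returned direction stays a descent direction even when the Hessian is indefinite, so the outer method can still make progress far from a minimizer.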

Abstract

Recently, a Riemannian proximal Newton method has been developed for problems of the form $\min_{x\in\mathcal{M}} f(x) + \mu\|x\|_1$, where $\mathcal{M}$ is a compact embedded submanifold and $f(x)$ is smooth. Although this method converges superlinearly locally, global convergence is not guaranteed. The existing remedy is a hybrid approach: run a Riemannian proximal gradient method until the iterate is sufficiently accurate, then switch to the Riemannian proximal Newton method. This approach is sensitive to the switching parameter. This paper proposes a Riemannian proximal Newton-CG method that merges the truncated conjugate gradient method with the Riemannian proximal Newton method. Global convergence and local superlinear convergence are proven. Numerical experiments show that the proposed method outperforms other state-of-the-art methods.
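To make the objective $f(x) + \mu\|x\|_1$ and the proximal-gradient baseline concrete, the sketch below treats the Euclidean special case $\mathcal{M} = \mathbb{R}^n$, where the proximal map of the $\ell_1$ term is soft-thresholding. It assumes, as is common for Riemannian proximal gradient methods, that the stationarity measure $v(x)$ plotted in the figures is the proximal gradient step; on an actual manifold that step is restricted to the tangent space and followed by a retraction, both omitted here. The function names and the fixed step size $1/L$ are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal map of tau * ||.||_1: shrink every entry toward 0 by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def prox_grad_direction(x, grad_f, mu, L):
    """Euclidean v(x) = prox_{(mu/L)||.||_1}(x - grad_f(x)/L) - x.

    v(x) = 0 exactly when x is a stationary point of f + mu * ||.||_1.
    """
    return soft_threshold(x - grad_f(x) / L, mu / L) - x


def prox_grad(x0, grad_f, mu, L, tol=1e-8, max_iter=10_000):
    """Proximal gradient method with fixed step 1/L; stops when ||v(x)|| <= tol."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iter):
        v = prox_grad_direction(x, grad_f, mu, L)
        if np.linalg.norm(v) <= tol:
            return x, k         # k = number of steps taken
        x = x + v
    return x, max_iter


# Usage: f(x) = 0.5 * ||x - b||^2 has gradient x - b and Lipschitz constant L = 1,
# so a single step lands on the exact minimizer soft_threshold(b, mu).
b = np.array([3.0, -0.5, 1.0])
x_opt, steps = prox_grad(np.zeros(3), lambda x: x - b, mu=1.0, L=1.0)
print(x_opt, steps)  # x_opt equals [2, 0, 0], reached after 1 step
```

Each step is cheap but only first-order, which is why such methods are typically slow near a solution; the tolerance `tol` plays the role of an accuracy threshold of the kind a hybrid scheme would switch on.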

Paper Structure

This paper contains 21 sections, 17 theorems, 70 equations, 4 figures, 4 tables, and 3 algorithms.

Key Result

Proposition 2.1

If $x_* = $ is a local minimizer with $\bar{x}_* \in \mathbb{R}^j$ and $\bar{B}_{x_*}$ has full column rank, then $v(x_*) = 0$ and $\mathcal{B}_{x_*} \succeq 0$ on the subspace $\mathfrak{L}_{x_*}$, where $\mathfrak{L}_x = \{w : \bar{B}_{x}^{T} w = 0\}$.
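The second-order condition in Proposition 2.1 can be checked numerically by restricting $\mathcal{B}_{x_*}$ to $\mathfrak{L}_{x_*}$: if the columns of $Z$ form an orthonormal basis of $\{w : \bar{B}_{x_*}^T w = 0\}$, then every $w \in \mathfrak{L}_{x_*}$ is $w = Zc$, so $\mathcal{B}_{x_*} \succeq 0$ on $\mathfrak{L}_{x_*}$ is equivalent to $Z^T \mathcal{B}_{x_*} Z \succeq 0$. The sketch assumes $\mathcal{B}_{x_*}$ is available as a symmetric $j \times j$ matrix `B` and $\bar{B}_{x_*}$ as a $j \times m$ matrix `Bbar`; the helper names and tolerances are hypothetical.

```python
import numpy as np


def null_space_basis(A, rtol=1e-12):
    """Orthonormal basis (as columns) of {w : A @ w = 0}, computed via the SVD."""
    _, s, vt = np.linalg.svd(A)          # vt is n x n for A of shape (m, n)
    tol = rtol * s[0] if s.size else 0.0
    rank = int(np.sum(s > tol))
    return vt[rank:].T                   # right singular vectors beyond the rank


def psd_on_subspace(B, Bbar, tol=1e-10):
    """Check w^T B w >= 0 for all w with Bbar^T w = 0 (up to tol)."""
    Z = null_space_basis(Bbar.T)
    if Z.shape[1] == 0:
        return True                      # the subspace is {0}: holds trivially
    reduced = Z.T @ B @ Z
    reduced = 0.5 * (reduced + reduced.T)  # symmetrize against rounding error
    return bool(np.linalg.eigvalsh(reduced).min() >= -tol)


# Usage: B is indefinite on R^2 but positive on the subspace {w : w_2 = 0}.
B = np.array([[1.0, 0.0], [0.0, -1.0]])
print(psd_on_subspace(B, np.array([[0.0], [1.0]])))  # subspace span(e1) -> True
print(psd_on_subspace(B, np.array([[1.0], [0.0]])))  # subspace span(e2) -> False
```

The example shows why the condition is stated on $\mathfrak{L}_{x_*}$ rather than on the whole space: an operator can fail to be positive semidefinite globally yet satisfy the condition on the relevant subspace.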

Figures (4)

  • Figure 1: The five principal components used in the synthetic data.
  • Figure 2: Sparse PCA: plots of $\|v(x_k)\|$ versus iterations and CPU time, respectively. The left two plots are generated from random data and the right two from synthetic data with $(n, p, \mu) = (4000, 5, 0.8)$ and $\epsilon = 10^{-3}$.
  • Figure 3: CM: plots of $\|v(x_k)\|$ versus iterations and CPU time, respectively.
  • Figure 4: Community detection: plots of $\|v(x_k)\|$ versus iterations and CPU time, respectively.

Theorems & Definitions (35)

  • Proposition 2.1
  • Remark 3.1
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.1
  • proof
  • Definition 3.1: Geodesically strongly convex
  • Lemma 3.3
  • ...and 25 more