A Riemannian ADMM

Jiaxiang Li, Shiqian Ma, Tejes Srivastava

TL;DR

This work proposes a Riemannian alternating direction method of multipliers (ADMM)-type algorithm that minimizes a nonsmooth objective over a manifold, which is a particular nonconvex set.

Abstract

We consider a class of Riemannian optimization problems where the objective is the sum of a smooth function and a nonsmooth function, considered in the ambient space. This class of problems finds important applications in machine learning and statistics, such as sparse principal component analysis, sparse spectral clustering, and orthogonal dictionary learning. We propose a Riemannian alternating direction method of multipliers (ADMM) to solve this class of problems. Our algorithm adopts easily computable steps in each iteration. The iteration complexity of the proposed algorithm for obtaining an $\epsilon$-stationary point is analyzed under mild assumptions. Existing ADMM variants for solving nonconvex problems either do not allow a nonconvex constraint set or do not allow a nonsmooth objective function. Our algorithm is the first ADMM-type algorithm that minimizes a nonsmooth objective over a manifold, which is a particular nonconvex set. Numerical experiments are conducted to demonstrate the advantage of the proposed method.

Paper Structure (9 sections, 9 theorems, 83 equations, 6 figures, 5 tables, 1 algorithm)

Key Result

Lemma 1

The solution of the $z$-subproblem in R_admm_Moreau-linearize is given by […], where $\mathrm{prox}_h$ denotes the proximal mapping of the function $h$, which is defined as […].
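
As an illustration of the proximal mapping named in Lemma 1, the sketch below assumes the standard definition $\mathrm{prox}_{\gamma h}(v) = \arg\min_u \{ h(u) + \frac{1}{2\gamma}\|u - v\|^2 \}$ and takes $h = \|\cdot\|_1$, a nonsmooth term typical of sparse PCA. The choice of $h$, the parameter name `gamma`, and the function names `prox_l1` and `moreau_envelope_l1` are assumptions made for this example; the paper's own parameterization may differ.

```python
import numpy as np


def prox_l1(v, gamma):
    """Proximal mapping of gamma * ||.||_1 at v (entrywise soft-thresholding).

    Assumes the standard definition
    argmin_u ||u||_1 + (1 / (2 * gamma)) * ||u - v||^2.
    """
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)


def moreau_envelope_l1(v, gamma):
    """Moreau envelope of ||.||_1 at v: the objective above evaluated at its minimizer."""
    u = prox_l1(v, gamma)
    return np.sum(np.abs(u)) + np.sum((u - v) ** 2) / (2.0 * gamma)
```

For the $\ell_1$ norm the minimizer has this closed form, so each evaluation costs a single pass over the entries of `v`. The resulting envelope is smooth and agrees entrywise with the Huber function.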

Figures (6)

  • Figure 1: Comparison of the CPU time (in seconds) consumed by the ManPG, RADMM, and Riemannian gradient methods for solving the sparse PCA problem (sPCA) with $\mu=1$. Each plot is averaged over 10 repeated experiments with random initializations.
  • Figure 2: Comparison of SOC, MADMM, and RADMM for solving the sparse PCA problem (sPCA) with $\mu=1$, with respect to iteration number. Each plot is averaged over 10 repeated experiments with random initializations.
  • Figure 3: Comparison of SOC, MADMM, and RADMM for solving the sparse PCA problem (sPCA) with $\mu=1$, with respect to CPU time consumed. Each plot is averaged over 10 repeated experiments with random initializations.
  • Figure 4: Function value $\|Y^\top X^{k}\|_1$ versus CPU time. In this experiment we set $n\in\{30, 50\}$ and $p=5$, with $p_1=500$ inliers and $p_2=1167$ outliers. The experiments are repeated 10 times with random initializations and averaged.
  • Figure 5: Comparison of SOC, MADMM, and RADMM for solving problem (DPCP-matrix) with respect to iteration number. The upper row plots the function value $f(X^k)$, and the lower row plots $f(X^k)-f^\star$. Note that here $f^\star$ is still taken as the minimum function value output by all three algorithms. Each plot is averaged over 5 repeated experiments with random initializations.
  • ...and 1 more figure

Theorems & Definitions (21)

  • Definition 1: Tangent space
  • Definition 2: Riemannian Gradient
  • Definition 3: Retraction
  • Lemma 1
  • proof
  • Definition 4
  • Lemma 2: Properties of Moreau envelope
  • Lemma 3: Bound dual by primal
  • proof
  • Definition 5: boumal2019global
  • ...and 11 more
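
To make Definitions 1 to 3 concrete, here is a minimal sketch on the Stiefel manifold $\{X \in \mathbb{R}^{n\times p} : X^\top X = I_p\}$, the constraint set usually associated with sparse PCA. It assumes the embedded (Euclidean) metric and the polar retraction. These choices, along with the function names, are assumptions made for illustration and are not taken from the paper's statements, which are not reproduced in this listing.

```python
import numpy as np


def sym(M):
    """Symmetric part of a square matrix."""
    return 0.5 * (M + M.T)


def project_tangent(X, G):
    """Orthogonal projection of G onto the tangent space at X.

    The tangent space is {V : X^T V + V^T X = 0} (embedded metric assumed).
    """
    return G - X @ sym(X.T @ G)


def riemannian_gradient(X, euclid_grad):
    """Riemannian gradient under the embedded metric: the projected Euclidean gradient."""
    return project_tangent(X, euclid_grad)


def retract_polar(X, V):
    """Polar retraction: the orthonormal polar factor of X + V, computed via a thin SVD."""
    U, _, Vt = np.linalg.svd(X + V, full_matrices=False)
    return U @ Vt
```

A Riemannian gradient step under these assumptions would read `X_next = retract_polar(X, -eta * riemannian_gradient(X, G))` for a hypothetical step size `eta`. A QR-based retraction is a common alternative with a similar cost.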