A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints

Ganzhao Yuan

TL;DR

OBCD is a block coordinate descent method for nonsmooth composite optimization subject to the orthogonality constraint X^T X = I_r. Each iteration updates k rows of the solution matrix by exactly solving a small subproblem built from a majorization surrogate, so every iterate stays feasible. Its limit points are block-k stationary points, a stronger notion of optimality than standard critical points. The paper establishes a hierarchy of optimality conditions, an ergodic O(1/ε) convergence rate to ε-block-k stationary points, and a non-ergodic convergence rate under the KL inequality. Extensions include breakpoint searching for the k=2 subproblem and greedy working-set selection. In the reported experiments, OBCD often outperforms state-of-the-art methods on tasks such as sparse PCA and nonnegative PCA.

Abstract

Nonsmooth composite optimization with orthogonality constraints has a wide range of applications in statistical learning and data science. However, this problem is challenging due to its nonsmooth objective and computationally expensive, non-convex constraints. In this paper, we propose a new approach called OBCD, which leverages Block Coordinate Descent to address these challenges. OBCD is a feasible method with a small computational footprint. In each iteration, it updates $k$ rows of the solution matrix, where $k \geq 2$, by globally solving a small nonsmooth optimization problem under orthogonality constraints. We prove that the limiting points of OBCD, referred to as (global) block-$k$ stationary points, offer stronger optimality than standard critical points. Furthermore, we show that OBCD converges to $ε$-block-$k$ stationary points with an ergodic convergence rate of $\mathcal{O}(1/ε)$. Additionally, under the Kurdyka-Łojasiewicz (KL) inequality, we establish the non-ergodic convergence rate of OBCD. We also extend OBCD by incorporating breakpoint searching methods for subproblem solving and greedy strategies for working set selection. Comprehensive experiments demonstrate the superior performance of our approach across various tasks.
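
The claim that OBCD is a feasible method can be illustrated with a small numerical check. The sketch below assumes the row update takes the form of left-multiplying the $k$ selected rows by a $k \times k$ orthogonal matrix $\mathbf{V}$; that form, along with the helper names `random_orthogonal` and `update_rows`, is an assumption made for illustration and is not taken from the paper. Here $\mathbf{V}$ is drawn at random rather than obtained by solving the subproblem, so the code only demonstrates that such an update preserves $\mathbf{X}^T \mathbf{X} = \mathbf{I}_r$.

```python
import numpy as np


def random_orthogonal(k, rng):
    """Return a random k x k orthogonal matrix (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return q


def update_rows(X, B, V):
    """Hypothetical row-block update: rows indexed by B become V @ X[B, :].

    All other rows are left unchanged. This is an assumed form of the
    update, used only to illustrate feasibility preservation.
    """
    X_plus = X.copy()
    X_plus[B, :] = V @ X[B, :]
    return X_plus


rng = np.random.default_rng(0)
n, r, k = 8, 3, 2

# Feasible starting point: orthonormal columns from a reduced QR factorization.
X, _ = np.linalg.qr(rng.standard_normal((n, r)))

B = [1, 5]                      # working set of k row indices (arbitrary choice)
V = random_orthogonal(k, rng)   # stand-in for the subproblem solution
X_plus = update_rows(X, B, V)

# Distance of the new iterate from the constraint set X^T X = I_r.
feasibility_error = np.linalg.norm(X_plus.T @ X_plus - np.eye(r))
print(f"feasibility error after update: {feasibility_error:.2e}")
```

The error stays at the level of floating-point round-off because $\mathbf{X}_{\texttt{B}}^T \mathbf{V}^T \mathbf{V} \mathbf{X}_{\texttt{B}} = \mathbf{X}_{\texttt{B}}^T \mathbf{X}_{\texttt{B}}$ whenever $\mathbf{V}$ is orthogonal, so the contribution of the updated rows to $\mathbf{X}^T \mathbf{X}$ is unchanged.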

Paper Structure

This paper contains 46 sections, 28 theorems, 113 equations, 4 figures, 8 tables, 2 algorithms.

Key Result

Lemma 2.1

(Proof in the appendix.) We let $\texttt{B} \in \{\mathcal{B}_{i}\}$, where the set $\{\mathcal{B}_{1}, \mathcal{B}_{2},...,\mathcal{B}_{\mathrm{C}_n^k}\}$ denotes all possible combinations of the index vectors choosing $k$ items from $n$ without repetition. We let $\mathbf{V} \in \mathrm{St}(k,k)$. We define $\mathbf{X}^+ \triangleq \mathcal{X}_{\texttt{B}} (\math
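
The collection of working sets described in the lemma, all ways of choosing $k$ row indices from $n$ without repetition, can be enumerated directly. The following is a minimal sketch using only the Python standard library; the function name `working_sets` and the 0-based indexing are illustrative assumptions, and the random pick at the end is just one possible selection rule, not necessarily the one used in the paper.

```python
import random
from itertools import combinations
from math import comb


def working_sets(n, k):
    """List every index tuple choosing k of n rows without repetition (0-based)."""
    return list(combinations(range(n), k))


n, k = 5, 2
sets = working_sets(n, k)

# The number of working sets equals the binomial coefficient C(n, k).
assert len(sets) == comb(n, k)

# One simple selection rule: draw a working set uniformly at random.
picker = random.Random(0)
B = picker.choice(sets)
print(f"{len(sets)} working sets; selected B = {B}")
```

Since the count grows as $\mathrm{C}_n^k$, full enumeration is only practical for small $k$; for larger problems one would sample index tuples without materializing the whole list.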

Figures (4)

  • Figure 1: The convergence curve of the compared methods for solving $L_0$ norm-based SPCA with $\lambda=100$. No matter how long the algorithms run, the other methods remain trapped in poor local minima.
  • Figure 2: Geometric Visualizations of Two Examples of $2\times 2$ Optimization Problems with Orthogonality Constraints with $\mathbf{A} = \begin{pmatrix} 1 & 0 \\ -1 & -1 \end{pmatrix}$ and $\mathbf{B} = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix}$.
  • Figure 3: The convergence curve of the compared methods for solving $L_0$ norm-based SPCA with $\lambda=100$.
  • Figure 4: The convergence curve of the compared methods for solving $L_1$ norm-based SPCA with $\lambda=100$.

Theorems & Definitions (67)

  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Remark 2.4
  • Lemma 2.5
  • Remark 2.6
  • Theorem 3.1
  • Remark 3.2
  • Definition 3.3
  • Remark 3.4
  • ...and 57 more