A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
Ganzhao Yuan
TL;DR
OBCD is a block coordinate descent method for nonsmooth composite optimization subject to the orthogonality constraint X^T X = I_r. Each iteration updates k rows of the solution matrix (k ≥ 2) by solving a small subproblem exactly via a majorization surrogate, so every iterate remains feasible. Limit points of OBCD are block-k stationary points, a stronger optimality condition than standard critical points. The paper establishes a hierarchy of optimality conditions, an ergodic convergence rate of O(1/ε) to ε-block-k stationary points, and a non-ergodic convergence rate under the Kurdyka-Łojasiewicz (KL) inequality. Extensions include breakpoint searching for the k = 2 subproblem and greedy working-set selection. In experiments on tasks such as sparse PCA and nonnegative PCA, OBCD often outperforms state-of-the-art methods.
Abstract
Nonsmooth composite optimization with orthogonality constraints has a wide range of applications in statistical learning and data science. However, this problem is challenging because its objective is nonsmooth and its constraints are non-convex and computationally expensive to handle. In this paper, we propose a new approach called \textbf{OBCD}, which leverages Block Coordinate Descent to address these challenges. \textbf{OBCD} is a feasible method with a small computational footprint. In each iteration, it updates $k$ rows of the solution matrix, where $k \geq 2$, by globally solving a small nonsmooth optimization problem under orthogonality constraints. We prove that the limit points of \textbf{OBCD}, referred to as (global) block-$k$ stationary points, offer stronger optimality than standard critical points. Furthermore, we show that \textbf{OBCD} converges to $\epsilon$-block-$k$ stationary points with an ergodic convergence rate of $\mathcal{O}(1/\epsilon)$. Additionally, under the Kurdyka-Łojasiewicz (KL) inequality, we establish the non-ergodic convergence rate of \textbf{OBCD}. We also extend \textbf{OBCD} by incorporating breakpoint searching methods for subproblem solving and greedy strategies for working set selection. Comprehensive experiments demonstrate the superior performance of our approach across various tasks.
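To make the row-block update concrete, the following is a minimal sketch for $k = 2$, not the paper's implementation. It assumes the update takes the form of left-multiplying the two selected rows by a $2 \times 2$ orthogonal matrix $V$, which the abstract does not spell out. Under that assumption feasibility is automatic: the selected rows $X_B$ contribute $X_B^T X_B$ to $X^T X$, and $(V X_B)^T (V X_B) = X_B^T X_B$ whenever $V^T V = I$. The sparse-PCA-style objective, the helper names (`objective`, `candidates_2x2`, `row_block_step`, `run`), and the coarse grid search over angles are all illustrative assumptions. In particular, the grid search stands in for the global subproblem solve and breakpoint search described above and gives no optimality guarantee.

```python
import numpy as np


def objective(X, C, lam):
    # Illustrative nonsmooth composite objective (an assumption, not from the paper):
    # smooth term -0.5 * tr(X^T C X) plus an l1 penalty.
    return -0.5 * np.trace(X.T @ C @ X) + lam * np.abs(X).sum()


def candidates_2x2(num_angles):
    # Every 2x2 orthogonal matrix is a rotation (det = +1) or a reflection (det = -1).
    # The grid starts at angle 0, so the identity matrix is always a candidate.
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        c, s = np.cos(t), np.sin(t)
        yield np.array([[c, s], [-s, c]])   # rotation
        yield np.array([[-c, s], [s, c]])   # reflection


def row_block_step(X, C, lam, i, j, num_angles=180):
    # Replace rows (i, j) by V @ X[[i, j], :] for the best V found on the grid.
    # Orthogonality of V keeps X^T X unchanged, so the new iterate stays feasible.
    best_X, best_val = X, objective(X, C, lam)
    for V in candidates_2x2(num_angles):
        X_new = X.copy()
        X_new[[i, j], :] = V @ X[[i, j], :]
        val = objective(X_new, C, lam)
        if val < best_val:
            best_X, best_val = X_new, val
    return best_X, best_val


def run(n=8, r=3, sweeps=2, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((20, n))
    C = A.T @ A / 20.0                                  # synthetic covariance matrix
    X, _ = np.linalg.qr(rng.standard_normal((n, r)))    # feasible start: X^T X = I_r
    history = [objective(X, C, lam)]
    for _ in range(sweeps):
        for i in range(n):                              # cyclic working-set selection
            for j in range(i + 1, n):
                X, val = row_block_step(X, C, lam, i, j)
                history.append(val)
    return X, history


if __name__ == "__main__":
    X, history = run()
    print("feasibility error:", np.linalg.norm(X.T @ X - np.eye(X.shape[1])))
    print("objective: start", history[0], "end", history[-1])
```

Because the identity is among the candidates, the recorded objective values never increase, and the feasibility error stays at the level of floating-point round-off without any retraction or projection step. The cyclic sweep over row pairs is the simplest working-set rule and could be replaced by a greedy selection.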
