Block Coordinate Descent Methods for Optimization under J-Orthogonality Constraints with Applications

Di He, Ganzhao Yuan, Xiao Wang, Pengxiang Xu

TL;DR

This work tackles optimization problems constrained to J-orthogonal matrices, where the constraint takes the form $\mathbf{X}^\top \mathbf{J} \mathbf{X} = \mathbf{J}$ and the objective is a smooth (often finite-sum) function. It introduces JOBCD, a block coordinate descent framework with two variants: GS-JOBCD, which updates 2-by-2 blocks via a Gauss-Seidel-inspired majorization-minimization surrogate, and VR-J-JOBCD, a variance-reduced Jacobi method designed for large-scale finite-sum problems and parallel execution. The authors establish optimality conditions on the J-manifold, global convergence to block-stationary points, and strong convergence under the Kurdyka-Lojasiewicz (KL) property, with explicit iteration complexities: GS-JOBCD achieves $\mathcal{O}(\Delta_0 N / \varepsilon)$ and VR-J-JOBCD achieves $\mathcal{O}(nN + \Delta_0 \sqrt{N} / \varepsilon)$. Empirically, the JOBCD variants outperform state-of-the-art methods on hyperbolic eigenvalue problems, hyperbolic structural probe problems, and ultrahyperbolic knowledge graph embedding, with gains in both accuracy and efficiency, particularly in large-scale finite-sum settings. The framework thus offers a scalable and theoretically grounded approach to nonconvex optimization under J-orthogonality, applicable to hyperbolic and ultrahyperbolic learning tasks.
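
To make the constraint concrete, here is a minimal Python sketch that checks $\mathbf{X}^\top \mathbf{J} \mathbf{X} = \mathbf{J}$ numerically. It assumes $\mathbf{J}$ is a diagonal signature matrix with entries $\pm 1$ (this summary does not define $\mathbf{J}$), and the helper names `matmul`, `transpose`, and `j_residual` are illustrative rather than taken from the paper.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def j_residual(X, J):
    """Largest absolute entry of X^T J X - J (small when X is J-orthogonal)."""
    G = matmul(matmul(transpose(X), J), X)
    return max(abs(G[i][j] - J[i][j])
               for i in range(len(J)) for j in range(len(J)))

# Assumption: J = diag(1, -1), the smallest indefinite signature matrix.
J = [[1.0, 0.0], [0.0, -1.0]]
t = 0.7

# Hyperbolic rotation: satisfies X^T J X = J for every t.
H = [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]
# Ordinary rotation: orthogonal, but not J-orthogonal for this J.
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

res_h = j_residual(H, J)
res_r = j_residual(R, J)
print(f"hyperbolic rotation residual: {res_h:.2e}")
print(f"ordinary rotation residual:   {res_r:.2e}")
```

The hyperbolic rotation passes the check because $\cosh^2 t - \sinh^2 t = 1$, while the ordinary rotation fails it, which shows that J-orthogonality with an indefinite $\mathbf{J}$ is a different constraint from ordinary orthogonality.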

Abstract

J-orthogonal matrices, also referred to as hyperbolic orthogonal matrices, are a class of special orthogonal matrices in hyperbolic space, notable for their advantageous properties. These matrices are integral to optimization under J-orthogonal constraints, which has widespread applications in statistical learning and data science. However, addressing these problems is generally challenging due to their non-convex nature and the computational intensity of the constraints. Currently, algorithms for tackling these challenges are limited. This paper introduces JOBCD, a novel Block Coordinate Descent method designed to address optimization problems with J-orthogonality constraints. We explore two specific variants of JOBCD: one based on a Gauss-Seidel strategy (GS-JOBCD), the other on a variance-reduced Jacobi strategy (VR-J-JOBCD). Notably, leveraging the parallel framework of a Jacobi strategy, VR-J-JOBCD integrates variance reduction techniques to decrease oracle complexity in the minimization of finite-sum functions. For both GS-JOBCD and VR-J-JOBCD, we establish the oracle complexity under mild conditions and strong limit-point convergence results under the Kurdyka-Lojasiewicz inequality. To demonstrate the effectiveness of our method, we conduct experiments on hyperbolic eigenvalue problems, hyperbolic structural probe problems, and the ultrahyperbolic knowledge graph embedding problem. Extensive experiments using both real-world and synthetic data demonstrate that JOBCD consistently outperforms state-of-the-art solutions by large margins.

Paper Structure

This paper contains 40 sections, 26 theorems, 144 equations, 68 figures, 3 tables, 2 algorithms.

Key Result

Lemma 2.1

(Proof in the appendix.) For any $\texttt{B}\in\Omega$, we define $\mathbf{X}^+\triangleq \mathcal{X}_{\texttt{B}}(\mathbf{V})\triangleq \mathbf{X} + \mathbf{U}_{\texttt{B}} (\mathbf{V}-\mathbf{I}) \mathbf{U}_{\texttt{B}}^\mathsf{T} \mathbf{X}$. We have: (a) If $\mathbf{V} \in$ ...
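
The lemma statement is truncated here, so the following Python sketch illustrates only the update formula $\mathbf{X}^+ = \mathbf{X} + \mathbf{U}_{\texttt{B}} (\mathbf{V}-\mathbf{I}) \mathbf{U}_{\texttt{B}}^\mathsf{T} \mathbf{X}$, under stated assumptions: $\mathbf{J}$ is diagonal with entries $\pm 1$, $\mathbf{U}_{\texttt{B}}$ consists of the identity columns indexed by $\texttt{B}$, and $\mathbf{V}$ satisfies $\mathbf{V}^\top \mathbf{J}_{\texttt{BB}} \mathbf{V} = \mathbf{J}_{\texttt{BB}}$ on the selected block. Under those assumptions a direct calculation shows the update keeps $\mathbf{X}$ J-orthogonal. The function names are hypothetical, and the sketch does not choose $\mathbf{V}$ by minimizing any objective, so it is not the paper's algorithm.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def j_residual(X, J):
    """Largest absolute entry of X^T J X - J (small when X is J-orthogonal)."""
    G = matmul(matmul(transpose(X), J), X)
    return max(abs(G[i][j] - J[i][j])
               for i in range(len(J)) for j in range(len(J)))

def block_update(X, B, V):
    """Compute X + U_B (V - I) U_B^T X: rows B of X become V @ X[B, :]."""
    rows = [X[i] for i in B]        # U_B^T X, the rows selected by B
    Xp = [row[:] for row in X]      # rows outside B are left unchanged
    for a, i in enumerate(B):
        Xp[i] = [sum(V[a][b] * rows[b][j] for b in range(len(B)))
                 for j in range(len(X[0]))]
    return Xp

def hyp_rot(t):
    """2x2 hyperbolic rotation, J_BB-orthogonal for J_BB = diag(1, -1)."""
    return [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]

def rot(t):
    """2x2 ordinary rotation, J_BB-orthogonal for J_BB = diag(1, 1)."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

# Assumption: J = diag(1, 1, -1); the identity is a J-orthogonal start.
J = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
X0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

X1 = block_update(X0, (0, 2), hyp_rot(0.5))   # mixed-sign block
X2 = block_update(X1, (0, 1), rot(0.3))       # same-sign block
X_bad = block_update(X0, (0, 2), rot(0.3))    # V does not match J_BB

res_ok = j_residual(X2, J)
res_bad = j_residual(X_bad, J)
print(f"after two valid block updates: {res_ok:.2e}")
print(f"after a mismatched update:     {res_bad:.2e}")
```

The mismatched case applies an ordinary rotation to a block whose signature is $\mathrm{diag}(1, -1)$ and leaves the feasible set, which is why the type of 2-by-2 matrix has to match the signs of $\mathbf{J}$ on the chosen index pair.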

Figures (68)

  • Figure 1: Gisette (3000-100-50)
  • Figure 2: Sector (500-1000-500)
  • Figure 3: w1a (2470-290-145)
  • Figure 5: Cifar (10000-50-45)
  • Figure 6: Gisette (6000-50-45)
  • ...and 63 more figures

Theorems & Definitions (52)

  • Lemma 2.1
  • Proposition 2.2
  • Lemma 2.3
  • Lemma 2.4
  • Lemma 2.5
  • Lemma 3.1
  • Definition 3.2
  • Theorem 3.3
  • Definition 4.4
  • Lemma 4.5
  • ...and 42 more