Structured Regularization for Constrained Optimization on the SPD Manifold

Andrew Cheng, Melanie Weber

TL;DR

This work introduces a modular, symmetry-based regularization framework for constrained optimization on the SPD manifold, leveraging symmetric gauge functions to design sparsity and ball-neighborhood regularizers. By preserving geodesic convexity and enabling difference-of-convex (DC) structure, the authors enable the use of Euclidean convex-concave procedures (CCCP) to solve constrained geodesically convex problems efficiently, often with simpler subroutines than traditional Riemannian methods. They provide a convergence and complexity analysis, plus a broad set of applications (square roots, Karcher means, optimistic likelihoods, and SPD regression) and extensive experiments demonstrating speed and robustness advantages over standard Riemannian solvers. The approach offers a flexible, scalable pathway to incorporate structure and prior information into SPD optimization, with potential extensions to other Cartan-Hadamard manifolds and a range of regularizers beyond the two primary SG-based classes.

Abstract

Matrix-valued optimization tasks, including those involving symmetric positive definite (SPD) matrices, arise in a wide range of applications in machine learning, data science and statistics. Classically, such problems are solved via constrained Euclidean optimization, where the domain is viewed as a Euclidean space and the structure of the matrices (e.g., positive definiteness) enters as constraints. More recently, geometric approaches that leverage parametrizations of the problem as unconstrained tasks on the corresponding matrix manifold have been proposed. While they exhibit algorithmic benefits in many settings, they cannot directly handle additional constraints, such as inequality or sparsity constraints. A remedy comes in the form of constrained Riemannian optimization methods, notably Riemannian Frank-Wolfe and Projected Gradient Descent. However, both algorithms require potentially expensive subroutines that can introduce computational bottlenecks in practice. To mitigate these shortcomings, we introduce a class of structured regularizers, based on symmetric gauge functions, which allow for solving constrained optimization on the SPD manifold with faster unconstrained methods. We show that our structured regularizers can be chosen to preserve or induce desirable structure, in particular convexity and "difference of convex" structure. We demonstrate the effectiveness of our approach in numerical experiments.
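To make the "difference of convex" structure concrete, the following is a minimal sketch of a Euclidean convex-concave procedure (CCCP) on a scalar toy problem. It is not taken from the paper: the objective $\phi(x) = x^4 - 2x^2$, its split into $f(x) = x^4$ and $h(x) = 2x^2$, and the names `phi`, `cccp_step`, and `cccp` are assumptions chosen for illustration. Each step linearizes the concave part $-h$ at the current iterate and minimizes the resulting convex surrogate, which here has a closed form.

```python
import numpy as np

def phi(x):
    # Toy DC objective (an assumption, not from the paper):
    # phi = f - h with f(x) = x**4 and h(x) = 2 * x**2, both convex.
    return x**4 - 2.0 * x**2

def cccp_step(x_k):
    # Surrogate: minimize f(x) - h'(x_k) * x, with h'(x_k) = 4 * x_k.
    # Stationarity gives 4 * x**3 = 4 * x_k, so x = cbrt(x_k).
    return np.cbrt(x_k)

def cccp(x0, iters=50):
    # Run a fixed number of CCCP steps and return every iterate.
    xs = [float(x0)]
    for _ in range(iters):
        xs.append(float(cccp_step(xs[-1])))
    return xs

trajectory = cccp(3.0)
print(trajectory[-1], phi(trajectory[-1]))
```

Starting from a positive point, the iterates approach the minimizer $x = 1$ of the toy objective, and no stepsize has to be tuned because every surrogate is minimized exactly.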

Paper Structure

This paper contains 59 sections, 34 theorems, 125 equations, 9 figures, 2 tables, 4 algorithms.

Key Result

Theorem 2.4

Let $d(x_0,x^*) \leq R$ for some $x_0 \in \mathcal{M}$ with $\phi(x) \leq \phi(x_0)$. If the functions $Q(x,x_k)$ in Alg. \ref{alg:cccp} are first-order surrogate functions, then where $\alpha_{\mathcal{M}}$ depends on the geometry of the manifold and $L$ characterizes the smoothness of $h(\cdot)$.

Figures (9)

  • Figure 1: We apply the fixed-point algorithm (see Proposition \ref{prop:fp_sqrt}) to the medium-conditioned $H \in \mathbb{R}^{200 \times 200}$. We initialize all algorithms at $X_0 = 3 I_d$. Although all three methods have per-iteration complexity of the same order, the fixed-point method exhibits superior runtime performance. The stepsizes for RGD and RCG are chosen using backtracking line search. In contrast, the fixed-point algorithm does not need a stepsize. Distance is measured in terms of the Frobenius norm.
  • Figure 2: We apply the fixed-point algorithm (see Proposition \ref{prop:fp_sqrt}) to the ill-conditioned Hilbert matrix of dimension $d=200$. We initialize all algorithms at $X_0 = 3 I_d$. The benchmarks fail to converge, whereas CCCP is robust to ill-conditioning.
  • Figure 3: We generate the Hilbert matrix $H \in \mathbb{R}^{200 \times 200}$. We plot the condition number of $H+X_k$, where $X_k$ is the $k$-th iterate of the fixed-point algorithm (Eq. \ref{eq:sqrt_fp}), and compare it to the condition number of $H$. Clearly, $H+X_k$ is much better conditioned than $H$. This trend also holds for other very ill-conditioned matrices.
  • Figure 4: Karcher mean with $m=100$ and $d=100$. CCCP demonstrates superior runtime performance.
  • Figure 5: Karcher mean with $m=100$ and $d=500$. We observe that the gap in runtime performance between CCCP and the benchmarks widens as the dimension $d$ grows.
  • ...and 4 more figures
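
The conditioning effect described in the Figure 3 caption can be checked at the initial point with a short sketch. The fixed-point update itself is not reproduced on this page, so the code only compares $H$ with $H + X_0$ for the stated initialization $X_0 = 3 I_d$; the helper name `hilbert` and the use of the 2-norm condition number are assumptions made for illustration.

```python
import numpy as np

def hilbert(d):
    # Hilbert matrix with entries H[i, j] = 1 / (i + j - 1) for 1-based i, j.
    i = np.arange(1, d + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1.0)

d = 200
H = hilbert(d)
X0 = 3.0 * np.eye(d)  # initialization X_0 = 3 I_d stated in the captions

# 2-norm condition numbers (ratio of largest to smallest singular value).
cond_H = np.linalg.cond(H)
cond_shifted = np.linalg.cond(H + X0)

print(f"cond(H)       = {cond_H:.3e}")
print(f"cond(H + X_0) = {cond_shifted:.3f}")
```

Because the eigenvalues of the Hilbert matrix lie strictly between $0$ and $\pi$, adding $3 I_d$ places the spectrum of $H + X_0$ inside $(3, 3 + \pi)$, so its condition number stays below about $2.05$, while that of $H$ itself is at the limit of double precision.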

Theorems & Definitions (69)

  • Definition 2.1: Geodesic convexity of sets
  • Definition 2.2: Geodesic convexity of functions
  • Definition 2.3
  • Theorem 2.4: pmlr-v202-weber23a
  • Definition 2.5: Symmetric Gauge Functions.
  • Proposition 2.6: Closure under positive scaling
  • Proposition 2.7: Closure under addition
  • proof
  • Definition 2.8
  • Proposition 2.9: Closure under the dual
  • ...and 59 more
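
As a companion to Definition 2.5, the sketch below numerically probes the textbook properties of a symmetric gauge function: a norm on $\mathbb{R}^n$ that is invariant under permutations and sign changes of the coordinates. The paper's exact statement is not reproduced on this page, so this uses the standard definition; the helper names `ky_fan` and `looks_like_symmetric_gauge` are hypothetical, and a randomized check like this can refute the properties but cannot prove them.

```python
import numpy as np

def ky_fan(x, k):
    # Sum of the k largest absolute entries, a standard symmetric gauge function.
    return np.sort(np.abs(x))[::-1][:k].sum()

def looks_like_symmetric_gauge(phi, n=5, trials=200, seed=0, tol=1e-9):
    # Randomized necessary-condition test; True means no violation was found.
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=n), rng.normal(size=n)
        perm = rng.permutation(n)
        signs = rng.choice([-1.0, 1.0], size=n)
        t = rng.normal()
        if abs(phi(x[perm]) - phi(x)) > tol:      # permutation invariance
            return False
        if abs(phi(signs * x) - phi(x)) > tol:    # sign invariance
            return False
        if abs(phi(t * x) - abs(t) * phi(x)) > tol:  # absolute homogeneity
            return False
        if phi(x + y) > phi(x) + phi(y) + tol:    # triangle inequality
            return False
    return True

print(looks_like_symmetric_gauge(lambda x: ky_fan(x, 2)))
print(looks_like_symmetric_gauge(lambda x: np.linalg.norm(x, 1)))
print(looks_like_symmetric_gauge(lambda x: abs(x[0])))
```

The last candidate, $|x_1|$, depends on a single coordinate and is therefore not permutation invariant, which the check detects.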