A biconvex optimization for solving semidefinite programs via bilinear factorization

En-Liang Hu

TL;DR

The paper addresses the scalability of semidefinite programming (SDP) by moving from the traditional quadratic factorization $Z=XX^\top$ to a bilinear factorization $Z=XY^\top$ augmented with a Courant penalty $\frac{\gamma}{2}\|X-Y\|_F^2$. A theoretical bound $\gamma>\tfrac{1}{4}(L-\sigma)_+$ ensures that stationary points of the bilinear surrogate correspond to stationary points of the original SDP under rank deficiency, linking to the Burer-Monteiro approach. The paper proposes an alternating accelerated gradient descent (AAGD) algorithm to solve the resulting biconvex problem efficiently with closed-form stepsizes and low per-iteration cost. Empirical results on nonparametric kernel learning (NPKL) and colored maximum variance unfolding (CMVU) demonstrate competitive accuracy and improved convergence speed, highlighting the method's scalability to large SDP instances.

Abstract

Many problems in machine learning can be reduced to learning a low-rank positive semidefinite matrix (denoted $Z$), which leads to a semidefinite program (SDP). Existing SDP solvers based on classical convex optimization are too expensive for large-scale problems. Exploiting the low rank of the solution, Burer-Monteiro's method reformulates the SDP as a nonconvex problem via the quadratic factorization ($Z = XX^\top$). However, this loses the structure of the problem in optimization. In this paper, we propose to convert the SDP into a biconvex problem via the bilinear factorization ($Z = XY^\top$), while adding the term $\frac{\gamma}{2}\|X-Y\|_F^2$ to penalize the difference between $X$ and $Y$. The biconvex structure (w.r.t. $X$ and $Y$) can then be exploited naturally in optimization. As a theoretical result, we provide a bound on the penalty parameter $\gamma$ under the assumptions of $L$-Lipschitz smoothness and $\sigma$-strong biconvexity such that, at stationary points, the proposed bilinear factorization is equivalent to Burer-Monteiro's factorization when the bound is satisfied, that is, $\gamma>\frac{1}{4}(L-\sigma)_+$. Our proposal opens up a new way to surrogate an SDP by a biconvex program. Experiments on two SDP-related applications demonstrate that the proposed method is as effective as the state of the art.
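
As a rough illustration of the penalized bilinear surrogate described above, the following Python sketch minimizes $F(X,Y)=\dot{f}(XY^\top)+\frac{\gamma}{2}\|X-Y\|_F^2$ by alternating gradient steps on $X$ and $Y$. Several things here are assumptions and not taken from the paper: the toy objective $\dot{f}(Z)=\frac{1}{2}\|Z-M\|_F^2$ with a symmetric target $M$, the value $\gamma=1$, the function names, and the use of plain (non-accelerated) gradient steps in place of the paper's AAGD. The stepsize is the inverse of the blockwise Lipschitz constant of this toy objective, which is available in closed form.

```python
import numpy as np

def penalized_objective(X, Y, M, gamma):
    """F(X, Y) = 0.5 * ||X Y^T - M||_F^2 + (gamma / 2) * ||X - Y||_F^2."""
    fit = 0.5 * np.linalg.norm(X @ Y.T - M, "fro") ** 2
    penalty = 0.5 * gamma * np.linalg.norm(X - Y, "fro") ** 2
    return fit + penalty

def block_step(A, B, M, gamma):
    """One gradient step on block A with block B held fixed.

    With B fixed, F is a convex quadratic in A whose gradient is Lipschitz
    with constant ||B^T B||_2 + gamma, so the inverse of that constant is a
    safe stepsize. M is assumed symmetric, which lets the same formula serve
    for both the X-update and the Y-update.
    """
    grad = (A @ B.T - M) @ B + gamma * (A - B)
    lipschitz = np.linalg.norm(B.T @ B, 2) + gamma
    return A - grad / lipschitz

def alternating_gd(M, rank, gamma=1.0, iters=500, seed=0):
    """Alternate gradient steps on X and Y; returns the iterates and F history."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    X = 0.1 * rng.standard_normal((n, rank))
    Y = 0.1 * rng.standard_normal((n, rank))
    history = [penalized_objective(X, Y, M, gamma)]
    for _ in range(iters):
        X = block_step(X, Y, M, gamma)  # update X with Y fixed
        Y = block_step(Y, X, M, gamma)  # update Y with the new X fixed
        history.append(penalized_objective(X, Y, M, gamma))
    return X, Y, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((30, 3))
    M = A @ A.T  # symmetric PSD target of rank 3
    X, Y, history = alternating_gd(M, rank=3)
    print("objective start/end:", history[0], history[-1])
    print("residual ||X - Y||_F^2:", np.linalg.norm(X - Y, "fro") ** 2)
```

Because each block update is a gradient step with stepsize $1/L_{\text{block}}$ on a convex quadratic, the objective history is non-increasing. The sketch makes no claim about the convergence rate or about matching the behavior reported in the paper's experiments.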

Paper Structure

This paper contains 22 sections, 4 theorems, 25 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

If $\dot{f}$ in the SDP (eq:sdp) is convex w.r.t. $Z$, then $f$ in (eq:fs) and $F$ in (eq:xy) are biconvex w.r.t. $X$ and $Y$.
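
The proposition can be checked numerically on a small example. The sketch below assumes $F(X,Y)=\dot{f}(XY^\top)+\frac{\gamma}{2}\|X-Y\|_F^2$ (the form suggested by the abstract) with a hypothetical convex test objective $\dot{f}(Z)=\frac{1}{2}\|Z-M\|_F^2$; the function names are illustrative. It evaluates the midpoint convexity gap, which is non-negative along any segment on which the function is convex. Moving only $X$ (or only $Y$) gives a non-negative gap, while moving both blocks at once can give a negative gap, so $F$ is biconvex but not jointly convex.

```python
import numpy as np

def f_dot(Z, M):
    """Convex test objective in Z (hypothetical stand-in for the SDP objective)."""
    return 0.5 * np.linalg.norm(Z - M, "fro") ** 2

def F(X, Y, M, gamma):
    """Penalized bilinear objective f_dot(X Y^T) + (gamma / 2) * ||X - Y||_F^2."""
    return f_dot(X @ Y.T, M) + 0.5 * gamma * np.linalg.norm(X - Y, "fro") ** 2

def midpoint_gap(X1, Y1, X2, Y2, M, gamma):
    """Average of F at the two endpoints minus F at their midpoint.

    A non-negative value is necessary for convexity along the segment from
    (X1, Y1) to (X2, Y2). Pass Y1 == Y2 to test convexity in X alone, or
    X1 == X2 to test convexity in Y alone.
    """
    mid = F(0.5 * (X1 + X2), 0.5 * (Y1 + Y2), M, gamma)
    ends = 0.5 * (F(X1, Y1, M, gamma) + F(X2, Y2, M, gamma))
    return ends - mid

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((6, 6))
    M = G @ G.T
    X1, X2, Y = (rng.standard_normal((6, 2)) for _ in range(3))
    print("gap with Y fixed:", midpoint_gap(X1, Y, X2, Y, M, 0.5))

    # Scalar counterexample to joint convexity: (1, 1) and (-1, -1) both give
    # F = 0 when M = 1, but their midpoint (0, 0) gives F = 0.5.
    one = np.array([[1.0]])
    print("joint gap:", midpoint_gap(one, one, -one, -one, one, 0.5))
```

The scalar case shows why the blockwise structure matters: $(xy-1)^2$ vanishes at $(1,1)$ and $(-1,-1)$ but equals $1$ at the origin, so no choice of $\gamma$ makes the surrogate jointly convex along that segment, where the penalty term is zero at all three points.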

Figures (3)

  • Figure 1: Objective vs CPU time (logscale) on two data sets.
  • Figure 2: The progress of residual value $\left\Vert X_k-Y_k \right\Vert_F^2$ on a9a.
  • Figure 3: Objective vs CPU time (logscale) on the USPS Digits and Newsgroups 20.

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Remark 1
  • Theorem 1
  • Corollary 1
  • Remark 2
  • Proposition 2