cuHALLaR: A GPU Accelerated Low-Rank Augmented Lagrangian Method for Large-Scale Semidefinite Programming
Jacob M. Aguirre, Diego Cifuentes, Vincent Guigues, Renato D. C. Monteiro, Victor Hugo Nascimento, Arnesh Sujanani
TL;DR
cuHALLaR is a GPU-accelerated implementation of the HALLaR framework for large-scale SDPs. It adopts a low-rank factorization $X=UU^{\top}$ and a hybrid sparse–low-rank (HSLR) data format. The method delivers substantial speedups over CPU HALLaR and over the GPU solver cuLoRADS across matrix completion, maximum stable set, and phase retrieval problems, including instances with millions of variables and hundreds of millions of constraints. Key innovations include efficient GPU kernels for $\tilde{\mathcal{A}}(UU^{\top})$ and $\tilde{\mathcal{A}}^{*}(\tilde{p})U$, and the HSLR data representation, which avoids forming dense $n\times n$ matrices. Empirical results show that cuHALLaR solves extremely large SDP instances (e.g., up to $(n,m) \approx (8{,}000{,}000, 300{,}000{,}000)$) with high precision in minutes, making it a competitive GPU-based framework for large-scale SDPs.
Abstract
This paper introduces cuHALLaR, a GPU-accelerated implementation of the HALLaR method proposed in Monteiro et al. 2024 for solving large-scale semidefinite programming (SDP) problems. We demonstrate how our Julia-based implementation efficiently uses GPU parallelism by optimizing simple but key operations, including linear maps, adjoints, and gradient evaluations. Extensive numerical experiments across three SDP problem classes, namely maximum stable set, matrix completion, and phase retrieval, show significant performance improvements over both CPU implementations and existing GPU-based solvers. For the largest instances, cuHALLaR achieves speedups of 30-140x on matrix completion problems, up to 135x on maximum stable set problems for Hamming graphs with 8.4 million vertices, and 15-47x on phase retrieval problems with dimensions up to 3.2 million. Our approach efficiently handles massive problems with dimensions up to (n,m) = (8 million, 300 million) with high precision, solving matrix completion instances with over 8 million rows and columns in just 142 seconds. These results establish cuHALLaR as a very promising GPU-based method for solving large-scale semidefinite programs.
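To illustrate why a linear map and its adjoint can be evaluated without ever forming the dense $n\times n$ matrix $UU^{\top}$, the following is a minimal CPU sketch in pure Python. It is not the authors' Julia/GPU code and does not implement the HSLR format. It assumes, for illustration only, that each constraint matrix $A_k$ is symmetric, sparse, and stored as a list of `(i, j, value)` triples; the function names `apply_A` and `apply_adjoint_times_U` are hypothetical. It uses the identities $\langle A_k, UU^{\top}\rangle = \sum_{(i,j)} (A_k)_{ij}\,\langle U_{i,:}, U_{j,:}\rangle$ and $\mathcal{A}^{*}(p)U = \sum_k p_k A_k U$.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def apply_A(constraints, U):
    """Return the list [<A_k, U U^T>] using only rows of U.

    constraints: list of sparse matrices, each a list of (i, j, value)
                 triples (assumed storage format; symmetric entries are
                 listed explicitly).
    U:           n x r factor stored as a list of rows.
    """
    return [sum(v * dot(U[i], U[j]) for i, j, v in triples)
            for triples in constraints]


def apply_adjoint_times_U(constraints, p, U):
    """Return (sum_k p_k A_k) U as an n x r list of rows.

    The n x n matrix sum_k p_k A_k is never assembled: each nonzero
    (i, j, value) of A_k adds p_k * value * U[j] to row i of the result.
    """
    n, r = len(U), len(U[0])
    out = [[0.0] * r for _ in range(n)]
    for p_k, triples in zip(p, constraints):
        for i, j, v in triples:
            for c in range(r):
                out[i][c] += p_k * v * U[j][c]
    return out


# Small example: n = 3, r = 2, two constraints.
U = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
constraints = [
    [(0, 1, 1.0), (1, 0, 1.0)],  # A_0 = e_0 e_1^T + e_1 e_0^T
    [(2, 2, 1.0)],               # A_1 = e_2 e_2^T
]
p = [0.5, 2.0]

AX = apply_A(constraints, U)
G = apply_adjoint_times_U(constraints, p, U)

# Adjoint identity: <A(U U^T), p> equals <A^*(p) U, U>.
lhs = dot(AX, p)
rhs = sum(dot(G[i], U[i]) for i in range(len(U)))
```

Both routines cost time proportional to the number of nonzeros times the rank $r$, and the work is independent across nonzeros, which is the kind of structure that maps naturally onto GPU parallelism.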
