A Hardware Accelerator for the Goemans-Williamson Algorithm
D. A. Herrera-Martí, E. Guthmuller, J. Fereyre
TL;DR
The paper addresses scalable benchmarking for Max-Cut via the Goemans–Williamson SDP relaxation. It shows that increasing the internal working precision of the Krylov subroutines inside an Interior Point Method (IPM) can dramatically reduce iteration counts for large SDPs, improving time to solution. It provides evidence that higher precision speeds up Conjugate Gradient (CG) in the Newton steps, with gains that grow with problem size, and proposes adaptive precision schemes to balance speed and accuracy. It also evaluates a proposed hardware accelerator (VXP) that runs natively in extended precision, estimating substantial speedups (up to ~10× for CG) under realistic memory-bandwidth constraints and roughly 27% additional savings with adaptive precision. Together, these results offer a viable path to scalable SDP benchmarks with worst-case guarantees for comparing classical and quantum-inspired optimisers on very large Max-Cut instances.
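For reference, the relaxation underlying these benchmarks can be stated in its standard textbook form. The notation (w_ij ≥ 0 for the edge weights of an n-vertex graph, X for the relaxed matrix, v_i for the columns of a factor V with X = VᵀV, s_i ∈ {−1, +1} for the rounded cut) is ours rather than the paper's. The IPM solves this SDP, and for large n the linear systems in its Newton steps are where the Krylov (CG) subroutines are used.

```latex
% Goemans–Williamson SDP relaxation of Max-Cut (standard form, notation assumed)
\begin{aligned}
\text{maximise}\quad   & \frac{1}{4}\sum_{i,j=1}^{n} w_{ij}\,(1 - X_{ij}) \\
\text{subject to}\quad & X_{ii} = 1, \qquad i = 1,\dots,n, \\
                       & X \succeq 0 .
\end{aligned}

% Random-hyperplane rounding: r \sim \mathcal{N}(0, I), \; s_i = \operatorname{sign}(v_i^{\top} r)
\mathbb{E}\bigl[\mathrm{cut}(s)\bigr]
  = \sum_{i<j} w_{ij}\,\frac{\arccos(v_i^{\top} v_j)}{\pi}
  \;\ge\; \alpha_{\mathrm{GW}}\cdot \mathrm{SDP}
  \;\ge\; \alpha_{\mathrm{GW}}\cdot \mathrm{MaxCut},
\qquad
\alpha_{\mathrm{GW}} = \min_{0<\theta\le\pi} \frac{2}{\pi}\,\frac{\theta}{1-\cos\theta} \approx 0.878 .
```

The final chain of inequalities is the worst-case guarantee that local search heuristics lack: it holds for every instance with non-negative weights, not just on average.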
Abstract
The combinatorial problem Max-Cut has become a benchmark for evaluating local search heuristics on both quantum and classical optimisers. In contrast to local search, which only provides average-case performance guarantees, the convex semidefinite relaxation of Max-Cut by Goemans and Williamson provides worst-case guarantees and is therefore suited both to the construction of benchmarks and to performance-critical applications. We show how extended floating-point precision can be incorporated into the algebraic subroutines of convex optimisation, namely into indirect matrix inversion methods such as Conjugate Gradient. Interior Point Methods rely on these methods for very large problem sizes because they have lower complexity than direct methods. We find that increasing their internal working precision reduces the time to solution by a factor that grows with the system size. We also estimate the expected acceleration of the time to solution on a hardware architecture that runs natively in extended precision.
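The effect of working precision on Conjugate Gradient can be explored with a small experiment. The sketch below is a plain, unpreconditioned CG in which all arithmetic runs in a chosen NumPy dtype. It assumes that `float32` and `float64` stand in for "lower" and "higher" precision; NumPy has no portable type matching the paper's extended-precision hardware (`np.longdouble` is platform-dependent). The names `conjugate_gradient` and `spd_test_matrix`, the geometric eigenvalue spectrum and the tolerance are illustrative assumptions, not the paper's code or benchmark.

```python
import numpy as np


def conjugate_gradient(A, b, dtype, tol=1e-6, maxiter=10_000):
    """Unpreconditioned CG with every operation carried out in `dtype`.

    Illustrative sketch only: the dtype is a stand-in for the working
    precisions studied in the paper. Returns (x, iterations used).
    """
    A = np.asarray(A, dtype=dtype)
    b = np.asarray(b, dtype=dtype)
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A x for the initial guess x = 0
    p = r.copy()              # first search direction
    rs_old = r @ r
    b_norm = np.sqrt(b @ b)
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)       # exact line search along p
        x += alpha * p
        r -= alpha * Ap                 # recursively updated residual
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            return x, k
        p = r + (rs_new / rs_old) * p   # new A-conjugate direction
        rs_old = rs_new
    return x, maxiter


def spd_test_matrix(n, cond, seed=0):
    """Hypothetical test matrix: random orthogonal basis, eigenvalues
    spread geometrically from 1 to `cond` (so its condition number is `cond`)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigenvalues = np.logspace(0, np.log10(cond), n)
    A = (Q * eigenvalues) @ Q.T         # Q diag(eigenvalues) Q^T
    return (A + A.T) / 2                # symmetrise away rounding noise


if __name__ == "__main__":
    A = spd_test_matrix(200, 1e4)
    b = np.ones(200)
    for dtype in (np.float32, np.float64):
        x, iters = conjugate_gradient(A, b, dtype)
        print(np.dtype(dtype).name, "iterations:", iters)
```

On ill-conditioned systems the lower-precision run typically needs more iterations to reach the same residual tolerance, because rounding errors erode the conjugacy of the search directions and delay convergence. The exact counts depend on the matrix, the tolerance and the NumPy version, so none are quoted here.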
