A Hardware Accelerator for the Goemans-Williamson Algorithm

D. A. Herrera-Martí, E. Guthmuller, J. Fereyre

TL;DR

The paper addresses scalable benchmarking for Max-Cut via the Goemans–Williamson SDP relaxation and shows that extending the internal precision of Krylov subroutines within an Interior Point Method can substantially reduce iteration counts for large SDPs, improving time-to-solution. It provides evidence that higher precision speeds up Conjugate Gradient (CG) in the Newton steps, with gains that grow with problem size, and proposes adaptive precision schemes to balance speed and accuracy. It also evaluates a potential hardware accelerator (VXP) that supports extended precision natively, estimating speedups of up to ~10× for CG under realistic memory-bandwidth constraints, plus roughly 27% additional savings with adaptive precision. Together, these results offer a viable path to scalable SDP benchmarks with worst-case guarantees for comparing classical and quantum-inspired optimisers on very large Max-Cut instances.

Abstract

The combinatorial problem Max-Cut has become a benchmark in the evaluation of local search heuristics for both quantum and classical optimisers. In contrast to local search, which only provides average-case performance guarantees, the convex semidefinite relaxation of Max-Cut by Goemans and Williamson provides worst-case guarantees and is therefore suited both to the construction of benchmarks and to applications in performance-critical scenarios. We show how extended floating-point precision can be incorporated in algebraic subroutines in convex optimisation, namely in indirect matrix inversion methods like Conjugate Gradient, which are used in Interior Point Methods for very large problem sizes. We also estimate the expected acceleration of the time to solution for a hardware architecture that runs natively on extended precision. Specifically, when using indirect matrix inversion methods like Conjugate Gradient, which have lower complexity than direct methods and are therefore used in very large problems, we see that increasing the internal working precision reduces the time to solution by a factor that increases with the system size.
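
The effect of working precision on Conjugate Gradient can be explored with a small toy experiment. The sketch below is not the paper's setup: it assumes a Hilbert matrix as a stand-in for the ill-conditioned Newton systems, emulates extended precision in software with Python's `decimal` module, and uses hypothetical names (`hilbert`, `cg_iterations`) and illustrative precisions of 16 and 200 decimal digits rather than the 64 to 1024 bits studied in the paper.

```python
from decimal import Decimal, localcontext


def hilbert(n):
    """Hilbert matrix H[i][j] = 1/(i+j+1): symmetric positive definite, ill-conditioned."""
    return [[Decimal(1) / Decimal(i + j + 1) for j in range(n)] for i in range(n)]


def cg_iterations(n, digits, rel_tol="1e-10", max_iter=500):
    """Run plain CG on H x = b with `digits` significant decimal digits.

    Returns (iterations used, approximate solution). The right-hand side is
    chosen so that the exact solution is the all-ones vector (assumption made
    only to keep the toy problem simple).
    """
    with localcontext() as ctx:
        ctx.prec = digits                      # working precision for every operation below
        A = hilbert(n)
        b = [sum(row) for row in A]
        x = [Decimal(0)] * n
        r = list(b)                            # residual b - A x, with x = 0
        p = list(r)                            # first search direction
        rs = sum(ri * ri for ri in r)          # squared residual norm
        threshold = Decimal(rel_tol) ** 2 * rs
        for k in range(max_iter):
            if rs <= threshold:
                return k, x
            Ap = [sum(a * pj for a, pj in zip(row, p)) for row in A]
            alpha = rs / sum(pi * q for pi, q in zip(p, Ap))
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * q for ri, q in zip(r, Ap)]
            rs_new = sum(ri * ri for ri in r)
            p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
            rs = rs_new
        return max_iter, x


if __name__ == "__main__":
    for digits in (16, 200):
        iters, _ = cg_iterations(8, digits)
        print(f"{digits:>3} digits: {iters} CG iterations")
```

In exact arithmetic CG terminates in at most `n` steps, so the high-precision run should stay within that bound; rounding errors at lower precision degrade the conjugacy of the search directions and can delay convergence. The iteration counts from this toy problem are illustrative only and say nothing about the magnitudes reported in the paper.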

Paper Structure

This paper contains 9 sections, 17 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: (a) Iterations of the CG subroutine vs. iterations of the IPM for $N=5000$. Increasing precision reduces the number of "redundant searches" in the Krylov subspace, and the matrix is inverted after a smaller number of iterations. (b) Condition number $\kappa$ vs. iterations of the IPM. As explained in the appendices, the matrix becomes rank-deficient as the IPM progresses. As a result, the condition number of the linear system solved for matrix inversion increases with the number of iterations. (All data is from graph G55 in the Gset database.)
  • Figure 2: Relative improvement vs. problem size. We integrated the number of iterations for all floating-point precisions considered in this work (1024 bits to 64 bits). This shows that the expected speedup increases with the size of the problem: the total number of iterations grows with a steeper slope at low precision than at high precision.
  • Figure 3: Ratio between the attained optimal value and the best known cut vs. problem size. The values are averaged over 10 realisations of the random hyperplane separation. Despite providing worst-case guarantees, the GW algorithm becomes gradually competitive with local search heuristics which only provide average-case guarantees. For signed edges, performance is degraded.
  • Figure 4: Time spent in the CG algorithm at each Newton step for the G55 problem ($5000 \times 5000$).
  • Figure 5: Total time spent in CG for all problems and precisions, normalised to 64-bit precision.
  • ...and 1 more figure
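
The random hyperplane separation mentioned in the Figure 3 caption, averaged over 10 realisations, can be sketched as follows. This is a minimal illustration of the standard Goemans–Williamson rounding step and not the authors' code: the function names are hypothetical, and the unit vectors are supplied by hand for a triangle graph (three vectors at 120° to each other) instead of being obtained from an SDP solver.

```python
import math
import random


def cut_value(edges, signs):
    """Total weight of edges whose endpoints fall on opposite sides of the cut."""
    return sum(w for i, j, w in edges if signs[i] != signs[j])


def hyperplane_round(vectors, rng):
    """Assign each vertex +1 or -1 according to the side of a random hyperplane."""
    dim = len(vectors[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # random Gaussian direction
    return [1 if sum(v * r for v, r in zip(vec, normal)) >= 0 else -1
            for vec in vectors]


def average_cut(edges, vectors, realisations=10, seed=0):
    """Average cut weight over several independent random hyperplanes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(realisations):
        total += cut_value(edges, hyperplane_round(vectors, rng))
    return total / realisations


# Toy instance (assumption): unweighted triangle with hand-picked unit vectors.
s = math.sqrt(3) / 2
vectors = [(1.0, 0.0), (-0.5, s), (-0.5, -s)]
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)]

if __name__ == "__main__":
    print(average_cut(edges, vectors))
```

For this triangle the three vectors sum to zero, so every hyperplane through the origin splits them one against two and each realisation cuts exactly two edges, which is the maximum cut of a triangle. On larger graphs the rounded cut varies between realisations, which is why an average is reported.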