SUperman: Efficient Permanent Computation on GPUs
Deniz Elbek, Fatih Taşyaran, Bora Uçar, Kamer Kaya
TL;DR
SUperman tackles exact permanent computation, a #P-complete problem, with a GPU-optimized, multi-node software suite that extends Ryser-based algorithms with Gray-code ordering and architecture-aware memory strategies. It supports dense and sparse matrices over real and complex domains, and combines preprocessing (Dulmage–Mendelsohn (DM) and Forbert–Marx decompositions) with precision-enhancement techniques (quad-precision outer sums and compensated summation). Key contributions include coalesced memory access patterns, per-thread register storage for the $x$ array, and flexible OpenMP/MPI-based parallelism, plus accuracy-guided strategies validated on known matrices. The suite delivers substantial speedups over CPU baselines and enables record-scale computations (e.g., a $62\times62$ matrix on 192 GPUs in about 1.63 days), which makes it a reusable GPU/HPC tool for applications in quantum computing, physics, and combinatorics. Future work includes hybrid/extended precision, Python wrappers, and broader ecosystem integration.
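To illustrate what compensated summation buys when many terms of very different magnitude are accumulated, here is a minimal Python sketch of Kahan's variant. It is an assumption that this particular variant resembles what SUperman uses; the function name `kahan_sum` and the test data are hypothetical and chosen only for illustration.

```python
import math

def kahan_sum(values):
    """Sum floats while carrying a running correction for lost low-order bits."""
    total = 0.0
    comp = 0.0  # estimate of the rounding error accumulated so far
    for v in values:
        y = v - comp            # apply the correction to the incoming term
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost in the addition
        total = t
    return total

# One large term followed by many tiny ones: plain summation drops the tiny terms.
values = [1.0] + [1e-16] * 100_000
naive = sum(values)
compensated = kahan_sum(values)
reference = math.fsum(values)  # correctly rounded sum from the standard library
```

In this example the naive sum never moves away from 1.0, while the compensated sum stays close to the correctly rounded reference.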
Abstract
The permanent is a function defined for a square matrix, with applications in quantum computing, statistical physics, complexity theory, combinatorics, and graph theory. Its formula is similar to that of the determinant; however, unlike the determinant, its exact computation is #P-complete, which implies that no polynomial-time algorithm for the permanent exists unless P=NP. For an $n \times n$ matrix, the fastest known algorithm has a time complexity of $O(2^{n-1}n)$. Although supercomputers have been employed for permanent computation before, there is no prior work and, more importantly, no publicly available software that leverages cutting-edge High-Performance Computing accelerators such as GPUs. In this work, we design, develop, and investigate the performance of SUperman, a complete software suite that computes matrix permanents on multiple nodes and GPUs of a cluster while handling various matrix types (real/complex/binary and sparse/dense), with a unique treatment for each type. Running on a single NVIDIA A100 GPU, SUperman is up to $86\times$ faster than a state-of-the-art parallel algorithm on 44 Intel Xeon cores running at 2.10 GHz. Leveraging 192 GPUs, SUperman computes the permanent of a $62 \times 62$ matrix in 1.63 days, marking the largest reported permanent computation to date.
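The $O(2^{n-1}n)$ bound can be made concrete with a small sequential sketch of Ryser's formula in the Nijenhuis–Wilf form, visiting column subsets in Gray-code order so that each step updates the $x$ array with a single column. This is a plain CPU illustration under those assumptions, not SUperman's GPU implementation; the names `permanent_ryser_gray` and `permanent_bruteforce` are hypothetical.

```python
from itertools import permutations
from math import prod

def permanent_ryser_gray(a):
    """Permanent of a square matrix (list of rows) in O(2^(n-1) * n) operations."""
    n = len(a)
    if n == 0:
        return 1
    # Nijenhuis-Wilf start vector: last column minus half of each row sum.
    x = [row[n - 1] - sum(row) / 2 for row in a]
    total = prod(x)  # contribution of the empty column subset
    for g in range(1, 1 << (n - 1)):
        j = (g & -g).bit_length() - 1  # the one column that changes in this step
        gray = g ^ (g >> 1)            # current subset as a Gray-code bit mask
        step = 1 if (gray >> j) & 1 else -1  # +1: column j enters, -1: it leaves
        for i in range(n):
            x[i] += step * a[i][j]
        term = prod(x)
        # The subset size changes by one per step, so its parity equals that of g.
        total += -term if g & 1 else term
    return (2 if n % 2 else -2) * total

def permanent_bruteforce(a):
    """Reference O(n! * n) evaluation straight from the definition."""
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))
```

The brute-force version is included only as a correctness check on small inputs; the two functions should agree up to floating-point rounding.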
