CUAOA: A Novel CUDA-Accelerated Simulation Framework for the QAOA

Jonas Stein, Jonas Blenninger, David Bucher, Peter J. Eder, Elif Çetiner, Maximilian Zorn, Claudia Linnhoff-Popien

TL;DR

This work addresses the cost of simulating the QAOA on classical hardware by introducing CUAOA, a CUDA-native, single-GPU QAOA simulator with Python and Rust interfaces. It represents the cost Hamiltonian as a diagonal, uses adjoint differentiation for fast gradient calculations, and relies on cuStateVec for the mixer and for sampling, which yields substantial runtime improvements over existing tools such as QOKit, Qiskit, and PennyLane. On MaxCut graphs, the results show speedups of up to multiple orders of magnitude for small- to medium-sized problems, and gradient-based optimization is accelerated as well. The framework provides statevector access, exact expectation values, fast sampling, and gradient-based optimization. It is open-source and supports integration into Rust and Python workflows. Future work includes multi-GPU extensions and constraint-preserving mixers.
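
As a rough illustration of the ideas named in this summary, the following sketch simulates the QAOA for MaxCut in plain Python on the CPU. It is not CUAOA's API or implementation: the function names (maxcut_diagonal, apply_cost, apply_mixer, qaoa_state, expectation, sample) are hypothetical, the bit ordering is an assumption, and no CUDA or cuStateVec calls are involved. It only shows why a diagonal cost Hamiltonian turns the cost layer into one elementwise phase multiplication, and how exact expectation values and samples follow from the statevector.

```python
import cmath
import math
import random


def maxcut_diagonal(n, edges):
    """Diagonal of the MaxCut cost Hamiltonian: cut size of every bitstring z.

    Assumption: qubit i corresponds to bit i of the integer z.
    """
    return [sum(((z >> i) ^ (z >> j)) & 1 for i, j in edges)
            for z in range(1 << n)]


def apply_cost(state, diag, gamma):
    # exp(-i * gamma * C) is diagonal, so each amplitude only picks up a phase.
    return [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, diag)]


def apply_mixer(state, n, beta):
    # exp(-i * beta * X) on every qubit: [[cos b, -i sin b], [-i sin b, cos b]].
    c, s = math.cos(beta), -1j * math.sin(beta)
    state = list(state)
    for q in range(n):
        bit = 1 << q
        for z in range(len(state)):
            if not z & bit:
                a0, a1 = state[z], state[z | bit]
                state[z], state[z | bit] = c * a0 + s * a1, s * a0 + c * a1
    return state


def qaoa_state(n, diag, gammas, betas):
    # Start in the uniform superposition, then alternate cost and mixer layers.
    dim = 1 << n
    state = [complex(1 / math.sqrt(dim))] * dim
    for g, b in zip(gammas, betas):
        state = apply_mixer(apply_cost(state, diag, g), n, b)
    return state


def expectation(state, diag):
    # Exact expectation value: probability-weighted sum over the diagonal.
    return sum(abs(a) ** 2 * c for a, c in zip(state, diag))


def sample(state, shots, seed=None):
    # Draw bitstrings (as integers) according to the measurement probabilities.
    rng = random.Random(seed)
    probs = [abs(a) ** 2 for a in state]
    return rng.choices(range(len(state)), weights=probs, k=shots)


# Usage: a single edge with one QAOA layer (p = 1).
diag = maxcut_diagonal(2, [(0, 1)])
state = qaoa_state(2, diag, [math.pi / 2], [math.pi / 8])
print(expectation(state, diag))  # approximately 1.0, the maximum cut of one edge
print(sample(state, shots=8, seed=0))
```

The memory and time cost of this sketch grows as 2^n in the number of qubits n, which is the scaling that motivates moving the same elementwise and pairwise updates onto a GPU.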

Abstract

The Quantum Approximate Optimization Algorithm (QAOA) is a prominent quantum algorithm designed to find approximate solutions to combinatorial optimization problems, which are challenging for classical computers. In the current era, where quantum hardware is constrained by noise and limited qubit availability, simulating the QAOA remains essential for research. However, existing state-of-the-art simulation frameworks suffer from long execution times or lack comprehensive functionality, usability, and versatility, often requiring users to implement essential features themselves. Additionally, these frameworks are primarily restricted to Python, limiting their use in safer and faster languages like Rust, which offer, e.g., advanced parallelization capabilities. In this paper, we develop a GPU-accelerated QAOA simulation framework utilizing the NVIDIA CUDA toolkit. This framework offers a complete interface for QAOA simulations, enabling the calculation of (exact) expectation values, direct access to the statevector, fast sampling, and high-performance optimization methods using a state-of-the-art gradient calculation technique. The framework is designed for use in Python and Rust, providing flexibility for integration into a wide range of applications, including those requiring fast algorithm implementations with the QAOA at their core. The new framework's performance is benchmarked on the MaxCut problem and compared against the current state-of-the-art general-purpose quantum circuit simulation frameworks Qiskit and PennyLane as well as the specialized QAOA simulation tool QOKit. Our evaluation shows that our approach outperforms the existing state-of-the-art solutions in runtime by up to multiple orders of magnitude. Our implementation is publicly available at https://github.com/JFLXB/cuaoa and on Zenodo.
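
The gradient technique mentioned in the abstract is identified in the TL;DR and in Figure 3 as adjoint differentiation. The sketch below shows the general idea for a QAOA circuit in plain Python, under stated assumptions: it is a CPU-only illustration, not CUAOA's implementation, and the function names (apply_cost, apply_mixer, expectation_and_gradient) as well as the bit ordering are hypothetical. After one forward pass, the state and a second "adjoint" vector are propagated backward through the layers, and each parameter's derivative is read off as 2 Im⟨λ|G|φ⟩, where G is the generator of the corresponding layer (the diagonal cost Hamiltonian C or the mixer B = Σ_q X_q).

```python
import cmath
import math


def apply_cost(state, diag, gamma):
    # exp(-i * gamma * C) with diagonal C: one phase per amplitude.
    return [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, diag)]


def apply_mixer(state, n, beta):
    # exp(-i * beta * X) on every qubit; a negative beta applies the inverse.
    c, s = math.cos(beta), -1j * math.sin(beta)
    state = list(state)
    for q in range(n):
        bit = 1 << q
        for z in range(len(state)):
            if not z & bit:
                a0, a1 = state[z], state[z | bit]
                state[z], state[z | bit] = c * a0 + s * a1, s * a0 + c * a1
    return state


def expectation_and_gradient(n, diag, gammas, betas):
    """Return (<C>, d<C>/dgammas, d<C>/dbetas) via adjoint differentiation."""
    dim = 1 << n
    # Forward pass: uniform superposition, then alternating layers.
    phi = [complex(1 / math.sqrt(dim))] * dim
    for g, b in zip(gammas, betas):
        phi = apply_mixer(apply_cost(phi, diag, g), n, b)
    energy = sum(abs(a) ** 2 * c for a, c in zip(phi, diag))

    # Adjoint vector |lambda> = C |phi>.
    lam = [c * a for a, c in zip(phi, diag)]
    grad_g = [0.0] * len(gammas)
    grad_b = [0.0] * len(betas)

    # Backward pass: undo one layer at a time on both vectors.
    for k in reversed(range(len(gammas))):
        # Mixer generator B = sum_q X_q flips one bit per term.
        b_phi = [sum(phi[z ^ (1 << q)] for q in range(n)) for z in range(dim)]
        grad_b[k] = 2 * sum((l.conjugate() * x).imag for l, x in zip(lam, b_phi))
        phi = apply_mixer(phi, n, -betas[k])
        lam = apply_mixer(lam, n, -betas[k])

        # Cost generator is the diagonal itself.
        grad_g[k] = 2 * sum((l.conjugate() * c * a).imag
                            for l, a, c in zip(lam, phi, diag))
        phi = apply_cost(phi, diag, -gammas[k])
        lam = apply_cost(lam, diag, -gammas[k])
    return energy, grad_g, grad_b


# Usage: MaxCut on a triangle with p = 2 (parameter values chosen arbitrarily).
edges = [(0, 1), (1, 2), (0, 2)]
diag = [sum(((z >> i) ^ (z >> j)) & 1 for i, j in edges) for z in range(1 << 3)]
energy, grad_gammas, grad_betas = expectation_and_gradient(3, diag, [0.3, 0.7], [0.2, 0.5])
print(energy, grad_gammas, grad_betas)
```

The appeal of this approach is that all 2p derivatives come from one forward and one backward sweep, whereas finite differences or parameter-shift style rules need additional circuit evaluations for every parameter.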

Paper Structure (15 sections, 4 equations, 3 figures)


Figures (3)

  • Figure 1: Runtime of the expectation value calculation for QAOA with $p = 6$.
  • Figure 2: Runtime for sampling with 1024 shots and $p = 6$.
  • Figure 3: Runtime for gradient calculation using the adjoint differentiation method with $p = 6$.