Queen: A quick, scalable, and comprehensive quantum circuit simulation for supercomputing

Chuan-Chi Wang, Yu-Cheng Lin, Yan-Jie Wang, Chia-Heng Tu, Shih-Hao Hung

TL;DR

Queen tackles the bottlenecks of state-vector quantum circuit simulation (QCS) on multi-GPU systems with a cache-aware, two-module framework. All-in-One optimization (AIO) groups gates into gate blocks and enables gate fusion; All-in-Cache (AIC) executes the circuit block by block within GPU caches while coordinating IMS and XRS swaps across ranks. On a DGX-A100, the approach achieves roughly a 9x average speedup over QuEST, Aer, and cuQuantum, with up to 8x higher FLOPS and 96x higher arithmetic intensity, indicating a shift from a memory-bound to a compute-bound regime for large-scale QCS. The authors provide reproducible artifacts (A1 and A2) for benchmarking and validation, report strong-scaling results on up to 8 GPUs, and discuss the potential for future NVIDIA architectures. Overall, Queen offers a practical, scalable path to faster quantum circuit simulation, supporting rapid development of quantum algorithms on classical hardware.
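To make the gate-fusion idea concrete, here is a minimal pure-Python sketch, not Queen's implementation. It assumes two single-qubit gates acting on the same qubit, and the helper names `apply_1q` and `matmul2` are hypothetical. Multiplying the two 2x2 matrices first means the state vector is swept once instead of twice.

```python
import cmath
import math

def apply_1q(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector (list of complex)."""
    out = list(state)
    stride = 1 << target
    for base in range(0, len(state), 2 * stride):
        for off in range(stride):
            i0 = base + off          # amplitude where the target bit is 0
            i1 = i0 + stride         # amplitude where the target bit is 1
            a0, a1 = state[i0], state[i1]
            out[i0] = gate[0][0] * a0 + gate[0][1] * a1
            out[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return out

def matmul2(a, b):
    """Product a @ b of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]                            # Hadamard gate
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]   # T gate

state = [0j] * 8      # 3-qubit register
state[0] = 1 + 0j     # initial state |000>

# Unfused: two full sweeps over the state vector (H first, then T).
unfused = apply_1q(apply_1q(state, H, 1), T, 1)

# Fused: one sweep with the product T @ H (the later gate goes on the left).
fused = apply_1q(state, matmul2(T, H), 1)

assert all(abs(x - y) < 1e-12 for x, y in zip(unfused, fused))
```

The fused and unfused results agree, but the fused version reads and writes each amplitude only once. That reduction in memory traffic is the general motivation for fusing gates before simulation.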

Abstract

State vector-based simulation offers a convenient way to develop and validate quantum algorithms with noise-free results. However, lacking cache-aware implementations and mature circuit optimizations, past simulators were severely constrained in performance, leading to stagnation in quantum computing. In this paper, we present an innovative quantum circuit simulation toolkit comprising gate optimization and simulation modules to address these performance challenges. To evaluate performance and scalability comprehensively, we run a series of circuit benchmarks and strong-scaling tests on a DGX-A100 workstation and achieve an average 9x speedup over state-of-the-art simulators, including QuEST, IBM-Aer, and NVIDIA-cuQuantum. Moreover, the critical performance metric FLOPS increases by up to a factor of 8, and arithmetic intensity improves by a remarkable 96x. We believe the proposed toolkit paves the way for faster quantum circuit simulations, thereby facilitating the development of novel quantum algorithms.
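Arithmetic intensity is the ratio of floating-point operations to bytes moved to and from memory. The back-of-envelope sketch below is an illustrative model, not the paper's measurement method, and it does not reproduce the reported 96x figure. It assumes double-precision complex amplitudes (16 bytes each), a dense 2x2 gate costing 28 real FLOPs per amplitude pair (4 complex multiplies at 6 FLOPs plus 2 complex adds at 2 FLOPs), and that every gate applied within one pass reuses amplitudes already held in cache. The function name and parameters are hypothetical.

```python
def arithmetic_intensity(gates_per_pass, bytes_per_amplitude=16):
    """Estimated FLOPs per byte of memory traffic for single-qubit gates.

    Assumes each amplitude pair is read once and written once per pass,
    regardless of how many gates are applied while it sits in cache.
    """
    flops_per_pair = 28 * gates_per_pass          # 4 complex mul + 2 complex add per gate
    bytes_per_pair = 2 * 2 * bytes_per_amplitude  # read + write of two amplitudes
    return flops_per_pair / bytes_per_pair

one_gate = arithmetic_intensity(1)    # one gate per sweep of the state vector
blocked = arithmetic_intensity(16)    # 16 gates applied per cached block

print(one_gate, blocked)  # 0.4375 7.0
```

Under this model, applying more gates per pass raises arithmetic intensity linearly, which gives a rough intuition for how block-by-block execution in cache can move a workload from memory-bound toward compute-bound.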

Paper Structure (28 sections, 7 equations, 12 figures, 2 tables, 5 algorithms)

Figures (12)

  • Figure 1: The quantum circuit simulation toolkit.
  • Figure 2: The types of state vector-based simulation.
  • Figure 3: The partition for qubit representation.
  • Figure 4: The workflow of our quantum circuit simulation.
  • Figure 5: The circuit diagram for the simulator.
  • ...and 7 more figures