GPU-Accelerated Quantum Simulation of Stabilizer Circuits

Muhammad Osama, Dimitrios Thanos, Alfons Laarman

Abstract

We introduce new parallel algorithms for efficiently simulating stabilizer (Clifford) circuits on GPUs, with a focus on data-parallel tableau evolution and scalable handling of projective measurements. Our approach reformulates key bottlenecks in stabilizer simulation, such as Gaussian elimination and measurement updates, into GPU-tailored primitives that eliminate sequential dependencies and maximize memory coalescing. We implement these techniques in QuaSARQ, a GPU-accelerated stabilizer simulator designed for large qubit counts and many-shot sampling. Across a broad benchmark suite reaching 180,000 qubits and depth 1,000 (roughly 130M gates), QuaSARQ shows substantial runtime improvements, with up to 105× speedup and over 80% energy reduction on demanding instances. Moreover, QuaSARQ consistently outperforms Stim, a state-of-the-art CPU-optimized stabilizer simulator, as well as Qiskit-Aer (CPU/GPU), Qibo, Cirq, and PennyLane. Finally, QuaSARQ exhibits a significant advantage in many-shot sampling on large workloads. These results demonstrate that our parallel algorithms can significantly advance the scalability of stabilizer-circuit simulation, particularly for workloads involving extensive measurements and sampling.
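To make "data-parallel tableau evolution" concrete, here is a minimal sketch of the textbook Aaronson-Gottesman update rules for H, S and CNOT. It assumes a hypothetical column-packed layout (not necessarily the paper's word-aligned format): for each qubit q, `x[q]` and `z[q]` are integers used as bitsets whose bit i is the X or Z component of generator i, and bit i of `r` is that generator's sign. Because of this packing, each gate updates every generator with a handful of word-level AND/XOR operations, which is the kind of uniform, branch-free work that maps well to GPU threads. Only the n stabilizer rows are kept; a full Aaronson-Gottesman tableau also carries n destabilizer rows, omitted here for brevity. The class name `StabilizerTableau` is illustrative.

```python
class StabilizerTableau:
    """Stabilizer generators of an n-qubit state, column-packed as bitsets.

    Hypothetical layout: bit i of x[q] / z[q] is the X / Z component of
    generator i on qubit q; bit i of r is 1 when generator i has a minus sign.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.mask = (1 << n) - 1  # one bit per generator
        # |0...0> is stabilized by Z_0, ..., Z_{n-1}: generator q is Z on qubit q.
        self.x = [0] * n
        self.z = [1 << q for q in range(n)]
        self.r = 0

    def h(self, a: int) -> None:
        # Hadamard: r ^= x_a z_a, then swap the X and Z columns of qubit a.
        self.r ^= self.x[a] & self.z[a]
        self.x[a], self.z[a] = self.z[a], self.x[a]

    def s(self, a: int) -> None:
        # Phase gate: r ^= x_a z_a, then z_a ^= x_a.
        self.r ^= self.x[a] & self.z[a]
        self.z[a] ^= self.x[a]

    def cx(self, a: int, b: int) -> None:
        # CNOT (control a, target b): r ^= x_a z_b (x_b XOR z_a XOR 1).
        # XOR with self.mask applies the "XOR 1" to every generator at once.
        self.r ^= self.x[a] & self.z[b] & (self.x[b] ^ self.z[a] ^ self.mask)
        self.x[b] ^= self.x[a]
        self.z[a] ^= self.z[b]

    def generator(self, i: int) -> str:
        # Render generator i as a signed Pauli string such as "+XX".
        sign = "-" if (self.r >> i) & 1 else "+"
        paulis = "IXZY"  # index = x_bit | (z_bit << 1)
        body = "".join(
            paulis[((self.x[q] >> i) & 1) | (((self.z[q] >> i) & 1) << 1)]
            for q in range(self.n)
        )
        return sign + body
```

For example, applying `h(0)` and then `cx(0, 1)` to a fresh two-qubit tableau yields the Bell-state generators `+XX` and `+ZZ`, and applying S twice to a `|+>` state turns `+X` into `+Y` and then `-X`.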

Paper Structure (51 sections, 20 equations, 9 figures, 1 table, 9 algorithms)

Figures (9)

  • Figure 1: GPU architecture [hitchHikemultiGPU].
  • Figure 2: Input circuit on the left. Scheduled circuit on the right.
  • Figure 3: Logical and physical formats of a word-aligned tableau.
  • Figure 4: Running example of the buildTableau algorithm on a 128-qubit system.
  • Figure 5: Compacting pivots on a 6-qubit tableau.
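The abstract names Gaussian elimination as a bottleneck, and Figure 5 concerns pivots in a tableau. As background only, here is a sequential sketch of Gauss-Jordan elimination over GF(2) with each row packed into an integer bitset; it is the textbook method, not the paper's GPU reformulation, which is not given in this excerpt. The function name `gf2_row_reduce` is hypothetical. The outer loop over pivot columns is inherently serial, while the inner clearing loop only reads the fixed pivot row, so its row updates are independent of each other and are the natural data-parallel part.

```python
from typing import List, Tuple


def gf2_row_reduce(rows: List[int], ncols: int) -> Tuple[List[int], List[int]]:
    """Reduce bit-packed GF(2) rows to reduced row echelon form.

    Bit c of each row is the entry in column c. Returns the reduced rows
    (a new list) and the pivot column of each nonzero row, in order.
    """
    rows = list(rows)
    pivots: List[int] = []
    rank = 0
    for col in range(ncols):
        bit = 1 << col
        # Serial step: find a row at or below `rank` with a 1 in this column.
        pivot_row = next((i for i in range(rank, len(rows)) if rows[i] & bit), None)
        if pivot_row is None:
            continue
        rows[rank], rows[pivot_row] = rows[pivot_row], rows[rank]
        # Independent step: clear this column in every other row.
        # Each XOR adds the pivot row to one row in a single word operation.
        for i in range(len(rows)):
            if i != rank and rows[i] & bit:
                rows[i] ^= rows[rank]
        pivots.append(col)
        rank += 1
        if rank == len(rows):
            break
    return rows, pivots
```

For instance, the rows `0b011`, `0b110`, `0b101` have rank 2 (the third is the XOR of the first two), and the sketch returns the reduced rows `0b101`, `0b110`, `0` with pivot columns `[0, 1]`.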

Theorems & Definitions (5)

  • Definition 1: Weak Simulation
  • Example 1
  • Definition 2: Gate Dependency Relation
  • Definition 3: Parallel Gates
  • Definition 4: Maximal Window
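Figure 2 and Definitions 2 to 4 concern scheduling gates into groups that can be applied together; their exact statements are not part of this excerpt. Assuming the common convention that two gates are dependent when they share a qubit and parallel when they act on disjoint qubits, a standard as-soon-as-possible layering can be sketched as follows. It is not necessarily the paper's maximal-window construction, and the names `Gate` and `schedule_layers` are hypothetical. Since Clifford gates on disjoint qubits commute, every gate in a layer can update the tableau concurrently.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical gate encoding: (name, qubits), e.g. ("cx", (0, 1)).
Gate = Tuple[str, Tuple[int, ...]]


def schedule_layers(gates: Sequence[Gate]) -> List[List[Gate]]:
    """Greedily pack gates into layers of mutually qubit-disjoint gates.

    Each gate goes into the earliest layer after every earlier gate that
    touches one of its qubits, so per-qubit program order is preserved and
    no two gates in the same layer share a qubit.
    """
    next_free: Dict[int, int] = {}  # qubit -> first layer where it is idle
    layers: List[List[Gate]] = []
    for gate in gates:
        _, qubits = gate
        layer = max((next_free.get(q, 0) for q in qubits), default=0)
        if layer == len(layers):
            layers.append([])
        layers[layer].append(gate)
        for q in qubits:
            next_free[q] = layer + 1
    return layers
```

For the input `h 0; cx 0 1; h 2; cx 2 3; cx 1 2`, the sketch produces three layers: both Hadamards, then both `cx 0 1` and `cx 2 3`, then `cx 1 2`.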