Circuit Partitioning and Full Circuit Execution: A Comparative Study of GPU-Based Quantum Circuit Simulation
Kartikey Sarode, Daniel E. Huang, E. Wes Bethel
TL;DR
The paper addresses the challenge of simulating quantum circuits that are too large for current NISQ devices by comparing circuit-splitting via CutQC against full-circuit execution with distributed memory on GPUs. It combines CutQC’s circuit-cutting framework with GPU-accelerated statevector simulation (Qiskit Aer-GPU) to reconstruct the original circuit's output probabilities, and contrasts this with distributed full-circuit statevector simulation. The main finding is that full-circuit execution is faster on a single node. Circuit-splitting incurs post-processing costs that grow exponentially ($4^{K}$ for $K$ cuts and $9^{n}$ for $n$ cut CNOTs), but it can reduce memory use by producing subcircuits about 30–40% narrower than the original circuit, which makes it potentially advantageous under resource constraints. The work clarifies the runtime-memory tradeoffs between the two approaches and suggests hybrid strategies for large-scale quantum circuit simulation, with implications for scalable algorithm validation on classical hardware.
Abstract
Executing large quantum circuits is not feasible on currently available NISQ (noisy intermediate-scale quantum) devices. The high cost of using real quantum devices makes it even harder to research and develop quantum algorithms. As a result, classical simulation is usually the preferred method for researching and validating large-scale quantum algorithms. However, these simulations require a huge amount of resources, because each additional qubit doubles the size of the state space, so the required memory grows exponentially with qubit count. Distributed Quantum Computing (DQC) is a promising alternative that reduces the resources required for simulating large quantum algorithms at the cost of increased runtime. This study presents a comparative analysis of two simulation methods: circuit-splitting and full-circuit execution using distributed memory, each with a different type of overhead. The first method, using CutQC, cuts the circuit into smaller subcircuits, which allows a large quantum circuit to be simulated on smaller machines. The second method, using Qiskit-Aer-GPU, distributes the computational space across a distributed memory system to simulate the entire quantum circuit. Results indicate that full-circuit execution is faster than circuit-splitting for simulations performed on a single node. However, circuit-splitting shows promising results in specific scenarios as the number of qubits is scaled.
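
The runtime-memory tradeoff described above can be made concrete with a rough cost model. The sketch below is not taken from the paper's code: the function names are illustrative, it assumes double-precision amplitudes (16 bytes each) stored as a dense statevector, and it treats the $4^{K}$ figure from the TL;DR as a count of reconstruction terms. The 35% width reduction is simply the midpoint of the reported 30–40% range, and the 34-qubit circuit is a made-up example.

```python
# Back-of-the-envelope cost model. All names are illustrative assumptions,
# not part of CutQC or Qiskit Aer.

BYTES_PER_AMPLITUDE = 16  # one complex128 amplitude (two 64-bit floats)


def statevector_bytes(num_qubits: int) -> int:
    """Memory needed to hold a dense statevector of num_qubits qubits."""
    return BYTES_PER_AMPLITUDE * 2 ** num_qubits


def reconstruction_terms(num_cuts: int) -> int:
    """Post-processing terms for K cuts, assuming 4^K scaling."""
    return 4 ** num_cuts


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2 ** 30


if __name__ == "__main__":
    full_width = 34  # hypothetical circuit width
    # Assume the widest subcircuit is about 35% narrower than the original.
    sub_width = round(full_width * 0.65)

    print(f"full circuit ({full_width} qubits): "
          f"{to_gib(statevector_bytes(full_width)):.1f} GiB")
    print(f"subcircuit   ({sub_width} qubits): "
          f"{to_gib(statevector_bytes(sub_width)):.4f} GiB")
    for num_cuts in (2, 4, 8):
        print(f"K={num_cuts}: {reconstruction_terms(num_cuts)} reconstruction terms")
```

Under these assumptions the memory saving from narrower subcircuits is exponential in the number of qubits removed, while the post-processing work is exponential in the number of cuts, which is why the better method depends on how many cuts a given circuit needs.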
