TensorQC: Towards Scalable Distributed Quantum Computing via Tensor Networks
Wei Tang, Margaret Martonosi
TL;DR
This paper tackles the scalability bottleneck of circuit cutting in distributed quantum computing by introducing TensorQC, a framework that uses tensor-network contraction to perform the classical co-processing more efficiently. By reinterpreting subcircuit outputs as tensors and contracting them along cut edges, TensorQC achieves exponential savings over naive reconstruction, enabling benchmarks up to 200 qubits on a single GPU and reducing the quantum-area burden by over an order of magnitude. The authors couple this approach with heavy-state selection and graph-partitioning heuristics to automatically locate high-quality cuts under hardware constraints, and validate the method across diverse benchmarks (QAOA, AQFT, Supremacy, W-State, GHZ). The result is a practical pathway to scalable distributed quantum computing, making previously intractable large-scale quantum benchmarks feasible with modest QPUs and current classical hardware.
Abstract
A quantum processing unit (QPU) must contain a large number of high-quality qubits to produce accurate results for problems at useful scales. In contrast, most scientific and industrial classical computing workloads run in parallel on distributed systems, which rely on copying data across multiple cores. Unfortunately, copying quantum data is prohibited by the quantum no-cloning theorem. Instead, quantum circuit cutting techniques cut a large quantum circuit into multiple smaller subcircuits, distribute the subcircuits across parallel QPUs, and reconstruct the results with classical computing. Such techniques make distributed hybrid quantum computing (DHQC) possible, but they also introduce a classical co-processing cost that is exponential in the number of cuts and easily becomes intractable. This paper presents TensorQC, which leverages classical tensor networks to bring an exponential runtime advantage over state-of-the-art parallelization post-processing techniques. As a result, this paper demonstrates running benchmarks that are otherwise intractable for a standalone QPU and for prior circuit cutting techniques. Specifically, this paper runs six realistic benchmarks using currently available QPUs and a single GPU, and reduces the QPU size and quality requirements by more than $10\times$ over purely quantum platforms.
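To make the cost argument concrete, the following toy sketch contrasts summing over every combination of cut indices with contracting the same tensors one cut edge at a time. It is an illustration under stated assumptions, not the TensorQC implementation: the subcircuits form a simple chain with one cut between neighbours, each cut edge is assumed to carry an index of dimension 4, each subcircuit contributes a single scalar per index assignment, random numbers stand in for measured subcircuit outputs, and the normalization factor of a real reconstruction is omitted. The names `make_chain`, `naive_reconstruct`, and `contract_reconstruct` are hypothetical.

```python
import itertools

import numpy as np

CUT_DIM = 4  # assumption: four basis terms per cut edge


def make_chain(num_subcircuits, seed=0):
    """Random placeholder tensors for a chain of subcircuits (needs >= 2).

    The end subcircuits touch one cut (a vector); each middle subcircuit
    touches two cuts (a matrix). Real data would come from QPU runs.
    """
    rng = np.random.default_rng(seed)
    first = rng.normal(size=(CUT_DIM,))
    middle = [rng.normal(size=(CUT_DIM, CUT_DIM))
              for _ in range(num_subcircuits - 2)]
    last = rng.normal(size=(CUT_DIM,))
    return [first, *middle, last]


def naive_reconstruct(tensors):
    """Enumerate all CUT_DIM ** num_cuts index assignments and sum the products."""
    num_cuts = len(tensors) - 1
    total = 0.0
    for idx in itertools.product(range(CUT_DIM), repeat=num_cuts):
        term = tensors[0][idx[0]]
        for j, t in enumerate(tensors[1:-1]):
            term *= t[idx[j], idx[j + 1]]
        term *= tensors[-1][idx[-1]]
        total += term
    return total


def contract_reconstruct(tensors):
    """Contract along one cut edge at a time (a vector-matrix chain)."""
    acc = tensors[0]
    for t in tensors[1:]:
        acc = acc @ t  # sums over the shared cut index
    return float(acc)


tensors = make_chain(num_subcircuits=6)
print(np.isclose(naive_reconstruct(tensors), contract_reconstruct(tensors)))
```

In this toy chain the naive sum visits 4^k terms for k cuts, while the pairwise contraction does a fixed amount of work per cut. General cut circuits are not chains, so the achievable savings depend on the network structure and the contraction order chosen.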
