TensorQC: Towards Scalable Distributed Quantum Computing via Tensor Networks

Wei Tang, Margaret Martonosi

TL;DR

This paper tackles the scalability bottleneck of circuit cutting in distributed quantum computing by introducing TensorQC, a framework that uses tensor-network contraction to perform the classical co-processing more efficiently. By reinterpreting subcircuit outputs as tensors and contracting them along cut edges, TensorQC achieves exponential savings over naive reconstruction, enabling benchmarks up to 200 qubits on a single GPU and reducing the quantum-area burden by over an order of magnitude. The authors couple this approach with heavy-state selection and graph-partitioning heuristics to automatically locate high-quality cuts under hardware constraints, and validate the method across diverse benchmarks (QAOA, AQFT, Supremacy, W-State, GHZ). The result is a practical pathway to scalable distributed quantum computing, making previously intractable large-scale quantum benchmarks feasible with modest QPUs and current classical hardware.

Abstract

A quantum processing unit (QPU) must contain a large number of high-quality qubits to produce accurate results for problems at useful scales. In contrast, most scientific and industry classical computation workloads happen in parallel on distributed systems, which rely on copying data across multiple cores. Unfortunately, copying quantum data is prohibited in theory by the quantum no-cloning theorem. Instead, quantum circuit cutting techniques cut a large quantum circuit into multiple smaller subcircuits, distribute the subcircuits on parallel QPUs, and reconstruct the results with classical computing. Such techniques make distributed hybrid quantum computing (DHQC) a possibility, but they also introduce a classical co-processing cost that is exponential in the number of cuts and easily becomes intractable. This paper presents TensorQC, which leverages classical tensor networks to bring an exponential runtime advantage over state-of-the-art parallelization post-processing techniques. As a result, this paper demonstrates running benchmarks that are otherwise intractable for a standalone QPU and prior circuit cutting techniques. Specifically, this paper runs six realistic benchmarks using currently available QPUs and a single GPU, and reduces the QPU size and quality requirements by more than $10\times$ over purely quantum platforms.

Paper Structure

This paper contains 35 sections, 13 equations, 14 figures, 1 table, 3 algorithms.

Figures (14)

  • Figure 1: Example of cutting a $5$-qubit quantum circuit with one cut to divide it into two smaller subcircuits. (Left) The red cross indicates the cutting point. Subcircuit $1$ is shaded dark and subcircuit $2$ is shaded light. (Right) The dashed arrow between the subcircuits shows the path taken by the qubit wire being cut. The one cut needs to permute through the $\{I,X,Y,Z\}$ bases to reconstruct the unknown cut state. The two subcircuits require no quantum communication and can now be executed independently, in any order, on multiple $3$-qubit QPUs.
  • Figure 2: A hypothetical $4$-subcircuit scenario to demonstrate the two sources of inefficiencies of prior works. Consider the case where $e_{1,2,3}$ are fixed and only $e_4$ permutes.
  • Figure 3: Reconstructing two subcircuits is equivalent to a pairwise tensor contraction.
  • Figure 4: Contracting the tensor network example from Figure 2. The solid boxes show the pair of subcircuits being contracted. The dashed edges inside the boxes are the inner dimensions at every contraction. The edges across the boundary of the boxes are the outer dimensions at every contraction. Tensor network contraction only requires $144$ multiplications in total.
  • Figure 5: Slicing one cut edge to express a tensor network as a summation of smaller tensor networks.
  • ...and 9 more figures
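
The ideas behind the captions of Figures 3 and 5 can be illustrated with a small sketch. It is not the TensorQC implementation: the function names `contract_pair` and `contract_sliced` are hypothetical, and the tensor entries are made-up integers rather than measured subcircuit outputs. The only detail taken from the captions is that a single cut edge ranges over the four $\{I,X,Y,Z\}$ labels. The sketch shows that reconstructing two subcircuits amounts to a contraction over the shared cut index, and that slicing that index turns the contraction into a sum of smaller terms with the same result.

```python
from itertools import product

CUT_DIM = 4  # one cut edge ranges over the {I, X, Y, Z} labels


def contract_pair(a, b):
    """Contract a[i][k] with b[k][j] over the shared cut index k."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def contract_sliced(a, b):
    """Same result, accumulated as one smaller term per fixed cut label."""
    rows, cols = len(a), len(b[0])
    total = [[0] * cols for _ in range(rows)]
    for k in range(len(b)):  # slice: fix the cut edge to a single label
        for i, j in product(range(rows), range(cols)):
            total[i][j] += a[i][k] * b[k][j]
    return total


# Hypothetical subcircuit tensors (made-up integer entries, assumption only).
# sub1 has one output index (size 2) and the cut index (size 4);
# sub2 has the cut index (size 4) and one output index (size 3).
sub1 = [[(3 * i + k) % 5 - 2 for k in range(CUT_DIM)] for i in range(2)]
sub2 = [[(2 * k + j) % 7 - 3 for j in range(3)] for k in range(CUT_DIM)]

full = contract_pair(sub1, sub2)
sliced = contract_sliced(sub1, sub2)
```

Both routes give the same 2-by-3 result here. Slicing does not reduce the total arithmetic in this toy case; its appeal is that each slice is an independent, smaller problem that can be computed separately and summed.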