
Harnessing CUDA-Q's MPS for Tensor Network Simulations of Large-Scale Quantum Circuits

Gabin Schieffer, Stefano Markidis, Ivy Peng

TL;DR

This work evaluates CUDA-Q's tensor-network backends, focusing on the Matrix Product State (MPS) representation, to enable large-qubit quantum circuit simulations on a single GPU. Comparing state-vector (SV), exact tensor-network (TN), and MPS backends on a Grace Hopper system across five representative circuits, the study shows that SV remains fastest when it fits in memory, but TN and especially MPS enable simulations beyond SV memory limits, reaching roughly 60–90 qubits depending on circuit structure. Profiling reveals that the SVD steps in the MPS approach yield substantial reductions in contraction time, although GPU utilization is uneven and Tensor Cores are underused for these workloads. The work also investigates the impact of MPS approximation through the bond-dimension limit $\chi_{\max}$, demonstrating that accuracy can be preserved for key outcomes at moderate $\chi$ values, with explicit validation on a 10-qubit QAOA circuit. Overall, the results highlight the practical potential of MPS-based quantum circuit simulation on commodity GPUs and outline future directions for deeper integration with broader quantum software stacks.
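To make the role of $\chi_{\max}$ concrete, the following minimal NumPy sketch splits a dense state vector into an MPS by successive SVDs, keeping at most `chi_max` singular values at each bond and accumulating the squared weight of the discarded ones. It is an illustration under our own assumptions, not CUDA-Q's GPU implementation: the function names `state_to_mps`, `mps_to_state`, and `bond_dims` are hypothetical, qubit 0 is taken as the most significant index, and a practical MPS simulator would apply gates directly to the MPS rather than decomposing a dense state, which would defeat the memory savings.

```python
import numpy as np

def state_to_mps(psi, n, chi_max):
    """Decompose an n-qubit state vector into MPS tensors of shape
    (left bond, 2, right bond), truncating every bond to at most chi_max.
    Hypothetical helper for illustration only."""
    tensors = []
    discarded = 0.0  # sum of squared singular values dropped by truncation
    rest = np.asarray(psi, dtype=complex).reshape(1, -1)
    for _ in range(n - 1):
        left = rest.shape[0]
        # Rows: (left bond, current qubit); columns: all remaining qubits.
        mat = rest.reshape(left * 2, -1)
        u, s, vh = np.linalg.svd(mat, full_matrices=False)
        # Keep numerically nonzero singular values, capped at chi_max.
        keep = max(1, min(chi_max, int(np.sum(s > 1e-12))))
        discarded += float(np.sum(s[keep:] ** 2))
        tensors.append(u[:, :keep].reshape(left, 2, keep))
        # Absorb the retained singular values into the remaining block.
        rest = s[:keep, None] * vh[:keep, :]
    tensors.append(rest.reshape(rest.shape[0], 2, 1))
    return tensors, discarded

def mps_to_state(tensors):
    """Contract the MPS chain back into a dense state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)

def bond_dims(tensors):
    """Bond dimensions between neighbouring qubits."""
    return [t.shape[2] for t in tensors[:-1]]

if __name__ == "__main__":
    n = 6
    ghz = np.zeros(2 ** n, dtype=complex)
    ghz[0] = ghz[-1] = 1 / np.sqrt(2)
    mps, err = state_to_mps(ghz, n, chi_max=8)
    print(bond_dims(mps))                          # [2, 2, 2, 2, 2]
    print(np.allclose(mps_to_state(mps), ghz))     # True: no truncation
    _, err1 = state_to_mps(ghz, n, chi_max=1)
    print(err1)                                    # about 0.5
```

The GHZ state has Schmidt rank 2 across every cut, so $\chi = 2$ represents it exactly, while forcing $\chi_{\max} = 1$ throws away half of its weight. A generic (highly entangled) state instead needs bond dimensions up to $2^{n/2}$, which is why capping $\chi_{\max}$ turns MPS into an approximate method.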

Abstract

Quantum computer simulators are an indispensable tool for prototyping quantum algorithms and verifying the functioning of existing quantum computer hardware. The largest current quantum computers feature more than one thousand qubits, which challenges their classical simulators. State-vector quantum simulators are limited by the exponential growth of the state (2^n amplitudes for n qubits), making simulations of more than fifty qubits practically infeasible. A more appealing approach is to adopt tensor networks, whose memory requirements depend fundamentally on the level of entanglement in the quantum circuit and which allow simulating the largest current quantum computers. This work investigates and evaluates the CUDA-Q tensor network simulators on an NVIDIA Grace Hopper system, particularly the Matrix Product State (MPS) formulation. We compare their performance against the CUDA-Q state vector implementation and validate the correctness of MPS simulations. Our results highlight that tensor network-based methods provide a significant opportunity to simulate large-qubit circuits, albeit approximately. We also show that current GPU-accelerated MPS simulations cannot use the GPU efficiently.
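The exponential memory wall of state-vector simulation can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes 8-byte complex64 amplitudes by default (16 bytes for complex128) and, for the MPS, an upper bound of $n$ tensors of shape $\chi \times 2 \times \chi$, i.e. roughly $2n\chi^2$ amplitudes versus $2^n$. It ignores workspace, gate buffers, and the smaller boundary tensors, so it illustrates the scaling rather than modeling CUDA-Q's actual memory use; the helper names are hypothetical.

```python
GIB = 2 ** 30
MIB = 2 ** 20

def sv_bytes(n_qubits, bytes_per_amp=8):
    """Dense state vector: 2**n complex amplitudes (complex64 by default)."""
    return (2 ** n_qubits) * bytes_per_amp

def mps_bytes(n_qubits, chi, bytes_per_amp=8):
    """Upper bound for an MPS: n tensors of shape (chi, 2, chi)."""
    return n_qubits * chi * 2 * chi * bytes_per_amp

if __name__ == "__main__":
    for n in (30, 34, 40, 50):
        print(f"SV  n={n}: {sv_bytes(n) / GIB:,.0f} GiB (complex64)")
    print(f"MPS n=100, chi=64: {mps_bytes(100, 64) / MIB:.2f} MiB")
```

For example, 34 qubits already need 128 GiB for the amplitudes alone in complex64, while the MPS bound for 100 qubits at $\chi = 64$ is 6.25 MiB. The catch is that $\chi$ must grow with entanglement, up to $2^{n/2}$ for an exact representation of a generic state.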

Paper Structure

This paper contains 21 sections, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: A quantum circuit creating an entangled Bell state (left), and its equivalent representation as a tensor network (right). Adapted from biamonte_tensor_2017.
  • Figure 2: Tensor network representation of a matrix product state (MPS). Each of the $n$ qubits is represented by a tensor $s_i$.
  • Figure 3: The number of gates and the entanglement ratios of the five evaluated circuits, for $n$ qubits. Note: the gate count of a QFT circuit depends on the input problem; therefore, the entanglement ratio presented for the QFT circuit is a lower bound.
  • Figure 4: Runtime for an increasing number of qubits for each circuit, simulated using state vector (SV), approximate matrix product state (MPS), and exact tensor network (TN).
  • Figure 5: Runtime measurements for the four circuits with qubit count $n\geq35$, along with the fitted model (dashed line). Results are split by how the runtime scales with the number of qubits $n$: either as a power law ($t=\alpha n^\beta$, top) or linearly ($t=an+b$, bottom). The projected runtime for state vector simulation is also plotted, under the unrealistic assumption of unlimited memory on a single GPU.
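Figure 5 classifies runtime scaling as either a power law $t=\alpha n^\beta$ or a linear law $t=an+b$. The sketch below shows one common way to fit both forms with NumPy least squares: the power law becomes a straight line in log-log space, $\log t = \log\alpha + \beta\log n$. Choosing the form with the smaller sum of squared residuals is our assumption, not necessarily the authors' procedure, and the data are synthetic values generated from known parameters to check the fits, not the paper's measurements. The function names are hypothetical.

```python
import numpy as np

def fit_power(n, t):
    """Fit t = alpha * n**beta as a line in log-log space."""
    beta, log_alpha = np.polyfit(np.log(n), np.log(t), 1)
    return float(np.exp(log_alpha)), float(beta)

def fit_linear(n, t):
    """Fit t = a * n + b by ordinary least squares."""
    a, b = np.polyfit(n, t, 1)
    return float(a), float(b)

def select_model(n, t):
    """Return the model name with the smaller squared error in t-space."""
    n, t = np.asarray(n, dtype=float), np.asarray(t, dtype=float)
    alpha, beta = fit_power(n, t)
    a, b = fit_linear(n, t)
    sse_power = float(np.sum((alpha * n ** beta - t) ** 2))
    sse_linear = float(np.sum((a * n + b - t) ** 2))
    return "power" if sse_power < sse_linear else "linear"

if __name__ == "__main__":
    n = np.arange(35, 61, 5, dtype=float)   # synthetic qubit counts
    t_pow = 2e-4 * n ** 2.5                 # synthetic power-law runtimes
    t_lin = 0.3 * n + 1.5                   # synthetic linear runtimes
    print(fit_power(n, t_pow), select_model(n, t_pow))
    print(fit_linear(n, t_lin), select_model(n, t_lin))
    # Extrapolate the fitted power law beyond the measured range.
    alpha, beta = fit_power(n, t_pow)
    print(alpha * 90.0 ** beta)
```

The last line mirrors how a fitted model can project runtimes outside the measured range, as done for the state vector curve in Figure 5; such extrapolations inherit any error in the fitted exponent.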