TANQ-Sim: Tensorcore Accelerated Noisy Quantum System Simulation via QIR on Perlmutter HPC

Ang Li, Chenxu Liu, Samuel Stein, In-Saeng Suh, Muqing Zheng, Meng Wang, Yue Shi, Bo Fang, Martin Roetteler, Travis Humble

TL;DR

TANQ-Sim addresses the need for scalable, noise-aware quantum circuit simulation beyond near-term devices by implementing a full-scale density-matrix simulator optimized for GPUs. It introduces a universal C1/C2 gate framework, maps C2 to double-precision tensorcores, and uses gate fusion and NVSHMEM-based GPU-side communication to achieve strong scaling on multi-GPU HPC systems. Noise is modeled via Kraus operators for depolarizing and thermal relaxation channels and validated against IBMQ/Qiskit-Aer benchmarks, enabling realistic assessment of deep circuits and error mitigation strategies. Integrated with QIR front-ends, TANQ-Sim supports cross-language interoperability and practical deployment on modern HPC resources for quantum algorithm validation and hardware design. Results on the Perlmutter system show competitive performance and speedups over existing simulators, and case studies on teleportation, entanglement distillation, and Ising-model simulation demonstrate its practical use.

Abstract

Although there have been remarkable advances in quantum computing (QC), it remains crucial to simulate quantum programs on classical large-scale parallel computing systems in order to validate quantum algorithms, understand the impact of noise, and develop resilient quantum applications. This is particularly important for bridging the gap between near-term noisy intermediate-scale quantum (NISQ) computing and future fault-tolerant quantum computing (FTQC). Nevertheless, current simulation methods either cannot simulate noise, incur excessive computational cost, or do not scale out effectively. In this paper, we propose TANQ-Sim, a full-scale density-matrix-based simulator designed to simulate practical deep circuits with both coherent and non-coherent noise. To address the significant computational cost of such simulations, we propose a new density-matrix simulation approach that enables TANQ-Sim to leverage the latest double-precision tensorcores (DPTCs) in NVIDIA Ampere and Hopper GPUs. To the best of our knowledge, this is the first application of double-precision tensorcores to non-AI/ML workloads. To optimize performance, we also propose gate fusion techniques specific to density-matrix simulation. For scaling, we rely on the advanced GPU-side communication library NVSHMEM and propose effective optimization methods for enhancing communication efficiency. Evaluations on the NERSC Perlmutter supercomputer demonstrate the functionality, performance, and scalability of the simulator. We also present three case studies to showcase the practical usage of TANQ-Sim: teleportation, entanglement distillation, and Ising simulation. TANQ-Sim will be released on GitHub.
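To make the density-matrix picture with Kraus-operator noise concrete, the following is a minimal single-qubit sketch in NumPy. It is not TANQ-Sim code and does not use tensorcores, gate fusion, or NVSHMEM. The function names (`depolarizing_kraus`, `apply_gate`, `apply_channel`) and the error probability 0.03 are illustrative assumptions, and the depolarizing parametrization shown is one common textbook convention that may differ from the one used in the paper or in Qiskit-Aer.

```python
import numpy as np

# Pauli matrices and the Hadamard gate as 2x2 complex arrays.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)


def depolarizing_kraus(p):
    """Kraus operators for rho -> (1-p) rho + (p/3)(X rho X + Y rho Y + Z rho Z).

    Assumed convention: p is the total probability of a Pauli error.
    """
    return [
        np.sqrt(1 - p) * I2,
        np.sqrt(p / 3) * X,
        np.sqrt(p / 3) * Y,
        np.sqrt(p / 3) * Z,
    ]


def apply_gate(rho, U):
    """Coherent (unitary) evolution of a density matrix: U rho U^dagger."""
    return U @ rho @ U.conj().T


def apply_channel(rho, kraus_ops):
    """Non-coherent noise: sum over Kraus operators of K rho K^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)


# Start in |0><0|, apply an ideal Hadamard, then a depolarizing channel.
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)
rho = apply_gate(rho0, H)
noisy = apply_channel(rho, depolarizing_kraus(0.03))

# Purity Tr(rho^2) equals 1 for a pure state and drops below 1 under noise.
purity = np.real(np.trace(noisy @ noisy))
print("trace :", np.real(np.trace(noisy)))
print("purity:", purity)
```

In this sketch the trace stays at 1 while the purity falls to about 0.96, which is the kind of mixed-state behaviour a state-vector simulator cannot represent directly. An n-qubit density matrix has 4^n entries, which is why the abstract stresses computational cost and multi-GPU scaling.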

Paper Structure

This paper contains 20 sections, 13 equations, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: NISQ device noise illustrated by testing a 3-qubit GHZ-state circuit (top-left), showing the ground truth (top-right), Rigetti Aspen-M3 result (center-left), IBMQ-Lima (center-right), IonQ QPU (bottom-left), and Quantinuum HQS-2 result (bottom-right), using 500 induction shots.
  • Figure 2: TANQ-Sim infrastructure. 1QF refers to the average 1-qubit fidelity. 2QF refers to the average 2-qubit fidelity. RO stands for average read-out fidelity.
  • Figure 3: The integration of TANQ-Sim with QIR.
  • Figure 4: Number of gates with transpilation and gate fusion.
  • Figure 5: Performance comparison with Qiskit-Aer-GPU, DM-Sim, and across GPU architectures.
  • ...and 7 more figures