TANQ-Sim: Tensorcore Accelerated Noisy Quantum System Simulation via QIR on Perlmutter HPC
Ang Li, Chenxu Liu, Samuel Stein, In-Saeng Suh, Muqing Zheng, Meng Wang, Yue Shi, Bo Fang, Martin Roetteler, Travis Humble
TL;DR
TANQ-Sim addresses the need for scalable, noise-aware quantum circuit simulation by implementing a full-scale density-matrix simulator optimized for GPUs. It introduces a universal C1/C2 gate framework, maps C2 to double-precision tensorcores, and uses gate fusion together with NVSHMEM-based GPU-side communication to achieve strong scaling on multi-GPU HPC systems. Noise is modeled via Kraus operators for depolarizing and thermal relaxation channels and validated against IBMQ/Qiskit-Aer benchmarks, enabling realistic assessment of deep circuits and error mitigation strategies. Integration with QIR front-ends gives TANQ-Sim cross-language interoperability and makes it practical to deploy on modern HPC resources for quantum algorithm validation and hardware design. Results on the Perlmutter system show competitive performance and notable speedups over existing simulators, and case studies on teleportation, distillation, and Ising-model simulation illustrate its practical use.
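To make the Kraus-operator noise model concrete, the following is a minimal sketch of a single-qubit depolarizing channel applied to a density matrix as ρ → Σ_i K_i ρ K_i†. It is not TANQ-Sim code: the parameterization (K_0 = sqrt(1-p) I and K_1..K_3 = sqrt(p/3) times X, Y, Z) is one common convention and is assumed here, and the helper names `depolarizing_kraus` and `apply_channel` are hypothetical.

```python
import math

# 2x2 complex matrices as nested lists, so no external library is needed.
I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]
X = [[0j, 1 + 0j], [1 + 0j, 0j]]
Y = [[0j, -1j], [1j, 0j]]
Z = [[1 + 0j, 0j], [0j, -1 + 0j]]


def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def depolarizing_kraus(p):
    """Kraus operators of a depolarizing channel (assumed convention)."""
    k0 = [[math.sqrt(1 - p) * v for v in row] for row in I2]
    rest = [[[math.sqrt(p / 3) * v for v in row] for row in pauli]
            for pauli in (X, Y, Z)]
    return [k0] + rest


def apply_channel(rho, kraus_ops):
    """Return sum_i K_i rho K_i^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for k in kraus_ops:
        term = matmul(matmul(k, rho), dagger(k))
        for i in range(n):
            for j in range(n):
                out[i][j] += term[i][j]
    return out


rho0 = [[1 + 0j, 0j], [0j, 0j]]  # pure state |0><0|
rho_noisy = apply_channel(rho0, depolarizing_kraus(0.3))
trace = rho_noisy[0][0] + rho_noisy[1][1]
```

With p = 0.3 the population of |0> drops to 1 - 2p/3 = 0.8 while the trace stays at 1, which is the mixing behavior a state-vector simulator cannot represent directly and a density-matrix simulator can.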
Abstract
Despite remarkable advances in quantum computing (QC), it remains crucial to simulate quantum programs on large-scale classical parallel computing systems in order to validate quantum algorithms, understand the impact of noise, and develop resilient quantum applications. This is particularly important for bridging the gap between near-term noisy intermediate-scale quantum (NISQ) computing and future fault-tolerant quantum computing (FTQC). However, current simulation methods either cannot simulate noise, incur excessive computational cost, or do not scale out effectively. In this paper, we propose TANQ-Sim, a full-scale density-matrix-based simulator designed to simulate practical deep circuits with both coherent and non-coherent noise. To address the significant computational cost of such simulations, we propose a new density-matrix simulation approach that enables TANQ-Sim to leverage the latest double-precision tensorcores (DPTCs) in NVIDIA Ampere and Hopper GPUs. To the best of our knowledge, this is the first application of double-precision tensorcores to non-AI/ML workloads. To optimize performance, we also propose gate fusion techniques specific to density-matrix simulation. For scaling, we rely on the GPU-side communication library NVSHMEM and propose optimization methods that improve communication efficiency. Evaluations on the NERSC Perlmutter supercomputer demonstrate the functionality, performance, and scalability of the simulator. We also present three case studies that showcase the practical use of TANQ-Sim: teleportation, entanglement distillation, and Ising simulation. TANQ-Sim will be released on GitHub.
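The general idea behind gate fusion can be illustrated with a small sketch: consecutive gates acting on the same qubit are multiplied into one matrix, so the density matrix is updated once via ρ → UρU† instead of once per gate. This shows only the generic technique, not the specific fusion rules proposed for TANQ-Sim, which this section does not describe; the helper names `fuse` and `apply_gate` are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def apply_gate(rho, u):
    """Unitary update of a density matrix: U rho U^dagger."""
    return matmul(matmul(u, rho), dagger(u))


def fuse(gates):
    """Fuse gates given in application order into one matrix G_n ... G_1."""
    n = len(gates[0])
    u = [[(1 + 0j) if i == j else 0j for j in range(n)] for i in range(n)]
    for g in gates:
        u = matmul(g, u)  # later gates multiply from the left
    return u


s = 1 / math.sqrt(2)
H = [[s + 0j, s + 0j], [s + 0j, -s + 0j]]
Z = [[1 + 0j, 0j], [0j, -1 + 0j]]

rho0 = [[1 + 0j, 0j], [0j, 0j]]  # pure state |0><0|
circuit = [H, Z, H]              # applied left to right; equals X overall

# Unfused: three separate updates of the density matrix.
rho_seq = rho0
for gate in circuit:
    rho_seq = apply_gate(rho_seq, gate)

# Fused: one small matrix product chain, then a single update.
rho_fused = apply_gate(rho0, fuse(circuit))
```

Both paths produce the same state (here |1><1|, since HZH = X), but the fused path touches the density matrix only once. Because the density matrix of an n-qubit system has 4^n entries, reducing the number of passes over it is where the savings would come from.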
