Versatile Cross-platform Compilation Toolchain for Schrödinger-style Quantum Circuit Simulation

Yuncheng Lu; Shuang Liang; Hongxiang Fan; Ce Guo; Wayne Luk; Paul H. J. Kelly

TL;DR

This paper tackles the challenge of efficiently simulating large quantum circuits on classical hardware, addressing the lack of versatile cross-platform backends and sparsity-aware optimizations in existing Schrödinger-style simulators. It introduces CAST, a cross-platform toolchain that combines a sparsity-aware agglomerative gate fusion strategy with dynamic kernel generation: kernels are emitted as LLVM IR for vectorized CPU execution or as PTX for Nvidia GPUs, and can be compiled either just-in-time (JIT) or statically. A CircuitTile data structure enables safe fusion across consecutive and commuting gates, and fusion decisions are guided by a cost model that aims to minimize operation counts. Evaluation on CPU and GPU backends shows substantial speedups over leading simulators (up to 8.03× over Qiskit on 32-qubit CPU benchmarks and up to 39.3× over the Nvidia cuQuantum backend on 30-qubit GPU benchmarks) with modest compilation overhead, suggesting strong practical value for scalable quantum circuit verification and research.
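A Schrödinger-style simulator keeps all 2^n complex amplitudes of an n-qubit state in memory and updates them for every gate; this per-gate update is what CAST's generated kernels compute. The NumPy sketch below is a plain reference version of that update for a single-qubit gate, not CAST's generated code. The function name `apply_1q_gate` and the little-endian qubit ordering (qubit 0 is the least significant bit of the amplitude index) are assumptions made for illustration.

```python
import numpy as np


def apply_1q_gate(state: np.ndarray, gate: np.ndarray, target: int) -> np.ndarray:
    """Apply a 2x2 gate to qubit `target` of a complex state vector of length 2**n.

    Assumes little-endian ordering: bit `target` of an amplitude index holds the
    value of qubit `target`. Returns a new state vector.
    """
    stride = 1 << target
    # Index = high * (2 * stride) + bit * stride + low, so this view pairs each
    # amplitude whose target bit is 0 (view[:, 0, :]) with its partner whose
    # target bit is 1 (view[:, 1, :]).
    view = state.reshape(-1, 2, stride)
    a0 = view[:, 0, :]
    a1 = view[:, 1, :]
    out = np.empty_like(view)
    # Every pair (a0, a1) is multiplied by the 2x2 gate matrix.
    out[:, 0, :] = gate[0, 0] * a0 + gate[0, 1] * a1
    out[:, 1, :] = gate[1, 0] * a0 + gate[1, 1] * a1
    return out.reshape(-1)


H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)

state = np.zeros(2**3, dtype=complex)
state[0] = 1.0  # the 3-qubit basis state |000>
print(np.round(apply_1q_gate(state, H, 2).real, 3))
```

Applying H to qubit 2 of |000⟩ places amplitude 1/√2 at indices 0 and 4 (binary 000 and 100). The sketch uses a reshaped view instead of an explicit loop over amplitude pairs; the arithmetic per pair is the same either way.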

Abstract

Existing quantum hardware has limited availability and reliability, yet demand for exploring and verifying quantum algorithms keeps growing. Efficient, high-performance classical simulators are critical to meeting this demand. However, because classical hardware platforms differ widely in their characteristics, implementing hardware-specific optimizations for each of them is challenging. To address this, we propose CAST (Cross-platform Adaptive Schrödinger-style Simulation Toolchain), a novel compilation toolchain with cross-platform (CPU and Nvidia GPU) optimization and high-performance backend support. CAST exploits a novel sparsity-aware gate fusion algorithm that automatically selects the best fusion strategy and backend configuration for the target hardware platform. CAST also aims to offer versatile, high-performance backends for different hardware platforms. To this end, CAST provides LLVM IR-based vectorization for various CPU architectures and instruction sets, as well as a PTX-based code generator for Nvidia GPUs. We benchmark CAST against IBM Qiskit, Google QSimCirq, the Nvidia cuQuantum backend, and other high-performance simulators. On various 32-qubit CPU benchmarks, CAST achieves up to an 8.03× speedup over Qiskit. On various 30-qubit GPU benchmarks, CAST achieves up to a 39.3× speedup over the Nvidia cuQuantum backend.
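The abstract does not spell out CAST's cost model, but the idea behind sparsity-aware fusion can be illustrated with a simplified operation count. The sketch below assumes, purely for illustration, that applying a k-qubit gate to an n-qubit state costs one complex multiply-add per nonzero matrix entry per group of 2^k amplitudes, i.e. nnz × 2^(n−k). The helpers `op_count` and `fusion_gain` are hypothetical; this is a toy model, not CAST's actual cost model or fusion algorithm.

```python
import numpy as np


def op_count(gate: np.ndarray, n_qubits: int, tol: float = 1e-12) -> int:
    """Toy cost (assumption): nonzero entries of a k-qubit gate times 2**(n - k)."""
    k = gate.shape[0].bit_length() - 1  # gate is a 2**k x 2**k matrix
    nnz = int(np.count_nonzero(np.abs(gate) > tol))
    return nnz * 2 ** (n_qubits - k)


def fusion_gain(parts: list[np.ndarray], fused: np.ndarray, n_qubits: int) -> int:
    """Operations saved by applying `fused` once instead of each gate in `parts`."""
    separate = sum(op_count(g, n_qubits) for g in parts)
    return separate - op_count(fused, n_qubits)


H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
I2 = np.eye(2)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
HH = np.kron(H, H)  # a dense 2-qubit gate with 16 nonzeros

n = 30
# Two diagonal gates on the same qubit pair: the product stays diagonal (4 nonzeros).
diag_pair = np.kron(T, I2)
print(fusion_gain([diag_pair, CZ], CZ @ diag_pair, n))  # prints 1073741824
# Two dense gates on disjoint qubit pairs: the fused 4-qubit gate has 256 nonzeros.
print(fusion_gain([HH, HH], np.kron(HH, HH), n))  # prints -8589934592
```

Under this toy count, fusing the two diagonal gates saves work because the product stays sparse, whereas merging two dense 2-qubit gates on disjoint qubits into one dense 4-qubit gate doubles the operation count. A fusion pass driven by such a count would accept the first merge and reject the second; real trade-offs also involve memory traffic and the target backend, which this sketch ignores.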
