Parallel iQCC Enables 200 Qubit Scale Quantum Chemistry on Accelerated Computing Platforms Surpassing Classical Benchmarks in Ruthenium Catalysts

Seyyed Mehdi Hosseini Jenab, Brandon Henderson, Scott N. Genin

Abstract

We introduce a parallel, GPU-accelerated implementation of the iterative qubit coupled cluster (iQCC) method that overcomes the exponential growth of the transformed Hamiltonian -- the principal bottleneck for classical emulation of quantum chemistry circuits. By distributing Hamiltonian terms across compute nodes via bit-wise partitioning and offloading Pauli contractions to GPUs, we achieve speedups exceeding two orders of magnitude over the serial CPU approach. Crucially, iQCC confines the variational evolution to a classically simulable operator subspace by selecting entanglers exclusively from the Direct Interaction Space, which guarantees non-vanishing energy gradients at every iteration and thereby naturally avoids the barren-plateau phenomenon that renders highly expressive quantum circuits untrainable. Leveraging these algorithmic and hardware advances, we simulate electronic-structure Hamiltonians for industrially relevant ruthenium catalysts in the 100--124 qubit regime, completing full ground-state calculations on NVIDIA GPUs in 1.2 to 45 hours and surpassing the accuracy of the density matrix renormalization group (DMRG) method. These results effectively de-quantize a significant portion of the NISQ roadmap: quantum advantage for chemistry is often assumed to emerge beyond ${\sim}50$ qubits, yet our work demonstrates that this frontier lies significantly further out -- potentially past 200 qubits -- reshaping expectations for where genuine quantum advantage may first appear.
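The abstract does not spell out the bit-wise partitioning scheme, so the following is only a minimal sketch of one way such a scheme could look. It assumes each Pauli string is stored as two integer bitmasks (`x_mask` for X/Y positions, `z_mask` for Z/Y positions) and that a term's owner rank is read from the low bits of its `x_mask`. The function names, the choice of mask, and the power-of-two rank count are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def pauli_to_masks(label):
    """Encode a Pauli string such as 'XIZY' as (x_mask, z_mask) integers.

    Character k of the label acts on qubit k. Bit k of x_mask is set for
    X or Y, and bit k of z_mask is set for Z or Y.
    """
    x = z = 0
    for k, p in enumerate(label):
        if p in "XY":
            x |= 1 << k
        if p in "ZY":
            z |= 1 << k
    return x, z


def partition_terms(terms, n_ranks):
    """Assign each term to a rank from the low bits of its x_mask.

    Hypothetical scheme: n_ranks must be a power of two so that the rank
    is a plain bit mask of x_mask. Terms with the same flip pattern
    therefore always land on the same rank.
    """
    if n_ranks <= 0 or n_ranks & (n_ranks - 1):
        raise ValueError("n_ranks must be a power of two")
    buckets = defaultdict(dict)
    for label, coeff in terms.items():
        x, z = pauli_to_masks(label)
        buckets[x & (n_ranks - 1)][(x, z)] = coeff
    return dict(buckets)


# Toy two-qubit Hamiltonian (coefficients are made up for illustration).
terms = {"ZI": 0.5, "IZ": -0.5, "XX": 0.25, "YY": 0.25, "XI": 0.1}
parts = partition_terms(terms, 2)
for rank in sorted(parts):
    print(rank, parts[rank])
```

In a real MPI run each rank would keep only its own bucket; the dictionary of buckets here stands in for that distribution on a single process.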

Paper Structure (34 sections, 16 equations, 3 figures, 5 tables)


Figures (3)

  • Figure 1: Overview of the parallel iQCC optimization workflow. After initialization and MPI distribution of the Hamiltonian (Step 1), the algorithm enters an iterative loop comprising: (2a) generator identification from the Direct Interaction Space via gradient screening, (2b) optional polynomial optimization of entangler amplitudes on GPUs, (2c) GPU-accelerated operator dressing of the Hamiltonian, and (2d) convergence checking against energy and iteration thresholds. The loop terminates when convergence criteria are met, followed by finalization (Step 3). Color coding distinguishes input/output (blue), initialization and finalization (red), the iterative core (orange/green), and GPU-accelerated stages (purple).
  • Figure 2: Energy convergence for the eight ruthenium catalyst systems optimized using iQCC on 4 NVIDIA V100 GPUs.
  • Figure 3: Relative speedup across compute platforms for (a) XVIII CAS(64e,56o), with computational details in Table \ref{tab:v100-results1}, and (b) XVIII CAS(100e,100o) with a Hamiltonian term limit of 1,000M and 1M entanglers in the final optimization step with 100 iQCC iterations.
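The operator dressing step (2c) in Figure 1 is where the Hamiltonian grows, and a small CPU sketch can make that concrete. It assumes the convention $U = \exp(-i t P/2)$ with a single Pauli-string generator $P$, so that $U^\dagger H U$ leaves every term commuting with $P$ unchanged and maps every anticommuting term $T$ to $\cos(t)\,T - i\sin(t)\,TP$. Pauli strings are stored as `(x_mask, z_mask)` pairs; the function names and the sign convention are assumptions made for this illustration, and the paper's GPU kernels are not reproduced here.

```python
import math


def popcount(v):
    """Number of set bits in a non-negative integer."""
    return bin(v).count("1")


def anticommute(a, b):
    """True if Pauli strings a and b, given as (x_mask, z_mask), anticommute."""
    (x1, z1), (x2, z2) = a, b
    return (popcount(x1 & z2) + popcount(z1 & x2)) % 2 == 1


def multiply(a, b):
    """Return (phase, (x, z)) such that a * b = phase * P(x, z).

    Uses the convention P(x, z) = i^{|x & z|} X^x Z^z, so that Y = i X Z.
    """
    (x1, z1), (x2, z2) = a, b
    x3, z3 = x1 ^ x2, z1 ^ z2
    k = (popcount(x1 & z1) + popcount(x2 & z2)
         - popcount(x3 & z3) + 2 * popcount(z1 & x2)) % 4
    return (1, 1j, -1, -1j)[k], (x3, z3)


def dress(hamiltonian, generator, t):
    """Apply H -> U^dagger H U for U = exp(-i t P / 2), term by term.

    hamiltonian maps (x_mask, z_mask) -> real coefficient.
    """
    c, s = math.cos(t), math.sin(t)
    out = {}
    for term, coeff in hamiltonian.items():
        if not anticommute(term, generator):
            # Commuting terms pass through unchanged.
            out[term] = out.get(term, 0.0) + coeff
            continue
        # Anticommuting terms branch: T -> cos(t) T - i sin(t) T P.
        phase, prod = multiply(term, generator)  # phase is +i or -i here
        out[term] = out.get(term, 0.0) + c * coeff
        out[prod] = out.get(prod, 0.0) + (-1j * s * phase * coeff).real
    return out


# Toy example: H = Z0 + Z1 dressed with the generator Y0 X1.
h = {(0b00, 0b01): 1.0, (0b00, 0b10): 1.0}
dressed = dress(h, (0b11, 0b01), 0.3)
print(len(h), "->", len(dressed), "terms")
```

Both terms in the toy Hamiltonian anticommute with the generator, so each one branches and the term count doubles in a single dressing step. Repeating this over many iterations is the growth that the Hamiltonian term limit quoted for Figure 3 is meant to cap.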