Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation
Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang
TL;DR
This work targets the scalability and energy-efficiency gap in simulating random quantum circuits (RQCs) on classical hardware. It introduces a three-level parallelization scheme, hybrid low-precision inter-node communication, and an extended complex-half Einsum framework, enabling large tensor networks (up to tens of terabytes, across up to 2,304 GPUs) to be contracted efficiently. The authors report times-to-solution more than an order of magnitude shorter than the 600 seconds recorded by Google's Sycamore, together with lower energy consumption than Sycamore's 4.3 kWh, including a best case of 17.18 seconds at 0.29 kWh with an XEB of 0.002 for a 32T network with post-processing. The results challenge the notion that quantum hardware inherently outperforms all classical approaches for RQC sampling in these regimes and point toward broader, scalable applications in quantum simulation and beyond.
Abstract
Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's Sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization at the global, node, and device levels to achieve unprecedented scalability for tensor networks. Our techniques accommodate tensor networks requiring up to tens of terabytes of memory, surpassing the memory constraints of a single node, and scale to 2304 GPUs with a peak computing power of 561 PFLOPS in half precision. Notably, we achieve a time-to-solution of 14.22 seconds with an energy consumption of 2.39 kWh at a fidelity of 0.002. Our most remarkable result is a time-to-solution of 17.18 seconds with an energy consumption of only 0.29 kWh, achieving an XEB of 0.002 after post-processing. Both results outperform Google's quantum processor Sycamore, which recorded 600 seconds and 4.3 kWh, in speed and energy efficiency.
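To put the quoted figures side by side, the short sketch below computes the speedup and energy-reduction ratios relative to Sycamore. It uses only the numbers stated in the abstract; the names `SYCAMORE`, `CLASSICAL_RUNS`, and `ratios` are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract: (time-to-solution in seconds, energy in kWh).
SYCAMORE = (600.0, 4.3)
CLASSICAL_RUNS = {
    "fidelity 0.002": (14.22, 2.39),
    "XEB 0.002, post-processed": (17.18, 0.29),
}


def ratios(run, baseline=SYCAMORE):
    """Return (speedup, energy_reduction) of a run relative to the baseline."""
    run_time, run_energy = run
    base_time, base_energy = baseline
    return base_time / run_time, base_energy / run_energy


if __name__ == "__main__":
    for label, run in CLASSICAL_RUNS.items():
        speedup, energy_gain = ratios(run)
        print(f"{label}: {speedup:.1f}x faster, {energy_gain:.1f}x less energy")
```

With these inputs, the 14.22-second run comes out roughly 42 times faster than Sycamore at about 1.8 times less energy, and the 17.18-second run roughly 35 times faster at about 14.8 times less energy.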
