SW-TNC: Reaching the Most Complex Random Quantum Circuit via Tensor Network Contraction
Yaojian Chen, Zhaoqi Sun, Chengyu Qiu, Zegang Li, Yanfei Liu, Lin Gan, Xiaohui Duan, Guangwen Yang
TL;DR
This work advances classical simulation of large random quantum circuits by optimizing tensor-network contraction on the Sunway architecture. It introduces data-reuse strategies (tree-like and spindle-like), core-array fusion with RMA, and in-kernel vectorized permutation, along with split-common TTGT to handle diverse contraction patterns. The combination yields substantial complexity reductions and performance gains, demonstrated by over 10× speedups on Zuchongzhi-60-24 across 1024+ Sunway nodes, and strong scalability up to thousands of processes. These techniques not only push the practical limits of classical RQC simulation but also offer broadly transferable insights for high-performance tensor computations and quantum-device verification.
Abstract
Classical simulation is essential to quantum algorithm development and quantum device verification. As quantum circuit structures grow more complex and diverse, existing classical simulation algorithms need to be improved and extended. In this work, we propose novel strategies for a tensor-network-contraction-based simulator on the Sunway architecture. Our approach addresses three main aspects: complexity, computational paradigms, and fine-grained optimization. Data reuse schemes are designed to reduce floating-point operations, and memory organization techniques are employed to eliminate slicing overhead while maintaining parallelism. The step fusion strategy is extended through multi-core cooperation to improve data locality and computational intensity. Fine-grained optimizations, such as in-kernel vectorized permutations and split-K operators, are also developed to address the challenges posed by the new hotspot distribution and topological structure. Together, these innovations accelerate the simulation of Zuchongzhi-60-24 by more than 10 times, using more than 1024 Sunway nodes (399,360 cores). Our work demonstrates the potential for efficient classical simulation of increasingly complex quantum circuits.
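To make the permutation and matrix-multiply steps mentioned above concrete, the following is a minimal NumPy sketch of the generic TTGT pattern (transpose, transpose, GEMM, transpose) for contracting two tensors. It is an illustration only, not the SW-TNC implementation: the function name `ttgt_contract`, its parameters, and the example shapes are all hypothetical, and none of the Sunway-specific optimizations (core-array fusion, in-kernel vectorized permutation, split-K) are modeled.

```python
import numpy as np


def ttgt_contract(a, b, a_contract, b_contract, out_perm=None):
    """Contract axes `a_contract` of `a` with axes `b_contract` of `b` via TTGT.

    Hypothetical helper for illustration; not taken from SW-TNC.
    """
    a_contract = list(a_contract)
    b_contract = list(b_contract)
    a_free = [i for i in range(a.ndim) if i not in a_contract]
    b_free = [i for i in range(b.ndim) if i not in b_contract]

    # Transpose 1: free axes of A first, contracted axes last.
    a_perm = np.transpose(a, a_free + a_contract)
    # Transpose 2: contracted axes of B first, free axes last.
    b_perm = np.transpose(b, b_contract + b_free)

    a_free_shape = [a.shape[i] for i in a_free]
    b_free_shape = [b.shape[i] for i in b_free]
    k = int(np.prod([a.shape[i] for i in a_contract]))

    # GEMM: flatten both operands to matrices and multiply.
    c_mat = a_perm.reshape(-1, k) @ b_perm.reshape(k, -1)
    c = c_mat.reshape(a_free_shape + b_free_shape)

    # Final transpose: optional reordering of the output axes.
    if out_perm is not None:
        c = np.transpose(c, out_perm)
    return c


# Small check against einsum, using complex entries as in amplitude tensors.
rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4, 5)) + 1j * rng.standard_normal((2, 3, 4, 5))
b = rng.standard_normal((4, 6, 3)) + 1j * rng.standard_normal((4, 6, 3))

c = ttgt_contract(a, b, a_contract=(1, 2), b_contract=(2, 0))
reference = np.einsum("ijkl,kmj->ilm", a, b)
print(c.shape, np.allclose(c, reference))
```

In this sketch the two explicit transposes are separate memory passes before the matrix multiply; performing the permutation inside the compute kernel is one way to avoid that extra data movement.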
