Reducing the Computational Cost Scaling of Tensor Network Algorithms via Field-Programmable Gate Array Parallelism
Songtai Lv, Yang Liang, Rui Zhu, Qibin Zheng, Haiyuan Zou
TL;DR
This paper introduces a fine-grained, FPGA-based parallel design that reduces the computational cost scaling of tensor-network algorithms and applies it to iTEBD and HOTRG. By combining quad-tile partitioning with hardware-accelerated tensor contraction and singular value decomposition (SVD) via two-sided Jacobi rotations, the approach achieves near-linear scaling of the computation time with bond dimension $D_b$ for iTEBD and quadratic scaling for HOTRG, outperforming CPU and GPU implementations. The results show substantial speedups (up to about $19.2\times$ for iTEBD and $24.7\times$ for HOTRG) and power-law scaling of hardware resource usage, supporting the feasibility of accelerating large-scale tensor-network computations on future FPGA architectures. Overall, this work establishes a principled framework for mapping tensor-network computations onto FPGA circuits, enabling scalable studies of complex quantum many-body systems and bridging tensor-network methods with hardware design.
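Two-sided Jacobi rotations suit hardware because each rotation pair acts on only two rows and two columns; rotations on disjoint index pairs commute, so in principle they can be applied concurrently. As a minimal sequential sketch (not the authors' FPGA kernel), the NumPy code below computes the SVD of a real square matrix with a Kogbetliantz-style two-sided Jacobi sweep: each 2×2 block is first symmetrized by a left rotation and then diagonalized by a symmetric Jacobi rotation. The function names (`rot2`, `sym_schur2`, `two_sided_jacobi_svd`), the cyclic (p, q) ordering, and the convergence tolerance are illustrative assumptions.

```python
import numpy as np


def rot2(theta):
    """2x2 plane rotation [[cos, -sin], [sin, cos]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def sym_schur2(x, y, z):
    """Return J = [[c, s], [-s, c]] with |angle| <= pi/4 such that
    J.T @ [[x, y], [y, z]] @ J is diagonal (classic symmetric Jacobi step)."""
    if y == 0.0:
        return np.eye(2)
    tau = (z - x) / (2.0 * y)
    t = 1.0 if tau == 0.0 else np.sign(tau) / (abs(tau) + np.hypot(1.0, tau))
    c = 1.0 / np.sqrt(1.0 + t * t)
    s = t * c
    return np.array([[c, s], [-s, c]])


def two_sided_jacobi_svd(a, tol=1e-12, max_sweeps=50):
    """SVD of a real square matrix via cyclic two-sided Jacobi rotations.

    Returns (u, s, vt) with a ~= u @ np.diag(s) @ vt and s in descending order.
    """
    a = np.array(a, dtype=float)
    n = a.shape[0]
    u, v = np.eye(n), np.eye(n)
    scale = max(np.linalg.norm(a), 1.0)  # Frobenius norm is rotation-invariant
    for _ in range(max_sweeps):
        if np.linalg.norm(a - np.diag(np.diag(a))) < tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                idx = [p, q]
                b = a[np.ix_(idx, idx)]
                # Left rotation G that makes the 2x2 block symmetric.
                num, den = b[1, 0] - b[0, 1], b[0, 0] + b[1, 1]
                if den < 0:  # equivalent angle with |theta| <= pi/2
                    num, den = -num, -den
                g = rot2(-np.arctan2(num, den))
                sym = g @ b
                # Symmetric Jacobi rotation J that diagonalizes G @ b.
                j = sym_schur2(sym[0, 0], 0.5 * (sym[0, 1] + sym[1, 0]), sym[1, 1])
                left, right = j.T @ g, j
                # Only rows/columns p and q are touched by this rotation pair.
                a[idx, :] = left @ a[idx, :]
                a[:, idx] = a[:, idx] @ right
                u[:, idx] = u[:, idx] @ left.T
                v[:, idx] = v[:, idx] @ right
    s = np.diag(a).copy()
    signs = np.where(s < 0, -1.0, 1.0)  # move negative signs into u
    s, u = s * signs, u * signs
    order = np.argsort(s)[::-1]
    return u[:, order], s[order], v[:, order].T
```

Calling `u, s, vt = two_sided_jacobi_svd(m)` on a random square matrix should reproduce the singular values from `np.linalg.svd` to rounding error. A hardware-oriented version would replace the inner double loop with rounds of non-overlapping (p, q) pairs, each round executed in parallel.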
Abstract
Improving the computational efficiency of quantum many-body calculations from a hardware perspective remains a critical challenge. Although field-programmable gate arrays (FPGAs) have recently been used to improve the computational scaling of algorithms such as Monte Carlo methods, their application to tensor-network algorithms is still at an early stage. In this work, we propose a fine-grained parallel tensor-network design based on FPGAs that substantially improves the computational efficiency of two representative tensor-network algorithms: the infinite time-evolving block decimation (iTEBD) and the higher-order tensor renormalization group (HOTRG). By employing a quad-tile partitioning strategy that decomposes tensor elements and maps them onto hardware circuits, our approach translates algorithmic computational complexity into scalable hardware resource utilization, enabling a high degree of parallelism on FPGAs. Compared with conventional CPU-based implementations, our scheme exhibits superior scalability in computation time, reducing the bond-dimension scaling of the computational cost from $O(D_b^3)$ to $O(D_b)$ for iTEBD and from $O(D_b^6)$ to $O(D_b^2)$ for HOTRG. This work provides a theoretical foundation for future hardware implementations of large-scale tensor-network computations.
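The reduction from $O(D_b^3)$ to $O(D_b)$ can be understood with an idealized cost model. A dense $D_b \times D_b$ matrix product (the type of operation behind the $O(D_b^3)$ CPU cost) needs $D_b^3$ multiply-accumulate (MAC) operations; if each of the $D_b^2$ output elements is given its own hardware unit, all units accumulate concurrently and the wall-clock time is $D_b$ cycles. The Python sketch below emulates this, using a recursive split of the output into quadrants as a hypothetical reading of quad-tile partitioning. The processing-element (PE) model, one MAC per cycle, and the names `split`, `quad_tiles`, and `parallel_matmul` are assumptions, not the paper's circuit design.

```python
import numpy as np


def split(lo, hi):
    """Halve the index range [lo, hi); ranges of length 1 are left whole."""
    if hi - lo <= 1:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def quad_tiles(rows, cols):
    """Recursively split an output block into quadrants (quad tiles) until each
    tile is a single element; every leaf (i, j) is one hypothetical PE."""
    if rows[1] - rows[0] == 1 and cols[1] - cols[0] == 1:
        return [(rows[0], cols[0])]
    leaves = []
    for r in split(*rows):
        for c in split(*cols):
            leaves += quad_tiles(r, c)
    return leaves


def parallel_matmul(a, b):
    """Emulate C = A @ B for square D x D inputs with D^2 PEs.

    Each PE owns one output element and performs one multiply-accumulate per
    cycle, so all PEs finish after D cycles. Returns (C, cycles, number_of_PEs).
    """
    d = a.shape[0]
    pes = quad_tiles((0, d), (0, d))
    acc = dict.fromkeys(pes, 0.0)
    cycles = 0
    for k in range(d):          # one clock cycle per contraction index
        for i, j in pes:        # in hardware these MACs happen concurrently
            acc[(i, j)] += a[i, k] * b[k, j]
        cycles += 1
    c = np.zeros((d, d))
    for (i, j), value in acc.items():
        c[i, j] = value
    return c, cycles, len(pes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d in (4, 8, 16, 32):
        a, b = rng.standard_normal((d, d)), rng.standard_normal((d, d))
        c, cycles, n_pe = parallel_matmul(a, b)
        assert np.allclose(c, a @ b)
        print(f"D={d:2d}  sequential MACs={d**3:5d}  PEs={n_pe:4d}  cycles={cycles}")
```

The printed rows show sequential MACs growing as $D_b^3$, PEs as $D_b^2$, and cycles as $D_b$: the cost moves from time into hardware resources, consistent with the power-law resource usage reported for the FPGA design. Under the same idealization, $D_b^4$ concurrent units would account for the HOTRG reduction from $O(D_b^6)$ to $O(D_b^2)$.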
