TensorCircuit-NG: A Universal, Composable, and Scalable Platform for Quantum Computing and Quantum Simulation

Shi-Xin Zhang, Yu-Qin Chen, Weitang Li, Jiace Sun, Wei-Guo Ma, Pei-Lin Zheng, Yu-Xiang Huang, Qi-Xiang Wang, Hui Yu, Zhuo Li, Xuyang Huang, Zong-Liang Li, Zhou-Quan Wan, Shuo Liu, Jiezhong Qiu, Jiaqi Miao, Zixuan Song, Yuxuan Yan, Kazuki Tsuoka, Pan Zhang, Lei Wang, Heng Fan, Chang-Yu Hsieh, Hong Yao, Tao Xiang

TL;DR

TensorCircuit-NG addresses the need for a scalable, differentiable platform that unifies quantum physics modeling with AI and HPC. It introduces a tensor-native programming paradigm and a dual-layer architecture that decouples physics from hardware, enabling seamless backend switching and cross-framework composition. The key contributions include native backend interfaces, cross-framework translations, extensive domain modules (qudits, fermion Gaussian states, noise modeling, analog and stabilizer engines, MPS), and a distributed tensor-network contraction infrastructure. The platform demonstrates near-linear speedups on GPU clusters for variational quantum algorithms and supports end-to-end demonstrations ranging from quantum machine learning to many-body physics, positioning TensorCircuit-NG as a scalable, open-source framework for the next generation of quantum science.

Abstract

We present TensorCircuit-NG, a next-generation quantum software platform designed to bridge the gap between quantum physics, artificial intelligence, and high-performance computing. Moving beyond the scope of traditional circuit simulators, TensorCircuit-NG establishes a unified, tensor-native programming paradigm where quantum circuits, tensor networks, and neural networks fuse into a single, end-to-end differentiable computational graph. Built upon industry-standard machine learning backends (JAX, TensorFlow, PyTorch), the framework introduces comprehensive capabilities for approximate circuit simulation, analog dynamics, fermion Gaussian states, qudit systems, and scalable noise modeling. To tackle the exponential complexity of deep quantum circuits, TensorCircuit-NG implements advanced distributed computing strategies, including automated data parallelism and model-parallel tensor network slicing. We validate these capabilities on GPU clusters, demonstrating a near-linear speedup in distributed variational quantum algorithms. TensorCircuit-NG enables flagship applications, including end-to-end QML for CIFAR-100 computer vision, efficient pipelines from quantum states to neural networks via classical shadows, and differentiable optimization of tensor network states for many-body physics.
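
As a quick sanity check on the near-linear speedup claim, the timings quoted in the Figure 2 caption (17.86 s per step on one GPU, 2.38 s on eight GPUs, and the fitted scaling $T \propto 2^{1.1N}$) can be turned into a speedup, a parallel efficiency, and an extrapolated cost ratio. The following is a minimal sketch in plain Python; the function names are illustrative and are not part of TensorCircuit-NG.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Strong-scaling speedup: single-device time over multi-device time."""
    return t_single / t_parallel


def parallel_efficiency(t_single: float, t_parallel: float, n_devices: int) -> float:
    """Fraction of the ideal linear speedup that is actually achieved."""
    return speedup(t_single, t_parallel) / n_devices


def cost_ratio(n_from: int, n_to: int, exponent: float = 1.1) -> float:
    """Relative step time between two system sizes, assuming T ~ 2^(exponent * N)."""
    return 2.0 ** (exponent * (n_to - n_from))


# Numbers taken from the Figure 2 caption.
T_1GPU, T_8GPU, N_GPUS = 17.86, 2.38, 8

s = speedup(T_1GPU, T_8GPU)
e = parallel_efficiency(T_1GPU, T_8GPU, N_GPUS)
r = cost_ratio(32, 40)

print(f"speedup: {s:.2f}x")
print(f"parallel efficiency: {e:.1%}")
print(f"predicted step-time ratio N=32 -> N=40: {r:.0f}x")
```

Under these assumptions the efficiency comes out at roughly 94% of ideal, and multiplying the 2.38 s step by the predicted ratio gives a little under 18 minutes, in line with the 40-qubit figure quoted in the caption.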

Paper Structure (37 sections, 1 equation, 4 figures, 6 tables)


Figures (4)

  • Figure 1: The Hierarchical Software Architecture of TensorCircuit-NG. The platform is structured into five distinct layers, bridging raw hardware acceleration with high-level physical modeling. (Bottom) The Hardware Infrastructure layer abstracts diverse compute resources, seamlessly dispatching workloads to CPUs, GPU clusters, TPUs, or quantum processors (QPUs). (Second) The Numerical Backends layer unifies industry-standard ML frameworks (JAX, TensorFlow, PyTorch) via a backend-agnostic dispatch interface. (Third) The Computational Paradigms engine serves as the core transformation layer, injecting AD, JIT, VMAP, and distributed computing capabilities into all upstream objects. (Fourth) The Core Abstractions layer unifies quantum circuits, tensor networks, and neural networks into composable, differentiable tensors. (Top) The Physical Modeling layer provides domain-specific modules for constructing Hamiltonians, simulating open-system noise, and evolving many-body dynamics.
  • Figure 2: Performance Benchmarks for Distributed VQE on an NVIDIA H200 Cluster. (a) Strong scaling for a 32-qubit, 16-layer TFIM VQE task. The DistributedContractor achieves a $7.5\times$ speedup on 8 GPUs compared to a single device, reducing the time per optimization step from 17.86 s to 2.38 s. (b) Execution time per optimization step as a function of system size ($N \in [32, 40]$) on a fixed 8-GPU cluster. The scaling follows $T \propto 2^{1.1N}$, reflecting the combined cost of the Hilbert space dimension ($N$ qubits) and the circuit depth ($L=N/2$). The framework successfully simulates a 40-qubit, 20-layer circuit, including gradients with respect to $11700$ circuit parameters, in approximately 18 minutes per step.
  • Figure 3: End-to-End QML Pipeline for CIFAR-100. The pipeline uses amplitude encoding for the high-dimensional input ($32\times32$ pixels, 3 channels), processes the data with a JAX-compiled deep quantum circuit built from expressive SU(4) gates, and performs classification from the measurement probabilities on a subset of qubits. The entire workflow is differentiable and optimized end-to-end.
  • Figure 4: End-to-End Variational Workflow for Excited States. The framework constructs the overlap ($S$) and Hamiltonian ($H$) matrices from a set of parameterized, non-orthogonal states $\{|\psi_i(\theta)\rangle\}$, which can be built on top of tensor networks, neural networks, or quantum circuits. The variational parameters are optimized by minimizing the universal loss function $L = \text{Tr}(S^{-1}H)$, with the gradients $\frac{\partial L}{\partial \theta}$ obtained by automatic differentiation. Finally, the approximate low-energy spectrum is retrieved by solving the generalized eigenvalue problem $Hc=ESc$ in the post-optimization phase. Reproduced from Ref. Zhang2025es.
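
The amplitude encoding step mentioned in the Figure 3 caption can be sketched as follows. A flattened $32\times32\times3$ image has 3072 entries, so the smallest register that can hold it has 12 qubits ($2^{12} = 4096$ amplitudes). The sketch below is plain Python and assumes zero-padding up to the next power of two followed by L2 normalization; the caption does not say how the paper handles padding, and the function name `amplitude_encode` is hypothetical rather than a TensorCircuit-NG API.

```python
import math


def amplitude_encode(values: list[float]) -> tuple[list[float], int]:
    """Map a real vector to a normalized state vector on the fewest qubits.

    The vector is zero-padded to the next power of two (an assumption of this
    sketch) and rescaled to unit L2 norm so the squared amplitudes sum to one.
    """
    if not values:
        raise ValueError("input must be non-empty")
    # Smallest n with 2**n >= len(values); at least one qubit.
    n_qubits = max(1, (len(values) - 1).bit_length())
    padded = list(values) + [0.0] * (2**n_qubits - len(values))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0.0:
        raise ValueError("cannot encode an all-zero vector")
    return [v / norm for v in padded], n_qubits


# Dummy pixel values standing in for one flattened 32 x 32 x 3 image.
image = [float(i % 256) for i in range(32 * 32 * 3)]
state, n_qubits = amplitude_encode(image)
print(n_qubits, len(state))
```

The resulting list could then serve as the initial state vector of a simulated circuit; how that hand-off is done depends on the simulator and is not shown here.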
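
The link between the loss $L = \text{Tr}(S^{-1}H)$ and the spectrum in the Figure 4 workflow can be checked numerically. When $S$ is invertible, $Hc = ESc$ is equivalent to $S^{-1}Hc = Ec$, so the trace equals the sum of the generalized eigenvalues in the chosen subspace. The sketch below verifies this for a hand-picked two-state example in plain Python; the matrices and helper names are made up for illustration and are not taken from the paper or from TensorCircuit-NG.

```python
import math

Matrix = list[list[float]]


def inverse_2x2(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("overlap matrix is singular (states are linearly dependent)")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x: Matrix, y: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def generalized_eigenvalues(h: Matrix, s: Matrix) -> tuple[float, float]:
    """Solve H c = E S c for 2x2 real symmetric H and positive-definite S.

    The energies are the eigenvalues of S^{-1} H, obtained here from the
    characteristic polynomial E^2 - tr * E + det = 0.
    """
    m = matmul_2x2(inverse_2x2(s), h)
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr - disc) / 2.0, (tr + disc) / 2.0


def trace_loss(h: Matrix, s: Matrix) -> float:
    """Loss L = Tr(S^{-1} H)."""
    m = matmul_2x2(inverse_2x2(s), h)
    return m[0][0] + m[1][1]


# Hypothetical overlap and Hamiltonian matrices for two non-orthogonal states.
S = [[1.0, 0.5], [0.5, 1.0]]
H = [[-1.0, -0.2], [-0.2, 0.5]]

e0, e1 = generalized_eigenvalues(H, S)
loss = trace_loss(H, S)
print(f"E0 = {e0:.4f}, E1 = {e1:.4f}, E0 + E1 = {e0 + e1:.4f}, loss = {loss:.4f}")
```

For larger subspaces one would normally call a library generalized eigensolver instead of the closed-form 2x2 solution used here.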