HiMA: Hierarchical Quantum Microarchitecture for Qubit-Scaling and Quantum Process-Level Parallelism

Qi Zhou; Zi-Hao Mei; Han-Qing Shi; Liang-Liang Guo; Xiao-Yan Yang; Yun-Jie Wang; Xiao-Fan Xu; Cheng Xue; Wei-Cheng Kong; Jun-Chao Wang; Yu-Chun Wu; Zhao-Yun Chen; Guo-Ping Guo

TL;DR

HiMA addresses the scalability challenge of quantum control by introducing a hierarchical microarchitecture that decentralizes circuit information into per-qubit control nodes and supports multiprocessing for quantum process-level parallelism. Implemented on a 72-qubit superconducting QPU and extensible to 6144 qubits via three-layer cascading, HiMA achieves substantial speedups (up to 4.89×) and CLOPS (up to 43,680 in cloud experiments), while maintaining high QPU utilization. The architecture relies on discrete qubit-level drive/readout, a process-based hierarchical trigger, and staggered triggering to mitigate crosstalk, enabling asynchronous parallel execution and real-time feedback essential for error correction. Practically, HiMA enables scalable quantum cloud platforms with improved throughput and flexible collaboration among on-site and remote users, paving the way for larger, fault-tolerant quantum experiments.
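
As a rough sanity check on the scaling figures above, the arithmetic can be sketched as follows. The value of 24 qubit control nodes (QCNs) per qubit cluster control subsystem (QCCS) comes from the Figure 4(b) caption, and one QCN per qubit is assumed. The even 16 x 16 split across the two upper layers is purely an illustrative assumption; the summary does not state the actual fan-out of the root or middle-layer controllers. The function names are hypothetical.

```python
# Assumption: one QCN per qubit, 24 QCNs per QCCS (Figure 4(b) caption).
QCNS_PER_QCCS = 24


def qccs_needed(n_qubits, qcns_per_qccs=QCNS_PER_QCCS):
    """Number of leaf-level subsystems needed to cover n_qubits (ceiling division)."""
    return -(-n_qubits // qcns_per_qccs)


def cascade_capacity(root_fanout, middle_fanout, qcns_per_qccs=QCNS_PER_QCCS):
    """Qubits reachable by a root -> middle -> leaf cascade.

    root_fanout and middle_fanout are hypothetical parameters, not values
    taken from the paper.
    """
    return root_fanout * middle_fanout * qcns_per_qccs


print(qccs_needed(72))           # leaf subsystems for the 72-qubit QPU
print(qccs_needed(6144))         # leaf subsystems for the stated maximum
print(cascade_capacity(16, 16))  # one fan-out choice that reaches 6144
```

Under these assumptions the 72-qubit system needs 3 leaf subsystems and the 6144-qubit limit corresponds to 256 of them; any pair of upper-layer fan-outs whose product is 256 would give the same capacity.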

Abstract

Quantum computing holds immense potential for addressing a myriad of intricate challenges, which is significantly amplified when scaled to thousands of qubits. However, a major challenge lies in developing an efficient and scalable quantum control system. To address this, we propose a novel Hierarchical MicroArchitecture (HiMA) designed to facilitate qubit scaling and exploit quantum process-level parallelism. This microarchitecture is based on three core elements: (i) discrete qubit-level drive and readout, (ii) a process-based hierarchical trigger mechanism, and (iii) multiprocessing with a staggered triggering technique to enable efficient quantum process-level parallelism. We implement HiMA as a control system for a 72-qubit tunable superconducting quantum processing unit, serving a public quantum cloud computing platform, which is capable of expanding to 6144 qubits through three-layer cascading. In our benchmarking tests, HiMA achieves up to a 4.89x speedup under a 5-process parallel configuration. Consequently, to the best of our knowledge, we have achieved the highest CLOPS (Circuit Layer Operations Per Second), reaching up to 43,680, across all publicly available platforms.
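
To put the 4.89x figure in context, the following is a minimal toy model of process-level parallelism, not the scheduling used by HiMA. It assumes that programs touching disjoint qubit sets can share the chip at the same time, so a batch costs only as much chip time as its longest program. The `Program` class, the greedy batching, and the 100 µs durations are all hypothetical; only the 4.89x speedup and the 5-process configuration come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str
    qubits: frozenset     # qubit indices the program touches
    chip_time_us: float   # hypothetical on-chip execution time


def batch_disjoint(programs):
    """Greedily group programs so that no two in a batch share a qubit."""
    batches = []
    for prog in programs:
        for batch in batches:
            if all(prog.qubits.isdisjoint(other.qubits) for other in batch):
                batch.append(prog)
                break
        else:
            # No existing batch is compatible, so start a new one.
            batches.append([prog])
    return batches


def serial_chip_time(programs):
    """Single-process scheme: chip times simply add up."""
    return sum(p.chip_time_us for p in programs)


def parallel_chip_time(programs):
    """Toy multiprocessing scheme: each batch costs its longest program."""
    return sum(max(p.chip_time_us for p in batch)
               for batch in batch_disjoint(programs))


def parallel_efficiency(speedup, n_processes):
    """Fraction of the ideal n-fold speedup that was achieved."""
    return speedup / n_processes


# Five equal-length programs on disjoint 10-qubit blocks (made-up workload).
programs = [Program(f"p{i}", frozenset(range(10 * i, 10 * i + 10)), 100.0)
            for i in range(5)]
ideal_speedup = serial_chip_time(programs) / parallel_chip_time(programs)
reported_efficiency = parallel_efficiency(4.89, 5)
print(ideal_speedup, round(reported_efficiency, 3))
```

In this idealized setting five disjoint, equal-length programs give a 5x speedup, so the reported 4.89x corresponds to roughly 98% of the ideal for a 5-process configuration.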

Paper Structure (34 sections, 3 equations, 10 figures, 4 tables)

Introduction
Background
Quantum circuits
Superconducting qubits
Defining Efficiency and Utilization of Quantum Processing Unit
The time span and efficiency of quantum applications execution
Quantum Process-level Parallelism and QPU Load Average
Circuit Layer Operations Per Second
Requirement
Scalability
Timing Synchronization
Feedback Control
Microarchitecture
Overview of HiMA
System overview
...and 19 more sections

Figures (10)

Figure 1: Scenario of Collaborative Quantum Computing. Multiple specialists concurrently test and develop applications on a shared quantum chip within a quantum computing laboratory. The system's process-level parallelism enables independent access to qubits by both on-site scientists and remote users via the cloud, fostering collaboration and accelerating quantum research.
Figure 2: Comparison of hierarchical and centralized microarchitectures. (a) An example of a quantum circuit with 100 qubits. Each dashed box represents one layer of the quantum circuit, whose gates should be executed in parallel. (b) A schematic diagram of how a centralized microarchitecture handles the corresponding quantum program. For example, 0 | Y 1 indicates a Y-gate applied to qubit 1, occurring simultaneously with the previous instruction. Hence, the quantum control processor must parse a large number of instructions to achieve circuit-level parallelism. (c) A schematic diagram of the hierarchical microarchitecture. The quantum circuit is decomposed into quantum operation sequences for each qubit, which are executed within the corresponding qubit control nodes (QCNs). Synchronization is achieved through unified triggering by the root controller.
Figure 3: The time span of a quantum application's execution, with schematic diagrams of the single-process and multiprocessing schemes. (a) The total time for a quantum task can be divided into three parts: preprocessing ($t_{\rm pre}$), execution on the quantum control system ($t_{\rm QCS}$), and postprocessing ($t_{\rm post}$). $t_{\rm send}$ and $t_{\rm recv}$ represent the time taken by the microarchitecture to receive data packets and send back results; both are short and are not included in the following discussions. (b) In the single-process scheme, although pre- and post-processing can be executed asynchronously, the time on the quantum chip is still the sum of the two consecutive quantum programs. (c) In the multiprocessing scheme, when two quantum programs do not share any qubit, they can be executed in parallel, so their time on the quantum chip overlaps.
Figure 4: (a) A three-layer instance of HiMA, demonstrating its cascade structure. The root controller interfaces through middle-layer controllers, which are connected to multiple qubit cluster control subsystems (QCCSs). Each subsystem includes a leaf controller and qubit control nodes (QCNs) housed within execution modules. (b) A multitasking scenario of HiMA. Each QCCS is abstracted as 1 leaf controller and 24 QCNs, depicted as a rectangle with diagonally rounded corners. HiMA supports the asynchronous parallel execution of multiple tasks, with each task involving different QCNs. (c) Physical implementation of QCNs within execution modules. Execution modules include XY/Z drive modules and feedline input/output (I/O) modules. Each XY/Z drive module contains multiple qubit-level XY/Z drive units, while each feedline I/O module includes several feedline I/O units; each feedline I/O unit houses as many qubit readout (QR) I/O units as there are qubits on its feedline. A QCN is constructed by integrating the relevant qubit-level units from these modules.
Figure 5: (a) Architecture of the qubit drive unit within the XY/Z drive module. (b) The feedline Input/Output (I/O) unit manages the feedline, while the qubit readout I/O units handle the readout operations for the qubits sharing that feedline.
...and 5 more figures
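
The per-qubit decomposition described in the Figure 2(c) caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the layered circuit representation, the `decompose` function, and the explicit idle marker `"I"` are all hypothetical. In particular, padding with idle slots is only a stand-in for the trigger-based synchronization that the caption attributes to the root controller.

```python
IDLE = "I"  # hypothetical placeholder for "this qubit does nothing in this layer"


def decompose(layers, n_qubits):
    """Split a layered circuit into one operation sequence per qubit.

    layers: list of layers; each layer is a list of (gate, qubits) tuples,
            where qubits is a tuple of qubit indices.
    Returns a dict mapping qubit index -> list with one entry per layer.
    """
    sequences = {q: [] for q in range(n_qubits)}
    for depth, layer in enumerate(layers):
        busy = {}
        for gate, qubits in layer:
            for q in qubits:
                if q in busy:
                    raise ValueError(f"qubit {q} used twice in layer {depth}")
                busy[q] = (gate, qubits)
        # Every qubit gets exactly one slot per layer, so all sequences
        # stay aligned layer by layer.
        for q in range(n_qubits):
            sequences[q].append(busy.get(q, (IDLE, (q,))))
    return sequences


# A made-up 3-qubit, 3-layer circuit.
layers = [
    [("H", (0,)), ("Y", (1,))],
    [("CZ", (0, 1))],
    [("X", (2,))],
]
seqs = decompose(layers, n_qubits=3)
for q, ops in seqs.items():
    print(q, ops)
```

Each resulting sequence is what a single qubit control node would need to hold; a two-qubit gate such as `CZ` appears in the sequences of both participating qubits, which is why some shared timing reference is needed across nodes.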
