A Scheduler for the Active Volume Architecture

Sam Heavey, Athena Caesura

TL;DR

A novel formula for bridge- and stale-state-qubit overheads is empirically derived and the accuracy of runtime estimates is improved, revealing that larger circuits can run on a given computer than analytic models previously predicted.

Abstract

We improve the accuracy of Active Volume resource estimates by explicitly scheduling when Active Volume blocks execute. We present software that uses a greedy strategy to assign each logical qubit a role in each logical cycle (e.g., workspace, stale state storage, and bridge qubits). We empirically derive a novel formula for bridge- and stale-state-qubit overheads and improve the accuracy of runtime estimates, revealing that larger circuits can run on a given computer than previously predicted by analytic models. For a $4\times4$ Fermi-Hubbard simulation test circuit, this yields a $1.76\times$ runtime speedup with a $1.44\times$ reduction in bridge- and stale-state-qubit overheads compared to the model used in arXiv:2501.06165. Moreover, we show that for this test circuit, reaction times are insignificant in runtime estimates for computers with fewer than 600 logical qubits and that the number of reaction layers per logical cycle remains 1 in this regime. Our results pave the way for a full compilation pipeline for the Active Volume architecture and improved analytic resource estimates.
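
As a rough illustration of what "explicitly scheduling when blocks execute" with a greedy strategy can look like, the sketch below packs ready gates into logical cycles until a fixed per-cycle block budget is exhausted. It is a toy model under stated assumptions, not the authors' scheduler: the function name `greedy_schedule`, the `workspace_blocks` budget, and the largest-first ordering are all hypothetical, and the sketch ignores bridge qubits, stale states, $\ket{Y}$ states, and reaction times.

```python
def greedy_schedule(blocks, deps, workspace_blocks):
    """Toy greedy scheduler (illustrative only).

    blocks: {gate name: active volume, in blocks}  (hypothetical input format)
    deps:   {gate name: set of gates that must finish first}
    workspace_blocks: assumed number of blocks executable per logical cycle
    Returns a list of logical cycles, each a list of gate names.
    """
    remaining = dict(blocks)
    done, cycles = set(), []
    while remaining:
        free = workspace_blocks
        this_cycle = []
        # A gate is ready once all its predecessors ran in earlier cycles.
        ready = [g for g in remaining if deps.get(g, set()) <= done]
        # Greedy choice (an assumption): try the largest ready gates first.
        for g in sorted(ready, key=lambda g: -remaining[g]):
            if remaining[g] <= free:
                this_cycle.append(g)
                free -= remaining[g]
        if not this_cycle:
            raise ValueError("no ready gate fits (gate too large or cyclic deps)")
        for g in this_cycle:
            del remaining[g]
        done.update(this_cycle)
        cycles.append(this_cycle)
    return cycles


# Made-up example: four gates, a budget of 6 blocks per logical cycle.
blocks = {"A": 4, "B": 4, "C": 2, "D": 4}
deps = {"B": {"A"}, "D": {"B", "C"}}
schedule = greedy_schedule(blocks, deps, workspace_blocks=6)
```

In this toy example, `A` and `C` share the first cycle, while `B` and `D` each wait for their predecessors. A scheduler for the real architecture would additionally account for the qubit roles listed in the abstract.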

Paper Structure (37 sections, 20 equations, 17 figures, 3 tables, 4 algorithms)

Figures (17)

  • Figure 1: a) A circuit that consists of two $CNOT$ gates, ${CNOT}_A$ and ${CNOT}_B$, acting on disjoint sets of qubits (left). A $CNOT$ gate written as a series of Pauli-Product-Measurements (PPMs) (right). Note that Pauli corrections are omitted, since they can be tracked classically with no physical gate cost. b) An architecture where qubits exist as a 2D array of surface code patches that have nearest-neighbor-only connections, described in detail in litinski2019game. The dashed (solid) boundaries represent the logical X (Z) operator for each patch. The blue patches with an overlaid $X$ indicate destructive $X$-basis measurements on all physical qubits. The diagram flows in order of increasing time step (in units of logical cycles) such that the first column shows the execution of ${CNOT}_A$ between $\ket{q_1}$ and $\ket{q_5}$ and the second column shows ${CNOT}_B$ between $\ket{q_2}$ and $\ket{q_3}$. The computation employs 10 qubits over 4 logical cycles, which equates to a total spacetime volume of $10 \times 4 = 40$ blocks. c) The AV derivation of a $CNOT$ gate using surface code qubits. We start by writing both gates as ZX diagrams (middle-left) and then convert these into oriented ZX diagrams (OZX) (middle-right). We then formally write the OZX diagram in logical block notation, counting 4 blocks. Both OZX diagrams and logical block notation have a one-to-one correspondence with spacetime blocks, which represent the cost of lattice surgery between surface codes; for more details on OZX diagrams and logical block notation, see Litinski22Active. d) We execute the circuit from a) on a 10-qubit computer in the AV architecture. Owing to long-range connections, both gates can be completed in one logical cycle, with a qubit to spare. Logical blocks 1,4,5,8 implement ${CNOT}_A$ and 2,3,6,7 implement ${CNOT}_B$. The total spacetime cost of this computation is 10 qubits multiplied by 1 logical cycle, which equals 10 blocks.
  • Figure 2: Gates $A$ and $B$ share qubit $\ket{q_2}$. To execute $A$ and $B$ in parallel (right), we initialize the Bell state $\ket{B\tilde{B}}$ (e.g., $\frac{1}{\sqrt{2}}\left[\ket{00}+\ket{11}\right]$). $\ket{\tilde{B}}$ is used in place of $\ket{q_2}$ as an input into $A$, while $\ket{B}$ sits idle. After $A$ and $B$ are executed, we perform a Bell measurement on $\ket{q_2}$ and $\ket{B}$. A Pauli $X$ or $Z$ correction is tracked depending on the $ZZ$ and $XX$ measurement outcomes, respectively. These corrections are transformed by $A$ into $X'=A^{\dagger}XA$ and $Z'=A^{\dagger}ZA$. Note that $\ket{q_2}$ may also represent a qubit register, in which case this process is applied to each qubit in the register (i.e., one Bell state for each qubit).
  • Figure 3: A diagram showing the nested definitions of different qubit types. The goal of our block scheduler will generally be to increase the workspace size as much as possible, while decreasing the number of unused and bridge qubits.
  • Figure 4: A diagram illustrating how to create a DAG (right) from a simple quantum circuit (left). The vertices of the DAG maintain crucial information about the gates such as which qubits they act upon, the number of stale states they produce, the $\ket{Y}$ states they consume (not shown), and their AV. Each of these attributes will be used by the block scheduler. Note, the attributes of the vertex $A$ were obtained using the $\frac{\pi}{8}$ PPR decomposition given in \ref{fig:magic_state_injection}.
  • Figure 5: Qubit usage per logical cycle for simulations of a qubitized Fermi–Hubbard model on a $6\times6$ lattice is shown for the two primary circuit gadgets: the controlled unitary (top) and the QFT$^\dagger$ (bottom). To better reveal usage trends, we apply a rolling average to each qubit category over a 25-cycle window. In both cases, the numbers of stale states, bridge qubits, and unused qubits remain well below the 20% threshold assumed in Caesura25 throughout the simulation. We plot idle $\ket{Y}$ states explicitly, even though they are data qubits, to show that our method for handling reactive $Y$ measurements (see \ref{subsec:greedy_algorithm}) does not cause significant buildup of these states in memory. In the controlled unitary, the PREP, SELECT, and PREP$^\dagger$ stages are clearly visible: the number of data qubits increases to accommodate qubits introduced by left-elbow gates gidney2018halving in the SELECT step. One might expect the number of bridge qubits to be higher in the QFT, since its cost is dominated by rotations; instead, we find that it is lower than in the controlled unitary case. Lastly, spikes in unused qubits identify algorithmic bottlenecks where high-AV gates struggle to fit within the computer, thereby highlighting potential sources of optimization.
  • ...and 12 more figures
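
The circuit-to-DAG step described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Gate` class, its field names, and the attribute values in the example circuit are hypothetical. It adds an edge from the previous gate on each qubit to the next gate that touches that qubit, and stores the per-vertex attributes the caption lists (qubits, stale states produced, $\ket{Y}$ states consumed, AV).

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One DAG vertex; field names and values are illustrative assumptions."""
    name: str
    qubits: tuple[str, ...]
    active_volume: int = 0   # AV of the gate, in blocks
    stale_states: int = 0    # stale states the gate produces
    y_states: int = 0        # |Y> states the gate consumes


def build_dag(gates: list[Gate]) -> dict[str, set[str]]:
    """Map each gate name to the set of gates it directly depends on."""
    last_on_qubit: dict[str, str] = {}
    preds: dict[str, set[str]] = {}
    for g in gates:  # gates are given in circuit (time) order
        preds[g.name] = set()
        for q in g.qubits:
            if q in last_on_qubit:
                # g must follow the most recent gate acting on qubit q.
                preds[g.name].add(last_on_qubit[q])
            last_on_qubit[q] = g.name
    return preds


# Made-up four-gate circuit; the attribute values are placeholders.
circuit = [
    Gate("A", ("q1", "q2"), active_volume=4),
    Gate("B", ("q2", "q3"), active_volume=4),
    Gate("C", ("q4",), active_volume=2),
    Gate("D", ("q3", "q4"), active_volume=4),
]
dag = build_dag(circuit)
```

Here `B` depends on `A` through the shared qubit `q2`, and `D` depends on both `B` and `C`. Gates with no common ancestors or descendants, such as `A` and `C`, are candidates for execution in the same logical cycle.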

Theorems & Definitions (6)

  • Definition 1: Memory qubit
  • Definition 2: Workspace qubit
  • Definition 3: Unused qubits
  • Definition 4: Data qubits
  • Definition 5: Stale states
  • Definition 6: Bridge qubits