Circuit decompositions and scheduling for neutral atom devices with limited local addressability

Natalia Nottingham, Michael A. Perlin, Dhirpal Shah, Ryan White, Hannes Bernien, Frederic T. Chong, Jonathan M. Baker

TL;DR

An optimized compiler pipeline is presented that translates an input circuit from an arbitrary gate set into a realistic neutral atom native gate set containing global gates, and focuses on decomposition and scheduling passes that minimize the final circuit's global gate count and total global rotation amount.

Abstract

Despite major ongoing advancements in neutral atom hardware technology, there remains limited work in systems-level software tailored to overcoming the challenges of neutral atom quantum computers. In particular, most current neutral atom architectures do not natively support local addressing of single-qubit rotations about an axis in the xy-plane of the Bloch sphere. Instead, these are executed via global beams applied simultaneously to all qubits. While previous neutral atom experimental work has used straightforward synthesis methods to convert short sequences of operations into this native gate set, these methods cannot be incorporated into a systems-level framework nor applied to entire circuits without imposing impractical amounts of serialization. Without sufficient compiler optimizations, decompositions involving global gates will significantly increase circuit depth, gate count, and accumulation of errors. No prior compiler work has addressed this, and adapting existing compilers to solve this problem is nontrivial. In this paper, we present an optimized compiler pipeline that translates an input circuit from an arbitrary gate set into a realistic neutral atom native gate set containing global gates. We focus on decomposition and scheduling passes that minimize the final circuit's global gate count and total global rotation amount. As we show, these costs contribute the most to the circuit's duration and overall error, relative to costs incurred by other gate types. Compared to the unoptimized version of our compiler pipeline, minimizing global gate costs gives up to 4.77x speedup in circuit duration. Compared to the closest prior existing work, we achieve up to 53.8x speedup. For large circuits, we observe a few orders of magnitude improvement in circuit fidelities.

Paper Structure (31 sections, 19 equations, 9 figures, 1 algorithm)

Figures (9)

  • Figure 1: Overview of our compiler pipeline. In this work, we focus mainly on optimizing the steps to schedule the circuit (Sec. \ref{section: scheduling}) and decompose into the NeutralAtomGateSet containing gates $\set{\texttt{GR},\texttt{Rz},\texttt{CZ},\texttt{CCZ}}$ (Sec. \ref{section: decomposition}).
  • Figure 2: Two decompositions of a local gate that rotates qubit $j$ about the Y axis by an angle $\theta$. Red and blue arrows track the trajectories of states initially pointing along the $+z$ and $+y$ axes of the Bloch sphere, respectively, corresponding to the initial states $\ket{0}$ and $\ket{0}+i\ket{1}$. (a) Axial decomposition of a local Ry gate, which uses global pulses to swap the Y and Z axes and imprints the rotation angle $\theta$ onto qubit $j$ with a local (axial) Rz gate. This involves a net global rotation of $\pi$. (b) Transverse decomposition of a local Ry gate, which uses global pulses to swap the $\texttt{V}_{\theta/2}$ and Z axes, thereby imprinting the rotation angle $\theta$ onto qubit $j$ with global (transverse) GR gates. This involves a net global rotation of $\lvert\theta\rvert\le\pi$.
  • Figure 3: A moment of U3 gates decomposed into the NeutralAtomGateSet using (a) Axial decomposition and (b) Transverse decomposition. Here $\texttt{GR}^\pm(\theta_{\mathrm{m}},\eta)=\texttt{GR}(\pm\theta_{\mathrm{m}}/2,\pi/2+\eta)$ and $\theta_{\mathrm{m}}=\pm\max_j\lvert\theta_j\rvert$ for shorthand. The global rotations in both decompositions cancel on qubit $q_1$, which has no gate acting on it in the original moment.
  • Figure 4: Motivation behind the Sifting scheduling algorithm. (a) A 4-qubit GHZ circuit before any compiler steps are applied. (b) The same circuit and its corresponding DAG when converted to the IntermediateGateSet, where H$=$U3$(\pi/2,0,\pi)$, and scheduled using an As-Soon-As-Possible approach. Grey rectangles show which gates are scheduled into the same SQGM. When decomposed into the NeutralAtomGateSet, this schedule will require 8 GR gates in total, two for each SQGM. (c) The schedule produced with Sifting. Now, only 4 GR gates are required in the final circuit. $V_p^{(i)}$ and $V_c^{(i)}$ are the $V_{passed}$ and $V_{caught}$ sets returned by the $i$th iteration of Sift, with $V_p^{(0)}=\emptyset$ here.
  • Figure 5: Motivation behind the $\theta$-Opt algorithm. We show an example section of a circuit when (a) scheduled using Sifting and (b) scheduled using $\theta$-Opt, where the number shown in each single-qubit gate specifies the $\theta$ parameter. Both will require 4 GR gates when the Transverse decomposition is applied, shown in (c). However, the schedule in (a) results in a GR rotation amount of $\frac{\pi}{2}+\frac{3\pi}{8}=\frac{7\pi}{8}$, while the schedule in (b) results in $\frac{\pi}{8}+\frac{\pi}{2}=\frac{5\pi}{8}$. In (c), GR angles for (a) and (b) are given in red and blue, respectively. Rz gates outlined in red or blue dashed lines indicate those which appear in the decomposition of only (a) or only (b), respectively, while all other gates appear in both. Here, the last column of Rz gates from moment M1 is commuted past the CZ gates and combined with the first column of Rz gates from M2, as described in Sec. \ref{section: post-processing}.
  • ...and 4 more figures
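
The Axial decomposition described in the Figure 2 caption can be checked numerically with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: it assumes GR(θ, φ) rotates every qubit by θ about the axis cos(φ) X + sin(φ) Y, and that the pair of global pulses is GR(π/2, 0) followed by GR(−π/2, 0). The paper's exact axis and sign conventions may differ. The helper names (`rot_xy`, `rz`, `axial_ry`, `close`) are hypothetical. Because every gate involved is a tensor product of single-qubit operations, the sketch tracks one 2x2 matrix per qubit instead of building the full multi-qubit unitary.

```python
import cmath
import math

IDENTITY = [[1, 0], [0, 1]]


def matmul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rot_xy(theta, phi):
    """Rotation by theta about the axis cos(phi) X + sin(phi) Y (assumed GR convention)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * cmath.exp(-1j * phi) * s],
            [-1j * cmath.exp(1j * phi) * s, c]]


def rz(theta):
    """Local rotation by theta about the Z axis."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def axial_ry(theta, target, n_qubits):
    """Net per-qubit unitary of GR(pi/2, 0), then Rz(theta) on `target`, then GR(-pi/2, 0)."""
    first = rot_xy(math.pi / 2, 0.0)    # global pulse: acts on every qubit
    last = rot_xy(-math.pi / 2, 0.0)    # inverse global pulse: acts on every qubit
    net = []
    for q in range(n_qubits):
        middle = rz(theta) if q == target else IDENTITY
        # Matrix order is right-to-left: `first` is applied first in time.
        net.append(matmul(last, matmul(middle, first)))
    return net


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


theta = 0.7
net = axial_ry(theta, target=1, n_qubits=3)
ry_theta = rot_xy(theta, math.pi / 2)  # rotation about Y by theta
print(close(net[1], ry_theta))   # target qubit receives Ry(theta)
print(close(net[0], IDENTITY))   # global pulses cancel on untouched qubits
```

Under these assumptions the two global pulses contribute |π/2| + |−π/2| = π of global rotation regardless of θ, which matches the net global rotation of π quoted for the Axial decomposition. The Transverse decomposition in the same caption instead costs |θ| ≤ π.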