Efficient algorithms for quantum chemistry on modular quantum processors

Tian Xue, Jacob P. Covey, Matthew Otten

TL;DR

This work addresses the resource challenge of quantum chemistry on quantum computers by proposing dUSCC, a distributed implementation of the unitary selective coupled cluster algorithm tailored for modular quantum processors. It leverages the pseudo-commutativity of Trotterized terms to aggressively pack circuit tiles and hides inter-module communication behind intra-module calculations, tolerating inter-module latency up to about $35\times$ slower than intra-module gates while preserving chemical accuracy ($1.6\ \mathrm{mHa}$) in a representative $(\mathrm{H}_4)_3$ system. The study identifies a 'free modularization' region in weakly entangled regimes, provides classical criteria to predict amenable systems, and demonstrates substantial speedups (up to $\sim 6\times$) over naive compilation, with benefits scaling with the number of modules and applicability to other molecules (stilbene, polyene). These results suggest a practical pathway to useful distributed quantum chemistry on near- and mid-term modular quantum hardware, with broader applicability to other Trotterized algorithms.

Abstract

Quantum chemistry is a promising application of future quantum computers, but the requirements on qubit count and other resources suggest that modular computing architectures will be required. We introduce an implementation of a quantum chemistry algorithm that is distributed across several computational modules: the distributed unitary selective coupled cluster (dUSCC). We design a packing scheme using the pseudo-commutativity of Trotterization to maximize the parallelism while optimizing the scheduling of all inter-module gates around the buffering of inter-module Bell pairs. We demonstrate dUSCC on a 3-cluster (H$_4$)$_3$ chain and show that it naturally utilizes the molecule's structure to reduce inter-module latency. We show that the run time of dUSCC is unchanged with inter-module latency up to $\sim$20$\times$ slower than intra-module gates in the (H$_4$)$_3$ chain while maintaining chemical accuracy. dUSCC should be "free" in weakly entangled systems, and the existence of "free" dUSCC can be determined efficiently using classical algorithms. This new compilation scheme both leverages pseudo-commutativity and considers inter-module gate scheduling, and potentially provides an efficient distributed compilation of other Trotterized algorithms.

Paper Structure

This paper contains 13 sections, 11 equations, 13 figures, 1 algorithm.

Figures (13)

  • Figure 1: Overview of the modular architecture. (a) The dUSCC circuit of the $(\text{H}_4)_3$ chain with inter-cluster separations three times the intra-cluster separations: $d = 3d_0$. The dUSCC circuit is distributed to three interconnected QPUs. Classical ansatzes are prepared on each QPU followed by the VQE with inter-module dUSCC ansatz. (b) The number of inter-module CNOTs ($N_{inter}$) of dUSCC of the $(\text{H}_4)_3$ chain with $\epsilon = 10^{-3}$ at different positions of the seam between modules. $N_{inter}$ reaches its minimum where the entanglement between H$_2$ is weakest. (c) The circuit time ratio of the delayed dUSCC ($t$) and undelayed dUSCC ($t_0$) at different cluster separations $d/d_0$ and different selection accuracies $\epsilon$. The dUSCC circuit is delayed due to the buffering of Bell pairs. There is a clear boundary between the "strongly" entangled phase (yellow) and the "weakly" entangled phase (purple) depending on the entanglement between modules, and the best trade-off between accuracy and circuit time is at the phase boundary.
  • Figure 2: Optimization of the modular circuit. (a) The dUSCC ansatz can be decomposed into multiple inter-module ansatzes. (b) dUSCC ansatzes are mapped onto the circuit by the JW transformation and Trotterization, then simplified by ZX calculus. All terms in the dUSCC are compiled to pseudo-commutative circuit tiles. For illustration, single hopping tiles (tiles indexed by $t_s = t_{ij}$) are compiled to light red tiles, double hopping tiles (tiles indexed by $t_d = t_{ijkm}$) are compiled to dark red tiles, and the controlled hopping tiles (tiles indexed by $t_c = t_{inm}$) are compiled to blue tiles. Controlled hoppings are special double hoppings with $j=k\equiv n$. The exact circuits with ZX simplification are shown in Fig. \ref{fig:zx proof}. (c) The naive compilation of the USCC ansatz, without consideration of inter-module latency or the geometry of the molecule. (d) In the dUSCC, all circuit tiles are first sorted by their heights. (e) Using the pseudo-commutativity of circuit tiles, the circuit is packed to maximize the intra-module parallelism. All inter-module tiles are masked with additional width to simulate the inter-module latency. Intra-module tiles can fit into the mask because the buffering of inter-module Bell pairs runs in parallel with the intra-module operations. The mask is set as $\tau_m \geq \tau$ to ensure there exists a tile that fits the mask. (f) The mask of inter-module tiles is shrunk to $2\tau$ to simulate the exact buffering time required for each tile. (g) Tiles are packed again with the shrunk mask to remove the additional cost introduced in the first pack by maximizing the parallelism between inter- and intra-module tiles. In systems with sparse inter-module entanglement, all inter-module communications run in parallel with intra-module operations.
  • Figure 3: The abstract circuit of the 3-module $(\text{H}_4)_3$ chain with $d = 3d_0$ at different costs of inter-module gates. The effective circuit depth ($d_c$) represents the circuit time. The width of the figures also represents the effective circuit depth, with calibrations at the bottom-left corner. The effective circuit depth only counts the depth of CNOTs and single-qubit rotations at arbitrary phases (assumed to be 10$\times$ as expensive as a CNOT). (a) The dUSCC circuit packing of the $(\text{H}_4)_3$ chain at $\epsilon_0 = 10^{-3}$. The dUSCC circuit is first compiled to tiles and packed by the double packing algorithm to maximize the parallelism between Bell pair buffering and intra-module operations. At $\tau = 4,15$, inter-module tiles are separated to buffer Bell pairs for the next inter-module tiles. (b) The selection criterion is then lowered to $0.5\epsilon_0$. The inter-module entanglement becomes stronger, but the inter-module communications can still be buffered behind intra-module operations. At $\tau = 15$, inter-module tiles are evenly distributed over space to optimize the parallelism. The modularization of dUSCC is "free" in all plots in (a) and (b). (c) The selection criterion is lowered to $0.05\epsilon_0$ and the $(\text{H}_4)_3$ chain becomes extensively and strongly entangled. The number of inter-module tiles exceeds all available parallel space of intra-module tiles, so there is an inter-module gate tail at the end of the circuit. Increasing $\tau$ expands the tail linearly, and the circuit can no longer be optimized by any packing scheme.
  • Figure 4: Results. All results are from the dUSCC simulation of the $(\text{H}_4)_3$ chain at $d = 3d_0$. (a) dUSCC circuit time compared to the no-latency case, $t/t_0$, with different inter-module gate costs $\tau$ and different selection criteria $\epsilon$. The dUSCC circuit time increases linearly with the inter-module gate costs. The slope depends on the selection criterion $\epsilon$, and the modularization of the USCC circuit is "free" below the threshold $\tau$ (red box) in weakly entangled systems. The dashed pink line marks the boundary of the region in which "free" modularization of dUSCC exists. The "free" modularization only exists on the right of the dashed line, while the modularization always delays the circuit on the left of the dashed line. (b) The hopping diagram of $(\text{H}_4)_3$ over the selection criteria. There also exists a boundary between inter-module hoppings and intra-module hoppings, and this boundary captures the "free" modularization threshold in (a). One can calculate the hopping diagram in $O(\text{poly}(N_{orbital}))$ time, so a classical algorithm can efficiently determine whether "free" modularization exists at a given selection criterion $\epsilon$. (c) The phase diagram of the delayed dUSCC time. The blue line captures the transition from the "free" dUSCC with $t/t_0 < 1.1$ to the delayed dUSCC. The existence of the "free" modularization is also consistently captured by the pink line $\epsilon = 10^{-4.2}$. (d) The time ratio between the dUSCC compilation and the Qiskit compilation at. Our compilation shows its greatest advantage at the blue boundary in (c).
  • Figure S1: The JW mapping. (a) Each dashed CNOT represents a ladder of CNOTs. (b) The single hopping can be mapped to two conjugations of ladders of CNOTs, where the parameters are $M_1 = Y, M_2 = H, t = t_m$ and $M_1 = H, M_2 = HS, t = -t_m$ where $Y = HS^\dagger$. (c) The controlled hopping is mapped to four conjugations of ladders of CNOTs. Only the case $i < j < k$ is shown here; the circuit differs for other index orderings. (d) The double hopping can be mapped to eight conjugations of ladders of CNOTs.
  • ...and 8 more figures
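
The latency-hiding idea in the Figure 2 and Figure 3 captions can be illustrated with a toy scheduler. The following is a minimal sketch under stated assumptions, not the authors' double packing algorithm: the `Tile` class, the `pack` function, the integer time steps, and the rule that each inter-module link buffers one Bell pair at a time (taking `tau` gate times per pair) are all hypothetical simplifications introduced here. Tiles are treated as freely reorderable, standing in for pseudo-commutativity, and are tried tallest first as in panel (d).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tile:
    lo: int                     # first qubit the tile touches
    hi: int                     # last qubit the tile touches (inclusive)
    width: int                  # duration in units of one intra-module gate
    link: Optional[int] = None  # inter-module link consumed, None if intra-module


def pack(tiles, n_qubits, tau):
    """Greedy time-stepped packing; returns the total circuit time (makespan)."""
    qubit_free = [0] * n_qubits  # time at which each qubit becomes idle
    # Assumption: each link holds one Bell pair; the first is ready at time tau.
    pair_ready = {t.link: tau for t in tiles if t.link is not None}
    # Tiles are assumed to pseudo-commute, so any order is allowed: tallest first.
    pending = sorted(tiles, key=lambda t: t.hi - t.lo + 1, reverse=True)
    now, makespan = 0, 0
    while pending:
        waiting = []
        for t in pending:
            span = range(t.lo, t.hi + 1)
            idle = all(qubit_free[q] <= now for q in span)
            linked = t.link is None or pair_ready[t.link] <= now
            if idle and linked:
                for q in span:
                    qubit_free[q] = now + t.width
                if t.link is not None:
                    # Pair consumed; buffering of the next one starts now.
                    pair_ready[t.link] = now + tau
                makespan = max(makespan, now + t.width)
            else:
                waiting.append(t)
        pending = waiting
        now += 1
    return makespan


# Two 2-qubit modules (qubits 0-1 and 2-3) joined by link 0 between qubits 1 and 2.
tiles = (
    [Tile(1, 2, 4, link=0)] * 2   # inter-module tiles
    + [Tile(0, 1, 4)] * 3         # intra-module tiles, module A
    + [Tile(2, 3, 4)] * 3         # intra-module tiles, module B
)
t0 = pack(tiles, n_qubits=4, tau=0)
for tau in (0, 4, 8, 12):
    print(tau, pack(tiles, n_qubits=4, tau=tau) / t0)
```

In this toy instance the ratio $t/t_0$ stays at 1.0 for $\tau \le 8$, because Bell-pair buffering overlaps with intra-module tiles, and rises to 1.4 at $\tau = 12$, once the inter-module tiles run out of intra-module work to hide behind. This qualitatively mirrors the "free" region followed by a tail that grows with $\tau$; the numbers themselves carry no physical meaning.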