COMPAS: A Distributed Multi-Party SWAP Test for Parallel Quantum Algorithms

Brayden Goldstein-Gelb, Kun Liu, John M. Martyn, Hengyun Zhou, Yongshan Ding, Yuan Liu

TL;DR

COMPAS delivers a co-designed hardware/software approach for distributed quantum computation by implementing a constant-depth, Bell-pair–assisted multivariate trace estimation via a distributed multi-party SWAP test. It provides two two-party CSWAP realizations (telegate and teledata) and a parallel Toffoli via Fanout to preserve $O(1)$ depth while scaling Bell-pair usage as $O(nk)$. The work includes detailed circuit- and network-level error analyses and demonstrates broad applicability to Rényi entropy estimation, entanglement spectroscopy, virtual cooling/distillation, and distributed QSP, highlighting practical prospects for near-term distributed quantum hardware. It also discusses open challenges such as error correction, network topology, and Bell-pair distillation overhead, outlining a path toward scalable, architecture-aware distributed quantum algorithms. Overall, COMPAS demonstrates a viable route to performing sophisticated quantum primitives in distributed settings with controlled resource costs and fidelity losses.

Abstract

The limited number of qubits per chip remains a critical bottleneck in quantum computing, motivating the use of distributed architectures that interconnect multiple quantum processing units (QPUs). However, executing quantum algorithms across distributed systems requires careful co-design of algorithmic primitives and hardware architectures to manage circuit depth and entanglement overhead. We identify multivariate trace estimation as a key subroutine that is naturally suited for distribution and broadly useful in tasks such as estimating Rényi entropies, virtual cooling and distillation, and certain applications of quantum signal processing. In this work, we introduce COMPAS, an architecture that realizes multivariate trace estimation across a multi-party network of interconnected modular and distributed QPUs by leveraging pre-shared entangled Bell pairs as resources. COMPAS adds only a constant depth overhead and consumes Bell pairs at a rate linear in circuit width, making it suitable for near-term hardware. Unlike other schemes, which must choose between asymptotic optimality in circuit depth and asymptotic optimality in GHZ width, COMPAS achieves both at once. Additionally, we analyze network-level errors and simulate the effects of circuit-level noise on the architecture.
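
As a minimal numerical sketch of what multivariate trace estimation computes, the snippet below checks the standard identity $\mathrm{Tr}(\rho_1 \cdots \rho_k) = \mathrm{Tr}[W (\rho_1 \otimes \cdots \otimes \rho_k)]$, where $W$ is a cyclic-shift permutation of the $k$ registers. This is the quantity a multi-party SWAP test estimates. It is a classical NumPy illustration only, not the COMPAS circuit, and the helper names (`random_density_matrix`, `trace_via_cyclic_shift`, `trace_direct`) are hypothetical.

```python
from functools import reduce

import numpy as np


def random_density_matrix(dim, rng):
    """Return a random dim x dim density matrix (Hermitian, PSD, unit trace)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho)


def trace_via_cyclic_shift(rhos):
    """Compute Tr[W (rho_1 (x) ... (x) rho_k)] for a cyclic shift W.

    The shift is applied implicitly by permuting the column indices of the
    tensor product, so no explicit permutation matrix is built.
    """
    k = len(rhos)
    d = rhos[0].shape[0]
    joint = reduce(np.kron, rhos)          # (d^k) x (d^k) product state
    t = joint.reshape((d,) * (2 * k))      # axes: k row indices, then k column indices
    # Cyclically shift the column axes: (b_k, b_1, ..., b_{k-1}).
    col_axes = [k + (k - 1)] + [k + j for j in range(k - 1)]
    shifted = t.transpose(list(range(k)) + col_axes)
    return np.trace(shifted.reshape(d**k, d**k))


def trace_direct(rhos):
    """Compute Tr(rho_1 rho_2 ... rho_k) by ordinary matrix multiplication."""
    return np.trace(reduce(np.matmul, rhos))


rng = np.random.default_rng(0)
rhos = [random_density_matrix(2, rng) for _ in range(3)]  # k = 3 single-qubit states
lhs = trace_via_cyclic_shift(rhos)
rhs = trace_direct(rhos)
print(np.isclose(lhs, rhs))
```

For $k=2$ with both inputs equal to the same state $\rho$, the same function returns the purity $\mathrm{Tr}(\rho^2)$, which is the quantity underlying the order-2 Rényi entropy mentioned in the abstract.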

Paper Structure

This paper contains 29 sections, 11 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Methods of distributed quantum computing, from caleffi_2024_distributed and ferrari_2021_compiler. (a) Teleports the state $\ket{\varphi}$ between parties, whereas (b) applies a CNOT gate with the control located on one QPU and the target residing on the other.
  • Figure 1: Cost per QPU for the telegate scheme (Sec. \ref{sec:telegate}) using 4 Fanout gates (Fig. \ref{fig:parallel_toffoli}c). Because the cyclic shift in multivariate trace estimation uses two rounds of cSWAP gates, steps (b1-b4) must be repeated. Ancillas can be reused across different rounds or steps.
  • Figure 2: Comparing different implementations of the $k$-party SWAP test, where $n$ is the width of $\rho$. (a) Example with $n=1,k=8$. The GHZ width and circuit depth are $\lceil k/2 \rceil$ and $2$, respectively. For a $k$-party SWAP test on $n$-qubit states, quek_2024 propose either (b) increasing the depth while keeping the GHZ width at $\lceil k/2 \rceil$, or (c) increasing the GHZ width to $\lceil k/2\rceil n$ while keeping the depth constant at $2$. (d) This work: we preserve a GHZ width of $\lceil k/2\rceil$ and a constant circuit depth by applying parallel Toffoli via Fanout (Sec. \ref{sec:parallel_toffoli}).
  • Figure 3: A depiction of the naive distribution approach for a $k=3$-party SWAP test for $n=2$-qubit states. (a) Execution on a single QPU. (b) Naive distributed implementation, where each state $\rho_i$ is partitioned into “slices” and each QPU is responsible for one slice. For a fixed $j$, all $\rho_i^{(j)}$ for $i \in [k]$ are sent to the same QPU, and the SWAP test is performed independently on each slice. In this example, QPU 1 is responsible for the first qubit of each state, QPU 2 the second, and QPU 3 (not shown) remains idle. (c) Worst-case distribution of states on a line topology, where QPU 1 resides at one endpoint and must send its qubits to the other.
  • Figure 4: Preparation of an $r$-party GHZ state in constant depth using gate teleportation, adapted from quek_2024.
  • ...and 6 more figures