Benchmarking Distributed Quantum Computing Emulators

Guillermo Díaz-Camacho, Iago F. Llovo, F. Javier Cardama, Irais Bautista, Daniel Faílde, Mariamo Mussa Juane, Jorge Vázquez-Pérez, Natalia Costas, Tomás F. Pena, Andrés Gómez

TL;DR

Distributed quantum computing (DQC) aims to connect multiple smaller quantum processors to form a scalable system. The authors propose a benchmarking framework that uses a distributed inverse Quantum Fourier Transform ($\mathrm{QFT}^{\dagger}$), implemented via gate teleportation, to evaluate emulators on execution time, memory, and fidelity relative to a monolithic baseline. They benchmark four emulators (Qiskit Aer, SquidASM, Interlin-q, SQUANCH), revealing a trade-off between architectural fidelity and scalability: discrete-event, network-aware platforms (e.g., SquidASM) offer better protocol realism but come with higher resource demands and licensing constraints, while monolithic circuit simulators (Qiskit Aer) run faster but lack native DQC features. The study identifies gaps in current tooling, highlights the potential of optimization strategies such as cat-entangler/disentangler to reduce inter-node teleportations, and provides a framework extendable to additional algorithms and emulators for validating distributed quantum protocols.

Abstract

Scalable quantum computing requires architectural solutions beyond monolithic processors. Distributed quantum computing (DQC) addresses this challenge by interconnecting smaller quantum nodes through quantum communication protocols, enabling collaborative computation. While several experimental and theoretical proposals for DQC exist, emulator platforms are essential tools for exploring their feasibility under realistic conditions. In this work, we introduce a benchmarking framework to evaluate DQC emulators using a distributed implementation of the inverse Quantum Fourier Transform ($\mathrm{QFT}^{\dagger}$) as a representative test case, which enables efficient phase recovery from pre-encoded Fourier states. The QFT is partitioned across nodes using teleportation-based protocols, and performance is analyzed in terms of execution time, memory usage, and fidelity with respect to a monolithic baseline. As part of this work, we review a broad range of emulators, identifying their capabilities and limitations for programming distributed quantum algorithms. Many platforms either lacked support for teleportation protocols or required complex workarounds. Consequently, we select and benchmark four representative emulators: Qiskit Aer, SquidASM, Interlin-q, and SQUANCH. They differ significantly in their support for discrete-event simulation, quantum networking, noise modeling, and parallel execution. Our results highlight the trade-offs between architectural fidelity and simulation scalability, providing a foundation for future emulator development and the validation of distributed quantum protocols. This framework can be extended to support additional algorithms and emulators.
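Phase recovery from a pre-encoded Fourier state, together with the classical handling of REV described in the Figure 1 caption, can be checked with a small dense statevector simulation. The sketch below is plain NumPy, not one of the benchmarked emulators, and rests on assumptions of our own: qubit 0 is the most significant bit, $\mathrm{P}(\phi) = \mathrm{diag}(1, e^{i\phi})$, and the encoded phase is $\theta = j/2^n$ for an integer $j$, so the outcome is deterministic. The helper names (`fourier_state`, `inverse_qft_no_rev`, `recover_phase`) are hypothetical. The inverse QFT is applied without its swap layer, so the measured bit string comes out reversed and REV reduces to a string reversal.

```python
import numpy as np

# Standard single-qubit gates; P(phi) = diag(1, e^{i*phi}) (assumed convention).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def phase(phi):
    return np.array([[1, 0], [0, np.exp(1j * phi)]], dtype=complex)


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored with shape (2,)*n."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)


def apply_cp(state, phi, a, b):
    """Controlled phase: multiply amplitudes where qubits a and b are both 1."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[a] = 1
    idx[b] = 1
    state[tuple(idx)] *= np.exp(1j * phi)
    return state


def fourier_state(n, theta):
    """Hypothetical F(theta): H, then P(2*pi*theta*2^(n-1-q)) on each qubit q.

    Assumption: qubit 0 is the most significant bit of the flattened index.
    """
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = 1.0
    for q in range(n):
        state = apply_1q(state, H, q)
        state = apply_1q(state, phase(2 * np.pi * theta * 2 ** (n - 1 - q)), q)
    return state


def inverse_qft_no_rev(state):
    """Inverse QFT from H and conjugated controlled phases, with no final swaps."""
    n = state.ndim
    for a in range(n):
        for b in range(a):
            state = apply_cp(state, -2 * np.pi / 2 ** (a - b + 1), b, a)
        state = apply_1q(state, H, a)
    return state


def recover_phase(n, j):
    """Encode theta = j / 2^n, run the swap-free inverse QFT, undo REV classically."""
    out = inverse_qft_no_rev(fourier_state(n, j / 2 ** n))
    probs = np.abs(out.reshape(-1)) ** 2
    k = int(np.argmax(probs))
    measured = format(k, f"0{n}b")          # measured bit string, qubit 0 first
    return int(measured[::-1], 2), float(probs[k])  # REV as a string reversal
```

For example, `recover_phase(3, 1)` measures `100` (qubit 0 first), reverses it to `001`, and returns $j = 1$ with probability 1 up to rounding. The state array holds $2^n$ complex amplitudes, so its memory doubles with every added qubit, a reminder of why memory is one of the benchmark metrics.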

Paper Structure

This paper contains 37 sections, 1 equation, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Inverse QFT on 6 qubits (light red) with initial Fourier state preparation (light green). The state preparation consists of a Hadamard gate on each qubit followed by a sequence of phase gates. The inverse QFT circuit applies Hadamard and controlled phase gates, where $P_k^{\dagger} = \mathrm{P}(-2\pi/2^k) = \mathrm{diag}(1, e^{-2\pi i / 2^k})$. The reverse operator $\mathrm{REV}$ (the standard swap gates at the end of the inverse QFT) can be realized by classical post-processing of the measured bit strings, which considerably reduces the entanglement cost [Chen2023qft].
  • Figure 2: By reordering commuting operations from Figure 1, the $\mathrm{QFT}^{\dagger}$ can be rewritten as smaller, local $\mathrm{QFT}^{\dagger}$ blocks (light red) and a block of controlled phase gradients (light blue). For a bipartite $\mathrm{QFT}^{\dagger}$ of $n$ qubits, $\frac{n}{2}$ EPR pairs are needed to teleport the phase gradients. $\overline{\mathrm{QFT}}^{\dagger}$ denotes the usual inverse $\mathrm{QFT}$ without the reverse operator. The local $\mathrm{REV}$ operators can be moved to the end of the circuit by reversing the controls of the phase gradients, so that they can be applied classically in post-processing.
  • Figure 3: Generalization of the circuit in Figure 2 to an arbitrary number of nodes $k$, each with $m$ qubits, so that the whole computation involves $n = km$ qubits. Here $F(\theta)$ is the (local) preparation of the Fourier state, $\mathrm{QFT}^{\dagger}_{m}$ is the (local) $m$-qubit inverse QFT, and $CP_{a,b}^{\dagger}$ is the block of controlled phase gradients with $a$ control qubits and $b$ target qubits. Because the $\mathrm{REV}$ blocks are pushed to the end of the circuit as classical post-processing, the order of the controls in each controlled phase gradient block must be reversed. Each phase gradient block can be executed remotely via telegates (gate teleportation). The total number of EPR pairs needed becomes $m\binom{k}{2}=\frac{mk(k-1)}{2}$. If each individual $CP$ rotation were instead executed remotely, this number would grow to $\binom{mk}{2}$, that is, quadratic not only in the number of nodes but also in the number of qubits per node. Assuming full parallelization of quantum operations, the total number of computing blocks is $2k-1$. Each block of phase gradients involves $m$ layers of phase gates, and each $\mathrm{QFT}^{\dagger}_{m}$ involves $2m-1$ layers of gates. While one node executes its $\mathrm{QFT}^{\dagger}_{m}$, the other nodes execute phase gradients, keeping idle time minimal. Each qubit can be measured as soon as its gates have finished.
  • Figure 4: Execution time and peak memory usage as a function of the number of qubits, for different numbers of nodes and emulators. Each emulator is represented by a colour gradient: the darkest curve corresponds to single-node (monolithic) execution, while progressively lighter shades indicate increasing node counts.
  • Figure 5: Classical fidelity of noiseless execution as a function of the number of qubits, for different numbers of nodes and emulators. As in Figure 4, lighter shades of the same colour indicate increasing node counts.
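The resource counts in the captions of Figures 2 and 3 can be tabulated directly. The helper below, `dqc_resources`, is hypothetical. It encodes the caption formulas ($m\binom{k}{2}$ EPR pairs with telegates, $\binom{mk}{2}$ if every CP rotation used its own EPR pair, $2k-1$ computing blocks) plus two counts that are our own reading rather than the paper's: one phase-gradient block per pair of nodes, and the split of the $\binom{mk}{2}$ CP rotations into inter-node and intra-node pairs.

```python
from math import comb


def dqc_resources(k, m):
    """Hypothetical helper: resource counts for the k-node inverse QFT of Figure 3."""
    n = k * m
    return {
        "n_qubits": n,
        # Our reading of Figure 3: one CP-gradient block per pair of nodes.
        "phase_gradient_blocks": comb(k, 2),
        # Total EPR pairs stated in Figure 3: m * C(k, 2) = m*k*(k-1)/2.
        "epr_pairs_telegate": m * comb(k, 2),
        # Figure 3's count if every individual CP rotation used its own EPR pair.
        "epr_pairs_per_gate": comb(n, 2),
        # Our own split of those CP rotations: qubits on different nodes vs. same node.
        "inter_node_cp": m * m * comb(k, 2),
        "intra_node_cp": k * comb(m, 2),
        # Depth bookkeeping from Figure 3 (full parallelization assumed).
        "computing_blocks": 2 * k - 1,
        "layers_per_local_iqft": 2 * m - 1,
        "layers_per_phase_gradient_block": m,
    }


for k, m in [(2, 3), (4, 5), (8, 4)]:
    r = dqc_resources(k, m)
    print(f"k={k} m={m} n={r['n_qubits']}: "
          f"telegate EPRs={r['epr_pairs_telegate']}, "
          f"per-gate EPRs={r['epr_pairs_per_gate']}")
```

For $k=2$ the telegate count reduces to $n/2$, matching Figure 2. For $k=4$, $m=5$ ($n=20$) it gives 30 EPR pairs against 190 for per-gate teleportation, of which 150 rotations actually couple qubits on different nodes.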
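Figure 5 reports classical fidelity with respect to the monolithic baseline, but the exact definition is not given here. A common choice, assumed in this sketch, is the squared Bhattacharyya coefficient of the two output distributions, $F(p,q) = \left(\sum_x \sqrt{p_x q_x}\right)^2$, which is what Qiskit's `qiskit.quantum_info.hellinger_fidelity` returns for two counts dictionaries. The function below is a dependency-free version of that formula; `classical_fidelity` and the example histograms are hypothetical.

```python
from math import sqrt


def classical_fidelity(counts_p, counts_q):
    """Assumed metric: F = (sum_x sqrt(p_x * q_x))**2 over two shot histograms.

    counts_p, counts_q: dicts mapping measured bit strings to shot counts.
    """
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    # Outcomes missing from either histogram contribute zero overlap.
    common = counts_p.keys() & counts_q.keys()
    overlap = sum(sqrt(counts_p[x] / total_p * counts_q[x] / total_q) for x in common)
    return overlap ** 2


# Hypothetical histograms: a noiseless monolithic baseline and a distributed run.
monolithic = {"001": 1024}
distributed = {"001": 1000, "101": 24}
print(classical_fidelity(monolithic, distributed))
```

With these made-up counts the fidelity is approximately 0.977; identical histograms give 1 and disjoint ones give 0.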