QEMI: A Quantum Software Stacks Testing Framework via Equivalence Modulo Inputs
Junjie Luo, Shangzhou Xia, Fuyuan Zhang, Jianjun Zhao
TL;DR
This paper tackles the oracle problem in testing quantum software stacks: there is no ground truth for what a quantum program should output. It introduces QEMI, an EMI-inspired framework that statically inserts dead-code patterns into randomly generated quantum programs and creates EMI variants by removing the dead code. It then compares the behavior of a program and its variants via the distribution distance $H(P,Q)$ with a threshold $\delta=0.1$, using an adaptive early-stopping strategy. The approach combines a quantum random program generator, static EMI variant construction, and distribution-based behavior checking. Validated on Qiskit, Q#, and Cirq, it uncovered 12 real-world bugs (11 crashes and 1 distribution mismatch), several of which have been fixed or confirmed by developers. Compared with baselines, QEMI achieves higher code coverage and surfaces bugs that differential and metamorphic testing methods miss, while the early-stop mechanism substantially reduces measurement cost (up to $53.83\%$ for $n=6$ and $72.21\%$ for $n=8$). Overall, QEMI expands the testing toolkit for quantum software stacks by integrating semantics-preserving mutations into quantum program analysis, with practical value for building more reliable quantum runtimes and compilers.
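The distribution check described above can be illustrated with a minimal Python sketch. It assumes that $H(P,Q)$ denotes the Hellinger distance, which this summary does not state, and that a pair is flagged when the distance exceeds $\delta$. The function names and the count-dictionary input format are hypothetical and are not taken from QEMI.

```python
import math

DELTA = 0.1  # threshold from the summary; the exact comparison rule is an assumption


def hellinger_distance(counts_p, counts_q):
    """Hellinger distance between two empirical distributions.

    Each argument maps a measured bitstring to its shot count.
    ASSUMPTION: H(P, Q) in the summary is the Hellinger distance.
    """
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    outcomes = set(counts_p) | set(counts_q)
    acc = 0.0
    for outcome in outcomes:
        p = counts_p.get(outcome, 0) / total_p
        q = counts_q.get(outcome, 0) / total_q
        acc += (math.sqrt(p) - math.sqrt(q)) ** 2
    return math.sqrt(acc / 2.0)


def is_consistent(counts_p, counts_q, delta=DELTA):
    """Return True when the two distributions are within the threshold."""
    return hellinger_distance(counts_p, counts_q) <= delta


# Hypothetical measurement counts for a program and one of its variants.
original = {"00": 500, "11": 500}
variant = {"00": 510, "11": 490}
print(is_consistent(original, variant))
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support, so a small threshold such as 0.1 tolerates sampling noise while still flagging large deviations.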
Abstract
As quantum algorithms and hardware continue to evolve, ensuring the correctness of the quantum software stack (QSS) has become increasingly important. However, testing QSSes remains challenging due to the oracle problem, i.e., the lack of a reliable ground truth for expected program behavior. Existing metamorphic testing approaches often rely on equivalent circuit transformations, backend modifications, or parameter tuning to address this issue. In this work, inspired by Equivalence Modulo Inputs (EMI), we propose Quantum EMI (QEMI), a new testing approach for QSSes. Our key contributions include: (1) a random quantum program generator that produces programs containing dead code based on quantum control-flow structures, and (2) an adaptation of the EMI technique from classical compiler testing that generates variants by removing this dead code. By comparing the behavior of these variants, we can detect potential bugs in QSS implementations. We applied QEMI to Qiskit, Q#, and Cirq, and successfully identified 11 crash bugs and 1 behavioral inconsistency. QEMI expands the limited set of testing techniques available for quantum software stacks by going beyond structural transformations and incorporating semantics-preserving ones into quantum program analysis.
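The EMI idea in the abstract can be sketched on a toy program representation. The sketch below is purely illustrative: the tuple-based instruction format, the `if_c` guard, and the basis-state simulator are hypothetical stand-ins rather than QEMI's generator or any QSS API. It assumes a classical register that stays at 0, so any block guarded by the value 1 is dead code.

```python
def run(program, n_qubits, creg=0):
    """Toy simulator over computational basis states.

    Supports ("x", q), ("cx", control, target), and ("if_c", value, body),
    where body runs only if the classical register equals value.
    ASSUMPTION: the register is never written, so it stays at `creg`.
    """
    bits = [0] * n_qubits

    def execute(instructions):
        for ins in instructions:
            op = ins[0]
            if op == "x":
                bits[ins[1]] ^= 1
            elif op == "cx":
                bits[ins[2]] ^= bits[ins[1]]
            elif op == "if_c":
                if creg == ins[1]:
                    execute(ins[2])
            else:
                raise ValueError(f"unknown instruction: {op}")

    execute(program)
    return "".join(str(b) for b in bits)


def insert_dead_block(program, position, body):
    """Insert a block guarded by a condition that never holds (register == 1)."""
    return program[:position] + [("if_c", 1, body)] + program[position:]


def remove_dead_blocks(program):
    """Build an EMI variant by deleting the never-taken guarded blocks."""
    return [ins for ins in program if not (ins[0] == "if_c" and ins[1] == 1)]


base = [("x", 0), ("cx", 0, 1)]
seeded = insert_dead_block(base, 1, [("x", 1)])
variant = remove_dead_blocks(seeded)

# A correct stack must give both programs the same behavior.
print(run(seeded, 2) == run(variant, 2))
```

In this toy setting the outputs are deterministic, so equality suffices. For real quantum programs the outputs are measurement distributions, which is why the comparison is made with a distribution distance and a threshold instead.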
