Data Verification is the Future of Quantum Computing Copilots

Junhao Song, Ziqian Bi, Xinliang Chia, William Knottenbelt, Yudong Cao

TL;DR

The paper argues that data verification is a minimum requirement for quantum copilots: correctness is binary, and valid designs occupy an exponentially sparse space in which purely statistical learning fails. It defines verification-aware data, a priori constraints, and a verification-first architectural paradigm, and demonstrates through a Cuccaro Adder case that relying on statistics alone leads to infeasibility. Empirical results across 34 LLMs show verified-data models achieve substantially higher accuracy (0.60–0.79) and better calibration than unverified copilots, while formal verification reveals a vast design space with only a tiny fraction valid, highlighting the inefficiency of post-hoc filtering. The work advocates constraint-rich benchmarks and the explicit integration of verification into generation loops, with broad applicability to other physics- and math-constrained scientific domains.

Abstract

Quantum program generation demands a level of precision that may not be compatible with the statistical reasoning carried out in the inference of large language models (LLMs). Hallucinations are mathematically inevitable and not addressable by scaling, which leads to infeasible solutions. We argue that architectures prioritizing verification are necessary for quantum copilots and AI automation in domains governed by constraints. Our position rests on three key points: verified training data enables models to internalize precise constraints as learned structures rather than statistical approximations; verification must constrain generation rather than filter outputs, as valid designs occupy exponentially shrinking subspaces; and domains where physical laws impose correctness criteria require verification embedded as architectural primitives. Early experiments showed LLMs without data verification could only achieve a maximum accuracy of 79% in circuit optimization. Our positions are formulated as quantum computing and AI4Research community imperatives, calling for elevating verification from afterthought to architectural foundation in AI4Research.

Paper Structure

This paper contains 4 sections, 1 equation, 20 figures, and 1 algorithm.

Figures (20)

  • Figure 1: Without data verification, outputs scatter across the exponentially low success probability space (red circle in the top right corner). Even Retrieval-Augmented Generation (RAG) is insufficient to guarantee correctness without it. When data verification is integrated into the training loop, only validated designs from the constrained validation space are used, ensuring 100% correct outputs. The snowflake symbol indicates that weights are frozen; the flame symbol indicates that weights are trainable.
  • Figure 2: The key features of quantum program generation (step a) and compilation (steps b-d), using the Cuccaro Adder as an example. (a) The first level of abstraction where circuit design is at a modular level. (b) The second level where each module is further decomposed into elementary gates (NOT, CNOT and Toffoli). (c) Expanding the modules to reveal the overall circuit design with only the elementary gates, and identifying opportunities for improvements. (d) Applying the improvements (parallelized execution of gates and gate position swap due to commutativity) yields an optimized circuit with lower depth.
  • Figure 3: Average confidence assigned to the correct answer for the top-5 performing models. Higher values indicate better calibration. GPT-OSS models show higher confidence on correct answers compared to Gemma3 models, suggesting better uncertainty quantification. Note: qwen3:1.7b, despite ranking among the top-5 in accuracy, is omitted because it fails to produce valid token-level probabilities (outputting malformed responses that yield zero confidence scores).
  • Figure 4: Average accuracy per LLM. The dashed circle marks the 25% random baseline; verification-aware models clearly exceed it, while unconstrained checkpoints remain inside.
  • Figure 5: Accuracy heatmap across 34 models and seven bit-widths. Dark cells highlight LLMs with specific (quantum) verification knowledge, while lighter bands reveal general-purpose models stuck near random (25%).
  • ...and 15 more figures
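
The decomposition described in Figure 2 (modules expanded into CNOT and Toffoli gates) and the binary notion of correctness can be illustrated with a small sketch. It is not the paper's code. It assumes the standard Cuccaro ripple-carry construction with its usual MAJ and UMA modules (the 2-CNOT form of UMA, so no NOT gates appear), and all function names and the wire layout are hypothetical choices made for this example. Because CNOT and Toffoli act as reversible classical gates on computational basis states, the circuit can be simulated on plain bits and checked exhaustively for small bit-widths.

```python
# Minimal sketch: classical simulation of a Cuccaro-style ripple-carry adder
# plus an exhaustive check. Names and wire layout are illustrative assumptions.

def cnot(s, ctrl, tgt):
    # Flip the target bit when the control bit is 1.
    s[tgt] ^= s[ctrl]

def toffoli(s, c1, c2, tgt):
    # Flip the target bit when both control bits are 1.
    s[tgt] ^= s[c1] & s[c2]

def maj(s, x, y, z):
    # MAJ module: afterwards wire z holds the majority of the three inputs.
    cnot(s, z, y)
    cnot(s, z, x)
    toffoli(s, x, y, z)

def uma(s, x, y, z):
    # UMA module (2-CNOT form): undoes MAJ on x and z, leaves the sum bit on y.
    toffoli(s, x, y, z)
    cnot(s, z, x)
    cnot(s, x, y)

def cuccaro_add(a, b, n):
    # Wires: 0 = carry-in ancilla, 1..n = a bits, n+1..2n = b bits, 2n+1 = carry-out.
    s = [0] * (2 * n + 2)
    for i in range(n):
        s[1 + i] = (a >> i) & 1
        s[n + 1 + i] = (b >> i) & 1
    a_wire = [1 + i for i in range(n)]
    b_wire = [n + 1 + i for i in range(n)]
    # The carry into position i lives on the ancilla (i = 0) or on a[i-1].
    carry_in = [0] + a_wire[:-1]
    for i in range(n):
        maj(s, carry_in[i], b_wire[i], a_wire[i])
    cnot(s, a_wire[n - 1], 2 * n + 1)  # copy the final carry to the output wire
    for i in reversed(range(n)):
        uma(s, carry_in[i], b_wire[i], a_wire[i])
    total = sum(s[b_wire[i]] << i for i in range(n)) + (s[2 * n + 1] << n)
    a_out = sum(s[a_wire[i]] << i for i in range(n))
    return total, a_out, s[0]

def verify_adder(n):
    # Binary correctness: every input pair must give a + b, restore a,
    # and return the ancilla to 0. One failure makes the design invalid.
    for a in range(2 ** n):
        for b in range(2 ** n):
            total, a_out, ancilla = cuccaro_add(a, b, n)
            if total != a + b or a_out != a or ancilla != 0:
                return False
    return True
```

Calling `verify_adder(n)` for a few small `n` gives a pass/fail verdict rather than a score, which is the kind of check a verification-aware data pipeline could apply before a design is admitted as training data. The check covers computational basis inputs only and grows as 4^n, so it is practical only for small bit-widths.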