Adversarial Stress Tests for Quantum Certification

Veronica Sanz; Augusto Smerzi

Abstract

We develop a practical framework for semi-device-independent (SDI) certification under operational deviations from the ideal protocol model. Apparent violations of classical benchmarks need not signal genuinely non-classical behaviour; they can arise from misalignment between (i) the scoring rule, (ii) the finite-sample statistical bound applied to that score, and (iii) the operational model realised in the experiment, including bias, memory, drift, and selection effects. We formalise a protocol-agnostic alignment principle based on a martingale-safe lower confidence bound and an operationally consistent effective classical ceiling. This yields a quantitative diagnostic, the \emph{robustness gap} $\Delta_{\mathrm{rob}} = S_{\mathrm{low}} - S_{C,\mathrm{eff}}$, which separates statistical fluctuations from structural modelling errors. Statistical deviations vanish asymptotically, whereas model misalignment can produce persistent false certification unless the benchmark is corrected. Using the $2\!\to\!1$ random access code as a minimal SDI testbed, we show that postselection can inflate conditional scores, whereas unconditional scoring restores the correct operational meaning of the witness. We further show that adaptive learning-based classical agents do not enlarge the admissible classical set; rather, they recover the effective classical ceiling implied by the operational model. The resulting framework provides a systematic diagnostic for certification in realistic quantum communication and measurement settings with embedded classical control, adaptive processing, and nonideal data acquisition.
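
As a minimal sketch of how the robustness gap can be evaluated, the snippet below assumes binary per-round scores $s_t\in\{0,1\}$ (success indicators of the RAC) and uses the Azuma-Hoeffding inequality as the lower confidence bound. That bound stays valid when rounds are generated adaptively, which is what makes it martingale-safe; it is one standard choice, not necessarily the bound used in the paper. The function names, the bias value, and the simulated data are hypothetical.

```python
import math
import random


def lower_confidence_bound(scores, delta):
    """Azuma-Hoeffding lower confidence bound S_low on the mean score.

    Assumes each per-round score lies in [0, 1]. With probability at least
    1 - delta, S_low lies below the average of the per-round conditional
    success probabilities, even if rounds are adaptively generated.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one round")
    s_hat = sum(scores) / n
    return s_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def robustness_gap(scores, delta, s_c_eff):
    """Delta_rob = S_low - S_C,eff; certification requires Delta_rob > 0."""
    return lower_confidence_bound(scores, delta) - s_c_eff


# Hypothetical data: a purely classical 2->1 RAC strategy under input bias eps,
# whose true success probability equals the effective ceiling 3/4 + eps/2.
rng = random.Random(0)
eps = 0.1
p_true = 0.75 + eps / 2
scores = [1 if rng.random() < p_true else 0 for _ in range(10_000)]

# Aligned benchmark (effective ceiling): expected to be negative.
print(robustness_gap(scores, delta=1e-3, s_c_eff=0.75 + abs(eps) / 2))
# Nominal benchmark 3/4: expected to be positive, i.e. a spurious certification.
print(robustness_gap(scores, delta=1e-3, s_c_eff=0.75))
```

The same classical data pass or fail depending only on which ceiling is subtracted, which is the misalignment the robustness gap is meant to expose.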

Paper Structure (58 sections, 2 theorems, 61 equations, 5 figures)

Introduction
Related work
Certification under operational assumptions
Score and witness estimation
Finite-sample confidence bounds
Classical reference values
Acceptance criterion
Example: biased $2\to 1$ RAC.
Operational model deviations
Input bias
Temporal correlations and memory
Selection effects and postselection
Adaptive classical strategies
Robust certification via alignment
Alignment principle
...and 43 more sections

Key Result

Proposition 1

Consider a finite prepare-and-measure task with preparation input $a\in\mathcal{A}$, measurement setting $y\in\mathcal{Y}$, output $b\in\mathcal{B}$, and a linear score $S$ of the behaviour $p(b\mid a,y)$. Assume that the preparation device is restricted to transmit a classical message $m\in\mathcal{M}$ with $|\mathcal{M}|\le d$, and that the operational input law $\pi(a,y)$ is fixed. Then the set of classically achievable behaviours …
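
To illustrate how the effective ceiling reduces to a finite optimisation, the sketch below specialises to the biased $2\to1$ RAC with $d=2$ under an assumed input law $\pi(a,y)=\tfrac14\,p(y)$, $p(y=0)=\tfrac12+\varepsilon$, independent of $a$. This parametrisation is our assumption, chosen to match the ceiling $3/4+|\varepsilon|/2$ quoted for Figure 2, and may differ from the paper's. Because the score is linear in the behaviour and classical behaviours with shared randomness are convex mixtures of deterministic encoder/decoder pairs, brute-force enumeration of the $16\times16$ deterministic pairs suffices. The function name is hypothetical.

```python
from itertools import product


def effective_ceiling(eps):
    """Maximum classical score of the 2->1 RAC with a one-bit message.

    Assumed operational model: a = (a0, a1) uniform over {0,1}^2 and
    y in {0,1} with P(y=0) = 1/2 + eps, independent of a (|eps| <= 1/2).
    Score: S = sum_{a,y} pi(a,y) * P(b = a_y | a, y). Since S is linear and
    the classical set is the convex hull of deterministic strategies,
    maximising over deterministic encoder/decoder pairs is enough.
    """
    inputs = list(product((0, 1), repeat=2))          # a = (a0, a1)
    p_y = {0: 0.5 + eps, 1: 0.5 - eps}
    pi = {(a, y): 0.25 * p_y[y] for a in inputs for y in (0, 1)}

    best = 0.0
    # Encoder: one message bit m for each of the 4 inputs (16 choices).
    for enc_bits in product((0, 1), repeat=len(inputs)):
        enc = dict(zip(inputs, enc_bits))
        # Decoder: one output bit b for each (y, m) pair (16 choices).
        for dec_bits in product((0, 1), repeat=4):
            dec = {(y, m): dec_bits[2 * y + m] for y in (0, 1) for m in (0, 1)}
            score = sum(
                pi[(a, y)] * (dec[(y, enc[a])] == a[y])
                for a in inputs for y in (0, 1)
            )
            best = max(best, score)
    return best


# Compare the enumerated optimum with the closed form 3/4 + |eps|/2.
for eps in (0.0, 0.1, -0.2):
    print(eps, effective_ceiling(eps), 0.75 + abs(eps) / 2)
```

The optimum is reached by sending the bit the biased setting is more likely to ask for, which is why the ceiling grows with $|\varepsilon|$ rather than staying at the nominal $3/4$.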

Figures (5)

Figure 1: Prepare-and-measure causal diagram with operational deviations. Green nodes denote observed inputs, violet nodes operational or internal variables, and the yellow node a latent shared variable. Solid arrows indicate the ideal PAM structure, while dashed arrows represent operational deviations such as memory/adaptivity, imperfect setting generation, and selection/postselection.
Figure 2: Benchmark alignment under biased inputs. The empirical score $\hat{S}$ and its lower confidence bound $S_{\mathrm{low}}$ are shown as functions of the input bias $\varepsilon$, together with the nominal classical benchmark $S_C=3/4$ and the bias-corrected effective ceiling $S_{C,\mathrm{eff}}(\varepsilon)=3/4+|\varepsilon|/2$. When compared against the nominal benchmark, the data would appear to violate the classical limit. Once the operationally correct benchmark is used, the same behaviour is seen to be entirely classical.
Figure 3: Adaptive classical learning recovers the effective classical ceiling. The analytically optimal benchmark $S_{C,\mathrm{eff}}(\varepsilon)$ is compared with a bias-aware classical baseline, a trained bandit strategy, and the lower confidence bound for the learned policy. The learned agent tracks the same operationally correct classical benchmark and does not exceed it.
Figure 4: Postselection stress test. Conditional scoring leads to inflated apparent success probabilities and spurious certification, whereas unconditional scoring restores compatibility with the effective classical ceiling.
Figure 5: Memory amplifies benchmark misalignment. Left: false-acceptance probability under careless evaluation (conditional scoring and nominal classical bound). Adaptive strategies exhibit substantially higher false-positive rates than static policies. Right: robustness gap under aligned certification (unconditional scoring and effective classical ceiling). All classical strategies satisfy $\Delta_{\mathrm{rob}}\le 0$.
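
The postselection stress test of Figure 4 can be sketched with a deliberately simple, hypothetical classical strategy: the sender transmits $a_0$, and the receiver answers when $y=0$ but declares a no-click (abstains) when $y=1$. Conditional scoring discards the abstentions; as one conservative form of unconditional scoring, the sketch counts them as failures. The bias model and all names are assumptions, not the paper's simulation.

```python
import random


def simulate_postselection(eps, n_rounds, seed=0):
    """Classical 2->1 RAC strategy that exploits postselection.

    Hypothetical strategy: the sender sends m = a0; the receiver outputs m
    when y = 0 and abstains ('no click') when y = 1.
    Returns (conditional score on kept rounds,
             unconditional score with abstentions counted as failures).
    """
    rng = random.Random(seed)
    kept = successes = 0
    for _ in range(n_rounds):
        a = (rng.randint(0, 1), rng.randint(0, 1))
        y = 0 if rng.random() < 0.5 + eps else 1   # assumed biased setting
        m = a[0]
        if y == 1:
            continue                               # abstention: discarded by postselection
        kept += 1
        successes += int(m == a[y])
    conditional = successes / kept if kept else float("nan")
    unconditional = successes / n_rounds
    return conditional, unconditional


eps = 0.1
cond, uncond = simulate_postselection(eps, n_rounds=20_000)
ceiling = 0.75 + abs(eps) / 2
print(f"conditional={cond:.3f}  unconditional={uncond:.3f}  ceiling={ceiling:.3f}")
```

Conditioned on the kept rounds this classical strategy never errs, so it would clear any benchmark below 1; scored over all rounds it sits well below the effective ceiling.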

Theorems & Definitions (6)

Proposition 1: Effective classical ceiling as a finite optimisation
Proof
Definition 1: Operational alignment
Definition 2: Robustness gap
Proposition 2: Worst-case robust ceiling under bounded bias uncertainty
Proof
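
The idea behind a worst-case robust ceiling can be illustrated under an assumption of ours: the bias is known only to satisfy $|\varepsilon|\le\varepsilon_{\max}$, and the bias-corrected ceiling $3/4+|\varepsilon|/2$ from Figure 2 holds for each admissible $\varepsilon$. Certification then compares $S_{\mathrm{low}}$ with the largest ceiling in the uncertainty set. The Azuma-Hoeffding bound and the function names are illustrative choices, not the paper's stated construction.

```python
import math


def ceiling_rac(eps):
    """Bias-corrected classical ceiling of the 2->1 RAC (Figure 2 caption)."""
    return 0.75 + abs(eps) / 2


def robust_ceiling(eps_max):
    """Worst-case ceiling over the assumed interval |eps| <= eps_max.

    ceiling_rac is convex in eps, so its maximum over an interval is
    attained at one of the endpoints.
    """
    return max(ceiling_rac(-eps_max), ceiling_rac(eps_max))


def accept(s_hat, n, delta, eps_max):
    """Aligned acceptance: certify only if S_low exceeds the robust ceiling."""
    s_low = s_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return s_low - robust_ceiling(eps_max) > 0


print(robust_ceiling(0.05))
print(accept(s_hat=0.78, n=10_000, delta=1e-3, eps_max=0.05))  # -> False
print(accept(s_hat=0.82, n=10_000, delta=1e-3, eps_max=0.05))  # -> True
```

An empirical score of $0.78$ exceeds the nominal $3/4$ but is rejected once the bias uncertainty and the finite-sample margin are both accounted for.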