Challenge-Response Quantum Reinforcement Learning with Application to Quantum-Assisted Authentication
Jawaher Kaldari, Saif Al-Kuwari
TL;DR
The paper proposes a quantum reinforcement learning (QRL) environment framed as a challenge-response task: a hidden classical bit is encoded in the parameters of a quantum circuit, and a Bob agent must infer it from a limited number of quantum state copies. It compares a purely classical agent, a deep hybrid quantum-classical agent, and a lightweight hybrid agent, showing that the lightweight policy can reliably infer the hidden bit with as few as two state copies ($N=2$) and remains robust to realistic quantum noise. The study demonstrates that hybrid policies outperform the classical baseline under resource constraints and that the environment serves as a practical testbed for QRL on near-term devices. It also discusses security-oriented applications, notably quantum-assisted authentication and information hiding, highlighting the environment's value as a controlled framework for evaluating QRL under quantum constraints.
Abstract
Quantum reinforcement learning (QRL) has emerged as a promising research direction that integrates quantum information processing into reinforcement learning frameworks. While many existing QRL studies apply quantum agents to classical environments, the potential advantages of QRL are most naturally explored in environments that exhibit intrinsically quantum characteristics, where the agent's observations and interactions arise from quantum processes. In this work, we propose a quantum reinforcement learning environment formulated as a challenge-response task with hidden information. In the proposed environment, Alice encodes a classical bit into the parameters of a quantum circuit, while Bob, acting through a trained reinforcement learning agent, interacts with a limited number of quantum state copies to infer the hidden bit. The agent must select measurement strategies and decide when to terminate the interaction under explicit resource constraints. To study the solvability of the proposed environment, we consider three agents: a purely classical agent, a lightweight hybrid agent, and a deep hybrid agent. Through experiments, we analyze the trade-off between inference accuracy and quantum resource consumption under varying interaction penalties. Our results show that the lightweight hybrid agent achieves reliable inference using as few as two quantum state copies, outperforming both the classical baseline and the deep hybrid agent in highly resource-constrained regimes. We further evaluate robustness under realistic quantum noise models and discuss the relevance of the proposed environment for security-oriented applications, including quantum-assisted authentication.
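To make the interaction loop concrete, the following is a minimal single-qubit sketch of a challenge-response environment of the kind described above. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the class name `ChallengeResponseEnv`, the encoding of the bit as an $R_Y(\theta_b)$ rotation applied to $|0\rangle$, the choice of measurement basis as a single angle, the per-copy penalty, and the $\pm 1$ terminal reward. The measurement is simulated analytically with NumPy rather than on a quantum backend.

```python
import numpy as np


class ChallengeResponseEnv:
    """Toy challenge-response environment (hypothetical, for illustration only).

    Alice hides a bit b by preparing RY(theta_b)|0>. Bob may measure up to
    n_copies copies, choosing a measurement-basis angle each time, and then
    commits to a guess of b.
    """

    def __init__(self, n_copies=2, step_penalty=0.1,
                 thetas=(0.0, np.pi / 2), seed=None):
        self.n_copies = n_copies          # copy budget N (assumed parameter)
        self.step_penalty = step_penalty  # cost per consumed copy (assumed)
        self.thetas = thetas              # encoding angles for b = 0 and b = 1
        self.rng = np.random.default_rng(seed)
        self.bit = None
        self.copies_left = 0

    def reset(self):
        """Alice samples a fresh hidden bit; Bob starts with a full budget."""
        self.bit = int(self.rng.integers(2))
        self.copies_left = self.n_copies
        return []  # Bob's initial observation: an empty measurement history

    def measure(self, basis_angle):
        """Consume one copy and measure it in a basis rotated by basis_angle.

        Rotating the basis by phi is equivalent to measuring RY(theta - phi)|0>
        in the computational basis, so P(outcome = 1) = sin^2((theta - phi)/2).
        """
        if self.copies_left == 0:
            raise RuntimeError("no quantum state copies left")
        self.copies_left -= 1
        theta = self.thetas[self.bit]
        p_one = np.sin((theta - basis_angle) / 2.0) ** 2
        outcome = int(self.rng.random() < p_one)
        return outcome, -self.step_penalty

    def guess(self, bit_guess):
        """Terminate the episode: +1 for a correct guess, -1 otherwise."""
        return 1.0 if bit_guess == self.bit else -1.0
```

A policy interacts by calling `reset()`, then `measure(...)` zero or more times within the copy budget, then `guess(...)`. In this toy model, orthogonal encodings such as `thetas=(0.0, np.pi)` can be identified from a single copy, while non-orthogonal encodings such as the default cannot, which is what creates the trade-off between accuracy and copy consumption that the agents must learn.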
