Challenge-Response Quantum Reinforcement Learning with Application to Quantum-Assisted Authentication

Jawaher Kaldari; Saif Al-Kuwari

TL;DR

The paper proposes a quantum reinforcement learning (QRL) environment framed as a challenge–response task: a hidden classical bit is encoded in the parameters of a quantum circuit, and a Bob agent must infer it from a limited number of quantum state copies. It compares a purely classical agent, a deep hybrid quantum–classical agent, and a lightweight hybrid agent, showing that the lightweight policy can reliably infer the hidden bit with as few as two copies ($N=2$) and remains robust to realistic quantum noise. In highly resource-constrained regimes, the lightweight hybrid agent outperforms both the classical baseline and the deep hybrid agent, and the environment serves as a practical testbed for QRL on near-term devices. The paper also discusses security-oriented applications, notably quantum-assisted authentication and information hiding, highlighting the environment’s value as a controlled framework for evaluating QRL under quantum constraints.

Abstract

Quantum reinforcement learning (QRL) has emerged as a promising research direction that integrates quantum information processing into reinforcement learning frameworks. While many existing QRL studies apply quantum agents to classical environments, it has been realized that the potential advantages of QRL are most naturally explored in environments that exhibit intrinsically quantum characteristics, where the agent's observations and interactions arise from quantum processes. In this work, we propose a quantum reinforcement learning environment formulated as a challenge-response task with hidden information. In the proposed environment, Alice encodes a classical bit into the parameters of a quantum circuit, while Bob, with a trained reinforcement learning agent, interacts with a limited number of quantum state copies to infer the hidden bit. The agent must select measurement strategies and decide when to terminate the interaction under explicit resource constraints. To study the solvability of the proposed environment, we consider three agents: a purely classical agent, a lightweight hybrid agent and a deep hybrid agent. Through experiments, we analyze the trade-off between inference accuracy and quantum resource consumption under varying interaction penalties. Our results show that the lightweight hybrid agent achieves reliable inference using as few as two quantum state copies, outperforming both the classical baseline and the deep hybrid agent in highly resource-constrained regimes. We further evaluate robustness under realistic quantum noise models and discuss the relevance of the proposed environment for security-oriented applications, including quantum-assisted authentication.
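The interaction loop described in the abstract (a hidden bit, a limited budget of state copies, a per-interaction penalty, and a terminating guess) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' circuit or reward design: the class name `ToyChallengeResponseEnv`, the single-qubit `RY(theta)` encoding, the two measurement bases, and the ±1 guess reward are all hypothetical. Only the ideas of a copy budget $N$ and an interaction penalty $X$ come from the source.

```python
import math
import random


class ToyChallengeResponseEnv:
    """Hypothetical single-qubit stand-in for a challenge-response environment.

    Assumption: Alice encodes bit b as the state RY(theta_b)|0>. Bob may
    measure a fresh copy in the Z or X basis (each costs `penalty`) or end
    the episode by guessing the bit (+1 if correct, -1 otherwise).
    """

    MEASURE_Z, MEASURE_X, GUESS_0, GUESS_1 = range(4)

    def __init__(self, n_copies=2, penalty=0.5,
                 theta0=0.0, theta1=math.pi, seed=None):
        self.n_copies = n_copies
        self.penalty = penalty
        self.thetas = (theta0, theta1)
        self.rng = random.Random(seed)

    def reset(self):
        # Alice draws a fresh hidden bit; Bob gets a full budget of copies.
        self.bit = self.rng.randint(0, 1)
        self.copies_left = self.n_copies

    def _measure(self, basis):
        # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
        theta = self.thetas[self.bit]
        if basis == self.MEASURE_Z:
            p_one = math.sin(theta / 2) ** 2      # P(outcome |1>)
        else:
            p_one = (1 - math.sin(theta)) / 2     # P(outcome |->)
        return 1 if self.rng.random() < p_one else 0

    def step(self, action):
        """Return (outcome, reward, done); outcome is None for guesses."""
        if action in (self.GUESS_0, self.GUESS_1):
            guess = 0 if action == self.GUESS_0 else 1
            return None, (1.0 if guess == self.bit else -1.0), True
        if self.copies_left == 0:
            # Budget exhausted without a guess: treated here as a failure.
            return None, -1.0, True
        self.copies_left -= 1
        return self._measure(action), -self.penalty, False


# Hand-written policy: measure once in Z, then guess the observed outcome.
env = ToyChallengeResponseEnv(n_copies=2, penalty=0.5, seed=0)
env.reset()
outcome, r_measure, _ = env.step(env.MEASURE_Z)
_, r_guess, done = env.step(env.GUESS_1 if outcome else env.GUESS_0)
episode_return = r_measure + r_guess
```

With the default orthogonal encodings (`theta0=0`, `theta1=pi`) a single Z measurement identifies the bit, so this toy is trivially solvable. Choosing non-orthogonal angles makes the outcomes probabilistic and gives a rough feel for the accuracy-versus-copies trade-off the abstract describes.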

Paper Structure (22 sections, 8 equations, 9 figures, 3 tables)

Introduction
Contributions
Organization
Preliminaries
Reinforcement Learning
Quantum Reinforcement Learning
Quantum Challenge–Response Circuit
Quantum Environment
Agent Architectures
C-agent
D-agent
S-agent
Experimental Setup
Agents
Environment
...and 7 more sections

Figures (9)

Figure 1: Quantum challenge–response circuit underlying the proposed environment.
Figure 2: D-agent architecture
Figure 3: S-agent architecture
Figure 4: Training under high-penalty environment ($X=0.5$). The solid curve shows the average number of interactions per episode, while the scatter points indicate batch accuracy.
Figure 5: Confusion matrices under the high-penalty environment ($X=0.5$).
...and 4 more figures
