Table of Contents
Fetching ...

Formal Verification of Noisy Quantum Reinforcement Learning Policies

Dennis Gross

TL;DR

The work tackles safety verification for quantum reinforcement learning policies under quantum noise by introducing QVerifier, a framework that builds induced DTMCs from environment MDPs and QRL policies and verifies safety properties with the Storm probabilistic model checker. It demonstrates how explicit noise models alter policy behavior, revealing both degradation and occasional improvements in performance, and enables precise, pre-deployment safety assessment before hardware access. Across Frozen Lake, Ski, and Freeway, classical policies generally outperform QRL in noise-free scenarios, but certain noise regimes can shift this balance, underscoring the importance of noise-aware verification. Overall, QVerifier provides a practical pathway to safely deploy QRL by enabling exact safety verification under hardware-imperfection models and informing design choices for quantum policies and hardware.

Abstract

Quantum reinforcement learning (QRL) aims to use quantum effects to create sequential decision-making policies that achieve tasks more effectively than their classical counterparts. However, QRL policies face uncertainty from quantum measurements and hardware noise, such as bit-flip, phase-flip, and depolarizing errors, which can lead to unsafe behavior. Existing work offers no systematic way to verify whether trained QRL policies meet safety requirements under specific noise conditions. We introduce QVerifier, a formal verification method that applies probabilistic model checking to analyze trained QRL policies with and without modeled quantum noise. QVerifier builds a complete model of the policy-environment interaction, incorporates quantum uncertainty directly into the transition probabilities, and then checks safety properties using the Storm model checker. Experiments across multiple QRL environments show that QVerifier precisely measures how different noise models influence safety, revealing both performance degradation and cases where noise can help. By enabling rigorous safety verification before deployment, QVerifier addresses a critical need: because access to quantum hardware is expensive, pre-deployment verification is essential for any safety-critical use of QRL. QVerifier targets a potential classical-quantum sweet spot: trained QRL policies that execute efficiently on quantum hardware, yet remain tractable for classical probabilistic model checking despite being too slow for real-time classical deployment.

Formal Verification of Noisy Quantum Reinforcement Learning Policies

TL;DR

The work tackles safety verification for quantum reinforcement learning policies under quantum noise by introducing QVerifier, a framework that builds induced DTMCs from environment MDPs and QRL policies and verifies safety properties with the Storm probabilistic model checker. It demonstrates how explicit noise models alter policy behavior, revealing both degradation and occasional improvements in performance, and enables precise, pre-deployment safety assessment before hardware access. Across Frozen Lake, Ski, and Freeway, classical policies generally outperform QRL in noise-free scenarios, but certain noise regimes can shift this balance, underscoring the importance of noise-aware verification. Overall, QVerifier provides a practical pathway to safely deploy QRL by enabling exact safety verification under hardware-imperfection models and informing design choices for quantum policies and hardware.

Abstract

Quantum reinforcement learning (QRL) aims to use quantum effects to create sequential decision-making policies that achieve tasks more effectively than their classical counterparts. However, QRL policies face uncertainty from quantum measurements and hardware noise, such as bit-flip, phase-flip, and depolarizing errors, which can lead to unsafe behavior. Existing work offers no systematic way to verify whether trained QRL policies meet safety requirements under specific noise conditions. We introduce QVerifier, a formal verification method that applies probabilistic model checking to analyze trained QRL policies with and without modeled quantum noise. QVerifier builds a complete model of the policy-environment interaction, incorporates quantum uncertainty directly into the transition probabilities, and then checks safety properties using the Storm model checker. Experiments across multiple QRL environments show that QVerifier precisely measures how different noise models influence safety, revealing both performance degradation and cases where noise can help. By enabling rigorous safety verification before deployment, QVerifier addresses a critical need: because access to quantum hardware is expensive, pre-deployment verification is essential for any safety-critical use of QRL. QVerifier targets a potential classical-quantum sweet spot: trained QRL policies that execute efficiently on quantum hardware, yet remain tractable for classical probabilistic model checking despite being too slow for real-time classical deployment.

Paper Structure

This paper contains 25 sections, 18 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: General model checking workflow DBLP:journals/sttt/HenselJKQV22. First, the system needs to be formally modeled, for instance, via PRISM. Then, the requirements are formalized, for instance, via PCTL. Eventually, both are inputted into the model checker, like Storm, which verifies the property.
  • Figure 2: An RL agent interacts with an environment. The agent sends an action and receives the resulting state and reward, closing the feedback loop. The interaction ends when a terminal state is reached.
  • Figure 3: An example of a variational quantum circuit architecture for a quantum REINFORCE agent. The circuit operates on four qubits and has two variational layers with 24 trainable parameters (gates with name R*). State preparation encodes the input via amplitude encoding to the qubits. Layer 0 contains rotation gates (RX, RY, RZ) and CNOT entanglement in ring topology. Layer 1 contains only rotations. Pauli-Z measurements produce expectation values that are classically post-processed into action probabilities via a softmax function.
  • Figure 4: Given an MDP, a trained QRL policy, a PCTL property, and, optionally, quantum noise models as inputs, the method outputs whether the safety property is satisfied and its exact probability.
  • Figure 5: One expansion step in the incremental DTMC construction, showing two example DTMCs overlaid for comparison. Starting from state $s_0$, the policy samples an action which leads to successor states $s_1$, $s_2$, and $s_3$. Blue solid arrows represent the DTMC constructed from the noise-free policy, while red dashed arrows represent a separate DTMC constructed from the noisy policy $\pi_{\theta}^{\mathcal{E}}$, illustrating how the quantum noise model in the action sampling process alters the transition probabilities. This expansion process is repeated for each new state until all states reachable by the policy have been visited.
  • ...and 1 more figures

Theorems & Definitions (4)

  • Definition 1: MDP
  • Definition 1: MDP
  • Definition 2: Memoryless Stochastic Policy
  • Definition 3: Induced DTMC