Formal Verification of Noisy Quantum Reinforcement Learning Policies
Dennis Gross
TL;DR
The work tackles safety verification for quantum reinforcement learning policies under quantum noise by introducing QVerifier, a framework that builds induced DTMCs from environment MDPs and QRL policies and verifies safety properties with the Storm probabilistic model checker. It demonstrates how explicit noise models alter policy behavior, revealing both degradation and occasional improvements in performance, and enables precise, pre-deployment safety assessment before hardware access. Across Frozen Lake, Ski, and Freeway, classical policies generally outperform QRL in noise-free scenarios, but certain noise regimes can shift this balance, underscoring the importance of noise-aware verification. Overall, QVerifier provides a practical pathway to safely deploy QRL by enabling exact safety verification under hardware-imperfection models and informing design choices for quantum policies and hardware.
Abstract
Quantum reinforcement learning (QRL) aims to use quantum effects to create sequential decision-making policies that achieve tasks more effectively than their classical counterparts. However, QRL policies face uncertainty from quantum measurements and hardware noise, such as bit-flip, phase-flip, and depolarizing errors, which can lead to unsafe behavior. Existing work offers no systematic way to verify whether trained QRL policies meet safety requirements under specific noise conditions. We introduce QVerifier, a formal verification method that applies probabilistic model checking to analyze trained QRL policies with and without modeled quantum noise. QVerifier builds a complete model of the policy-environment interaction, incorporates quantum uncertainty directly into the transition probabilities, and then checks safety properties using the Storm model checker. Experiments across multiple QRL environments show that QVerifier precisely measures how different noise models influence safety, revealing both performance degradation and cases where noise can help. By enabling rigorous safety verification before deployment, QVerifier addresses a critical need: because access to quantum hardware is expensive, pre-deployment verification is essential for any safety-critical use of QRL. QVerifier targets a potential classical-quantum sweet spot: trained QRL policies that execute efficiently on quantum hardware, yet remain tractable for classical probabilistic model checking despite being too slow for real-time classical deployment.
