Reinforcement Learning for Quantum Network Control with Application-Driven Objectives

Guo Xian Yau, Alexandra Burushkina, Francisco Ferreira da Silva, Subhransu Maji, Philip S. Thomas, Gayane Vardoyan

Abstract

Optimized control of quantum networks is essential for enabling distributed quantum applications with strict performance requirements. In near-term architectures with constrained hardware, effective control may determine whether deploying such applications is feasible at all. Because quantum network dynamics can be modeled as a Markov decision process, dynamic programming and reinforcement learning (RL) offer promising tools for optimizing control strategies. However, key quantum network performance measures, such as the secret key rate in quantum key distribution, often depend non-linearly on interdependent variables that describe quantum state quality and generation rate. Such objectives are not easily captured by standard RL approaches based on additive rewards. We propose a novel gradient-based RL framework that directly optimizes non-linear, differentiable objective functions while accounting for uncertainties introduced by classical communication delays. We evaluate this framework in the context of entanglement distillation between two quantum network nodes equipped with multiplexing capability, and demonstrate improvements of up to 20-23% over heuristic baselines in certain parameter regimes. Our work constitutes a first step towards non-linear objective function optimization in quantum networks with RL, opening a path towards more advanced use cases.
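
To illustrate why an objective such as the secret key rate does not decompose into additive per-step rewards, the sketch below evaluates a textbook asymptotic BB84 key rate. It is a minimal illustration under stated assumptions, not the paper's definition of the objective: the delivered pairs are assumed to be Werner states, the function names (`binary_entropy`, `werner_qber`, `bb84_secret_key_rate`) are hypothetical, and the key fraction uses the standard asymptotic expression $1 - 2h(Q)$, where $h$ is the binary entropy and $Q$ is the quantum bit error rate.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def werner_qber(fidelity: float) -> float:
    """QBER of a Werner state with the given fidelity (assumed state model).

    For a Werner state the error rate is the same in the Z and X bases.
    """
    return 2.0 * (1.0 - fidelity) / 3.0


def bb84_secret_key_rate(rate: float, fidelity: float) -> float:
    """Asymptotic BB84 key rate: delivery rate times secret key fraction."""
    q = werner_qber(fidelity)
    fraction = max(0.0, 1.0 - 2.0 * binary_entropy(q))
    return rate * fraction


if __name__ == "__main__":
    # The key rate of the average fidelity differs from the average key rate,
    # so the objective is not a sum of per-pair contributions.
    at_mean_fidelity = bb84_secret_key_rate(1.0, 0.85)
    mean_of_rates = 0.5 * (
        bb84_secret_key_rate(1.0, 1.0) + bb84_secret_key_rate(1.0, 0.7)
    )
    print(at_mean_fidelity, mean_of_rates)
```

Under these assumptions the key fraction drops to zero well before the fidelity reaches 0.5, so a policy that trades fidelity for rate (or the reverse) has to be scored on the combined quantity rather than on either variable alone.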

Paper Structure

This paper contains 32 sections, 36 equations, 8 figures, 12 tables, 1 algorithm.

Figures (8)

  • Figure 1: A quantum network distributes entanglement to end nodes, enabling distributed quantum applications. We focus on the quantum connectivity between two nodes (quantum switches, routers, or end nodes) in the network. This two-node system is modeled as a Markov decision process, and solved via reinforcement learning (RL). Node controllers execute the policy with locally available system state information.
  • Figure 2: Two remote nodes, each with a two-qubit memory, aim to generate entanglement of sufficiently high quality to execute an application. 1. The nodes attempt entanglement generation and succeed. 2. The nodes attempt entanglement generation and fail. The existing entanglement decoheres because time has elapsed, which is depicted by a fainter, dashed line. 3. The nodes discard the entangled pair they had in memory. 4. The nodes attempt entanglement generation and succeed, creating two entangled pairs simultaneously. 5. The nodes purify successfully, obtaining a higher-quality entangled pair. 6. The nodes consume their high-quality entangled pair by supplying it to their application.
  • Figure 3: BB84 SKR as a function of link length for WN2M2 and BN2M2. 95% confidence intervals are present but smaller than marker size. The baseline policy for WN2M2 and $F_0 = 0.9$ is effectively CONSUME-ASAP.
  • Figure 4: Relative differences of $u_{\text{BB84}}$ for WN2M2 and BN2M2 variants.
  • Figure 5: BB84 and six-state protocol SKR as a function of link length for WN2M3. 95% confidence intervals are present but smaller than marker size.
  • ...and 3 more figures
