Dynamic Inhomogeneous Quantum Resource Scheduling with Reinforcement Learning

Linsen Li; Pratyush Anand; Kaiming He; Dirk Englund

TL;DR

This work tackles the NP-hard problem of dynamic inhomogeneous quantum resource scheduling by embedding it in a digitized Monte Carlo environment and applying reinforcement learning with a Transformer-on-QuPairs architecture. The model applies self-attention across all qubit-pair links to produce next-step scheduling decisions, yielding more than a 3× improvement in quantum-resource performance over rule-based baselines. The framework scales to larger qubit sets and varying degrees of inhomogeneity, and shows strong transfer-learning potential across system sizes. These results support a path toward co-design of the physical and control layers for quantum networks, computing, and communication by enabling real-time, high-fidelity resource state construction under probabilistic heralded entanglement dynamics.

Abstract

A central challenge in quantum information science and technology is achieving real-time estimation and feedforward control of quantum systems. This challenge is compounded by the inherent inhomogeneity of quantum resources, such as qubit properties and controls, and by their intrinsically probabilistic nature. Together, these lead to stochastic challenges in error detection and to probabilistic outcomes in processes such as heralded remote entanglement. Given these complexities, optimizing the construction of quantum resource states is an NP-hard problem. In this paper, we address the quantum resource scheduling problem by formulating it and simulating it within a digitized environment, which allows agent-based optimization strategies to be explored and developed. We employ reinforcement learning agents within this probabilistic setting and introduce a new framework based on a Transformer model that applies self-attention to pairs of qubits. This approach enables dynamic scheduling by providing real-time, next-step guidance. Our method significantly improves the performance of quantum systems, achieving more than a 3$\times$ improvement over rule-based agents, and establishes a framework for the joint design of physical and control systems for quantum applications in communication, networking, and computing.
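The digitized, probabilistic environment described above can be illustrated with a small Monte Carlo sketch. This is not the authors' simulator: it only models heralded success as a random draw per attempted pair and tracks the largest connected cluster with a union-find structure. The names `UnionFind`, `run_game`, `success_prob`, and `choose_pair` are hypothetical, and errors, fidelities, and quantum volume are deliberately left out.

```python
import random


class UnionFind:
    """Tracks connected qubit clusters and the size of the largest one."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1 if n > 0 else 0

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # already in the same cluster
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def run_game(n_qubits, n_steps, success_prob, choose_pair, seed=0):
    """Play the scheduling game and return the table of successful events.

    success_prob[i][j] is an assumed per-pair heralding probability
    (inhomogeneous in general). choose_pair(step, uf) is any policy that
    returns the pair (i, j) to attempt at this time step.
    """
    rng = random.Random(seed)
    uf = UnionFind(n_qubits)
    table = []  # rows: (N_t, qubit i, qubit j, N_max)
    for step in range(1, n_steps + 1):
        i, j = choose_pair(step, uf)
        # Monte Carlo draw: the entanglement attempt succeeds with probability p_ij.
        if rng.random() < success_prob[i][j]:
            uf.union(i, j)
            table.append((step, i, j, uf.largest))
    return table


if __name__ == "__main__":
    n = 11
    probs = [[0.1 + 0.02 * abs(i - j) for j in range(n)] for i in range(n)]
    picker = random.Random(1)
    policy = lambda step, uf: tuple(picker.sample(range(n), 2))
    for row in run_game(n, 150, probs, policy)[:5]:
        print(row)
```

Any scheduling strategy, rule-based or learned, can be plugged in as `choose_pair`; the random policy in the usage example is only a placeholder.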

Paper Structure (34 sections, 12 equations, 6 figures, 3 tables, 1 algorithm)

This paper contains 34 sections, 12 equations, 6 figures, 3 tables, 1 algorithm.

Introduction
Related works
Dynamic inhomogeneous quantum resource scheduling
Complexity of the cluster building scheduling problem
Quantum system performance benchmarking
Dynamic quantum resource scheduling example
Quantum system pre-information generation
A reinforcement learning framework
RL-based optimization framework
Transformer-on-QuPairs architecture definition
Experiments
Experimental setup and methodological comparison
Rule-based strategies
RL-based strategies
Results
...and 19 more sections

Figures (6)

Figure 1: Dynamic quantum resource scheduling game. a, At the initial time step $N_t = 1$, individual qubit resources (represented by black circles) are depicted, poised for the formation of entanglement pairs (illustrated by dashed black lines). This figure shows only a portion of the 11 qubit nodes of the quantum resource. b, At an early stage, $N_t = 5$. Each time step carries a probability of successfully establishing an entanglement pair. Newly formed entanglements are indicated by solid red lines, and the largest connected subgraph is highlighted with red nodes. c, At a later stage, $N_t = 150$, the diagram shows a larger connected qubit cluster within the quantum resource, with earlier entanglements depicted in lighter red. d, The result table records each successful entanglement event derived from the quantum simulation. For each time step $N_t$ at which the Monte Carlo simulation successfully establishes entanglement between Qubit i and Qubit j, a new entry is added to the table, updating the maximum size of the connected graph $N_\text{max}$.
Figure 2: RL-based optimization framework and dynamic scheduling strategies using the Transformer-on-QuPairs architecture. a, The entire optimization flow aimed at enhancing $V_Q$ within a quantum system, starting with inputs from system pre-information data. b, Representation of the Transformer architecture used as an RL agent in a, processing a sequence of qubit pairs with input length $N_q^2$ and feature dimensions $N_\text{dim}$. It outputs a sequence predicting the cost function for each qubit pair, formatted as $N_q^2 \times 1$. c, The output of the transformer is further processed into a matrix to determine the minimal error ($\varepsilon$) for the operations in the next step. This processed action matrix sets an error threshold at $A_\text{th}=0.02$. The suggested scheduling action, marked by a red rectangle, indicates the qubit pair with the minimum predicted error.
Figure 3: Schematic of cluster state construction example ($N_{q}=40$). a, The scheduling simulation progress example at various time steps ($N_t$). Black circles represent individual qubit resources, red-labeled circles and connected lines indicate the largest subgraph formed among the qubits, and grey lines show the established entanglements between qubit nodes. b, Accumulation of errors ($\epsilon$) during the cluster state construction, plotted against the simulation time steps ($N_t$). The black connected line corresponds to the $N_t$ time steps shown in panel a. c, The maximum connected subgraph size $N_\text{max}$ as it evolves with $N_t$ (blue line), alongside the logarithm of the system's quantum volume ($\mu = \log_2 V_Q$), which also progresses with $N_t$ (red line).
Figure 4: Comparison of qubit cluster state building strategies. a, Histograms of $\mu$ over 100 samples for various strategies: random (black), Greedy-on-QuPairs (green), and Transformer-on-QuPairs (red), with cumulative density functions overlaid. b, Probability density function (Gaussian fitting) comparison for $\bar{\mu}$ between the Greedy-on-QuPairs and Transformer-on-QuPairs strategies, highlighting the superior optimization capacity of the Transformer. The difference $\Delta\bar{\mu}$ quantifies the improvement in the quantum system, with $2^{\Delta\bar{\mu}}>3$ indicating a significant enhancement.
Figure 6: Protocol layout example. a, System A (B) comprises an optical cavity with an atomic qubit (a diamond-based color center defect as an example). This atomic qubit has an electron spin qubit and a nuclear spin qubit. A laser is used to initialize and read out the electron spin qubit. The microwave antenna is used to implement quantum gates on the electron and nuclear spin qubits. The optical cavity is coupled to optical fibers, allowing the transmission of the leaked photon qubit. The two photon qubits pass through the input ports of the beam splitter. The detectors $D_{1}$ and $D_{2}$ connected to the output ports of the beam splitter are monitored for a photon click. b, Four-level atomic system illustration for an atomic qubit (a diamond-based color center defect as an example). Colors indicate the operators: blue, red, and orange represent the spin-conserving, spin-flipping, and MW transitions, respectively.
...and 1 more figures
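
The action-selection step described in the Figure 2c caption (reshape the $N_q^2 \times 1$ output into a matrix, then suggest the pair with the minimum predicted error) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `select_action`, the row-major reshape, the exclusion of diagonal entries, and the use of $A_\text{th}$ as a mask that discards pairs whose predicted error exceeds it are all assumptions, since the caption does not spell out how the threshold is applied.

```python
def select_action(pred_errors, n_qubits, a_th=0.02):
    """Suggest the qubit pair with the smallest predicted error.

    pred_errors: flat list of length n_qubits**2, standing in for the
    N_q^2 x 1 model output (one predicted error per ordered pair).
    Returns (i, j, error), or None if no off-diagonal entry is <= a_th.
    """
    if len(pred_errors) != n_qubits * n_qubits:
        raise ValueError("expected one prediction per qubit pair")
    # Reshape the flat sequence into an N_q x N_q action matrix (row-major, assumed).
    matrix = [pred_errors[r * n_qubits:(r + 1) * n_qubits] for r in range(n_qubits)]
    best = None
    for i in range(n_qubits):
        for j in range(n_qubits):
            if i == j:
                continue  # a qubit is not paired with itself (assumed)
            err = matrix[i][j]
            if err > a_th:
                continue  # assumed role of A_th: mask out high-error pairs
            if best is None or err < best[2]:
                best = (i, j, err)
    return best


if __name__ == "__main__":
    preds = [0.0, 0.05, 0.01,
             0.05, 0.0, 0.03,
             0.015, 0.03, 0.0]
    print(select_action(preds, 3))
```

Returning `None` when every candidate exceeds the threshold leaves it to the caller to decide whether to wait or fall back to another rule; that behavior is also an assumption of this sketch.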
