Noise tolerance via reinforcement: Learning a reinforced quantum dynamics

Abolfazl Ramezanpour

TL;DR

This work addresses the vulnerability of quantum simulations to noise by introducing reinforced quantum dynamics that bias the evolution toward noise-free trajectories. It pairs a teacher model, which combines reinforcement with a voter-like Grover schedule, with a student model that learns a compact Hamiltonian to emulate the reinforced dynamics without the overhead of implementing the reinforcement itself. The approach improves success probabilities and reduces runtimes under Pauli noise for both single- and two-qubit systems, and a gradient-descent learning protocol approximates the reinforced dynamics efficiently. The findings point toward more robust quantum annealing and near-term quantum simulation. Open challenges include state estimation and scaling, and future work targets larger systems and better approximations.
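The Pauli noise mentioned above can be made concrete with the standard single-qubit Pauli channel, ρ → (1 - p_x - p_y - p_z)ρ + p_x XρX + p_y YρY + p_z ZρZ. The sketch below is an illustration only, not code from the paper: the function names `pauli_channel` and `bloch_vector`, the |+⟩ input state, and the error probabilities of 0.01 per application are hypothetical choices. It shows how repeated exposure shrinks the Bloch vector, the kind of degradation the reinforced dynamics is meant to resist.

```python
import numpy as np

# Single-qubit Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_channel(rho, px, py, pz):
    """Apply a single-qubit Pauli channel to the density matrix rho.

    With probability px (py, pz) the state is hit by X (Y, Z);
    otherwise it is left unchanged. (Hypothetical helper name.)
    """
    p0 = 1.0 - px - py - pz
    if p0 < 0 or min(px, py, pz) < 0:
        raise ValueError("probabilities must be non-negative and sum to <= 1")
    return (p0 * rho
            + px * X @ rho @ X
            + py * Y @ rho @ Y
            + pz * Z @ rho @ Z)


def bloch_vector(rho):
    """Return (<X>, <Y>, <Z>) for a single-qubit density matrix."""
    return np.real([np.trace(rho @ P) for P in (X, Y, Z)])


# Depolarizing noise is the special case px = py = pz.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # |+>, Bloch vector (1, 0, 0)
rho = np.outer(plus, plus.conj())
for _ in range(10):
    rho = pauli_channel(rho, 0.01, 0.01, 0.01)
print(bloch_vector(rho))
```

With p_x = p_y = p_z = q, the ⟨X⟩ component is multiplied by 1 - 4q per application (X commutes with itself and anticommutes with Y and Z), so ten applications with q = 0.01 leave ⟨X⟩ = 0.96^10 while the trace stays 1.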

Abstract

The performance of quantum simulations heavily depends on the efficiency of noise mitigation techniques and error correction algorithms. Reinforcement has emerged as a powerful strategy to enhance the efficiency of learning and optimization algorithms. In this study, we demonstrate that reinforced quantum dynamics can exhibit significant robustness against interactions with a noisy environment. We study a quantum annealing process in which, through reinforcement, the system is encouraged to maintain its current state or follow a noise-free evolution. A learning algorithm is employed to derive a concise approximation of this reinforced dynamics, reducing the total evolution time and, consequently, the system's exposure to noisy interactions. This also avoids the complexity of implementing quantum feedback in such reinforcement algorithms. The efficacy of our method is demonstrated through numerical simulations of reinforced quantum annealing with one- and two-qubit systems under Pauli noise.
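The abstract's argument that a shorter evolution means less exposure to noise can be illustrated with a toy single-qubit anneal. This is a minimal sketch under stated assumptions, not the paper's teacher model: it has no reinforcement, and the Hamiltonians H0 = -X and HP = -Z, the linear schedule, the total time of 10, and a depolarizing channel of strength p applied once after every layer are all hypothetical choices. The helper names `evolve`, `depolarize`, and `anneal_success` are likewise illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def evolve(H, dt):
    """exp(-i H dt) for a Hermitian H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * dt)) @ V.conj().T


def depolarize(rho, p):
    """Single-qubit depolarizing channel: X, Y or Z error, each with probability p/3."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)


def anneal_success(num_layers, total_time, p):
    """<0|rho|0> after a discretized linear anneal from H0 = -X to HP = -Z.

    The qubit starts in |+> (ground state of H0); the target |0> is the ground
    state of HP. Depolarizing noise of strength p acts once after every layer.
    """
    plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
    rho = np.outer(plus, plus.conj())
    dt = total_time / num_layers
    for l in range(1, num_layers + 1):
        s = l / num_layers  # linear schedule, s in (0, 1]
        U = evolve(-(1 - s) * X - s * Z, dt)
        rho = depolarize(U @ rho @ U.conj().T, p)
    return float(np.real(rho[0, 0]))


for L in (5, 50):
    ideal = anneal_success(L, total_time=10.0, p=0.0)
    noisy = anneal_success(L, total_time=10.0, p=0.01)
    print(f"L={L}: noise-free {ideal:.4f}, depolarized {noisy:.4f}")
```

Because the depolarizing channel commutes with every unitary, the noisy final state in this toy model is exactly λ^L ρ_ideal + (1 - λ^L) I/2 with λ = 1 - 4p/3, so the deviation from the noise-free success probability grows with the number of noisy layers L. Changing L here also changes the discretization, so the comparison between L = 5 and L = 50 is only qualitative.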

Paper Structure

This paper contains 8 sections, 41 equations, and 13 figures.

Figures (13)

  • Figure 1: Dynamical models of the teacher and student under noise. (a) The teacher’s dynamics is reinforced at each layer $l$ by steering it toward $\rho_{l+\Delta l}^{(t)}(0)$, the ideal noise-free state. (b) The student’s unitaries are trained to replicate only the teacher’s noise-free reinforced dynamics.
  • Figure 2: Performance of the teacher model with $N=10$ qubits under Pauli noise channels. ((a1), (b1), (c1)) Success probability of the quantum annealing process with ($r=1$) and without ($r=0$) reinforcement. ((a2), (b2), (c2)) Running-time scale, as defined in Ref. Leng-prr-2025, for the same data as in the upper panels. The teacher model consists of $L_t=50$ layers. The results are averaged over $100$ independent random realizations of single-qubit Pauli noise channels. Statistical errors in the success probabilities are below $6\times 10^{-4}$.
  • Figure 3: Learning noise-free reinforced dynamics from the teacher. (a) The teacher generates a sequence of states $\{|\psi_l\rangle\}$ from initial state $|\psi_0\rangle$ via reinforced unitaries $U_l^{(t)}(r)$. (b) Forward stage: the student obtains $\{|\phi_l^\rightarrow\rangle\}$ from $|\phi_0^\rightarrow\rangle = |\psi_0\rangle$ by applying $U_l^{(s)}$. (c) Backward stage: the student obtains $\{|\phi_l^\leftarrow\rangle\}$ via $U_l^{(s)\dagger}$ starting from $|\phi_{L_s}^\leftarrow\rangle = |\psi_{L_t}\rangle$. The student unitaries $U_l^{(s)}$ are optimized via gradient descent to align the forward and backward dynamics.
  • Figure 4: Success probability of the student model under noise-free conditions. ((a1), (b1), (c1)) For a single qubit ($d=2$). ((a2), (b2), (c2)) For two qubits ($d=4$). The number of student layers is $L_s=5$. The results are obtained after $100$ gradient-descent (GD) iterations with learning rates $\eta=1$ for $d=2$ and $\eta=0.02$ for $d=4$.
  • Figure 5: One qubit: Success probability under depolarizing noise. Here $L_t=50$ and $L_s=5$. ((a1), (b1), (c1)) For the teacher model. ((a2), (b2), (c2)) For the student model. The student results are obtained after $100$ GD iterations with learning rate $\eta=1$.
  • ...and 8 more figures
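The forward and backward stages of Figure 3 can be sketched for a single qubit as follows. This is a hedged illustration, not the authors' algorithm: the student layer form U(θ) = exp(-i θ·σ), the cost (fidelity between the student's forward output and the teacher's final state), the finite-difference gradients, the learning rate 0.05, and the 300 iterations are all assumptions, and the teacher is replaced by a noise-free, non-reinforced anneal because the reinforced unitaries $U_l^{(t)}(r)$ are not specified in this summary. Only $L_t=50$ and $L_s=5$ follow the captions; all function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = (X, Y, Z)


def expm_hermitian(H, t=1.0):
    """exp(-i H t) for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T


def layer_unitary(theta):
    """Hypothetical student layer U(theta) = exp(-i theta . sigma), theta in R^3."""
    return expm_hermitian(sum(t * P for t, P in zip(theta, PAULIS)))


def teacher_final_state(num_layers=50, total_time=10.0):
    """Stand-in teacher: noise-free discretized anneal from -X to -Z, no reinforcement."""
    psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
    dt = total_time / num_layers
    for l in range(1, num_layers + 1):
        s = l / num_layers
        psi = expm_hermitian(-(1 - s) * X - s * Z, dt) @ psi
    return psi


def forward_states(thetas, psi0):
    """phi_0 = psi0 and phi_l = U_l phi_{l-1}, for l = 1..L_s."""
    states = [psi0]
    for th in thetas:
        states.append(layer_unitary(th) @ states[-1])
    return states


def backward_states(thetas, target):
    """phi_{L_s} = target and phi_{l-1} = U_l^dagger phi_l; returned as [phi_0, ..., phi_{L_s}]."""
    states = [target]
    for th in reversed(thetas):
        states.append(layer_unitary(th).conj().T @ states[-1])
    return states[::-1]


def fidelity(thetas, psi0, target):
    """|<target| U_{L_s} ... U_1 |psi0>|^2."""
    return abs(np.vdot(target, forward_states(thetas, psi0)[-1])) ** 2


def gd_step(thetas, psi0, target, lr, eps=1e-6):
    """One gradient-ascent step on the fidelity.

    For layer l the fidelity equals |<phi_l^<-| U_l |phi_{l-1}^->>|^2, so its
    gradient needs only one forward and one backward state (central differences).
    """
    fwd = forward_states(thetas, psi0)
    bwd = backward_states(thetas, target)
    grads = np.zeros_like(thetas)
    for l in range(len(thetas)):  # 0-based: thetas[l] is layer l + 1
        for k in range(3):
            shift = np.zeros(3)
            shift[k] = eps
            f_plus = abs(np.vdot(bwd[l + 1], layer_unitary(thetas[l] + shift) @ fwd[l])) ** 2
            f_minus = abs(np.vdot(bwd[l + 1], layer_unitary(thetas[l] - shift) @ fwd[l])) ** 2
            grads[l, k] = (f_plus - f_minus) / (2 * eps)
    return thetas + lr * grads


psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)
target = teacher_final_state()  # stand-in teacher with L_t = 50 layers
thetas = np.zeros((5, 3))       # student with L_s = 5 layers, starting at the identity
f_init = fidelity(thetas, psi0, target)
for _ in range(300):
    thetas = gd_step(thetas, psi0, target, lr=0.05)
f_final = fidelity(thetas, psi0, target)
print(f"fidelity: {f_init:.3f} -> {f_final:.6f}")
```

Since every layer is unitary, the overlap ⟨φ_l^←|φ_l^→⟩ is the same at every l and equals the end-to-end overlap. The forward and backward states therefore let each layer's gradient be computed locally, much as in backpropagation.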