Model-free Distortion Canceling and Control of Quantum Devices

Ahmed F. Fouad; Akram Youssry; Ahmed El-Rafei; Sherif Hammad

TL;DR

This work addresses the control of closed quantum systems when the control signals are subject to unknown classical distortions and no detailed system model is available. It introduces a model-free deep reinforcement learning (DRL) approach that uses the REINFORCE policy gradient algorithm to steer the system’s state probability distribution toward chosen target distributions. A controller built from multiple neural networks (NNs), one per target distribution, scales to any number of targets, and the approach accommodates both Markov decision process (MDP) and partially observed MDP (POMDP) settings with continuous or discrete actions. In numerical simulations of a voltage-controlled photonic waveguide array, the method reaches fidelity exceeding 99% within 10 ms episodes and cancels the distortions, outperforming constant step-voltage control. The framework offers closed-loop quantum control without a priori system identification, with potential applicability to a wide range of quantum devices and to open-system scenarios.

Abstract

Quantum devices need precise control to achieve their full capability. In this work, we address the problem of controlling closed quantum systems, tackling two main issues. First, in practice the control signals are usually subject to unknown classical distortions that can arise from the device fabrication, the material properties, and/or the instruments generating those signals. Second, in most cases modeling the system is very difficult or not even viable, owing to uncertainties in the relations between some variables and the inaccessibility of some measurements inside the system. In this paper, we introduce a general model-free control approach based on deep reinforcement learning (DRL) that can work for any closed quantum system. We train a deep neural network (NN) using the REINFORCE policy gradient algorithm to control the state probability distribution of a closed quantum system as it evolves, and to drive it to different target distributions. We present a novel controller architecture that comprises multiple NNs. This accommodates as many different target state distributions as desired without increasing the complexity of the NN or its training process. The DRL algorithm we use works whether the control problem is modeled as a Markov decision process (MDP) or as a partially observed MDP. Our method is valid whether the control signals are discrete- or continuous-valued. We verified our method through numerical simulations based on a photonic waveguide array chip. We trained a controller to generate sequences of different target output distributions of the chip with fidelity higher than 99%, and the controller showed superior performance in canceling the classical signal distortions.
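
The training idea in the abstract can be illustrated with a minimal REINFORCE sketch. This is not the authors' implementation: the linear Gaussian policy stands in for their deep NN, and the class names, learning rate, discount factor, and noise scale are all illustrative assumptions. The `MultiTargetController` class only mirrors the described idea of one policy per target distribution chosen by a selector.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return np.array(out[::-1])

class LinearGaussianPolicy:
    """Hypothetical stand-in for a deep NN policy: a ~ N(W @ obs, sigma^2 I)."""
    def __init__(self, n_obs, n_act, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((n_act, n_obs))
        self.sigma = sigma

    def act(self, obs):
        noise = self.rng.standard_normal(self.W.shape[0])
        return self.W @ obs + self.sigma * noise

    def grad_log_prob(self, obs, action):
        # Gradient of log N(action; W @ obs, sigma^2 I) with respect to W.
        return np.outer((action - self.W @ obs) / self.sigma**2, obs)

def reinforce_update(policy, episode, lr=1e-3, gamma=0.99):
    """One REINFORCE step from a list of (obs, action, reward) tuples."""
    obs, actions, rewards = zip(*episode)
    returns = discounted_returns(rewards, gamma)
    grad = sum(g * policy.grad_log_prob(o, a)
               for o, a, g in zip(obs, actions, returns))
    policy.W += lr * grad  # gradient ascent on the expected return

class MultiTargetController:
    """One policy per target distribution; the selector is a dict lookup."""
    def __init__(self, policies):
        self.policies = policies  # maps a target label to its policy

    def act(self, target, obs):
        return self.policies[target].act(obs)
```

Because each target has its own policy, adding a target in this sketch means training one more policy rather than enlarging an existing one, which is the scaling property the abstract claims for the multi-NN controller.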

Paper Structure (10 sections, 5 equations, 5 figures, 4 tables, 1 algorithm)


Introduction
Problem Statement
Methods
Controller Architecture
Algorithm Design
Controller Training
Results
Training and Evaluation
Discussion
Conclusion

Figures (5)

Figure 1: Block diagram showing the relation between the control signals $\textbf{V}(t)$ and the measured probability distribution $\textbf{P}(t)$ of the quantum state of the system. The unknown classical distortions $\mathcal{E}$ change the control signals $\textbf{V}(t)$ into distorted control signals $\mathcal{V}(t)$ before they affect the system Hamiltonian $H(t)$. The dependence $\mathcal{H}$ of $H(t)$ on $\mathcal{V}(t)$ is also unknown. The evolution unitary operator $U(t,0)$, which is the time-ordered matrix exponential of the system Hamiltonian $H(t)$, acts on the quantum system state to evolve it from $\ket{\psi(0)}$ to $\ket{\psi(t)}$. We obtain $\textbf{P}(t)$ by measuring $\ket{\psi(t)}$.
Figure 2: The control loop used to control the quantum system. The inset shows the controller architecture. The controller (represented by the set of NNs) outputs the control signals $\textbf{V}(t)$ that are applied to the system. The measured state probability distribution of the system $\textbf{P}(t)$ is looped back to be the input to the controller along with the target distribution $\textbf{P}_{\text{target}}(t)$ desired at the moment. The controller consists of a set of NNs. It comprises a fully-connected feedforward NN for each desired target state probability distribution that we want to achieve. This NN can bring the system from an initial state probability distribution to the corresponding target distribution. The controller has a selector that selects the corresponding NN according to the target state probability distribution desired at the moment. This proposed controller architecture can handle any number of desired target distributions.
Figure 3: The trained NN controls the chip to bring its output to the corresponding target distribution within the episode time (10 msec), compared with applying constant step voltages to the chip electrodes that achieve the same target distribution within the same time limit. The left column shows the first waveguide output power ratio. The right column shows the control voltages generated by the trained NN and applied to the chip electrodes.
Figure 4: Histogram of the fidelity of the sequences generated by the controller versus those generated by the step voltages listed in Table \ref{tab3}. The duration of each sequence is 50 msec (5 episodes), where each target distribution in the sequence lasts for 10 msec (1 episode). We generated all 120 possible permutations of these target distributions. In (a) the fidelity is averaged over the whole sequence, in (b) it is averaged over the first 5 msec of each episode in the sequence (which contain most of the transients), and in (c) it is averaged over the last 5 msec of each episode.
Figure 5: Sequences generated by the controller versus the same sequences generated by the step voltages listed in Table \ref{tab3}. The duration of each sequence is 50 msec (5 episodes), where each target distribution in the sequence lasts for 10 msec (1 episode). (a) is the sequence with the lowest fidelity (averaged over the whole sequence), (b) is the one with the mean fidelity, and (c) is the one with the highest fidelity. The right column shows the first waveguide output power ratio. The left column shows the control voltages generated by the controller and applied to the chip electrodes.
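
The signal chain in Figure 1 and the sequence count in Figure 4 can be sketched numerically as follows. Everything specific here is an assumption made for illustration: the distortion model (`distort`), the tridiagonal Hamiltonian (`hamiltonian`), the units ($\hbar = 1$), and the classical fidelity formula are hypothetical stand-ins, since the captions state that the true distortion and Hamiltonian dependence are unknown and do not define the fidelity measure.

```python
import itertools
import numpy as np

def distort(v, gain=0.9, v_sat=5.0):
    """Hypothetical distortion E: attenuation followed by soft saturation."""
    return v_sat * np.tanh(gain * v / v_sat)

def hamiltonian(v_dist, coupling=1.0):
    """Hypothetical H(V): nearest-neighbour coupled waveguides with
    voltage-dependent diagonal terms (real symmetric, hence Hermitian)."""
    n = len(v_dist)
    off = coupling * np.ones(n - 1)
    return np.diag(v_dist.astype(float)) + np.diag(off, 1) + np.diag(off, -1)

def evolve(psi0, voltages, dt):
    """Approximate U(t,0) by a time-ordered product of short-step
    propagators, holding the control constant over each step dt (hbar = 1)."""
    psi = psi0.astype(complex)
    for v in voltages:  # the earliest step is applied first
        evals, evecs = np.linalg.eigh(hamiltonian(distort(v)))
        step = evecs @ np.diag(np.exp(-1j * evals * dt)) @ evecs.conj().T
        psi = step @ psi
    return psi

def probabilities(psi):
    """Measurement statistics P(t) in the basis of the state vector."""
    return np.abs(psi) ** 2

def fidelity(p, q):
    """Classical fidelity between two distributions (an assumed measure)."""
    return float(np.sum(np.sqrt(p * q)) ** 2)

# Constant step voltages on a 3-waveguide toy system, 100 time steps.
psi0 = np.array([1.0, 0.0, 0.0])
voltages = np.tile([1.0, 0.0, -1.0], (100, 1))
p_out = probabilities(evolve(psi0, voltages, dt=0.01))

# Five target distributions ordered in every possible way: 5! = 120 sequences.
sequences = list(itertools.permutations(range(5)))
```

A model-free controller only ever sees `p_out`; `distort` and `hamiltonian` play the role of the hidden blocks $\mathcal{E}$ and $\mathcal{H}$ that it never inspects directly.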
