Quantum reinforcement learning in continuous action space
Shaojun Wu, Shan Jin, Dingding Wen, Donghong Han, Xiaoting Wang
TL;DR
This work tackles quantum reinforcement learning in continuous action spaces by introducing a quantum Deep Deterministic Policy Gradient (DDPG) framework that uses variational quantum neural networks to model both the policy and the value function. The approach enables single-shot quantum state generation: a one-time optimization yields a model that outputs a sequence of parametric unitaries $U_a(\bm\theta_t)$ capable of driving any initial state $|s_0\rangle$ to a target $|s_d\rangle$, while the inverse sequence recovers $|s_0\rangle$ from $|s_d\rangle$. The authors apply the method to quantum state generation and eigenvalue problems by embedding the environment in quantum registers and using quantum phase estimation as part of the reward structure, achieving overlaps $p_{t+1}$ close to 1 in simulations of one- and two-qubit systems. A complexity analysis shows that the method requires $K=\mathcal{O}(1/\epsilon^2)$ measurements and that its gate complexity scales with problem size similarly to other hybrid quantum-classical methods such as VQE, highlighting its potential for near-term quantum devices and broader quantum-control tasks.
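The reversibility of the control sequence and the $1/\epsilon^2$ measurement scaling can be illustrated with a minimal single-qubit sketch. It is not the authors' implementation: the gate form $U_a(\bm\theta) = R_z(\theta_2) R_y(\theta_1)$, the helper names, and the randomly drawn angles (standing in for the output of a trained policy) are all assumptions made for illustration. The shot count assumes the measurements are repeated shots estimating a probability such as the overlap, whose sample mean has variance at most $1/(4K)$.

```python
import cmath
import math
import random


def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(u):
    """Conjugate transpose of a 2x2 matrix (the inverse of a unitary)."""
    return [[u[j][i].conjugate() for j in range(2)] for i in range(2)]


def apply(u, state):
    """Apply a 2x2 matrix to a single-qubit state vector."""
    return [u[0][0] * state[0] + u[0][1] * state[1],
            u[1][0] * state[0] + u[1][1] * state[1]]


def action_unitary(theta):
    """Hypothetical action U_a(theta) = Rz(theta[1]) Ry(theta[0]).

    The parametrization is an assumption; the summary does not fix one.
    """
    ty, tz = theta
    c, s = math.cos(ty / 2), math.sin(ty / 2)
    ry = [[complex(c), complex(-s)], [complex(s), complex(c)]]
    rz = [[cmath.exp(-1j * tz / 2), 0j], [0j, cmath.exp(1j * tz / 2)]]
    return matmul(rz, ry)


def overlap(a, b):
    """Overlap |<a|b>|^2 between two normalized states."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2


def shots_for_std_error(eps):
    """Shots K so that a Bernoulli sample mean has standard error <= eps.

    The variance of the mean is p(1-p)/K <= 1/(4K), hence K = ceil(1/(4 eps^2)).
    """
    return math.ceil(1.0 / (4.0 * eps ** 2))


# Random angles stand in for the actions a trained policy would output.
rng = random.Random(0)
thetas = [(rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi))
          for _ in range(5)]

# Forward pass: apply U_a(theta_1), ..., U_a(theta_T) to |s_0> = |0>.
s0 = [1 + 0j, 0j]
s_d = s0
for theta in thetas:
    s_d = apply(action_unitary(theta), s_d)

# Backward pass: apply the inverse unitaries in reverse order to recover |s_0>.
recovered = s_d
for theta in reversed(thetas):
    recovered = apply(dagger(action_unitary(theta)), recovered)

print("overlap(s0, recovered) =", overlap(s0, recovered))
print("shots for eps = 0.125  :", shots_for_std_error(0.125))
print("shots for eps = 0.0625 :", shots_for_std_error(0.0625))
```

Halving `eps` quadruples the number of shots, which is the quadratic scaling stated above; the constant factor of 1/4 is specific to this Bernoulli bound and is not taken from the paper.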
Abstract
Quantum reinforcement learning (QRL) is a promising paradigm for near-term quantum devices. While existing QRL methods have shown success in discrete action spaces, extending these techniques to continuous domains is challenging due to the curse of dimensionality introduced by discretization. To overcome this limitation, we introduce a quantum Deep Deterministic Policy Gradient (DDPG) algorithm that efficiently addresses both classical and quantum sequential decision problems in continuous action spaces. Moreover, our approach facilitates single-shot quantum state generation: a one-time optimization produces a model that outputs the control sequence required to drive a fixed initial state to any desired target state. In contrast, conventional quantum control methods demand separate optimization for each target state. We demonstrate the effectiveness of our method through simulations and discuss its potential applications in quantum control.
