Quantum Markov Decision Processes: Dynamic and Semi-Definite Programs for Optimal Solutions
Naci Saldi, Sina Sanjari, Serdar Yüksel
TL;DR
The paper develops semi-definite programming (SDP) methods for solving quantum Markov decision processes (q-MDPs) under a discounted criterion, focusing on two policy classes: open-loop policies and classical-state-preserving closed-loop policies (qw-MDP). By establishing a duality between the dynamic programming (DP) and SDP formulations, the authors show that the optimal value functions are linear in the state (a density operator) and that stationary optimal policies exist in both policy classes. They provide DP operators, SDP formulations, dual problems, and practical approximation schemes, including bi-linear programs for computing stationary policies and finite-state approximations of the value function. The framework unifies classical MDP techniques with quantum dynamics, extends to quantum-classical policy embeddings, and offers tractable computational tools, while pointing to future directions such as solving the resulting non-convex bi-linear problems and mean-field extensions with potential quantum advantages.
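Why a value function can be linear in a density operator is easiest to see in the simplest case: a fixed stationary policy that applies one known quantum channel N at every step, with stage cost Tr(C ρ_t). Moving the channel onto the cost observable (the Heisenberg-picture adjoint N*) gives a discounted cost of Tr(W ρ), where W solves W = C + β N*(W). The sketch below checks this numerically. It assumes a qubit amplitude-damping channel, a diagonal cost observable and β = 0.9, and all function names are hypothetical. It illustrates linearity of a fixed policy's cost only. It is not the paper's q-MDP model or its proof that the optimal value function is linear.

```python
import numpy as np

# Hypothetical example: a qubit amplitude-damping channel applied at every step
# (a fixed stationary policy), with cost observable C and discount factor beta.

def amplitude_damping_kraus(gamma):
    """Kraus operators K_0, K_1 of the qubit amplitude-damping channel."""
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]], dtype=complex)
    k1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]], dtype=complex)
    return [k0, k1]

def apply_channel(kraus, rho):
    """Schroedinger picture: N(rho) = sum_i K_i rho K_i^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)

def apply_adjoint(kraus, w):
    """Heisenberg picture: N*(W) = sum_i K_i^dagger W K_i, so Tr(W N(rho)) = Tr(N*(W) rho)."""
    return sum(k.conj().T @ w @ k for k in kraus)

def value_operator(kraus, cost, beta, tol=1e-12, max_iter=100_000):
    """Solve W = C + beta * N*(W) by fixed-point iteration (a contraction for beta < 1)."""
    w = np.zeros_like(cost)
    for _ in range(max_iter):
        w_next = cost + beta * apply_adjoint(kraus, w)
        if np.max(np.abs(w_next - w)) < tol:
            return w_next
        w = w_next
    raise RuntimeError("fixed-point iteration did not converge")

def rollout_cost(kraus, cost, beta, rho, horizon=2000):
    """Truncated discounted sum of beta^t Tr(C rho_t) with rho_{t+1} = N(rho_t)."""
    total = 0.0
    for t in range(horizon):
        total += beta**t * np.trace(cost @ rho).real
        rho = apply_channel(kraus, rho)
    return total

kraus = amplitude_damping_kraus(gamma=0.3)
cost = np.diag([0.0, 1.0]).astype(complex)  # penalize the population of |1>
beta = 0.9
W = value_operator(kraus, cost, beta)

def value(rho):
    """Discounted cost of the fixed policy from initial state rho, read off as Tr(W rho)."""
    return np.trace(W @ rho).real

rho1 = np.array([[0.0, 0.0], [0.0, 1.0]], dtype=complex)       # |1><1|
rho2 = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]], dtype=complex)  # |+><+|
mix = 0.25 * rho1 + 0.75 * rho2

print(value(rho1), rollout_cost(kraus, cost, beta, rho1))   # both approx 1/(1 - 0.9*0.7)
print(value(mix), 0.25 * value(rho1) + 0.75 * value(rho2))  # equal: cost is linear in rho
```

Solving for the operator W once and then evaluating Tr(W ρ) replaces a separate rollout for every initial state. This is the computational payoff of linearity in the state.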
Abstract
In this paper, building on the formulation of quantum Markov decision processes (q-MDPs) presented in our previous work [N. Saldi, S. Sanjari, and S. Yüksel, Quantum Markov Decision Processes: General Theory, Approximations, and Classes of Policies, SIAM Journal on Control and Optimization, 2024], we focus on developing semi-definite programming approaches for computing optimal policies and value functions for both open-loop and classical-state-preserving closed-loop policies. First, using the duality between the dynamic programming and semi-definite programming formulations of any q-MDP with open-loop policies, we establish that the optimal value function is linear and that a stationary optimal policy exists among open-loop policies. Then, using these results, we develop a method for computing an approximately optimal value function and formulate the computation of an optimal stationary open-loop policy as a bi-linear program. Next, we turn to classical-state-preserving closed-loop policies. We establish dynamic programming and semi-definite programming formulations for this class, and, as in the open-loop case, the duality between the two formulations enables us to prove that the optimal value function is linear and that an optimal stationary classical-state-preserving closed-loop policy exists. Finally, similar to the open-loop case, we establish a method for computing the optimal value function and pose the computation of optimal stationary classical-state-preserving closed-loop policies as a bi-linear program.
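In the classical special case, the DP/SDP duality used here reduces to the standard linear-programming formulation of a discounted MDP. The primal LP runs over value functions, and its dual runs over discounted state-action occupation measures. Both attain the same optimum as dynamic programming. The sketch below checks this on a small made-up MDP using SciPy's `linprog`. The transition probabilities, costs and helper names are hypothetical. In the paper's quantum SDP formulation, operators take the place of these vectors, and that formulation is not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 3-state, 2-action discounted-cost MDP (numbers made up for illustration).
# P[a, x, y] = Pr(next state y | state x, action a); c[x, a] = stage cost.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    [[0.1, 0.1, 0.8], [0.5, 0.4, 0.1], [0.0, 0.2, 0.8]],
])
c = np.array([[1.0, 2.0], [0.5, 3.0], [2.0, 0.2]])
beta = 0.9
n_states, n_actions = c.shape

def value_iteration(P, c, beta, tol=1e-10):
    """Dynamic programming: iterate v <- min_a [c(x, a) + beta * sum_y P(y|x,a) v(y)]."""
    v = np.zeros(c.shape[0])
    while True:
        q = c + beta * np.einsum("axy,y->xa", P, v)
        v_next = q.min(axis=1)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next, q.argmin(axis=1)
        v = v_next

# Primal LP: maximize sum_x v(x) s.t. v(x) - beta * sum_y P(y|x,a) v(y) <= c(x, a) for all (x, a).
# Row x * n_actions + a of A encodes the constraint for the pair (x, a).
A = np.zeros((n_states * n_actions, n_states))
for x in range(n_states):
    for a in range(n_actions):
        A[x * n_actions + a] = np.eye(n_states)[x] - beta * P[a, x]
primal = linprog(-np.ones(n_states), A_ub=A, b_ub=c.flatten(),
                 bounds=[(None, None)] * n_states, method="highs")

# Dual LP over discounted occupation measures mu(x, a) >= 0:
# minimize sum_{x,a} c(x, a) mu(x, a) s.t. A^T mu = 1 (one equation per state).
dual = linprog(c.flatten(), A_eq=A.T, b_eq=np.ones(n_states),
               bounds=[(0, None)] * (n_states * n_actions), method="highs")

v_dp, policy = value_iteration(P, c, beta)
print("DP value:    ", v_dp)
print("LP value:    ", primal.x)
print("dual optimum:", dual.fun, "sum of DP values:", v_dp.sum())
print("greedy stationary policy:", policy)
```

The dual constraint matrix is the transpose of the primal one, which is the LP counterpart of the primal/dual SDP pair described in the abstract. Reading a stationary policy off the optimal value function by a greedy argmin mirrors, in this classical setting, the existence of stationary optimal policies.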
