Quantum Markov Decision Processes: Dynamic and Semi-Definite Programs for Optimal Solutions

Naci Saldi; Sina Sanjari; Serdar Yuksel

TL;DR

The paper develops SDP-based methods for solving quantum MDPs (q-MDPs) under discounting, focusing on two policy classes: open-loop and classical-state-preserving closed-loop (qw-MDP). By establishing a duality between dynamic programming and SDP formulations, the authors show that optimal value functions are linear in the state (density operator) and that stationary optimal policies exist for both policy classes. They provide DP operators, SDP formulations, dual problems, and practical approximation schemes, including bi-linear programs to compute stationary policies and finite-state approximations of the value function. The framework unifies classical MDP techniques with quantum dynamics, extends to quantum-classical policy embeddings, and offers tractable computational tools while pointing to future directions such as solving non-convex bi-linear problems and mean-field extensions with potential quantum advantages.

Abstract

In this paper, building on the formulation of quantum Markov decision processes (q-MDPs) presented in our previous work [N. Saldi, S. Sanjari, and S. Yüksel, Quantum Markov Decision Processes: General Theory, Approximations, and Classes of Policies, SIAM Journal on Control and Optimization, 2024], we develop semi-definite programming approaches for computing optimal policies and value functions within two policy classes: open-loop policies and classical-state-preserving closed-loop policies. First, by using the duality between the dynamic programming and the semi-definite programming formulations of any q-MDP with open-loop policies, we establish that the optimal value function is linear and that there exists a stationary optimal policy among open-loop policies. Then, using these results, we establish a method for computing an approximately optimal value function and formulate the computation of an optimal stationary open-loop policy as a bi-linear program. Next, we turn our attention to classical-state-preserving closed-loop policies. Dynamic programming and semi-definite programming formulations for classical-state-preserving closed-loop policies are established, where duality of these two formulations similarly enables us to prove that the optimal value function is linear and that there exists an optimal stationary classical-state-preserving closed-loop policy. Then, similar to the open-loop case, we establish a method for computing the optimal value function and pose the computation of optimal stationary classical-state-preserving closed-loop policies as a bi-linear program.
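As a sketch of what linearity of the value function means here, assume a finite-dimensional state space $\mathcal{H}_{\mathsf X}$. The operator $W$ below is notation introduced only for this illustration and is not taken from the paper. A real-valued linear function on density operators can then be represented by a single Hermitian operator:

```latex
V^*(\rho) = \mathrm{Tr}(\rho\, W), \qquad W = W^\dagger \ \text{acting on } \mathcal{H}_{\mathsf X}.
```

Under this reading, computing the value function amounts to finding one operator $W$ rather than a function over all density operators, which is the kind of unknown a semi-definite program can optimize over.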

Paper Structure (26 sections, 17 theorems, 150 equations, 1 figure, 2 algorithms)

Introduction
Contributions
Quantum Markov Decision Processes
Deterministic Reduction of Classical MDPs
Quantum Systems
Quantum Markov Decision Processes
Open-Loop and Classical-state-preserving Closed-loop Policies
Open-loop Quantum Policies
Classical Quantum Policies
Classical-state-preserving Closed-loop Quantum Policies
Algorithms for Open-loop Quantum Policies
Dynamic Programming for q-MDP with Open-loop Policies
SDP Formulation of q-MDP with Open-loop Policies
Computation of Optimal Cost Function and Policy
Algorithms for Classical-state-preserving Closed-loop Quantum Policies
...and 11 more sections

Key Result

Proposition 1

Let $\gamma: \mathcal{D}(\mathcal{H}_{\mathsf X}) \rightarrow \mathcal{D}(\mathcal{H}_{\mathsf X} \otimes \mathcal{H}_{\mathsf A})$ be a quantum channel. It satisfies the reversibility condition: $\mathrm{Tr}_{\mathsf A}(\gamma(\rho)) = \rho$ for all $\rho \in \mathcal{D}(\mathcal{H}_{\mathsf X})$ ...
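The reversibility condition $\mathrm{Tr}_{\mathsf A}(\gamma(\rho)) = \rho$ can be checked numerically on small examples. The sketch below assumes, purely for illustration, a channel of product form $\gamma(\rho) = \rho \otimes \pi$ with a fixed action state $\pi$. The matrices `rho` and `pi` and all function names are made up for this example and do not come from the paper.

```python
# Minimal sketch: verify Tr_A(gamma(rho)) = rho for a product-form channel.
# Assumption: gamma(rho) = rho ⊗ pi, with pi a fixed density matrix on H_A.
# rho and pi are hypothetical example states, not taken from the paper.

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def partial_trace_A(sigma, dim_x, dim_a):
    """Trace out the second (action) factor of an operator on H_X ⊗ H_A."""
    return [[sum(sigma[i * dim_a + k][j * dim_a + k] for k in range(dim_a))
             for j in range(dim_x)]
            for i in range(dim_x)]

def gamma(rho, pi):
    """Product-form channel: attach the fixed action state pi to rho."""
    return kron(rho, pi)

def is_reversible_on(rho, pi, tol=1e-12):
    """Check the reversibility condition for one input state rho."""
    dim_x, dim_a = len(rho), len(pi)
    reduced = partial_trace_A(gamma(rho, pi), dim_x, dim_a)
    return all(abs(reduced[i][j] - rho[i][j]) < tol
               for i in range(dim_x) for j in range(dim_x))

# Example qubit state (Hermitian, unit trace, positive semi-definite).
rho = [[0.75, 0.25j], [-0.25j, 0.25]]
# Example action state: the pure state |+><+|.
pi = [[0.5, 0.5], [0.5, 0.5]]

print(is_reversible_on(rho, pi))
```

The check passes for any such product channel because the partial trace of $\rho \otimes \pi$ over the action space equals $\rho \, \mathrm{Tr}(\pi) = \rho$.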

Figures (1)

Figure 1: Hierarchy of policies: (i) (History-dependent) quantum policies utilize all historical information (such policies may not be physically realizable; see Remark \ref{realizability}). (ii) Markov quantum policies rely solely on current state information without any additional constraints (and may not be physically realizable). (iii) Classical-state-preserving closed-loop quantum policies use the current state information but with a relaxed invertibility condition on the quantum channel, mirroring the classical invertibility condition. (iv) Open-loop quantum policies use current state information but impose a stricter invertibility condition on the quantum channel, again reflecting the classical case. (v) Classical quantum policies represent an embedding of classical policies within a quantum framework. We refer the reader to Remark \ref{realizability} on physical realizability.

Theorems & Definitions (41)

Definition 1
Definition 2
Remark 1
Definition 3
Proposition 1
Definition 4
Definition 5
Definition 6
Proposition 2
Remark 2
...and 31 more
