Intermittently Observable Markov Decision Processes

Gongpu Chen, Soung-Chang Liew

TL;DR

This work tackles optimal control of Markov decision processes under intermittent state information caused by unreliable channels, modeling information loss with a Bernoulli parameter $\rho$. It develops a belief MDP ${\cal B}(\rho)$ to study how the value changes under information losses and a tree MDP ${\cal C}(\rho)$ to enable finite-state approximations. The authors introduce the TA($L$) and higher-order TA($n,L$) truncations, along with a nested value iteration (NVI) method that speeds up convergence and scales to larger problems. Numerical results, including an unmanned boat case study, demonstrate the effectiveness and scalability of the proposed approaches for intermittent observability in wireless sensing scenarios.

Abstract

This paper investigates MDPs with intermittent state information. We consider a scenario where the controller perceives the state information of the process via an unreliable communication channel. The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process. Hence, the problem is finding an optimal policy for selecting actions in the presence of state information losses. We first formulate the problem as a belief MDP to establish structural results. The effect of state information losses on the expected total discounted reward is studied systematically. Then, we reformulate the problem as a tree MDP whose state space is organized in a tree structure. Two finite-state approximations to the tree MDP are developed to find near-optimal policies efficiently. Finally, we put forth a nested value iteration algorithm for the finite-state approximations, which is proved to be faster than standard value iteration. Numerical results demonstrate the effectiveness of our methods.
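
The belief dynamics implied by a Bernoulli lossy channel can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes `rho` is the probability that the state is delivered (the opposite convention is also possible), that the delivery attempt happens after the transition, and the names `predict` and `belief_step` are hypothetical.

```python
import random

def predict(belief, P_a):
    """Push a belief through one transition: b'[j] = sum_i b[i] * P_a[i][j]."""
    n = len(belief)
    return [sum(belief[i] * P_a[i][j] for i in range(n)) for j in range(n)]

def belief_step(belief, P_a, rho, rng):
    """One belief update under intermittent state information.

    belief: list of probabilities over states
    P_a:    row-stochastic transition matrix of the chosen action
    rho:    ASSUMED probability that the next state reaches the controller
    rng:    a random.Random instance
    Returns (new_belief, delivered).
    """
    predicted = predict(belief, P_a)
    if rng.random() < rho:
        # Delivered: from the controller's view the revealed state is
        # distributed according to the predicted belief.
        s_next = rng.choices(range(len(predicted)), weights=predicted)[0]
        one_hot = [1.0 if j == s_next else 0.0 for j in range(len(predicted))]
        return one_hot, True
    # Lost: the controller can only carry the prediction forward.
    return predicted, False
```

Under these assumptions, after several consecutive losses the belief is simply the last delivered state pushed through the transition matrices of the actions taken since then.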

Paper Structure (14 sections, 12 theorems, 105 equations, 5 figures, 3 tables, 1 algorithm)

Key Result

Lemma 1

For any $\rho\in (0,1]$, $\phi ({\bf{b}},\rho )$ is a piecewise linear and convex function of $\mathbf{b} \in \Delta_{\cal S}$.
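
One way to read this property: a convex function on the simplex with finitely many linear pieces can be written as the pointwise maximum of finitely many linear functions of $\mathbf{b}$. The set $\Gamma(\rho)$ and the vectors $\alpha$ below are notation introduced here for illustration and may differ from the paper's.

$$\phi(\mathbf{b},\rho) = \max_{\alpha \in \Gamma(\rho)} \sum_{s \in {\cal S}} \alpha(s)\, b(s), \qquad \mathbf{b} \in \Delta_{\cal S}$$

Each $\alpha$ contributes one linear piece, and convexity follows because a maximum of linear functions is convex.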

Figures (5)

  • Figure 1: Sufficient histories organized in a tree structure (${\cal S}={\cal A}=\{1,2\}$).
  • Figure 2: Transition probabilities of taking action $a\in {\cal A}$ in $h\in {\cal G}_L$ (${\cal S}=\{1,2\}$).
  • Figure 3: State space of TA(1,3), ${\cal S}={\cal A}=\{1,2\},\mu^0_L(1)=1, \mu^0_L(2)=1$. The redundant position states at layer 1 and their descendants at layers 2 and 3 are removed.
  • Figure 4: Values and computation time of TA($L$) and TA($n,L$) policies.
  • Figure 5: Environment of the unmanned boat MDP.

Theorems & Definitions (16)

  • Lemma 1
  • Theorem 2
  • Corollary 3
  • Lemma 4
  • Theorem 5
  • Lemma 6
  • Definition 7: $L$-Truncated Approximation, TA($L$)
  • Theorem 8
  • Definition 9: TA($L$) Policy
  • Definition 10
  • ...and 6 more