Intermittently Observable Markov Decision Processes
Gongpu Chen, Soung-Chang Liew
TL;DR
This work tackles optimal control of Markov decision processes under intermittent state information caused by unreliable channels, modeling information loss with a Bernoulli parameter $\rho$. It develops a belief MDP ${\cal B}(\rho)$ to study how information losses affect the expected total discounted reward, and a tree MDP ${\cal C}(\rho)$ to enable finite-state approximations. The authors introduce the TA($L$) and higher-order TA($n,L$) truncations, along with a nested value iteration (NVI) method that speeds up convergence and scales to larger problems. Numerical results, including an unmanned boat case study, demonstrate the effectiveness and scalability of the proposed approaches for intermittent observability in wireless sensing scenarios.
Abstract
This paper investigates Markov decision processes (MDPs) with intermittent state information. We consider a scenario in which the controller receives the state information of the process over an unreliable communication channel. The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process. The problem is therefore to find an optimal policy for selecting actions in the presence of state information losses. We first formulate the problem as a belief MDP to establish structural results, and we systematically study the effect of state information losses on the expected total discounted reward. We then reformulate the problem as a tree MDP, whose state space is organized in a tree structure. Two finite-state approximations to the tree MDP are developed to find near-optimal policies efficiently. Finally, we put forth a nested value iteration algorithm for the finite-state approximations, which is proven to converge faster than standard value iteration. Numerical results demonstrate the effectiveness of our methods.
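To make the belief formulation concrete, the following is a minimal sketch of how a controller's belief could evolve when state reports are sometimes lost. It is an illustration under stated assumptions, not the paper's algorithm: the function name `belief_update`, the nested-list layout of the transition matrices `P`, and the convention that a lost report is passed as `None` are all hypothetical. Whether a report arrives is treated as an input here, so the sketch does not depend on whether $\rho$ denotes the loss or the success probability.

```python
def belief_update(belief, P, action, observed_state=None):
    """One-step belief update under intermittent state observation.

    belief:          list of probabilities over states before acting
    P:               mapping action -> row-stochastic matrix (list of rows),
                     P[a][i][j] = Pr(next state j | state i, action a)
    action:          the action just taken
    observed_state:  index of the next state if the report arrived,
                     or None if it was lost (hypothetical convention)
    """
    n = len(belief)
    if observed_state is not None:
        # Report delivered: the controller knows the state exactly.
        return [1.0 if s == observed_state else 0.0 for s in range(n)]
    # Report lost: propagate the belief through the transition matrix.
    return [sum(belief[i] * P[action][i][j] for i in range(n))
            for j in range(n)]


# Illustrative two-state, one-action model (values are made up).
P = {0: [[0.9, 0.1],
         [0.2, 0.8]]}

b = [1.0, 0.0]                               # state 0 was just observed
b = belief_update(b, P, 0)                   # first report lost
b = belief_update(b, P, 0)                   # second report lost
b = belief_update(b, P, 0, observed_state=1) # report arrives: belief resets
```

In this sketch, a belief reached after consecutive losses is fully determined by the last observed state and the actions taken since, which is one way to see why a tree-structured state space is a natural reformulation.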
