
Strategies in POMDPs with Stage Duration

Ivan Novikov

Abstract

Partially observable Markov decision processes (POMDPs) with stage duration provide a framework for approximating continuous-time behavior by scaling transition probabilities with a stage duration parameter $h \in (0,1]$. While previous literature has primarily focused on the limit of the discounted value as the stage duration $h$ vanishes, this paper investigates the global behavior of the asymptotic value, $V(h)$, across varying stage durations. Our main result demonstrates that any strategy in a POMDP with stage duration $h$ can be mimicked in the base POMDP ($h=1$). Specifically, we provide an explicit construction showing that for any strategy in the POMDP with stage duration $h$, there exists a strategy in the base POMDP that secures the same asymptotic payoff. As a consequence of this theorem, we establish that the value function $V(h)$ is nondecreasing with respect to $h$, and that the continuous-time limit $\lim_{h \to 0} V(h)$ exists.
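The scaling mentioned in the abstract can be illustrated with a small sketch. It assumes the convention common in the stage-duration literature, where the kernel for duration $h$ is $(1-h)\,I + h\,P$: with probability $1-h$ the state stays put, and with probability $h$ it moves according to the base kernel $P$. The paper's own Definition 1 is not reproduced here, so this form is an assumption. The function name `scale_kernel` and the example matrix are hypothetical, and the sketch ignores actions and signals, which a full POMDP kernel would also carry.

```python
import numpy as np

def scale_kernel(P, h):
    """Return (1 - h) * I + h * P for a row-stochastic matrix P.

    Assumed scaling convention (not taken from the paper's Definition 1):
    the state stays put with probability 1 - h and moves according to
    the base kernel P with probability h.
    """
    P = np.asarray(P, dtype=float)
    if not 0.0 < h <= 1.0:
        raise ValueError("h must lie in (0, 1]")
    return (1.0 - h) * np.eye(P.shape[0]) + h * P

# Hypothetical two-state base kernel (the case h = 1).
P = np.array([[0.2, 0.8],
              [0.5, 0.5]])

# Kernel for stage duration h = 1/2: rows still sum to 1, and the
# diagonal mass grows because the state is more likely to stay put.
P_half = scale_kernel(P, 0.5)
```

Under this convention, setting `h = 1` returns the base kernel unchanged, which matches the abstract's description of the base POMDP as the case $h=1$.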

Paper Structure

This paper contains 5 sections, 4 theorems, 12 equations.

Key Result

Theorem 1

Let $h \in (0,1)$ and let $\sigma$ be a strategy in $G_h$. Then there exists a strategy $\widehat{\sigma}$ in $G_1$ such that

Theorems & Definitions (14)

  • Definition 1: POMDP with stage duration, Nov24b and Ney13
  • Remark 1
  • proof
  • Definition 2: cf. Ara93
  • Remark 2
  • proof
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Corollary 3
  • ...and 4 more