Strategies in POMDPs with Stage Duration

Ivan Novikov

Abstract

Partially observable Markov decision processes (POMDPs) with stage duration provide a framework for approximating continuous-time behavior by scaling transition probabilities with a stage duration parameter $h \in (0,1]$. While previous literature has primarily focused on the limit of the discounted value as the stage duration $h$ vanishes, this paper investigates the global behavior of the asymptotic value, $V(h)$, across varying stage durations. Our main result demonstrates that any strategy in a POMDP with stage duration $h$ can be mimicked in the base POMDP ($h=1$). Specifically, we provide an explicit construction showing that for any strategy in the POMDP with stage duration $h$, there exists a strategy in the base POMDP that secures the same asymptotic payoff. As a consequence of this theorem, we establish that the value function $V(h)$ is nondecreasing with respect to $h$, and that the continuous-time limit $\lim_{h \to 0} V(h)$ exists.
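To make the phrase "scaling transition probabilities with a stage duration parameter" concrete, here is a minimal sketch under one common convention from the short-stage-duration literature. That convention is an assumption here, since the abstract does not spell out the scaling: for a fixed action, the kernel with stage duration $h$ is taken to be $(1-h)\,\mathrm{Id} + hP$, and the stage payoff is taken to be $h\,g$. The names `scale_kernel` and `scale_payoff` are hypothetical, and the sketch ignores signals and the dependence on actions.

```python
from typing import List

Matrix = List[List[float]]


def scale_kernel(P: Matrix, h: float) -> Matrix:
    """Return (1 - h) * Id + h * P for a row-stochastic matrix P.

    Assumed convention: with stage duration h, the state changes
    according to P with weight h and stays put with weight 1 - h.
    """
    if not 0.0 < h <= 1.0:
        raise ValueError("stage duration h must lie in (0, 1]")
    n = len(P)
    return [
        [(1.0 - h) * (1.0 if i == j else 0.0) + h * P[i][j] for j in range(n)]
        for i in range(n)
    ]


def scale_payoff(g: float, h: float) -> float:
    """Assumed convention: the stage payoff is proportional to h."""
    return h * g


# Illustrative two-state kernel for one fixed action (made-up numbers).
P = [[0.2, 0.8], [0.6, 0.4]]
P_half = scale_kernel(P, 0.5)
```

With $h = 1$ the function returns the base kernel unchanged, which matches the abstract's description of $h=1$ as the base POMDP. For smaller $h$ each row stays a probability distribution, with more weight on remaining in the current state.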

Paper Structure (5 sections, 4 theorems, 12 equations)

Introduction
POMDPs with stage duration
Main results
Proof of Theorem 1
Construction of the strategy $\widehat{\sigma}$

Key Result

Theorem 1

Let $h \in (0,1)$ and let $\sigma$ be a strategy in $G_h$. Then there exists a strategy $\widehat{\sigma}$ in $G_1$ such that

Theorems & Definitions (14)

Definition 1: POMDP with stage duration, Nov24b and Ney13
Remark 1
proof
Definition 2: cf. Ara93
Remark 2
proof
Theorem 1
Corollary 1
Corollary 2
Corollary 3
...and 4 more
