ISC-POMDPs: Partially Observed Markov Decision Processes with Initial-State Dependent Costs

Timothy L. Molloy

TL;DR

ISC-POMDPs address objectives tied to an unknown initial state by introducing costs that depend on the initial state $X_0$ and its uncertainty. The authors combine the pair $(X_0, X_k)$ into a single Markovian augmented state $S_k$ and derive a recursive Bayesian smoother $\xi_k$ that tracks the joint posterior $p(X_0, X_k \mid Y^k, U^{k-1})$. They reformulate ISC-POMDPs as ($\rho$-)POMDPs with augmented belief costs $\overline{\rho}(\xi,u) = \psi(\xi,u) + \sum_s \xi(s) c(s,u)$ and show how to solve them via Bellman recursion, with concavity enabling PWLC approximations. Gridworld experiments demonstrate that ISC-POMDPs can actively reduce initial-state uncertainty to choose the correct goal, outperforming standard POMDPs. This framework enables optimization for tasks where the objective depends on unknown initial conditions, with potential impact in robotics and active perception.

Abstract

We introduce a class of partially observed Markov decision processes (POMDPs) with costs that can depend on both the value and (future) uncertainty associated with the initial state. These Initial-State Cost POMDPs (ISC-POMDPs) enable the specification of objectives relative to a priori unknown initial states, which is useful in applications such as robot navigation, controlled sensing, and active perception, that can involve controlling systems to revisit, remain near, or actively infer their initial states. By developing a recursive Bayesian fixed-point smoother to estimate the initial state that resembles the standard recursive Bayesian filter, we show that ISC-POMDPs can be treated as POMDPs with (potentially) belief-dependent costs. We demonstrate the utility of ISC-POMDPs, including their ability to select controls that resolve (future) uncertainty about (past) initial states, in simulation.
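The recursive smoother described above can be illustrated with a minimal sketch for a finite state space. It is not the paper's implementation: the array names `A_u` and `B_y`, the function names, and the convention that the first measurement arrives after the first transition are all assumptions made for illustration. The joint belief is stored as a matrix `xi[x0, x]`, so its row sums give the initial-state posterior and its column sums give the ordinary filter belief.

```python
import numpy as np


def init_joint_belief(prior):
    """Joint belief at k = 0: X_0 and X_k coincide, so mass sits on the diagonal."""
    return np.diag(np.asarray(prior, dtype=float))


def smoother_step(xi, A_u, B_y):
    """One recursive update of the joint belief over (X_0, X_k).

    xi  : (n, n) array, xi[x0, x] = p(X_0 = x0, X_k = x | history)
    A_u : (n, n) array, A_u[x, x_next] = p(x_next | x, u)   (assumed layout)
    B_y : (n,) array,   B_y[x_next]   = p(y | x_next)        (assumed layout)
    """
    predicted = xi @ A_u        # propagate X_k; the X_0 index is carried along unchanged
    unnorm = predicted * B_y    # weight each column by the measurement likelihood
    total = unnorm.sum()
    if total <= 0.0:
        raise ValueError("measurement has zero probability under the current belief")
    return unnorm / total


# Hypothetical two-state example.
prior = np.array([0.5, 0.5])
A_u = np.array([[0.9, 0.1], [0.2, 0.8]])
B_y = np.array([0.7, 0.1])
xi_1 = smoother_step(init_joint_belief(prior), A_u, B_y)
initial_state_posterior = xi_1.sum(axis=1)  # p(X_0 | y, u)
current_state_posterior = xi_1.sum(axis=0)  # p(X_1 | y, u)
```

Because the update has the same predict-then-correct shape as a standard Bayesian filter, the only extra cost in this sketch is carrying the additional `x0` index through each step.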

Paper Structure

This paper contains 10 sections, 5 theorems, 26 equations, 2 figures, 1 table.

Key Result

Lemma III.1

Under the constraints in the ISC-POMDP eq:iscpomdp, the initial augmented-state probabilities satisfy for $s \in \mathcal{S}$ and $x \in \mathcal{X}$; the augmented state-transition probabilities satisfy for $s, \bar{s} \in \mathcal{S}$, $x,\bar{x}, x_0 \in \mathcal{X}$, and $u \in \mathcal{U}$; and, the augmented-state measurement probabilities satisfy for $s = \mathcal{L}(x_0, x) \in \mathcal

Figures (2)

  • Figure 1: Simulation Experiment: (a) Agent must move to the goal in the corner of the quadrant containing the initial state $X_0$ (agent shown must move to Q2 Goal). (b) Realizations with the POMDP moving to the corner closest to location $X_k$ for $k = 2$ but the ISC-POMDP taking steps to estimate $X_0$ and then moving to the correct goal (Q4 Goal).
  • Figure 2: Simulation Results: (a) Entropy $H(X_0 | y^k, u^{k-1})$ of the initial-state posterior pmf $p(x_0 | y^k, u^{k-1})$. (b) Probability at the (true) initial state $X_0$ of the posterior pmf $p(x_0 | y^k, u^{k-1})$.
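
The two quantities plotted in Figure 2 can be computed from a joint belief over $(X_0, X_k)$ by marginalizing out $X_k$. The sketch below assumes the joint belief is stored as a matrix `xi[x0, x]` and uses natural logarithms; the function names and the logarithm base are illustrative choices, not taken from the paper.

```python
import numpy as np


def initial_state_posterior(xi):
    """Marginalize the joint belief xi[x0, x] over x to get p(x0 | history)."""
    return np.asarray(xi, dtype=float).sum(axis=1)


def entropy_nats(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log(nonzero)))


def prob_at_true_initial_state(xi, x0_true):
    """Posterior probability assigned to the true initial state index."""
    return float(initial_state_posterior(xi)[x0_true])


# Hypothetical joint belief that is uninformative about X_0.
xi = np.full((2, 2), 0.25)
h = entropy_nats(initial_state_posterior(xi))
p_true = prob_at_true_initial_state(xi, x0_true=0)
```

Lower entropy together with higher probability at the true initial state would indicate that the controls have helped resolve where the agent started.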

Theorems & Definitions (9)

  • Lemma III.1
  • proof
  • Lemma III.2
  • proof
  • Theorem III.1
  • proof
  • Corollary III.1
  • Theorem III.2
  • proof