Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning

Ali Devran Kara, Serdar Yuksel

TL;DR

This work surveys advances in partially observed Markov decision processes, focusing on regularity properties (weak Feller, Wasserstein contraction) and filter stability that enable existence results for discounted and average-cost criteria. It develops and analyzes practical approximation schemes (belief-space quantization, finite-window belief-MDPs) with explicit error bounds, and extends these ideas to reinforcement learning, showing convergence to near-optimal policies under both discounted and average-cost criteria. The results address robustness to incorrect priors and models and provide guidance for data-driven learning in POMDPs, including finite-memory and quantized-learning approaches that bridge theory and practical algorithms. Together, these contributions advance rigorous understanding of optimality, approximation, and learning in partially observed settings, with implications for real-world control under uncertainty.

Abstract

In this review and tutorial article, we present recent progress on the optimal control of partially observed Markov decision processes (POMDPs). We first present regularity and continuity conditions for POMDPs and their belief-MDP reductions: weak Feller regularity, Wasserstein regularity, and controlled filter stability. These conditions are then used to establish the existence of optimal policies for both discounted and average-cost problems, as well as regularity of the value functions. Next, we study rigorous approximation results involving quantization-based finite-model approximations and finite-window approximations under controlled filter stability. Finally, we present several recent reinforcement-learning results that rigorously establish convergence to near optimality under both criteria.

Paper Structure

This paper contains 24 sections, 18 theorems, 66 equations.

Key Result

Theorem II.1

[FeKaZg14] Under Assumption TV_channel, the transition probability $\eta(\cdot|z,u)$ of the filter process is weakly continuous in $(z,u)$.
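
To make the object in this theorem concrete, here is a minimal sketch of the filter (belief) update and the induced kernel $\eta(\cdot|z,u)$ for a finite POMDP. It is an illustration only, not the paper's construction: the finite state, action, and observation spaces, the array names `T` and `Q`, the function names, and the convention that the observation is generated from the next state are all assumptions made for this example.

```python
import numpy as np

def belief_update(z, u, y, T, Q):
    """One filter step for a finite POMDP (illustrative assumptions).

    z: current belief over states, shape (nX,)
    T: T[u][x, x2] = P(x2 | x, u), shape (nU, nX, nX)   (assumed layout)
    Q: Q[x2, y]   = P(y | x2),     shape (nX, nY)       (assumed layout)
    Returns the next belief and the probability of observing y.
    """
    predicted = z @ T[u]              # distribution of the next state
    unnormalized = predicted * Q[:, y]  # weight by observation likelihood
    p_y = float(unnormalized.sum())
    if p_y == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / p_y, p_y

def filter_kernel(z, u, T, Q):
    """Atoms of eta(.|z,u): a list of (next_belief, probability) pairs,
    one per observation that occurs with positive probability."""
    atoms = []
    for y in range(Q.shape[1]):
        try:
            atoms.append(belief_update(z, u, y, T, Q))
        except ValueError:
            continue  # skip observations that cannot occur from (z, u)
    return atoms

if __name__ == "__main__":
    # Hypothetical two-state, one-action, two-observation model.
    T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
    Q = np.array([[0.7, 0.3], [0.4, 0.6]])
    z = np.array([0.5, 0.5])
    for belief, prob in filter_kernel(z, 0, T, Q):
        print(prob, belief)
```

In this finite setting, $\eta(\cdot|z,u)$ is a distribution over at most as many beliefs as there are observations, and averaging the next beliefs under it recovers the one-step predicted state distribution.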

Theorems & Definitions (32)

  • Theorem II.1
  • Theorem II.2
  • Definition II.1
  • Theorem II.3
  • Remark II.1
  • Definition II.2
  • Remark II.2
  • Definition II.3
  • Theorem II.4
  • Definition II.4
  • ...and 22 more