Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning
Ali Devran Kara, Serdar Yuksel
TL;DR
This review surveys recent advances in partially observed Markov decision processes (POMDPs), focusing on regularity properties (weak Feller, Wasserstein contraction) and filter stability, which enable existence results for the discounted and average-cost criteria. It develops and analyzes practical approximation schemes (belief-space quantization, finite-window belief-MDPs) with explicit error bounds, and extends these ideas to reinforcement learning, showing convergence to near-optimal policies under both criteria. The results address robustness to incorrect priors and models, and provide guidance for data-driven learning in POMDPs, including finite-memory and quantized-learning approaches that connect the theory to practical algorithms.
Abstract
In this review/tutorial article, we present recent progress on the optimal control of partially observed Markov decision processes (POMDPs). We first present regularity and continuity conditions for POMDPs and their belief-MDP reductions; these comprise weak Feller regularity, Wasserstein regularity, and controlled filter stability. These conditions are then used to establish the existence of optimal policies for both discounted and average-cost problems, as well as regularity of the value functions. Next, we study rigorous approximation results involving quantization-based finite-model approximations, as well as finite-window approximations under controlled filter stability. Finally, we present several recent theoretical reinforcement learning results that rigorously establish convergence to near-optimality under both criteria.
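As a rough illustration of the belief-MDP reduction and belief-space quantization mentioned above, the following minimal sketch assumes a finite POMDP with known transition and observation matrices. The names `belief_update` and `quantize_belief`, the data layout, and the largest-remainder rounding rule are assumptions made for this example only; they are not taken from the article, whose quantizer and setting (general state spaces) may differ.

```python
def belief_update(belief, action, obs, T, O):
    """One step of the nonlinear filter (Bayes update) for a finite POMDP.

    belief       : list of probabilities over states (the current belief)
    T[a][x][x2]  : probability of moving from state x to x2 under action a
    O[x2][y]     : probability of observing y when the new state is x2
    """
    n = len(belief)
    # Prediction step: push the belief through the transition kernel.
    predicted = [sum(belief[x] * T[action][x][x2] for x in range(n))
                 for x2 in range(n)]
    # Correction step: weight by the observation likelihood, then normalize.
    unnorm = [predicted[x2] * O[x2][obs] for x2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnorm]


def quantize_belief(belief, resolution):
    """Map a belief to a nearby point of the grid {k / resolution} on the simplex.

    Uses largest-remainder rounding (an illustrative choice) so that the
    quantized vector is still a probability vector.
    """
    scaled = [p * resolution for p in belief]
    counts = [int(s) for s in scaled]              # round each entry down
    leftover = max(0, resolution - sum(counts))    # grid units still to assign
    # Give the leftover units to the entries with the largest fractional parts.
    order = sorted(range(len(belief)),
                   key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return tuple(c / resolution for c in counts)


# Hypothetical two-state, one-action, two-observation model.
T = {0: [[0.9, 0.1], [0.2, 0.8]]}
O = [[0.8, 0.2], [0.3, 0.7]]

posterior = belief_update([0.5, 0.5], action=0, obs=0, T=T, O=O)
grid_point = quantize_belief(posterior, resolution=10)
```

The belief (posterior) plays the role of the state in the belief-MDP, and the quantized belief is the state of a finite approximate model. A finer `resolution` gives a larger finite model with a smaller quantization error.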
