Weathering Ongoing Uncertainty: Learning and Planning in a Time-Varying Partially Observable Environment
Gokul Puthumanaillam, Xiangyu Liu, Negar Mehr, Melkior Ornik
TL;DR
This work addresses autonomous decision-making in environments that are stochastic, time-varying, and partially observable. It introduces Time-Varying POMDPs (TV-POMDPs) and Memory Prioritized State Estimation (MPSE), which estimates the time-varying transition function $T_t$ from prioritized memory within a convex optimization framework. An MPSE-integrated planning strategy updates beliefs with the estimated $T_t$ and optimizes long-term rewards via $V_t(b)$. Results from simulated marine navigation and real hardware experiments show improved estimation accuracy and planning performance over standard baselines in time-varying stochastic domains.
Abstract
Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic, and time-varying environments. Environmental variability over time can significantly impact the system's optimal decision-making strategy for mission completion. To model such environments, our work combines the previous notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraints. We validate the proposed framework and algorithms using simulations and hardware, with robots exploring partially observable, time-varying environments. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.
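To make the belief update with an estimated $T_t$ concrete, the following is a minimal sketch of a standard discrete Bayes-filter step in which the transition model is whatever estimate is current at time $t$. It is not the paper's MPSE algorithm or its planner: the function name `belief_update`, the nested-list layout of `T_t` and `O`, and the two-state numbers are all assumptions chosen for illustration.

```python
def belief_update(belief, T_t, O, action, observation):
    """One Bayes-filter step using the transition estimate for the current time.

    Assumed (hypothetical) layout:
      belief[s]       : probability of being in state s
      T_t[a][s][s2]   : estimated P(s2 | s, a) at the current time step
      O[a][s2][o]     : P(o | s2, a)
    """
    n = len(belief)
    # Predict: push the belief through the current transition estimate.
    predicted = [
        sum(T_t[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    # Correct: weight each predicted state by the observation likelihood.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the new belief sums to one.
    return [p / total for p in unnormalized]


# Illustrative two-state, one-action, two-observation model (made-up numbers).
T_t = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]
new_belief = belief_update([0.5, 0.5], T_t, O, action=0, observation=0)
```

In a time-varying setting, the only change from the stationary case is that `T_t` is replaced by a fresh estimate at each step before the update is applied; how that estimate is produced is the job of the estimator and is not shown here.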
