Weathering Ongoing Uncertainty: Learning and Planning in a Time-Varying Partially Observable Environment

Gokul Puthumanaillam; Xiangyu Liu; Negar Mehr; Melkior Ornik

TL;DR

This work addresses autonomous decision-making in environments that are stochastic, time-varying, and partially observable. It introduces Time-Varying POMDPs (TV-POMDPs) and Memory Prioritized State Estimation (MPSE) to estimate time-varying transitions using prioritized memory within a convex optimization framework for $T_t$. An MPSE-integrated planning strategy updates beliefs with the estimated $T_t$ and optimizes long-term rewards via $V_t(b)$. Empirical results from simulated marine navigation and real hardware experiments demonstrate improved estimation accuracy and planning performance over standard baselines, confirming the framework's effectiveness in time-varying stochastic domains.

Abstract

Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic, and time-varying environments. Environmental variability over time can significantly impact the system's optimal decision-making strategy for mission completion. To model such environments, our work combines the previous notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraints. We validate the proposed framework and algorithms using simulations and hardware, with robots exploring partially observable, time-varying environments. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.
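
To illustrate the belief update that planning in a TV-POMDP relies on, the following is a minimal sketch of one Bayes-filter step with a time-indexed transition estimate, $b'(s') \propto O(o \mid s') \sum_s T_t(s' \mid s, a)\, b(s)$. This is the standard POMDP belief update with $T$ replaced by an estimate of $T_t$. It is not the authors' implementation: the function name `belief_update`, the list-based layout, and the numbers in the usage example are assumptions made for illustration. The estimation of $T_t$ itself (the MPSE step) is not shown.

```python
from typing import List


def belief_update(
    belief: List[float],
    T_t: List[List[float]],
    obs_likelihood: List[float],
) -> List[float]:
    """One Bayes-filter step for a fixed action at time t (illustrative sketch).

    belief[s]              : current probability of being in state s
    T_t[s][s_next]         : assumed estimate of the transition probability at time t
    obs_likelihood[s_next] : probability of the received observation in s_next
    """
    n = len(belief)
    # Prediction: propagate the belief through the time-t transition estimate.
    predicted = [sum(belief[s] * T_t[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction: weight each successor state by the observation likelihood.
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state example; all numbers are made up for illustration.
b_next = belief_update(
    belief=[0.5, 0.5],
    T_t=[[0.9, 0.1], [0.2, 0.8]],
    obs_likelihood=[0.7, 0.3],
)
print(b_next)  # approximately [0.740, 0.260]
```

Because $T_t$ changes over time, the same call would be repeated at each step with the most recent transition estimate rather than a fixed matrix.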

Paper Structure (18 sections, 7 equations, 5 figures, 2 tables)

Introduction
Related Works
Contributions
Preliminaries
Partially Observable Markov Decision Process
Time-Varying Partially Observable Markov Decision Process
Learning and Planning in a TV-POMDP
Memory Prioritized State Estimation
Memory Prioritization
Estimation of Time-Varying Transition Probability Function
Policy Optimization and Planning
Experiments and Discussions
Simulated Marine Experiment
Baselines
Simulation results
...and 3 more sections

Figures (5)

Figure 1: Time-varying environments and their effects.
Figure 2: The figures display the simulation environment and the waypoints tracked by the algorithms within a TV-POMDP (Scenario 2 with $\Delta_{\text{max}}=0.02$). If the robot's estimate deviates from a waypoint by more than 3 m, the point is marked with an x to indicate a notable deviation. The purple lines indicate the USV's trajectory, and the black lines are the desired 150-waypoint trajectory.
Figure 3: The figures show the hardware setup and trajectories for MPSE and baselines in a TV-POMDP (Scenario 3 with $\Delta_{\text{max}}=0.03$). The cyan corridor represents the map limits ($H_{\text{safe}}$) and the red lines represent the TurtleBot's trajectories. The background consists of two different terrains: the white areas depict rougher cemented terrain, while the darker areas show smoother terrain. The roughness variation between these surfaces is substantial enough to impact the TurtleBot's mobility.
Figure 4: The red line represents the robot's maximum torque, while the blue line illustrates the effect of the introduced adversarial controller. The brown line depicts the resulting slippage on the robot's wheels. To empirically validate these effects on the model's transition probabilities, we conducted trials with and without the adversarial controller. By tracking the robot's GPS transitions and comparing against the ground truth without adversarial control, we tuned the model to account for the torque and slippage impact on transitions.
Figure 5: Subfigures (a) and (b) display transition function changes in the first and second mission halves, respectively, while (c) shows maximal error over time. To ensure valid probabilities, the estimated transition probabilities are clipped between teal lines. The results depicted are after applying this clipping constraint. Although clipping alters individual probability values, it does not significantly impact the overall trends and comparative results. The key conclusions remain valid.
