Partially Observable Task and Motion Planning with Uncertainty and Risk Awareness

Aidan Curtis, George Matheos, Nishad Gothoskar, Vikash Mansinghka, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

TL;DR

TAMPURA addresses long-horizon robotic planning under partial observability and outcome uncertainty by combining belief-space task planning with a learned, sparse abstract MDP. It builds on belief-state MDPs and belief-state controller MDPs, introduces belief-state propositions and an abstract belief space, and grounds operators with uncertain effects via operator schemata. The core innovation is Bayes-optimistic model learning, which uses deterministic planning to explore task-relevant transitions efficiently and then refines the result into a probabilistic sparse MDP solvable with LAO*. On simulated long-horizon tasks the approach outperforms determinized planning, direct search, and reinforcement learning baselines, and it is demonstrated on two real-world robotics problems that require information gathering and safe operation under uncertainty.
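As a rough illustration of what an optimistic Bayesian outcome model could look like, the sketch below keeps a Beta-Bernoulli estimate of the probability that an operator achieves its intended effect. This summary does not give the paper's exact formulation, so the class name `OptimisticOutcomeEstimate`, the Beta prior, and the pseudo-count values are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class OptimisticOutcomeEstimate:
    """Beta-Bernoulli estimate of an operator's success probability.

    Hypothetical sketch: the prior pseudo-counts below are assumed values,
    chosen so that an untried operator looks likely to succeed.
    """

    alpha: float = 1.0  # prior pseudo-count of successes (assumed)
    beta: float = 0.1   # prior pseudo-count of failures (assumed, small = optimistic)
    successes: int = 0
    failures: int = 0

    def update(self, success: bool) -> None:
        """Record the outcome of one simulated execution."""
        if success:
            self.successes += 1
        else:
            self.failures += 1

    def mean(self) -> float:
        """Posterior mean of the success probability."""
        a = self.alpha + self.successes
        b = self.beta + self.failures
        return a / (a + b)


# An untried operator starts out looking promising ...
estimate = OptimisticOutcomeEstimate()
prior_mean = estimate.mean()

# ... and the estimate drops as simulated failures accumulate.
for _ in range(10):
    estimate.update(success=False)
posterior_mean = estimate.mean()
```

Starting optimistic means a planner that prefers high-probability operators will try each relevant operator at least a few times before ruling it out, which is one common way to direct exploration toward transitions that matter for the task.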

Abstract

Integrated task and motion planning (TAMP) has proven to be a valuable approach to generalizable long-horizon robotic manipulation and navigation problems. However, the typical TAMP problem formulation assumes full observability and deterministic action effects. These assumptions limit the ability of the planner to gather information and make decisions that are risk-aware. We propose a strategy for TAMP with Uncertainty and Risk Awareness (TAMPURA) that is capable of efficiently solving long-horizon planning problems with initial-state and action outcome uncertainty, including problems that require information gathering and avoiding undesirable and irreversible outcomes. Our planner reasons under uncertainty at both the abstract task level and continuous controller level. Given a set of closed-loop goal-conditioned controllers operating in the primitive action space and a description of their preconditions and potential capabilities, we learn a high-level abstraction that can be solved efficiently and then refined to continuous actions for execution. We demonstrate our approach on several robotics problems where uncertainty is a crucial factor and show that reasoning under uncertainty in these problems outperforms previously proposed determinized planning, direct search, and reinforcement learning strategies. Lastly, we demonstrate our planner on two real-world robotics problems using recent advancements in probabilistic perception.

Paper Structure (45 sections, 10 equations, 6 figures, 4 tables, 2 algorithms)

Figures (6)

  • Figure 1: Top: Robot with wrist mounted camera looking for a banana. The robot plans to take information gathering actions based on a posterior estimate of the banana's pose shown in blue. Bottom: Robot with one wrist mounted camera and one external camera plans to complete a long-horizon manipulation task while avoiding a human in the workspace.
  • Figure 2: This figure illustrates five long-horizon planning tasks that TAMPURA is capable of solving. Each of them contains a unique type of uncertainty, including uncertainty in (a) classification, (b) pose due to noisy sensors or (c) partial observability, (d) physical properties such as friction or mass, and (e) localization/mapping due to odometry errors.
  • Figure 3: Uncertainty and Risk Aware Task and Motion Planning. (a) The robot’s continuous space of probabilistic beliefs about world state is partitioned into a discrete abstract belief space, here with 9 states. TAMPURA considers a set of operators, each containing a low-level robot controller, and a description of the possible effects of executing the controller. Determinized planning computes possible sequences of controllers to reach the goal. These plans do not factor in uncertainty or risk and would be unsafe or inefficient to execute in the real world. (b) The determinized plans are executed in a mental simulation. The distribution of effects is recorded, to learn an MDP on the space of abstract belief states visited in these executions. By iterating between determinized planning and plan simulation (Sec. \ref{sec:tampura_algorithm}), TAMPURA learns a sparse MDP related to the original decision problem. (c) The robot calculates an uncertainty- and risk-aware plan in the sparse MDP, and executes it.
  • Figure 4: Comparisons of model-learning strategies on a simplified grid-world environment in which an agent must navigate from the blue cell to the green cell. Red intensity corresponds to $p$, the probability of transitioning to an irrecoverable state. $p$ for each cell is initially unknown, and must be estimated through interaction with the environment. The optimal policy given known $p$ for this sample environment is indicated with arrows. The scatter plots compare the estimated $\hat{p}$ to true $p$ at the end of model learning for several strategies across 50 different environments. The rightmost plot shows average normalized reward as a function of the number of training trajectories for our method as well as the MDP-guided method with a variety of values of epsilon. Our method quickly reaches near optimal performance, surpassing the weighted all-outcomes determinized solution under ground truth outcome probabilities.
  • Figure 5: TAMPURA moving cubes into a bowl without hitting a human in the workspace. Top row: images of robot execution. Bottom row: the robot's belief about object poses and the probabilistic occupancy grid describing the human in the workspace.
  • ...and 1 more figures
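
To make the grid-world setting of Figure 4 concrete, here is a minimal sketch of risk-aware planning when the per-cell risk $p$ is already known. It is not the paper's environment or code: the 3x3 grid, the risk values, the discount factor, and the function name `risk_aware_values` are assumptions chosen for illustration. Entering a cell leads to an irrecoverable state with that cell's probability $p$, and value iteration finds the policy that trades path length against risk.

```python
import itertools


def risk_aware_values(risk, goal, gamma=0.95, sweeps=100):
    """Value iteration on a grid where entering cell (r, c) fails
    irrecoverably with probability risk[r][c].

    Hypothetical sketch: reaching the goal is worth 1, failure is worth 0,
    and gamma discounts longer paths.
    """
    rows, cols = len(risk), len(risk[0])
    values = [[0.0] * cols for _ in range(rows)]
    values[goal[0]][goal[1]] = 1.0
    policy = {}
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    for _ in range(sweeps):
        for r, c in itertools.product(range(rows), range(cols)):
            if (r, c) == goal:
                continue
            best_value, best_move = 0.0, None
            for name, (dr, dc) in moves.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Survive the move with probability 1 - p, then continue.
                    q = (1.0 - risk[nr][nc]) * gamma * values[nr][nc]
                    if q > best_value:
                        best_value, best_move = q, name
            values[r][c] = best_value
            policy[(r, c)] = best_move
    return values, policy


# Assumed toy environment: the cell between start (0, 0) and goal (0, 2)
# is risky, so the direct two-step route succeeds only half the time.
risk = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
values, policy = risk_aware_values(risk, goal=(0, 2))
start_move = policy[(0, 0)]  # "down": the four-step detour avoids the risky cell
```

In this toy case the direct route is worth 0.5 x 0.95^2 while the detour is worth 0.95^4, so the optimal policy steps around the risky cell. In the setting the caption describes, $p$ is initially unknown and has to be estimated from interaction before such a policy can be computed.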