Feeling Optimistic? Ambiguity Attitudes for Online Decision Making

Jared J. Beard, R. Michael Butts, Yu Gu

TL;DR

This work tackles decision making under ambiguity by modeling a set of plausible transition models with Ambiguity MDPs (AMDPs) and representing uncertainty through belief functions. It introduces Ambiguity Attitude Graph Search (AAGS), a graph-based planner that blends lower and upper expectations via an ambiguity attitude parameter $\alpha$ to balance robustness and exploration, and it provides a method to compute belief functions from confidence intervals using a linear program. The approach is demonstrated in sailing-domain simulations with high-entropy transitions, showing that tuning $\alpha$ can improve outcomes and avoid failure modes common to robust methods. The contributions extend beyond safety-critical systems by generalizing robust decision making to ambiguity-aware planning, with accompanying open-source code for broader use.
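
To make the $\alpha$ blend concrete, here is a minimal Python sketch under stated assumptions. It represents a belief function by its mass function over focal sets (frozensets of outcomes). It computes the lower and upper expectations as the mass-weighted worst and best reward of each focal set, and blends them linearly as $(1-\alpha)\cdot\text{lower} + \alpha\cdot\text{upper}$. This Hurwicz-style linear blend, the function names (`lower_expectation`, `upper_expectation`, `attitude_value`) and the toy outcomes are illustrative assumptions, not the authors' code. The only property taken from the summary is that $\alpha = 0$ corresponds to the robust setting.

```python
from typing import Dict, FrozenSet, Hashable

# Assumed representation: a belief function given by its mass function,
# mapping focal sets (frozensets of outcomes) to masses that sum to 1.
MassFunction = Dict[FrozenSet[Hashable], float]
Rewards = Dict[Hashable, float]


def lower_expectation(m: MassFunction, reward: Rewards) -> float:
    """Each focal set contributes its mass times its worst reward."""
    return sum(mass * min(reward[w] for w in focal) for focal, mass in m.items())


def upper_expectation(m: MassFunction, reward: Rewards) -> float:
    """Each focal set contributes its mass times its best reward."""
    return sum(mass * max(reward[w] for w in focal) for focal, mass in m.items())


def attitude_value(m: MassFunction, reward: Rewards, alpha: float) -> float:
    """Hypothetical linear blend: alpha = 0 is robust, alpha = 1 fully optimistic."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * lower_expectation(m, reward) + alpha * upper_expectation(m, reward)


# Toy example (hypothetical outcomes and rewards).
reward = {"reef": -1.0, "open": 0.0, "goal": 2.0}
actions = {
    # "stay" leads to open water with certainty: no ambiguity.
    "stay": {frozenset({"open"}): 1.0},
    # "sail" commits half its mass to open water and leaves the other half
    # ambiguous between hitting the reef and reaching the goal.
    "sail": {frozenset({"open"}): 0.5, frozenset({"reef", "goal"}): 0.5},
}

for alpha in (0.0, 0.5):
    best = max(actions, key=lambda a: attitude_value(actions[a], reward, alpha))
    print(alpha, best)
```

The loop prints `0.0 stay` and then `0.5 sail`. For "sail" the lower expectation is -0.5 and the upper is 1.0, so the robust attitude prefers the certain action, while $\alpha = 0.5$ values "sail" at 0.25 and takes the ambiguous option. This mirrors the summary's point that more optimistic attitudes seek ambiguity.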

Abstract

Due to the complexity of many decision-making problems, tree search algorithms often have inadequate information to produce accurate transition models. This results in ambiguities (uncertainties for which there are multiple plausible models). Faced with ambiguities, robust methods have been used to produce safe solutions, often by maximizing the lower bound over the set of plausible transition models. However, they often overlook how much the representation of uncertainty can impact how a decision is made. This work introduces the Ambiguity Attitude Graph Search (AAGS), advocating for more comprehensive representations of ambiguities in decision making. Additionally, AAGS allows users to adjust their ambiguity attitude (or preference), promoting exploration and improving users' ability to control how an agent should respond when faced with a set of plausible alternatives. Simulation in a dynamic sailing environment shows how environments with high-entropy transition models can lead robust methods to fail. Results further demonstrate how adjusting ambiguity attitudes better fulfills objectives while mitigating this failure mode of robust approaches. Because this approach is a generalization of the robust framework, these results also show that algorithms focused on ambiguity are applicable beyond safety-critical systems.
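
The robust baseline described in the abstract, which maximizes the lower bound over a set of plausible transition models, can be sketched as a max-min rule. This sketch assumes that each action comes with a small, explicit list of plausible outcome distributions. The function names (`expected_reward`, `robust_value`, `robust_action`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Distribution = Dict[str, float]  # outcome -> probability


def expected_reward(dist: Distribution, reward: Dict[str, float]) -> float:
    """Expected reward of one plausible outcome distribution."""
    return sum(p * reward[s] for s, p in dist.items())


def robust_value(models: List[Distribution], reward: Dict[str, float]) -> float:
    """Lower bound: the worst expected reward over all plausible models."""
    return min(expected_reward(d, reward) for d in models)


def robust_action(plausible: Dict[str, List[Distribution]], reward: Dict[str, float]) -> str:
    """Max-min rule: choose the action whose worst case is best."""
    return max(plausible, key=lambda a: robust_value(plausible[a], reward))


# Hypothetical toy problem: two models remain plausible for "sail".
reward = {"reef": -1.0, "open": 0.0, "goal": 2.0}
plausible = {
    "stay": [{"open": 1.0}],
    "sail": [{"reef": 0.5, "goal": 0.5}, {"reef": 0.75, "goal": 0.25}],
}

print(robust_action(plausible, reward))  # stay
```

One plausible model gives "sail" an expected reward of 0.5, but the max-min rule compares only worst cases (-0.25 for "sail" versus 0.0 for "stay") and picks "stay". This is the conservative behavior that the abstract attributes to robust methods.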

Paper Structure (11 sections, 1 theorem, 7 equations, 5 figures, 2 algorithms)

Key Result

Lemma 1

Let $U \cup L \cup \Theta$ be the entire outcome space of the decision, where $U$ has a reward equal to the upper bound of the outcome space and $L$ the lower bound. For a given belief function $b$ over $U \cup L \cup \Theta$, inclusion of any other element $\omega = (s', r)$ with $L \leq r \leq U$ does no…

Figures (5)

  • Figure 1: Suppose an agent (blue) needs to reach a goal (green). The agent is penalized for time, but receives increasing reward in darker cells. When conducting tree search, each node will have a limited number of samples to approximate the transition model. From these samples, multiple equally valid models may be inferred, resulting in ambiguity. Robust methods assume this ambiguity should be avoided and aim to improve the worst-case outcomes. This may require long planning horizons to escape local minima and reach the goal (top). Conversely, agents with more optimistic attitudes, as this work demonstrates, seek ambiguity to explore their environment and reach goals beyond their planning horizon (bottom).
  • Figure 2: Our approach accepts multinomial distributions estimated by collecting samples from a blackbox simulator. (a) Using confidence intervals of size $\epsilon$ for each mass term, we constrain the model. This yields upper $Pl$ and lower $Bel$ estimates on the mass of each outcome. (b) Using these bounds, we distribute the mass to a belief function. Such a formulation assigns mass to multiple valid distributions when information is ambiguous.
  • Figure 3: Example of a sailing world environment; the agent (blue triangle) aims to reach the goal (green) and receives increasing reward (darker regions) as it gets closer to the goal. Concurrently, the agent must try to avoid boundaries and follow the wind (red triangles).
  • Figure 4: (a) The average number of steps spent in reefs for UCT and the robust solutions (AAGS at $\alpha = 0$ and GBOP) and (b) the average number of steps AAGS spent in reefs for different $\alpha$ values. GBOP often became mired in the reefs, whereas UCT and robust AAGS actively avoided them. As $\alpha$ increases, AAGS becomes incrementally more likely to spend time in reefs.
  • Figure 5: (a) Reward for UCT and the robust solutions (AAGS at $\alpha = 0$ and GBOP) and (b) reward achieved by AAGS for varying $\alpha$. Because they focus on raising the lower bound, robust attitudes achieve lower reward. Conversely, slightly positive attitudes can outperform ambiguity-neutral methods (UCT). When attitudes are too optimistic ($\alpha=1$), unnecessary gambles harm performance.
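
The two steps in Figure 2 can be roughly approximated in Python, with caveats. Step (a) takes each outcome's empirical frequency plus or minus $\epsilon$, clipped to [0, 1], as its lower ($Bel$) and upper ($Pl$) bound. According to the summary, step (b) uses a linear program whose exact formulation is not given here. The sketch therefore substitutes a deliberately simple allocation: each outcome keeps its lower bound as singleton mass, and the remaining mass goes to the set of all observed outcomes. The function names (`interval_bounds`, `simple_mass`, `singleton_bel_pl`) and the sample data are hypothetical, and outcomes never observed in the samples are ignored.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, List, Tuple

MassFunction = Dict[FrozenSet[Hashable], float]
Bounds = Dict[Hashable, float]


def interval_bounds(samples: List[Hashable], eps: float) -> Tuple[Bounds, Bounds]:
    """Step (a): empirical frequency +/- eps, clipped to [0, 1], per observed outcome."""
    counts = Counter(samples)
    n = len(samples)
    lower = {w: max(0.0, c / n - eps) for w, c in counts.items()}
    upper = {w: min(1.0, c / n + eps) for w, c in counts.items()}
    return lower, upper


def simple_mass(lower: Bounds) -> MassFunction:
    """Step (b) stand-in, NOT the paper's linear program: each outcome keeps its
    lower bound as singleton mass; leftover mass goes to the set of all observed
    outcomes, i.e. it is left completely ambiguous."""
    m: MassFunction = {frozenset([w]): b for w, b in lower.items() if b > 0.0}
    leftover = 1.0 - sum(lower.values())
    if leftover > 0.0:
        everything = frozenset(lower)
        m[everything] = m.get(everything, 0.0) + leftover
    return m


def singleton_bel_pl(m: MassFunction) -> Tuple[Bounds, Bounds]:
    """Bel({w}) is the singleton's own mass; Pl({w}) sums every focal set containing w."""
    outcomes = set().union(*m)
    bel = {w: m.get(frozenset([w]), 0.0) for w in outcomes}
    pl = {w: sum(v for A, v in m.items() if w in A) for w in outcomes}
    return bel, pl


# Hypothetical samples from a black-box simulator.
samples = ["a"] * 6 + ["b"] * 3 + ["c"]
lower, upper = interval_bounds(samples, eps=0.1)
m = simple_mass(lower)
bel, pl = singleton_bel_pl(m)
for w in sorted(bel):
    print(w, round(bel[w], 3), round(pl[w], 3), "interval:", round(lower[w], 3), round(upper[w], 3))
```

In this run, the singleton beliefs match the lower interval bounds (0.5, 0.2, 0.0 for a, b, c). However, the plausibilities (0.8, 0.5, 0.3) exceed the upper bounds (0.7, 0.4, 0.2). The stand-in therefore enforces only one side of each interval, and a method that also enforces the upper bounds is needed to respect both.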

Theorems & Definitions (1)

  • Lemma 1