Inferring Implicit Goals Across Differing Task Models

Silvia Tulli, Stylianos Loukas Vasileiou, Mohamed Chetouani, Sarath Sreedharan

TL;DR

The paper tackles value alignment when the user holds implicit subgoals and the user's model of the task differs from the agent's. It formalizes implicit subgoals within an MDP framework and treats bottleneck states as candidate subgoals. It introduces a two-model setup with the robot's model $\mathcal{M}^R$ and the human's model $\mathcal{M}^H$, and develops a minimal-information querying strategy by formulating a query-MDP whose optimal policy yields the smallest expected query cost while ensuring that a policy exists to achieve the true implicit subgoals $\mathcal{I}_G$. Key contributions include determinization-based bottleneck identification, a method to extract maximal achievable subgoal subsets $\mathbb{I}$, and a meta-policy $\Pi^Q$ that guides queries efficiently. An empirical evaluation on grid-world benchmarks shows substantial query-time reductions and robust performance across diverse human models. The work advances the ability to infer unstated goals under model mismatch, with practical impact for safe, value-aligned human-AI interaction and potential future integration with user studies and learning-from-queries in more complex environments.

Abstract

One of the significant challenges in generating value-aligned behavior is to account not only for the specified user objectives but also for any implicit or unspecified user requirements. Such implicit requirements could be particularly common in settings where the user's understanding of the task model differs from the agent's estimate of the model. In this scenario, the user may incorrectly expect some agent behavior to be inevitable or guaranteed. This paper addresses such expectation mismatch in the presence of differing models by capturing the possibility of unspecified user subgoals in the context of a task modeled as a Markov Decision Process (MDP) and querying for them as required. Our method identifies bottleneck states and uses them as candidates for potential implicit subgoals. We then introduce a querying strategy that generates the minimal number of queries required to identify a policy guaranteed to achieve the underlying goal. Our empirical evaluations demonstrate the effectiveness of our approach in inferring and achieving unstated goals across various tasks.

Paper Structure

This paper contains 13 sections, 6 theorems, 6 equations, 2 figures, 1 algorithm.

Key Result

Proposition 1

Given a model $\mathcal{M}$ and its determinization $\delta(\mathcal{M})$, a state $s$ is a bottleneck state for $\mathcal{M}$ if and only if it is a bottleneck state in $\delta(\mathcal{M})$.
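Proposition 1 suggests that bottleneck states can be found by graph search on the determinized model rather than by reasoning over stochastic policies. The following is a minimal sketch of that idea, not the authors' implementation. It assumes an all-outcomes determinization (every nonzero-probability outcome becomes its own deterministic edge) and assumes that a bottleneck is a state that every goal-reaching trajectory from the initial state must pass through. The function names, the transition-dictionary layout, and the choice to exclude the start and goal states are all illustrative assumptions.

```python
from collections import deque


def determinize(transitions):
    """All-outcomes determinization (assumed): keep each nonzero-probability
    successor as a separate deterministic edge, discarding probabilities."""
    graph = {}
    for (state, _action), outcomes in transitions.items():
        for next_state, prob in outcomes.items():
            if prob > 0:
                graph.setdefault(state, set()).add(next_state)
    return graph


def reachable(graph, start, goal, blocked=None):
    """Breadth-first search from start to goal that never enters `blocked`."""
    if blocked in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for nxt in graph.get(state, ()):
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def bottleneck_states(transitions, start, goal):
    """States (other than start and goal) whose removal disconnects
    start from goal in the determinized graph."""
    graph = determinize(transitions)
    if not reachable(graph, start, goal):
        return set()  # no goal-reaching trajectory, so nothing to report
    states = set(graph) | {n for succ in graph.values() for n in succ}
    return {
        s for s in states - {start, goal}
        if not reachable(graph, start, goal, blocked=s)
    }


# Hypothetical four-state example: (state, action) -> {next_state: probability}
example = {
    ("s0", "a"): {"s1": 0.5, "s2": 0.5},
    ("s1", "a"): {"s3": 1.0},
    ("s2", "a"): {"s3": 1.0},
    ("s3", "a"): {"g": 0.9, "s0": 0.1},
}
```

In this example, `bottleneck_states(example, "s0", "g")` returns `{"s3"}`: the goal can still be reached when either `s1` or `s2` is removed, but not when `s3` is removed. The search runs one breadth-first pass per candidate state, which is adequate for small grid worlds; a dominator-tree computation would be the usual choice for larger graphs.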

Figures (2)

  • Figure 1: Bottleneck states are critical waypoints that must be passed to reach the goal in a given world model. Given a set of humans' world models $\mathbb{M}^{H}$, the robot has to compute a policy $\pi$ that accounts for the humans' bottleneck states $\mathbb{B}$, since these are candidates for implicit human subgoals. Whenever the robot cannot reach a human's bottleneck because of discrepancies between the world models, it queries whether that bottleneck is in fact a human subgoal.
  • Figure 2: Top: Performance comparison between the Strategic Query and Query-All approaches across four environments on a 4×4 grid. Results show mean execution times $\pm$ standard deviation based on 20 human preference models, 10% obstacle density, and 3 runs per configuration with a query threshold of 1000. Table: Basic performance metrics showing query counts and reduction percentages.

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Definition 6
  • ...and 3 more