Goal Recognition Design for General Behavioral Agents using Machine Learning

Robert Kasumba, Guanghui Yu, Chien-Ju Ho, Sarah Keren, William Yeoh

TL;DR

The paper addresses goal recognition design (GRD) with a data-driven framework: it learns a predictor of worst-case distinctiveness ($wcd$) and uses gradient-based optimization with Lagrangian relaxation to modify environments under budget constraints, for agents with general behavior models. With a CNN-based predictor trained on simulated data, the approach reduces $wcd$ and runs faster than existing methods across grid-world and Overcooked-AI domains, including settings with suboptimal or human behavior models. Human-subject experiments indicate that GRD-designed environments make it easier to recognize the goals of human decision-makers, which matters for human–AI collaboration. Overall, the results suggest that the approach scales, adapts to diverse constraints, and remains effective when agents behave suboptimally.

Abstract

Goal recognition design (GRD) aims to make limited modifications to decision-making environments to make it easier to infer the goals of agents acting within those environments. Although various research efforts have been made in goal recognition design, existing approaches are computationally demanding and often assume that agents are (near-)optimal in their decision-making. To address these limitations, we leverage machine learning methods for goal recognition design that can both improve run-time efficiency and account for agents with general behavioral models. Following existing literature, we use worst-case distinctiveness (wcd) as a measure of the difficulty in inferring the goal of an agent in a decision-making environment. Our approach begins by training a machine learning model to predict the wcd for a given environment and the agent behavior model. We then propose a gradient-based optimization framework that accommodates various constraints to optimize decision-making environments for enhanced goal recognition. Through extensive simulations, we demonstrate that our approach outperforms existing methods in reducing wcd and enhances runtime efficiency. Moreover, our approach also adapts to settings in which existing approaches do not apply, such as those involving flexible budget constraints, more complex environments, and suboptimal agent behavior. Finally, we conducted human-subject experiments that demonstrate that our method creates environments that facilitate efficient goal recognition from human decision-makers.
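The TL;DR describes the optimization step as gradient-based with Lagrangian relaxation under a budget. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a toy quadratic surrogate (`surrogate_wcd`, with hand-picked `weights`) in place of the learned CNN predictor, a continuous relaxation of "which cells to block" into values in [0, 1], and a simple top-k rounding at the end. All names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical per-cell weights: how strongly blocking cell i lowers the
# surrogate wcd. In the paper a learned CNN predictor plays this role.
weights = np.array([4.0, 3.0, 2.0, 1.0])
budget = 2.0  # assumed budget: at most two cells' worth of modification


def surrogate_wcd(x):
    """Toy differentiable stand-in for the wcd predictor (an assumption)."""
    return float(np.sum(weights * (1.0 - x) ** 2))


def surrogate_grad(x):
    """Analytic gradient of surrogate_wcd with respect to x."""
    return -2.0 * weights * (1.0 - x)


def optimize(steps=5000, lr=0.05):
    x = np.zeros_like(weights)  # relaxed modification levels in [0, 1]
    lam = 0.0                   # Lagrange multiplier for the budget constraint
    for _ in range(steps):
        # Primal step: descend on L(x, lam) = wcd(x) + lam * (sum(x) - budget).
        x = np.clip(x - lr * (surrogate_grad(x) + lam), 0.0, 1.0)
        # Dual step: raise lam while the budget is exceeded; keep lam >= 0.
        lam = max(0.0, lam + lr * (x.sum() - budget))
    return x, lam


x_relaxed, lam = optimize()
# Round the relaxed solution to a discrete design: block the top-`budget` cells.
chosen = sorted(np.argsort(-x_relaxed)[: int(budget)].tolist())
print(x_relaxed.round(2), round(lam, 2), chosen)
```

The multiplier `lam` acts as a learned price on modifications: it grows while the relaxed design overspends the budget and stops growing once the constraint is met, so the budget never has to be enforced by a hard projection.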

Paper Structure (31 sections, 2 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: We first train a wcd predictor from simulated data. We then perform gradient-based optimization that leverages the predictor to identify environment modifications that minimize wcd with a given agent behavior model.
  • Figure 2: The benchmark environments. Left: a grid world. The agent starts at the position marked 'S' and aims to reach one of the goal positions labeled 'G', navigating around blocked cells marked with 'x'. Right: an Overcooked-AI setting, where the agent's goal is to pick up ingredients and complete its target recipe.
  • Figure 3: wcd reduction in a grid world when only blocking modifications are allowed. Exhaustive search and Pruned-Reduce are not included in (b) because they take more than an hour to compute for a single environment.
  • Figure 4: wcd reduction in settings with two types of modifications. Only the greedy baseline is included, because the state-of-the-art baselines (Exhaustive search and Pruned-Reduce) are not applicable in these settings.
  • Figure 5: Performance in other settings: (a) Overcooked-AI environment with optimal agent behavior, and (b) Grid world ($6 \times 6$) with suboptimal agent behavior, demonstrating the method's adaptability to non-optimal decision-making.
  • ...and 5 more figures
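
To make the quantity plotted in these figures concrete, here is a minimal sketch of wcd under one common reading from the GRD literature: the longest prefix that paths to two different goals can share, that is, the most steps an agent can take before its goal is revealed. It assumes the agent's behavior is already given as explicit action sequences per goal; the helper names and the example paths are hypothetical, and the paper's exact definition for general behavioral agents may differ.

```python
from itertools import combinations


def common_prefix_len(p, q):
    """Number of leading actions that two action sequences share."""
    n = 0
    for a, b in zip(p, q):
        if a != b:
            break
        n += 1
    return n


def wcd_from_paths(paths_by_goal):
    """Longest shared prefix over all pairs of paths to different goals."""
    best = 0
    for g1, g2 in combinations(paths_by_goal, 2):
        for p in paths_by_goal[g1]:
            for q in paths_by_goal[g2]:
                best = max(best, common_prefix_len(p, q))
    return best


# Hypothetical grid-world paths as action strings (R = right, U = up, D = down).
before = {"A": ["RRU"], "B": ["RRD", "DRR"]}
# After blocking a cell, goal B is assumed reachable only via the lower route.
after = {"A": ["RRU"], "B": ["DRR"]}

print(wcd_from_paths(before), wcd_from_paths(after))
```

In this toy case the blocking modification removes the path to B that overlaps with the path to A, so the agent's first action already reveals its goal.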