Human-Agent Coordination in Games under Incomplete Information via Multi-Step Intent

Shenghui Chen, Ruihan Zhao, Sandeep Chinchali, Ufuk Topcu

TL;DR

The paper addresses coordination between autonomous agents and humans under incomplete information by extending the shared-control game to allow multiple actions per turn, which makes it possible to communicate multi-step intents. The approach combines a memory module, which maintains a probabilistic belief over the unknown environment dynamics (in the testbed, the wall layout of the human's maze), with an online planning algorithm, IntentMCTS, which augments environment rewards with multi-step intent signals during planning. In agent-to-agent simulations in Gnomes at Night, IntentMCTS needs fewer steps and control switches than the baselines. In a human-user study, it achieves an 18.52% higher success rate than the heuristic baseline and a 5.56% improvement over the single-step prior work, and participants report lower cognitive load. These findings indicate that intent-aware, belief-based planning enables more efficient and satisfying long-horizon human–agent collaboration, with potential extensions to natural-language intent and data-driven intent generation.

Abstract

Strategic coordination between autonomous agents and human partners under incomplete information can be modeled as turn-based cooperative games. We extend a turn-based game under incomplete information, the shared-control game, to allow players to take multiple actions per turn rather than a single action. The extension enables the use of multi-step intent, which we hypothesize will improve performance in long-horizon tasks. To synthesize cooperative policies for the agent in this extended game, we propose an approach featuring a memory module that maintains a running probabilistic belief of the environment dynamics and an online planning algorithm called IntentMCTS. This algorithm strategically selects the next action by leveraging any communicated multi-step intent via reward augmentation while considering the current belief. Agent-to-agent simulations in the Gnomes at Night testbed demonstrate that IntentMCTS requires fewer steps and control switches than baseline methods. A human-agent user study corroborates these findings, showing an 18.52% higher success rate compared to the heuristic baseline and a 5.56% improvement over the single-step prior work. Participants also report lower cognitive load and frustration, and higher satisfaction, with the IntentMCTS agent partner.
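To make "reward augmentation" concrete, the following is a minimal Python sketch, not the paper's implementation. The exact augmentation used by IntentMCTS is not given in this summary, so the function name `augmented_reward`, the additive form, and the `bonus` parameter are all assumptions chosen for illustration: a planner's rollout reward gets a fixed bonus whenever the simulated next state lies on the communicated multi-step intent.

```python
from typing import Hashable, Sequence


def augmented_reward(
    env_reward: float,
    next_state: Hashable,
    intent: Sequence[Hashable],
    bonus: float = 1.0,
) -> float:
    """Hypothetical reward augmentation for intent-aware planning.

    Assumption (not from the paper): the augmented reward is the
    environment reward plus a constant bonus when `next_state` is one
    of the states listed in the communicated multi-step intent.
    """
    on_intent = next_state in intent
    return env_reward + (bonus if on_intent else 0.0)


# Example: grid cells as (row, col); the intent is a short path of cells.
intent_path = [(1, 2), (1, 3), (2, 3)]
r_on = augmented_reward(-1.0, (1, 3), intent_path)   # step cost offset by bonus
r_off = augmented_reward(-1.0, (4, 4), intent_path)  # plain step cost
```

With an empty `intent`, the function reduces to the plain environment reward, which matches the abstract's wording that the planner leverages "any" communicated intent when one is available.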

Paper Structure

This paper contains 28 sections, 1 theorem, 17 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let the prior belief about $\theta = b(s,a)$ follow a Beta distribution, $\theta \sim \text{Beta}(\alpha, \beta)$. We use a weighted likelihood for positive ($y = 1$) and negative ($y = 0$) evidence, with confidence factors $c^+, c^- \in \mathbb{R}^+$, where $c^+ > c^-$. The confidence factors must satisfy a further condition. Upon observing new evidence $y$, the theorem then gives the posterior expectation of $\theta$.
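The kind of update this theorem describes can be sketched in Python under one explicit assumption. The theorem's displayed equations are not available here, so the sketch assumes the weighted likelihood $p(y \mid \theta) \propto \theta^{c^+ y}(1-\theta)^{c^-(1-y)}$. Under that assumption the posterior stays in the Beta family, $\text{Beta}(\alpha + c^+ y,\ \beta + c^-(1-y))$, and its mean is $(\alpha + c^+ y) / (\alpha + \beta + c^+ y + c^-(1-y))$. The class name `BetaBelief` and the default confidence values are hypothetical, and the additional condition on $c^+, c^-$ mentioned in the theorem is not enforced because it is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief over theta = b(s, a) for one (s, a) pair."""

    alpha: float = 1.0
    beta: float = 1.0

    def mean(self) -> float:
        # Expectation of a Beta(alpha, beta) random variable.
        return self.alpha / (self.alpha + self.beta)

    def update(self, y: int, c_pos: float = 2.0, c_neg: float = 1.0) -> "BetaBelief":
        """Return the posterior after observing evidence y in {0, 1}.

        Assumed likelihood (not taken from the paper):
            p(y | theta) ∝ theta**(c_pos * y) * (1 - theta)**(c_neg * (1 - y))
        which is conjugate to the Beta prior.
        """
        if y not in (0, 1):
            raise ValueError("y must be 0 or 1")
        if not (c_pos > c_neg > 0):
            raise ValueError("requires c_pos > c_neg > 0")
        return BetaBelief(self.alpha + c_pos * y, self.beta + c_neg * (1 - y))


prior = BetaBelief(1.0, 1.0)   # uniform prior, mean 0.5
after_pos = prior.update(1)    # Beta(3, 1), mean 0.75
after_neg = prior.update(0)    # Beta(1, 2), mean 1/3
```

Because $c^+ > c^-$, a single positive observation moves the mean further from the prior than a single negative one, which is the asymmetry the confidence factors are meant to encode.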

Figures (4)

  • Figure 1: Snapshot of Gnomes at Night gameplay from the agent's perspective. Left: The agent's side of the maze, with the human's intent marked in large green squares and its own intent in small red squares. Right: The agent’s probabilistic belief of the wall layout in the human’s maze.
  • Figure 2: Performance comparison of our method (orange) with three baselines, measuring steps taken (top) and control switches (bottom). The Y-axis is on a log-10 scale, and the X-axis represents the oracle episode length, indicating task difficulty.
  • Figure 3: (a, b) Box plots of steps taken and control switches, with medians labeled in white text; (c) bar chart of average success rates; (d) radar chart of average ratings from a 7-point Likert-scale survey (lower values are preferred for all questions, e.g., 1 = very satisfied and 7 = not satisfied at all).
  • Figure 4: Visualization of the agent's belief of the wall layout in the human’s maze at key steps in gameplay. Darker lines indicate a stronger belief in the presence of walls.

Theorems & Definitions (2)

  • Definition 1
  • Theorem 1