In Pursuit of Predictive Models of Human Preferences Toward AI Teammates

Ho Chit Siu, Jaime D. Peña, Yutai Zhou, Ross E. Allen

TL;DR

This work tackles the challenge of identifying objective AI-behavior metrics that predict human preferences for AI teammates. Using Hanabi as a controlled testbed, it analyzes AI-only metrics across task performance, information theory, and game theory, then collects subjective teamwork ratings from a large human sample (N=241) to assess correlations. The study finds that final team scores are only weakly predictive of human preferences, whereas information-theoretic metrics (e.g., AD-Entropy, ARD-Entropy) and instantaneous coordination show stronger associations, and certain game-theoretic actions (dominated vs. dominant moves) have robust effects on perceived teamwork. These results suggest that RL reward shaping for human-collaborative AI should balance payoff with diverse, context-aware, and rational-seeming behaviors to align with human teammates, with implications for broader human-AI collaboration beyond Hanabi.

Abstract

We seek measurable properties of AI agents that make them better or worse teammates from the subjective perspective of human collaborators. Our experiments use the cooperative card game Hanabi -- a common benchmark for AI-teaming research. We first evaluate AI agents on a set of objective metrics based on task performance, information theory, and game theory, which are measurable without human interaction. Next, we evaluate subjective human preferences toward AI teammates in a large-scale (N=241) human-AI teaming experiment. Finally, we correlate the AI-only objective metrics with the human subjective preferences. Our results refute common assumptions from prior literature on reinforcement learning, revealing new correlations between AI behaviors and human preferences. We find that the final game score a human-AI team achieves is less predictive of human preferences than esoteric measures of AI action diversity, strategic dominance, and ability to team with other AI. In the future, these correlations may help shape reward functions for training human-collaborative AI.
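
The final step of the abstract, correlating an AI-only objective metric with subjective human ratings, can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes a Pearson correlation between one metric value per agent and the mean teamwork rating per agent. The function names, agent names, and all numbers are hypothetical placeholders.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def metric_vs_mean_rating(
    metric_by_agent: Dict[str, float],
    ratings_by_agent: Dict[str, List[float]],
) -> float:
    """Correlate a per-agent objective metric with the mean human rating.

    Assumption: the metric is measured without humans (one value per agent),
    while each agent has many human ratings that are averaged first.
    """
    agents = sorted(set(metric_by_agent) & set(ratings_by_agent))
    xs = [metric_by_agent[a] for a in agents]
    ys = [mean(ratings_by_agent[a]) for a in agents]
    return pearson_r(xs, ys)


# Hypothetical placeholder data, not results from the paper.
example_metric = {"agent_a": 1.0, "agent_b": 2.0, "agent_c": 3.0}
example_ratings = {
    "agent_a": [2.0, 4.0],  # mean 3.0
    "agent_b": [4.0, 6.0],  # mean 5.0
    "agent_c": [6.0, 8.0],  # mean 7.0
}
example_r = metric_vs_mean_rating(example_metric, example_ratings)
```

Averaging the ratings per agent before correlating reflects the fact that an AI-only metric yields a single value per agent, whereas each agent is rated by many participants. A rank-based coefficient could be substituted for `pearson_r` without changing the rest of the sketch.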

Paper Structure

This paper contains 24 sections, 7 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Letter-value plot of subjective teamwork rating statistics for each AI agent used in human-AI teaming experiments.
  • Figure 2: Scores during human-AI games vs human teamwork ratings. There is a medium to strong statistically significant correlation. Blue is data with RandomBot, green is without. The data from both the x and y axes come from the same games, so there is no need to take the mean of one side as in most other plots.
  • Figure 3: Teamwork rating vs self-play, intra-XP, and inter-XP scores. Correlation values match the colors of the associated lines (blue is with RandomBot, green is without). Intra-algorithm cross-play (intra-XP) cannot be evaluated on non-learning-based agents, so the rule-based bots (RandomBot, SmartBot, HolmesBot) do not appear in the second plot.
  • Figure 4: Information-theoretic metrics vs teamwork rating. Note that while the y-axes and colorbar are the same across all plots, the x-axes are not.
  • Figure 5: Frequency of discarding a card known to be presently playable (G1), playing a card known to be presently unplayable (G2), and playing a card known to be presently playable (G3) vs teamwork rating. G1 and G2 are dominated moves, while G3 is a dominant move. Note that while the y-axes and colorbar are the same across all plots, the x-axes are not.
  • ...and 6 more figures
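
The three game-theoretic move types named in the Figure 5 caption (G1, G2, G3) can be counted from a move log along the following lines. This is a hedged sketch only. The `Move` record, its fields, and the choice to normalize by the total number of moves are assumptions made for illustration, and the paper may define the frequencies differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Move:
    """Hypothetical log entry for one agent action."""
    action: str  # "play", "discard", or "hint"
    # True: the agent knows the card is presently playable.
    # False: the agent knows the card is presently unplayable.
    # None: the agent lacks that knowledge, or the action involves no card.
    known_playable: Optional[bool]


def classify(move: Move) -> Optional[str]:
    """Map a move to G1, G2, or G3 as described in the Figure 5 caption."""
    if move.action == "discard" and move.known_playable is True:
        return "G1"  # dominated: discarding a known-playable card
    if move.action == "play" and move.known_playable is False:
        return "G2"  # dominated: playing a known-unplayable card
    if move.action == "play" and move.known_playable is True:
        return "G3"  # dominant: playing a known-playable card
    return None


def move_frequencies(moves: Sequence[Move]) -> Dict[str, float]:
    """Fraction of all logged moves in each category (assumed denominator)."""
    counts = {"G1": 0, "G2": 0, "G3": 0}
    for move in moves:
        label = classify(move)
        if label is not None:
            counts[label] += 1
    total = len(moves)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: count / total for label, count in counts.items()}


# Hypothetical four-move log, one move of each kind plus a hint.
example_log = [
    Move("discard", True),
    Move("play", False),
    Move("play", True),
    Move("hint", None),
]
example_freqs = move_frequencies(example_log)
```

Moves made without certain knowledge of playability fall into none of the three categories, which matches the caption's wording of cards "known to be" playable or unplayable.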