In Pursuit of Predictive Models of Human Preferences Toward AI Teammates
Ho Chit Siu, Jaime D. Peña, Yutai Zhou, Ross E. Allen
TL;DR
This work tackles the challenge of identifying objective AI-behavior metrics that predict human preferences for AI teammates. Using Hanabi as a controlled testbed, it computes AI-only metrics spanning task performance, information theory, and game theory, and then correlates them with subjective teamwork ratings collected from a large human sample (N=241). The study finds that final team scores are only weakly predictive of human preferences. Information-theoretic metrics (e.g., AD-Entropy, ARD-Entropy) and instantaneous coordination show stronger associations, and certain game-theoretic actions (dominated vs. dominant moves) have robust effects on perceived teamwork. These results suggest that reward shaping in reinforcement learning (RL) for human-collaborative AI should balance payoff against diverse, context-aware, and rational-seeming behaviors that align with human teammates, with implications for human-AI collaboration beyond Hanabi.
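The TL;DR names entropy-based metrics, but this section does not define AD-Entropy or ARD-Entropy. As a rough illustration only, the sketch below assumes that an action-diversity metric can be approximated by the Shannon entropy (in bits) of an agent's empirical action distribution. The function name `action_entropy` and the action labels are hypothetical, and this is not the authors' definition.

```python
import math
from collections import Counter


def action_entropy(actions):
    """Shannon entropy, in bits, of the empirical distribution of actions.

    Hypothetical stand-in for an action-diversity metric; NOT the paper's
    AD-Entropy definition, which this section does not give.
    """
    if not actions:
        raise ValueError("need at least one action")
    n = len(actions)
    counts = Counter(actions)
    # H = sum_a p(a) * log2(1 / p(a)); written this way to avoid returning -0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical Hanabi-like action labels for two agents over four turns.
repetitive = ["hint_color", "hint_color", "hint_color", "hint_color"]
varied = ["hint_color", "hint_rank", "play", "discard"]
print(action_entropy(repetitive))  # 0.0  (always the same move)
print(action_entropy(varied))      # 2.0  (four equally likely moves)
```

An agent that always makes the same kind of move scores zero, while one that spreads its choices evenly over four move types scores the maximum of log2(4) = 2 bits.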
Abstract
We seek measurable properties of AI agents that make them better or worse teammates from the subjective perspective of human collaborators. Our experiments use the cooperative card game Hanabi, a common benchmark for AI-teaming research. We first evaluate AI agents on a set of objective metrics based on task performance, information theory, and game theory, which are measurable without human interaction. Next, we evaluate subjective human preferences toward AI teammates in a large-scale (N=241) human-AI teaming experiment. Finally, we correlate the AI-only objective metrics with the human subjective preferences. Our results refute common assumptions from prior literature on reinforcement learning, revealing new correlations between AI behaviors and human preferences. We find that the final game score a human-AI team achieves is less predictive of human preferences than esoteric measures of AI action diversity, strategic dominance, and ability to team with other AI. In the future, these correlations may help shape reward functions for training human-collaborative AI.
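The final step of the pipeline, correlating per-agent objective metrics with human ratings, can be sketched as follows. The abstract does not state which statistic was used. This sketch assumes Spearman rank correlation, a common choice for ordinal, Likert-style ratings. All function names (`average_ranks`, `pearson`, `spearman`) and all numbers below are hypothetical toy values, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy inputs: one objective metric per AI agent, and the mean
# human teamwork rating each agent received.
metric = [0.5, 1.2, 2.0, 0.9]
ratings = [2.1, 3.4, 3.0, 4.0]
print(round(spearman(metric, ratings), 3))  # 0.2
```

In practice, `scipy.stats.spearmanr` computes the same tie-aware statistic and also returns a p-value; the hand-rolled version only keeps the example dependency-free.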
