Toward Human-AI Alignment in Large-Scale Multi-Player Games
Sugandha Sharma, Guy Davidson, Khimya Khetarpal, Anssi Kanervisto, Udit Arora, Katja Hofmann, Ida Momennejad
TL;DR
The paper tackles human-AI alignment in large-scale multiplayer games by introducing a task-sets framework that yields an interpretable behavioral manifold with three axes: fight-flight, explore-exploit, and solo-multi-agent. It analyzes 100K+ Bleeding Edge games to extract human behavioral patterns and trains a ~222M-parameter Generative Pretrained Causal Transformer via behavior cloning, then projects both human and AI behavior onto the same manifold for comparison. Human players vary substantially along the three axes, whereas the AI agent behaves uniformly and predominantly plays solo, which exposes alignment gaps. The framework enables interpretable evaluation of alignment and offers a path toward targeted agent design and broader use in human-centered AI development.
Abstract
Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay onto the proposed behavior manifold and compare them. This allows us to interpret differences in policy as higher-level behavioral concepts. For example, we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI, and especially in generative AI research, by offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
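
The projection-and-compare step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the task set or how the axes are computed, so the task labels ("fight", "flight", and so on), the normalized-difference score, and the function names project and axis_spread are all assumptions made for illustration. Each game is reduced to hypothetical task counts, mapped to one coordinate per axis in [-1, 1], and the spread along each axis is then compared between a group of human games and a group of agent games.

```python
from statistics import pstdev

# Hypothetical task labels for each axis (assumed; not taken from the paper).
AXES = {
    "fight_flight": ("fight", "flight"),
    "explore_exploit": ("explore", "exploit"),
    "solo_multi": ("solo", "multi"),
}


def project(task_counts):
    """Map one game's task counts to one coordinate per axis in [-1, 1].

    The score (a - b) / (a + b) is an assumed stand-in for the paper's
    projection; an axis with no observed tasks is placed at 0.
    """
    point = {}
    for axis, (pos, neg) in AXES.items():
        a = task_counts.get(pos, 0)
        b = task_counts.get(neg, 0)
        total = a + b
        point[axis] = 0.0 if total == 0 else (a - b) / total
    return point


def axis_spread(games):
    """Population standard deviation of the projected games along each axis."""
    points = [project(g) for g in games]
    return {axis: pstdev([p[axis] for p in points]) for axis in AXES}


# Toy data (fabricated for illustration only, not results from the paper).
human_games = [
    {"fight": 3, "flight": 1, "solo": 1, "multi": 3},
    {"fight": 1, "flight": 3, "solo": 3, "multi": 1},
]
agent_games = [
    {"fight": 2, "flight": 2, "solo": 4, "multi": 0},
    {"fight": 2, "flight": 2, "solo": 4, "multi": 0},
]

human_spread = axis_spread(human_games)
agent_spread = axis_spread(agent_games)
```

In this toy setup, a larger spread for one group along an axis means its games are more varied along that axis, which is the kind of comparison the abstract draws between human and AI play.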
