CREW: Facilitating Human-AI Teaming Research
Lingyu Zhang, Zhengran Ji, Boyuan Chen
TL;DR
CREW is a unified, open-source platform for real-time Human-AI teaming research that integrates extensible Unity-based environments, multimodal data collection, parallel sessions, and ML-friendly modular algorithms. It enables large-scale human-in-the-loop experiments and supports studying how individual human differences affect guided-agent learning, demonstrated by benchmarking real-time human-guided RL (c-Deep TAMER) across multiple tasks with 50 participants. The study finds significant correlations between cognitive traits and agent training performance and highlights scalability challenges in complex tasks. CREW thus provides scalable infrastructure for multidisciplinary, reproducible Human-AI teaming research, with room for task diversification, advanced physiological analytics, and broader algorithmic benchmarking.
Abstract
With the increasing deployment of artificial intelligence (AI) technologies, the potential for humans to work with AI agents has been growing rapidly. Human-AI teaming is an important paradigm for studying how humans and AI agents work together. What makes Human-AI teaming research unique is the need to study humans and AI agents jointly, which demands multidisciplinary research efforts spanning machine learning, human-computer interaction, robotics, cognitive science, neuroscience, psychology, social science, and complex systems. However, existing platforms for Human-AI teaming research are limited: they often support only oversimplified scenarios and a single task, or focus specifically on either human-teaming research or multi-agent AI algorithms. We introduce CREW, a platform that facilitates Human-AI teaming research in real-time decision-making scenarios and engages collaboration across multiple scientific disciplines, with a strong emphasis on human involvement. It includes pre-built tasks for cognitive studies and Human-AI teaming, and its modular design makes it readily extensible. Following conventional cognitive neuroscience research, CREW also supports multimodal recording of human physiological signals for behavior analysis. Moreover, CREW benchmarks real-time human-guided reinforcement learning agents using state-of-the-art algorithms and well-tuned baselines. With CREW, we were able to conduct 50 human subject studies within a week to verify the effectiveness of our benchmark.
