CREW: Facilitating Human-AI Teaming Research

Lingyu Zhang, Zhengran Ji, Boyuan Chen

TL;DR

CREW is a unified, open-source platform for real-time Human-AI teaming research that integrates extensible Unity-based environments, multimodal data collection, parallel sessions, and modular, ML-friendly algorithm implementations. It supports large-scale human-in-the-loop experiments, which the authors use to study how individual differences among humans affect the learning of the agents they guide, benchmarking Real-Time Human-Guided RL (c-Deep TAMER) across multiple tasks with 50 participants. The study finds significant correlations between participants' cognitive traits and the training performance of their agents, exposes scalability challenges in complex tasks, and establishes CREW as scalable infrastructure for multidisciplinary, reproducible Human-AI teaming research. Overall, CREW provides a practical foundation for advancing human-AI collaboration and can be extended toward more diverse tasks, more advanced physiological analytics, and wider algorithmic benchmarking.
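
The TL;DR reports significant correlations between participants' cognitive traits and the training performance of the agents they guided. This summary does not state which statistic was used, so the following is only an illustrative sketch: a Pearson correlation plus a two-sided permutation-test p-value, written with the Python standard library. The function names are hypothetical, and each input list is assumed to hold one value per participant (for example, a cognitive test score and the final score of that participant's agent).

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(
    x: Sequence[float], y: Sequence[float], n_perm: int = 10_000, seed: int = 0
) -> float:
    """Two-sided permutation test for the Pearson correlation.

    Shuffles y to break any pairing with x and counts how often the shuffled
    |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # Small tolerance so ties with the observed value count as hits.
        if abs(pearson_r(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A permutation test is used here because it makes no normality assumption and needs no statistics library; when many traits are tested at once, a multiple-comparison correction would also be needed.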

Abstract

With the increasing deployment of artificial intelligence (AI) technologies, the potential for humans to work alongside AI agents is growing rapidly. Human-AI teaming is an important paradigm for studying how humans and AI agents work together. What sets Human-AI teaming research apart is the need to study humans and AI agents jointly, which demands multidisciplinary efforts spanning machine learning, human-computer interaction, robotics, cognitive science, neuroscience, psychology, social science, and complex systems. However, existing platforms for Human-AI teaming research are limited: they often support only oversimplified scenarios and a single task, or they focus specifically on either human-teaming research or multi-agent AI algorithms. We introduce CREW, a platform that facilitates Human-AI teaming research in real-time decision-making scenarios and engages collaborators from multiple scientific disciplines, with a strong emphasis on human involvement. It includes pre-built tasks for cognitive studies and Human-AI teaming, and its modular design allows these tasks to be extended. Following conventions in cognitive neuroscience research, CREW also supports multimodal recording of human physiological signals for behavior analysis. Moreover, CREW benchmarks real-time human-guided reinforcement learning agents using state-of-the-art algorithms and well-tuned baselines. With CREW, we conducted human-subject studies with 50 participants within one week to verify the effectiveness of our benchmark.
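
Human-guided RL in the TAMER family learns a model H(s, a) of the human's scalar feedback by supervised regression and then acts greedily with respect to that model. The tabular Python sketch below illustrates only that core idea; it is not CREW's c-Deep TAMER, which uses a deep network. The class name, the learning rate, and the uniform credit-assignment window over recent state-action pairs are assumptions chosen for clarity.

```python
from collections import defaultdict, deque
from typing import Hashable, Sequence


class TabularTamer:
    """Hypothetical minimal TAMER-style learner (illustration only).

    Regresses a human-reward model H(s, a) toward the human's scalar
    feedback and picks the action with the highest predicted feedback.
    """

    def __init__(self, actions: Sequence[Hashable], lr: float = 0.5, window: int = 3):
        self.actions = list(actions)
        self.lr = lr
        self.h = defaultdict(float)         # H(s, a); unseen pairs predict 0.0
        self.recent = deque(maxlen=window)  # recent (s, a) pairs eligible for credit

    def act(self, state: Hashable) -> Hashable:
        # Greedy w.r.t. predicted human feedback; max() keeps the first
        # action on ties, so ties resolve in the order actions were given.
        return max(self.actions, key=lambda a: self.h[(state, a)])

    def record(self, state: Hashable, action: Hashable) -> None:
        # Remember what the agent just did so later feedback can be credited to it.
        self.recent.append((state, action))

    def give_feedback(self, feedback: float) -> None:
        # Assumed credit assignment: spread the feedback uniformly over the
        # recent window, then take one weighted squared-error gradient step
        # per pair, moving H(s, a) toward the feedback value.
        if not self.recent:
            return
        weight = 1.0 / len(self.recent)
        for key in self.recent:
            error = feedback - self.h[key]
            self.h[key] += self.lr * weight * error
```

Under continuous per-step feedback, as in Figure 5(C), `give_feedback` would be called once per step; under discrete feedback, as in Figure 5(A), only when the human provides a signal. This mapping is an assumption about how such interfaces would drive a learner like this one.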

Paper Structure

This paper contains 24 sections, 12 figures, 5 tables, and 1 algorithm.

Figures (12)

  • Figure 1: CREW is a platform to facilitate Human-AI teaming research. It is designed around a vision of multidisciplinary collaboration between the human and AI sciences. CREW allows real-time interaction among multiple humans and multiple agents while enabling extensive data collection on both AI agents and human agents.
  • Figure 2: Left: Example of adding objects to an environment in CREW through drag-and-drop. Right: Example of scaling up the complexity of hide-and-seek to a search-and-rescue task.
  • Figure 3: CREW supports multiple tasks, from a single-agent setting (A: Find Treasure) to a multi-agent competitive setting (B: 1v1 Hide-and-Seek) and a multi-agent collaborative and competitive setting (C: NvN Hide-and-Seek). We also show the different camera views that CREW supports for perceptual-motor research.
  • Figure 4: Environment generation in CREW. (A) Randomized maze. (B) Procedurally generated terrains.
  • Figure 5: (A) Discrete scalar feedback. (B) Option to take control of the agent and teleoperate it. (C) Continuous scalar feedback: the human hovers the mouse over this window to provide per-step feedback.
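
Figure 4(A) shows randomized mazes. This summary does not describe how CREW generates them (its environments are Unity-based), so the Python sketch below shows one common technique for the job, a randomized depth-first search (the "recursive backtracker"), purely for illustration. The function names and the adjacency-map representation are assumptions.

```python
import random
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


def generate_maze(width: int, height: int, seed: Optional[int] = None) -> Dict[Cell, Set[Cell]]:
    """Hypothetical maze generator: randomized depth-first search on a grid.

    Returns cell -> set of neighbouring cells joined by an open passage.
    The passages form a spanning tree, so every cell is reachable and any
    two cells are connected by exactly one path.
    """
    rng = random.Random(seed)
    passages: Dict[Cell, Set[Cell]] = {
        (x, y): set() for x in range(width) for y in range(height)
    }
    start = (0, 0)
    visited = {start}
    stack = [start]
    while stack:
        x, y = stack[-1]
        unvisited = [
            (nx, ny)
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if (nx, ny) in passages and (nx, ny) not in visited
        ]
        if not unvisited:
            stack.pop()  # dead end: backtrack to the previous cell
            continue
        nxt = rng.choice(unvisited)
        passages[(x, y)].add(nxt)  # knock down the wall, recorded in both directions
        passages[nxt].add((x, y))
        visited.add(nxt)
        stack.append(nxt)
    return passages


def render(passages: Dict[Cell, Set[Cell]], width: int, height: int) -> str:
    """ASCII view: '#' for walls, ' ' for open cells and passages."""
    grid = [["#"] * (2 * width + 1) for _ in range(2 * height + 1)]
    for (x, y), neighbours in passages.items():
        grid[2 * y + 1][2 * x + 1] = " "
        for nx, ny in neighbours:
            # The wall between two adjacent cells sits at their midpoint.
            grid[y + ny + 1][x + nx + 1] = " "
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    print(render(generate_maze(8, 5, seed=42), 8, 5))
```

Because the generator builds a spanning tree of the grid, every cell is reachable and any two cells are joined by exactly one path; a different seed yields a different layout.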