
ChatCollab: Exploring Collaboration Between Humans and AI Agents in Software Teams

Benjamin Klieger, Charis Charitsis, Miroslav Suzara, Sierra Wang, Nick Haber, John C. Mitchell

TL;DR

ChatCollab introduces a configurable, peer-based framework for human-AI collaboration in software teams, featuring autonomous AI agents that inhabit distinct roles and communicate via a shared event timeline in Slack. The authors develop an automated collaboration-analysis pipeline using IPA-inspired labeling and validate it with a tic-tac-toe software-development case study, comparing ChatCollab to prior multi-agent systems and demonstrating comparable or superior code quality. Key contributions include the ChatCollab architecture, a scalable prompting approach to modulate collaboration dynamics, and empirical evidence that role-specific prompts shape interaction patterns. The work underscores the practicality of hybrid human-AI teams for complex tasks and provides open-source data and code to enable further research and experimentation in collaboration dynamics and software development. The findings have implications for designing flexible, human-centered AI collaboration in real-world teams and educational settings.

Abstract

We explore the potential for productive team-based collaboration between humans and Artificial Intelligence (AI) by presenting and conducting initial tests with a general framework that enables multiple human and AI agents to work together as peers. ChatCollab's novel architecture allows agents - human or AI - to join collaborations in any role, autonomously engage in tasks and communication within Slack, and remain agnostic to whether their collaborators are human or AI. Using software engineering as a case study, we find that our AI agents successfully identify their roles and responsibilities, coordinate with other agents, and await requested inputs or deliverables before proceeding. In relation to three prior multi-agent AI systems for software development, we find ChatCollab AI agents produce comparable or better software in an interactive game development task. We also propose an automated method for analyzing collaboration dynamics that effectively identifies behavioral characteristics of agents with distinct roles, allowing us to quantitatively compare collaboration dynamics in a range of experimental conditions. For example, in comparing ChatCollab AI agents, we find that an AI CEO agent generally provides suggestions 2-4 times more often than an AI product manager or AI developer, suggesting agents within ChatCollab can meaningfully adopt differentiated collaborative roles. Our code and data can be found at: https://github.com/ChatCollab.

Paper Structure

This paper contains 28 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: ChatCollab agent schematic and event timeline.
  • Figure 2: An exchange among a software development team of five members - a human client, an AI product manager, an AI CEO, an AI developer, and an AI QA - using ChatCollab. The official transcript is in the appendix (human-AI communication transcripts).
  • Figure 3: Each pie chart shows the distribution of message types sent by each agent in a ChatCollab run.
  • Figure 4: The heatmap (left) shows the percentage differences in message types across experimental conditions relative to the control condition, with raw message counts in parentheses. The bar chart (right) displays the raw counts of each message type for each experimental condition.
  • Figure 5: Comparison between LLMs, AI Agents and ChatCollab based on the criteria listed in the evaluation criteria table.
  • ...and 2 more figures