
Humanlike Multi-user Agent (HUMA): Designing a Deceptively Human AI Facilitator for Group Chats

Mateusz Jacniacki, Martí Carmona Serrat

TL;DR

This work tackles the challenge of enabling AI to act as a natural facilitator in asynchronous multi-person group chats. It introduces HUMA, an LLM-based facilitator built on an event-driven architecture with three components (Router, Action Agent, Reflection) to manage when to speak, who to address, and how to handle interruptions, including realistic typing delays. The method extends the MUCA 3W framework with 20 strategies, timing regularization, and tool-use constraints to support diverse, context-sensitive participation. In a controlled study with 97 participants across four-person role-play chats, HUMA was nearly indistinguishable from human community managers and yielded comparable subjective experience measures, suggesting practical viability for scalable, trustworthy group chat facilitation.

Abstract

Conversational agents built on large language models (LLMs) are becoming increasingly prevalent, yet most systems are designed for one-on-one, turn-based exchanges rather than natural, asynchronous group chats. As AI assistants become widespread throughout digital platforms, from virtual assistants to customer service, developing natural and humanlike interaction patterns seems crucial for maintaining user trust and engagement. We present the Humanlike Multi-user Agent (HUMA), an LLM-based facilitator that participates in multi-party conversations using humanlike strategies and timing. HUMA extends prior multi-user chatbot work with an event-driven architecture that handles messages, replies, and reactions and introduces realistic response-time simulation. HUMA comprises three components (Router, Action Agent, and Reflection), which together adapt LLMs to group conversation dynamics. We evaluate HUMA in a controlled study with 97 participants in four-person role-play chats, comparing AI and human community managers (CMs). Participants classified CMs as human at near-chance rates in both conditions, indicating they could not reliably distinguish HUMA agents from humans. Subjective experience was comparable across conditions: community-manager effectiveness, social presence, and engagement/satisfaction differed only modestly with small effect sizes. Our results suggest that, in natural group chat settings, an AI facilitator can match human quality while remaining difficult to identify as nonhuman.
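To make the event-driven, interruptible workflow described above more concrete, the following is a minimal Python sketch, not the authors' implementation. All names (`Event`, `route`, `act`, `typing_delay`, `Facilitator`), the three placeholder strategies, and the delay parameters are assumptions introduced for illustration; in HUMA the Router and Action Agent are LLM-driven, whereas simple rules stand in for them here. A pre-filled queue simulates a burst of chat events, and a draft is dropped whenever a newer event is already waiting.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class Event:
    kind: str       # "message", "reply", or "reaction"
    author: str
    text: str = ""


def route(event: Event) -> str:
    """Router stage: choose a strategy (placeholder rules, not the paper's strategy set)."""
    if event.kind == "reaction":
        return "stay_silent"
    if event.text.strip().endswith("?"):
        return "answer_question"
    return "encourage_participation"


def act(strategy: str, event: Event) -> Optional[str]:
    """Action Agent stage: turn the chosen strategy into a draft message (or None)."""
    if strategy == "stay_silent":
        return None
    if strategy == "answer_question":
        return f"@{event.author} good question, here is my take."
    return f"Thanks @{event.author}! What do the others think?"


def typing_delay(text: str, chars_per_second: float = 5.0, think_seconds: float = 1.0) -> float:
    """Hypothetical response-time model: fixed thinking time plus per-character typing time."""
    return think_seconds + len(text) / chars_per_second


@dataclass
class Facilitator:
    pending: Deque[Event] = field(default_factory=deque)
    notes: List[str] = field(default_factory=list)              # Reflection output for later turns
    sent: List[Tuple[str, float]] = field(default_factory=list)  # (message, simulated delay)

    def drain(self) -> None:
        while self.pending:
            event = self.pending.popleft()
            strategy = route(event)
            draft = act(strategy, event)
            # Interruption check: a newer event is waiting, so drop the draft and start over.
            if self.pending:
                self.notes.append(f"interrupted while handling {event.author}")
                continue
            if draft is not None:
                # A live system would wait for the delay before sending; here it is only recorded.
                self.sent.append((draft, typing_delay(draft)))
            # Reflection stage: summarize what happened for future iterations.
            self.notes.append(f"used {strategy} for {event.author}")


facilitator = Facilitator()
facilitator.pending.extend([
    Event("message", "ana", "I think we should start soon"),
    Event("message", "ben", "When do we start?"),
])
facilitator.drain()
print(facilitator.sent)
print(facilitator.notes)
```

In this sketch only the most recent message in a burst receives a reply, which is one simple way to model interruption; a production system would more likely cancel an in-flight asynchronous task when a new event arrives.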

Paper Structure

This paper contains 26 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: HUMA system architecture. Events from the chat platform trigger a three-stage workflow: the Router selects an appropriate conversational strategy, the Action Agent executes it using available tools, and the Reflection component synthesizes context for future iterations. The workflow can be interrupted by new events, enabling natural adaptation to rapid conversation dynamics.
  • Figure 2: AI detection results showing the percentage of participants classifying the community manager as "human" in each condition, with 95% Wilson score confidence intervals. Both conditions cluster near chance level (50%, dashed line). Participants exhibited symmetric confusion, unable to distinguish AI from human community managers.
  • Figure 3: Survey scale scores by condition. Bars represent mean scores with standard error of the mean. Differences between conditions were consistently small ($|d| < 0.4$), with substantial overlap in distributions. The pattern of similar ratings across diverse measures indicates comparable participant experiences with human and AI community managers.
  • Figure 4: Community Manager Effectiveness individual item scores by condition. The scale comprises seven items assessing different aspects of facilitation quality. Bars show mean scores with standard error. While some items (e.g., "Encouraged Participation," "Bridged Viewpoints") show modest trends favoring human CMs, differences remain small across all items, consistent with the overall scale similarity.
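The Figure 2 caption reports 95% Wilson score confidence intervals for the share of participants who judged the community manager to be human. As a reference for how such an interval is computed, here is a short Python sketch of the standard Wilson formula. The function name `wilson_interval` and the example counts are assumptions chosen for illustration and are not the study's data.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 25 of 50 raters answer "human".
low, high = wilson_interval(25, 50)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly for small samples, which makes it a common choice for per-condition classification rates such as these.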