Multi-Party Conversational Agents: A Survey

Sagar Sapkota, Mohammad Saqib Hasan, Mubarak Shah, Santu Karmaker

TL;DR

This survey addresses the challenge of multi-party conversational agents (MPCAs) by introducing a three-theme taxonomy: State of Mind Modeling, Semantic Understanding, and Agent Action Modeling. It reviews over 70 studies across traditional ML, NLP, LLMs, and multimodal systems, highlighting Theory of Mind as a core requirement for socially intelligent MPCAs and identifying multimodal grounding as a promising but underexplored avenue. The authors analyze task families such as emotion recognition, dialogue disentanglement, dialogue summarization, turn detection, addressee selection, and response generation, summarizing methodological trends and benchmarking limitations. They propose future directions focusing on integrating ToM, advancing multimodal fusion and grounding, and developing richer, real-time evaluation benchmarks to push MPCAs toward more robust and human-like group dialogue capabilities.

Abstract

Multi-party Conversational Agents (MPCAs) are systems designed to engage in dialogue with more than two participants simultaneously. Unlike traditional two-party agents, MPCAs pose additional design challenges because they must interpret both utterance semantics and social dynamics. This survey explores recent progress in MPCAs by addressing three key questions: 1) Can agents model each participant's mental states? (State of Mind Modeling); 2) Can they properly understand the dialogue content? (Semantic Understanding); and 3) Can they reason about and predict future conversation flow? (Agent Action Modeling). We review methods ranging from classical machine learning to Large Language Models (LLMs) and multi-modal systems. Our analysis underscores Theory of Mind (ToM) as essential for building intelligent MPCAs and highlights multi-modal understanding as a promising yet underexplored direction. Finally, this survey offers guidance to future researchers on developing more capable MPCAs.

Paper Structure

This paper contains 25 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Example of a multi-party conversation demonstrating key challenges that MPCAs must handle. At each time step of the conversation, the MPCA must identify the states of mind of each participant (e.g., curiosity, frustration, etc.), have semantic understanding of the conversation (e.g., speaker actions like criticize and explain, dialog summary, etc.), and be able to take the appropriate action (e.g., response, turn-taking, identifying addressee, etc.). Combining these capabilities makes for a social and intelligent agent.
  • Figure 2: Thematic taxonomy of MPC tasks and recent works focusing on these tasks.
  • Figure 3: Full taxonomy of MPC tasks and recent work under them.