Multi-Party Conversational Agents: A Survey
Sagar Sapkota, Mohammad Saqib Hasan, Mubarak Shah, Santu Karmaker
TL;DR
This survey addresses the challenges of building multi-party conversational agents (MPCAs) by introducing a three-theme taxonomy: State of Mind Modeling, Semantic Understanding, and Agent Action Modeling. It reviews over 70 studies across traditional ML, NLP, LLMs, and multimodal systems, highlighting Theory of Mind (ToM) as a core requirement for socially intelligent MPCAs and identifying multimodal grounding as a promising but underexplored avenue. The authors analyze task families such as emotion recognition, dialogue disentanglement, dialogue summarization, turn detection, addressee selection, and response generation, summarizing methodological trends and benchmarking limitations. They propose future directions focusing on integrating ToM, advancing multimodal fusion and grounding, and developing richer, real-time evaluation benchmarks to push MPCAs toward more robust and human-like group dialogue capabilities.
Abstract
Multi-party Conversational Agents (MPCAs) are systems designed to engage in dialogue with more than two participants simultaneously. Unlike traditional two-party agents, MPCAs pose additional design challenges because they must interpret both utterance semantics and social dynamics. This survey explores recent progress in MPCAs by addressing three key questions: 1) Can agents model each participant's mental state? (State of Mind Modeling); 2) Can they properly understand the dialogue content? (Semantic Understanding); and 3) Can they reason about and predict future conversation flow? (Agent Action Modeling). We review methods ranging from classical machine learning to Large Language Models (LLMs) and multi-modal systems. Our analysis underscores Theory of Mind (ToM) as essential for building intelligent MPCAs and highlights multi-modal understanding as a promising yet underexplored direction. Finally, this survey offers guidance to future researchers on developing more capable MPCAs.
