Fast Multi-Party Open-Ended Conversation with a Social Robot
Authors
Giulio Antonio Abbo, Maria Jose Pinto-Bernal, Martijn Catrycke, Tony Belpaeme
Abstract
Multi-party open-ended conversation remains a major challenge in human-robot interaction, particularly when robots must recognise speakers, allocate turns, and respond coherently under overlapping or rapidly shifting dialogue. This paper presents a multi-party conversational system that combines multimodal perception (voice direction of arrival, speaker diarisation, face recognition) with a large language model for response generation. Implemented on the Furhat robot, the system was evaluated with 30 participants across two scenarios: (i) parallel, separate conversations and (ii) shared group discussion. Results show that the system maintains coherent and engaging conversations, achieving high addressee accuracy in parallel settings (92.6%) and strong face recognition reliability (80-94%). Participants reported clear social presence and positive engagement, although technical barriers such as audio-based speaker recognition errors and response latency affected the fluidity of group interactions. The results highlight both the promise and limitations of LLM-based multi-party interaction and outline concrete directions for improving multimodal cue integration and responsiveness in future social robots.