When to Say "Hi" -- Learn to Open a Conversation with an in-the-wild Dataset
Michael Schiffmann, Felix Struth, Sabina Jeschke, Anja Richert
TL;DR
This study addresses natural conversation initiation for socially interactive agents by learning when and how to open a dialogue from users' body language in a real-world museum setting. It introduces the Interaction Initiation System (IIS), a two-stage pipeline combining a BlockRNN-based pose forecast with an SVM-based action classifier to decide between Wait, Speak, and Listen. Field data from 201 interactions (26,675 labeled frames) enable end-to-end timing predictions with a reported weighted accuracy around 74% and macro-F1 around 69%, highlighting challenges in the Listen class. The work demonstrates the feasibility of data-driven initiation in-the-wild, outlines limitations, and points to future improvements like larger datasets and higher-fidelity sensing to enhance naturalness and generalizability.
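The gap between the two reported scores can be illustrated with a small worked example. The sketch below is not the authors' evaluation code: the frame labels are hypothetical toy data, the function names are illustrative, and plain accuracy stands in for the paper's weighted accuracy, whose exact definition is not given here. It shows how a weak minority class such as Listen can pull macro-F1 well below accuracy.

```python
LABELS = ("Wait", "Speak", "Listen")


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1}, treating each label one-vs-rest."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of frames whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy frames (NOT from the paper): the single Listen frame
# is misclassified as Wait, everything else is predicted correctly.
y_true = ["Wait"] * 6 + ["Speak"] * 3 + ["Listen"] * 1
y_pred = ["Wait"] * 6 + ["Speak"] * 3 + ["Wait"] * 1

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    print("per-class F1:", per_class_f1(y_true, y_pred))
    print("macro-F1:", macro_f1(y_true, y_pred))
```

In this toy case a single missed Listen frame barely moves accuracy but drives the Listen F1 to zero, which is the kind of effect a macro-averaged score is designed to expose.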
Abstract
The social capabilities of socially interactive agents (SIA) are key to successful and smooth interactions between the user and the SIA. A successful start to the interaction is one of the essential factors for a satisfying SIA interaction. For a service and information task in which the SIA provides information, e.g., about the location, the SIA must master the opening of the conversation and recognize which interlocutor opens it and when. We therefore investigate the extent to which the opening of the conversation can be learned using the user's body language as input for machine learning, to ensure smooth conversation starts. In this paper we propose the Interaction Initiation System (IIS), which we developed, trained, and validated using an in-the-wild dataset. In a field test at the Deutsches Museum Bonn, a Furhat robot from Furhat Robotics was used as a service and information point. Over the period of use we collected data from N = 201 single-user interactions for training the algorithms. We show that the IIS performs well enough to determine both the greeting period and the opener of the interaction.
