Classification of User Satisfaction in HRI with Social Signals in the Wild
Michael Schiffmann, Sabina Jeschke, Anja Richert
TL;DR
The study tackles automatic assessment of user satisfaction in human-robot interaction using time-series social signals (body language, facial expressions, distance) collected in the wild during a museum deployment. It compares three feature-engineering pipelines (tsfresh, catch22, and handcrafted features) under leave-one-out cross-validation (LOOCV) with multiple classifiers, achieving up to 97.8% accuracy with tsfresh-derived features. The results demonstrate the feasibility of annotation-free, real-world satisfaction classification and highlight the strengths and limitations of each feature approach, including concerns about generalizability and overfitting on a small dataset. This work paves the way for automated feedback mechanisms that improve the performance of socially interactive agents (SIAs) and the user experience, and it outlines avenues for transfer learning and broader validation across agents and contexts.
Abstract
Socially interactive agents (SIAs) are being used in various scenarios and are nearing productive deployment. Evaluating user satisfaction with SIAs' performance is a key factor in designing the interaction between the user and SIA. Currently, subjective user satisfaction is primarily assessed manually through questionnaires or indirectly via system metrics. This study examines the automatic classification of user satisfaction through analysis of social signals, aiming to enhance both manual and autonomous evaluation methods for SIAs. During a field trial at the Deutsches Museum Bonn, a Furhat Robotics head was employed as a service and information hub, collecting an "in-the-wild" dataset. This dataset comprises 46 single-user interactions, including questionnaire responses and video data. Our method focuses on automatically classifying user satisfaction based on time series classification. We use time series of social signal metrics derived from the body pose, time series of facial expressions, and physical distance. This study compares three feature engineering approaches on different machine learning models. The results confirm the method's effectiveness in reliably identifying interactions with low user satisfaction without the need for manually annotated datasets. This approach offers significant potential for enhancing SIA performance and user experience through automated feedback mechanisms.
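The evaluation idea described above (summarising each interaction's time series as a fixed-length feature vector, then scoring a classifier with leave-one-out cross-validation) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the synthetic "distance-like" signal, the five summary features, the nearest-centroid classifier, and all function names are hypothetical stand-ins, not the study's features, models, or data. Only the sample count of 46 is taken from the text.

```python
import math
import random
import statistics


def handcrafted_features(series):
    """Summarise one univariate time series (length >= 2) as a feature vector."""
    n = len(series)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Least-squares slope of the series against its time index.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(series)) / denom
    return [mean, std, min(series), max(series), slope]


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (Euclidean)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        rows = [f for f, y in zip(train_X, train_y) if y == label]
        centroid = [statistics.fmean(col) for col in zip(*rows)]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y):
    """Leave-one-out CV: each sample serves as the test set exactly once."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


def make_synthetic_interaction(rng, satisfied, length=50):
    """Hypothetical signal: unsatisfied users slowly drift away from the robot."""
    drift = 0.0 if satisfied else 0.02
    return [1.0 + drift * t + rng.gauss(0, 0.05) for t in range(length)]


rng = random.Random(0)
labels = [i % 2 == 0 for i in range(46)]  # 46 samples, matching the dataset size
X = [handcrafted_features(make_synthetic_interaction(rng, s)) for s in labels]
y = ["satisfied" if s else "unsatisfied" for s in labels]
accuracy = loocv_accuracy(X, y)
print(f"LOOCV accuracy on synthetic data: {accuracy:.3f}")
```

With only 46 samples, leave-one-out makes the most of the data because every interaction is used for testing once and for training in all other folds. The synthetic classes here are deliberately easy to separate, so the printed accuracy says nothing about real-world performance.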
