Using Voice and Biofeedback to Predict User Engagement during Product Feedback Interviews
Alessio Ferrari, Thaide Huichapa, Paola Spoletini, Nicole Novielli, Davide Fucci, Daniela Girardi
TL;DR
This study addresses how to automatically predict user engagement during product feedback interviews by leveraging biometric signals and voice analysis. It collects biofeedback from an Empatica E4 wristband (EDA, BVP, HR) and voice from a laptop during structured interviews about Facebook, then trains multiple classifiers to predict engagement in terms of valence and arousal. Results show that voice features alone achieve high $F1$ scores (up to $F1=0.71$ for arousal and $F1=0.68$ for valence), biofeedback also performs well (up to $F1=0.65$ for arousal), and multimodal fusion offers limited gains. The work demonstrates the practicality and cost advantages of voice-based engagement detection for remote or scalable requirements elicitation, and contributes to affective RE by integrating biometrics and speech analysis with machine learning, providing a replication package for further studies.
Abstract
Capturing users' engagement is crucial for gathering feedback about the features of a software product. In a market-driven context, current approaches to collect and analyze users' feedback are based on techniques leveraging information extracted from product reviews and social media. These approaches are hardly applicable in bespoke software development, or in contexts in which one needs to gather information from specific users. In such cases, companies need to resort to face-to-face interviews to get feedback on their products. In this paper, we propose to utilize biometric data, in terms of physiological and voice features, to complement interviews with information about the engagement of the user on the discussed product-relevant topics. We evaluate our approach by interviewing users while gathering their physiological data (i.e., biofeedback) using an Empatica E4 wristband, and capturing their voice through the default audio recorder of a common laptop. Our results show that we can predict users' engagement by training supervised machine learning algorithms on biometric data (F1=0.72), and that voice features alone are sufficiently effective (F1=0.71). Our work contributes one of the first studies in requirements engineering in which biometrics are used to identify emotions. This is also the first study in software engineering that considers voice analysis. The usage of voice features could be particularly helpful for emotion-aware requirements elicitation in remote communication, either performed by human analysts or voice-based chatbots, and can also be exploited to support the analysis of meetings in software engineering research.
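For readers unfamiliar with the F1 metric reported above, the following minimal sketch shows how it could be computed for a binary engagement label (e.g., high vs. low arousal). The abstract does not state how labels are encoded or how F1 is averaged, so the binary encoding, the function name `f1_score`, and the toy labels below are assumptions made purely for illustration; they are not the study's data or evaluation code.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return F1 for the `positive` class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = high arousal, 0 = low arousal (not data from the study).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
print(f1_score(y_true, y_pred))
```

In this toy example there are three true positives, one false positive, and one false negative, so precision and recall are both 3/4 and F1 is 0.75.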
