Speejis: Enhancing User Experience of Mobile Voice Messaging with Automatic Visual Speech Emotion Cues
Ilhan Aslan, Carla F. Griggio, Henning Pohl, Timothy Merritt, Niels van Berkel
TL;DR
This paper addresses the limited support for emotional augmentation, such as emojis, in mobile voice messaging by introducing speejis, a system that automatically derives continuous emotion cues from speech and displays them as emojis and colored waveform visualizations. The authors implement a Flask-based prototype that uses a paralinguistic speech emotion recognition (SER) model to map valence, arousal, and dominance to visual cues, with Whisper providing transcription, and evaluate it in a 12-participant user study comparing voice messaging with and without speejis. Results show that speejis increase perceived attractiveness, novelty, and stimulation, while raising questions about dependability and users' editorial control; qualitative data reveal strong user interest in transcription, in design refinements, and in the implications of AI-mediated emotion interpretation. The study offers practical design guidelines for emotion-aware augmentations in mobile messaging and discusses future directions for personalization, transparency, and broader deployment. Overall, speejis demonstrate the potential of continuous emotion-aware augmentation to make messaging more accessible and expressive, while underscoring the need for responsible AI practices in proactive, user-centric interfaces.
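To make the step from valence/arousal/dominance scores to a colored waveform concrete, here is one possible scheme. It is a sketch, not the authors' implementation: it assumes the SER model returns scores normalized to [0, 1], and the hue/saturation/brightness assignment as well as the names `vad_to_rgb` and `color_waveform` are hypothetical.

```python
import colorsys


def vad_to_rgb(valence: float, arousal: float, dominance: float) -> tuple[int, int, int]:
    """Map valence/arousal/dominance scores in [0, 1] to an 8-bit RGB color.

    Hypothetical mapping (not from the paper): valence selects the hue
    (0 = red, 1 = green), arousal sets saturation, dominance sets brightness.
    """
    for name, score in (("valence", valence), ("arousal", arousal), ("dominance", dominance)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
    hue = valence / 3.0               # 0.0 (red) .. 1/3 (green) on the HSV hue circle
    saturation = 0.3 + 0.7 * arousal  # calm speech -> pastel, aroused speech -> vivid
    brightness = 0.5 + 0.5 * dominance
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, brightness)
    return round(r * 255), round(g * 255), round(b * 255)


def color_waveform(chunks):
    """Attach a color to each waveform chunk.

    chunks: iterable of (start_s, end_s, (valence, arousal, dominance)),
    an assumed per-chunk output format for the SER step.
    """
    return [(start, end, vad_to_rgb(v, a, d)) for start, end, (v, a, d) in chunks]
```

Mapping valence to hue keeps the most salient emotional dimension on the most salient visual channel, while arousal and dominance modulate intensity; other assignments are equally plausible and would need user testing.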
Abstract
Mobile messaging apps offer an increasing range of emotional expressions, such as emojis, that help users manually augment their texting experiences. Such augmentations are far less accessible in voice messaging. We use the term "speejis" to refer to accessible emojis and other visual speech emotion cues that are created automatically from speech input alone. The paper presents an implementation of speejis and reports on a user study (N=12) comparing the UX of voice messaging with and without speejis. Results show significant differences in UX measures such as attractiveness and stimulation, and all participants clearly preferred messaging with speejis. We highlight the benefits of using paralinguistic speech processing and continuous emotion models to enable finer-grained augmentations of emotion changes and transitions within a single message, in addition to augmentations of the overall tone of the message.
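The advantage of a continuous emotion model is that cues can follow emotion changes within a message instead of labeling it only once. A minimal sketch of that idea, assuming per-segment valence and arousal scores in [0, 1] (dominance is ignored for brevity), emits an emoji only where the valence/arousal quadrant changes and derives an overall-tone emoji from the mean scores. The quadrant-to-emoji table, the 0.5 midpoint, and the names `emoji_for` and `speech_cues` are hypothetical choices, not the paper's method.

```python
from statistics import mean

# Hypothetical quadrant-to-emoji table over the valence/arousal plane (not from the paper).
QUADRANT_EMOJI = {
    (True, True): "😄",    # positive valence, high arousal
    (True, False): "😌",   # positive valence, low arousal
    (False, True): "😠",   # negative valence, high arousal
    (False, False): "😢",  # negative valence, low arousal
}


def emoji_for(valence: float, arousal: float, midpoint: float = 0.5) -> str:
    """Pick an emoji from the valence/arousal quadrant of one segment."""
    return QUADRANT_EMOJI[(valence >= midpoint, arousal >= midpoint)]


def speech_cues(segments):
    """Return cues at emotion transitions plus an overall-tone emoji.

    segments: list of (start_s, valence, arousal) with scores in [0, 1].
    A cue is emitted only where the quadrant changes, so consecutive
    segments with the same emotion share a single cue.
    """
    if not segments:
        raise ValueError("need at least one segment")
    cues = []
    previous = None
    for start, valence, arousal in segments:
        current = emoji_for(valence, arousal)
        if current != previous:  # a transition: the quadrant changed
            cues.append((start, current))
            previous = current
    # Overall tone: classify the mean valence and mean arousal of the message.
    overall = emoji_for(
        mean([v for _, v, _ in segments]),
        mean([a for _, _, a in segments]),
    )
    return cues, overall
```

For a message that starts relaxed and ends angry, such as segments `[(0.0, 0.7, 0.3), (2.0, 0.8, 0.35), (4.0, 0.2, 0.8)]`, this yields two cues (😌 at 0 s, 😠 at 4 s) while the averaged overall tone stays 😌. The example shows why per-segment cues can carry information that a single message-level label loses.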
