Speejis: Enhancing User Experience of Mobile Voice Messaging with Automatic Visual Speech Emotion Cues

Ilhan Aslan, Carla F. Griggio, Henning Pohl, Timothy Merritt, Niels van Berkel

TL;DR

This paper tackles the lack of nonverbal emphasis in mobile voice messages by introducing speejis, a system that automatically derives continuous emotion cues from speech and displays them as emojis and colored waveform visualizations. It implements a Flask-based prototype that uses a paralinguistic speech emotion recognition (SER) model and Whisper transcription to map valence, arousal, and dominance to visual cues, and evaluates it in a 12-participant user study comparing voice messaging with and without speejis. Results show that speejis increase perceived attractiveness, novelty, and stimulation, while raising questions about dependability and user editorial control. Qualitative data reveal strong user interest in transcription and design refinements, along with concerns about the implications of AI-mediated emotion interpretation. The study offers practical design guidelines for emotion-aware augmentations in mobile messaging and discusses future directions for personalization, transparency, and broader deployment. Overall, speejis demonstrate the potential of continuous emotion-aware augmentation to make messaging more accessible and expressive, while underscoring the need for responsible AI practices in proactive, user-centric interfaces.

Abstract

Mobile messaging apps offer an increasing range of emotional expressions, such as emojis, to help users manually augment their texting experiences. Accessibility of such augmentations is limited in voice messaging. With the term "speejis" we refer to accessible emojis and other visual speech emotion cues that are created automatically from speech input alone. The paper presents an implementation of speejis and reports on a user study (N=12) comparing the UX of voice messaging with and without speejis. Results show significant differences in measures such as attractiveness and stimulation, and a clear preference among all participants for messaging with speejis. We highlight the benefits of using paralinguistic speech processing and continuous emotion models to enable finer-grained augmentations of emotion changes and transitions within a single message, in addition to augmentations of the overall tone of the message.
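To make the idea of per-segment versus overall augmentation concrete, here is a minimal Python sketch. It is not the paper's mapping: the `Segment` type, the assumed 0..1 value range, the four-quadrant rule, the neutral band, and the five emojis are all illustrative assumptions (the study itself uses 22 emojis and also considers dominance). It only shows how continuous valence/arousal estimates could yield one cue per segment plus one cue for the whole message.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    """One slice of a voice message with hypothetical SER outputs."""
    start: float    # seconds
    end: float      # seconds
    valence: float  # assumed range 0..1 (0.5 = neutral)
    arousal: float  # assumed range 0..1 (0.5 = neutral)


def quadrant_emoji(valence: float, arousal: float, neutral_band: float = 0.1) -> str:
    """Pick an emoji from the valence/arousal quadrant (illustrative rule only)."""
    v = valence - 0.5
    a = arousal - 0.5
    if abs(v) < neutral_band and abs(a) < neutral_band:
        return "😐"  # close to the centre: treat as neutral
    if v >= 0 and a >= 0:
        return "😄"  # positive, high energy
    if v >= 0:
        return "😌"  # positive, low energy
    if a >= 0:
        return "😠"  # negative, high energy
    return "😢"      # negative, low energy


def annotate(segments: list[Segment]) -> tuple[list[str], str]:
    """Return one emoji per segment and one for the message's mean tone."""
    per_segment = [quadrant_emoji(s.valence, s.arousal) for s in segments]
    overall = quadrant_emoji(
        mean(s.valence for s in segments),
        mean(s.arousal for s in segments),
    )
    return per_segment, overall


message = [
    Segment(0.0, 2.0, valence=0.9, arousal=0.8),
    Segment(2.0, 4.0, valence=0.2, arousal=0.3),
]
per_segment, overall = annotate(message)
```

In this example the two segments fall into opposite quadrants while their average lands near neutral, which illustrates why within-message cues can carry information that a single overall cue would hide.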

Paper Structure

This paper contains 23 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: Overview of design probes used in the study, representing conditions for baseline voice messages without speejis (left) and the SER augmented design with speejis (right).
  • Figure 2: The 22 emojis used as speejis in the study to automatically augment voice messages.
  • Figure 3: Concept for the colour mapping used to augment the audio waveform and create an emotional waveform.
  • Figure 4: Overview of the speejis system illustrating the components of the system and how they interact with each other to provide the material needed to compose speejis and augment voice messaging UIs.
  • Figure 5: Illustration of study setup.
  • ...and 3 more figures